Machine Learning with Python (Part 2)

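The previous article, "Machine Learning with Python (Part 1)", covered basic numpy usage. This one introduces scipy, which is built on top of numpy.
scipy is a convenient, easy-to-use Python toolkit designed for science and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ordinary differential equation solvers, and more.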

The main modules in scipy are:

Vector quantization / Kmeans: scipy.cluster

Physical and mathematical constants: scipy.constants

Fourier transform: scipy.fftpack

Integration routines: scipy.integrate

Interpolation: scipy.interpolate

Data input and output: scipy.io

Linear algebra routines: scipy.linalg

n-dimensional image package: scipy.ndimage

Orthogonal distance regression: scipy.odr

Optimization: scipy.optimize

Signal processing: scipy.signal

Sparse matrices: scipy.sparse

Spatial data structures and algorithms: scipy.spatial

Special mathematical functions: scipy.special

Statistics: scipy.stats
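
Each of these modules is imported from the top-level package by name. A minimal sketch, assuming a standard SciPy install, that touches two modules from the list (the particular values queried are picked only for illustration):

```python
from scipy import constants, special

# Speed of light in vacuum, in metres per second
print(constants.c)

# Gamma function: gamma(n) equals (n - 1)! for positive integers
print(special.gamma(5))
```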

Examples of commonly used functions:

import numpy as np
from scipy import linalg
arr = np.array([[1, 2],[3, 4]])
## Determinant of the matrix
print("Determinant:", linalg.det(arr))
## Inverse of the matrix
print("Inverse:", linalg.inv(arr))
Determinant: -2.0
Inverse: [[-2.   1. ]
 [ 1.5 -0.5]]
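
The inverse is rarely needed on its own; solving a linear system is the more common task. A small sketch using the same matrix with `linalg.solve`, where the right-hand side `b` is a made-up vector for illustration:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])  # hypothetical right-hand side, chosen for illustration

# Solve A @ x = b without forming the inverse explicitly
x = linalg.solve(A, b)
print(x)

# Sanity check: A times its inverse should be the identity matrix
print(np.allclose(A.dot(linalg.inv(A)), np.eye(2)))
```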
## Singular value decomposition
arr = np.arange(9).reshape((3, 3)) + np.diag([1, 0, 1])
uarr, spec, vharr = linalg.svd(arr)
print(spec)
sarr = np.diag(spec)
svd_mat = uarr.dot(sarr).dot(vharr)
print(svd_mat)
np.allclose(arr,svd_mat)
[ 14.88982544   0.45294236   0.29654967]
[[ 1.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  9.]]
True
## Fourier transform
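
A minimal sketch of a Fourier transform with SciPy, assuming the `scipy.fft` module (the newer interface; `scipy.fftpack` from the module list offers similarly named `fft` and `fftfreq` functions). The 5 Hz test signal and the 100 Hz sample rate are invented for illustration:

```python
import numpy as np
from scipy import fft

# Hypothetical test signal: a 5 Hz sine sampled at 100 Hz for one second
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

spectrum = fft.fft(signal)
freqs = fft.fftfreq(signal.size, d=1 / sample_rate)

# Find the strongest component among the positive frequencies
positive = freqs > 0
peak_freq = freqs[positive][np.argmax(np.abs(spectrum[positive]))]
print(peak_freq)
```

The peak of the spectrum should fall at the frequency of the sine wave that was put in.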
## Optimization
from scipy import optimize
def f(x):
    return x**2 + 10*np.sin(x)
import matplotlib.pyplot as plt
x = np.arange(-10, 10, 0.1)
plt.plot(x, f(x)) 
plt.show() 
## BFGS depends on the starting point and may return only a local minimum
optimize.fmin_bfgs(f, 0)
array([-1.30644...])
optimize.fmin_bfgs(f, 3)
Optimization terminated successfully.
         Current function value: 8.315586
         Iterations: 6
         Function evaluations: 21
         Gradient evaluations: 7
array([ 3.83746709])
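
The dependence on the starting point can be checked side by side. A sketch assuming the generic `optimize.minimize` interface with `method="BFGS"` rather than `fmin_bfgs`; the variable names are illustrative:

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# Same function, two different starting points
res_from_0 = optimize.minimize(f, x0=[0.0], method="BFGS")
res_from_3 = optimize.minimize(f, x0=[3.0], method="BFGS")

print(res_from_0.x, res_from_0.fun)  # settles near x = -1.31
print(res_from_3.x, res_from_3.fun)  # settles near x = 3.84
```

Starting at 0 the search slides into the lower valley on the left, while starting at 3 it stops in the shallower valley on the right.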
## Global optimum
optimize.basinhopping(f, 0)
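
`basinhopping` returns a result object rather than a bare array. A short sketch of reading it, assuming default settings (the algorithm is randomized, so intermediate steps differ between runs):

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

result = optimize.basinhopping(f, [0.0])

# result.x is the best minimizer found, result.fun the function value there
print(result.x)
print(result.fun)
```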

Finding the roots of a function

1. fsolve finds only one root per call, depending on the initial guess

root = optimize.fsolve(f, 1)
root
array([ 0.])
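
The plot of f shows a second zero crossing to the left of the origin. A sketch of finding it by changing the initial guess; the value -2.5 is read off the plot and is an assumption:

```python
import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# A starting guess near the second zero crossing seen on the plot
root2 = optimize.fsolve(f, -2.5)
print(root2)
```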
## Curve fitting
xdata = np.linspace(-10, 10, num=20)
ydata = f(xdata) + np.random.randn(xdata.size)
# Assume the data follows the form of f2, then solve for a and b
def f2(x, a, b):
    return a*x**2 + b*np.sin(x)
guess = [2, 2]
params, params_covariance = optimize.curve_fit(f2, xdata, ydata, guess)
params
array([  1.00348624,  10.37354547])
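
The second return value of `curve_fit` is the covariance matrix of the fitted parameters, and the square roots of its diagonal give one-standard-deviation uncertainties. A sketch under the assumption of a fixed random seed (not in the original) so the run is repeatable:

```python
import numpy as np
from scipy import optimize

def f2(x, a, b):
    return a * x**2 + b * np.sin(x)

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
xdata = np.linspace(-10, 10, num=20)
ydata = f2(xdata, 1, 10) + rng.standard_normal(xdata.size)

params, params_covariance = optimize.curve_fit(f2, xdata, ydata, p0=[2, 2])

# One-standard-deviation uncertainty for a and b
param_errors = np.sqrt(np.diag(params_covariance))
print(params)
print(param_errors)
```

The fitted values should come out close to the true a = 1 and b = 10, with a much tighter uncertainty on a because the x**2 term dominates the data.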
## Statistics
a = np.random.normal(size=1000)
bins = np.arange(-4, 5)
print(bins)
histogram = np.histogram(a, bins=bins, density=True)[0]
print(histogram)
bins = 0.5*(bins[1:] + bins[:-1])
print(bins)
from scipy import stats
# pdf: probability density function
b = stats.norm.pdf(bins)
print("pdf:",b)
plt.plot(bins, histogram)
plt.plot(bins, b)
plt.show()
loc, std = stats.norm.fit(a)
print("loc:"+str(loc)+"std:"+str(std))
# Median
np.median(a)
[-4 -3 -2 -1  0  1  2  3  4]
[ 0.001  0.025  0.137  0.339  0.34   0.136  0.02   0.002]
[-3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5]
pdf: [ 0.00087268  0.0175283   0.1295176   0.35206533  0.35206533  0.1295176
  0.0175283   0.00087268]

loc:-0.00549513299797std:1.00725628853

-0.0037246310284498475
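
The median is the 50th percentile, and other percentiles can be compared against the theoretical normal distribution in the same way. A sketch assuming a fixed random seed (not in the original) and using `stats.norm.ppf` for the theoretical value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
a = rng.normal(size=1000)

# The median and the 50th percentile are the same quantity
print(np.median(a))
print(np.percentile(a, 50))

# Sample 90th percentile next to the theoretical value for a standard normal
print(np.percentile(a, 90))
print(stats.norm.ppf(0.9))
```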

Code on GitHub

References

Scipy Lecture Notes
