# scikit-learn Series: Spot-Checking Classification Algorithms

1. Try a mix of algorithms with different representations (e.g. instances and trees).
2. Try a mix of different learning algorithms (different algorithms for learning the same type of representation).
3. Try a mix of different modeling types (linear and nonlinear functions, or parametric and nonparametric).

Two linear algorithms:

1. Logistic Regression
2. Linear Discriminant Analysis

Four nonlinear algorithms:

1. K-Nearest Neighbors
2. Naive Bayes
3. Classification and Regression Trees
4. Support Vector Machines

The 6 algorithms are summarized below.

1. Logistic Regression assumes numeric inputs that follow a Gaussian distribution, and is suited to binary classification problems. Build the model with the LogisticRegression class.
2. Linear Discriminant Analysis works for binary or multi-class classification problems, and likewise assumes Gaussian-distributed numeric inputs. Build the model with the LinearDiscriminantAnalysis class.
3. K-Nearest Neighbors uses a distance measure to find the K most similar training instances for a new data point, and takes the neighbors' majority vote as the prediction. Build the model with the KNeighborsClassifier class.
4. Naive Bayes computes the probability of each class and the conditional probability of each input value given each class. For new data these probabilities are estimated and multiplied together, assuming they are all independent of one another. Build the model with the GaussianNB class.
5. Classification and Regression Trees (CART, or just decision trees) construct a binary tree from the training data. Split points are chosen by evaluating each data attribute to minimize a cost function. Build the model with the DecisionTreeClassifier class.
6. Support Vector Machines seek the hyperplane that best separates the classes, defined by the most informative training instances (the support vectors). Build the model with the SVC class.

```python
# Spot-check 6 classification algorithms with 10-fold cross-validation
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]
Y = array[:, 8]
seed = 7
# Recent scikit-learn requires shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)

def build_model(model_name):
    model = model_name()
    return model

for model_name in [LogisticRegression, LinearDiscriminantAnalysis,
                   KNeighborsClassifier, GaussianNB,
                   DecisionTreeClassifier, SVC]:
    model = build_model(model_name)
    results = model_selection.cross_val_score(model, X, Y, cv=kfold)
    print("%s mean accuracy:%s" % (model_name, results.mean()))
```

```
<class 'sklearn.linear_model.logistic.LogisticRegression'> mean accuracy:0.76951469583
<class 'sklearn.discriminant_analysis.LinearDiscriminantAnalysis'> mean accuracy:0.773462064252
<class 'sklearn.neighbors.classification.KNeighborsClassifier'> mean accuracy:0.726555023923
<class 'sklearn.naive_bayes.GaussianNB'> mean accuracy:0.75517771702
<class 'sklearn.tree.tree.DecisionTreeClassifier'> mean accuracy:0.696548188653
<class 'sklearn.svm.classes.SVC'> mean accuracy:0.651025290499
```
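The same spot-check loop can be sketched so that it runs offline: below, `make_classification` stands in for the Pima data (an assumption, since the UCI download URL may no longer resolve), and the mean scores are collected into a dict to pick out the best-performing algorithm:

```python
# Spot-check loop on synthetic data (make_classification is a stand-in
# for the Pima dataset, not the original data).
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, Y = make_classification(n_samples=500, n_features=8, random_state=7)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Collect each algorithm's mean cross-validation accuracy
scores = {}
for cls in [LogisticRegression, LinearDiscriminantAnalysis,
            KNeighborsClassifier, GaussianNB,
            DecisionTreeClassifier, SVC]:
    results = cross_val_score(cls(), X, Y, cv=kfold)
    scores[cls.__name__] = results.mean()

best = max(scores, key=scores.get)
print(best, scores[best])
```

Storing the scores by name, rather than only printing them, makes it easy to rank the candidates and carry the top few forward for tuning.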

• model_selection.KFold
• model_selection.cross_val_score
• results.mean
• linear_model.LogisticRegression
• discriminant_analysis.LinearDiscriminantAnalysis
• neighbors.KNeighborsClassifier
• naive_bayes.GaussianNB
• tree.DecisionTreeClassifier
• svm.SVC