怎么运用支撑向量机学习非线性数据集（支持向量机非线性）-人工智能-知优网

什么是支持向量机呢?支持向量机是监督机器学习模型，可对数据进行分类分析。实际上，支持向量机算法是寻找能将实例进行分离的优秀超平面的过程。

支持向量机(SVM)

什么是支持向量机呢?支持向量机是监督机器学习模型，可对数据进行分类分析。实际上，支持向量机算法是寻找能将实例进行分离的优秀超平面的过程。

如果数据像上面那样是线性可分离的，那么我们用一个线性分类器就能将两个类分开。如果我们的数据是非线性可分的，我们应该怎么做呢?就像这样:

正如我们所看到的，即使来自不同类的数据点是可分离的，我们也不能简单地画一条直线来进行分类。

那么我们如何使用支持向量机来拟合非线性机器学习数据集呢?

使用SVM进行实验

创建机器学习数据集

首先创建非线性机器学习数据集。Python代码如下：

#Importpackagestovisualizetheclassifer
frommatplotlib.colorsimportListedColormap
importmatplotlib.pyplotasplt
importwarnings
#Importpackagestodotheclassifying
importnumpyasnp
fromsklearn.svmimportSVC
#CreateDataset
np.random.seed(0)
X_xor=np.random.randn(200,2)
y_xor=np.logical_xor(X_xor[:,0]>0,
X_xor[:,1]>0)
y_xor=np.where(y_xor,1,-1)
fig=plt.figure(figsize=(10,10))
plt.scatter(X_xor[y_xor==1,0],
X_xor[y_xor==1,1],
c='b',marker='x',
label='1')
plt.scatter(X_xor[y_xor==-1,0],
X_xor[y_xor==-1,1],
c='r',
marker='s',
label='-1')
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.legend(loc='best')
plt.tight_layout()
plt.show()

尝试使用线性支持向量机

我们首先尝试使用线性支持向量机，Python实现如下：

#Importpackagestodotheclassifying
frommlxtend.plottingimportplot_decision_regions
importnumpyasnp
fromsklearn.svmimportSVC
#CreateaSVCclassifierusingalinearkernel
svm=SVC(kernel='linear',C=1000,random_state=0)
#Traintheclassifier
svm.fit(X_xor,y_xor)
#Visualizethedecisionboundaries
fig=plt.figure(figsize=(10,10))
plot_decision_regions(X_xor,y_xor,clf=svm)
plt.legend(loc='upperleft')
plt.tight_layout()
plt.show()

C是与错误分类相关的成本。C值越高，算法对数据集的正确分离就越严格。对于线性分类器，我们使用kernel='linear'。

如我们所见，即使我们将成本设置得很高，但这条线也无法很好地分离红点和蓝点。

径向基函数核

到目前为止，我们使用的线性分类器为：

正如我们所看到的，g(x)是一个线性函数。当g(x) >为0时，预测值为1。当g(x) <0时，预测值为-1。但是由于我们不能使用线性函数处理像上面这样的非线性数据，我们需要将线性函数转换成另一个函数。

这个分类器似乎是我们非线性数据的理想选择。让我们来看看Python的代码：

#CreateaSVCclassifierusinganRBFkernel
svm=SVC(kernel='rbf',random_state=0,gamma=1/100,C=1)
#Traintheclassifier
svm.fit(X_xor,y_xor)
#Visualizethedecisionboundaries
fig=plt.figure(figsize=(10,10))
plot_decision_regions(X_xor,y_xor,clf=svm)
plt.legend(loc='upperleft')
plt.tight_layout()
plt.show()

gamma是1 / sigma。请记住，sigma是调节函数。因此，gamma值越小，sigma值就越大，分类器对各个点之间的距离就越不敏感。

让我们把伽玛放大看看会发生什么

#CreateaSVCclassifierusinganRBFkernel
svm=SVC(kernel='rbf',random_state=0,gamma=1,C=1)
#Traintheclassifier
svm.fit(X_xor,y_xor)
#Visualizethedecisionboundaries
fig=plt.figure(figsize=(10,10))
plot_decision_regions(X_xor,y_xor,clf=svm)
plt.legend(loc='upperleft')
plt.tight_layout()
plt.show()

好像将伽玛值提高100倍可以提高分类器对训练集的准确性。把伽马值再乘以10会怎么样呢?

#CreateaSVCclassifierusinganRBFkernel
svm=SVC(kernel='rbf',random_state=0,gamma=10,C=1)
#Traintheclassifier
svm.fit(X_xor,y_xor)
#Visualizethedecisionboundaries
fig=plt.figure(figsize=(10,10))
plot_decision_regions(X_xor,y_xor,clf=svm)
plt.legend(loc='upperleft')
plt.tight_layout()
plt.show()

这是否意味着如果我们将伽玛提高到10000，它将更加准确呢?事实上，如果伽玛值太大，则分类器最终会对差异不敏感。

让我们增加C。C是与整个机器学习数据集的错误分类相关的成本。换句话说，增加C将增加对整个数据集的敏感性，而不仅仅是单个数据点。

fromipywidgetsimportinteract,interactive,fixed,interact_manual
importipywidgetsaswidgets
warnings.filterwarnings("ignore")
@interact(x=[1,10,1000,10000,100000])
defsvc(x=1):
#CreateaSVCclassifierusinganRBFkernel
svm=SVC(kernel='rbf',random_state=0,gamma=.01,C=x)
#Traintheclassifier
svm.fit(X_xor,y_xor)
#Visualizethedecisionboundaries
fig=plt.figure(figsize=(10,10))
plot_decision_regions(X_xor,y_xor,clf=svm)
plt.legend(loc='upperleft')
plt.tight_layout()
plt.show()