Coursera ML(4)-Logistic Regression

These notes cover Week 3 of the Coursera course, the binary classification problem. More at: 李飞阳


Classification is not actually a linear function, which is why linear regression is a poor fit for this problem.

Classification and Representation

Hypothesis Representation

  • Sigmoid Function (also called the Logistic Function)
    $$\begin{align} & h_\theta (x) = g ( \theta^T x ) \newline & z = \theta^T x \newline & g(z) = \dfrac{1}{1 + e^{-z}} \end{align}$$

The sigmoid function squashes any real-valued input into the range $(0,1)$; the graph of $g(z)$ is an S-shaped curve passing through $g(0) = 0.5$.


  • $h_\theta(x)$ gives us the probability that the output is 1.
  • Some basic facts from discrete probability:
    $$\begin{align} & h_\theta(x) = P(y=1 \mid x ; \theta) = 1 - P(y=0 \mid x ; \theta) \newline & P(y = 0 \mid x;\theta) + P(y = 1 \mid x ; \theta) = 1 \end{align}$$
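As a minimal NumPy sketch of these two formulas (my own illustration; the helper names `sigmoid` and `hypothesis` are not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), computed row-wise for a design matrix X of shape (m, n+1)."""
    return sigmoid(X @ theta)
```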

Decision Boundary

  • We translate the output of the hypothesis function as follows:
    $$\begin{align} & h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline & h_\theta(x) < 0.5 \rightarrow y = 0 \end{align}$$
  • From these statements we can now say:
    $$\begin{align} & \theta^T x \geq 0 \Rightarrow y = 1 \newline & \theta^T x < 0 \Rightarrow y = 0 \end{align}$$
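A small sketch of this threshold rule, reusing the `hypothesis` helper from the snippet above (the 0.5 threshold corresponds exactly to $\theta^T x \geq 0$):

```python
def predict(theta, X, threshold=0.5):
    """Return 1 where h_theta(x) >= threshold, which is equivalent to theta^T x >= 0."""
    return (hypothesis(theta, X) >= threshold).astype(int)
```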

Logistic Regression Model

Cost function for the logistic hypothesis

  • To keep the cost function convex for gradient descent, it should look like this:
    $$J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})$$

$$\mathrm{Cost}(h_\theta (x), y) = \begin{cases} -\log(h_\theta (x)), & y = 1 \\ -\log(1 - h_\theta (x)), & y = 0 \end{cases}$$

  • Properties of this cost:
    $$\begin{align} & \mathrm{Cost}(h_\theta(x),y) = 0 \text{ if } h_\theta(x) = y \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 0 \text{ and } h_\theta(x) \rightarrow 1 \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 1 \text{ and } h_\theta(x) \rightarrow 0 \end{align}$$
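A quick numerical check of this behaviour (my own sketch, continuing the NumPy snippets above):

```python
def cost_single(h, y):
    """Per-example cost: -log(h) when y = 1, -log(1 - h) when y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

print(cost_single(0.99, 1))  # ~0.01: nearly correct prediction, almost no cost
print(cost_single(0.01, 1))  # ~4.61: confidently wrong, the cost blows up
```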

Simplified Cost Function and Gradient Descent

  • We can compress the cost function's two conditional cases into one expression:
    $$\mathrm{Cost}(h_\theta(x),y) = - y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

  • The entire cost function:
    $$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)})) \right]$$
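The same cost in vectorized form, again as a sketch building on the `hypothesis` helper above:

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )."""
    m = len(y)
    h = hypothesis(theta, X)
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)) / m
```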

Gradient Descent

  • The general form of gradient descent; we take partial derivatives to minimize $J(\theta)$:
    $$\begin{align} & \text{Repeat} \ \lbrace \newline & \ \ \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta) \newline & \rbrace \end{align}$$

  • Using calculus, the derivative is:
    $$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}$$

  • Substituting this back, we get:
    $$\begin{align} & \text{Repeat} \ \lbrace \newline & \ \ \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align}$$
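Wrapped in a loop, the update rule might look like the sketch below; `alpha` and `iters` are arbitrary placeholder values, and all $\theta_j$ are updated simultaneously via the vectorized gradient:

```python
def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
    """Repeatedly apply theta_j := theta_j - (alpha/m) * sum((h - y) * x_j), for all j at once."""
    m = len(y)
    for _ in range(iters):
        grad = X.T @ (hypothesis(theta, X) - y) / m
        theta = theta - alpha * grad
    return theta
```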

Multiclass Classification: One-vs-all

  • For more than 2 classes of $y$, run logistic regression for each class separately, as sketched below.
  • Train a logistic regression classifier $h_\theta(x)$ for each class to predict the probability that $y = i$.
  • To make a prediction on a new $x$, pick the class that maximizes $h_\theta(x)$.
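A one-vs-all sketch under the same assumptions, reusing the `gradient_descent` and `sigmoid` helpers above; `classes` would be the list of distinct labels:

```python
def one_vs_all(X, y, classes, alpha=0.1, iters=1000):
    """Train one theta vector per class by relabeling y as 1 for that class and 0 otherwise."""
    thetas = []
    for c in classes:
        y_c = (y == c).astype(float)                 # 1 for class c, 0 for everything else
        thetas.append(gradient_descent(np.zeros(X.shape[1]), X, y_c, alpha, iters))
    return np.array(thetas)                          # shape: (num_classes, n+1)

def predict_one_vs_all(thetas, X):
    """For each example, pick the index of the class whose classifier reports the highest probability."""
    return np.argmax(sigmoid(X @ thetas.T), axis=1)
```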


Solving the Problem of Overfitting

The Problem of Overfitting


There are two main options to address the issue of overfitting:

  • Reduce the number of features:
      • Manually select which features to keep.
      • Use a model selection algorithm (studied later in the course).
  • Regularization:
      • Keep all the features, but reduce the magnitude of the parameters $\theta_j$.
      • Regularization works well when we have a lot of slightly useful features.

Cost Function

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
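For concreteness, the regularized cost for linear regression adds a penalty term scaled by $\lambda$ (the parameter $\theta_0$ is left out of the penalty):

$$J(\theta) = \frac{1}{2m}\left[ \sum_{i=1}^m \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda \sum_{j=1}^n \theta_j^2 \right]$$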

Regularized Linear Regression

  • Gradient Descent

$$\begin{align} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] \ \ \ \ \ j \in \lbrace 1,2\ldots n\rbrace \newline & \rbrace \end{align}$$

  • Normal Equation
    $$\begin{align}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline\end{bmatrix}\end{align}$$

  • $L$ is a matrix with 0 at the top left, 1's down the rest of the diagonal, and 0's everywhere else. It should have dimension $(n+1)\times(n+1)$.

  • Recall that if $m \le n$, then $X^TX$ is non-invertible. However, when we add the term $\lambda \cdot L$, then $X^TX + \lambda \cdot L$ becomes invertible.
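A sketch of this regularized normal equation with NumPy (variable names are mine; `np.linalg.solve` is used instead of forming an explicit inverse):

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """theta = (X^T X + lambda * L)^{-1} X^T y, where L is the identity with L[0, 0] = 0."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                                  # the bias term theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```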

Summary

Here I tidy up the two models above and fill in the derivations that the course skips.

Logistic Regression Model

$h_\theta(x)$ is the hypothesis function:

$$h_\theta (x) = g ( \theta^T x ) = \dfrac{1}{1 + e^{- \theta^T x}} $$
Note the distinction between the hypothesis function and the actual data.

Cost Function

$$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large]$$
Looking back at the $h_\theta(x)$ above: the cost function measures the gap between the labels given in the training set and the values the current hypothesis produces. The smaller this gap the better, so the next step is to differentiate it.

Gradient Descent

  • Original update rule
    $$\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta)$$
  • Computing the derivative
    $$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}$$
  • Resulting update
    $$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} $$

Here is the derivation of $\frac{\partial}{\partial \theta_j} J(\theta)$:

  • Compute the derivative $h_\theta'(x)$ (with respect to $\theta$):
    $$\begin{align} h_\theta'(x) &= \left( \frac{1}{1+e^{- \theta x}}\right)' \newline &= \frac{e^{- \theta x}\,x}{(1+e^{- \theta x})^2} \newline &= \frac{1+e^{- \theta x}-1}{(1+e^{- \theta x})^2}\,x \newline &= \left[\frac{1}{1+e^{- \theta x}}-\frac{1}{(1+e^{- \theta x})^2}\right]x \newline &= h_\theta(x)\bigl(1-h_\theta(x)\bigr)\,x \end{align}$$

  • Derive $\frac{\partial}{\partial \theta_j} J(\theta)$ (when differentiating with respect to $\theta_j$, the $x$ factor above becomes $x_j^{(i)}$):

$$\begin{align} \frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}\log (h_\theta (x^{(i)})) - (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} \frac{1}{h_\theta(x^{(i)})}\,h_\theta'(x^{(i)}) - (1 - y^{(i)}) \frac{-1}{1-h_\theta(x^{(i)})}\,h_\theta'(x^{(i)})\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}\bigl(1-h_\theta(x^{(i)})\bigr)x_j^{(i)} + (1 - y^{(i)})\,h_\theta(x^{(i)})\,x_j^{(i)}\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}x_j^{(i)} + y^{(i)}h_\theta(x^{(i)})x_j^{(i)} + h_\theta(x^{(i)})x_j^{(i)} - y^{(i)}h_\theta(x^{(i)})x_j^{(i)}\right] \newline &= \frac{1}{m}\sum\limits_{i=1}^{m}\bigl(h_\theta (x^{(i)}) - y^{(i)}\bigr)x_j^{(i)} \end{align}$$

That is:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}\bigl(h_\theta (x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}$$
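As a sanity check on this result, the analytic gradient can be compared against a finite-difference approximation of $J(\theta)$; this is my own verification sketch, reusing the `cost` and `hypothesis` helpers from earlier:

```python
def analytic_grad(theta, X, y):
    """(1/m) * X^T (h - y): the closed form derived above."""
    return X.T @ (hypothesis(theta, X) - y) / len(y)

def numeric_grad(theta, X, y, eps=1e-5):
    """Central-difference approximation of dJ/dtheta_j, component by component."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)
    return grad
```

The two gradients should agree to several decimal places on any small random dataset.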

Solving the Problem of Overfitting

Everything else stays the same; only small modifications are needed.

  • Cost Function
    $$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$

  • Gradient Descent
    $$\begin{align} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] \ \ \ \ \ j \in \lbrace 1,2\ldots n\rbrace \newline & \rbrace \end{align}$$
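A regularized version of the earlier gradient-descent sketch; the only change is the extra $(\lambda/m)\,\theta_j$ term, which deliberately skips $\theta_0$:

```python
def gradient_descent_reg(theta, X, y, alpha=0.1, lam=1.0, iters=1000):
    """Same update as before, plus the regularization term (lambda/m) * theta_j for j >= 1."""
    m = len(y)
    for _ in range(iters):
        grad = X.T @ (hypothesis(theta, X) - y) / m
        reg = (lam / m) * theta
        reg[0] = 0.0                               # theta_0 is never regularized
        theta = theta - alpha * (grad + reg)
    return theta
```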


That's all for these notes.
