A College Slacker's ISLR Notes (7) - Moving Beyond Linearity

So far in this book, we have mostly focused on linear models. Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference.

In Chapter 6 we saw that we can improve upon least squares using ridge regression, the lasso, principal components regression, and other techniques. In that setting, the improvement is obtained by reducing the complexity of the linear model, and hence the variance of the estimates. But we are still using a linear model, which can only be improved so far!

In this chapter we relax the linearity assumption while still attempting to maintain as much interpretability as possible. We do this by examining very simple extensions of linear models, like polynomial regression and step functions, as well as more sophisticated approaches such as splines, local regression, and generalized additive models.

• Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power. For example, a cubic regression uses three variables, X, X^2, and X^3, as predictors. This approach provides a simple way to obtain a non-linear fit to data.

• Step functions cut the range of a variable into K distinct regions in order to produce a qualitative variable. This has the effect of fitting a piecewise constant function.

• Regression splines are more flexible than polynomials and step functions, and in fact are an extension of the two. They involve dividing the range of X into K distinct regions. Within each region, a polynomial function is fit to the data. However, these polynomials are constrained so that they join smoothly at the region boundaries, or knots. Provided that the interval is divided into enough regions, this can produce an extremely flexible fit.

• Smoothing splines are similar to regression splines, but arise in a slightly different situation. Smoothing splines result from minimizing a residual sum of squares criterion subject to a smoothness penalty.

• Local regression is similar to splines, but differs in an important way. The regions are allowed to overlap, and indeed they do so in a very smooth way.

• Generalized additive models allow us to extend the methods above to deal with multiple predictors.

Polynomial Regression

Polynomial regression replaces the standard linear model yi = β0 + β1xi + εi with a polynomial function

yi = β0 + β1xi + β2xi^2 + β3xi^3 + · · · + βdxi^d + εi.

Generally speaking, it is unusual to use d greater than 3 or 4, because for large values of d the polynomial curve can become overly flexible and can take on some very strange shapes.
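As a concrete illustration, here is a minimal sketch in Python of a cubic regression fit by ordinary least squares. The data are simulated for the example (the book uses the Wage data, which is not loaded here):

```python
import numpy as np

# Simulate a predictor and a mildly non-linear response.
rng = np.random.default_rng(0)
x = rng.uniform(18, 80, 200)
y = 50 + 2.5 * x - 0.03 * x**2 + rng.normal(0, 5, x.size)

# Design matrix with columns 1, x, x^2, x^3: the cubic polynomial basis.
X = np.column_stack([x**d for d in range(4)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares coefficients

# Evaluate the fitted curve on a grid.
grid = np.linspace(x.min(), x.max(), 100)
fit = np.column_stack([grid**d for d in range(4)]) @ beta
```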


Step Functions

Using polynomial functions of the features as predictors in a linear model imposes a global structure on the non-linear function of X. We can instead use step functions in order to avoid imposing such a global structure. Here we break the range of X into bins, and fit a different constant in each bin. This amounts to converting a continuous variable into an ordered categorical variable.

In greater detail, we create cutpoints c1, c2, . . . , cK in the range of X and construct K + 1 new variables:

C0(X) = I(X < c1), C1(X) = I(c1 ≤ X < c2), . . . , CK−1(X) = I(cK−1 ≤ X < cK), CK(X) = I(cK ≤ X),

where I(·) is an indicator function that returns 1 if the condition is true and 0 otherwise. These are sometimes called dummy variables. Notice that for any value of X, C0(X) + C1(X) + · · · + CK(X) = 1, since X must be in exactly one of the K + 1 intervals.
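A minimal sketch of this in Python: because the dummy variables C0(X), . . . , CK(X) are the only predictors, least squares simply fits the mean of y within each bin. The cutpoints below are arbitrary illustrative values:

```python
import numpy as np

# Simulate data, then cut the range of x into K + 1 bins with K = 3 cutpoints.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, x.size)

cuts = np.array([2.5, 5.0, 7.5])   # the cutpoints c1, ..., cK
bins = np.digitize(x, cuts)        # index of the interval containing each x

# Least squares on the dummy variables reduces to the within-bin mean of y.
level = np.array([y[bins == k].mean() for k in range(len(cuts) + 1)])
pred = level[bins]                 # the piecewise-constant fit
```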



Basis Functions

Polynomial and piecewise-constant regression models are in fact special cases of a basis function approach.


Note that the basis functions b1(xi), b2(xi), . . . , bK(xi) are fixed and known. In other words, we choose the functions ahead of time.

For polynomial regression, the basis functions are bj(xi) = xi^j, and for piecewise-constant functions they are bj(xi) = I(cj ≤ xi < cj+1).

We can think of the model yi = β0 + β1b1(xi) + β2b2(xi) + · · · + βKbK(xi) + εi as a standard linear model with predictors b1(xi), b2(xi), . . . , bK(xi).

Regression Splines

Piecewise Polynomials

Instead of fitting a high-degree polynomial over the entire range of X, piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of X. For example, a piecewise cubic polynomial works by fitting a cubic regression model of the form

yi = β0 + β1xi + β2xi^2 + β3xi^3 + εi,

where the coefficients β0, β1, β2, and β3 differ in different parts of the range of X. The points where the coefficients change are called knots.


Constraints and Splines

An unconstrained piecewise polynomial is generally discontinuous at the knots. To remedy this, we constrain the fitted curve to be continuous at each knot; requiring the first and second derivatives to be continuous there as well makes it smoother still. A piecewise cubic polynomial satisfying all of these constraints is called a cubic spline. In general, a degree-d spline is a piecewise degree-d polynomial with continuous derivatives up to degree d − 1 at each knot.

The Spline Basis Representation

A cubic spline with K knots can be modeled as

yi = β0 + β1b1(xi) + β2b2(xi) + · · · + βK+3bK+3(xi) + εi,

for an appropriate choice of basis functions. The most direct way to represent a cubic spline is to start with the basis for a cubic polynomial, namely x, x^2, x^3, and then add one truncated power basis function per knot. A truncated power basis function is defined as

h(x, ξ) = (x − ξ)^3 if x > ξ, and 0 otherwise,

where ξ is the knot.

In other words, in order to fit a cubic spline to a data set with K knots, we perform least squares regression with an intercept and 3 + K predictors, of the form X, X^2, X^3, h(X, ξ1), h(X, ξ2), . . . , h(X, ξK), where ξ1, . . . , ξK are the knots. This amounts to estimating a total of K + 4 regression coefficients; for this reason, fitting a cubic spline with K knots uses K + 4 degrees of freedom.
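To make the K + 4 count concrete, here is a sketch of fitting a cubic spline by least squares on the truncated power basis, with simulated data and three arbitrary knots:

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Columns 1, x, x^2, x^3, then (x - knot)^3 for x > knot (else 0) per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, x.size)

knots = [2.5, 5.0, 7.5]                        # K = 3 knots
X = truncated_power_basis(x, knots)            # 300 x (K + 4) design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # K + 4 = 7 coefficients
```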


Choosing the Number and Locations of the Knots

Where should the knots be placed? One way to do this is to specify the desired degrees of freedom, and then have the software automatically place the corresponding number of knots at uniform quantiles of the data.


How many knots should we use, or equivalently how many degrees of freedom should our spline contain? One option is to try out different numbers of knots and see which produces the best-looking curve.

A somewhat more objective approach is to use cross-validation.
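A sketch of this with scikit-learn (my choice here; the book does this in R). SplineTransformer (available in scikit-learn 1.0+) builds a B-spline basis rather than the truncated power basis, and knots="quantile" places the knots at uniform quantiles of the data, so cross-validating over n_knots mirrors the approach above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.3, x.shape[0])

# Spline basis + least squares, with the number of knots chosen by 10-fold CV.
pipe = make_pipeline(SplineTransformer(degree=3, knots="quantile"), LinearRegression())
grid = {"splinetransformer__n_knots": [4, 6, 8, 10, 12]}
search = GridSearchCV(pipe, grid, cv=10)
search.fit(x, y)
print(search.best_params_)
```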


Comparison to Polynomial Regression

Regression splines often give superior results to polynomial regression. This is because unlike polynomials, which must use a high degree to produce flexible fits, splines introduce flexibility by increasing the number of knots while keeping the degree fixed.

Smoothing Splines

In the last section we discussed regression splines, which we create by specifying a set of knots, producing a sequence of basis functions, and then using least squares to estimate the spline coefficients. We now introduce a somewhat different approach that also produces a spline.

What we really want is a function g that makes RSS small, but that is also smooth.

How might we ensure that g is smooth? There are a number of ways to do this. A natural approach is to find the function g that minimizes

Σi (yi − g(xi))^2 + λ ∫ g''(t)^2 dt,   (7.11)

where λ is a nonnegative tuning parameter. The function g that minimizes (7.11) is known as a smoothing spline.

Equation 7.11 takes the “Loss + Penalty” formulation that we encountered in the context of ridge regression and the lasso in Chapter 6.

The first term is a loss function that encourages g to fit the data well, and the second term is a penalty that discourages variability in g: the larger λ, the smoother g will be.

Choosing the Smoothing Parameter λ

It is possible to show that as λ increases from 0 to ∞, the effective degrees of freedom, which we write dfλ, decrease from n to 2. When λ = 0 the penalty has no effect, so g can be rough enough to interpolate the training observations; as λ → ∞, g is forced to be perfectly smooth, i.e. a straight line fit by least squares.
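For a hands-on feel, scipy's UnivariateSpline fits a smoothing spline. Its s parameter is not the λ of (7.11) (scipy bounds the residual sum of squares rather than weighting a roughness penalty), but it trades off fit against smoothness in the same spirit:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 300))   # UnivariateSpline wants x in increasing order
y = np.sin(x) + rng.normal(0, 0.3, x.size)

# s controls the smoothness trade-off, analogous to lambda:
# larger s permits a larger RSS and yields a smoother curve.
spline = UnivariateSpline(x, y, k=3, s=20.0)   # cubic smoothing spline
fit = spline(np.linspace(0, 10, 200))
```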


Local Regression

Local regression is a different approach for fitting flexible non-linear functions, which involves computing the fit at a target point x0 using only the nearby training observations.


Local regression is sometimes referred to as a memory-based procedure because, like nearest neighbors, we need all the training data each time we wish to compute a prediction.
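A common implementation is LOWESS (locally weighted regression); here is a sketch using statsmodels, where frac plays the role of the span: the fraction of observations used around each target point:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, x.size)

# Smaller spans give wigglier, more local fits; larger spans give smoother ones.
fit = sm.nonparametric.lowess(y, x, frac=0.3)   # returns sorted (x, fitted) pairs
```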


Generalized Additive Models

Generalized additive models (GAMs) provide a general framework for extending a standard linear model by allowing non-linear functions of each of the variables, while maintaining additivity.

Just like linear models, GAMs can be applied with both quantitative and qualitative responses.

GAMs for Regression Problems

A natural way to extend the multiple linear regression model to allow for non-linear relationships between each feature and the response is to replace each linear component βjxij with a smooth non-linear function fj(xij), giving the model

yi = β0 + f1(xi1) + f2(xi2) + · · · + fp(xip) + εi.

This is an example of a GAM. It is called an additive model because we calculate a separate fj for each Xj, and then add together all of their contributions.
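As a sketch, the third-party pygam package (my choice here; the book uses R's gam library) fits exactly this kind of model, with one smooth spline term s(j) per predictor:

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2 + rng.normal(0, 0.3, 300)

# y = b0 + f1(X1) + f2(X2) + error, with each fj a smooth spline.
gam = LinearGAM(s(0) + s(1)).fit(X, y)
gam.summary()
```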


Pros and Cons of GAMs

GAMs allow us to fit a non-linear fj to each Xj, so that we can automatically model non-linear relationships that standard linear regression will miss. This means that we do not need to manually try out many different transformations on each variable individually.

The non-linear fits can potentially make more accurate predictions for the response Y.

Because the model is additive, we can still examine the effect of each Xj on Y individually while holding all of the other variables fixed. Hence if we are interested in inference, GAMs provide a useful representation.

The smoothness of the function fj for the variable Xj can be summarized via degrees of freedom.

The main limitation of GAMs is that the model is restricted to be additive. With many variables, important interactions can be missed. However, as with linear regression, we can manually add interaction terms to the GAM model by including additional predictors of the form Xj × Xk. In addition we can add low-dimensional interaction functions of the form fjk(Xj, Xk) into the model; such terms can be fit using two-dimensional smoothers such as local regression, or two-dimensional splines (not covered here).

GAMs for Classification Problems

Recall the logistic regression model:

log( p(X) / (1 − p(X)) ) = β0 + β1X1 + β2X2 + · · · + βpXp.

To allow for non-linear relationships, we replace each linear component with a smooth function:

log( p(X) / (1 − p(X)) ) = β0 + f1(X1) + f2(X2) + · · · + fp(Xp).   (7.18)

Equation 7.18 is a logistic regression GAM.
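Continuing with pygam as an assumed tool, a logistic GAM can be fit the same way, swapping LinearGAM for LogisticGAM:

```python
import numpy as np
from pygam import LogisticGAM, s

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(300, 2))
logit = np.sin(X[:, 0]) + 0.3 * X[:, 1] - 2.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# The log-odds of P(Y = 1 | X) are modeled as b0 + f1(X1) + f2(X2).
gam = LogisticGAM(s(0) + s(1)).fit(X, y)
prob = gam.predict_proba(X)   # fitted probabilities
```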

