“emmeans” package

The emmeans package is one of several alternatives to facilitate post hoc methods application and contrast analysis. It is a relatively recent replacement for the lsmeans package that some R users may be familiar with. It is intended for use with a wide variety of ANOVA models, including repeated measures and nested designs where the initial modeling would employ aov, lm, ez or lme4 (mixed models).

emmeans包是一些R用户可能熟悉的lsmeans包的相对较新的替代品。它适用于多种方差分析模型,包括重复测量和嵌套设计,其中初始建模将使用‘aov’、‘lm’、‘ez’或‘lme4’(混合模型)。

1. Using emmeans for pairwise post hoc multiple comparisons.
1. 用emmeans来进行两两事后多重比较

Initially, a minimal illustration is presented. First is a “pairwise” approach to followup comparisons, with a p-value adjustment equivalent to the Tukey test. The emmeans function requires a model object to be passed as the first argument. We could use either fit1 (the aov object) or fit2 (the lm object) originally created in the base Anova section of this document.

首先,给出了一个简短的说明。首先是后续比较的“成对”方法,p值的矫正用Tukey检验。emmeans函数要求将模型对象作为第一个参数传递。我们可以使用fit1(AOV对象)或fit2(lm对象),这些对象最初是在本文的基础Anova部分中创建的。

# 用emmeans()对主效应进行事后检验,并且做多重比较矫正。
#library(emmeans)
# reminder: fit.2 <- lm(dv~factora, data=hays)
fit2.emm.a <- emmeans(fit.2, "factora", data=hays)
pairs(fit2.emm.a, adjust="tukey")
##  contrast       estimate   SE df t.ratio p.value
##  control - fast      6.2 1.99 27  3.113  0.0117 
##  control - slow      5.6 1.99 27  2.811  0.0239 
##  fast - slow        -0.6 1.99 27 -0.301  0.9513 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
plot(fit2.emm.a, comparisons = TRUE)
#pairs(fit2.emm.a, adjust="none")

The blue bars on the plot are the confidence intervals. The red arrowed lines represent a scheme to determine homogeneous groups. If the red lines overlap for two groups, then they are not significantly different using the method chosen.

图上的蓝色条是置信区间。红色箭头线表示确定同质组的方案。如果两组的红线重叠,则表示它们没有显著差异。

The adjust argument can take one of several useful methods. ‘tukey’ is default, but others including ‘sidak,’ ‘bonferroni,’ etc can be specified. Specifying ‘none’ produces unadjusted p-values. See help with ‘?emmeans::summary.emmGrid’ for details. Here is an example using the ‘holm’ method of adjustment.

adjust参数可以采用几种有用的方法之一。默认为tukey,但可以指定其他名称,包括 sidakbonferroni等。指定None会产生未矫正的p值。有关详细信息,请参阅?emMeans::Summary y.emmGrid的帮助。下面是一个使用Holm矫正方法的示例。

library(emmeans)
fit2.emm.b <- emmeans(fit.2, "factora", data=hays)
pairs(fit2.emm.b, adjust="holm")
##  contrast       estimate   SE df t.ratio p.value
##  control - fast      6.2 1.99 27  3.113  0.0131 
##  control - slow      5.6 1.99 27  2.811  0.0181 
##  fast - slow        -0.6 1.99 27 -0.301  0.7655 
## 
## P value adjustment: holm method for 3 tests
plot(fit2.emm.b, comparisons = TRUE)

Interaction analysis in emmeans

https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html#contrasts

emmeans package, Version 1.7.0

Models in which predictors interact seem to create a lot of confusion concerning what kinds of post hoc methods should be used. It is hoped that this vignette will be helpful in shedding some light on how to use the emmeans package effectively in such situations.
交互作用的模型似乎造成了很多困惑,不知道应该使用什么样的“事后”方法。希望这一简介 将有助于揭示如何在这种情况下有效地使用emmeans包。

Contents

  1. Interacting factors
    1. Simple contrasts
  2. Interaction contrasts
  3. Multivariate contrasts
  4. Interactions with covariates
  5. Summary

Index of all vignette topics

Interacting factors

As an example for this topic, consider the auto.noise dataset included with the package. This is a balanced 3x2x2 experiment with three replications. The response – noise level – is evaluated with different sizes of cars, types of anti-pollution filters, on each side of the car being measured.1
作为本主题的示例,请考虑软件包附带的auto.noise数据集。这是一个平衡的3x2x2实验,有三个重复。因变量-噪音水平-是通过测量汽车两侧不同大小的汽车、不同类型的防污染过滤器来评估的。
让我们拟合一个模型并获得ANOVA表(由于数据的规模,我们认为响应是以十分之一分贝记录的;因此我们通过对响应进行缩放来补偿这一点):
Let’s fit a model and obtain the ANOVA table (because of the scale of the data, we believe that the response is recorded in tenths of decibels; so we compensate for this by scaling the response):

noise.lm <- lm(noise/10 ~ size * type * side, data = auto.noise)
anova(noise.lm)
## Analysis of Variance Table
## 
## Response: noise/10
##                Df  Sum Sq Mean Sq  F value    Pr(>F)
## size            2 260.514 130.257 893.1905 < 2.2e-16
## type            1  10.563  10.563  72.4286 1.038e-08
## side            1   0.007   0.007   0.0476 0.8291042
## size:type       2   8.042   4.021  27.5714 6.048e-07
## size:side       2  12.931   6.465  44.3333 8.730e-09
## type:side       1   0.174   0.174   1.1905 0.2860667
## size:type:side  2   3.014   1.507  10.3333 0.0005791
## Residuals      24   3.500   0.146

There are statistically strong 2- and 3-way interactions.
有显著的的二重和三重交互作用

One mistake that a lot of people seem to make is to proceed too hastily to estimating marginal means (even in the face of all these interactions!). They would go straight to analyses like this:
许多人似乎犯的一个错误是过于匆忙地估算边际均值(即使面对所有这些交互!)。他们会直接进行这样的分析:

emmeans(noise.lm, pairwise ~ size)
## NOTE: Results may be misleading due to involvement in interactions
## $emmeans
##  size emmean     SE df lower.CL upper.CL
##  S     82.42 0.1102 24    82.19    82.64
##  M     83.38 0.1102 24    83.15    83.60
##  L     77.25 0.1102 24    77.02    77.48
## 
## Results are averaged over the levels of: type, side 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  S - M      -0.958 0.156 24  -6.147  <.0001
##  S - L       5.167 0.156 24  33.140  <.0001
##  M - L       6.125 0.156 24  39.287  <.0001
## 
## Results are averaged over the levels of: type, side 
## P value adjustment: tukey method for comparing a family of 3 estimates

The analyst-in-a-hurry would thus conclude that the noise level is higher for medium-sized cars than for small or large ones.
匆忙的分析师会因此得出结论,中型汽车的噪音水平要高于小型或大型汽车。
But as is seen in the message before the output, emmeans() valiantly tries to warn you that it may not be a good idea to average over factors that interact with the factor of interest. It isn’t always a bad idea to do this, but sometimes it definitely is.
但正如输出之前的消息所示,' emmeans() '勇敢地试图警告您,对与感兴趣的因素交互的因素进行平均可能不是一个好主意。这样做并不总是一个坏主意,但有时确实是。

What about this time? I think a good first step is always to try to visualize the nature of the interactions before doing any statistical comparisons. The following plot helps.
应该怎么做呢? 我认为,在做任何统计比较之前,最好的第一步总是尝试将相互作用的本质形象化。下面的情节有帮助。

emmip(noise.lm, type ~ size | side)
image.png

Examining this plot, we see that the “medium” mean is not always higher; so the marginal means, and the way they compare, does not represent what is always the case. Moreover, what is evident in the plot is that the peak for medium-size cars occurs for only one of the two filter types. So it seems more useful to do the comparisons of size separately for each filter type. This is easily done, simply by conditioning on type:
通过这张图,我们可以看到“中等”平均值并不总是更高;所以边际均值,和它们比较的方式,并不代表通常的情况。此外,从图中可以明显看出,中型汽车的峰值只出现在两种过滤类型中的一种。因此,对每种过滤类型分别进行大小比较似乎更有用。这很容易做到,只需对类型进行条件设置:

emm_s.t <- emmeans(noise.lm, pairwise ~ size | type)
## NOTE: Results may be misleading due to involvement in interactions
emm_s.t
## $emmeans
## type = Std:
##  size emmean     SE df lower.CL upper.CL
##  S     82.58 0.1559 24    82.26    82.91
##  M     84.58 0.1559 24    84.26    84.91
##  L     77.50 0.1559 24    77.18    77.82
## 
## type = Octel:
##  size emmean     SE df lower.CL upper.CL
##  S     82.25 0.1559 24    81.93    82.57
##  M     82.17 0.1559 24    81.84    82.49
##  L     77.00 0.1559 24    76.68    77.32
## 
## Results are averaged over the levels of: side 
## Confidence level used: 0.95 
## 
## $contrasts
## type = Std:
##  contrast estimate   SE df t.ratio p.value
##  S - M     -2.0000 0.22 24  -9.071  <.0001
##  S - L      5.0833 0.22 24  23.056  <.0001
##  M - L      7.0833 0.22 24  32.127  <.0001
## 
## type = Octel:
##  contrast estimate   SE df t.ratio p.value
##  S - M      0.0833 0.22 24   0.378  0.9245
##  S - L      5.2500 0.22 24  23.812  <.0001
##  M - L      5.1667 0.22 24  23.434  <.0001
## 
## Results are averaged over the levels of: side 
## P value adjustment: tukey method for comparing a family of 3 estimates

Not too surprisingly, the statistical comparisons are all different for standard filters, but with Octel filters, there isn’t much of a difference between small and medium size.

For comparing the levels of other factors, similar judgments must be made. It may help to construct other interaction plots with the factors in different roles. In my opinion, almost all meaningful statistical analysis should be grounded in evaluating the practical impact of the estimated effects first, and seeing if the statistical evidence backs it up. Those who put all their attention on how many asterisks (I call these people “* gazers”) are ignoring the fact that these don’t measure the sizes of the effects on a practical scale.2 An effect can be practically negligible and still have a very small P value – or practically important but have a large P value – depending on sample size and error variance. Failure to describe what is actually going on in the data is a failure to do an adequate analysis. Use lots of plots, and think about the results. For more on this, see the discussion of P values in the “basics” vignette.

为了比较其他因素的水平,必须做出类似的判断。这可能有助于构建与不同角色的因素之间的其他相互作用图。在我看来,几乎所有有意义的统计分析都应该首先评估估计效果的实际影响,看看统计证据是否支持这一点。那些把所有注意力都放在有多少个星号(我把这些人称为“*观察者”)上的人忽略了这样一个事实,即这些星号并不能在实际尺度上衡量效应的大小。根据样本大小和误差方差,效应实际上可以忽略不计,仍然有一个非常小的P值--或者实际上很重要,但有一个很大的P值。未能描述数据中的实际情况就是没有进行充分的分析。使用大量的情节,并考虑结果。有关这方面的更多信息,请参阅 “basics” vignette中关于P值的讨论。

Simple contrasts

An alternative way to specify conditional contrasts or comparisons is through the use of the simple argument to contrast() or pairs(), which amounts to specifying which factors are not used as by variables. For example, consider:

指定条件对比或比较的另一种方法是使用参数 simple作为contrast()pairs(),这相当于指定哪些因素不是不用作 by变量。例如,请考虑:

noise.emm <- emmeans(noise.lm, ~ size * side * type)

Then pairs(noise.emm, simple = "size") is the same as pairs(noise.emm, by = c("side", "type")).

simple的含义是,用作比较的变量。
by的含义是,不用做比较的变量。

One may specify a list for simple, in which case separate runs are made with each element of the list. Thus, pairs(noise.emm, simple = list("size", c("side", "type"))) returns two sets of contrasts: comparisons of size for each combination of the other two factors; and comparisons of side*type combinations for each size.

可以为 simple指定一个list,在这种情况下,对列表的每个元素进行单独的运行。因此,pairs(noise.emm, simple = list("size", c("side", "type")))返回两组对比:其他两个因素的每个组合的大小比较;以及每个大小的side*type组合的比较。

A shortcut that generates all simple main-effect comparisons is to use simple = "each". In this example, the result is the same as obtained using simple = list("size", "side", "type").

simple = "each"表示,对所有主效应进行比较,也就是固定另外两个水平,看某个水平两两比较的结果,注意是所有的。所以其实是三种,一种是固定size和side,比较type。一种是固定size和type,比较side。一种是固定side和type,比较size。等价于 simple = list("size", "side", "type").

Ordinarily, when simple is a list (or equal to "each"), a list of contrast sets is returned. However, if the additional argument combine is set to TRUE, they are all combined into one family:

通常,当simple为列表(或等于each)时,会返回对比集列表。但是,如果额外的参数combine设置为TRUE,结果会全部合并为一个,如下:

contrast(noise.emm, "consec", simple = "each", combine = TRUE, adjust = "mvt")
##  side type  size contrast    estimate    SE df t.ratio p.value
##  L    Std   .    M - S          1.500 0.312 24   4.811  0.0013
##  L    Std   .    L - M         -8.667 0.312 24 -27.795  <.0001
##  R    Std   .    M - S          2.500 0.312 24   8.018  <.0001
##  R    Std   .    L - M         -5.500 0.312 24 -17.639  <.0001
##  L    Octel .    M - S         -0.333 0.312 24  -1.069  0.9768
##  L    Octel .    L - M         -5.667 0.312 24 -18.174  <.0001
##  R    Octel .    M - S          0.167 0.312 24   0.535  0.9999
##  R    Octel .    L - M         -4.667 0.312 24 -14.967  <.0001
##  .    Std   S    R - L         -1.833 0.312 24  -5.880  0.0001
##  .    Std   M    R - L         -0.833 0.312 24  -2.673  0.1713
##  .    Std   L    R - L          2.333 0.312 24   7.483  <.0001
##  .    Octel S    R - L         -0.500 0.312 24  -1.604  0.7748
##  .    Octel M    R - L          0.000 0.312 24   0.000  1.0000
##  .    Octel L    R - L          1.000 0.312 24   3.207  0.0555
##  L    .     S    Octel - Std   -1.000 0.312 24  -3.207  0.0562
##  L    .     M    Octel - Std   -2.833 0.312 24  -9.087  <.0001
##  L    .     L    Octel - Std    0.167 0.312 24   0.535  0.9999
##  R    .     S    Octel - Std    0.333 0.312 24   1.069  0.9768
##  R    .     M    Octel - Std   -2.000 0.312 24  -6.414  <.0001
##  R    .     L    Octel - Std   -1.167 0.312 24  -3.742  0.0168
## 
## P value adjustment: mvt method for 20 tests

The dots (.) in this result correspond to which simple effect is being displayed. If we re-run this same call with combine = FALSE or omitted, these twenty comparisons would be displayed in three broad sets of contrasts, each broken down further by combinations of by variables, each separately multiplicity-adjusted (a total of 16 different tables).

点(.)。在该结果中对应于正在显示哪种简单效果。如果我们使用Combine=False或省略重新运行相同的调用,这20个比较将显示为三组对比,每个对比由by变量的组合进一步细分,每个变量都经过单独的多重性调整(总共16个不同的表)。

Back to Contents

Interaction contrasts

An interaction contrast is a contrast of contrasts. For instance, in the auto-noise example, we may want to obtain the linear and quadratic contrasts of size separately for each type, and compare them. Here are estimates of those contrasts:
交互对比就是对比的对比。
例如,在自动噪声示例中,我们可能希望分别获得每种类型的尺寸的线性和二次对比,并对它们进行比较。以下是对这些对比的估计:

contrast(emm_s.t[[1]], "poly")   ## 'by = "type"' already in previous result 
## type = Std:
##  contrast  estimate    SE df t.ratio p.value
##  linear       -5.08 0.220 24 -23.056  <.0001
##  quadratic    -9.08 0.382 24 -23.786  <.0001
## 
## type = Octel:
##  contrast  estimate    SE df t.ratio p.value
##  linear       -5.25 0.220 24 -23.812  <.0001
##  quadratic    -5.08 0.382 24 -13.311  <.0001
## 
## Results are averaged over the levels of: side

The comparison of these contrasts may be done using the interaction argument in contrast() as follows:

IC_st <- contrast(emm_s.t[[1]], interaction = c("poly", "consec"), by = NULL)
IC_st
##  size_poly type_consec estimate    SE df t.ratio p.value
##  linear    Octel - Std   -0.167 0.312 24  -0.535  0.5979
##  quadratic Octel - Std    4.000 0.540 24   7.407  <.0001
## 
## Results are averaged over the levels of: side

(Using by = NULL restores type to a primary factor in these contrasts.) The practical meaning of this is that there isn’t a statistical difference in the linear trends, but the quadratic trend for Octel is greater than for standard filter types. (Both quadratic trends are negative, so in fact it is the standard filters that have more pronounced downward curvature, as is seen in the plot.) In case you need to understand more clearly what contrasts are being estimated, the coef() method helps:

(使用BY=NULL可将类型恢复为这些对比中的主要因素。)。其实际意义是线性趋势没有统计学差异,但OCTEL的二次趋势大于标准滤波器类型。(这两个二次趋势都是负的,因此实际上是标准滤镜具有更明显的向下曲率,如图所示。)。如果您需要更清楚地了解所估计的对比度,coef()方法会有所帮助:

coef(IC_st)
##   size  type c.1 c.2
## 1    S   Std   1  -1
## 2    M   Std   0   2
## 3    L   Std  -1  -1
## 4    S Octel  -1   1
## 5    M Octel   0  -2
## 6    L Octel   1   1

Note that the 4th through 6th contrast coefficients are the negatives of the 1st through 3rd – thus a comparison of two contrasts.

By the way, “type III” tests of interaction effects can be obtained via interaction contrasts:

test(IC_st, joint = TRUE)
##  df1 df2 F.ratio p.value
##    2  24  27.571  <.0001

This result is exactly the same as the F test of size:type in the anova output.

The three-way interaction may be explored via interaction contrasts too:

contrast(emmeans(noise.lm, ~ size*type*side),
         interaction = c("poly", "consec", "consec"))
##  size_poly type_consec side_consec estimate    SE df t.ratio p.value
##  linear    Octel - Std R - L          -2.67 0.624 24  -4.276  0.0003
##  quadratic Octel - Std R - L          -1.67 1.080 24  -1.543  0.1359

One interpretation of this is that the comparison by type of the linear contrasts for size is different on the left side than on the right side; but the comparison of that comparison of the quadratic contrasts, not so much. Refer again to the plot, and this can be discerned as a comparison of the interaction in the left panel versus the interaction in the right panel.
对此的一种解释是,大小的线性对比的类型在左侧与右侧是不同的;但是对二次对比的比较,就不是那么多了。再次参考该图,可以将左面板中的交互与右面板中的交互进行比较来辨别。

最后,emmeans提供了一个joint_tests()函数,该函数获取并测试模型中所有效果的交互对比,并将它们编译到一个类似Type-III-ANOVA的表中:
Finally, emmeans provides a joint_tests() function that obtains and tests the interaction contrasts for all effects in the model and compiles them in one Type-III-ANOVA-like table:

joint_tests(noise.lm)
##  model term     df1 df2 F.ratio p.value
##  size             2  24 893.190  <.0001
##  type             1  24  72.429  <.0001
##  side             1  24   0.048  0.8291
##  size:type        2  24  27.571  <.0001
##  size:side        2  24  44.333  <.0001
##  type:side        1  24   1.190  0.2861
##  size:type:side   2  24  10.333  0.0006

You may even add by variable(s) to obtain separate ANOVA tables for the remaining factors:

joint_tests(noise.lm, by = "side")
## side = L:
##  model term df1 df2 F.ratio p.value
##  size         2  24 651.714  <.0001
##  type         1  24  46.095  <.0001
##  size:type    2  24  23.524  <.0001
## 
## side = R:
##  model term df1 df2 F.ratio p.value
##  size         2  24 285.810  <.0001
##  type         1  24  27.524  <.0001
##  size:type    2  24  14.381  0.0001

Back to Contents

Multivariate contrasts

多变量对比

在前面的章节中,我们处理相互作用的因素的方法是在其他因素的水平上分别对某些因素()进行比较或对比。这导致了大量的估计和相关的测试。
In the preceding sections, the way we addressed interacting factors was to do comparisons or contrasts of some factors()) separately at levels of other factor(s). This leads to a lot of estimates and associated tests.

Another approach is to compare things in a multivariate way. In the auto-noise example, for example, we have four means (corresponding to the four combinations of type and size) with each size of car, and we could consider comparing these sets of means. Such multivariate comparisons can be done via the Mahalanobis distance (a kind of standardized distance measure) between one set of four means and another. This is facilitated by the mvcontrast() function:
另一种方法是用多变量的方式进行比较。例如,在汽车噪声的例子中,我们有四个平均值(对应于类型和大小的四种组合)来处理每种大小的汽车,我们可以考虑比较这几组平均值。这种多变量比较可以通过一组四个均值与另一个均值之间的马氏距离(一种标准化距离度量)来完成。这可以通过mvcontrast()函数来实现:

mvcontrast(noise.emm, "pairwise", mult.name = c("type", "side"))
##  contrast T.square df1 df2 F.ratio p.value
##  S - M      88.857   4  21  19.438  <.0001
##  S - L    1199.429   4  21 262.375  <.0001
##  M - L    1638.000   4  21 358.312  <.0001
## 
## P value adjustment: sidak

In this output, the T.square values are Hotelling’s <nobr aria-hidden="true" style="transition: none 0s ease 0s; border: 0px; padding: 0px; margin: 0px; max-width: none; max-height: none; min-width: 0px; min-height: 0px; vertical-align: 0px; line-height: normal; text-decoration: none; white-space: nowrap !important;">T2</nobr><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>T</mi><mn>2</mn></msup></math> statistics, which are the squared Mahalanobis distances among the sets of four means. These results thus accomplish a similar objective as the initial comparisons presented in this vignette, but are not complicated by the issue that the factors interact. (Instead, we lose the directionality of the comparisons.) While all comparisons are “significant,” the T.square values indicate that large cars are statistically most different from the other sizes.

We may still break things down using by variables. Suppose, for example, we wish to compare the two filter types for each size of car, without regard to which side:

update(mvcontrast(noise.emm, "consec", mult.name = "side", by = "size"), 
       by = NULL)
##  contrast    size T.square df1 df2 F.ratio p.value
##  Octel - Std S      11.429   2  23   5.476  0.0113
##  Octel - Std M     123.714   2  23  59.280  <.0001
##  Octel - Std L      14.286   2  23   6.845  0.0047
## 
## P value adjustment: sidak

One detail to note about multivariate comparisons: in order to make complete sense, all the factors involved must interact. Suppose we were to repeat the initial multivariate comparison after removing all interactions:

mvcontrast(update(noise.emm, submodel = ~ side + size + type), 
           "pairwise", mult.name = c("type", "side"))
##  contrast T.square df1 df2  F.ratio p.value
##  S - M      37.786   1  24   37.786  <.0001
##  S - L    1098.286   1  24 1098.286  <.0001
##  M - L    1543.500   1  24 1543.500  <.0001
## 
## P value adjustment: sidak 
## NOTE: Some or all d.f. are reduced due to singularities

Note that each F ratio now has 1 d.f. Also, note that T.square = F.ratio, and you can verify that these values are equal to the squares of the t.ratios in the initial example in this vignette ((−6.147)2=37.786, etc.). That is, if we ignore all interactions, the multivariate tests are exactly equivalent to the univariate tests of the marginal means.

Back to Contents

Interactions with covariates

与协变量的交互作用

当一个协变量和一个因素相互作用时,我们通常不需要EMMs本身,而是对每个因素水平的协变量趋势的斜率进行估计。举个简单的例子,考虑纤维数据集,并拟合一个包含直径(协变量)和机器(因子)之间交互作用的模型:
When a covariate and a factor interact, we typically don’t want EMMs themselves, but rather estimates of slopes of the covariate trend for each level of the factor. As a simple example, consider the fiber dataset, and fit a model including the interaction between diameter (a covariate) and machine (a factor):

fiber.lm <- lm(strength ~ diameter*machine, data = fiber)

This model comprises fitting, for each machine, a separate linear trend for strength versus diameter. Accordingly, we can estimate and compare the slopes of those lines via the emtrends() function:

emtrends(fiber.lm, pairwise ~ machine, var = "diameter")
## $emtrends
##  machine diameter.trend    SE df lower.CL upper.CL
##  A                1.104 0.194  9    0.666     1.54
##  B                0.857 0.224  9    0.351     1.36
##  C                0.864 0.208  9    0.394     1.33
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  A - B     0.24714 0.296  9   0.835  0.6919
##  A - C     0.24008 0.284  9   0.845  0.6863
##  B - C    -0.00705 0.306  9  -0.023  0.9997
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

We see the three slopes, but no two of them test as being statistically different.

To visualize the lines themselves, you may use

emmip(fiber.lm, machine ~ diameter, cov.reduce = range)
image.png

The cov.reduce = range argument is passed to ref_grid(); it is needed because by default, each covariate is reduced to only one value (see the “basics” vignette). Instead, we call the range() function to obtain the minimum and maximum diameter.
将cov.duce=range参数传递给ref_grid();它是必需的,因为在默认情况下,每个协变量只减少到一个值(请参阅“基本”插页)。相反,我们调用range()函数来获取最小和最大直径。

对于更复杂的示例,请考虑包中包含的橙子数据集。这些数据涉及两种橙子的销售情况。价格(price1和price2)在不同的商店和不同的时间进行了实验变化,并观察了Sales1和Sales2的反应。让我们考虑这些数据的三个多变量模型,对天数和商店有相加效应,并对价格进行不同程度的拟合:
For a more sophisticated example, consider the oranges dataset included with the package. These data concern the sales of two varieties of oranges. The prices (price1 and price2) were experimentally varied in different stores and different days, and the responses sales1 and sales2 were observed. Let’s consider three multivariate models for these data, with additive effects for days and stores, and different levels of fitting on the prices:

org.quad <- lm(cbind(sales1, sales2) ~ poly(price1, price2, degree = 2)
                                       + day + store, data = oranges)
org.int <- lm(cbind(sales1, sales2) ~ price1 * price2 + day + store, data = oranges)
org.add <- lm(cbind(sales1, sales2) ~ price1 + price2 + day + store, data = oranges)

Being a multivariate model, emmeans methods will distinguish the responses as if they were levels of a factor, which we will name “variety”. Moreover, separate effects are estimated for each multivariate response, so there is an implied interaction between variety and each of the predictors involving price1 and price2. (In org.int, there is an implied three-way interaction.) An interesting way to view these models is to look at how they predict sales of each variety at each observed values of the prices:
作为一种多变量模型,EMMeans方法将区分不同的反应,就好像它们是某一因素的水平,我们将其命名为“多样性”。此外,对每个多变量响应的单独影响进行了估计,因此在品种和涉及价格1和价格2的每个预测因子之间存在隐含的交互作用。(在org.int中,存在隐含的三向交互。)。看待这些模型的一个有趣方法是,看看它们如何预测每个品种在每个观察到的价格值下的销售额:

emmip(org.quad, price2 ~ price1 | variety, mult.name = "variety", cov.reduce = FALSE)
image.png

The trends portrayed here are quite sensible: In the left panel, as we increase the price of variety 1, sales of that variety will tend to decrease – and the decrease will be faster when the other variety of oranges is low-priced. In the right panel, as price of variety 1 increases, sales of variety 2 will increase when it is low-priced, but could decrease also at high prices because oranges in general are just too expensive. A plot like this for org.int will be similar but all the curves will be straight lines; and the one for plot.add will have all lines parallel. In all models, though, there are implied price1:variety and price2:variety interactions, because we have different regression coefficients for the two responses.

Which model should we use? They are nested models, so they can be compared by anova():

anova(org.quad, org.int, org.add)
## Analysis of Variance Table
## 
## Model 1: cbind(sales1, sales2) ~ poly(price1, price2, degree = 2) + day + 
##     store
## Model 2: cbind(sales1, sales2) ~ price1 * price2 + day + store
## Model 3: cbind(sales1, sales2) ~ price1 + price2 + day + store
##   Res.Df Df Gen.var.   Pillai approx F num Df den Df Pr(>F)
## 1     20      22.798                                       
## 2     22  2   21.543 0.074438  0.38658      4     40 0.8169
## 3     23  1   23.133 0.218004  2.64840      2     19 0.0967

It seems like the full-quadratic model has little advantage over the interaction model. There truly is nothing magical about a P value of 0.05, and we have enough data that over-fitting is not a hazard; so I like org.int. However, what follows could be done with any of these models.

To summarize and test the results compactly, it makes sense to obtain estimates of a representative trend in each of the left and right panels, and perhaps to compare them. In turn, that can be done by obtaining the slope of the curve (or line) at the average value of price2. The emtrends() function is designed for exactly this kind of purpose. It uses a difference quotient to estimate the slope of a line fitted to a given variable. It works just like emmeans() except for requiring the variable to use in the difference quotient. Using the org.int model:

emtrends(org.int, pairwise ~ variety, var = "price1", mult.name = "variety")
## $emtrends
##  variety price1.trend    SE df lower.CL upper.CL
##  sales1        -0.749 0.171 22   -1.104   -0.394
##  sales2         0.138 0.214 22   -0.306    0.582
## 
## Results are averaged over the levels of: day, store 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast        estimate   SE df t.ratio p.value
##  sales1 - sales2   -0.887 0.24 22  -3.690  0.0013
## 
## Results are averaged over the levels of: day, store

From this, we can say that, starting with price1 and price2 both at their average values, we expect sales1 to decrease by about .75 per unit increase in price1; meanwhile, there is a suggestion of a slight increase of sales2, but without much statistical evidence. Marginally, the first variety has a 0.89 disadvantage relative to sales of the second variety.

Other analyses (not shown) with price2 set at a higher value will reduce these effects, while setting price2 lower will exaggerate all these effects. If the same analysis is done with the quadratic model, the the trends are curved, and so the results will depend somewhat on the setting for price1. The graph above gives an indication of the nature of those changes.

Similar results hold when we analyze the trends for price2:

emtrends(org.int, pairwise ~ variety, var = "price2", mult.name = "variety")
## $emtrends
##  variety price2.trend    SE df lower.CL upper.CL
##  sales1         0.172 0.102 22  -0.0404    0.384
##  sales2        -0.745 0.128 22  -1.0099   -0.480
## 
## Results are averaged over the levels of: day, store 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast        estimate    SE df t.ratio p.value
##  sales1 - sales2    0.917 0.143 22   6.387  <.0001
## 
## Results are averaged over the levels of: day, store

At the averages, increasing the price of variety 2 has the effect of decreasing sales of variety 2 while slightly increasing sales of variety 1 – a marginal difference of about .92.

Back to Contents

Summary

Interactions, by nature, make things more complicated. One must resist pressures and inclinations to try to produce simple bottom-line conclusions. Interactions require more work and more patience; they require presenting more cases – more than are presented in the examples in this vignette – in order to provide a complete picture.
从本质上讲,交互会让事情变得更加复杂。人们必须顶住压力和倾向,试图得出简单的底线结论。交互需要更多的工作和更多的耐心;它们需要展示更多的案例--比这个简介中的例子更多--以便提供一个完整的图景。

所有简介的索引
Index of all vignette topics


  1. I sure wish I could ask some questions about how how these data were collected; for example, are these independent experimental runs, or are some cars measured more than once? The model is based on the independence assumption, but I have my doubts.↩︎

我当然希望我能问一些问题,关于这些数据是如何收集的;例如,这些是独立的实验运行,还是有些汽车测量了不止一次?这个模型是建立在独立性假设的基础上的,但我对此持怀疑态度。↩︎。

  1. You may have noticed that there are no asterisks in the ANOVA table in this vignette. I habitually opt out of star-gazing by including options(show.signif.stars = FALSE) in my .Rprofile file.↩︎
    您可能已经注意到,在这个小插曲中,ANOVA表中没有星号。我习惯性地通过在我的.Rprofile文件中加入选项(show.signif.star=false)来退出观星。↩︎
emmip(Ratio_raw_fit, Intention ~ Outcome | Interventiontype)
# 三重交互检验的方法有两种:
# 1. 简单简单效应检验。看一个因素在另外两个因素水平的结合上的处理效应。
# 2. 简单交互作用检验。看在一个因素的各个水平上,另外两个因素的交互作用。

# 1. 简单简单效应检验。
# Intention * Outcome * Interventiontype每一种组合下的,边际均值。
Ratio.emm <- emmeans(Ratio_raw_fit, ~ Intention * Outcome * Interventiontype)
# 先看  Intention 和 Outcome,每一种组合下,Interventiontype 主效应的差异。简单简单效应分析
joint_tests(Ratio_raw_fit , by=c("Intention","Outcome"))
# 固定 Intention 和 Outcome,看每一种组合下,Interventiontype两两比较的差异。simple的含义是,用作比较的变量。
pairs(Ratio.emm, simple = "Interventiontype",adjust="bonferroni") 
# # 上一句等价于下面一句,by的含义是,不用做比较的变量。也就是水平固定的变量。
# pairs(Ratio.emm, by = c("Intention","Outcome"),adjust="bonferroni")

# 2. 简单交互作用检验。
# simple参数可以指定一个list格式的参数,含义是,对 list 里的每个元素进行单独的运行。
# 比如在此例子中,运行 Interventiontype,也就是,(固定Intention和Outcome)比较 Interventiontype 不同水平的差异。
# 运行 c("Intention", "Outcome"),也就是(固定Interventiontype)对 Intention和Outcome 每一种,进行两两比较。
pairs(Ratio.emm, simple = list("Interventiontype", c("Intention", "Outcome")))

pairs(Ratio.emm, simple = "each")
contrast(Ratio.emm, "consec", simple = "each", adjust = "mvt")
contrast(Ratio.emm, "consec", simple = "each", combine = TRUE, adjust = "mvt")

a<-emmeans(Ratio_raw_fit,pairwise~Intention*Outcome|Interventiontype,interaction=TRUE)
a$contrasts
pairs(a$contrasts, by = NULL,adjust="bonferroni")

emmeans(Ratio_raw_fit,pairwise~Intention*Outcome|Interventiontype,interaction=FALSE,adjust="bonferroni")

joint_tests(Ratio_raw_fit , by = "Interventiontype")
               
# emms1 <- emmeans(Ratio_raw_fit, ~ Intention*Outcome|Interventiontype)
# con1 <- contrast(emms1, interaction = "pairwise")
# pairs(con1, by = NULL)
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 158,233评论 4 360
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,013评论 1 291
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 108,030评论 0 241
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 43,827评论 0 204
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,221评论 3 286
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,542评论 1 216
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,814评论 2 312
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,513评论 0 198
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,225评论 1 241
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,497评论 2 244
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 31,998评论 1 258
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,342评论 2 253
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 32,986评论 3 235
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,055评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,812评论 0 194
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,560评论 2 271
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,461评论 2 266

推荐阅读更多精彩内容

  • 我们将微生物群落组成研究分为两个主要部分:(1)分类多样性、OTUS和类群的假设检验;(2)类群间差异分析。第一个...
    ZMQ要加油呀阅读 1,670评论 0 2
  • 微生物群落研究的主要目标是比较不同群落的组成(β多样性)。在第6章介绍了β多样性,并举例说明了如何计算β多样性指数...
    ZMQ要加油呀阅读 3,036评论 0 5
  • 方差分析的基本思想及应用条件 方差分析的基本思想 在进行科学研究时,有时要按实验设计将所研究的对象分为多个处理组进...
    backup备份阅读 7,304评论 0 10
  • 原英文文档地址:https://raw.githubusercontent.com/kassambara/rsta...
    王诗翔阅读 5,585评论 0 14
  • 20180502(从有道迁移) 方差分析 当包含的因子是解释变量时,我们关注的重点通常会从预测转向组别差异的分析,...
    KrisKC阅读 516评论 0 0