# tapply

`tapply` applies a function over subsets of a vector.

``````
> str(tapply)
function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
``````
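• `X` is a vector
• `INDEX` is a factor or a list of factors (other values are coerced to factors)
• `FUN` is the function to be applied to each subset
• `...` contains other arguments to be passed to `FUN`
• `simplify` indicates whether the result should be simplified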
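As a rough illustration of how `...` is forwarded, the sketch below passes `na.rm = TRUE` through `tapply` to `mean`. The vector `v` and the factor `g` are made-up values used only for this example.

```r
# Hypothetical data: six values in two groups, one value missing.
v <- c(1, 2, NA, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Without na.rm, the group that contains NA yields NA.
plain <- tapply(v, g, mean)

# na.rm is not an argument of tapply itself; it travels through `...` to mean().
clean <- tapply(v, g, mean, na.rm = TRUE)

print(plain)
print(clean)
```

Any argument that `tapply` does not recognise as its own is handed to `FUN` in this way, so the same pattern should work for, say, `probs` with `quantile`.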

``````
> x <- c(rnorm(10), runif(10), rnorm(10, 1))
> f <- gl(3, 10)
> f
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
> tapply(x, f, mean)
        1         2         3
0.1045255 0.4867243 0.9131191
``````

``````
> tapply(x, f, mean, simplify = FALSE)
$`1`
[1] 0.1045255

$`2`
[1] 0.4867243

$`3`
[1] 0.9131191
``````

``````
> tapply(x, f, range)
$`1`
[1] -0.8040998  1.0022698

$`2`
[1] 0.04577595 0.95238798

$`3`
[1] -0.4422177  2.3863979
``````
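The `range` call returns a list even though `simplify` defaults to `TRUE`, because simplification only happens when `FUN` returns a single value per group. The sketch below contrasts the two cases and shows one possible way to collect the per-group ranges into a matrix. The names `vals`, `grp`, and `r_mat` and the seed are arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
vals <- c(rnorm(10), runif(10), rnorm(10, 1))
grp <- gl(3, 10)

# mean() returns one number per group, so the result simplifies to a numeric array.
m <- tapply(vals, grp, mean)

# range() returns two numbers per group, so the result stays a list-like array.
r <- tapply(vals, grp, range)

# One option for a tabular view: bind the per-group ranges row by row.
r_mat <- do.call(rbind, r)
colnames(r_mat) <- c("min", "max")
print(r_mat)
```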

## split

`split` takes a vector or other object and splits it into groups determined by a factor or a list of factors.

``````
> str(split)
function (x, f, drop = FALSE, ...)
``````
• `x` is a vector, a list, or a data frame
• `f` is a factor or a list of factors
• `drop` indicates whether empty factor levels should be dropped
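A minimal sketch of what `drop` does, assuming a factor with a declared level that never occurs in the data. The values in `vals` and `grp` are invented for illustration.

```r
# Hypothetical factor with a declared level "c" that never occurs.
vals <- c(5, 6, 7, 8)
grp <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

kept <- split(vals, grp)                  # default drop = FALSE keeps the empty "c" group
dropped <- split(vals, grp, drop = TRUE)  # drop = TRUE omits it

print(names(kept))
print(names(dropped))
```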

``````
> split(x, f)
$`1`
[1]  0.06417511  0.77601085  1.66855356  1.38744423
[5] -0.90908770  0.39727163 -2.13528805  0.29087121
[9]  0.82936584  0.53773723

$`2`
[1] 0.6646064 0.4408925 0.3199122 0.2156969 0.8358507
[6] 0.1408568 0.4088236 0.2258691 0.9606134 0.7945027

$`3`
[1]  0.65276220  2.46645556  2.72756544  1.77246304
[5]  2.94941952  0.11977102 -0.04283368  2.36610370
[9]  0.44573942  2.31295594
``````

A common idiom is `split` followed by `lapply`.

``````
> lapply(split(x, f), mean)
$`1`
[1] 0.2907054

$`2`
[1] 0.5007624

$`3`
[1] 1.57704
``````
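For a scalar summary such as `mean`, the `split` plus `lapply`/`sapply` idiom and a direct `tapply` call should give the same numbers. The sketch below checks this on freshly generated data. The seed and the names `vals` and `grp` are arbitrary.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible
vals <- c(rnorm(10), runif(10), rnorm(10, 1))
grp <- gl(3, 10)

via_split <- sapply(split(vals, grp), mean)  # named numeric vector
via_tapply <- tapply(vals, grp, mean)        # 1-d array with the same names

# Compare values and names, ignoring the array-versus-vector difference.
print(all.equal(as.vector(via_split), as.vector(via_tapply)))
print(identical(names(via_split), names(via_tapply)))
```

The `split` route tends to be more convenient when the pieces are data frames or when several summaries are computed from the same grouping.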

## Splitting a Data Frame

``````
> library(datasets)
> head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

> s <- split(airquality, airquality$Month)
> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
$`5`
   Ozone  Solar.R     Wind
      NA       NA 11.62258

$`6`
    Ozone   Solar.R      Wind
       NA 190.16667  10.26667

$`7`
     Ozone    Solar.R       Wind
        NA 216.483871   8.941935

$`8`
   Ozone  Solar.R     Wind
      NA       NA 8.793548

$`9`
   Ozone  Solar.R     Wind
      NA 167.4333  10.1800
``````

``````
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
               5         6          7        8        9
Ozone         NA        NA         NA       NA       NA
Solar.R       NA 190.16667 216.483871       NA 167.4333
Wind    11.62258  10.26667   8.941935 8.793548  10.1800

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")],
+                                na.rm = TRUE))
                  5            6             7            8           9
Ozone      23.61538     29.44444     59.115385    59.961538   31.44828
Solar.R   181.29630    190.16667    216.483871   171.857143  167.43333
Wind       11.62258     10.26667      8.941935     8.793548   10.18000
``````
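The `NA` means in the first `sapply` call come from missing observations in the corresponding month. As a small sketch, the same split can be used to count the missing values per column and month. The names `by_month` and `na_counts` are introduced here only for illustration.

```r
library(datasets)

# Same split as in the transcript above: one data frame per month.
by_month <- split(airquality, airquality$Month)

# Count the missing values in each column within each month.
na_counts <- sapply(by_month, function(d) {
  colSums(is.na(d[, c("Ozone", "Solar.R", "Wind")]))
})
print(na_counts)
```

A column with a non-zero count in a given month is exactly one whose mean is `NA` unless `na.rm = TRUE` is passed.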

## Splitting on More than One Level

``````
> x <- rnorm(10)
> f1 <- gl(2, 5)
> f2 <- gl(5, 2)
> f1
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> f2
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> interaction(f1, f2)
 [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
10 Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 ... 2.5
``````

Interactions can create empty levels.

``````
> str(split(x, list(f1, f2)))
List of 10
 $ 1.1: num [1:2] -0.378  0.445
 $ 2.1: num(0)
 $ 1.2: num [1:2] 1.4066 0.0166
 $ 2.2: num(0)
 $ 1.3: num -0.355
 $ 2.3: num 0.315
 $ 1.4: num(0)
 $ 2.4: num [1:2] -0.907  0.723
 $ 1.5: num(0)
 $ 2.5: num [1:2] 0.732 0.360
``````
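The empty entries arise because `interaction` builds every combination of the two factors' levels (2 × 5 = 10 here), whether or not a combination is observed. The following sketch counts the levels with and without the `drop` argument of `interaction`. The names `all_levels` and `used_levels` are chosen for this example.

```r
f1 <- gl(2, 5)
f2 <- gl(5, 2)

# All 2 x 5 level combinations are created, observed or not.
all_levels <- interaction(f1, f2)

# drop = TRUE keeps only the combinations that actually occur.
used_levels <- interaction(f1, f2, drop = TRUE)

print(nlevels(all_levels))
print(nlevels(used_levels))
print(table(all_levels))
```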

Empty levels can be dropped.

``````
> str(split(x, list(f1, f2), drop = TRUE))
List of 6
 $ 1.1: num [1:2] -0.378  0.445
 $ 1.2: num [1:2] 1.4066 0.0166
 $ 1.3: num -0.355
 $ 2.3: num 0.315
 $ 2.4: num [1:2] -0.907  0.723
 $ 2.5: num [1:2] 0.732 0.360
``````
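Dropping empty levels matters mostly when a function is then applied to each piece, since empty groups would otherwise produce `NaN` or errors for many summaries. A minimal sketch, with an arbitrary seed and the illustrative names `vals`, `groups`, and `group_means`:

```r
set.seed(3)  # arbitrary seed so the sketch is reproducible
vals <- rnorm(10)
f1 <- gl(2, 5)
f2 <- gl(5, 2)

# Split on both factors at once, keeping only the observed combinations.
groups <- split(vals, list(f1, f2), drop = TRUE)

# One mean per observed (f1, f2) combination.
group_means <- sapply(groups, mean)
print(group_means)
```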