# 1. p34 Variable types

int, dbl, chr, dttm, lgl, fctr, date

# 2. dplyr basics

• Core functions: filter(), arrange(), select(), mutate(), summarize(), group_by()

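# 3. filter()

Comparison operators: <, <=, >, >=, !=, ==

## Logical operators

• &, |, !, &&, || (`&&` and `||` work on single values only, so use `&` and `|` inside `filter()`)
• `nov_dec <- filter(flights, month %in% c(11, 12))`
`x %in% y` selects every row where x is one of the values in y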
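
To see what `%in%` does outside of `filter()`, here is a minimal sketch on a made-up vector (`months` is a hypothetical example, not part of the flights data):

```r
# Hypothetical month values, for illustration only
months <- c(1, 11, 12, 3, 11)

# %in% returns one TRUE/FALSE per element of the left-hand side
keep <- months %in% c(11, 12)
print(keep)

# This is equivalent to chaining == comparisons with |
same <- months == 11 | months == 12
print(identical(keep, same))
```

`filter()` then keeps exactly the rows where this logical vector is TRUE.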

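## Missing values

filter() only keeps rows where the condition is TRUE; rows where it evaluates to FALSE or NA are dropped. To keep the missing values, ask for them explicitly:
`filter(df, is.na(x) | x > 1)`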
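
A small sketch of this behaviour, assuming dplyr is installed; the data frame `df` is a toy example made up for illustration:

```r
library(dplyr)

# Toy data frame (assumed for illustration): x contains one missing value
df <- tibble(x = c(1, NA, 3))

# Rows where the condition is FALSE or NA are dropped
gt1 <- filter(df, x > 1)

# Adding is.na(x) keeps the row with the missing value
gt1_or_na <- filter(df, is.na(x) | x > 1)

print(gt1$x)
print(gt1_or_na$x)
```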

# p38 Exercises

## 1.

a. `filter(flights, arr_delay >= 120)`
b. `View(flights)`
`filter(flights, dest == 'IAH' | dest == 'HOU')`
c. `nycflights13::airlines`
`filter(flights, carrier %in% c('UA', 'AA', 'DL'))`
d. `summer.flights <- filter(flights, month %in% c(7, 8, 9))` || Answer: `filter(flights, month >= 7, month <= 9)`
e. `filter(flights, dep_delay == 0 & arr_delay > 120)` || Answer: `filter(flights, dep_delay <= 0, arr_delay > 120)` (flights that left early also count as not leaving late)
f. I read this as: departure was delayed by at least one hour, but the arrival delay was more than 30 minutes shorter than the departure delay. `filter(flights, dep_delay >= 60, dep_delay - arr_delay > 30)`
g. `filter(flights, dep_time >= 0 & dep_time <= 6)` ❌ (`dep_time` is stored in HHMM format, so 6 a.m. is 600) || Midnight is a special case because it is recorded as 2400. Answer: `filter(flights, dep_time <= 600 | dep_time == 2400)` or `filter(flights, dep_time %% 2400 <= 600)`
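
The modulo trick in (g) can be checked on a few hand-picked times. This is only a sketch; the `dep_time` vector below is hypothetical, not taken from `flights`:

```r
# Hypothetical departure times in HHMM format; 2400 represents midnight
dep_time <- c(1, 600, 601, 1200, 2359, 2400)

# 2400 %% 2400 is 0, so midnight falls inside the 0..600 range
early <- dep_time %% 2400 <= 600

# Same result as the explicit two-condition version
explicit <- dep_time <= 600 | dep_time == 2400

print(early)
print(identical(early, explicit))
```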

## 2.

From `?between`: This is a shortcut for x >= left & x <= right, implemented efficiently in C++ for local values, and translated to the appropriate SQL for remote tables.
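
As a quick check of that equivalence, here is a sketch on a made-up `month` vector (assuming dplyr is installed):

```r
library(dplyr)

# Hypothetical month values, for illustration only
month <- c(5, 7, 8, 9, 10)

# between() is inclusive on both ends
via_between <- between(month, 7, 9)
via_ops <- month >= 7 & month <= 9

print(via_between)
print(identical(via_between, via_ops))
```

With this, exercise 1(d) could also be written as `filter(flights, between(month, 7, 9))`.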

## 3.

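`filter(flights, is.na(dep_time == NA))` ❌ (`dep_time == NA` is NA for every row, so this keeps all rows) || Correct: `filter(flights, is.na(dep_time))`. The arrival time is missing for these rows as well, so they are most likely cancelled flights.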

## 4.

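`NA | TRUE` is TRUE: a logical OR is true as soon as one of the values is true.
`FALSE & NA` is FALSE: a logical AND is false as soon as one of the values is false.
For every finite numeric x, x * 0 = 0, but `NA * 0` is NA rather than 0, because the missing value could be ∞ or −∞, and 0 × ∞ and 0 × −∞ are undefined. R represents undefined results as `NaN`, which is an abbreviation of "not a number".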
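
These rules can be confirmed directly in base R; the variable names below are only for illustration:

```r
# Logical operators can sometimes resolve a missing value
r_or  <- NA | TRUE    # TRUE whatever the missing value is
r_and <- FALSE & NA   # FALSE whatever the missing value is

# Arithmetic cannot: the missing value might be infinite
r_mul <- NA * 0       # stays NA
r_inf <- Inf * 0      # undefined, so R returns NaN

print(c(r_or, r_and))
print(c(r_mul, r_inf))
```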

# 4. arrange()

## Common usage

`arrange(flights, year, month, day)`
`arrange(flights, desc(arr_delay))` # sort in descending order

## p40 Exercises

### 1.

`arrange(flights, desc(is.na(dep_time)), dep_time)`
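
A minimal sketch of why this works, assuming dplyr is installed and using a toy data frame: `arrange()` always puts NA last, but `is.na(dep_time)` is an ordinary logical column, so sorting it in descending order moves the TRUE (missing) rows to the top.

```r
library(dplyr)

# Toy data (assumed for illustration) with one missing departure time
df <- tibble(dep_time = c(900, NA, 530))

# Default: missing values are sorted to the end
default_order <- arrange(df, dep_time)

# Sort on the is.na() flag first so missing rows come first
na_first <- arrange(df, desc(is.na(dep_time)), dep_time)

print(default_order$dep_time)
print(na_first$dep_time)
```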

### 2.

The most delayed flight was HA 51, JFK to HNL, scheduled to depart at 9:00 on January 9, 2013; it was delayed by 1301 minutes.

### 3.

`arrange(flights, desc(distance / air_time))` || Answer: `arrange(flights, distance / air_time * 60)` (speed in miles per hour; note that without `desc()` this puts the slowest flights first)

### 4.

`arrange(flights, desc(air_time))`

# 5. select()

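• If a variable name is listed more than once, select() drops the duplicates and keeps only the first occurrence
• `select(flights, year, month, day)`
• `select(flights, year:day)`
- `select(flights, -(year:day))` # select all columns except those from year to day (inclusive)
• Helper functions
- Helpers ignore case by default; to change this: `select(flights, contains("TIME", ignore.case = FALSE))`

- `starts_with("abc")`, `ends_with("xyz")`, `contains("ijk")`, `matches()`

• `rename(flights, tail_num = tailnum)`
• `select(flights, time_hour, air_time, everything())` # move a few variables to the start of the data frame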
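
The calls above can be tried on a tiny stand-in for `flights`. This is a sketch assuming dplyr is installed; the one-row data frame `df` is made up and only borrows a few column names:

```r
library(dplyr)

# One-row toy data frame borrowing a few flights column names
df <- tibble(year = 2013, month = 1, day = 1,
             dep_time = 517, air_time = 227, tailnum = "N14228")

# Drop the range year..day
dropped <- names(select(df, -(year:day)))

# contains() ignores case unless told otherwise
any_case   <- names(select(df, contains("TIME")))
exact_case <- names(select(df, contains("TIME", ignore.case = FALSE)))

# everything() keeps the remaining columns after the ones listed first
moved <- names(select(df, air_time, everything()))

# rename() keeps all columns and changes only the given name
renamed <- names(rename(df, tail_num = tailnum))

print(dropped)
print(any_case)
print(exact_case)
print(moved)
print(renamed)
```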

# p42 Exercises

## 1.

• `select(flights, dep_time, dep_delay, arr_time, arr_delay)` || Answer: `select(flights, "dep_time", "dep_delay", "arr_time", "arr_delay")` # specify all the variables as strings
`select(flights, 4, 6, 7, 9)` # select by column position; not recommended, because column positions change often
`select(flights, one_of(c("dep_time", "dep_delay", "arr_time", "arr_delay")))` # the variable names can also be stored in a vector:
```r
variables <- c("dep_time", "dep_delay", "arr_time", "arr_delay")
select(flights, one_of(variables))
```

`select(flights, starts_with("dep_"), starts_with("arr_"))`
`select(flights, matches("^(dep|arr)_(time|delay)$"))`
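
The anchored regular expression can be checked against a handful of column names with base R's `grepl()`. This is only a sketch; the `cols` vector is a hand-picked subset of the flights column names:

```r
# A few flights column names, chosen for illustration
cols <- c("dep_time", "dep_delay", "arr_time", "arr_delay",
          "sched_dep_time", "air_time")

# ^ and $ anchor the pattern to the whole name
pattern <- "^(dep|arr)_(time|delay)$"
matched <- cols[grepl(pattern, cols)]

print(matched)
```

Because of the `^` anchor, `sched_dep_time` is not matched even though it contains `dep_time`.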

## 2.

`select(flights, year, month, day, year, year)`
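
A quick sketch of the result on a toy data frame (assuming dplyr is installed; `df` is made up for illustration):

```r
library(dplyr)

# Toy data frame with three of the flights columns
df <- tibble(year = 2013, month = 1, day = 1)

# Repeated names are ignored; each column appears once, in first-mention order
once <- names(select(df, year, month, day, year, year))

print(once)
```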

## 4.

