1027 chapter11 Using forcats

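Introduction

The forcats package provides tools for working with categorical variables (factors). Factors are easier to work with than character vectors.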

Creating factors

① Create a list of the valid levels

x1 <- c("Dec", "Apr", "Jan", "Mar")
month_level <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

② Create the factor

y1 <- factor(x1,levels = month_level)

Any value that is not in the set of levels is silently converted to NA

x2 <- c("Dec", "Apr","Jam","Mar")
y2 <- factor(x2,levels = month_level)
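  • If you skip the step of defining the levels, they are taken from the data in alphabetical order
  • To make the level order match the order of first appearance in the data, there are two options: a. set the levels to unique(x) when creating the factor b. apply fct_inorder() to the factor after creating it
  • To access the set of valid levels of a factor directly, use levels()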
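
The three points above can be checked with a short sketch. It reuses x1 from earlier; the names f_alpha, f_unique and f_inorder are only illustrative and are not from the book.

```r
library(forcats)

x1 <- c("Dec", "Apr", "Jan", "Mar")

# No levels supplied: factor() sorts the unique values alphabetically
f_alpha <- factor(x1)
levels(f_alpha)
#> [1] "Apr" "Dec" "Jan" "Mar"

# Option a: keep the order of first appearance via levels = unique(x)
f_unique <- factor(x1, levels = unique(x1))
levels(f_unique)
#> [1] "Dec" "Apr" "Jan" "Mar"

# Option b: create the factor first, then reorder with fct_inorder()
f_inorder <- fct_inorder(factor(x1))
levels(f_inorder)
#> [1] "Dec" "Apr" "Jan" "Mar"
```

Both options give the same level order; option b is convenient when the factor already exists, for example as a column in a data frame.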

Modifying factor levels

fct_recode() changes or recodes individual levels
fct_collapse() merges several levels into one
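
A minimal sketch of the two functions on a toy factor. The vector party and its level names are made up for illustration and are not taken from gss_cat.

```r
library(forcats)

# Hypothetical toy data, only for illustration
party <- factor(c("Strong rep", "Weak rep", "Strong dem", "Weak dem", "Strong rep"))

# fct_recode(): new name on the left, old name on the right;
# levels that are not mentioned are left unchanged
renamed <- fct_recode(party,
  "Republican, strong" = "Strong rep",
  "Republican, weak"   = "Weak rep"
)
levels(renamed)

# fct_collapse(): each new level receives a vector of old levels
collapsed <- fct_collapse(party,
  rep = c("Strong rep", "Weak rep"),
  dem = c("Strong dem", "Weak dem")
)
table(collapsed)
```

fct_collapse() is the shorter choice when many old levels map to the same new level, because each new level only has to be named once.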

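p159 Exercise

Approach: first use fct_collapse() to merge the partyid levels into four groups (other, rep, ind, dem), then count the respondents for each year and group and convert the counts into proportions within each year, and finally draw a line chart with ggplot(). The x axis is the year, the y axis is the proportion of respondents, and each party group gets its own colored line. The answer is below.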

library(tidyverse)  # loads forcats (with gss_cat), dplyr and ggplot2

gss_cat %>%
  mutate(partyid =
           fct_collapse(partyid,
                        other = c("No answer", "Don't know", "Other party"),
                        rep = c("Strong republican", "Not str republican"),
                        ind = c("Ind,near rep", "Independent", "Ind,near dem"),
                        dem = c("Not str democrat", "Strong democrat"))) %>%
  count(year, partyid)  %>%
  group_by(year) %>%
  mutate(p = n / sum(n)) %>%
  ggplot(aes(x = year, y = p,
             colour = fct_reorder2(partyid, year, p))) +
  geom_point() +
  geom_line() +
  labs(colour = "Party ID.")
