Seurat Weekly NO.0 || 开刊词

作为纪念开个专栏:每周精选Seurat社区有趣的问答以飨国内Seurat的用户。

为什么要做这个事情呢?

  • 翻译学习法
  • 认知学习法
  • 通勤学习法
  • 追踪工具动态
  • 相信我,你并不孤独

因为国内的单细胞数据分析人员越来越多,而Seurat为单细胞数据分析提供了一个很好的思考框架,当然Seurat发表的文章有很多。大家在应用这个单细胞数据分析工具的时候不免会遇到这样或那样的问题,这虽然是让人苦恼的,但是以为伟人说过:好的问题要比好的答案价值百倍。于是,我在Seurat的github上watch了所有的问题,所以我的个人邮箱是这样的:

所以在通勤路上看看大家都踩了Seurat的哪些坑,Seurat社区成员又是如何回复的,不能不说是一件有意思的事儿。何况还能学习一下英语啦_.

计划是每周精选不多于10条的Issues,在翻译和复现这些issue的时候也会掺杂一些自己对Seurat的学习心得,以促进Seurat学者了解这个项目。

目前github上面大部分的交流用的还是英语,我们为什么不用汉语在上面交流呢?那样就达不到学习英语的目的了吧。

凡事善始者实繁,克终者盖寡。Seurat Weekly能做几期呢?我们不能承诺,毕竟是一项无人资助的事业,看缘分呢。

最为NO.0,今天主要探索一下选题与排版的一般规则。

选题的第一个规则就是某个话题出现的频率,我们相信一个有很多人愿意讨论的话题更大概率是一个好问题。其次是主编本人觉得有意思的问题。

至于排版,我们按照一般的FAQ的形式给出,问题部分我们给出原文描述,答案就是主编在看过别人的回答之后,夹杂着个人观点的品论。

每个问题,我们也给出在github的链接,方便读者朋友就原问题进行解答。

下面走几个例子:

Getting parameters from pre-existing reduction (umap) (#3053)

github地址: https://github.com/satijalab/seurat/issues/3053

这位oh111 的意思应该是手里有了Seurat对象,想知道这个对象之前的作者是如何操作的。其实Seurat是给了每个操作过程的参数记录的:

就拿我们seurat自带的数据集来看吧:

Seurat::pbmc_small@commands

这返回的是一个list,他说想看PCA的参数,其实这样就可以了:

Seurat::pbmc_small@commands$RunPCA.RNA
Command: RunPCA(object = pbmc_small, features = VariableFeatures(object = pbmc_small),     verbose = FALSE)
Time: 2018-08-28 04:34:56
assay : RNA 
features : PPBP IGLL5 VDAC3 CD1C AKR1C3 PF4 MYL9 GNLY TREML1 CA2 SDPR PGRMC1 S100A8 TUBB1 HLA-DQA1 PARVB RUFY1 HLA-DPB1 RP11-290F20.3 S100A9 
compute.dims : 20 
rev.pca : FALSE 
weight.by.var : TRUE 
verbose : FALSE 
print.dims : 1 2 3 4 5 
features.print : 30 
reduction.name : pca 
reduction.key : PC 
seed.use : 42 

Reorder cells by expression value in DoHeatmap & Dendogram (#3036)

github 地址:https://github.com/satijalab/seurat/issues/3036

这是一个可视化细节的问题,有时候我们用默认参数出来的图并不能直接达到发表的水平,或者自己有些小心思想改一下这个图,最常见的就是该标签的顺序。其实我们知道这个一般是通过因子变量的levels来控制的。

有位仁兄给出了这个链接: https://stackoverflow.com/questions/52136211/how-to-reorder-cells-in-doheatmap-plot-in-seurat-ggplot2

那么我们来看一下Seurat的绘图是如何实现的,看DoHeatmapd的源代码:

 DoHeatmap
function (object, features = NULL, cells = NULL, group.by = "ident", 
    group.bar = TRUE, group.colors = NULL, disp.min = -2.5, disp.max = NULL, 
    slot = "scale.data", assay = NULL, label = TRUE, size = 5.5, 
    hjust = 0, angle = 45, raster = TRUE, draw.lines = TRUE, 
    lines.width = NULL, group.bar.height = 0.02, combine = TRUE) 
{
    cells <- cells %||% colnames(x = object)
    if (is.numeric(x = cells)) {
        cells <- colnames(x = object)[cells]
    }
    assay <- assay %||% DefaultAssay(object = object)
    DefaultAssay(object = object) <- assay
    features <- features %||% VariableFeatures(object = object)
    features <- rev(x = unique(x = features))
    disp.max <- disp.max %||% ifelse(test = slot == "scale.data", 
        yes = 2.5, no = 6)
    possible.features <- rownames(x = GetAssayData(object = object, 
        slot = slot))
    if (any(!features %in% possible.features)) {
        bad.features <- features[!features %in% possible.features]
        features <- features[features %in% possible.features]
        if (length(x = features) == 0) {
            stop("No requested features found in the ", slot, 
                " slot for the ", assay, " assay.")
        }
        warning("The following features were omitted as they were not found in the ", 
            slot, " slot for the ", assay, " assay: ", paste(bad.features, 
                collapse = ", "))
    }
    data <- as.data.frame(x = as.matrix(x = t(x = GetAssayData(object = object, 
        slot = slot)[features, cells, drop = FALSE])))
    object <- suppressMessages(expr = StashIdent(object = object, 
        save.name = "ident"))
    group.by <- group.by %||% "ident"
    groups.use <- object[[group.by]][cells, , drop = FALSE]
    plots <- vector(mode = "list", length = ncol(x = groups.use))
    for (i in 1:ncol(x = groups.use)) {
        data.group <- data
        group.use <- groups.use[, i, drop = TRUE]
        if (!is.factor(x = group.use)) {
            group.use <- factor(x = group.use)
        }
        names(x = group.use) <- cells
        if (draw.lines) {
            lines.width <- lines.width %||% ceiling(x = nrow(x = data.group) * 
                0.0025)
            placeholder.cells <- sapply(X = 1:(length(x = levels(x = group.use)) * 
                lines.width), FUN = function(x) {
                return(RandomName(length = 20))
            })
            placeholder.groups <- rep(x = levels(x = group.use), 
                times = lines.width)
            group.levels <- levels(x = group.use)
            names(x = placeholder.groups) <- placeholder.cells
            group.use <- as.vector(x = group.use)
            names(x = group.use) <- cells
            group.use <- factor(x = c(group.use, placeholder.groups), 
                levels = group.levels)
            na.data.group <- matrix(data = NA, nrow = length(x = placeholder.cells), 
                ncol = ncol(x = data.group), dimnames = list(placeholder.cells, 
                  colnames(x = data.group)))
            data.group <- rbind(data.group, na.data.group)
        }
        lgroup <- length(levels(group.use))
        plot <- SingleRasterMap(data = data.group, raster = raster, 
            disp.min = disp.min, disp.max = disp.max, feature.order = features, 
            cell.order = names(x = sort(x = group.use)), group.by = group.use)
        if (group.bar) {
            default.colors <- c(hue_pal()(length(x = levels(x = group.use))))
            cols <- group.colors[1:length(x = levels(x = group.use))] %||% 
                default.colors
            if (any(is.na(x = cols))) {
                cols[is.na(x = cols)] <- default.colors[is.na(x = cols)]
                cols <- Col2Hex(cols)
                col.dups <- sort(x = unique(x = which(x = duplicated(x = substr(x = cols, 
                  start = 1, stop = 7)))))
                through <- length(x = default.colors)
                while (length(x = col.dups) > 0) {
                  pal.max <- length(x = col.dups) + through
                  cols.extra <- hue_pal()(pal.max)[(through + 
                    1):pal.max]
                  cols[col.dups] <- cols.extra
                  col.dups <- sort(x = unique(x = which(x = duplicated(x = substr(x = cols, 
                    start = 1, stop = 7)))))
                }
            }
            group.use2 <- sort(x = group.use)
            if (draw.lines) {
                na.group <- RandomName(length = 20)
                levels(x = group.use2) <- c(levels(x = group.use2), 
                  na.group)
                group.use2[placeholder.cells] <- na.group
                cols <- c(cols, "#FFFFFF")
            }
            pbuild <- ggplot_build(plot = plot)
            names(x = cols) <- levels(x = group.use2)
            y.range <- diff(x = pbuild$layout$panel_params[[1]]$y.range)
            y.pos <- max(pbuild$layout$panel_params[[1]]$y.range) + 
                y.range * 0.015
            y.max <- y.pos + group.bar.height * y.range
            plot <- plot + annotation_raster(raster = t(x = cols[group.use2]), 
                xmin = -Inf, xmax = Inf, ymin = y.pos, ymax = y.max) + 
                coord_cartesian(ylim = c(0, y.max), clip = "off") + 
                scale_color_manual(values = cols)
            if (label) {
                x.max <- max(pbuild$layout$panel_params[[1]]$x.range)
                x.divs <- pbuild$layout$panel_params[[1]]$x.major
                x <- data.frame(group = sort(x = group.use), 
                  x = x.divs)
                label.x.pos <- tapply(X = x$x, INDEX = x$group, 
                  FUN = median) * x.max
                label.x.pos <- data.frame(group = names(x = label.x.pos), 
                  label.x.pos)
                plot <- plot + geom_text(stat = "identity", data = label.x.pos, 
                  aes_string(label = "group", x = "label.x.pos"), 
                  y = y.max + y.max * 0.03 * 0.5, angle = angle, 
                  hjust = hjust, size = size)
                plot <- suppressMessages(plot + coord_cartesian(ylim = c(0, 
                  y.max + y.max * 0.002 * max(nchar(x = levels(x = group.use))) * 
                    size), clip = "off"))
            }
        }
        plot <- plot + theme(line = element_blank())
        plots[[i]] <- plot
    }
    if (combine) {
        plots <- CombinePlots(plots = plots)
    }
    return(plots)
}
<bytecode: 0x11837bb8>
<environment: namespace:Seurat>

我们发现Seurat的图是用ggplot2实现的,这样我们就可以基于函数的返回对象来用自己的ggplot功底修复了:

require(Seurat)
p <- DoHeatmap(pbmc_small)

p$theme
p$layers
p$mapping
 head(p$data)
        Feature           Cell Expression Identity
1        S100A9 ATGCCAGAACGACT -0.7639656        0
2 RP11-290F20.3 ATGCCAGAACGACT -0.3730316        0
3      HLA-DPB1 ATGCCAGAACGACT -1.0399941        0
4         RUFY1 ATGCCAGAACGACT -0.4098329        0
5         PARVB ATGCCAGAACGACT -0.3461794        0
6      HLA-DQA1 ATGCCAGAACGACT -0.6717070        0

Time series dataset in two conditions #3040

github 地址:https://github.com/satijalab/seurat/issues/3040

这是一个常见的问题,多个样本在许多地方还是值得探讨的,这位回答者直接给出了两篇参考文献:

I would recommend that you review the Guided tutorial, Multiple dataset integration (specifically the SCT method), and the Stimulated vs Control PBMCs, in that order. Due the large probable size of your final dataset, I think that the manipulations found within the Mouse Cell Atlas Vignette might also prove useful. The VIsualization and Cell cycle regression vignettes were also particularly helpful for our analysis of similar conditions. Lastly, I recommend that you review the methods in Farnsworth et al. 2020 as well as Soldatov et al., 2019 for further help. Hope that this was useful.

Optimal strategy to process samples to compare two different condition. (#3019)

这个其实和上个问题比较像,都是处理多个样本,这里也有人士给出:

I wouldn't recommend using cellranger aggr as it will downsample everything to a similar number of counts as the sample with the lowest counts, and so potentially will throw out a lot of data. You could quantify each dataset (using cellranger, or another tool like Alevin) and first merge them in Seurat and check if there are batch differences between the datasets. If there are, you could run the integration methods

Deploying Shiny apps using Seurat library to shinyapps.io #2716

github 地址: https://github.com/satijalab/seurat/issues/2716

这是一类问题属于对Seurat的扩展,这里是想把Seurat扩展到Shiny程序中,也就是给他界面化。这要求再开发者不单要对Seurat的函数和数据对象及其依赖的R包和环境有所了解,还要求他要懂得Shiny的语法和结构。其实这个问题下的讨论多是想要弥补Seurat与Shiny的界限。

好了,今天就到这里,感谢大家的陪伴。对了,你在github上面提问了没?

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 159,835评论 4 364
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,598评论 1 295
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 109,569评论 0 244
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 44,159评论 0 213
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,533评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,710评论 1 222
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,923评论 2 313
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,674评论 0 203
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,421评论 1 246
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,622评论 2 245
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,115评论 1 260
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,428评论 2 254
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,114评论 3 238
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,097评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,875评论 0 197
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,753评论 2 276
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,649评论 2 271