10X Single-Cell (10X Spatial Transcriptomics) Joint TCR and Transcriptome Analysis with TCRdist3 (7): Neighbor Graph Analysis (CoNGA)

Today we go through the principles behind the CoNGA analysis. What we care about most is how graph-vs-graph and graph-vs-feature are computed and applied.

First, the TCR analysis (starting from the 10X output files)

Clonotype data from 10x Genomics is first converted into a TCRdist 'clones file', and the matrix of TCRdist distances is computed (TCRdist has come up many times in this series).

1. The clonotype data is filtered.
2. Kernel principal components analysis, as implemented in scikit-learn's KernelPCA class, is then used to extract the top 50 components of variation from this distance matrix (this method can now also be done in the Seurat package).
3. These kernel PCs can be directly incorporated into the standard single-cell workflows for clustering and dimensionality reduction, in place of the principal components extracted from the gene expression counts matrix (in other words, downstream analysis proceeds just as for single-cell transcriptome data). UMAP is used for dimensionality reduction and Louvain for clustering.
4. To annotate the Louvain clusters in CoNGA visualizations, the most frequent V segment in each cluster is identified and appended to the cluster name if it is present in at least 50% of the clustered TCRs, and uppercased if present in at least 75% of the TCRs (clusters are initially named with consecutive integers, starting at 0 with the largest cluster).
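Steps 2 and 4 above can be sketched in a few lines. This is a minimal illustration rather than CoNGA's own code: the conversion from distances to a similarity (Gram) matrix by linear rescaling, the function names `tcr_kernel_pcs` and `annotate_cluster`, and the `index_gene` naming format are all assumptions made for the example.

```python
import numpy as np
from collections import Counter
from sklearn.decomposition import KernelPCA


def tcr_kernel_pcs(dist, n_components=50):
    """Kernel PCA on a square, symmetric distance matrix.

    Assumption: distances are turned into similarities by a linear
    rescaling, 1 - D / max(D). Other kernels are possible.
    """
    dist = np.asarray(dist, dtype=float)
    gram = 1.0 - dist / dist.max()
    # Cannot ask for more components than there are samples.
    n_components = min(n_components, dist.shape[0] - 1)
    kpca = KernelPCA(n_components=n_components, kernel="precomputed")
    return kpca.fit_transform(gram)


def annotate_cluster(index, v_genes):
    """Name a cluster by its integer index plus its dominant V segment.

    >= 75% of TCRs: gene appended in upper case
    >= 50% of TCRs: gene appended in lower case
    otherwise:      just the integer index
    """
    gene, count = Counter(v_genes).most_common(1)[0]
    frac = count / len(v_genes)
    if frac >= 0.75:
        return f"{index}_{gene.upper()}"
    if frac >= 0.5:
        return f"{index}_{gene.lower()}"
    return str(index)
```

The returned matrix can be passed to a neighbor-graph, UMAP, and Louvain workflow in the same slot where gene expression PCs would normally go.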

Second, TCR sequence features

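1. For each clonotype, CoNGA calculates a set of TCR sequence-based scores for use in graph-vs-feature analysis and for annotating graph-vs-graph cluster pairs. First, a set of 28 different amino acid properties is computed over the α and β chain CDR3 loops (excluding the first 4 and last 4 residues of each CDR3, where the full CDR3 sequence is defined as starting at the conserved cysteine and ending at, and including, the phenylalanine before the GXG motif in the J region). These scores include a set compiled from original sources by the authors of the VDJtools package, as well as the five Atchley factors. Seven additional sequence scores are also calculated: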
  • 'alphadist', which measures the ordinal distance between the V alpha and J alpha genes when the complete set of gene segments is ordered by genomic location.
  • 'imhc', the iMHC score.
  • 'cd8', a simple CD8-versus-CD4 preference score calculated from the TCR V and J gene usage, CDR3 length, and CDR3 amino acid composition, based on frequency differences between flow-sorted CD8+ and CD4+ TCR sequence repertoires.
  • 'cdr3len', total CDR3 length.
  • 'mait', which assigns a score of 1 to TCRs with an alpha chain using the TRAV1-2 and TRAJ33/TRAJ20/TRAJ12 segments (TRAV1 and TRAJ33 in mouse) and a CDR3 length of 12, and 0 to all other TCRs (this score is used in the worked example).
  • 'inkt', which assigns a score of 1 to TCRs with the TRAV10/TRAJ18/TRBV25 gene combination and a CDR3 length of 14, 15, or 16 (TRAV11/TRAJ18 and length 15 for mouse).
  • 'nndists_tcr', which measures the density of TCR sequences near the scored clonotype by calculating the average TCR distance to the nearest 1% of clonotypes.
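The rule-based scores in this list are simple enough to write down directly. The sketch below covers the 'mait' rule and the trimmed-CDR3 averaging used for the amino acid property scores. It is illustrative only. Everything here is an assumption made for the example: the function names, the stripping of allele suffixes such as `*01`, the reading of "CDR3 length" in the 'mait' rule as the alpha-chain CDR3, the pooling of both chains into one average, and the toy `CHARGE` table, which stands in for one of the 28 real property tables.

```python
def strip_allele(gene):
    """Drop an allele suffix such as '*01' (assumed input format)."""
    return gene.split("*")[0]


def mait_score(va, ja, cdr3a, organism="human"):
    """1 if the alpha chain matches the MAIT rule, else 0.

    Assumption: the 'CDR3 length of 12' refers to the alpha-chain CDR3.
    """
    va, ja = strip_allele(va), strip_allele(ja)
    if organism == "human":
        genes_ok = va == "TRAV1-2" and ja in {"TRAJ33", "TRAJ20", "TRAJ12"}
    else:  # mouse
        genes_ok = va == "TRAV1" and ja == "TRAJ33"
    return 1 if genes_ok and len(cdr3a) == 12 else 0


def cdr3_core(cdr3, trim=4):
    """CDR3 without its first and last `trim` residues ('' if too short)."""
    return cdr3[trim:len(cdr3) - trim]


# Toy property table for illustration: net side-chain charge.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def mean_property(cdr3a, cdr3b, prop, trim=4):
    """Average a per-residue property over the trimmed alpha and beta CDR3s.

    Assumption: residues from both chains are pooled into one average.
    Residues missing from `prop` count as 0.
    """
    core = cdr3_core(cdr3a, trim) + cdr3_core(cdr3b, trim)
    if not core:
        return 0.0
    return sum(prop.get(aa, 0.0) for aa in core) / len(core)
```

The 'inkt' rule follows the same pattern as `mait_score`, with the gene combination and allowed lengths swapped in.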
Definition of the iMHC score: the score is a weighted linear combination of TCR sequence features.
[Figure: definition of the iMHC score]

Next, the gene expression analysis. The early steps are the same as in a standard workflow, so we pick up from PCA

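These gene expression PCs are used to select a single representative cell for each clonotype, by taking the cell with the smallest average Euclidean distance in PC space to the other cells in the clonotype. Once the dataset has been reduced to a single cell per clone, the UMAP and Louvain clustering tools are applied to the PCA matrix to generate a gene expression landscape and a set of gene expression clonotype clusters. DEGs in clonotype groupings (for example the set of CoNGA hits in a cluster pair) are identified using the sc.tl.rank_genes_groups routine with the 'wilcoxon' method (this is scanpy's approach, and it takes a little effort to understand).

Of course, for multi-sample analyses some batch-effect removal is still needed. In the authors' words: "As it was not immediately obvious how to recover the processed gene expression components from the publicly available data, and as a test of CoNGA's robustness to alternative neighbor graphs, we elected to use the provided 3D UMAP coordinates in lieu of gene expression PCs for the CoNGA GEX neighbor calculations described below. We also directly borrowed the GEX clusters from the original paper rather than reclustering the dataset."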
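The representative-cell selection described above (the cell with the smallest mean Euclidean distance to the other cells of its clonotype) can be sketched with NumPy. The function name `pick_representatives` and the input layout, one row of PCs per cell plus a parallel list of clonotype IDs, are assumptions for this example.

```python
import numpy as np


def pick_representatives(pcs, clone_ids):
    """Map each clonotype ID to the row index of its representative cell.

    The representative is the cell whose mean Euclidean distance (in PC
    space) to the other cells of the same clonotype is smallest.
    """
    pcs = np.asarray(pcs, dtype=float)
    clone_ids = np.asarray(clone_ids)
    reps = {}
    # dict.fromkeys keeps first-seen order and yields plain Python keys.
    for clone in dict.fromkeys(clone_ids.tolist()):
        idx = np.flatnonzero(clone_ids == clone)
        if len(idx) == 1:
            reps[clone] = int(idx[0])
            continue
        sub = pcs[idx]
        # Pairwise distances within the clonotype.
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        mean_d = d.sum(axis=1) / (len(idx) - 1)  # exclude self (distance 0)
        reps[clone] = int(idx[np.argmin(mean_d)])
    return reps
```

Subsetting the data to `sorted(reps.values())` then gives the one-cell-per-clonotype matrix that UMAP and Louvain are run on.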

Next, key point 1: Graph-vs-graph correlation analysis

In CoNGA graph-vs-graph correlation analysis, similarity graphs defined by gene expression and by TCR sequence are compared to identify vertices (clonotypes) whose neighbor sets in the two graphs overlap significantly.

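The CoNGA score assigned to a clonotype equals the probability of seeing an equal or greater overlap between its GEX and TCR neighborhoods by chance, multiplied by the total number of clonotypes to correct for multiple testing. The hypergeometric distribution is used to estimate this probability, as implemented in the scipy.stats module.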
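For a single clonotype and a single pair of graphs, this score can be sketched as a hypergeometric tail probability. The function name `conga_score` and the choice of population size (all clonotypes except the one being scored) are assumptions of this example, not details confirmed by the text above.

```python
from scipy.stats import hypergeom


def conga_score(gex_nbrs, tcr_nbrs, num_clones):
    """Overlap-based score for one clonotype.

    gex_nbrs, tcr_nbrs: iterables of neighbor indices in the two graphs.
    num_clones: total number of clonotypes in the dataset.
    """
    gex_nbrs, tcr_nbrs = set(gex_nbrs), set(tcr_nbrs)
    overlap = len(gex_nbrs & tcr_nbrs)
    population = num_clones - 1  # assumption: a clonotype is not its own neighbor
    # P(X >= overlap) when drawing len(tcr_nbrs) items from a population
    # that contains len(gex_nbrs) "successes".
    p = hypergeom.sf(overlap - 1, population, len(gex_nbrs), len(tcr_nbrs))
    return p * num_clones  # multiple-testing correction
```

With no overlap the tail probability is 1, so the score equals the number of clonotypes. Smaller scores indicate overlaps that are harder to explain by chance.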

Two types of similarity graphs can be used in CoNGA: K nearest neighbor (KNN) graphs, in which each clonotype is connected to its K nearest neighbors in gene expression or TCR space; and cluster graphs, in which each clonotype is connected to all the clonotypes in the same (GEX or TCR) cluster.

The neighbor number K for constructing KNN graphs is specified as a fraction of the total number of clones; for the calculations reported here, neighbor fractions of 0.01 and 0.1 were used.
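Both graph types reduce to a list of neighbors per clonotype. A minimal sketch, assuming a precomputed square distance matrix and a vector of cluster labels, is shown below. The function names and the rounding of `nbr_frac * n` to an integer K are assumptions of this example.

```python
import numpy as np


def knn_neighbors(dist, nbr_frac):
    """Indices of the K nearest neighbors of each clonotype.

    K is nbr_frac times the number of clonotypes (rounded, at least 1).
    Returns an integer array of shape (n_clones, K).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    k = max(1, round(nbr_frac * n))
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a clonotype is not its own neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def cluster_neighbors(labels):
    """For each clonotype, the other clonotypes sharing its cluster label."""
    labels = np.asarray(labels)
    everyone = np.arange(len(labels))
    return [
        np.flatnonzero((labels == labels[i]) & (everyone != i))
        for i in range(len(labels))
    ]
```

With 2,000 clonotypes, neighbor fractions of 0.01 and 0.1 correspond to K = 20 and K = 200.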

The CoNGA score assigned to a clonotype is the minimum score over all graph comparisons, of which there were 6 combinations in the calculations reported here (GEX_KNN vs TCR_KNN, GEX_KNN vs TCR_cluster, and GEX_cluster vs TCR_KNN, for both the 0.01 and 0.1 KNN neighbor fractions) (this part is a bit tricky). This may reflect correlation between neighborhoods of nearby clonotypes, which reduces the effective multiple-testing burden.
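To make the six comparisons concrete, here is a sketch that enumerates them for one clonotype and keeps the minimum. The function names, the dictionary layout (neighbor fraction mapped to that clonotype's neighbor set), and the population size used in the hypergeometric test are assumptions of this example.

```python
from scipy.stats import hypergeom


def overlap_score(a, b, num_clones):
    """Hypergeometric overlap score for two neighbor sets of one clonotype."""
    a, b = set(a), set(b)
    p = hypergeom.sf(len(a & b) - 1, num_clones - 1, len(a), len(b))
    return p * num_clones


def combined_conga_score(gex_knn, tcr_knn, gex_cluster, tcr_cluster, num_clones):
    """Minimum score over all graph comparisons for one clonotype.

    gex_knn, tcr_knn: dict mapping neighbor fraction -> neighbor set.
    gex_cluster, tcr_cluster: neighbor sets from the cluster graphs.
    Returns (best_score, best_comparison, all_scores).
    """
    scores = {}
    for frac in gex_knn:
        comparisons = {
            "GEX_KNN vs TCR_KNN": (gex_knn[frac], tcr_knn[frac]),
            "GEX_KNN vs TCR_cluster": (gex_knn[frac], tcr_cluster),
            "GEX_cluster vs TCR_KNN": (gex_cluster, tcr_knn[frac]),
        }
        for name, (a, b) in comparisons.items():
            scores[(frac, name)] = overlap_score(a, b, num_clones)
    best = min(scores, key=scores.get)
    return scores[best], best, scores
```

Two neighbor fractions times three graph pairings gives the six comparisons listed above.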

Key point 2: Graph-vs-feature correlation analysis

In CoNGA graph-vs-feature correlation analysis, numerical features defined on the basis of one property (GEX or TCR) are mapped onto similarity graphs defined by the other property, and graph neighborhoods with biased score distributions are identified.

As GEX properties we consider the expression levels of all the individual genes as well as a feature ('nndists_gex') that captures the density of nearby clonotypes by calculating the average distance in GEX space to the nearest 1% of the clonotypes. The corresponding TCR features were introduced above.
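The density feature is straightforward to compute from coordinates in GEX space. The sketch below assumes Euclidean distance on a matrix of per-clonotype coordinates (for example PCs). The function name `nndists_gex` mirrors the feature name, but the implementation is only an illustration.

```python
import numpy as np


def nndists_gex(coords, frac=0.01):
    """Mean distance from each clonotype to its nearest `frac` of clonotypes.

    coords: array of shape (n_clones, n_dims), e.g. gene expression PCs.
    Small values mean the clonotype sits in a dense region.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    k = max(1, round(frac * n))
    # Dense pairwise Euclidean distances; fine for a sketch, but memory
    # grows with n squared.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore self-distance
    return np.sort(d, axis=1)[:, :k].mean(axis=1)
```

Replacing the Euclidean distances with a TCRdist matrix gives the analogous 'nndists_tcr' feature.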

As this analysis involves a large number of differential expression calculations (roughly the number of clonotypes times the number of different similarity graphs times the number of features), we use a two-step procedure that combines a pre-filter with the t-test followed by the more time-intensive Mann-Whitney-Wilcoxon (MWW) calculation for the top 100 hits per clonotype and graph that pass a t-test significance threshold ten times higher than the target threshold. The final significance score assigned to a detected association equals the raw MWW P-value multiplied by the product of the number of clonotypes and the number of features, to correct for multiple testing (that is a lot of computation).
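The two-step procedure can be sketched for a single similarity graph as follows. This is an illustration under stated assumptions, not CoNGA's implementation: the function name `graph_vs_feature`, the use of Welch's t-test, counting the clonotype itself as part of its neighborhood, and applying the same correction factor to the t-test pre-filter are all choices made for the example.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu


def graph_vs_feature(features, nbr_sets, target=1.0, top_n=100):
    """Find (clonotype, feature, score) hits on one similarity graph.

    features: array (n_clones, n_features) from the *other* modality.
    nbr_sets: one sequence of neighbor indices per clonotype.
    target:   threshold on the corrected score.
    """
    features = np.asarray(features, dtype=float)
    n_clones, n_features = features.shape
    correction = n_clones * n_features
    hits = []
    for clone, nbrs in enumerate(nbr_sets):
        mask = np.zeros(n_clones, dtype=bool)
        mask[np.asarray(nbrs, dtype=int)] = True
        mask[clone] = True  # assumption: neighborhood includes the clonotype
        fg, bg = features[mask], features[~mask]
        # Step 1: cheap t-test pre-filter on all features at once.
        with np.errstate(invalid="ignore", divide="ignore"):
            _, t_p = ttest_ind(fg, bg, axis=0, equal_var=False)
        t_adj = t_p * correction
        passing = np.flatnonzero(t_adj <= 10 * target)  # NaNs drop out here
        passing = passing[np.argsort(t_adj[passing])][:top_n]
        # Step 2: slower MWW test only on the survivors.
        for f in passing:
            _, p = mannwhitneyu(fg[:, f], bg[:, f], alternative="two-sided")
            score = p * correction
            if score <= target:
                hits.append((clone, int(f), float(score)))
    return hits
```

Running the function once per similarity graph and concatenating the results covers the "per clonotype and graph" part of the procedure.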

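That is it for the methods. They are not easy, and one read-through may not be enough to absorb them fully. In the next post we will share the code.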

Life is good, and even better with you.
