课程论文翻译

古代杂交事件推动慈鲷科鱼类的快速适应辐射

Ancient hybridization fuels rapid cichlid fish adaptive radiations

摘要 Abstract

在进化生物学中,理解为何一些进化谱系产生了异常高的物种多样性是一个重要的目标。非洲维多利亚湖地区的Haplochromini族慈鲷包含超过700个形态多样的物种,它们全部形成于过去15万年之内。这个“维多利亚湖地区超群”如何能在如此短的时间尺度上进化而来,是一个长期悬而未决的问题。在此,我们证明了两个已分歧谱系之间的杂交通过提供遗传变异促进了这一过程,这些遗传变异随后被重组、分选到许多新物种中。值得注意的是,这次杂交事件在一个已知参与适应和物种形成的视蛋白基因上产生了异常丰富的等位基因变异。更普遍地说,新物种之间的分化在那些原本是两个亲本谱系间固定差异的变异位点附近尤为突出,而这些变异如今以许多新的组合出现在辐射物种中。我们得出结论:当分歧谱系之间的杂交与生态机会同时出现时,可能促进快速而广泛的适应辐射。

Understanding why some evolutionary lineages generate exceptionally high species diversity is an important goal in evolutionary biology. Haplochromine cichlid fishes of Africa’s Lake Victoria region encompass >700 diverse species that all evolved in the last 150,000 years. How this ‘Lake Victoria Region Superflock’ could evolve on such rapid timescales is an enduring question. Here, we demonstrate that hybridization between two divergent lineages facilitated this process by providing genetic variation that subsequently became recombined and sorted into many new species. Notably, the hybridization event generated exceptional allelic variation at an opsin gene known to be involved in adaptation and speciation. More generally, differentiation between new species is accentuated around variants that were fixed differences between the parental lineages, and that now appear in many new combinations in the radiation species. We conclude that hybridization between divergent lineages, when coincident with ecological opportunity, may facilitate rapid and extensive adaptive radiation.

引言 Introduction

谱系间多样化速率的异质性是塑造生物多样性的重要因素。然而,这种差异背后的生物和环境因素尚未被完全了解1。适应辐射的特征是快速形成许多表现出多样生态适应的物种,因此是研究这些因素的理想系统。这一过程要求与生态隔离和生殖隔离相关的性状具有高水平的可遗传变异。然而,适应辐射的速度往往太快,在相继的物种形成事件之间来不及出现新的相关突变2,因此更有可能源自已有的(standing)遗传变异。种间杂交可以在瞬间大幅增加遗传变异,从而可能促进物种形成和适应辐射3,4,5,6,7,8,9。

Heterogeneity in diversification rates among lineages is a major factor shaping biodiversity. Yet, biological and environmental factors underlying this variation are incompletely understood1. Adaptive radiations are prime study systems to learn about these factors as they are characterized by a rapid origin of many species showing a diversity of ecological adaptations. This process requires high levels of heritable variation in traits related to ecological and reproductive isolation. However, adaptive radiations are often too rapid for the emergence of new relevant mutations between successive speciation events2 and are thus more likely to stem from standing variation. Hybridization between species can instantaneously boost genetic variation, which may facilitate speciation and adaptive radiation3,4,5,6,7,8,9.

适应辐射成员之间的杂交被认为可能促进物种形成,这一观点被称为“杂婚假说”(syngameon hypothesis)3。在若干适应辐射中,已证实其成员之间发生了与适应或生殖隔离相关性状的基因渗入(例如,Rhagoletis属实蝇中与宿主转换相关的性状10、Heliconius属蝴蝶的翅斑图案11,12、达尔文雀的喙形13)。在其他适应辐射中,已推断出某些物种具有杂交祖先,但渗入性状与物种形成之间的直接联系仍有待进一步检验(例如坦噶尼喀湖14,15,16、马拉维湖17、维多利亚湖18,19和巴隆比波湖20的慈鲷)。

Hybridization among members of an adaptive radiation has been suggested to potentially facilitate speciation events, an idea known as the ‘syngameon hypothesis’3. Introgression of traits involved in adaptation or reproductive isolation has been demonstrated among members of several adaptive radiations (for example, traits related to host shift in Rhagoletis fruit flies10, wing patterns in Heliconius butterflies11,12 or beak shape in Darwin’s finches13). In other radiations, the hybrid ancestry of some species has been inferred, but a direct link between introgressed traits and speciation awaits further testing (for example, cichlid fishes of Lakes Tanganyika14,15,16, Malawi17, Victoria18,19 and Barombi Mbo20).

另一个假说与“杂婚假说”不同,认为杂交在适应辐射中可能发挥更为根本的作用:不同谱系之间的杂交可能引发整个适应辐射的开始3。当异域分布的谱系发生二次接触时,这样的杂交可能很常见4,而且在定殖新环境期间,针对杂种的选择可能较弱。在这种情况下,如果杂种群的形成恰逢生态机会,它可以通过以下方式加速适应辐射:(1)提供具有功能的遗传变异,这些变异可重组为受自然选择和配偶选择青睐的新性状组合;(2)打破限制亲本谱系可进化性的遗传相关性3。此外,当两个亲本物种之间赋予生殖隔离的多个固定差异在杂种群中解耦并独立分离时,杂交可能促进物种形成:此时,针对不相容基因组合的选择可以产生两个以上新的、彼此生殖隔离的物种21,22,23。

Another hypothesis for a perhaps more fundamental role of hybridization in adaptive radiation, distinct from the ‘syngameon hypothesis’, is the idea that hybridization between distinct lineages may seed the onset of an entire adaptive radiation3. Such hybridization can be common when allopatric lineages come into secondary contact4, and selection against hybrids may be weak during colonization of new environments. In this situation, the formation of a hybrid swarm, if coincident with ecological opportunity, may accelerate adaptive radiation by (i) providing functional genetic variation that can recombine into novel trait combinations favoured by selection and mate choice, and (ii) breaking genetic correlations that constrained the evolvability of parental lineages3. In addition, hybridization may facilitate speciation when multiple fixed differences that confer reproductive isolation between the two parental species decouple and segregate in a hybrid swarm, such that selection against incompatible gene combinations can generate more than two new reproductively isolated species21,22,23.
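As a toy illustration of the decoupling argument above (not a model taken from the paper), assume two parental lineages differ by fixed alleles at `n` unlinked loci, and ignore heterozygotes, drift and selection. The hypothetical helpers `multilocus_homozygotes` and `recombinant_combinations` enumerate the homozygous multilocus combinations a hybrid swarm could sort into: of the 2^n combinations only two match the parents, so even a few decoupled loci offer several novel combinations on which selection against incompatible pairs could act.

```python
from __future__ import annotations

from itertools import product


def multilocus_homozygotes(n_loci: int) -> list[tuple[str, ...]]:
    """All homozygous combinations over n_loci unlinked loci.

    'A' marks the allele fixed in parental lineage A, 'B' the allele fixed
    in parental lineage B (toy model: no heterozygotes, no linkage).
    """
    if n_loci < 1:
        raise ValueError("n_loci must be positive")
    return list(product("AB", repeat=n_loci))


def recombinant_combinations(n_loci: int) -> list[tuple[str, ...]]:
    """Combinations that match neither parent, i.e. mix A and B alleles."""
    return [g for g in multilocus_homozygotes(n_loci) if len(set(g)) > 1]


if __name__ == "__main__":
    for n in (1, 2, 3):
        total = len(multilocus_homozygotes(n))
        novel = len(recombinant_combinations(n))
        print(f"{n} loci: {total} combinations, {novel} recombinant")
```

With three decoupled loci, for example, 6 of the 8 homozygous combinations are recombinant.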

这一“适应辐射的杂种群起源”假说更难检验。到目前为止,唯一被确凿证明具有杂交起源的适应辐射是夏威夷银剑菊,它由两个北美麻迪菊(tarweed)物种之间的异源多倍体杂种种群辐射而来24。由于基因和基因组重复也被认为会促进适应辐射25,在这一案例中很难区分杂交本身的效应与基因或基因组重复的效应。在阿尔卑斯白鲑26、马拉维湖慈鲷适应辐射中的“mbuna”类群27、异源多倍体的夏威夷特有唇形科植物28,29,以及可能其他夏威夷多倍体植物辐射30中,也发现了与整个辐射起源于杂种群相吻合的证据。然而,在这些系统中,杂交发生于辐射开始之前还是之后,以及杂交产生的多态性是否在物种形成和适应多样化中发挥了作用,仍有待检验。

This ‘hybrid swarm origin of adaptive radiation’ hypothesis has been more challenging to test. So far the only adaptive radiation for which a hybrid origin has been robustly demonstrated is the Hawaiian silverswords, which have radiated from an allopolyploid hybrid population between two North American tarweed species24. Because gene and genome duplication are also proposed to facilitate adaptive radiation25, it is difficult to distinguish between effects of hybridization per se and those of gene or genome duplication in this case. Evidence consistent with a hybrid swarm origin of entire radiations has also been found in Alpine whitefish26, the ‘mbuna’ group of the Lake Malawi cichlid fish radiation27, and allopolyploid Hawaiian endemic mints28,29 and possibly other polyploid plant radiations on Hawaii30. However, it remains to be tested if, in these systems, hybridization occurred before or after the radiation had started, and if hybridization-derived polymorphisms played a role in speciation and adaptive diversification.

维多利亚湖地区的慈鲷“超群”(Lake Victoria Region Superflock,下称LVRS)是由约700种Haplochromini族慈鲷组成的类群,它们是东非维多利亚湖周边地区及附近西部裂谷湖泊的特有种,约在10万至20万年前开始分化31,32,33,34。这个“超群”包含数个适应辐射,该地区的每个主要湖泊(维多利亚湖、爱德华湖、阿尔伯特湖和基伍湖)各有一个。其中规模最大的位于维多利亚湖,该湖至少有500个在过去15,000年中进化出的特有种34,35,36。每个适应辐射在栖息地利用、营养生态、体色和行为上都具有极大的多样性。尽管LVRS非常年轻,却具有很高的多样化速率和很高的核基因组变异水平34,37,这表明在辐射开始时必定已经存在大量既有遗传变异2,3。先前的研究显示,在LVRS与若干河流慈鲷物种的系统发育重建中存在核质不一致(cytonuclear discordance),这提示辐射基部可能发生过分歧物种之间的古代杂交,但未能加以证实37。二次接触时发生杂交并非不可能,因为异域分布的慈鲷物种即使已分歧数百万年,在实验室中也很容易产生可育后代38。

The Lake Victoria Region Superflock of cichlid fish (LVRS) is a group of 700 haplochromine cichlid species endemic to the region around Lake Victoria and nearby western rift lakes in East Africa that started diversifying about 100–200 thousand years ago31,32,33,34. It includes several adaptive radiations, one in each of the major lakes of the region (Lakes Victoria, Edward, Albert and Kivu). The largest of them is in Lake Victoria, which has at least 500 endemic species that evolved in the past 15,000 years34,35,36. Each radiation comprises enormous diversity in habitat occupation, trophic ecology, colouration and behaviour. The high diversification rate but also the high nuclear genomic variation in the LVRS despite its young age34,37 suggest that large amounts of standing genetic variation must have been present at the onset of the radiation2,3. Previous work showing cytonuclear discordance in phylogenetic reconstructions between LVRS and several riverine cichlid species raised the possibility of ancient hybridization between divergent species at the base of the radiation but could not demonstrate it37. Hybridization on secondary contact is not unlikely, as allopatric cichlid species, divergent by even millions of years, readily produce fertile offspring in the lab38.

通过使用取样自非洲所有主要河流水系的河流Haplochromini族慈鲷,以及维多利亚湖地区所有谱系的代表物种的基因组数据,我们在此证明LVRS是由一个杂种群进化而来的。所有湖泊中的适应辐射都表现出非常相似的混合祖先比例,这些祖先来自两个亲缘关系较远的Haplochromini族谱系;在维多利亚湖地区发生杂交之前,这两个谱系已在不同的河流水系中彼此隔离进化了一百多万年。我们发现的证据表明,这次杂交事件通过提供遗传变异促进了随后的适应辐射,这些变异已被重组并分选到许多新物种中。在两个亲本谱系之间固定的变异在年轻的维多利亚湖物种之间表现出更强的分化,但在不同物种中以许多新的组合出现。值得注意的是,在维多利亚湖慈鲷的适应和物种形成中发挥作用的一个视蛋白基因39,40,其两个主要等位基因类型很可能各自来源于两个亲本谱系之一。这表明LVRS中该基因正在分离的变异有很大一部分源于这两个谱系之间的杂交。我们的结果表明,亲缘关系相对较远的物种之间的杂交若与生态机会同时出现,可能促进快速的适应辐射。因此,即使发生在遥远的过去,杂交也可能对理解谱系间现存物种丰富度的差异以及近期多样化速率的差异具有重要意义。

Using genomic data from riverine haplochromine cichlids sampled from all major African drainage systems, and representative species from all lineages within the Lake Victoria region, we demonstrate here that the LVRS evolved from a hybrid swarm. All lake radiations show very similar proportions of mixed ancestry derived from two distantly related haplochromine lineages that had evolved in isolation from one another in different river systems for more than a million years before hybridizing in the Lake Victoria region. We find evidence that this hybridization event facilitated subsequent adaptive radiation by providing genetic variation that has been recombined and sorted into many new species. Variants that were fixed between the parental lineages show accentuated differentiation between young Lake Victoria species, but appear in many new combinations in the different species. Notably, each of the two major allele classes of an opsin gene involved in adaptation and speciation among Lake Victoria cichlids39,40 is likely derived from one of the two parental lineages. This indicates that a major part of the variation at this gene segregating in the LVRS stems from hybridization between these lineages. Our results suggest that hybridization between relatively distantly related species, when coincident with ecological opportunity, may facilitate rapid adaptive radiation. Thus, hybridization, even in the distant past, may have important implications for understanding variation in extant species richness between lineages as well as variation in recent rates of diversification.

结果 Results

识别LVRS的最近缘类群

Identifying the closest relatives of the LVRS

为了识别LVRS的最近缘类群,我们对LVRS以及所有栖息有Haplochromini族慈鲷的主要河流水系中的慈鲷进行了全面采样(图1,补充数据1)。我们基于3.15 Mb的串联限制性位点相关DNA(RAD)序列(436,166个SNP,缺失数据8.1%,补充数据2)和两个线粒体标记(1,897 bp)重建了最大似然系统发育树。在核基因系统发育树中,整个LVRS构成一个获得强支持的分支,该分支还包括来自埃及尼罗河下游的慈鲷(Haplochromis spp. Egypt)以及来自鲁济济河(基伍湖的出流、坦噶尼喀湖的入流)的Haplochromis sp. ‘Nyangara’(图1中的橙色标签,补充图1,补充讨论)。来自卡兰博河(坦噶尼喀湖的入流,属刚果河水系;Meyer等41称之为Haplochromis sp. ‘Chipwa’)的Astatotilapia stappersi,以及来自刚果河中游的一种未描述的妊丽鱼属物种(A. sp. ‘Yaekama’,刚果民主共和国),以下合称“刚果系”(Congolese lineage,红色标签),构成了该“超群”的最近缘类群。LVRS与刚果系分类群共同构成的分支,其姐妹群是一个由以下谱系组成的分支:东非大裂谷东支内流(即封闭)水系中的谱系、维多利亚湖地区以东印度洋水系中的谱系(“东部分类群”,Eastern taxa,深蓝色),以及来自基伍湖的‘Haplochromis’ gracilior和来自爱德华湖的Thoracochromis pharyngalis(浅蓝色,补充讨论)。尽管后两个物种与LVRS成员同域分布,但它们显然不属于这些适应辐射本身。由于所有已知成员都局限于历史上属于尼罗河最上游流域的地区37,42,我们将H. gracilior和T. pharyngalis称为“尼罗河上游系”(Upper Nile lineage)。与先前的文献一致33,43,我们估计这一谱系与刚果系的分歧时间约为160万至580万年前,与东非大裂谷西支侧翼隆起、截断古河网并使维多利亚湖地区(新的尼罗河上游)与刚果河隔离的时间相吻合(补充表1;补充图3)。

To identify the closest relatives of the LVRS, we use comprehensive sampling of haplochromine cichlids from the LVRS and from all major river systems harbouring haplochromines (Fig. 1, Supplementary Data 1). Maximum likelihood phylogenetic trees were reconstructed from 3.15 Mb of concatenated restriction site associated DNA (RAD) sequences (436,166 SNPs, 8.1% missing data, Supplementary Data 2), and from two mitochondrial markers (1,897 bp). In the nuclear phylogeny, the entire LVRS forms a well-supported clade that also includes cichlids from the lower Nile in Egypt (Haplochromis spp. Egypt) and Haplochromis sp. ‘Nyangara’ from the Rusizi River, the outflow of Lake Kivu and an inflow to Lake Tanganyika (orange labels in Fig. 1, Supplementary Fig. 1, Supplementary Discussion). Astatotilapia stappersi from the Kalambo River (an inflow to Lake Tanganyika, Congo drainage, referred to as Haplochromis sp. ‘Chipwa’ in Meyer et al.41), and an undescribed Astatotilapia species from the middle Congo (A. sp. ‘Yaekama’, DRC), together hereafter called the ‘Congolese lineage’ (red labels), form the superflock’s closest relatives. The sister group to the LVRS plus the Congolese lineage taxa is a clade including lineages in endorheic (that is, closed) drainage systems of the Eastern Rift and in Indian Ocean drainage systems, east of the Lake Victoria region (‘Eastern taxa’, dark blue), as well as ‘Haplochromis’ gracilior from Lake Kivu and Thoracochromis pharyngalis from Lake Edward (light blue, Supplementary Discussion). Although the latter two species are sympatric with LVRS members, they are clearly not part of the radiations themselves. We refer to H. gracilior and T. pharyngalis as the ‘Upper Nile lineage’ because all known members are confined to the region that was historically the uppermost Nile drainage37,42. Consistent with previous publications33,43, we estimate the split of this lineage from the Congolese lineage to date to ∼1.6–5.8 million years ago, coincident with the uplift of the flanks of the Western branch of the East African Rift truncating the paleo-river network and isolating the LVR (the new Upper Nile) from the Congo (Supplementary Table 1; Supplementary Fig. 3).


图1:维多利亚湖地区慈鲷适应辐射的系统发育背景 Phylogenetic context of the Lake Victoria Region cichlid radiation.
(a)基于LVRS慈鲷及其近缘类群(涵盖Haplochromini族慈鲷所有已知谱系,n=156)的串联RADtag序列构建的最大似然系统发育树。适应辐射在树中以灰色三角形表示,同一谱系的多个样本在图中合并为一个末端分支(完整的树见补充图1)。LVRS成员(包括鲁济济河的Haplochromis sp. ‘Nyangara’和Haplochromis spp. Egypt,详见补充讨论)在系统发育树和取样地图(b)中均以橙色星号标出,并以采样所在的湖泊(L)或河流(R)标注。LVRS的近缘类群“刚果系”以红色三角形突出显示,“尼罗河上游系”成员以蓝色三角形表示,东部河流的类群以深蓝色方块表示,其余亲缘关系更远的谱系以黑色圆圈表示。
(b)取样地图。我们取样的河流水系以彩色多边形表示。图中以照片展示了辐射祖先现存的最近缘物种:尼罗河上游系的‘Haplochromis’ gracilior/Thoracochromis pharyngalis,以及刚果系的Astatotilapia sp. ‘Yaekama’/A. stappersi。图右侧灰色三角形中的维多利亚湖慈鲷代表了由杂种群产生的众多形态各异的物种中的一部分。

(a) Maximum likelihood phylogeny built from concatenated RADtag sequences of Lake Victoria Region Superflock (LVRS) cichlids and relatives including all known lineages of haplochromine cichlids (n=156). Radiations are indicated as grey triangles in the phylogenetic tree and multiple samples of a lineage are visually collapsed to a single terminal branch (full tree in Supplementary Fig. 1). Members of the LVRS (including Haplochromis sp. ‘Nyangara’ from the Rusizi River and Haplochromis spp. Egypt, see Supplementary Discussion) are indicated with orange stars both in the tree and in the sampling map (b) and are labelled by lake (L) or river (R) they were sampled in. ‘Congolese lineage’ LVRS relatives are highlighted with red triangles, members of the ‘Upper Nile lineage’ with blue triangles, those from Eastern rivers with dark blue squares, and all other more distantly related lineages with black circles. (b) Sampling map. River drainage systems that we sampled are shown as coloured polygons. The radiation ancestor’s closest living relatives are shown in images: ‘Haplochromis’ gracilior/Thoracochromis pharyngalis from the Upper Nile lineage and Astatotilapia sp. ‘Yaekama’/A. stappersi from the Congolese lineage. The Lake Victoria cichlids shown in the grey triangle on the right represent some of the many and varied species that arose from the hybrid swarm (Photo credits: Ole Seehausen, Salome Mwaiko, Frans Witte, ‘Teleos’, Uli Schliewen, Adrian Indermaur, Oliver Selz; map adapted from http://www.worldwildlife.org/hydrosheds80).

我们的全基因组核基因取样以前所未有的分辨率揭示了LVRS内各适应辐射之间的关系(图1,补充图1)。维多利亚湖(包括其主要支流卡盖拉河)、萨卡火山口湖及与之相连的姆潘加河,以及阿尔伯特湖中的适应辐射各自构成单系分支。相反,LVRS在爱德华湖的成员是一个并系类群:其中一些分类群位于维多利亚湖和萨卡湖辐射的基部,另一些则与基伍湖的物种共同构成一个分支。这些发现与以下假说一致34:爱德华湖是LVRS中现存最古老的适应辐射,其他湖泊的辐射均由其衍生而来,且爱德华湖与基伍湖之间的连通直到不久前仍然存在。

Our genome-wide nuclear sampling provides unprecedented resolution of the relationships between radiations within the LVRS (Fig. 1, Supplementary Fig. 1). The radiations in Lake Victoria (including its major tributary, the Kagera River), in Crater Lake Saka and associated Mpanga River, and in Lake Albert each form monophyletic clades. In contrast, the Lake Edward members of the LVRS are a paraphyletic group that includes taxa basal to the radiations of Lakes Victoria and Saka, and others that form a clade together with species from Lake Kivu. These findings are consistent with the hypothesis34 that Lake Edward constitutes the oldest extant radiation within the LVRS, from which the radiations in the other lakes are derived, and that connections between Lakes Edward and Kivu existed until recently.

验证杂交事件

基于串联序列数据的系统发育重建假定整个基因组具有单一的历史,然而重组、不完全谱系分选和渐渗杂交都可能导致谱系关系在全基因组范围内出现广泛变异44。为了考虑这种谱系关系的变异,我们重建了基于SNP的物种树45。所得的树支持LVRS与刚果系分类群的姐妹关系,但揭示了该类群与东部分支及尼罗河上游分支之间关系的不一致(补充图2)。为了检验遗传混合,我们计算了Patterson's D统计量(ABBA-BABA检验)46。我们发现,所有LVRS适应辐射都显示出与两个尼罗河上游系物种H. gracilior和T. pharyngalis发生混合的强烈信号(图2,补充表2,补充图4)。LVRS与这两个物种中任一物种之间的超额等位基因共享(excess allele sharing)强度相当(补充表3,第2.1–2.4行,补充图5)。重要的是,无论是H. gracilior还是T. pharyngalis,它们与同域分布的LVRS物种之间的混合程度,都不高于与其他湖泊中已地理隔离数千年的异域LVRS成员之间的混合程度(图2和补充图5)。

Phylogenetic reconstruction based on concatenated sequence data assumes a single history across the genome, although recombination, incomplete lineage sorting, and introgressive hybridization can cause extensive genome-wide variation in genealogy44. To account for such genealogical variation, we reconstructed a SNP-based species tree45. The resulting tree supports the sister relationship of the LVRS with the Congolese taxa, but reveals incongruence between this group and the Eastern clades and Upper Nile clade (Supplementary Fig. 2). To test for genetic admixture, we computed Patterson’s D statistics (ABBA-BABA test)46. We found that all LVRS radiations show strong signals of admixture with the two Upper Nile species, H. gracilior and T. pharyngalis (Fig. 2, Supplementary Table 2, Supplementary Fig. 4). Excess allele sharing between the LVRS and either species is equally strong (Supplementary Table 3, rows 2.1–2.4, Supplementary Fig. 5). Importantly, neither H. gracilior nor T. pharyngalis shows greater admixture with the LVRS species with which they live in sympatry than with allopatric members of the LVRS from other lakes that have been geographically isolated for many thousands of years (Fig. 2 and Supplementary Fig. 5).
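For readers unfamiliar with the ABBA-BABA test, below is a minimal sketch of the commonly used frequency-based estimator of Patterson's D, assuming per-site derived-allele frequencies for P1, P2, P3 and an outgroup O. It is not necessarily the exact implementation the authors used, and the function names are hypothetical. In this orientation a positive D means excess derived-allele sharing between P2 and P3; Figure 2 uses the opposite orientation (positive values indicate P1 and P3 gene flow), which is equivalent to swapping the P1 and P2 arguments.

```python
from typing import Sequence, Tuple


def abba_baba_sums(
    p1: Sequence[float], p2: Sequence[float], p3: Sequence[float], po: Sequence[float]
) -> Tuple[float, float]:
    """Sum ABBA and BABA site patterns from derived-allele frequencies."""
    if not (len(p1) == len(p2) == len(p3) == len(po)):
        raise ValueError("all frequency vectors must have the same length")
    abba = baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, po):
        # ABBA: P2 and P3 share the derived allele; P1 and O carry the ancestral one.
        abba += (1 - a) * b * c * (1 - o)
        # BABA: P1 and P3 share the derived allele; P2 and O carry the ancestral one.
        baba += a * (1 - b) * c * (1 - o)
    return abba, baba


def patterson_d(p1, p2, p3, po) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); D > 0 means excess P2-P3 allele sharing."""
    abba, baba = abba_baba_sums(p1, p2, p3, po)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy frequencies for three sites (not data from the paper).
    print(patterson_d([0.0, 0.2, 1.0], [1.0, 0.8, 0.0], [1.0, 0.6, 1.0], [0.0, 0.0, 0.0]))
```

Working with frequencies rather than single sampled alleles lets pooled samples of several individuals per taxon contribute fractional ABBA and BABA counts.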


图2:LVRS祖先中刚果-尼罗河杂交的证据 Evidence for Congo-Nilotic hybridization in the ancestry of the LVRS.
(a)用于D统计量计算的分类群示意谱系图(n=73个个体,见补充数据2)。其他子图中使用的缩写在括号中给出,配色方案与图1相同。推断出的基因流边以箭头表示(注意:基因流的方向是通过五种群检验推断的,该检验未在此图中展示)。
(b)分别检验每个东部分类群或尼罗河上游系分类群(P3,缩写见a)与各LVR湖泊辐射的慈鲷(P1)或刚果系分类群A. stappersi(P2)之间潜在基因流的D统计量。竖线对应三个标准误。如(c)所示,D值为正表示P1(LVR湖泊辐射)与P3(东部或尼罗河上游系分类群)之间存在基因流,D值为负则表示P2(A. stappersi)与P3之间存在基因流。确切数值及更多检验结果见补充表2。

(a) Schematic genealogy with taxa used for D statistics (n=73 individuals, see Supplementary Data 2). Abbreviations used in other panels are given in parentheses and the color scheme is the same as in Figure 1. The inferred gene flow edge is shown with an arrow (Note the directionality of gene flow is inferred with the five-population test not shown in this figure). (b) D statistics to test for potential gene flow between each Eastern and Upper Nile taxon (P3) separately (abbreviations given in a) and cichlids from each LVR lake radiation (P1) or the Congolese taxon A. stappersi (P2). Vertical bars correspond to three standard errors. Positive D values indicate gene flow between P1 (LVR lake radiation) and P3 (Eastern or Upper Nile taxon), whereas negative D values indicate gene flow between P2 (A. stappersi) and P3 (Eastern or Upper Nile taxon) as illustrated in (c). Exact values and more test results are given in Supplementary Table 2.
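The bars in panel (b) span three standard errors. For D statistics such errors are commonly obtained with a block jackknife that drops one genomic block at a time; the caption does not state the procedure, so the sketch below assumes an unweighted delete-one-block jackknife over per-block ABBA and BABA totals (the helper names are hypothetical). Under that assumption, a bar of ±3 standard errors that excludes zero corresponds to |Z| = |D| / SE > 3.

```python
import math
from typing import Sequence, Tuple


def _d(abba: float, baba: float) -> float:
    return (abba - baba) / (abba + baba)


def jackknife_d(
    block_abba: Sequence[float], block_baba: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (D, SE, Z) from an unweighted delete-one-block jackknife.

    block_abba / block_baba hold the summed ABBA and BABA patterns of each
    genomic block, so leaving one block out is a subtraction from the totals.
    """
    n = len(block_abba)
    if n < 2 or n != len(block_baba):
        raise ValueError("need at least two blocks with matching lengths")
    total_abba, total_baba = sum(block_abba), sum(block_baba)
    d_all = _d(total_abba, total_baba)
    # D recomputed with each block removed in turn.
    loo = [_d(total_abba - a, total_baba - b) for a, b in zip(block_abba, block_baba)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((x - mean_loo) ** 2 for x in loo))
    if se == 0:
        z = 0.0 if d_all == 0 else math.copysign(math.inf, d_all)
    else:
        z = d_all / se
    return d_all, se, z


if __name__ == "__main__":
    d, se, z = jackknife_d([4, 2, 3, 3], [1, 1, 2, 0])
    print(f"D={d:.3f}  SE={se:.3f}  Z={z:.2f}  |Z|>3: {abs(z) > 3}")
```

Blocks are usually chosen large enough (for example contiguous genomic windows) that linkage between neighbouring sites does not make the leave-one-out replicates artificially similar.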

为了检验基因流的方向,我们采用了分区D统计量(partitioned D statistic)的扩展版本47(五种群检验,补充表3)。我们发现了从H. gracilior(图2b中的g)和T. pharyngalis(图2b中的t)流入每个湖泊LVRS适应辐射的基因流证据,但没有证据表明存在反方向的基因流(补充表3,第1.1–1.4行)。利用F4比值检验(F4-ratio test)48,我们估计所有LVRS适应辐射中尼罗河上游系祖先的比例约为20%(补充图6和补充表4)。对维多利亚湖和基伍湖辐射物种以及A. stappersi、T. pharyngalis和H. gracilior的全基因组测序证实了LVRS成员与尼罗河上游系之间的超额等位基因共享,并揭示出约3 kb大小的祖先区块(ancestry block),这与混合事件发生在数千代之前、发生于不同湖泊中所有“超群”辐射的共同祖先之中的证据相符(图3a,补充图8)。维多利亚湖和基伍湖不同物种之间,全基因组范围的混合信号彼此相关(图3b),这与LVRS起始时一次共同的混合事件相吻合。

To test the directionality of gene flow, we applied an extended version of the partitioned D statistic47 (five-population test, Supplementary Table 3). We find evidence for gene flow from H. gracilior (g in Fig. 2b) and from T. pharyngalis (t in Fig. 2b) into each LVRS lake radiation, but no evidence for gene flow in the opposite direction (Supplementary Table 3, rows 1.1–1.4). Using F4-ratio tests48, we estimated the Upper Nile ancestry proportion to be ∼20% in all LVRS radiations (Supplementary Fig. 6 and Supplementary Table 4). Whole-genome sequencing of Lake Victoria and Lake Kivu radiation species, and A. stappersi, T. pharyngalis and H. gracilior, confirms excess allele sharing between LVRS members and the Upper Nile lineage, and reveals ancestry blocks of ∼3 kb, consistent with evidence that the admixture event occurred many thousands of generations ago in the common ancestor of all superflock radiations in the different lakes (Fig. 3a, Supplementary Fig. 8). The genome-wide signatures of admixture are correlated among different species from Lake Victoria and Lake Kivu (Fig. 3b), in line with a shared admixture event at the onset of the LVRS.
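The F4-ratio estimate can be sketched from allele frequencies as follows. Assume an admixed population X received a fraction α of its ancestry from a source related to B and the rest from a source related to C, with A a sister population of B and O an outgroup; then α ≈ f4(A, O; X, C) / f4(A, O; B, C), where f4(P, Q; R, S) is the mean over sites of (p − q)(r − s). The text does not say which taxa the authors placed in each position; in this setting X would be an LVRS radiation, B the Upper Nile lineage and C the Congolese lineage. The function names are hypothetical.

```python
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float], pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d)."""
    n = len(pa)
    if n == 0 or not (n == len(pb) == len(pc) == len(pd)):
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / n


def f4_ratio(pa, po, px, pb, pc) -> float:
    """alpha = f4(A, O; X, C) / f4(A, O; B, C).

    Share of X's ancestry from the B-related source, assuming A is sister
    to B, O is an outgroup and C is the other source.
    """
    denom = f4(pa, po, pb, pc)
    if denom == 0:
        raise ValueError("f4(A, O; B, C) is zero: the ratio is undefined")
    return f4(pa, po, px, pc) / denom


if __name__ == "__main__":
    # Synthetic check (not real data): X is built as 20% B + 80% C.
    pa = [0.9, 0.1, 0.5, 0.7]
    po = [0.0, 0.0, 0.0, 0.0]
    pb = [0.8, 0.2, 0.6, 0.9]
    pc = [0.1, 0.7, 0.2, 0.3]
    px = [0.2 * b + 0.8 * c for b, c in zip(pb, pc)]
    print(round(f4_ratio(pa, po, px, pb, pc), 6))  # prints 0.2
```

Because f4 is linear in the frequencies of X, a population mixed as α·B + (1 − α)·C returns exactly α in this idealised case; real estimates carry sampling noise and are usually reported with jackknife errors.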


图3:LVRS基因组中的刚果-尼罗河祖先区块 Congo-Nilotic ancestry blocks in LVRS genomes.
(a)推定的祖先区块大小分布显示,大多数祖先区块都较小,且刚果系(红色)区块略大于尼罗河上游系(蓝色)区块。图中为五个LVRS辐射物种在所有scaffold上以3 kb窗口计算的各大小类别祖先区块数量之和。由于大多数区块并未跨越多个3 kb窗口,而且许多区块无法明确归入刚果系或尼罗河上游系祖先(灰色)(补充图8),祖先区块的平均大小很可能为3 kb或更小,这与数千年前发生的杂交相吻合。
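(b)全基因组测序的LVRS成员之间,祖先区块的相关性总体较高,但随系统发育距离的增加而降低。箱线图显示了以10 kb窗口计算的fd(参考文献78)在以下各组之间的相关性:同种个体(同一物种的Pundamilia个体)、姐妹种(Pu. pundamilia与Pu. nyererei)、亲缘关系更远的维多利亚湖(LVi)物种(Paralabidochromis flavus与Pu. pundamilia及Pu. nyererei)、基伍湖(LKi)物种(Pa. paucidens与Harpagochromis vittatus),以及维多利亚湖物种与基伍湖物种之间。这表明所有辐射成员物种在其古代历史中共享同一次杂交事件,但在该事件之后仍作为同一个重组种群的时间长短不同。这也表明,一部分混合产生的变异仍在单个物种内部分离(体现为同种个体之间的fd相关性偏离1)。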
(a) The size distribution of putative ancestry blocks shows mostly small ancestry blocks and slightly larger Congolese (red) than Upper Nile blocks (blue). The plot shows the counts of ancestry blocks in different size categories summed up for five LVRS radiation species across all scaffolds calculated with 3 kb windows. As most blocks do not span multiple windows of 3 kb and many blocks cannot be clearly allocated to Congolese or Upper Nile ancestry (grey) (Supplementary Fig. 8), the average ancestry block size is likely 3 kb or smaller, consistent with hybridization many thousands of years ago. (b) Correlation of ancestry blocks between whole-genome sequenced LVRS members is high overall but decreases with phylogenetic distance. The boxplots show correlation of fd (ref. 78) in 10 kb windows between single individuals of conspecifics (Pundamilia individuals of the same species), sister species (Pu. pundamilia versus Pu. nyererei), more distantly related Lake Victoria (LVi) species (Paralabidochromis flavus versus Pu. pundamilia and Pu. nyererei), Lake Kivu (LKi) species (Pa. paucidens versus Harpagochromis vittatus) and Lake Victoria against Lake Kivu species. This suggests that all radiation member species share the same hybridization event in their ancient history but vary in how long after that event they remained part of the same recombining population. It also suggests that some of the admixture variation still segregates within individual species (indicated by the deviation from an fd correlation of 1 among conspecifics).
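Panel (b) correlates fd values computed in 10 kb windows. Below is a minimal sketch, assuming fd (cited as ref. 78) takes its usual form fd = S(P1, P2, P3, O) / S(P1, PD, PD, O). Here S sums (1 − p1)·p2·p3·(1 − pO) − p1·(1 − p2)·p3·(1 − pO) over the sites of a window, and PD is, at each site, whichever of P2 and P3 has the higher derived-allele frequency. The text does not say which taxa occupy P1 to P3; one plausible choice, and an assumption here, is a Congolese taxon as P1, an LVRS individual as P2 and an Upper Nile taxon as P3. Pearson correlation is written out to avoid dependencies, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def _s(p1: Sequence[float], p2: Sequence[float], p3: Sequence[float], po: Sequence[float]) -> float:
    """Sum of (ABBA - BABA) site-pattern differences over one window."""
    return sum(
        (1 - a) * b * c * (1 - o) - a * (1 - b) * c * (1 - o)
        for a, b, c, o in zip(p1, p2, p3, po)
    )


def fd(p1, p2, p3, po) -> float:
    """fd for one window; NaN when the maximum-introgression denominator is zero."""
    # PD: per site, whichever of P2 and P3 has the higher derived-allele frequency.
    pd = [max(b, c) for b, c in zip(p2, p3)]
    denom = _s(p1, pd, pd, po)
    return _s(p1, p2, p3, po) / denom if denom != 0 else math.nan


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series of window values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy windows (not real data): the same P1, P3 and outgroup, two LVRS individuals as P2.
    ind_a = [fd([0.0], [x], [1.0], [0.0]) for x in (0.1, 0.5, 0.9)]
    ind_b = [fd([0.0], [x], [1.0], [0.0]) for x in (0.2, 0.4, 1.0)]
    print(ind_a, ind_b, pearson(ind_a, ind_b))
```

fd is usually interpreted only in windows with a positive ABBA excess, so windows with negative D are often discarded before the correlation is computed.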

古代杂交事件的重要性

Importance of the ancient hybridization event

许多已知有助于生态适应和生殖隔离的表型性状,在LVRS各个适应辐射的多次物种形成事件中都发生了分化(例如,齿形、雄性繁殖期体色、视蛋白等位基因)39,40。其中一些性状在刚果系的妊丽鱼属(Astatotilapia)与尼罗河上游系的Thoracochromis属物种之间也存在差异37,42。在分歧程度相近的妊丽鱼属物种的杂交实验中,也发现了内在的不相容性38。此外,已有研究表明湖泊辐射中的部分等位基因变异早于LVRS的起源(参考文献49, 50),其中最值得注意的或许是长波敏感(long-wavelength sensitive,LWS)视蛋白基因,它是维多利亚湖慈鲷中研究得最透彻的基因之一51。该基因编码视网膜视锥细胞中红敏视色素的蛋白部分,在维多利亚湖地区的慈鲷中异常多样51。维多利亚湖地区的湖泊具有与水深和浊度梯度相关的陡峭环境光梯度,该基因在适应这些光梯度方面起着至关重要的作用。由于水中颗粒物会吸收和散射短波长的光,深水和浑浊水体中的光谱相对更偏红,因此吸收更加红移的LWS视蛋白变体在这类水体中更为有利39,52。该视蛋白基因很可能也在行为生殖隔离中发挥作用,因为具有不同LWS视蛋白基因型的物种之间的色觉差异53往往与雄性繁殖期体色的分化相关39,49,54,而后者是重要的配偶选择线索55。

Many phenotypic traits known to contribute to ecological adaptation and reproductive isolation have diverged in multiple speciation events in each of the LVRS radiations (for example, tooth shapes, male nuptial colouration, opsin alleles)39,40. Some of these traits are also divergent between the Congolese Astatotilapia and Upper Nile Thoracochromis species37,42. Intrinsic incompatibilities have also been found in experimental hybrid crosses of similarly divergent Astatotilapia species38. In addition, some of the allelic variation in the lake radiations has been shown to predate the origin of the LVRS (refs 49, 50), perhaps most notably that of the long-wavelength sensitive (LWS) opsin gene, one of the best-studied genes in Lake Victoria cichlids51. This gene codes for the protein moiety of red-sensitive visual pigments in retinal cones and is exceptionally diverse in Lake Victoria Region cichlids51. It plays a crucial role in adaptation to the steep ambient light gradients associated with water depth and turbidity gradients that are characteristic of the lakes in the Lake Victoria region. More red-shifted LWS opsin variants are beneficial in deep and murky water where the light spectrum is relatively more red-shifted because of particulate matter absorbing and scattering light of shorter wavelengths39,52. This opsin gene likely also plays a role in behavioural reproductive isolation because divergence in colour perception between species with different LWS opsin genotypes53 is often associated with divergent male breeding colouration39,49,54, which is an important mate choice cue55.

LWS视蛋白基因在LVRS中高度多样,具有两个深度分歧的单倍型分支;由于每个分支内的等位基因在功能上彼此相似,这两个分支通常被称为单倍型类(haplotype class)51。慈鲷LWS基因第177位氨基酸的一个替换,使单倍型I类的吸收峰相对于II类向长波(红光)方向移动7 nm(参考文献49, 51),其他位置的替换也会影响光谱敏感性(补充讨论,补充数据3)。这两个单倍型类通常与不同的光环境相关:I类与浅而清澈的水体相关,II类与深而浑浊的水体相关。例如,在岩质湖岸刮食藻类的Neochromis属和Mbipia属,以及在泥质湖底取食碎屑的Enterochromis属(可能代表维多利亚湖辐射中的一次早期分歧事件),分别占据维多利亚湖光谱的两端:清澈的浅水与浑浊的深水。先前从Neochromis和Mbipia刮食藻类者中测序的全部438个LWS单倍型都属于单倍型I类39,49,而12个Enterochromis LWS单倍型中有11个属于II类(参考文献39)。在生态上多变的谱系内部,分别占据浅水与深水或清水与浊水的年轻姐妹种,这两类LWS单倍型(或其间的重组型)的频率往往不同,而且LWS变异始终与雄性繁殖期体色的分化相关(图4,补充讨论)。我们对刚果系和尼罗河上游系分类群的LWS视蛋白基因进行了测序,发现LVRS中的两个单倍型类51各自仅与其中一个亲本谱系共享(图4,补充图7,补充数据3,补充讨论)。刚果系个体的所有单倍型都位于I类分支的基部,而尼罗河上游系的单倍型则位于II类分支的基部。

The LWS opsin gene is highly diverse in the LVRS with two deeply divergent haplotype clades, often referred to as haplotype classes because the alleles within each clade are functionally similar to each other51. A substitution at amino acid position 177 in the cichlid LWS gene shifts peak absorbance towards longer wavelength (red) in haplotype class I relative to class II by 7 nm (refs 49, 51) and also other substitutions influence spectral sensitivity (Supplementary Discussion, Supplementary Data 3). The two haplotype classes are often associated with different light environments, class I with shallow and clear water, and class II with deep and turbid water. For example, rocky shore algae scrapers of the genera Neochromis and Mbipia and mud bottom detritivores of the genus Enterochromis, likely representing an early divergence event in the Lake Victoria radiation, occupy the opposite ends of the light spectrum in Lake Victoria: clear and shallow versus murky and deep. All 438 LWS haplotypes previously sequenced from Neochromis and Mbipia algae scrapers39,49 are part of the haplotype class I, whereas 11 of the 12 Enterochromis LWS haplotypes belong to class II (ref. 39). Within ecologically variable lineages, young sister species occupying shallow versus deep or clear versus turbid waters, often have different frequencies of these two LWS haplotype classes or recombinants between them, and LWS variation is consistently associated with divergence in male nuptial coloration (Fig. 4, Supplementary Discussion). We sequenced the LWS opsin gene in Congolese and Upper Nile taxa, and found that the two haplotype classes51 in the LVRS are each shared exclusively with just one of these parental lineages (Fig. 4, Supplementary Fig. 7, Supplementary Data 3, Supplementary Discussion). All haplotypes in individuals from the Congolese lineage take basal positions in the class I clade, whereas the Upper Nile lineage haplotypes take basal positions in the class II clade.


图4:LWS视蛋白的高多样性很可能源于古代杂交事件 High LWS opsin diversity likely because of the ancient hybridization event.
LVRS(橙色)中两个主要的LWS视蛋白单倍型类I和II,分别仅与刚果系(A. stappersi和A. sp. ‘Yaekama’,红色)或尼罗河上游系(H. gracilior和T. pharyngalis,蓝色)共享(详见补充图7;补充数据3)。LWS单倍型I类通常与生活在浅而清澈水体中的慈鲷相关,而II类则与生活在更深、更浑浊生境中的慈鲷相关。无论在适应辐射的早期还是晚期,由生境类型分化驱动的物种形成似乎都伴随着不同LWS单倍型的固定。例证之一是生态上分化的属之间不同单倍型类的近乎固定,如浅水岩质湖岸刮食藻类的Neochromis属和Mbipia属与泥质湖底取食碎屑的Enterochromis属;另一个例证是年轻的初始物种对Pundamilia macrocephala ‘blue’(生活在极浅水域)与‘yellow’(生活在较深水域),二者分别主要携带单倍型I类和II类的等位基因。

The two major LWS opsin haplotype classes, I and II, in the LVRS (orange) are each shared exclusively with either the Congolese (A. stappersi and A. sp. ‘Yaekama’, red) or the Upper Nile lineage (H. gracilior and T. pharyngalis, blue), respectively (details in Supplementary Fig. 7; Supplementary Data 3). LWS haplotype class I is generally associated with cichlids living in shallow and clear water habitats, whereas class II is associated with deeper and more turbid habitats. Speciation by divergence in habitat type seems to have been accompanied by fixation of alternative LWS haplotypes both at early and at late stages of the adaptive radiation. This is exemplified by near fixation of alternative haplotype classes between ecologically divergent genera such as shallow water rocky shore algae scrapers of Neochromis and Mbipia versus mud bottom detritivores of the genus Enterochromis, and by the young incipient species pair of Pundamilia macrocephala ‘blue’ (living very shallow) and ‘yellow’ (living deeper) which have predominantly alleles of haplotype class I and II, respectively (Photo credits: Ole Seehausen, Adrian Indermaur, ‘Teleos’, Oliver Selz, Uli Schliewen).

为了检验在基因组的其他位置能否检测到类似的混合变异分选模式,我们检验了在LVRS物种之间高度分化的基因组SNP是否通常源自两个亲本谱系之间的混合。我们推断了六个表型多样的维多利亚湖物种中双等位SNP的起源,这六个物种分别是食虫的Pundamilia pundamilia、食虫/食浮游动物的P. nyererei、刮食藻类的Neochromis omnicaeruleus、以其他慈鲷的卵和幼体为食(paedophage)的Lipochromis melanopterus、食虫的Paralabidochromis chilotes和食鱼的Harpagochromis cf. serranus(图5)。我们发现,在维多利亚湖物种之间分化异常强烈的位点(LV异常位点)中,有30%的位点,其两个等位基因之一确实很可能是经由尼罗河上游系的基因渗入引入该辐射祖先的(图5a中的类别3+4,补充讨论)。刚果系与尼罗河上游系分类群分别固定为不同等位基因的位点(图5a中的类别4)富集了LV异常位点,并表现出与祖先变异在维多利亚湖物种之间呈镶嵌状分选相一致的模式(图5b)。类别4 SNP中LV异常位点的富集似乎并非由于这些位点本身具有更高的固定概率:在对辐射之外六种对照组慈鲷的两两比较中,在类别4 SNP中,LV异常位点在对照组之间发生差异固定的频率并不高于非异常位点(补充表5,补充讨论)。我们的全基因组序列数据也支持这种镶嵌状的祖先模式(补充图8)。

To see if similar patterns of sorting of admixture variation could be detected elsewhere in the genomes, we tested if genomic SNPs that are highly differentiated among LVRS species were commonly derived from admixture between the two parental lineages. We inferred the origin of bi-allelic SNPs in six phenotypically diverse Lake Victoria species: the insectivore Pundamilia pundamilia, the insectivore/zooplanktivore P. nyererei, the algae grazer Neochromis omnicaeruleus, the paedophage Lipochromis melanopterus, the insectivore Paralabidochromis chilotes and the piscivore Harpagochromis cf. serranus (Fig. 5). We found that at 30% of sites that are exceptionally strongly differentiated between Lake Victoria species (LV outliers), one of the alleles was indeed likely introduced into the ancestry of the radiation through Upper Nile lineage introgression (categories 3+4 in Fig. 5a, Supplementary Discussion). Sites at which the Congolese and Upper Nile lineage taxa are fixed for alternative alleles (category 4 in Fig. 5a) are enriched for LV outliers and show a pattern consistent with mosaic-like sorting of ancestral variants amongst Lake Victoria species (Fig. 5b). The enrichment for LV outliers in category 4 SNPs does not seem to be because of inherently increased fixation probability of these loci, as in pairwise comparisons of six control group cichlid species from outside the radiation, LV outliers were not more often differentially fixed in the control groups than non-outliers among category 4 SNPs (Supplementary Table 5, Supplementary Discussion). The mosaic-like ancestry pattern is also corroborated by our whole-genome sequence data (Supplementary Fig. 8).
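Figure 5a reports two-sided Fisher's exact tests for the enrichment of LV outliers in each ancestry category. The exact contingency layout is not given in this section, so the sketch below assumes a 2×2 table of (focal category vs. all other SNPs) × (LV outlier vs. non-outlier). It computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. Only the standard library is used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows (assumed layout): SNPs in the focal ancestry category / all other SNPs.
    Columns: LV outliers / non-outliers.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x outliers in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps floating-point ties with p_obs in the sum.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, the classic table [[3, 1], [1, 3]] gives 34/70 ≈ 0.486. Python integers are arbitrary-precision, so the binomial coefficients stay exact even for counts in the thousands; only the final probabilities are rounded to floats.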

图5:LVRS中杂交来源变异的差异分选(Differential sorting of hybridization-derived variation in the LVRS)。
(a)刚果系(C)与尼罗河上游系(N)类群中分别固定了不同等位基因的位点,显著富集维多利亚湖中全局FST异常高的SNP。在六个同域分布的维多利亚湖物种(见(b))的12,890个双等位SNP中,有340个为全局FST高异常值(LV异常位点)。根据两个等位基因在刚果系(C)和尼罗河上游系(N)类群中存在与否,我们将SNP划分为五个血统类别。灰色条表示每个血统类别中LV异常位点占全部SNP的比例。顶部标出了每个类别的SNP总数及双侧Fisher精确检验的P值。各血统类别定义如下。(1)两个LV等位基因中只有一个在刚果系和尼罗河上游系类群中被发现的全部SNP(新产生的LV等位基因,或在亲本谱系中未被取样)。(2)两个LV等位基因都在刚果系类群中被发现(即使没有与尼罗河上游系杂交,LVRS中也会存在该多态)。(3)刚果系类群中只发现一个等位基因,而尼罗河上游系类群中两个等位基因都有(没有杂交则LVRS中不会出现该多态)。(4)刚果系与尼罗河上游系类群分别固定了不同的LV等位基因(没有杂交则LVRS中不会出现该多态),其中可能包括Bateson–Dobzhansky–Muller不相容性。类别5包括在维多利亚湖中初始等位基因频率(16%)与第4类位点相近的位点,用于检验第4类的富集是否仅仅源于较高的初始等位基因频率。
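(b)在刚果系与尼罗河上游系类群中分别固定不同等位基因的LV异常位点上(LV物种间平均全局FST = 0.52),亲本等位基因在维多利亚湖慈鲷物种间的差异分选。每个方块代表一个SNP,按该物种中的等位基因频率着色,从红色(固定为刚果系等位基因)到蓝色(固定为尼罗河上游系等位基因)。除两个位点(右起第2和第3个)外,其余位点均位于Pundamilia nyererei参考基因组的不同支架(scaffold)上。若已知其在尼罗罗非鱼(Oreochromis niloticus)基因组上的染色体位置,则标于下方。22条染色体中至少有10条涉及这种镶嵌式分选:在祖先杂种群的亲本谱系中分别固定不同等位基因的位点上,等位基因在适应辐射物种间呈镶嵌式分配(图片来源:Oliver Selz、Ole Seehausen、Adrian Indermaur、‘Teleos’、Uli Schliewen)。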

(a) Sites fixed for alternative alleles in the Congolese (C) and Upper Nile (N) taxa are enriched for high global FST outlier SNPs in Lake Victoria. Of the 12,890 biallelic SNPs among six sympatric Lake Victoria species (shown in (b)), 340 are outliers of high global FST (LV outliers). We assigned SNPs to five different ancestry categories according to the presence or absence of the two alleles in the Congolese (C) and Upper Nile (N) lineage taxa. The grey bars show the proportion of LV outliers among all SNPs in each ancestry category. Total SNP counts in each category and P-values of two-sided Fisher’s exact tests are shown on top. Ancestry category (1) includes all SNPs for which only one of the two LV alleles was found in the Congolese and Upper Nile taxa together (novel LV allele or unsampled in parental lineages), (2) both LV alleles found in the Congolese taxa (polymorphic in LVRS even without Upper Nile hybridization), (3) only one allele found in Congolese but both alleles found in Upper Nile taxa (not available in LVRS without hybridization), and (4) Congolese and Upper Nile taxa each fixed for alternative LV alleles (not available in LVRS without hybridization), potentially including Bateson–Dobzhansky–Muller incompatibilities. Category 5 includes sites with a similar initial allele frequency in Lake Victoria (16%) as sites fixed for alternative alleles in the parental lineages (category 4), to test if the enrichment in category 4 could simply be because of high initial allele frequency. (b) Differential sorting of parental alleles between Lake Victoria cichlid species at LV outliers fixed for alternative alleles in the Congolese and Upper Nile lineage taxa (mean global FST among LV species = 0.52). Each square represents a SNP coloured according to the allele frequency in that species ranging from red (fixed for Congolese allele) to blue (fixed for Upper Nile allele). 
All except two sites (2+3 from the right) are located on different scaffolds of the Pundamilia nyererei reference genome. If known, chromosomal positions on the Oreochromis niloticus genome are shown below. At least 10 of the 22 chromosomes are involved in mosaic-like allele sorting between radiation species at loci that were fixed for alternative alleles in the parental lineages of the ancestral hybrid swarm (Photo credits: Oliver Selz, Ole Seehausen, Adrian Indermaur, ‘Teleos’, Uli Schliewen).
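The legend reports two-sided Fisher’s exact tests on the proportion of LV outliers per ancestry category. Below is a minimal, dependency-free sketch of that test. The 2x2 layout in the hypothetical helper `category_enrichment` is an assumption for illustration: one category versus all remaining SNPs, crossed with outlier versus non-outlier. The legend does not state which comparison each P-value refers to. `scipy.stats.fisher_exact` computes the same two-sided test.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # probability that the top-left cell equals k, given fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # a small relative tolerance keeps tied probabilities despite float rounding
    p = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

def category_enrichment(outliers_in_cat, snps_in_cat, outliers_total, snps_total):
    """Hypothetical layout: rows = this category / all other SNPs,
    columns = LV outlier / non-outlier."""
    a = outliers_in_cat
    b = snps_in_cat - outliers_in_cat
    c = outliers_total - outliers_in_cat
    d = (snps_total - snps_in_cat) - c
    return fisher_exact_two_sided(a, b, c, d)
```

For scale, the overall outlier rate across all categories is 340/12,890, about 2.6%. Under this layout, a category counts as enriched when its outlier proportion clearly exceeds that baseline and the test is significant.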

讨论

Discussion

在这里,我们证明了多个大型动物适应辐射群体起源于同一个杂种群。我们的结果表明,在整个维多利亚湖地区慈鲷“超群”起源之初,发生过两个haplochromine族慈鲷谱系之间的古代杂交,其中一个谱系来自刚果河上游,另一个来自尼罗河上游水系。当代的尼罗河上游系基因渗入,或更早时期分别独立发生于各湖适应辐射群体的基因渗入,都不太可能,理由如下:(i)基因组数据高度支持整个维多利亚湖地区“超群”为单系(图1,补充图1);(ii)所有湖泊的适应辐射群体都具有相似的刚果系与尼罗河上游系血统比例(补充图6);(iii)刚果系与尼罗河上游系血统片段高度相关(图3)。此外,无论是H. gracilior(基伍湖)还是T. pharyngalis(爱德华湖),它们与同域分布的LVRS物种之间的基因混合程度,都不高于与其他湖泊中异域分布的LVRS成员之间的混合程度(图2和补充图5),这也不支持近期发生过基因渗入。阿尔伯特湖、萨卡湖和维多利亚湖的适应辐射群体与爱德华湖和基伍湖在地理上已隔离了4,000年(萨卡湖)至100,000年(维多利亚湖)31,33,34,因而也与T. pharyngalis和H. gracilior相隔离,却表现出相似的血统比例。

Here, we demonstrate that multiple large animal adaptive radiations arose from a hybrid swarm. Our results suggest ancient admixture between two lineages of haplochromines, one from the Upper Congo and one from the Upper Nile drainage, at the origin of the entire Lake Victoria Region Superflock of cichlid fish. Contemporary Upper Nile lineage introgression or earlier independent introgression events into each lake radiation are unlikely given (i) the highly supported genomic monophyly of the entire Lake Victoria Region Superflock (Fig. 1, Supplementary Fig. 1), (ii) similar Congo and Upper Nile lineage ancestry proportions in all lake radiations (Supplementary Fig. 6) and (iii) highly correlated Congolese and Upper Nile lineage ancestry tracts (Fig. 3). In addition, the fact that neither H. gracilior (Lake Kivu) nor T. pharyngalis (Lake Edward) shows greater admixture into LVRS species that are sympatric with them than with allopatric members of the LVRS from other lakes (Fig. 2 and Supplementary Fig. 5) speaks against recent introgression. The radiations in Lakes Albert, Saka and Victoria have been geographically separated from Lakes Edward and Kivu, and hence from T. pharyngalis and H. gracilior, for between 4,000 (Saka) and 100,000 (Victoria) years31,33,34 and show similar ancestry proportions.

LVRS物种与H. gracilior和T. pharyngalis之间共享的等位基因数量高度相似(补充表3,第2.1–2.4行,补充图3),表明与LVRS的刚果系祖先发生杂交的类群是这两个尼罗河上游系物种的近缘类群,或是它们的祖先。刚果系很可能是在某个湿润期(例如14.5万至12万年前,补充图3)通过袭夺马拉加拉西河(属刚果水系)的支流而进入维多利亚湖地区的,并在那里遇到了当时应已占据该地区的尼罗河上游系代表类群。当时维多利亚湖地区存在的大型湖泊很可能为物种多样化提供了生态机会,而遗传多样性丰富的杂种群能够利用这种机会。

Highly similar amounts of allele sharing between the LVRS species and both H. gracilior and T. pharyngalis (Supplementary Table 3, rows 2.1–2.4, Supplementary Fig. 3) suggest that the taxon that hybridized with the Congolese ancestor of the LVRS was a close relative of both Upper Nile species, or their ancestor. It is likely that the Congolese lineage colonized the Lake Victoria region through capture of Malagarasi (Congo) tributaries during a humid phase (for example, 145,000–120,000 years ago, Supplementary Fig. 3) and encountered representatives of the Upper Nile lineage that would by then have occupied the Lake Victoria region. The existence of large lakes in the Lake Victoria region at that time is likely to have provided ecological opportunity for diversification, which could be exploited by a genetically diverse hybrid swarm.

LVRS的两类LWS视蛋白单倍型各自只与一个亲本谱系共享(图4,补充图7)。这表明古代刚果-尼罗杂交事件是LVRS慈鲷LWS视蛋白基因上丰富功能变异的来源39,51。杂种群中祖先单倍型之间的重组,进一步增加了可供LVRS适应辐射利用的LWS单倍型功能变异(补充讨论,补充数据3)。该基因对于适应维多利亚湖水体中偏向红端的环境光谱尤为重要39,51。因此,刚果系与尼罗河上游系视蛋白单倍型的汇集,似乎促进了对极端多样的光照条件和视觉生态的适应;在适应辐射的早期和晚期,这些条件之间都发生过许多次转换。分化为生态上明显不同的主要类群(Enterochromis与Neochromis/Mbipia),以及年轻姐妹物种之间的主要生境转变,都与LWS基因座上招募刚果系或尼罗河上游系来源的单倍型有关(补充讨论)。

The finding that the two LWS opsin haplotype classes of the LVRS are each shared uniquely with one of the parental lineages (Fig. 4, Supplementary Fig. 7) suggests that the ancient Congo-Nilotic admixture event was the source of the high functional variation among LVRS cichlids at the LWS opsin gene39,51. Recombination between the ancestral haplotypes within the admixed populations further enhanced the functional variation of LWS haplotypes available for the LVRS radiation (Supplementary Discussion, Supplementary Data 3). This gene is particularly important for adaptation to the red-shifted end of ambient aquatic light spectra that characterize the waters of Lake Victoria39,51. Bringing together the Congolese and Upper Nile opsin haplotypes thus appears to have facilitated adaptation to an extreme range of light conditions and visual ecologies, with many transitions between them at both early as well as late stages of the radiation. Divergence into major ecologically differentiated clades (Enterochromis vs Neochromis/Mbipia), but also major habitat shifts between young sister species are associated with recruitment of Congolese versus Upper Nile lineage derived haplotypes at the LWS locus (Supplementary Discussion).

我们的数据表明,古代杂交事件对这组年轻适应辐射群体演化的影响要普遍得多。慈鲷的杂交实验表明,内在不相容性38,56和表型新颖性57都随被杂交物种间遗传距离的增加而增加,而这两类变异在物种形成中都可能很重要。结合上述先前的实验结果来解读刚果系与尼罗河上游系类群之间的分歧时间,我们可以预测:这两个谱系之间的杂种群会同时包含内在不相容性和超亲性状变异。与这一预测相符,我们发现,在两个亲本谱系中分别固定不同等位基因的基因组位点,显著富集可能参与维多利亚湖中分歧适应和物种分化的异常位点(图5)。这表明,物种形成通常与杂交事件在适应辐射祖先中汇集的等位基因的分选有关。表现出负上位相互作用的位点(Bateson–Dobzhansky–Muller(BDM)不相容性)预计会在两个亲本谱系中分别固定不同等位基因,因而落入图5a的血统类别4。该类别中LV异常位点的强烈富集,与BDM不相容性在杂种群所产生的物种之间发生差异分选的情形相符21,22,23,这可能促进了适应辐射成员之间的生殖隔离。在LVRS内部,物种间刚果-尼罗血统片段模式的相关性随系统发育距离的增加而降低(图3b)。因此,刚果系与尼罗河上游系的等位基因很可能已在LVRS中以多态形式存在了很长时间,并在物种多样化过程中逐步在适应辐射成员之间完成分选。

Our data suggest a much more general impact of the ancient admixture event on evolution in this set of young adaptive radiations. Experimental crosses of cichlids have shown that intrinsic incompatibilities38,56 and phenotypic novelty57 both increase with genetic distance between the crossed species. Both kinds of variation could be important in speciation. The divergence time of the Congolese and Upper Nile lineage taxa, interpreted in the context of this previous experimental work, lets us predict that a hybrid swarm between these lineages would contain both intrinsic incompatibilities and transgressive trait variation. In agreement with this prediction, we found that genomic sites with alternative alleles fixed in the two parental lineages are enriched for outlier loci likely involved in divergent adaptation and species differentiation in Lake Victoria (Fig. 5). This suggests that speciation has commonly been associated with sorting of alleles brought together in the radiation ancestor by the admixture event. Sites showing negative epistatic interactions (Bateson–Dobzhansky–Muller (BDM) incompatibilities) would be expected to be fixed for alternative alleles in the parental lineages and thus fall in ancestry category 4 in Figure 5a. The strong enrichment of LV outliers in this category is in line with differential sorting of BDM incompatibilities among species that emerged from the hybrid swarm21,22,23, which may have facilitated reproductive isolation among the members of the adaptive radiation. Given that the correlation of Congo-Nilotic ancestry block patterns decreases between species with increasing phylogenetic distance within the LVRS (Fig. 3b), it is likely that Congolese and Upper Nile lineage alleles have been segregating in the LVRS for a long time and that they were progressively sorted among the radiation members during species diversification.

来自LWS视蛋白基因的证据与祖先变异在全基因组范围内的分选模式,共同表明祖先杂交在维多利亚湖地区慈鲷“超群”的演化中发挥了重要作用。由此产生的丰富遗传变异,可能有助于解释维多利亚湖地区共计约700个物种的多个适应辐射群体如何在短短10万至20万年内形成31,32。其中包括维多利亚湖特有的500多个在遗传和表型上明显分化的物种,它们很可能仅在约15,000年内演化而来32,35。因此,我们的数据为以下假说3提供了证据:当分歧谱系之间的杂交与新的生态机会(例如在新形成的湖泊中定居)同时出现时,自然选择和性选择对杂交来源多态的重组与分选,可能促进快速的适应辐射。

The combined evidence from the LWS opsin gene and the genome-wide patterns of sorting of ancestral variation is consistent with an important role of the ancestral hybridization for the evolution of the Lake Victoria Region Superflock. The resulting large genetic variation may help to explain how multiple adaptive radiations in the Lake Victoria Region, together forming 700 species, could arise in just 100–200 thousand years31,32, including 500+ genetically and phenotypically well-differentiated species endemic to Lake Victoria which evolved in probably just 15,000 years32,35. Hence, our data provide evidence for the hypothesis3 that hybridization between divergent lineages, when coincident with novel ecological opportunity such as colonisation of newly formed lakes, may facilitate rapid adaptive radiation through recombination and sorting of admixture-derived polymorphisms by natural and sexual selection.

杂交可能为整个适应辐射提供动力,这一假说已针对多个系统提出,但迄今为止,确凿的证据仅限于异源多倍体的夏威夷银剑菊24。在这里,我们报告了强有力的证据,表明杂交祖先推动了多个大型动物适应辐射群体的多样化。得益于高通量测序技术的出现,我们检验古代杂交并研究其对后续演化影响的能力已大幅增强。未来的研究将揭示主要适应辐射群体祖先中的杂交是否普遍存在,以及它的发生能否解释不同谱系之间物种多样化速率和规模上观察到的部分巨大差异。

That hybridization may fuel entire adaptive radiations has been hypothesized for several systems, but robust evidence has so far been confined to the allopolyploid Hawaiian silverswords24. Here, we report strong evidence that hybrid ancestry fuelled diversification in several large animal adaptive radiations. Thanks to the advent of high-throughput sequencing, our power to test for ancient hybridization and to study its impact on subsequent evolution has increased enormously. Future studies will reveal if hybridization in the ancestry of major adaptive radiations is widespread, and whether its occurrence may explain some of the observed large variation in the rates and volume of species diversification among lineages.

实验方法

Methods

实验设计

Experimental design

为了鉴定维多利亚湖地区慈鲷“超群”(LVRS)的最近缘类群以及潜在的杂交事件,我们进行了限制性位点相关DNA(RAD)测序。测序对象包括来自当前或过去与维多利亚湖地区相连的所有主要水系的haplochromine族慈鲷物种,以及LVRS所有主要适应辐射群体的代表物种。基于这些数据和同一批鱼的线粒体序列构建最大似然树,以推断最近缘类群。分别使用D统计量、五种群检验和F4-ratio检验来鉴定谱系之间的杂交、确定基因流方向并量化血统比例。随后,对被鉴定为LVRS祖先谱系的鱼类以及LVRS代表物种进行全基因组测序,以证实杂交信号并估计血统片段的大小。

To identify the closest relatives of the Lake Victoria Region Superflock (LVRS) and identify potential hybridization events, we performed restriction-site associated DNA (RAD) sequencing with haplochromine cichlid species from all major drainage systems that are either currently connected to the Lake Victoria region, or have been in the past, and representatives of all major radiations of the Lake Victoria Region Superflock. The closest relatives were inferred with maximum likelihood trees based on these data and on mitochondrial sequences of the same fish. D statistics, five population tests, and F4-ratio tests, were used to identify hybridization between lineages, determine the direction of gene flow and quantify ancestry proportions, respectively. Whole-genome sequencing was then performed with fish of the lineages identified as ancestral to the LVRS plus representatives of the superflock in order to corroborate the signatures of hybridization and to estimate ancestry block sizes.

分类单元取样

Taxonomic sampling

我们从维多利亚湖地区所有主要湖泊和若干小湖泊中采集了haplochromine族慈鲷。我们也从Mpanga河采集了样本,该河排泄鲁文佐里(Rwenzori)山脉的水流,位于爱德华湖与维多利亚湖之间的分水岭上(补充数据1,图1)。此外,我们自行采集或从其他采集者处获得了haplochromine族慈鲷样本,它们属于所有系统发育谱系,来自几乎所有分布有该类群的非洲水系。我们还纳入了其他非洲大湖(马拉维湖、坦噶尼喀湖和姆韦鲁湖)中haplochromine族物种群的成员(图1,补充数据1)。样本的采集和出口获得了以下机构的研究与样本出口许可:坦桑尼亚渔业研究所和坦桑尼亚农业、畜牧与渔业部;乌干达国家渔业资源研究所和乌干达农业、畜牧业与渔业部;以及赞比亚渔业部。所有样本的采集均遵守适用的国际和国家动物使用准则及伦理标准。
We sampled haplochromines from all major lakes and several small lakes in the Lake Victoria region and from the Mpanga River that drains the Rwenzori Mountains in the drainage divide between Lakes Edward and Victoria (Supplementary Data 1, Fig. 1). Further, we collected or obtained from other collectors haplochromines belonging to all phylogenetic lineages and from almost all African river drainages hosting haplochromines. We also included members of the haplochromine species flocks of the other African Great Lakes Malawi, Tanganyika and Mweru (Fig. 1, Supplementary Data 1). Samples were collected under research and sample export permissions of the Tanzania Fisheries Research Institute and the Tanzanian Ministry of Agriculture, Livestock and Fisheries; the Ugandan National Fisheries Resources Research Institute and Ministry of Agriculture, Animal Industry and Fisheries Uganda; and the Department of Fisheries Zambia. All samples were collected in compliance with applicable international and national guidelines for the use of animals, and ethical standards.

RAD测序

RAD sequencing

使用标准的酚-氯仿-异戊醇抽提法从鳍条或肌肉组织中提取DNA58。限制性位点相关DNA测序(RADseq)按照标准方案进行59。每个样品取400–1,000 ng DNA,用限制性内切酶HF-SbfI(New England Biolabs)过夜(8–10 h)酶切。P1接头含有5–8 bp长的条形码,每个条形码与其他所有条形码至少相差两个核苷酸。用Covaris S220 Focused-Ultra超声破碎仪剪切DNA,并从琼脂糖凝胶中回收300–600 bp的片段。我们用18个循环的PCR扩增RAD片段(98 °C 30 s;18个循环的98 °C 10 s、65 °C 30 s、72 °C 30 s;72 °C 5 min)。所有文库均在Illumina HiSeq 2500测序仪上进行单端测序。使用Stacks流程60中的process_radtags脚本对读段进行拆分(demultiplex),并修剪为84个核苷酸(nt,去除条形码后);同时校正条形码中的单个错误,并丢弃限制性位点不完整的读段。84 nt的长度来源于两步处理:去除条形码(大多为6 nt),以及因读段末端质量下降而切除最后10 nt。使用FastX工具包(http://hannonlab.cshl.edu/fastx_toolkit)去除两类读段:含有至少一个Phred质量值低于10的碱基的读段,以及质量值低于30的碱基超过10%的读段。随后使用Bowtie2(参考文献61)的端到端比对选项,将每个个体的读段比对到Pundamilia nyererei参考基因组。使用GATK Unified Genotyper v. 3.1(参考文献62)检出单核苷酸多态性(SNP)并进行基因分型。然后使用自定义Python脚本和vcftools v. 4.1(参考文献63)对所有位点进行过滤:删除距插入或缺失5 nt以内的SNP,以避免比对错误造成的假SNP;并要求SNP质量值至少为30。

DNA was extracted from fin clips or muscle tissue with a standard phenol-chloroform-isoamyl alcohol extraction method58. Restriction-site Associated DNA sequencing (RADseq) was performed following a standard protocol59. Restriction digestion was done overnight (8–10 h) using the restriction endonuclease HF-SbfI (New England Biolabs) and 400–1,000 ng DNA per sample. P1 adaptors contained 5–8 bp long barcodes differing by at least two nucleotides from all other barcodes. The DNA was sheared with a Covaris S220 Focused-Ultra sonicator and fragments of 300–600 bp length were extracted from an agarose gel. We performed 18 PCR cycles to amplify the RAD fragments (30 s 98 °C, × 18 (10 s 98 °C, 30 s 65 °C, 30 s 72 °C), 5 min 72 °C). All libraries were single-end sequenced on an Illumina HiSeq 2500 sequencer. The reads were de-multiplexed and trimmed to 84 nucleotides (nt, after barcode removal) with the process_radtags script from the Stacks pipeline60, correcting single errors in the barcode and discarding reads with incomplete restriction sites. The length of 84 nt results from removing the barcode (mostly 6 nt long) and trimming off the last 10 nt because of reduced quality at the read ends. The FastX toolkit (http://hannonlab.cshl.edu/fastx_toolkit) was used to remove all reads containing at least one base with a Phred quality score below 10 and reads with more than 10% of bases with quality less than 30. The reads of each individual were then mapped to the Pundamilia nyererei reference genome with Bowtie2 (ref. 61) using the end-to-end alignment option. Single nucleotide polymorphisms (SNPs) and genotypes were called with the GATK Unified Genotyper v. 3.1 (ref. 62). All sites were then filtered with custom Python scripts and vcftools v. 4.1 (ref. 63): SNPs within 5 nt of an insertion or deletion were removed to avoid false SNPs caused by misalignment, and a minimum SNP quality score of 30 was required.
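The read-level quality thresholds above can be written as a small predicate. So can the SNP-level filter described in this subsection, which drops SNPs within 5 nt of an indel and requires a quality of at least 30. This is a sketch of the stated rules, not a reimplementation of FastX or vcftools. It assumes Phred+33 quality encoding and treats each indel as a single position. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

def passes_quality_filter(qual, min_base_q=10, high_q=30, max_low_fraction=0.10, offset=33):
    """Keep a read only if no base is below Q10 and at most 10% of bases are below Q30.

    qual: FASTQ quality string; offset=33 assumes Sanger / Illumina 1.8+ encoding.
    """
    scores = [ord(ch) - offset for ch in qual]
    if any(q < min_base_q for q in scores):
        return False
    n_low = sum(1 for q in scores if q < high_q)
    return n_low <= max_low_fraction * len(scores)

def filter_snps(snps, indels, window=5, min_qual=30.0):
    """Drop SNPs with QUAL < min_qual or lying within `window` nt of an indel.

    snps: list of (scaffold, position, qual); indels: list of (scaffold, position).
    Each indel is treated as a single position (its length is ignored).
    """
    indel_pos = defaultdict(list)
    for scaffold, pos in indels:
        indel_pos[scaffold].append(pos)
    for positions in indel_pos.values():
        positions.sort()
    kept = []
    for scaffold, pos, qual in snps:
        if qual < min_qual:
            continue
        positions = indel_pos.get(scaffold, [])
        i = bisect_left(positions, pos - window)  # first indel at or after pos - window
        if i < len(positions) and positions[i] <= pos + window:
            continue  # an indel lies within the window around this SNP
        kept.append((scaffold, pos, qual))
    return kept
```

For example, a 10-base read with nine Q40 bases and one Q20 base sits exactly at the 10% limit and is kept.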

线粒体测序

Mitochondrial sequencing

通过PCR扩增两个线粒体标记。第一个是NADH脱氢酶亚基2(ND2)64,引物为ND2Met-F 5′-CAT ACC CCA AAC ATG TTG GT-3′和ND2Trp-R 5′-GTS GST TTT CAC TCC CGC TTA-3′。第二个是mtDNA控制区(D-loop)65,引物为FISHL15926-F 5′-GAG CGC CGG TCT TGT AA-3′和FISH12s-R 5′-TGC GGA GAC TTG CAT GTG TAA G-3′。对与RADseq数据集相同的个体进行Sanger测序,或从GenBank下载与RADseq数据集中相同物种的序列(补充数据1)。序列在BioEdit 7.2.5(参考文献66)内置的ClustalW中进行比对,并经手动校正以确保局部比对正确。

Two mitochondrial markers (NADH Dehydrogenase Subunit 2 (ND2)64 using the primers ND2Met-F 5′-CAT ACC CCA AAC ATG TTG GT-3′ and ND2Trp-R 5′-GTS GST TTT CAC TCC CGC TTA-3′, and the mtDNA control region (D-loop)65 with the primers FISHL15926-F 5′-GAG CGC CGG TCT TGT AA-3′ and FISH12s-R 5′-TGC GGA GAC TTG CAT GTG TAA G-3′) were amplified with PCR and Sanger sequenced for the same individuals or downloaded from GenBank for the same species as those included in the RADseq dataset (Supplementary Data 1). The sequences were aligned in ClustalW implemented in BioEdit 7.2.5 (ref. 66) and manually curated for correct local alignment.

系统发育分析

Phylogenetic analyses

我们使用最大似然法(RAxML v. 7.7.7和ExaML v. 1.0.4)67,68,分别重建串联线粒体基因和串联RAD序列的系统发育,两者均包括变异位点和不变位点。对于线粒体数据集,使用三个分区重建最大似然树:D-loop一个分区,ND2第一和第二密码子位置一个分区,ND2第三密码子位置一个分区。对于每个数据集,我们采用GTRGAMMA速率异质性模型,以100次快速自举(rapid bootstrap)进行RAxML分析。对于RAD-seq数据集,我们使用缺失个体不超过40个(25%)的全部串联位点,用RAxML(参考文献67)和ExaML(参考文献68)重建最大似然树。100次自举中的每一次,都从串联数据集中有放回地随机抽取位点,得到与原始数据集大小相同的数据集。然后按照RAxML-light手册68的建议,采用GTRGAMMA速率异质性模型,用ExaML(参考文献68)为每个重抽样数据集推断最大似然树。我们用RAxML(参考文献67)基于这100个拓扑结构计算自举支持值。核基因树和线粒体树均以尼罗罗非鱼(Oreochromis niloticus)参考基因组为根,并使用R软件包Ape v. 3.1(参考文献69)进行阶梯化(ladderize)排列和绘图。

Phylogenies were reconstructed for the concatenated mitochondrial genes and the concatenated RAD sequences separately, including both variant and invariant sites, using a maximum likelihood approach (RAxML v. 7.7.7 and ExaML v. 1.0.4)67,68. For the mitochondrial dataset, a maximum likelihood tree was reconstructed using three partitions, one for the D-loop, one for the first and second codon positions of ND2, and one for the third codon position of ND2. For each dataset, we performed a RAxML analysis with 100 rapid bootstraps using the GTRGAMMA model of rate heterogeneity. For the RAD-seq dataset, we used all concatenated sites with no more than 40 individuals missing (25%) to reconstruct a maximum likelihood tree with RAxML (ref. 67) and ExaML (ref. 68). Each of the 100 bootstrap replicates was generated by randomly sampling sites with replacement from the concatenated dataset to obtain a dataset of the original size. The maximum likelihood tree was then inferred for each resampled dataset with ExaML (ref. 68) using a GTRGAMMA model of rate heterogeneity, as recommended in the RAxML-light manual68. We calculated bootstrap support values based on these 100 topologies with RAxML (ref. 67). The nuclear and mitochondrial trees were rooted with the reference genome of Oreochromis niloticus, and ladderized and plotted using the R-package Ape v. 3.1 (ref. 69).
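Each RAD bootstrap replicate described above is an ordinary nonparametric bootstrap over alignment columns. The sketch below shows only the resampling step; tree inference with ExaML is not reproduced. `bootstrap_alignment` and `bootstrap_replicates` are hypothetical names, and the dict-of-strings alignment format is an assumption.

```python
import random

def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap replicate of a concatenated alignment.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    Columns are drawn uniformly with replacement until the replicate has the
    original length; the same column indices are used for every taxon.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}

def bootstrap_replicates(alignment, n=100, seed=1):
    """Generate n replicates (100 in the text) with a reproducible random seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Resampling whole columns, rather than single characters, keeps the site patterns across taxa intact. This is what makes the replicate a valid input for tree inference.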

然后,我们使用从RAD数据中得到的SNP,用SNAPP(参考文献45)推断LVRS各类群及其最近缘类群的物种树。SNAPP绕过基因树,通过对所有可能的基因树进行积分,直接从独立遗传的标记计算物种树45。我们将该分析限定为每个物种两个个体(A. sp. ‘Yaekama’仅一个)。由于SNAPP假定位点之间不连锁,我们只纳入彼此相距至少500 kb的双等位位点。最终数据集包含31个个体和1,817个位点。我们用默认先验运行SNAPP 1,000,000次迭代,每1,000次迭代采样一次。我们将前50%的树作为预烧期(burn-in)丢弃,并在Densitree70中将其余500棵树的后验分布可视化为共识树。

We then used RAD-derived SNP data to infer the species tree for the LVRS groups and their closest relatives with SNAPP (ref. 45). SNAPP bypasses gene trees and computes species trees directly from independently inherited markers by integrating over all possible gene trees45. We restricted this analysis to two individuals per species (one for A. sp. ‘Yaekama’) and as SNAPP assumes no linkage among loci, we included only biallelic sites that were at least 500 kb apart from each other. The resulting data set contained 31 individuals and 1,817 sites. We ran SNAPP for 1,000,000 iterations, sampling every 1,000th iteration using default priors. We discarded the first 50% of the trees as burn-in and visualized the posterior distribution of the remaining 500 trees as consensus trees in Densitree70.
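One way to obtain biallelic sites at least 500 kb apart is a greedy left-to-right pass along each scaffold. The text does not say how sites were chosen, so this rule is an assumption. So is treating sites on different scaffolds as unlinked, since scaffold order along chromosomes may be unknown. `thin_sites` is a hypothetical helper.

```python
def thin_sites(sites, min_distance=500_000):
    """Greedy thinning: keep a site only if it lies at least `min_distance` bp
    from the previously kept site on the same scaffold.

    sites: iterable of (scaffold, position). Sites on different scaffolds are
    treated as unlinked (an assumption, not stated in the text).
    """
    kept = []
    last_kept = {}
    for scaffold, pos in sorted(sites):
        prev = last_kept.get(scaffold)
        if prev is None or pos - prev >= min_distance:
            kept.append((scaffold, pos))
            last_kept[scaffold] = pos
    return kept
```

A greedy pass is simple and deterministic, but it does not necessarily retain the maximum possible number of sites. Other selection rules would give a different, equally valid 500 kb-spaced subset.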

基于线粒体序列的系统发育时间估计

Mitochondrial chronograms

基于线粒体D-loop65和ND2(参考文献64)序列,我们使用BEAST v. 2.3.0(参考文献71)和四组不同的校准节点(补充方法)重建定年系统发育树。需要注意两点。第一,线粒体树只反映母系的系统发育。第二,由于分子速率越接近现在越高(补充方法),近一百万年以内的时间估计很可能被高估。

Dated phylogenies were reconstructed based on mitochondrial D-loop65 and ND2 (ref. 64) sequences using BEAST v. 2.3.0 (ref. 71) and four different sets of calibration nodes (Supplementary Methods). We caution that the mitochondrial tree only shows the phylogeny of the maternal line and that time estimates more recent than one million years are most likely overestimates because of the increase of molecular rates towards the recent (Supplementary Methods).

为了将慈鲷谱系之间的分化时间与主要的相关地质事件进行比较,我们根据先前发表的数据和综述(见补充方法)重建了不同时间点的古地理图。

To compare the splitting time between cichlid lineages with the major relevant geological events, we reconstructed paleogeographic maps at different time points based on previously published data and reviews (see Supplementary Methods).

Patterson’s D统计量

Patterson’s D statistics

为了检验谱系之间古代基因混合的证据,我们使用软件包ADMIXTOOLS v. 1.1(参考文献48)计算了Patterson’s D统计量46,72(ABBA-BABA检验)。这是一种基于四分类单元树中不一致SNP谱系的频率来检测基因混合的方法。读段数少于6或基因型质量Phred值<20(即错误概率>1%)的基因型被丢弃。D统计量的显著性通过分块刀切法(block jackknife)评估,以三个标准误对应的z值作为阈值48。我们以来自Kinneret湖的三个Astatotilapia flavijosephi个体作为外群种群。我们检验了每个东部分支和尼罗河上游分支向LVRS发生基因流的证据,比较基准是LVRS与其最近缘类群共享的等位基因。该最近缘类群为刚果水系类群,即来自赞比亚的A. stappersi(图2),或来自刚果中部的A. sp. ‘Yaekama’。在这些分析中,我们使用了至少50%的位点测序深度不低于10个读段的所有个体。读段比对到维多利亚湖物种Pundamilia nyererei可能使D统计量产生偏差;为排除这一可能,我们还将读段比对到Astatotilapia burtoni参考基因组,该物种是D统计量中所有类群的外群。为了排除外群(A. flavijosephi)的选择对D统计量的偏倚,我们又分别以A. burtoni或A. desfontainii作为外群重复了检验。所有物种组合及其包含的个体数和SNP数见补充表2。

To test for evidence of ancient admixture among lineages, we computed Patterson’s D statistic46,72 (ABBA-BABA test), a method to detect admixture based on the frequencies of discordant SNP genealogies in a four-taxon tree, with the software package ADMIXTOOLS v. 1.1 (ref. 48). Genotypes were discarded if they had less than 6 reads or a genotype quality Phred score <20 (that is, error probability >1%). Significance of D statistics was assessed with a block jackknife procedure using a z score of three standard errors as a threshold48. We used three individuals of Astatotilapia flavijosephi from Lake Kinneret as the outgroup population and we tested for evidence of gene flow from each Eastern and Upper Nile clade into the LVRS relative to allele sharing with the closest relative of the LVRS, the Congo drainage taxa A. stappersi from Zambia (Fig. 2) or A. sp. ‘Yaekama’ from the central Congo. For these analyses we used all individuals with at least 50% of the sites sequenced at a depth of at least 10 reads. To exclude the possibility that the D statistics are biased by the alignment of the reads to the Lake Victoria species Pundamilia nyererei, we also aligned the reads to the Astatotilapia burtoni reference genome, which is an outgroup to all taxa used in the D statistics. To rule out that the choice of outgroup (A. flavijosephi) biases the D statistics, we repeated the tests with A. burtoni or A. desfontainii as outgroups. All species combinations with number of individuals and SNPs included are given in Supplementary Table 2.
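The D statistic and its block-jackknife z score can be sketched from per-site derived-allele frequencies. This sketch uses the common frequency-based definition D = Σ(ABBA − BABA) / Σ(ABBA + BABA), in which positive values indicate excess allele sharing between P2 and P3. It is not ADMIXTOOLS' implementation, which may use a different sign convention and a weighted jackknife over genomic blocks. Here the jackknife is an unweighted delete-one jackknife over equal-sized blocks of consecutive sites. All function names are hypothetical.

```python
import math

def site_abba_baba(p1, p2, p3, po):
    """Frequency-weighted ABBA and BABA contributions of one biallelic site.

    p1, p2, p3, po: derived-allele frequencies in P1, P2, P3 and the outgroup.
    """
    abba = (1 - p1) * p2 * p3 * (1 - po)
    baba = p1 * (1 - p2) * p3 * (1 - po)
    return abba, baba

def d_statistic(sites):
    num = den = 0.0
    for p1, p2, p3, po in sites:
        abba, baba = site_abba_baba(p1, p2, p3, po)
        num += abba - baba
        den += abba + baba
    return num / den

def block_jackknife_z(sites, n_blocks):
    """Delete-one-block jackknife over contiguous, equally sized blocks of sites.

    Returns (D, standard error, z).
    """
    size = math.ceil(len(sites) / n_blocks)
    blocks = [sites[i:i + size] for i in range(0, len(sites), size)]
    sums = []
    for block in blocks:
        num = den = 0.0
        for site in block:
            abba, baba = site_abba_baba(*site)
            num += abba - baba
            den += abba + baba
        sums.append((num, den))
    total_num = sum(n for n, _ in sums)
    total_den = sum(d for _, d in sums)
    d_full = total_num / total_den
    # D recomputed with each block left out in turn
    loo = [(total_num - n) / (total_den - d) for n, d in sums]
    k = len(loo)
    mean = sum(loo) / k
    se = math.sqrt((k - 1) / k * sum((x - mean) ** 2 for x in loo))
    return d_full, se, d_full / se
```

Under the threshold stated above, a test would be called significant when |z| > 3.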

五种群检验

Five-population test

为了推断LVRS与尼罗河类群之间基因流的方向,我们使用了Eaton和Ree47提出的分区D统计量检验的扩展版本(我们称之为“五种群检验”)。与D统计量类似,该检验基于不一致的等位基因共享模式;但由于同时考虑五个种群,它能够推断基因流的方向。拓扑为((P1,P2),(P3a,P3b)),O的五个类群包括:潜在的基因流来源(例如P3a)、接受基因流的类群(例如P1)、这两个类群各自的一个近缘类群(例如P3b和P2),以及一个外群(O;图示另见补充表3)。如果基因从P3a渗入P1,那么渐渗供体(P3a)的近缘类群(P3b)也会与P1表现出过量的等位基因共享,但程度低于P3a。这是因为P3a和P3b有近期的共同祖先,所以从P3a渗入P1的许多衍生等位基因也为P3b所共有。相反,我们不预期基因流接受者P1的近缘类群(P2)会与P3a和P3b表现出过量的等位基因共享。在基因组数据中,这表现为BABBA模式相对于ABBBA模式的过量。在BABBA模式中,P1与P3a和P3b都共享衍生等位基因(“B”),而P2携带外群等位基因(“A”);在ABBBA模式中,P2与两个P3类群共享衍生等位基因。另一方面,如果基因流方向是从P1到P3a,那么P3b不会与P1表现出过量的等位基因共享。相反,P1与P2祖先共有的等位基因渗入了P3a,因此P2会与P3a表现出过量的等位基因共享。因此,从P1到P3a的基因流不会影响BABBA和ABBBA模式的相对频率,但BBBAA模式(P1、P2和P3a共享一个衍生等位基因)会比BBABA模式(P1和P2与P3b共享一个衍生等位基因)更常见(见补充表3)。由此,统计不一致的等位基因共享模式即可推断基因流的方向。

To infer directionality of gene flow between LVRS and the Nilotic taxa, we used an extended version of the partitioned D statistic test developed by Eaton & Ree47 (we call it a ‘five population test’). Similar to the D statistics, this test is based on discordant allele sharing patterns, but by considering five populations it allows one to infer the directionality of gene flow. The five taxa with the topology ((P1,P2), (P3a,P3b)),O include the potential source of gene flow (for example, P3a), the taxon receiving gene flow (for example, P1), a close relative of each of these two taxa (for example, P3b and P2) and an outgroup (O, see also Supplementary Table 3 for visualization). If genes had introgressed from for example, P3a into P1, a close relative (P3b) of the introgression donor (P3a) would also show excess allele sharing with P1, but to a lesser extent than P3a. This is because many derived alleles that introgressed from P3a into P1 will be shared by P3a and P3b because of their recent common ancestry. In contrast, we would not expect that a close relative (P2) of the receiver of gene flow, P1, would show excess allele sharing with P3a and P3b. In the genomic data, this would be seen as an excess number of BABBA patterns, where P1 shares a derived allele (‘B’) with both P3a and P3b, whereas P2 has the outgroup allele (‘A’), as compared with the number of ABBBA patterns (P2 shares a derived allele with both P3 taxa). On the other hand, if the direction of gene flow was from P1 into P3a, P3b would not show excess allele sharing with P1 but instead, P2 would show excess allele sharing with P3a because of ancestrally shared alleles between P1 and P2 that introgressed into P3a. Therefore, gene flow from P1 into P3a would not affect the relative frequencies of BABBA and ABBBA patterns, but the pattern BBBAA (P1, P2 and P3a share a derived allele) would be more frequent than BBABA (P1 and P2 share a derived allele with P3b, see Supplementary Table 3). 
Counting discordant allele sharing patterns thus allows us to infer the direction of gene flow.
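The pattern counts described above can be turned into D statistics as sketched below. Eight five-taxon patterns yield four D statistics. The taxon order P1, P2, P3a, P3b, O matches the pattern notation in the text. The pairings ABBAA/BABAA, ABABA/BAABA, ABBBA/BABBA and BBBAA/BBABA follow the contrasts described above and Eaton & Ree's partitioned D. The exact definitions and signs used in Supplementary Table 3 are not given here, so treat the names `D1`, `D2`, `D12` and `D_P1P2` as assumptions. With this convention, each D is positive when the first pattern of its pair is in excess. Introgression from P3a into P1 therefore pushes `D12` negative, and gene flow from P1 into P3a pushes `D_P1P2` positive.

```python
from collections import Counter

# Taxon order in every pattern string: P1, P2, P3a, P3b, O
PATTERNS = ("ABBAA", "BABAA", "ABABA", "BAABA", "ABBBA", "BABBA", "BBBAA", "BBABA")

def encode_site(calls):
    """Convert five haploid base calls into an A/B pattern ('A' = outgroup allele).

    Returns None for sites with missing data or that are not biallelic.
    """
    if any(c is None for c in calls) or len(set(calls)) != 2:
        return None
    outgroup = calls[-1]
    return "".join("A" if c == outgroup else "B" for c in calls)

def count_patterns(sites):
    """Count the eight informative patterns over an iterable of 5-tuples of calls."""
    counts = Counter()
    for calls in sites:
        pattern = encode_site(calls)
        if pattern in PATTERNS:
            counts[pattern] += 1
    return counts

def _d(x, y):
    return (x - y) / (x + y) if x + y else float("nan")

def five_population_d(counts):
    """Four D statistics; each is positive when the first pattern of its pair is in excess."""
    return {
        "D1": _d(counts["ABBAA"], counts["BABAA"]),      # P3a sharing: P2 vs P1
        "D2": _d(counts["ABABA"], counts["BAABA"]),      # P3b sharing: P2 vs P1
        "D12": _d(counts["ABBBA"], counts["BABBA"]),     # P3a+P3b sharing: P2 vs P1
        "D_P1P2": _d(counts["BBBAA"], counts["BBABA"]),  # P1+P2 sharing: P3a vs P3b
    }
```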

我们使用自编脚本计算五种群检验:每个LVRS类群取缺失数据最少的三个个体,其他所有类群各取数据最完整的一个个体。ADMIXTOOLS计算的D统计量基于等位基因频率;与之不同,我们的五种群检验按照Eaton和Ree47的方法,每个目标种群只用一个个体计算。我们对每个LVRS类群的三个个体分别进行检验,并报告每个适应辐射群体的平均值。在杂合位点随机选取一个等位基因。对于每个受检个体组合,我们利用所有无缺失数据的位点,统计计算四个D统计量(补充表3)所需的八种模式。按照Eaton和Ree47的做法,我们基于100个自举数据集(有放回地重抽样位点),以标准差为单位计算z值。

We computed the five population tests with a custom made script using the three individuals with least missing data for each LVRS group and the single individual with the most complete data for all other taxa. In contrast to the D statistics computed with ADMIXTOOLS, which are based on allele frequencies, our five population tests are calculated from a single individual for each focal population, following Eaton and Ree47. We tested each of the three individuals of each LVRS group separately, and report the means for each radiation. At heterozygous sites, one allele was chosen at random. For each combination of individuals tested, we counted the eight patterns needed to compute the four D statistics (Supplementary Table 3) using all sites without missing data. We calculated z scores in units of s.d. from 100 bootstrapped datasets (sites resampled with replacement) as in Eaton and Ree47.
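Two procedural details above can be sketched briefly. The first is the random choice of one allele at heterozygous sites. The second is z scores in units of the bootstrap standard deviation, computed here as the observed D divided by the standard deviation of D across 100 site-resampled replicates. The function names, the pattern-list input and the seed handling are assumptions, not the authors' script.

```python
import math
import random
import statistics

def haploidize(genotype, rng):
    """Pick one allele at random from a diploid call (2-tuple); None stays missing."""
    if genotype is None:
        return None
    return rng.choice(genotype)

def d_from_patterns(patterns, x, y):
    """D = (n_x - n_y) / (n_x + n_y) for two pattern labels, e.g. 'ABBBA' vs 'BABBA'."""
    n_x = sum(1 for p in patterns if p == x)
    n_y = sum(1 for p in patterns if p == y)
    return (n_x - n_y) / (n_x + n_y) if n_x + n_y else float("nan")

def bootstrap_z(patterns, x, y, n_boot=100, seed=0):
    """Observed D divided by the SD of D across bootstrap replicates.

    patterns: one entry per site without missing data (an A/B pattern string,
    or None for uninformative sites); sites are resampled with replacement.
    """
    rng = random.Random(seed)
    d_obs = d_from_patterns(patterns, x, y)
    boot = []
    for _ in range(n_boot):
        sample = rng.choices(patterns, k=len(patterns))
        d = d_from_patterns(sample, x, y)
        if not math.isnan(d):
            boot.append(d)
    return d_obs / statistics.stdev(boot)
```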

Whole-genome sequencing data

To corroborate our findings from the RAD sequence data, and to analyse ancestry block sizes, we sequenced the whole genomes of nine fish from Lake Victoria and two from Lake Kivu. We also sequenced the taxa that we found to be the closest extant representatives of the lineages directly ancestral to the LVRS: the Congolese Astatotilapia stappersi, and the Upper Nile Thoracochromis pharyngalis and ‘Haplochromis’ gracilior. Whole-genome sequencing data were generated for 11 individuals using PCR-free library preparation76 and Illumina HiSeq 3000 paired-end sequencing. Three additional individuals, sequenced the same way on the same machine, were taken from McGee et al.77 (see Supplementary Data 1 for sample information). Reads were aligned locally against the Astatotilapia burtoni reference genome50 with Bowtie 2 (ref. 61). For variant calling and genotyping we used HaplotypeCaller (GATK v. 3.5)62. Genotypes supported by fewer than five reads, multiallelic sites and indels were removed using vcftools v. 4.1 (ref. 63).
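The genotype and site filters can be illustrated with a small Python sketch that operates on already-parsed variant records. It is an illustration of the filtering logic under stated assumptions, not the vcftools command used in the study. The record layout (a reference allele, a list of alternative alleles and per-sample (genotype, read depth) pairs), the use of None for missing genotypes and the function name are all hypothetical.

```python
from typing import List, Optional, Tuple

# One sample: (genotype as a pair of allele indices, or None if missing; read depth).
Sample = Tuple[Optional[Tuple[int, int]], int]


def filter_record(ref: str, alts: List[str], samples: List[Sample],
                  min_depth: int = 5) -> Optional[List[Optional[Tuple[int, int]]]]:
    """Return filtered genotypes for a biallelic SNP, or None if the site is dropped.

    - sites without exactly one ALT allele (e.g. multiallelic sites) are removed,
    - indels (any allele longer than one base) are removed,
    - genotypes supported by fewer than `min_depth` reads are set to missing.
    """
    if len(alts) != 1:
        return None  # not biallelic
    if len(ref) != 1 or len(alts[0]) != 1:
        return None  # indel
    return [gt if gt is not None and depth >= min_depth else None
            for gt, depth in samples]


print(filter_record("A", ["G"], [((0, 1), 12), ((1, 1), 3), (None, 0)]))
# [(0, 1), None, None]
print(filter_record("A", ["G", "T"], [((0, 2), 20)]))  # None (multiallelic)
print(filter_record("AT", ["A"], [((0, 1), 20)]))      # None (indel)
```

Low-depth genotypes are set to missing rather than dropping the whole site. The later missing-data threshold can then decide whether a site is retained.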

Whole-genome D statistics were calculated with ADMIXTOOLS v. 1.1 (ref. 48) using all biallelic sites with a 1% minor allele frequency cutoff and a maximum proportion of missing data of 20% across all 14 genomes. The reference genome (A. burtoni) was used as the outgroup.
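The Python sketch below applies the two site filters and then computes the frequency-based D statistic with the estimator of Patterson et al. used by ADMIXTOOLS: D = Σ(p1 − p2)(p3 − pO) / Σ(p1 + p2 − 2p1p2)(p3 + pO − 2p3pO), where p are derived allele frequencies. The data layout (per-genome derived-allele dosages of 0, 1 or 2, with None for missing) and the function names are assumptions for illustration. ADMIXTOOLS additionally reports block jackknife standard errors, which are omitted here.

```python
from typing import Optional, Sequence

Dosage = Optional[int]  # derived-allele count (0, 1 or 2) per diploid genome; None = missing


def passes_filters(dosages: Sequence[Dosage], min_maf: float = 0.01,
                   max_missing: float = 0.20) -> bool:
    """Keep a site if its minor allele frequency >= min_maf and missing proportion <= max_missing."""
    called = [d for d in dosages if d is not None]
    missing = (len(dosages) - len(called)) / len(dosages)
    if not called or missing > max_missing:
        return False
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq) >= min_maf


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], po: Sequence[float]) -> float:
    """D(P1, P2; P3, O) from per-site derived allele frequencies.

    Positive values indicate excess allele sharing between P1 and P3.
    """
    num = den = 0.0
    for w, x, y, z in zip(p1, p2, p3, po):
        num += (w - x) * (y - z)
        den += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num / den if den else 0.0


# Two BABA-like sites and one ABBA-like site give D = (2 - 1) / 3.
print(patterson_d([1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]))
```

With one fixed allele per population, the numerator reduces to the BABA minus ABBA count and the denominator to their sum. This is the link between the frequency-based and pattern-based versions of D.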

To study signatures of admixture along the genome, we used the fd statistic of Martin et al.78, which is better suited than D statistics to small genomic regions78. ABBA and BABA pattern counts were calculated from allele frequencies by weighting each segregating site according to its fit to the ABBA or BABA pattern46,72,78. For example, a site with derived allele frequencies of 0, 0.5, 1 and 0 in P1, P2, P3 and the outgroup, respectively, counts as half an ABBA site. We used A. stappersi as P1, different LVRS individuals as P2, H. gracilior and T. pharyngalis as P3, and A. burtoni as the outgroup (Supplementary Fig. 8). Note that P1 and P2 are switched relative to the D statistics, in which the LVRS were used as P1 and the Congolese lineage as P2. This is simply for consistency with the description of fd in Martin et al.78. Here, ABBA therefore denotes sites with a derived allele shared between the LVRS and the Upper Nile lineage, and BABA denotes sites with a derived allele shared between the Congolese and Upper Nile lineage taxa. fd is calculated as the difference between ABBA and BABA counts relative to the maximum possible difference, expected if gene flow between P2 and P3 were equivalent to random mating78. Positive fd values indicate gene flow between P2 (LVRS) and P3 (Upper Nile). We calculated fd in non-overlapping 10 kb windows along the A. burtoni scaffolds using the Python script of Martin et al.78. Windows with fewer than five ABBA and BABA patterns in total were excluded. Only positive fd values indicate excess allele sharing between the Upper Nile lineage and the LVRS and are correctly standardized. We therefore set fd to 0 in windows with a negative D, as in Martin et al.78. Pearson's product-moment correlations of fd values between individuals were calculated with the R stats package.
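The window-based calculation can be sketched in Python as follows. This is an illustrative re-implementation under assumptions, not the script of Martin et al.78. Each site is given as derived allele frequencies (p1, p2, p3, pO) with P1 = A. stappersi, P2 = LVRS and P3 = Upper Nile. The per-site donor frequency pD is the larger of p2 and p3, following the fd definition of Martin et al. The window filters follow the text: at least five ABBA plus BABA patterns in total, and fd set to 0 when D is negative.

```python
from typing import Optional, Sequence, Tuple

Site = Tuple[float, float, float, float]  # derived allele frequencies in P1, P2, P3, outgroup


def abba_baba(p1: float, p2: float, p3: float, po: float) -> Tuple[float, float]:
    """Frequency-weighted ABBA and BABA contributions of one site."""
    abba = (1 - p1) * p2 * p3 * (1 - po)
    baba = p1 * (1 - p2) * p3 * (1 - po)
    return abba, baba


def fd_window(sites: Sequence[Site], min_patterns: float = 5.0) -> Optional[float]:
    """fd for one window; None if the window is excluded."""
    abba_sum = baba_sum = 0.0
    abba_max = baba_max = 0.0
    for p1, p2, p3, po in sites:
        abba, baba = abba_baba(p1, p2, p3, po)
        abba_sum += abba
        baba_sum += baba
        pd = max(p2, p3)  # population with the higher derived allele frequency
        abba_d, baba_d = abba_baba(p1, pd, pd, po)
        abba_max += abba_d
        baba_max += baba_d
    if abba_sum + baba_sum < min_patterns:
        return None  # too few informative patterns
    if abba_sum < baba_sum:
        return 0.0  # negative D: no excess P2-P3 sharing
    denom = abba_max - baba_max
    return (abba_sum - baba_sum) / denom if denom > 0 else None


# Ten "half ABBA" sites: ABBA = 5, maximum possible difference = 10.
print(fd_window([(0.0, 0.5, 1.0, 0.0)] * 10))  # 0.5
```

Using pD = max(p2, p3) per site keeps fd bounded, which is why it is preferred over D for small windows.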

Putative Congolese or Upper Nile ancestry was assessed in non-overlapping 3 kb sliding windows. In each window we compared the frequency of ABBA sites (the LVRS share the derived allele exclusively with the Upper Nile taxa) with the frequency of BBAA sites (the LVRS share the derived allele exclusively with the Congolese lineage representative A. stappersi). Scaffolds with <100 kb of data were removed. As for the fd statistic, ABBA and BBAA pattern counts were calculated from allele frequencies by weighting each segregating site according to its fit to the ABBA or BBAA pattern. Only windows with a total ABBA and BBAA pattern count exceeding one were used. Windows with an ABBA proportion, ABBA/(ABBA+BBAA), of at least 0.7 were defined as candidate windows of Upper Nile lineage ancestry and are coloured blue. Genomic regions with an ABBA proportion of 0.3 or less were defined as putative Congolese lineage-derived windows and are highlighted in red. Ancestry tracts were defined as consecutive sliding windows of the same colour (red or blue), ignoring single sliding windows without data. Expected ancestry block sizes were calculated using a formula from Racimo et al.79: the expected mean length of an admixture tract is $L = [(1 - m)\,r\,(t - 1)]^{-1}$, where $m$ is the admixture proportion, $r$ the recombination rate per base pair per generation and $t$ the number of generations since admixture. We assumed ∼50,000–100,000 generations since admixture, a recombination rate of $2.5 \times 10^{-8}$ and an Upper Nile proportion of 20% ($m = 0.2$). Under these assumptions, $L = [0.8 \times 2.5 \times 10^{-8} \times (t - 1)]^{-1} \approx$ 500 bp to 1 kb (ref. 79). The breakup of ancestry blocks may have slowed once the original hybrid population began to undergo genomic stabilization, speciated and formed geographically isolated radiations in separate lakes. Each of these radiations then underwent further stabilization independently, likely associated with differential sorting of the ancestral variation.
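A minimal Python sketch of the window classification, tract calling and expected tract length is given below. The thresholds (≥0.7 Upper Nile, ≤0.3 Congolese, total count > 1) and the formula follow the text. Several details are assumptions for illustration:

- the input format (per-window weighted ABBA and BBAA counts in scaffold order, None for windows without data);
- the label strings;
- the rule that windows with intermediate proportions break a tract;
- the measure of tract length as the number of classified windows.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def classify(abba: Optional[float], bbaa: Optional[float]) -> Optional[str]:
    """Label a window 'blue' (Upper Nile), 'red' (Congolese), 'none', or None (no data)."""
    if abba is None or bbaa is None or abba + bbaa <= 1:
        return None  # no data, or total pattern count not exceeding one
    prop = abba / (abba + bbaa)
    if prop >= 0.7:
        return "blue"
    if prop <= 0.3:
        return "red"
    return "none"  # intermediate proportion: no ancestry call


def ancestry_tracts(labels: List[Optional[str]]) -> List[Tuple[str, int]]:
    """Runs of consecutive same-coloured windows as (colour, number of windows).

    A single window without data is skipped; two or more in a row break a tract.
    """
    kept: List[Optional[str]] = []
    for i, lab in enumerate(labels):
        lone_gap = (lab is None
                    and (i == 0 or labels[i - 1] is not None)
                    and (i == len(labels) - 1 or labels[i + 1] is not None))
        if not lone_gap:
            kept.append(lab)
    return [(colour, len(list(run))) for colour, run in groupby(kept)
            if colour in ("red", "blue")]


def expected_tract_length(m: float, r: float, t: int) -> float:
    """Racimo et al. formula L = 1 / ((1 - m) * r * (t - 1)), in base pairs."""
    return 1.0 / ((1.0 - m) * r * (t - 1))


windows = [(8, 1), (9, 2), (None, None), (8, 2), (1, 9), (2, 8)]
print(ancestry_tracts([classify(a, b) for a, b in windows]))  # [('blue', 3), ('red', 2)]
print(round(expected_tract_length(0.2, 2.5e-8, 50_000)),
      round(expected_tract_length(0.2, 2.5e-8, 100_000)))  # 1000 500
```

Multiplying the window counts by the 3 kb window size gives approximate tract lengths in base pairs. These can then be compared with the expected 500 bp to 1 kb.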

Data availability

Mitochondrial and LWS opsin sequences are available on GenBank under accession numbers KY366716-KY366843 (ND2), KY366844-KY366970 (D-loop) and KY366971-KY366986 (LWS opsin). RADseq and whole-genome sequencing reads generated in this study can be downloaded from the NCBI Sequence Read Archive under BioProject PRJNA355227. The Java program for the five population tests is publicly available on GitHub (https://github.com/joanam/scripts).

References

  1. Wagner, C. E., Harmon, L. J. & Seehausen, O. Ecological opportunity and sexual selection together predict adaptive radiation. Nature 487, 366–369 (2012).
  2. Barrett, R. D. H. & Schluter, D. Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44 (2008).
  3. Seehausen, O. Hybridization and adaptive radiation. Trends Ecol. Evol. 19, 198–207 (2004).
  4. Abbott, R. et al. Hybridization and speciation. J. Evol. Biol. 26, 229–246 (2013).
  5. Rieseberg, L. H. Hybrid origins of plant species. Ann. Rev. Ecol. Syst. 28, 359–389 (1997).
  6. Mallet, J. Hybrid speciation. Nature 446, 279–283 (2007).
  7. Anderson, E. & Stebbins, G. L. Hybridization as an evolutionary stimulus. Evolution 8, 378–388 (1954).
  8. Carlquist, S. Island Biology Vol. 581, 5279 (Columbia Univ Press, 1974).
  9. Arnold, M. L. Natural Hybridization and Evolution (Oxford University Press, 1997).
  10. Feder, J. L. et al. Mayr, Dobzhansky, and Bush and the complexities of sympatric speciation in Rhagoletis. Proc. Natl Acad. Sci. USA 102, 6573–6580 (2005).
  11. The Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487, 94–98 (2012).
  12. Pardo-Diaz, C. et al. Adaptive introgression across species boundaries in Heliconius butterflies. PLOS Genet. 8, e1002752 (2012).
  13. Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518, 371–375 (2015).
  14. Salzburger, W., Baric, S. & Sturmbauer, C. Speciation via introgressive hybridization in East African cichlids? Mol. Ecol. 11, 619–625 (2002).
  15. Meyer, B. S., Matschiner, M. & Salzburger, W. Disentangling incomplete lineage sorting and introgression to refine species-tree estimates for Lake Tanganyika cichlid fishes. Syst. Biol. doi: 10.1093/sysbio/syw069 (2016).
  16. Weiss, J. D., Cotterill, F. P. & Schliewen, U. K. Lake Tanganyika—A ‘Melting Pot’ of Ancient and Young Cichlid Lineages (Teleostei: Cichlidae)? PLoS ONE 10, e0125043 (2015).
  17. Genner, M. J. & Turner, G. F. Ancient hybridization and phenotypic novelty within Lake Malawi’s cichlid fish radiation. Mol. Biol. Evol. 29, 195–206 (2012).
  18. Keller, I. et al. Population genomic signatures of divergent adaptation, gene flow and hybrid speciation in the rapid radiation of Lake Victoria cichlid fishes. Mol. Ecol. 22, 2848–2863 (2013).
  19. Meier, J. I. et al. Demographic modelling with whole-genome data reveals parallel origin of similar Pundamilia cichlid species after hybridization. Mol. Ecol. 26, 123–141 (2017).
  20. Schliewen, U. K. & Klee, B. Reticulate sympatric speciation in Cameroonian crater lake cichlids. Front. Zool. 1, 5 (2004).
  21. Seehausen, O. Conditions when hybridization might predispose populations for adaptive radiation. J. Evol. Biol. 26, 279–281 (2013).
  22. Schumer, M., Cui, R., Rosenthal, G. G. & Andolfatto, P. Reproductive isolation of hybrid populations driven by genetic incompatibilities. PLoS Genet. 11, e1005041 (2015).
  23. Hermansen, J. S. et al. Hybrid speciation through sorting of parental incompatibilities in Italian sparrows. Mol. Ecol. 23, 5831–5842 (2014).
  24. Barrier, M., Baldwin, B. G., Robichaux, R. H. & Purugganan, M. D. Interspecific hybrid ancestry of a plant adaptive radiation: allopolyploidy of the Hawaiian silversword alliance (Asteraceae) inferred from floral homeotic gene duplications. Mol. Biol. Evol. 16, 1105–1113 (1999).
  25. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).
  26. Hudson, A. G., Vonlanthen, P. & Seehausen, O. Rapid parallel adaptive radiations from a single hybridogenic ancestral population. Proc. R. Soc. B 278, 58–66 (2011).
  27. Joyce, D. A. et al. Repeated colonization and hybridization in Lake Malawi cichlids. Curr. Biol. 21, R108–R109 (2011).
  28. Lindqvist, C., Motley, T. J., Jeffrey, J. J. & Albert, V. A. Cladogenesis and reticulation in the Hawaiian endemic mints (Lamiaceae). Cladistics 19, 480–495 (2003).
  29. Lindqvist, C. & Albert, V. A. Origin of the Hawaiian endemic mints within North American Stachys (Lamiaceae). Am. J. Bot. 89, 1709–1724 (2002).
  30. Baldwin, B. G. & Wagner, W. L. Hawaiian angiosperm radiations of North American origin. Ann. Bot. 9, 849–879 (2010).
  31. Verheyen, E., Salzburger, W., Snoeks, J. & Meyer, A. Origin of the superflock of cichlid fishes from Lake Victoria, East Africa. Science 300, 325–329 (2003).
  32. Seehausen, O. African cichlid fish: a model system in adaptive radiation research. Proc. R. Soc. B 273, 1987–1998 (2006).
  33. Genner, M. J. et al. Age of cichlids: New dates for ancient lake fish radiations. Mol. Biol. Evol. 24, 1269–1282 (2007).
  34. Bezault, E., Mwaiko, S. & Seehausen, O. Population genomic tests of models of adaptive radiation in Lake Victoria Region cichlid fish. Evolution 65, 3381–3397 (2011).
  35. Johnson, T. C., Kelts, K. & Odada, E. The Holocene history of Lake Victoria. Ambio 29, 2–11 (2000).
  36. Wagner, C. E. et al. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol. 22, 787–798 (2013).
  37. Seehausen, O. et al. Nuclear markers reveal unexpected genetic variation and a Congolese-Nilotic origin of the Lake Victoria cichlid species flock. Proc. R. Soc. Lond. B Biol. Sci. 270, 129–137 (2003).
  38. Stelkens, R. B., Young, K. A. & Seehausen, O. The accumulation of reproductive incompatibilities in African cichlid fish. Evolution 64, 617–632 (2010).
  39. Seehausen, O. et al. Speciation through sensory drive in cichlid fish. Nature 455, 620–626 (2008).
  40. Seehausen, O., van Alphen, J. & Witte, F. Can ancient colour polymorphisms explain why some cichlid lineages speciate rapidly under disruptive sexual selection? Belg. J. Zool. 129, 43–60 (1999).
  41. Meyer, B. S. et al. Back to Tanganyika: a case of recent trans-species-flock dispersal in East African haplochromine cichlid fishes. R. Soc. Open Sci. 2, 140498 (2015).
  42. Greenwood, P. H. Towards a phyletic classification of the ‘genus’ Haplochromis (Pisces, Cichlidae) and related taxa. Part 1. Bull. Br. Mus. Nat. Hist. Zool. 35, 265–322 (1979).
  43. Hermann, C. M., Sefc, K. M. & Koblmüller, S. Ancient origin and recent divergence of a haplochromine cichlid lineage from isolated water bodies in the East African Rift system. J. Fish Biol. 79, 1356–1369 (2011).
  44. Maddison, W. P. Gene trees in species trees. Syst. Biol. 46, 523–536 (1997).
  45. Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A. & RoyChoudhury, A. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 29, 1917–1932 (2012).
  46. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
  47. Eaton, D. A. R. & Ree, R. H. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. 62, 689–706 (2013).
  48. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
  49. Terai, Y. et al. Divergent selection on opsins drives incipient speciation in Lake Victoria cichlids. PLoS Biol. 4, 2244–2251 (2006).
  50. Brawand, D. et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature 513, 375–381 (2014).
  51. Terai, Y., Mayer, W. E., Klein, J., Tichy, H. & Okada, N. The effect of selection on a long wavelength-sensitive (LWS) opsin gene of Lake Victoria cichlid fishes. Proc. Natl Acad. Sci. USA 99, 15501–15506 (2002).
  52. Okullo, W. et al. Parameterization of the inherent optical properties of Murchison Bay, Lake Victoria. Appl. Opt. 46, 8553–8561 (2007).
  53. Maan, M. E., Hofker, K. D., van Alphen, J. J. M. & Seehausen, O. Sensory drive in cichlid speciation. Am. Nat. 167, 947–954 (2006).
  54. Miyagi, R. et al. Correlation between nuptial colors and visual sensitivities tuned by opsins leads to species richness in sympatric Lake Victoria cichlid fishes. Mol. Biol. Evol. 29, 3281–3296 (2012).
  55. Selz, O. M., Pierotti, M. E. R., Maan, M. E., Schmid, C. & Seehausen, O. Female preference for male color is necessary and sufficient for assortative mating in 2 cichlid sister species. Behav. Ecol. 25, 612–626 (2014).
  56. Stelkens, R. B., Schmid, C. & Seehausen, O. Hybrid breakdown in cichlid fish. PLoS ONE 10, e0127207 (2015).
  57. Stelkens, R. B., Schmid, C., Selz, O. & Seehausen, O. Phenotypic novelty in experimental hybrids is predicted by the genetic distance between species of cichlid fish. BMC Evol. Biol. 9, 283–295 (2009).
  58. Sambrook, J. & Russell, D. W. Molecular Cloning: A Laboratory Manual 3rd edn (Cold Spring Harbor Laboratory Press, 2001).
  59. Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3, e3376 (2008).
  60. Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).
  61. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  62. McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  63. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  64. Kocher, T. D., Conroy, J. A., McKaye, K. R., Stauffer, J. R. & Lockwood, S. F. Evolution of NADH dehydrogenase subunit 2 in east African cichlid fish. Mol. Phylogenet. Evol. 4, 420–432 (1995).
  65. Kocher, T. D. et al. Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proc. Natl Acad. Sci. USA 86, 6196–6200 (1989).
  66. Hall, T. A. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41, 95–98 (1999).
  67. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
  68. Stamatakis, A. et al. RAxML-Light: a tool for computing terabyte phylogenies. Bioinformatics 28, 2064–2066 (2012).
  69. Popescu, A. A., Huber, K. T. & Paradis, E. ape 3.0: new tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics 28, 1536–1537 (2012).
  70. Bouckaert, R. R. DensiTree: making sense of sets of phylogenetic trees. Bioinformatics 26, 1372–1373 (2010).
  71. Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, e1003537 (2014).
  72. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
  73. Reich, D., Thangaraj, K., Patterson, N., Price, A. L. & Singh, L. Reconstructing Indian population history. Nature 461, 489–494 (2009).
  74. Carleton, K. L. & Kocher, T. D. Cone opsin genes of African cichlid fishes: tuning spectral sensitivity by differential gene expression. Mol. Biol. Evol. 18, 1540–1550 (2001).
  75. Excoffier, L. & Lischer, H. E. L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010).
  76. Kozarewa, I. et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat. Methods 6, 291–295 (2009).
  77. McGee, M. D., Neches, R. Y. & Seehausen, O. Evaluating genomic divergence and parallelism in replicate ecomorphs from young and old cichlid adaptive radiations. Mol. Ecol. 25, 260–268 (2016).
  78. Martin, S. H., Davey, J. W. & Jiggins, C. D. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244–257 (2015).
  79. Racimo, F., Sankararaman, S., Nielsen, R. & Huerta-Sánchez, E. Evidence for archaic adaptive introgression in humans. Nat. Rev. Genet. 16, 359–371 (2015).
  80. Lehner, B., Verdin, K. & Jarvis, A. New global hydrography derived from spaceborne elevation data. EOS 89, 93–94 (2008).
