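Correction: the template-selection rule in the previous issue should read: the higher the resolution (that is, the smaller the value in Å), the more clearly the structure is defined.
Thanks to 怡然Rosy for pointing out this mistake in the previous article.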
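The corrected rule is easy to get backwards, so here it is as a tiny sketch. It ranks candidate templates so that the smallest resolution value, the sharpest structure, comes first. The template codes and resolutions are made up for illustration. In practice, resolution is only one criterion alongside sequence identity and coverage.

```python
# Hypothetical candidates: (template code, resolution in Å).
# Codes and values are made up for illustration only.
candidates = [("tplA", 2.8), ("tplB", 1.9), ("tplC", 2.3)]


def rank_by_resolution(templates):
    """Sort so the smallest resolution value (highest resolution) comes first."""
    return sorted(templates, key=lambda t: t[1])


for code, res in rank_by_resolution(candidates):
    print(f"{code}: {res:.1f} Å")
```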
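When sequence identity between the target and the template is low, a single template is far from sufficient, and this is where multi-template modeling comes in. An important goal of modeling is to help us understand the function of the modeled protein. Looking back at the model from the previous issue, its quality is poor around residues 93-100. This is a problem because it is a very important region of the enzyme, yet it is disordered in the template, so the PDB structure contains no coordinates for it. The most likely reason is that this long active-site loop is flexible in the absence of a ligand, so it could not be resolved in the electron density map.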
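To address this, MODELLER offers three solutions:
1. Multi-template modeling
2. Ab initio modeling of the loop [covered in a later issue, stay tuned]
3. Ligand-based homology modeling [covered in a later issue, stay tuned]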
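You can check whether a single-template alignment shows the low-identity and missing-region situation described above. The minimal pure-Python sketch below takes two aligned rows, target and template, with `-` marking gaps. It reports the percent identity over positions where both rows have a residue. It also lists the target residue ranges that have no template residue, like the disordered 93-100 loop. The toy sequences are made up. Residue numbers here count from the first target residue in the alignment, which may differ from PDB numbering. The identity definition is one of several in common use.

```python
def identity_and_gaps(target, template):
    """Compare two aligned sequences of equal length ('-' marks a gap).

    Returns (percent identity over positions where both have a residue,
    list of 1-based target residue ranges not covered by the template).
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    same = aligned = 0
    res_num = 0          # running target residue number
    uncovered = []       # [start, end] runs of target residues with no template residue
    for t, s in zip(target, template):
        if t == "-":
            continue     # template-only column: no target residue here
        res_num += 1
        if s == "-":
            if uncovered and uncovered[-1][1] == res_num - 1:
                uncovered[-1][1] = res_num          # extend the current run
            else:
                uncovered.append([res_num, res_num])
        else:
            aligned += 1
            if t == s:
                same += 1
    pct = 100.0 * same / aligned if aligned else 0.0
    return pct, [tuple(r) for r in uncovered]


# Made-up toy alignment:
pct, gaps = identity_and_gaps("MKV-LLAGHT", "MKIALL--HT")
print(f"identity {pct:.1f}%, uncovered target residues: {gaps}")
```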
Multiple templates
First, use salign() to build a multiple structural alignment of the templates:
```python
# Illustrates the SALIGN multiple structure/sequence alignment
from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:../atom_files/'

aln = alignment(env)
# Replace the placeholders with your template PDB codes and chain IDs
for (code, chain) in (('template_pdb1', 'chain1'), ('template_pdb2', 'chain2'),
                      ('template_pdb3', 'chain3')):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

for (weights, write_fit, whole) in (((1., 0., 0., 0., 1., 0.), False, True),
                                    ((1., 0.5, 1., 1., 1., 0.), False, True),
                                    ((1., 1., 1., 1., 1., 0.), True, False)):
    aln.salign(rms_cutoff=3.5, normalize_pp_scores=False,
               rr_file='$(LIB)/as1.sim.mat', overhang=30,
               gap_penalties_1d=(-450, -50),
               gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
               dendrogram_file='fm00495.tree',
               alignment_type='tree',    # If 'progressive', the tree is not
                                         # computed and all structures will be
                                         # aligned sequentially to the first
               feature_weights=weights,  # For a multiple sequence alignment only
                                         # the first feature needs to be non-zero
               improve_alignment=True, fit=True, write_fit=write_fit,
               write_whole_pdb=whole, output='ALIGNMENT QUALITY')

# Output files in PAP and PIR (ALI) format; choose your own names
aln.write(file='templates.pap', alignment_format='PAP')
aln.write(file='templates.ali', alignment_format='PIR')

aln.salign(rms_cutoff=1.0, normalize_pp_scores=False,
           rr_file='$(LIB)/as1.sim.mat', overhang=30,
           gap_penalties_1d=(-450, -50), gap_penalties_3d=(0, 3),
           gap_gap_score=0, gap_residue_score=0, dendrogram_file='1is3A.tree',
           alignment_type='progressive', feature_weights=[0]*6,
           improve_alignment=False, fit=False, write_fit=True,
           write_whole_pdb=False, output='QUALITY')
```
File: script1.py
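<font color="red" >Note: every .py script below is run with the command `mod<version> script.py`, i.e. `mod` followed by your MODELLER version number, then the script name.</font>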
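Every step in this issue is run the same way, so a small wrapper can build and run the `mod<version> script.py` commands in order. The sketch below assumes this. The version string is whatever matches your installed `mod` executable; "X.Y" is a placeholder. The script names follow this article's script1/2/3 labels and are hypothetical. By default, the sketch only prints the commands (a dry run).

```python
import subprocess


def modeller_command(version, script):
    """Build the command line 'mod<version> <script>' described in the note."""
    return [f"mod{version}", script]


def run_scripts(version, scripts, dry_run=True):
    """Run MODELLER scripts in order; with dry_run=True only print the commands."""
    cmds = [modeller_command(version, s) for s in scripts]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise and stop if a step fails
    return cmds


# Hypothetical version placeholder and script names:
run_scripts("X.Y", ["script1.py", "script2.py", "script3.py"])
```

Running the steps sequentially with `check=True` matters here, because each script reads the alignment file written by the previous one.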
Example:
```python
# Illustrates the SALIGN multiple structure/sequence alignment
from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:../atom_files/'

aln = alignment(env)
for (code, chain) in (('2mdh', 'A'), ('1bdm', 'A'), ('1b8p', 'A')):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

for (weights, write_fit, whole) in (((1., 0., 0., 0., 1., 0.), False, True),
                                    ((1., 0.5, 1., 1., 1., 0.), False, True),
                                    ((1., 1., 1., 1., 1., 0.), True, False)):
    aln.salign(rms_cutoff=3.5, normalize_pp_scores=False,
               rr_file='$(LIB)/as1.sim.mat', overhang=30,
               gap_penalties_1d=(-450, -50),
               gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
               dendrogram_file='fm00495.tree',
               alignment_type='tree',    # If 'progressive', the tree is not
                                         # computed and all structures will be
                                         # aligned sequentially to the first
               feature_weights=weights,  # For a multiple sequence alignment only
                                         # the first feature needs to be non-zero
               improve_alignment=True, fit=True, write_fit=write_fit,
               write_whole_pdb=whole, output='ALIGNMENT QUALITY')

aln.write(file='fm00495.pap', alignment_format='PAP')
aln.write(file='fm00495.ali', alignment_format='PIR')

aln.salign(rms_cutoff=1.0, normalize_pp_scores=False,
           rr_file='$(LIB)/as1.sim.mat', overhang=30,
           gap_penalties_1d=(-450, -50), gap_penalties_3d=(0, 3),
           gap_gap_score=0, gap_residue_score=0, dendrogram_file='1is3A.tree',
           alignment_type='progressive', feature_weights=[0]*6,
           improve_alignment=False, fit=False, write_fit=True,
           write_whole_pdb=False, output='QUALITY')
```
File: multiple_template/salign.py
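We will go through the individual parameters in a later chapter. In outline, a loop first adds every template structure to the `aln` alignment object. The alignment is then built in three rounds. The first round produces a rough alignment, and the later rounds refine it by giving more structural features non-zero weights. The result is written out in PAP and ALI (PIR) format. A final salign() call then uses a 1.0 Å RMS cutoff, sets all feature weights to 0 and passes improve_alignment=False. This call reports the quality of the resulting alignment (output='QUALITY') rather than realigning it.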
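To make the three-round refinement concrete, the sketch below prints which features are switched on in each round of the loop above. The feature names are an assumption based on our reading of the SALIGN documentation (feature 1 is residue type, as the script's own comment implies). Verify the order against the manual for your MODELLER version before relying on it.

```python
# Feature order as we understand the SALIGN documentation (an assumption;
# check the manual for your MODELLER version).
FEATURES = ("residue type", "inter-atomic distance", "solvent accessibility",
            "secondary structure", "local conformation", "user-defined")

# The (weights, write_fit, whole) rounds used in the salign loop above.
ROUNDS = (((1., 0., 0., 0., 1., 0.), False, True),
          ((1., 0.5, 1., 1., 1., 0.), False, True),
          ((1., 1., 1., 1., 1., 0.), True, False))


def active_features(weights):
    """Names of the features that have a non-zero weight."""
    return [name for name, w in zip(FEATURES, weights) if w != 0]


for i, (weights, write_fit, whole) in enumerate(ROUNDS, start=1):
    print(f"round {i}: {', '.join(active_features(weights))} "
          f"(write_fit={write_fit}, write_whole_pdb={whole})")
```

Each round keeps the previous features and adds more structural information, which is the "rough first, then refine" idea described above.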
Next, align the target sequence to this multiple template alignment:
```python
from modeller import *

log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')

# Read aligned structure(s):
aln = alignment(env)
aln.append(file='templates.ali', align_codes='all')  # ALI file written by the previous script
aln_block = len(aln)

# Read aligned sequence(s):
aln.append(file='target.ali', align_codes='TARGET')  # target sequence file and its align code

# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign(output='', max_gap_length=20,
           gap_function=True,  # to use structure-dependent gap penalty
           alignment_type='PAIRWISE', align_block=aln_block,
           feature_weights=(1., 0., 0., 0., 0., 0.), overhang=0,
           gap_penalties_1d=(-450, 0),
           gap_penalties_2d=(0.35, 1.2, 0.9, 1.2, 0.6, 8.6, 1.2, 0., 0.),
           similarity_flag=True)

# Output alignment in PIR (ALI) and PAP format
aln.write(file='target-mult.ali', alignment_format='PIR')
aln.write(file='target-mult.pap', alignment_format='PAP')
```
File: script2.py
Example
```python
from modeller import *

log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')

# Read aligned structure(s):
aln = alignment(env)
aln.append(file='fm00495.ali', align_codes='all')
aln_block = len(aln)

# Read aligned sequence(s):
aln.append(file='TvLDH.ali', align_codes='TvLDH')

# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign(output='', max_gap_length=20,
           gap_function=True,  # to use structure-dependent gap penalty
           alignment_type='PAIRWISE', align_block=aln_block,
           feature_weights=(1., 0., 0., 0., 0., 0.), overhang=0,
           gap_penalties_1d=(-450, 0),
           gap_penalties_2d=(0.35, 1.2, 0.9, 1.2, 0.6, 8.6, 1.2, 0., 0.),
           similarity_flag=True)

aln.write(file='TvLDH-mult.ali', alignment_format='PIR')
aln.write(file='TvLDH-mult.pap', alignment_format='PAP')
```
We will analyze these parameters in detail later.
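The second script appends the target from its own ALI file, which you have to prepare yourself. The sketch below formats a sequence as a PIR "sequence" entry in the layout used by the MODELLER tutorial's TvLDH.ali file. The entry has a `>P1;code` line and a ten-field, colon-separated description line whose structure fields are left empty. Then comes the sequence, terminated by `*`. The align code `TARGET`, the sequence and the file name `target.ali` are hypothetical.

```python
def pir_sequence_entry(code, sequence, width=75):
    """Format a target sequence as a MODELLER PIR/ALI 'sequence' entry.

    Line 1: '>P1;' + align code. Line 2: description with ten colon-separated
    fields (no structure file, so most are empty). Then the sequence,
    wrapped to `width` characters and terminated by '*'.
    """
    seq = "".join(sequence.split()).upper() + "*"   # drop whitespace, add terminator
    lines = [f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


entry = pir_sequence_entry("TARGET", "mkv llag ht")  # made-up sequence
print(entry)
# with open("target.ali", "w") as f:   # hypothetical file name
#     f.write(entry)
```

The align code on the `>P1;` line is what you pass later as `align_codes=` in salign and as `sequence=` in automodel, so keep them identical.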
As in basic modeling, once the target sequence has been aligned with the templates, the models can be built:
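```python
from modeller import *
from modeller.automodel import *

env = environ()
a = automodel(env, alnfile='target-mult.ali',            # ALI alignment written by the previous script
              knowns=('template1', 'template2', 'template3'),  # template align codes (PDB code + chain)
              sequence='TARGET')                          # align code of the target sequence
a.starting_model = 1   # index of the first model to build
a.ending_model = 5     # index of the last model to build
a.make()
```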
File: script3.py
Example
```python
from modeller import *
from modeller.automodel import *

env = environ()
a = automodel(env, alnfile='TvLDH-mult.ali',
              knowns=('1bdmA', '2mdhA', '1b8pA'), sequence='TvLDH')
a.starting_model = 1
a.ending_model = 5
a.make()
```
File: multiple_template/model_mult.py
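A common reason automodel fails is a mismatch between the names in `knowns` or `sequence` and the align codes in the ALI file. The pure-Python sketch below reads the `>P1;` entries of a PIR-format alignment and lists any requested code that is missing. It skips the description line and PIR `C;` comment lines, and reads each sequence up to its `*`. The sample alignment and the codes `tplA`, `tplB` and `TARGET` are made up. Note also that `('1bdmA')` in Python is just a string; a one-element tuple needs a trailing comma, `('1bdmA',)`.

```python
def read_pir_codes(text):
    """Return {align_code: sequence} from PIR/ALI-format text."""
    entries = {}
    code, desc_seen, seq = None, False, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("C;"):
            continue                      # blank lines and PIR comments
        if line.startswith(">P1;"):
            code, desc_seen, seq = line[4:].strip(), False, []
        elif code is None:
            continue                      # text outside any entry
        elif not desc_seen:
            desc_seen = True              # skip the structure/sequence description line
        else:
            seq.append(line)
            if line.endswith("*"):        # '*' terminates the sequence
                entries[code] = "".join(seq)[:-1]
                code = None
    return entries


def check_automodel_codes(text, knowns, sequence):
    """List the align codes that automodel would not find in the alignment."""
    codes = read_pir_codes(text)
    wanted = [knowns] if isinstance(knowns, str) else list(knowns)
    return [c for c in wanted + [sequence] if c not in codes]


# Made-up two-entry alignment:
SAMPLE = """>P1;tplA
structureX:tplA:FIRST:A:LAST:A::::
MKV-LL*

>P1;TARGET
sequence:TARGET:::::::0.00: 0.00
MKVALL*
"""
print(check_automodel_codes(SAMPLE, ("tplA", "tplB"), "TARGET"))  # ['tplB']
```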
Finally, evaluate the model with DOPE, as in the previous issue.
Example
```python
from modeller import *
from modeller.scripts import complete_pdb

log.verbose()    # request verbose output
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')  # read topology
env.libs.parameters.read(file='$(LIB)/par.lib')     # read parameters

# read model file
mdl = complete_pdb(env, 'TvLDH.B99990001.pdb')

# Assess all atoms with DOPE:
s = selection(mdl)
s.assess_dope(output='ENERGY_PROFILE NO_REPORT', file='TvLDH.profile',
              normalize_profile=True, smoothing_window=15)
```
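To put a number on the change around residues 93-100 rather than judging it by eye, you can compare two DOPE energy profiles. The sketch below reads each profile the way the MODELLER tutorial's plotting script does, as we recall it: it skips `#` comment lines and short lines and takes the last column as the per-residue value. It then averages each profile over a residue window, where lower DOPE energy is better. It assumes that both profiles come from models of the same target sequence, so residue i refers to the same residue in both. The file name of the earlier single-template profile is hypothetical.

```python
def read_dope_profile(text):
    """Per-residue values from an energy-profile file (last column of data lines)."""
    vals = []
    for line in text.splitlines():
        if not line.startswith("#") and len(line) > 10:
            vals.append(float(line.split()[-1]))
    return vals


def mean_over(values, start, end):
    """Mean of 1-based residues start..end (inclusive)."""
    window = values[start - 1:end]
    if not window:
        raise ValueError("residue range lies outside the profile")
    return sum(window) / len(window)


def compare_region(before, after, start=93, end=100):
    """Return (mean before, mean after) over residues start..end; lower is better."""
    return mean_over(before, start, end), mean_over(after, start, end)


# Usage (the single-template file name is hypothetical; TvLDH.profile is
# written by the script above):
# with open("single_template.profile") as f:
#     before = read_dope_profile(f.read())
# with open("TvLDH.profile") as f:
#     after = read_dope_profile(f.read())
# print(compare_region(before, after))
```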
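The model quality in the key region has improved: the DOPE score for this region is lower than in the single-template model. In the next issue we will cover the other two solutions that MODELLER offers. Stay tuned.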