Python in Practice | Converting an SRT Subtitle File to a TXT Text File

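Watching films or TV shows in a foreign language is a useful way to learn that language, and subtitle files (SRT files) can usually be found on subtitle sites. These files are hard to read as plain text, however, because every entry carries a sequence number and a timestamp. The code in this post converts an SRT subtitle file into a plain TXT text file.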

Opened in a text viewer, an SRT subtitle file looks like this:

172
00:11:20,639 --> 00:11:24,393
To try to quote Ellen Yindel's
outstanding record in the time I have...

173
00:11:24,560 --> 00:11:26,103
would do her a disservice.

174
00:11:26,270 --> 00:11:29,190
Instead I offer the new commissioner
my sympathy...

175
00:11:29,357 --> 00:11:32,526
knowing the impossible job
she is about to face.
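
Each entry in the sample above has the same shape: a sequence number, a timestamp line, and one or more lines of text, with a blank line between entries. As a minimal sketch of that structure (the helper name `parse_block` and the regular expression are assumptions made for illustration, inferred from the sample rather than from any SRT specification):

```python
import re

# Assumed pattern for the timestamp line, based on the sample above:
# "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIMESTAMP_RE = re.compile(
    r'(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})'
)

def parse_block(block):
    """Split one subtitle entry into (index, start, end, text_lines).

    Hypothetical helper for illustration only.
    """
    lines = block.strip().splitlines()
    index = int(lines[0])                  # first line: sequence number
    match = TIMESTAMP_RE.match(lines[1])   # second line: timestamps
    if match is None:
        raise ValueError('not a timestamp line: ' + repr(lines[1]))
    start, end = match.groups()
    return index, start, end, lines[2:]    # remaining lines: subtitle text

sample = """172
00:11:20,639 --> 00:11:24,393
To try to quote Ellen Yindel's
outstanding record in the time I have..."""

index, start, end, text_lines = parse_block(sample)
print(index, start, end)
print(text_lines)
```

Only the last element, the text lines, is needed for the conversion; the sequence number and timestamps are what gets discarded.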

What we would like to see instead is a text file like this:

To try to quote Ellen Yindel's outstanding record in the time I have...
would do her a disservice.
Instead I offer the new commissioner my sympathy...
knowing the impossible job she is about to face.

The following Python code converts an SRT subtitle file into a TXT text file:

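INDEX, TIMESTAMP, TEXT = 1, 2, 3  # which line of a subtitle entry comes next
state = INDEX
parts = []   # text lines of the current subtitle entry
lines = []   # one joined line per subtitle entry
# utf-8-sig strips the \ufeff (BOM) at the start of the file, if present
with open('test1.srt', 'r', encoding='utf-8-sig') as f:
    for line in f:  # iterate over the srt file line by line
        line = line.strip()
        if state == INDEX:        # skip the sequence number
            if line:              # tolerate extra blank lines between entries
                state = TIMESTAMP
        elif state == TIMESTAMP:  # skip the timestamp line
            state = TEXT
        elif line:                # subtitle text: collect it
            parts.append(line)
        else:                     # a blank line ends the entry
            lines.append(' '.join(parts))  # join the text of one time span
            parts = []
            state = INDEX
if parts:  # the last entry may not be followed by a blank line
    lines.append(' '.join(parts))
with open('test1.txt', 'w', encoding='utf-8') as out:  # write the txt file once
    out.writelines(text + '\n' for text in lines)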
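
The same idea can also be wrapped in a function that works on a string, which makes it easy to try out without touching the file system. The following is a sketch rather than the script above: it splits the content on blank lines instead of tracking a state, and the name `srt_to_text` is a hypothetical one chosen for illustration. It assumes every entry has a sequence number on its first line and a timestamp on its second, as in the sample shown earlier.

```python
import re

def srt_to_text(srt_content):
    """Return the subtitle text, one line per entry (hypothetical helper).

    Assumes each entry starts with a sequence-number line followed by a
    timestamp line; everything after those two lines is treated as text.
    """
    srt_content = srt_content.lstrip('\ufeff')  # drop a BOM if still present
    result = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r'\n\s*\n', srt_content.strip()):
        lines = [line.strip() for line in block.splitlines()]
        text_lines = [line for line in lines[2:] if line]
        if text_lines:
            result.append(' '.join(text_lines))
    return '\n'.join(result)

sample = (
    "173\n"
    "00:11:24,560 --> 00:11:26,103\n"
    "would do her a disservice.\n"
    "\n"
    "174\n"
    "00:11:26,270 --> 00:11:29,190\n"
    "Instead I offer the new commissioner\n"
    "my sympathy...\n"
)
print(srt_to_text(sample))
```

For a real file, the content could be read with `open('test1.srt', encoding='utf-8-sig')` and passed to the function via `f.read()`; for well-formed files this should give the same text as the script.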
