2021-03-31 Debugging the Speech-Transformer project on AI Studio

Run directly in the notebook:

!echo $PYTHONPATH
%cd /home/aistudio/Speech-Transformer/egs/aishell
!python /home/aistudio/Speech-Transformer/src/bin/train.py \
--train-json dump/train/deltafalse/data_simplify.json \
--valid-json dump/dev/deltafalse/data_simplify.json \
--dict data/lang_1char/train_chars.txt \
--LFR_m 1 --LFR_n 1 \
--d_input 80 --n_layers_enc 1 --n_head 2 --d_k 64 --d_v 64 \
--d_model 256 --d_inner 512 --dropout 0.1 --pe_maxlen 5000 \
--d_word_vec 256 --n_layers_dec 1 --tgt_emb_prj_weight_sharing 1 \
--label_smoothing 0.1 \
--epochs 60 --shuffle 1 \
--batch-size 64 --batch_frames 0 \
--maxlen-in 800 --maxlen-out 150 \
--num-workers 4 --k 0.2 --warmup_steps 4000 \
--save-folder exp/train_result \
--checkpoint 0 --continue-from "" \
--print-freq 10 --visdom 0 --visdom_lr 0 --visdom_epoch 0 --visdom-id "Transformer Training"

Run in the terminal:

cd Speech-Transformer/egs/aishell/
source path.sh
python /home/aistudio/Speech-Transformer/src/bin/train.py --train-json dump/train/deltafalse/data_simplify.json --valid-json dump/dev/deltafalse/data_simplify.json --dict data/lang_1char/train_chars.txt --LFR_m 1 --LFR_n 1 --d_input 80 --n_layers_enc 1 --n_head 2 --d_k 64 --d_v 64 --d_model 256 --d_inner 512 --dropout 0.1 --pe_maxlen 5000 --d_word_vec 256 --n_layers_dec 1 --tgt_emb_prj_weight_sharing 1 --label_smoothing 0.1 --epochs 60 --shuffle 1 --batch-size 64 --batch_frames 0 --maxlen-in 800 --maxlen-out 150 --num-workers 4 --k 0.2 --warmup_steps 4000 --save-folder exp/train_result --checkpoint 0 --continue-from "" --print-freq 10 --visdom 0 --visdom_lr 0 --visdom_epoch 0 --visdom-id "Transformer Training"

Install the debugger ipdb (colorized output; the default pdb has none):

pip install ipdb -i https://pypi.tuna.tsinghua.edu.cn/simple

How to set a breakpoint:

import ipdb

...
ipdb.set_trace()  # --> insert this line where you want the breakpoint
...

Execution then drops into the ipdb debugger.


Common commands:

  • n  execute the next line
  • q  quit the debugger
  • l  list the source around the current line
  • p  print a variable's value
  • a  print the current function's arguments

Running the training script then failed with the following DataLoader exception:

2021-03-31 17:28:33,902 - WARNING - DataLoader reader thread raised an exception.
Traceback (most recent call last):
  File "/home/aistudio/Speech-Transformer/src/bin/train.py", line 181, in <module>
    main(args)
  File "/home/aistudio/Speech-Transformer/src/bin/train.py", line 175, in main
    solver.train()
  File "/home/aistudio/Speech-Transformer/src/solver/solver.py", line 80, in train
    tr_avg_loss = self._run_one_epoch(epoch)
  File "/home/aistudio/Speech-Transformer/src/solver/solver.py", line 176, in _run_one_epoch
    for i, (data) in enumerate(data_loader):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 351, in __next__
    return self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:158)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 346, in _thread_loop
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 317, in _thread_loop
    batch = self._dataset_fetcher.fetch(indices)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 65, in fetch
    data = self.collate_fn(data)
TypeError: 'tuple' object is not callable
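
The TypeError at the bottom of the traceback means Paddle's fetcher received a tuple where a callable collate_fn was expected (a stray trailing comma when assigning it is one common cause). A minimal stand-alone reproduction with hypothetical names, not the project's actual code:

```python
def collate(batch):
    # Hypothetical collate function: transpose a batch of (feature, label) pairs.
    return list(zip(*batch))

collate_fn = (collate,)  # trailing comma accidentally makes this a tuple
batch = [(1, "a"), (2, "b")]
try:
    collate_fn(batch)            # same failure mode as the traceback above
except TypeError as err:
    print(err)                   # 'tuple' object is not callable
print(collate_fn[0](batch))      # unwrapping the tuple restores the call
```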

Because the source code is built on PyTorch, porting it to Paddle requires replacing some functions:

Problem 1: the step() method for a single optimization step

self.optimizer.step()
AttributeError: 'Adam' object has no attribute 'step'

Check the class paddle.optimizer.Adam


Rename the corresponding arguments:

paddle.optimizer.Adam(parameters=model.parameters(), beta1=0.9, beta2=0.98, epsilon=1e-09)
self.optimizer.clear_grad()
# self.optimizer.zero_grad()
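
For reference, the renames hit so far can be collected in one place (my own summary of this log, not an official or exhaustive mapping):

```python
# PyTorch -> Paddle renames encountered while porting (only those used here).
TORCH_TO_PADDLE = {
    "params": "parameters",                        # Adam constructor argument
    "betas=(0.9, 0.98)": "beta1=0.9, beta2=0.98",  # tuple split into two args
    "eps": "epsilon",
    "optimizer.zero_grad()": "optimizer.clear_grad()",
}
for torch_api, paddle_api in TORCH_TO_PADDLE.items():
    print(f"{torch_api:24} -> {paddle_api}")
```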

Change item() --> numpy().item():

loss.numpy().item()
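
Since Paddle's Tensor.numpy() returns a NumPy ndarray, the extra .numpy() step can be sanity-checked with NumPy alone (a sketch using an ndarray as a stand-in for the loss tensor):

```python
import numpy as np

loss_np = np.array(0.25, dtype=np.float32)  # stand-in for loss.numpy()
value = loss_np.item()                      # plain Python float, as in loss.numpy().item()
print(value, type(value))                   # 0.25 <class 'float'>
```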

With num_work = 0 and tgt_emb_prj = 0, training now runs successfully.

With num_work > 0 it still errors out.
