Keras learning notes - SimpleRNN with Keras — generating text

SimpleRNN with Keras — generating text

I originally wrote this in English; since most of it is code, I didn't bother translating it into Chinese :)

RNNs have been used extensively by the natural language processing (NLP) community for various applications. One such application is building language models. A language model allows us to predict the probability of a word in a text given the previous words. Language models are important for various higher level tasks such as machine translation, spelling correction, and so on.
A side effect of the ability to predict the next word given previous words is a generative model that allows us to generate text by sampling from the output probabilities. In language modeling, our input is typically a sequence of words and the output is a sequence of predicted words. The training data used is existing unlabeled text, where we set the label y_t at time t to be the input x_(t+1) at time t+1.
For our first example of using Keras for building RNNs, we will train a character-based language model on the text of Alice in Wonderland to predict the next character given the 10 previous characters.
We have chosen to build a character-based model here because it has a smaller vocabulary and trains quicker. The idea is the same as using a word-based language model, except we use characters instead of words. We will then use the trained model to generate some text in the same style.
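To make this concrete, the following toy snippet (not from the original notebook) shows the sliding-window idea on a short string with a window of 3 characters; the actual code further below does the same thing with SEQLEN = 10 over the whole book.

# toy illustration (added): each window of characters is the input,
# and the character immediately after the window is the label
toy_text = "wonderland"
toy_seqlen = 3
for k in range(len(toy_text) - toy_seqlen):
    print(toy_text[k:k + toy_seqlen], "->", toy_text[k + toy_seqlen])
# prints won -> d, then ond -> e, then nde -> r, and so on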

from __future__ import print_function
from keras.layers import Dense, Activation
from keras.layers.recurrent import SimpleRNN
from keras.models import Sequential
import numpy as np

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
import matplotlib.pyplot as plt
%matplotlib inline

Using TensorFlow backend.

In this case, we download Project Gutenberg's Alice's Adventures in Wonderland as the dataset. Using the RNN model, the computer can generate text.
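The next cell reads the book from data/Alice.txt, so the file must already be on disk. As a minimal sketch (the exact Project Gutenberg URL is an assumption, not part of the original notebook), the download step could look like this:

# hypothetical download step (added): fetch the plain-text ebook and cache it
# at data/Alice.txt, the path used by the cell below. The URL is an assumption.
import os
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve          # Python 2

if not os.path.isdir("data"):
    os.makedirs("data")
if not os.path.exists("data/Alice.txt"):
    urlretrieve("https://www.gutenberg.org/files/11/11-0.txt", "data/Alice.txt")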

fp = open("data/Alice.txt", 'rb')
lines = []
for line in fp:
    line = line.strip().lower()
    line = line.decode("ascii", "ignore")
    if len(line) == 0:
        continue
    lines.append(line)
fp.close()
text = " ".join(lines)
print(text[800:3000])
# print a sample of the text (characters 800 to 3000)
ing nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book, thought alice without pictures or conversations? so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her. there was nothing so very remarkable in that; nor did alice think it so very much out of the way to hear the rabbit say to itself, oh dear! oh dear! i shall be late! (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. in another moment down went alice after it, never once considering how in the world she was to get out again. the rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that alice had not a moment to think about stopping herself before she found herself falling down a very deep well. either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next. first, she tried to look down and make out what she was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they were filled with cupboards and book-shelves; here and there she saw maps and pictures hung upon pegs. she took down a jar from one of the shelves as she passed; it was labelled orange marmalade, but to her great disappointment it was empty: she did not like to drop the jar for fear of killing somebody, 
# build the character vocabulary (dictionary) from the text
chars = set([c for c in text])
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
print('Vocabulary size:', nb_chars, '\r\nVocabulary contents:\r\n', chars)
print('Character index:', char2index)
#char2index
#index2char
Vocabulary size: 55 
Vocabulary contents:
 {'l', 'a', 'i', ',', '4', '?', 'c', 'b', 'd', 'u', '-', ']', 'z', 'p', 'h', '%', ' ', 'y', ';', 'g', '2', '@', '.', ':', '6', '9', 'k', '[', 'e', '*', 'v', '!', 'r', 'n', '#', '3', '8', '1', 'o', 'w', '5', '(', '/', '7', '$', 'x', 't', ')', 'm', 'j', '0', '_', 'f', 'q', 's'}
Character index: {'l': 0, 'a': 1, 'i': 2, ',': 3, '4': 4, '?': 5, 'c': 6, 'b': 7, 'd': 8, 'u': 9, '-': 10, ']': 11, 'z': 12, 'p': 13, 'h': 14, '%': 15, ' ': 16, 'y': 17, ';': 18, 'g': 19, '2': 20, '@': 21, '.': 22, ':': 23, '6': 24, '9': 25, 'k': 26, '[': 27, 'e': 28, '*': 29, 'v': 30, '!': 31, 'r': 32, 'n': 33, '#': 34, '3': 35, '8': 36, '1': 37, 'o': 38, 'w': 39, '5': 40, '(': 41, '/': 42, '7': 43, '$': 44, 'x': 45, 't': 46, ')': 47, 'm': 48, 'j': 49, '0': 50, '_': 51, 'f': 52, 'q': 53, 's': 54}
SEQLEN = 10
STEP = 1
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])

#input_chars
#label_chars
print('input_chars:', input_chars[:100])
print('\r\nlabel_chars:', label_chars[:100])

input_chars: ['project gu', 'roject gut', 'oject gute', 'ject guten', 'ect gutenb', 'ct gutenbe', 't gutenber', ' gutenberg', 'gutenbergs', 'utenbergs ', 'tenbergs a', 'enbergs al', 'nbergs ali', 'bergs alic', 'ergs alice', 'rgs alices', 'gs alices ', 's alices a', ' alices ad', 'alices adv', 'lices adve', 'ices adven', 'ces advent', 'es adventu', 's adventur', ' adventure', 'adventures', 'dventures ', 'ventures i', 'entures in', 'ntures in ', 'tures in w', 'ures in wo', 'res in won', 'es in wond', 's in wonde', ' in wonder', 'in wonderl', 'n wonderla', ' wonderlan', 'wonderland', 'onderland,', 'nderland, ', 'derland, b', 'erland, by', 'rland, by ', 'land, by l', 'and, by le', 'nd, by lew', 'd, by lewi', ', by lewis', ' by lewis ', 'by lewis c', 'y lewis ca', ' lewis car', 'lewis carr', 'ewis carro', 'wis carrol', 'is carroll', 's carroll ', ' carroll t', 'carroll th', 'arroll thi', 'rroll this', 'roll this ', 'oll this e', 'll this eb', 'l this ebo', ' this eboo', 'this ebook', 'his ebook ', 'is ebook i', 's ebook is', ' ebook is ', 'ebook is f', 'book is fo', 'ook is for', 'ok is for ', 'k is for t', ' is for th', 'is for the', 's for the ', ' for the u', 'for the us', 'or the use', 'r the use ', ' the use o', 'the use of', 'he use of ', 'e use of a', ' use of an', 'use of any', 'se of anyo', 'e of anyon', ' of anyone', 'of anyone ', 'f anyone a', ' anyone an', 'anyone any', 'nyone anyw']

label_chars: ['t', 'e', 'n', 'b', 'e', 'r', 'g', 's', ' ', 'a', 'l', 'i', 'c', 'e', 's', ' ', 'a', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'e', 's', ' ', 'i', 'n', ' ', 'w', 'o', 'n', 'd', 'e', 'r', 'l', 'a', 'n', 'd', ',', ' ', 'b', 'y', ' ', 'l', 'e', 'w', 'i', 's', ' ', 'c', 'a', 'r', 'r', 'o', 'l', 'l', ' ', 't', 'h', 'i', 's', ' ', 'e', 'b', 'o', 'o', 'k', ' ', 'i', 's', ' ', 'f', 'o', 'r', ' ', 't', 'h', 'e', ' ', 'u', 's', 'e', ' ', 'o', 'f', ' ', 'a', 'n', 'y', 'o', 'n', 'e', ' ', 'a', 'n', 'y', 'w', 'h']
# The next step is to vectorize these input and label texts
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1
    
print('X[0]', X[0]*1)
print('y[0]', y[0]*1)
X[0] [[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
y[0] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
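As a quick sanity check (this step is not in the original notebook), X should contain one SEQLEN x nb_chars one-hot matrix per window and y one one-hot vector per window:

# added sanity check: one one-hot matrix per window, one one-hot label per window
print(X.shape)  # (158773, 10, 55) -- len(input_chars) x SEQLEN x nb_chars
print(y.shape)  # (158773, 55)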

Finally, we are ready to build our model. We define the RNN's output dimension to have a size of 128. This is a hyper-parameter that needs to be determined by experimentation. In general, if we choose too small a size, the model does not have sufficient capacity to generate good text, and you will see long runs of repeating characters or repeating word groups. On the other hand, if the value chosen is too large, the model has too many parameters and needs a lot more data to train effectively. We want to return a single character as output, not a sequence of characters, so return_sequences=False. We have already seen that the input to the RNN is of shape (SEQLEN, nb_chars). In addition, we set unroll=True because it improves performance on the TensorFlow backend. The RNN is connected to a dense (fully connected) layer. The dense layer has nb_chars units, which emit scores for each of the characters in the vocabulary. The activation on the dense layer is a softmax, which normalizes the scores to probabilities. The character with the highest probability is chosen as the prediction. We compile the model with the categorical cross-entropy loss function, a good loss function for categorical outputs, and the RMSprop optimizer:
# build our model
HIDDEN_SIZE = 128
BATCH_SIZE = 128
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100
model = Sequential()
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False, input_shape=(SEQLEN, nb_chars), unroll=True))
model.add(Dense(nb_chars))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
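To relate the capacity discussion above to concrete numbers, we can print the model summary (this call is not part of the original notebook). With HIDDEN_SIZE = 128 and nb_chars = 55, the SimpleRNN layer has (55 + 128 + 1) * 128 = 23,552 weights and the dense layer has (128 + 1) * 55 = 7,095, for roughly 30k trainable parameters in total.

# optional inspection (added): layer output shapes and parameter counts
model.summary()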

for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: %d" % (iteration))
    model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: %s" % (test_chars))
    print(test_chars, end="")
    for i in range(NUM_PREDS_PER_EPOCH):
        # one-hot encode the current seed window
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        # predict the next character and take the most probable one
        pred = model.predict(Xtest, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move the window forward: drop the first character, append the prediction
        test_chars = test_chars[1:] + ypred
    print('\r\n')

==================================================
Iteration #: 0
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 2.3445    
Generating from seed: n a fight 
n a fight he could the so the the the the the the the the the the the the the the the the the the the the the ==================================================
Iteration #: 1
Epoch 1/1
158773/158773 [==============================] - 17s - loss: 2.0517    
Generating from seed: suppose it
suppose it the was in the was in the was in the was in the was in the was in the was in the was in the was in ==================================================
Iteration #: 2
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.9478    
Generating from seed: moment to 
moment to the mouth ore the mouth ore the mouth ore the mouth ore the mouth ore the mouth ore the mouth ore th==================================================
Iteration #: 3
Epoch 1/1
158773/158773 [==============================] - 17s - loss: 1.8648    
Generating from seed: guests to 
guests to her sood to the project gutenberg-t the roust i she had the mouse to great out to the project gutenb==================================================
Iteration #: 4
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.7956    
Generating from seed:  are confi
 are confing and and and and and and and and and and and and and and and and and and and and and and and and a==================================================
Iteration #: 5
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.7410    
Generating from seed: odes that 
odes that she was a the mouse the mock turtle she was a the mouse the mock turtle she was a the mouse the mock==================================================
Iteration #: 6
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.6942    
Generating from seed: isplaying,
isplaying, and the more to the said to herself and looked the was the mack to the was the mack to the was the ==================================================
Iteration #: 7
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.6559    
Generating from seed: began. you
began. you know it the gryphon a soom alice said the gryphon a soom alice said the gryphon a soom alice said t==================================================
Iteration #: 8
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.6233    
Generating from seed: : what a c
: what a conting and be a little the project gutenberg-tm electronic works and the poor and the poor and the p==================================================
Iteration #: 9
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.5940    
Generating from seed: on the gro
on the gropped the round the round the round the round the round the round the round the round the round the r==================================================
Iteration #: 10
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.5691    
Generating from seed: again, for
again, for the mock turtle with the roust on the work of the mock turtle with the roust on the work of the moc==================================================
Iteration #: 11
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.5472    
Generating from seed: g; and the
g; and the fill have in the fill have in the fill have in the fill have in the fill have in the fill have in t==================================================
Iteration #: 12
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.5289    
Generating from seed: ot i! said
ot i! said alice, and the cours, and the cours, and the cours, and the cours, and the cours, and the cours, an==================================================
Iteration #: 13
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.5119    
Generating from seed: very sleep
very sleep of the work out of the work out of the work out of the work out of the work out of the work out of ==================================================
Iteration #: 14
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4959    
Generating from seed: im sure im
im sure important the round of the eart the mouse was so the dormouse in the mouse was so the dormouse in the ==================================================
Iteration #: 15
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4822    
Generating from seed: olent shak
olent shake the parther alice was not a canten a consing and she had the parther alice was not a canten a cons==================================================
Iteration #: 16
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4691    
Generating from seed: escopes: t
escopes: the doom the door alice was not a song the door alice was not a song the door alice was not a song th==================================================
Iteration #: 17
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4588    
Generating from seed: to execute
to executer in a lough his by the cours, who was a little began in a lough his by the cours, who was a little ==================================================
Iteration #: 18
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4481    
Generating from seed: could let 
could let the door the sure, it was the caterpillar. the come to the could he was to the could he was to the c==================================================
Iteration #: 19
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4388    
Generating from seed: es, and th
es, and the mouse of the mouse of the mouse of the mouse of the mouse of the mouse of the mouse of the mouse o==================================================
Iteration #: 20
Epoch 1/1
158773/158773 [==============================] - 19s - loss: 1.4302    
Generating from seed: t, could n
t, could not me the said to the some to the some to the some to the some to the some to the some to the some t==================================================
Iteration #: 21
Epoch 1/1
158773/158773 [==============================] - 20s - loss: 1.4213    
Generating from seed: to say but
to say but no read to the party said to herself, and the door a peep and the party said to herself, and the do==================================================
Iteration #: 22
Epoch 1/1
158773/158773 [==============================] - 21s - loss: 1.4129    
Generating from seed: ir slates 
ir slates of the thing about the dormouse got the court, and she went on the stite alice had to do any with a ==================================================
Iteration #: 23
Epoch 1/1
158773/158773 [==============================] - 20s - loss: 1.4065    
Generating from seed: at is not 
at is not any all the rood a great did the crout to see it make to herself, and the roon as she went on and th==================================================
Iteration #: 24
Epoch 1/1
158773/158773 [==============================] - 18s - loss: 1.3987    
Generating from seed: y, said al
y, said alice, and the roon a great down on the gryphon remember the white rabbit heard and the words do an on

After 25 iterations, we can see that the model generates some text. It is still mostly nonsense, but it is better than at the beginning :)
