Python Data Science (3) - Python and Data Science Applications (Ⅲ)


1. Counting the words in an article with Python

speech_text = '''
  I love you,Not for what you are,But for what I amWhen I am with you.I love you,Not
 only for whatYou have made of yourself,But for whatYou are making of me.I love
 youFor the part of meThat you bring out;I love youFor putting your handInto my
 heaped-up heartAnd passing overAll the foolish, weak thingsThat you can’t
 helpDimly seeing there,And for drawing outInto the lightAll the beautiful
 belongingsThat no one else had lookedQuite far enough to find.I love you because
 youAre helping me to makeOf the lumber of my lifeNot a tavernBut a temple;Out of 
the worksOf my every dayNot a reproachBut a song.I love youBecause you have
 doneMore than any creedCould have doneTo make me goodAnd more than any
 fateCould have doneTo make me happy.You have done itWithout a touch,Without a
 word,Without a sign.You have done itBy being yourself.Perhaps that is whatBeing a 
friend means,After all.
'''

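# Split the text on whitespace into a list of words
speech = speech_text.split()

# Count each word: the first occurrence sets it to 1, later ones add 1
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# View the (word, count) pairs
print(dic.items())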
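One caveat with the loop above: split() only breaks on whitespace, so "you,", "you." and "you" are counted as three different words. Below is a minimal sketch of a punctuation-aware variant. It assumes a "word" is a run of lowercase letters, optionally with one inner apostrophe (straight or curly, as in "can’t"); the regex and the function name count_words are my own choices, not part of the original. Words that are fused in the source text itself, such as "amWhen", still come out as a single token.

```python
import re

# Assumption: a word is a run of letters, optionally followed by an apostrophe
# (straight ' or curly ’) and more letters; all other punctuation is dropped.
WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)?")

def count_words(text):
    """Return a dict mapping each lowercase word to its frequency."""
    counts = {}
    for word in WORD_RE.findall(text.lower()):
        # dict.get(word, 0) replaces the if/else branch of the loop above
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "I love you,Not for what you are,But for what I am"
counts = count_words(sample)
print(counts["you"])  # 2
```

Note that a hyphenated word such as "heaped-up" is split into "heaped" and "up" under this definition.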

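When using nltk I kept getting errors, typically because nltk's data (such as the stop word lists) is not bundled with the package itself. The following two lines download the nltk data: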

import nltk
nltk.download()

A downloader window pops up; use it to download the nltk data.


(Screenshot: download in progress)

If the download completes this way, skip the next step.

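I tried many times and every download failed, so here is a second method.
Download the prepackaged data directly. Download link 1: cloud drive, password znx7. Extract the downloaded nltk_data.zip to the root of drive C; this is the safest choice, because it keeps nltk from failing to find the data. Download link 2: cloud drive, password 4cp3.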

Thanks to 【V_can -- Python and Natural Language Processing, Issue 1: Getting Started with NLTK, Environment Setup】 for providing the package.

Removing stop words
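Stop words are very common words such as "the" or "I" that appear in almost every text but say little about its content, so they are usually filtered out before looking at the top words. A minimal sketch of the filtering step, assuming the counts are already in a dict: STOP_WORDS below is a small hypothetical hand-picked set standing in for nltk's stopwords.words('english'), so the example runs without nltk_data.

```python
# Hypothetical stand-in for stopwords.words('english'); a real run would use the nltk list
STOP_WORDS = {"i", "you", "a", "the", "of", "for", "what", "am", "are", "but", "not", "me", "my"}

def remove_stop_words(counts, stop_words):
    """Return a new dict without the stop words; the input dict is left unchanged."""
    return {word: n for word, n in counts.items() if word not in stop_words}

counts = {"i": 2, "love": 1, "you": 2, "temple": 1, "the": 1}
filtered = remove_stop_words(counts, STOP_WORDS)
print(filtered)  # {'love': 1, 'temple': 1}
```

Using a set for the stop words makes each membership test a constant-time lookup, which matters once the stop word list and the text get longer.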

2. The second method: use Counter from Python's standard-library collections module

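# Code:
from collections import Counter
from nltk.corpus import stopwords  # needs the nltk data downloaded above

c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove stop words, then show the top ten again
stop_words = stopwords.words('english')
for sw in stop_words:
    del c[sw]  # Counter silently ignores keys that are not present
print(c.most_common(10))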
Counter is a subclass of dict that makes counting convenient.
  • Full code:
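A few Counter behaviors explain why the code above needs no if/else and why `del c[sw]` is safe even for stop words that never occur in the text. A small sketch on a toy sentence (the sentence is just illustration):

```python
from collections import Counter

c = Counter("to be or not to be".split())
print(isinstance(c, dict))    # True: Counter subclasses dict
print(c["be"])                # 2
print(c["xyz"])               # 0: a missing key returns 0 instead of raising KeyError
del c["xyz"]                  # deleting a missing key is silently ignored
c.update(["be", "question"])  # update() adds to the counts rather than replacing them
print(c.most_common(2))       # [('be', 3), ('to', 2)]
```

Reading a missing key does not insert it, so `c["xyz"]` leaves the Counter unchanged.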

speech_text = '''
I love you,
Not for what you are,
But for what I amWhen I am with you.
I love you,
Not only for whatYou have made of yourself,
But for whatYou are making of me.
I love youFor the part of meThat you bring out;
I love youFor putting your handInto my heaped-up heartAnd passing overAll the foolish, 
weak thingsThat you can’t helpDimly seeing there,
And for drawing outInto the lightAll the beautiful belongingsThat no one else had lookedQuite far enough to find.
I love you because youAre helping me to makeOf the lumber of my lifeNot a tavernBut a temple;
Out of the worksOf my every dayNot a reproachBut a song.
I love youBecause you have doneMore than any creedCould have doneTo make me goodAnd more than any fateCould have doneTo make me happy.
You have done itWithout a touch,
Without a word,
Without a sign.
You have done itBy being yourself.
Perhaps that is whatBeing a friend means,
After all.
'''

# Handle case: lowercase the text before splitting, so "I" and "i" count as one word
speech = speech_text.lower().split()
print(speech)

dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

import operator
swd = sorted(dic.items(),key=operator.itemgetter(1),reverse=True)
print(swd)

# Stop word handling
from nltk.corpus import stopwords
stop_words = stopwords.words('english')  # nltk's file id is lowercase 'english'

for k,v in swd:
    if k not in stop_words:
        print(k,v)


from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

for sw in stop_words:
    del c[sw]
print(c.most_common(10))
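The two methods give the same ranking. Called with no argument, Counter.most_common() returns every element ordered by count, and its documentation states that elements with equal counts keep the order in which they were first encountered; in CPython it is essentially the same sorted(..., key=itemgetter(1), reverse=True) call used in the dict version. A small check on a toy word list (the list is just illustration):

```python
import operator
from collections import Counter

words = "i love you because you are you".split()

# Method 1: plain dict, then sort the (word, count) pairs by count, largest first
dic = {}
for w in words:
    dic[w] = dic.get(w, 0) + 1
by_dict = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)

# Method 2: Counter.most_common() with no argument returns all pairs, ordered by count
by_counter = Counter(words).most_common()

print(by_dict[0], by_counter[0])  # ('you', 3) ('you', 3)
```

Because Python's sort is stable (also with reverse=True), ties such as the words counted once come out in the same order in both lists.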

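These two methods make it easy to see why Python is used more and more for data analysis and scientific computing: besides the strengths of the language itself, it has a wealth of excellent libraries.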

So what are you waiting for? Life is short, so why not make Python your song. Come learn Python with me.