Hi,
I have a little problem: I need to read, sort, count and remove punctuation marks from a text file.

data="""I love the python programming
How love the python programming?
We love the python programming
Do you like python for kids?
I like Hello World Computer Programming for Kids and Other Beginners."""



##f = open ("data.txt", "r")
##lines = f.readlines()
##print lines


count = {}
for word in sorted(data.split()):
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
print count




>>> ================================ RESTART ================================
>>> 
{'and': 1, 'Do': 1, 'We': 1, 'Kids': 1, 'love': 3, 'like': 2, 'programming?': 1, 'How': 1, 'I': 2, 'Programming': 1, 'programming': 2, 'for': 2, 'Hello': 1, 'python': 4, 'Computer': 1, 'World': 1, 'the': 3, 'kids?': 1, 'Other': 1, 'you': 1, 'Beginners.': 1}
>>> 

How can I read from data.txt?

Edited 3 Years Ago by tony75

Thanks slate, it works fine.
Now I need to sort alphabetically, count, and remove punctuation marks from the text file.

How can I read from data.txt?

This is in every tutorial/book out there, so it should not be a problem.
I always try to use with open(); this is the preferred way (the with statement has been in Python since 2006).
One short way:

import re
from collections import Counter

with open('your.txt') as f:
    no_punct = re.findall(r'\w+', f.read())
print Counter(no_punct)
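
To try the same idea without a file on disk, here is a minimal sketch that wraps the regex and Counter steps in a function. The name `count_words` and the sample string are assumptions for illustration only; the print calls are written so they also run on Python 3.

```python
import re
from collections import Counter


def count_words(text):
    # Hypothetical helper for illustration.
    # r'\w+' matches runs of letters, digits and underscores,
    # so "kids?" is counted as "kids" and the "?" is dropped.
    return Counter(re.findall(r'\w+', text))


# Assumed sample text standing in for the contents of the file
sample = "Do you like python for kids?\nWe love the python programming"
counts = count_words(sample)

print(counts['python'])  # 2
print(counts['kids'])    # 1
```

Note that this is case sensitive, so 'Programming' and 'programming' would be counted separately unless the text is lowercased first.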

remove punctuation marks from text file?

Without regex.

>>> from string import punctuation
>>> s = 'We@.._ love the?# pytho,..,n progr"""ammi@"ng'
>>> text = ''.join(c for c in s if c not in punctuation)
>>> text
'We love the python programming'
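
Another option that avoids the character-by-character join is str.translate. This is only a sketch, and it assumes Python 3, where str.maketrans accepts a third argument listing characters to delete; the name `remove_punct` is made up for the example.

```python
from string import punctuation

# Translation table that deletes every punctuation character.
# The three-argument form of str.maketrans is Python 3 only.
remove_punct = str.maketrans('', '', punctuation)

s = 'We@.._ love the?# pytho,..,n progr"""ammi@"ng'
text = s.translate(remove_punct)
print(text)  # We love the python programming
```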

Edited 3 Years Ago by snippsat

Expanding on snippsat's code ...

''' word_frequency102.py

file your.txt has data ...
I love the python programming
How love the python programming?
We love the python programming
Do you like python for kids?
I like Hello World Computer Programming for Kids and Other Beginners.
'''

import re
from collections import Counter

# read the text file and create a list of all words
# in lower case and without the punctuation marks
with open('your.txt') as f:
    no_punct = re.findall(r'\w+', f.read().lower())

#print(no_punct)  # test

# create a list of (word, frequency) tuples
# most_common() sorts by frequency, most common first
count = Counter(no_punct).most_common()

for word, freq in count:
    print("{} {}".format(freq, word))

''' result ...
4 programming
4 python
3 love
3 the
2 kids
2 like
2 for
2 i
1 and
1 do
1 we
1 how
1 hello
1 beginners
1 computer
1 world
1 other
1 you
'''

print('-'*30)  # line of 30 dashes

# sorted() defaults to first element in tuple (word, frequency)
for word, freq in sorted(count):
    print("{} {}".format(freq, word))

''' result ...
1 and
1 beginners
1 computer
1 do
2 for
1 hello
1 how
2 i
2 kids
2 like
3 love
1 other
4 programming
4 python
3 the
1 we
1 world
1 you
'''
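
If both orderings are wanted at once (most frequent first, ties broken alphabetically), one possible approach is a sort key made of the negative frequency and the word. A minimal sketch; the short `words` list is assumed sample data standing in for the word list read from the file.

```python
from collections import Counter

# Assumed sample data for illustration, standing in for the list of words
words = ['python', 'love', 'the', 'python', 'love', 'i', 'python']

count = Counter(words)

# Sort key: negative frequency (highest count first), then the word (a-z)
ordered = sorted(count.items(), key=lambda item: (-item[1], item[0]))

for word, freq in ordered:
    print("{} {}".format(freq, word))
```

With this key, words that share a frequency always come out in alphabetical order, instead of whatever order most_common() happens to return for ties.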