Hello,

I want to count phrases in a text file using Python. For instance, if the text is "I love you very much", I want to build a dictionary like: "I love": 1, "love you": 1, "you very": 1, and "very much": 1. I also want to do the same for three-word phrases: "I love you": 1, "love you very": 1, "you very much": 1.

Could you give me a good idea (sample code) for doing this?

Thank you!!!

Last Post by vegaseat

This loop generates the phrases as lists. Add the counting and join each list back into a string yourself, and add storage of the phrases and input/output according to your requirements.

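sentence = "I love you very much"
words = sentence.split()
words_in_sentence = len(words)
phrases = []  # storage for phrases, left for you to fill in

# phrase lengths from 2 up to one less than the sentence length
for phraselength in range(2, words_in_sentence):
    # + 1 so that the last phrase of each length (e.g. "very much") is included
    for startword in range(words_in_sentence - phraselength + 1):
        print(words[startword:startword + phraselength])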
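Following the suggestions above (join each slice back into a string, then count it), one way to finish the loop is a plain dictionary that maps each phrase to its count. This is a minimal sketch, assuming phrases of 2 and 3 words as in the original question; the function name `count_phrases` and its `min_len`/`max_len` parameters are hypothetical.

```python
def count_phrases(sentence, min_len=2, max_len=3):
    """Return a dict mapping each phrase of min_len..max_len words to its count."""
    words = sentence.split()
    counts = {}
    for phraselength in range(min_len, max_len + 1):
        # + 1 so the last phrase of each length is included
        for startword in range(len(words) - phraselength + 1):
            # join the slice of words back into a single string
            phrase = " ".join(words[startword:startword + phraselength])
            # start at 0 for a new phrase, then add one
            counts[phrase] = counts.get(phrase, 0) + 1
    return counts


print(count_phrases("I love you very much"))
```

On Python 3.7 or later (where dicts keep insertion order) this prints the seven phrases from the question, two-word phrases first, each with a count of 1. A phrase that repeats, as in "very much so very much", gets a count of 2.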

I have tried to solve this for several days, but I could not figure it out. I am new to programming.

My ultimate goal is to count phrases in text files, for instance, how many times the same (or similar) phrases are used in a text.

With your code I got the phrases, but I still haven't managed to count them. Could you show me how to count them and save the results?


This may not be exactly what you want, but you can develop this rather basic code further to suit your needs ...

# group a text into groups of words
# written for Python 3.1; should also work with Python 2.6

def group_text(text, group_size):
    """
    groups a text into text groups set by group_size
    returns a list of grouped strings
    """
    word_list = text.split()
    group_list = []
    for k in range(len(word_list)):
        start = k
        end = k + group_size
        group_slice = word_list[start: end]
        # append only groups of proper length/size
        if len(group_slice) == group_size:
            group_list.append(" ".join(group_slice))
    return group_list
        

text = "I love you very much so very much"

group_size = 2
group_list = group_text(text, group_size)
# convert list to set to avoid duplicates
group_set = set(group_list)

print(group_set)

"""result (a set is unordered, so the order may differ when you run it) >>>
{'very much', 'you very', 'love you', 'so very', 'I love', 'much so'}
"""

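# optionally take the word groups in the set
# and count how often each one occurs in the grouped list
for group in group_set:
    # counting in group_list matches whole phrases only;
    # text.count(group) would also count substrings, e.g. 'I love' inside 'I loved'
    count = group_list.count(group)
    sf = "'%s' appears %d times in the text"
    print(sf % (group, count))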

"""result >>>
'very much' appears 2 times in the text
'you very' appears 1 times in the text
'love you' appears 1 times in the text
'so very' appears 1 times in the text
'I love' appears 1 times in the text
'much so' appears 1 times in the text
"""
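For whole text files, a more compact variant uses `collections.Counter` from the standard library, which does the counting and can list phrases by frequency with `most_common()`. This is a sketch under a few assumptions: words are lowercased and stripped of surrounding punctuation so that "Very much," and "very much" count as the same phrase, phrases may run across sentence boundaries, and the helper names (`normalize`, `phrase_counts`, `save_counts`) and file names (`input.txt`, `phrase_counts.csv`) are hypothetical placeholders.

```python
import csv
import string
from collections import Counter


def normalize(text):
    """Lowercase each word and strip punctuation from its ends; drop empty words."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def phrase_counts(text, sizes=(2, 3)):
    """Count every run of n consecutive words, for each n in sizes."""
    words = normalize(text)
    counts = Counter()
    for n in sizes:
        # zip over n shifted copies of the word list yields each n-word window
        for gram in zip(*(words[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


def save_counts(counts, path):
    """Write phrase,count rows to a CSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["phrase", "count"])
        writer.writerows(counts.most_common())


if __name__ == "__main__":
    sample = "I love you very much, so very much!"
    counts = phrase_counts(sample)
    for phrase, count in counts.most_common(5):
        print(f"'{phrase}' appears {count} time(s)")
    # to process a file instead (hypothetical file names):
    # with open("input.txt", encoding="utf-8") as f:
    #     counts = phrase_counts(f.read())
    # save_counts(counts, "phrase_counts.csv")
```

Counting the generated phrase list directly, rather than searching the raw text with `str.count()`, means a phrase is only counted where it occurs as whole words. Phrases with equal counts are listed by `most_common()` in the order they were first seen.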