Hello,

I want to count phrases in a text file using Python. For instance, if the text is "I love you very much", I want to build a dictionary of two-word phrases like: "I love": 1, "love you": 1, "you very": 1, and "very much": 1. I also want to do the same for three-word phrases: "I love you": 1, "love you very": 1, "you very much": 1.

Could you suggest a good approach (with sample code) to do this?

Thank you!!!

This loop generates the phrases as lists. Add the counting and join each list back into a string yourself, and add storage of the phrases and input/output according to your requirements.

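sentence = "I love you very much"
words = sentence.split()
words_in_sentence = len(words)

# phrase lengths from 2 words up to the whole sentence
for phraselength in range(2, words_in_sentence + 1):
    # + 1 so the phrase that ends on the last word is included
    for startword in range(words_in_sentence - phraselength + 1):
        print(words[startword:startword + phraselength])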

One of the ways to do this would be to use slicing.
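
As a rough sketch of the slicing idea combined with counting: each phrase is the slice words[i:i + n], joined back into a string and tallied in a plain dictionary. The helper name phrases_of_length is made up for this illustration, and treating whitespace-separated tokens as words is an assumption (punctuation and case are not handled).

```python
def phrases_of_length(words, n):
    """Return every run of n consecutive words, joined into one string."""
    # The last valid start index is len(words) - n, hence the + 1.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "I love you very much"
words = sentence.split()

counts = {}
for n in (2, 3):  # two-word and three-word phrases
    for phrase in phrases_of_length(words, n):
        # dict.get supplies 0 the first time a phrase is seen
        counts[phrase] = counts.get(phrase, 0) + 1

print(counts)
```

For this sentence every phrase occurs once, so all seven values in counts are 1.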

I have been trying to solve this for several days, but I could not figure it out. I am a novice in programming.

My ultimate goal is to count phrases in text files, for instance, how many times the same (or similar) phrases are used in a text.

Your code gives me the phrases, but I have not managed to count them. Would you let me know how to count them and save them?

This may not be exactly what you want, but you can develop this rather basic code further to suit your needs ...

# group a text into groups of words
# written with Python 3.1; should also work with Python 2.6

def group_text(text, group_size):
    """
    groups a text into text groups set by group_size
    returns a list of grouped strings
    """
    word_list = text.split()
    group_list = []
    for k in range(len(word_list)):
        start = k
        end = k + group_size
        group_slice = word_list[start: end]
        # append only groups of proper length/size
        if len(group_slice) == group_size:
            group_list.append(" ".join(group_slice))
    return group_list
        

text = "I love you very much so very much"

group_size = 2
group_list = group_text(text, group_size)
# convert list to set to avoid duplicates
group_set = set(group_list)

print(group_set)

"""result (word_groups are in hash order in the set) >>>
{'very much', 'you very', 'love you', 'so very', 'I love', 'much so'}
"""

# optionally take the word_groups in the set
# and count them in group_list (text.count() would also
# match inside longer words, e.g. 'a b' in 'aa bb')
for group in group_set:
    count = group_list.count(group)
    sf = "'%s' appears %d times in the text"
    print(sf % (group, count))

"""result >>>
'very much' appears 2 times in the text
'you very' appears 1 times in the text
'love you' appears 1 times in the text
'so very' appears 1 times in the text
'I love' appears 1 times in the text
'much so' appears 1 times in the text
"""
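
If you also want to count and save in one go, here is a minimal sketch using collections.Counter from the standard library, which does the tallying for you. The function name count_phrases, the sizes parameter, and the output file name phrase_counts.json are all made up for this example, and writing JSON to the temp directory is just one possible way to save the result. Words are again assumed to be whitespace-separated, with no handling of punctuation or case.

```python
import json
import os
import tempfile
from collections import Counter


def count_phrases(text, sizes=(2, 3)):
    """Count every phrase of the given word lengths in text."""
    words = text.split()
    counts = Counter()
    for n in sizes:
        # slide a window of n words across the text
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


text = "I love you very much so very much"
counts = count_phrases(text)

# most frequent phrases first
for phrase, count in counts.most_common(3):
    print("'%s' appears %d times" % (phrase, count))

# save the counts; a Counter is a dict subclass, so json can write it
out_path = os.path.join(tempfile.gettempdir(), "phrase_counts.json")
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(counts, f, indent=2)
```

To work on a real text file, read it first with something like open(path).read() and pass the resulting string to count_phrases. Lowercasing the text before splitting would make "Very much" and "very much" count as the same phrase, if that is what you mean by similar.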