I have a file with a long list of English verbs and a file with a set of search strings. I want to find every verb that contains any of the search strings as a substring. For example, the verbs "forgotten" and "negotiate" both contain the substring "got". I wrote a script that iterates through all the verbs and is supposed to check each one against every search string, but it only searches for one substring, then stops. So it iterates through all of the verbs properly, but does not iterate through all of the search strings.

Example Verb File
apolog
becam
forgotten
apologis
apologis
negotiate
apologis
apologis
apologis
becom
aris
arisen
becom
arisen

Example Search String File
apol
got
aris

Here is the code I have right now, which works fine for "aris", but never looks for "got" or "apol".

import string, sys, os
import csv

myFile = open("verbs.txt","r")
data = myFile.readlines()
myFile.close()

aPol = open("apol.txt","r")
aPol_data = aPol.readlines()
aPol.close()

for line in data:
    for old in aPol_data:
        if old in line:
            print line

Example output
>>>
aris
arisen
arisen
>>>

Any help would be appreciated.

All 3 Replies

Insert some print statements to see what is happening. You may have a LF, "\n", at the end of each aPol_data item, in which case you will see an extra empty line when printing old, and the item will not match any word that lacks a newline in the same position. That would also explain why only "aris" matches: as the last line of the file, it probably has no trailing newline. If this is true, you will have to strip() both the aPol_data items and the lines from myFile. Depending on the size of the files, it might run faster if you use a dictionary as the container for aPol_data, split the myFile data into words, and look up each word in the dictionary.

import string, sys, os
import csv

myFile = open("verbs.txt","r")
data = myFile.readlines()
myFile.close()

aPol = open("apol.txt","r")
aPol_data = aPol.readlines()
aPol.close()

for line in data:
    print "line =", line
    for old in aPol_data:
        print "     old =", old
        if old in line:
            print line
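
A small sketch of the same check using repr(), which shows a trailing "\n" explicitly rather than as a blank line. The lists below are inline stand-ins for what readlines() would return (an assumption, so the snippet runs without the files), and show_matches is a hypothetical helper name.

```python
# Stand-ins for readlines() output; every item but the last keeps its "\n".
data = ["forgotten\n", "negotiate\n", "arisen\n"]
aPol_data = ["apol\n", "got\n", "aris"]

def show_matches(lines, subs):
    """Return (substring, line) pairs where the substring occurs in the line."""
    return [(old, line) for line in lines for old in subs if old in line]

for old in aPol_data:
    print("old = " + repr(old))   # repr() makes the "\n" visible

# Compare the raw strings with the stripped ones.
raw = show_matches(data, aPol_data)
stripped = show_matches([l.strip() for l in data],
                        [s.strip() for s in aPol_data])
print(raw)
print(stripped)
```

With the raw strings only "aris" matches; after strip() the "got" matches appear as well.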

Cool boots! This reply helped a lot, woooee. Thanks. You were right, I had some newline character issues. Here's my revised code, which does what I want. My method of fixing the newline problem may not be ideal, but it works.

import string, sys, os
import csv

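# Read each file whole, then split on newlines so no entry keeps a trailing "\n".
verbs = open("verbs.txt", "r")
verbs2 = "".join(verbs)
verbs3 = verbs2.split('\n')
verbs.close()

sub_string = open("apol.txt", "r")
sub_string2 = "".join(sub_string)
sub_string3 = sub_string2.split('\n')
sub_string.close()

# Print every verb that contains any of the search strings.
for line in verbs3:
    for word in sub_string3:
        if word in line:
            print line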
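
A variation on the code above, offered as a sketch rather than a replacement: strip each line, skip blank ones (an empty string is a substring of every verb, so a blank line in apol.txt would match everything), and use any() so a verb is reported once even if several substrings match. The helper names clean_lines and find_matches are made up for illustration, and the sample data is inline so the snippet runs without the files.

```python
def clean_lines(lines):
    """Strip whitespace from each line and drop lines that end up empty."""
    return [s.strip() for s in lines if s.strip()]

def find_matches(verbs, substrings):
    """Return each verb that contains at least one of the substrings."""
    return [v for v in verbs if any(s in v for s in substrings)]

# With real files: clean_lines(open("verbs.txt")) and clean_lines(open("apol.txt"))
verbs = clean_lines(["apolog\n", "becam\n", "forgotten\n", "negotiate\n", "arisen\n", "\n"])
subs = clean_lines(["apol\n", "got\n", "aris\n", "\n"])

for verb in find_matches(verbs, subs):
    print(verb)
```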

Please add "Solved" to the title. As long as you are accessing each verb, you might as well put them in a dictionary. It is indexed so has faster lookups. If you have sentences with periods, then you will have to use string.replace() to eliminate them. Also consider whether or not there will be upper and lower case letters.

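# Store each verb (lower-cased) as a dictionary key for fast exact-match lookup.
verbs_dic = {}
for line in open("verbs.txt", "r"):
    line = line.strip()
    verbs_dic[line.lower()] = 0
print verbs_dic

# Look up every word of the search file; this finds whole-word matches only.
for line in open("apol.txt", "r"):
    line = line.strip()
    words_subs = line.split()
    for word in words_subs:
        if word.lower() in verbs_dic:
            print word, line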
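
Note that a dictionary lookup finds exact matches only, so it would not find "got" inside "forgotten". For the substring search in the original question, one possible sketch is to combine the search strings into a single compiled regular expression, with re.escape guarding against special characters. build_pattern is a hypothetical helper name, and the lower-case comparison is an assumption about the desired case handling.

```python
import re

def build_pattern(substrings):
    """Compile one pattern that matches any of the (escaped) substrings."""
    parts = [re.escape(s.lower()) for s in substrings if s]
    return re.compile("|".join(parts))

pattern = build_pattern(["apol", "got", "aris"])
verbs = ["Apolog", "becam", "forgotten", "negotiate", "arisen"]

# search() finds the pattern anywhere in the verb, i.e. a substring match.
matches = [v for v in verbs if pattern.search(v.lower())]
print(matches)
```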