I'm a linguist (Python newbie) trying to use Python to help me with some simple NLP processing. I have extracted verb + preposition + VBG triples from the British National Corpus, so I have large csv files containing stuff like this:


The Python script below counts the tokens in one column of the file (e.g., how many times the verb "prevent" occurs, or how many times the preposition "by" occurs). As written, it counts the first column, the verbs.

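    import csv

    count = {}

    # Read the (verb, preposition, VBG) triples and tally each verb
    with open('test_file.csv', newline='') as in_file:
        for verb, prep, vbg in csv.reader(in_file):
            if verb not in count:
                count[verb] = 0
            count[verb] += 1

    # Write one "verb,count" row per distinct verb; the with block closes the file
    with open('counted_test_file.csv', 'w', newline='') as out_stream:
        writer = csv.writer(out_stream)
        for key, val in count.items():
            writer.writerow([key, val])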
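Since the goal is to count tokens in any given column, here is a minimal sketch using collections.Counter, which does the "if key not in count: count[key] = 0" bookkeeping automatically. The function name count_column and the sample rows are made up for illustration; the sketch assumes the three-column verb, preposition, VBG layout described above.

```python
import csv
import io
from collections import Counter


def count_column(rows, column):
    """Count how often each value occurs in one column of CSV rows.

    column is a 0-based index: 0 = verb, 1 = preposition, 2 = VBG
    (assuming the three-column layout described in the question).
    """
    return Counter(row[column] for row in rows)


# Made-up triples standing in for the real test_file.csv
sample = io.StringIO("prevent,from,happening\nprevent,from,leaving\nbegin,by,saying\n")
rows = list(csv.reader(sample))

verb_counts = count_column(rows, 0)
prep_counts = count_column(rows, 1)
print(verb_counts.most_common())  # [('prevent', 2), ('begin', 1)]
```

To count the real file, replace the StringIO object with open('test_file.csv', newline='').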


Now I'm trying to get this code to count combinations (e.g., how many times "prevent" occurs with "from"). I tried the following variation, but it only counts prepositions (the second column in the CSV file):

    for verb, prep, vbg in x:
        if (verb and prep) not in count:
            count[(verb and prep)] = 0
        count[(verb and prep)] += 1
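For reference, here is what the expression (verb and prep) actually evaluates to. Python's and operator returns one of its operands rather than combining them, which is why every key in the attempt above ends up being just the preposition. The example values are illustrative.

```python
# `and` returns an operand, not a pair:
# if the left side is truthy (a non-empty string), the result is the right side.
verb, prep = "prevent", "from"
key = verb and prep
print(key)  # from

# A tuple keeps both values, so each (verb, prep) combination gets its own key
pair_key = (verb, prep)
print(pair_key)  # ('prevent', 'from')
```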

Any help would be greatly appreciated!

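How about count[(verb, prep)] += 1? Using the tuple (verb, prep) as the key (in the if test as well) gives each combination its own entry. The expression (verb and prep) evaluates to prep whenever verb is non-empty, which is why your version only counted prepositions.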
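A complete version of the loop with the tuple key might look like the sketch below. It uses collections.Counter instead of a plain dict so no initialization check is needed; the function name count_pairs and the sample triples are hypothetical, and the output goes to an in-memory buffer so the result is easy to inspect.

```python
import csv
import io
from collections import Counter


def count_pairs(rows):
    """Tally each (verb, preposition) combination; the VBG column is ignored."""
    pair_counts = Counter()
    for verb, prep, vbg in rows:
        # A tuple key keeps verb and preposition distinct, unlike (verb and prep)
        pair_counts[(verb, prep)] += 1
    return pair_counts


# Made-up triples standing in for test_file.csv
sample = io.StringIO("prevent,from,happening\nprevent,from,leaving\nbegin,by,saying\n")
pairs = count_pairs(csv.reader(sample))

# Write one "verb,prep,count" row per combination
out = io.StringIO()
writer = csv.writer(out)
for (verb, prep), n in pairs.items():
    writer.writerow([verb, prep, n])
print(out.getvalue())
```

For real files, swap the two StringIO objects for open('test_file.csv', newline='') and open('counted_test_file.csv', 'w', newline='').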


Ahh, yes, count[(verb,prep)] += 1 worked. So simple. Thanks!

If you only want to count a (verb, prep) pair when both the verb and the preposition are already keys in the count dictionary built by the existing code:

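    from collections import defaultdict

    total_found_dic = defaultdict(int)  # missing keys start at 0
    # x must be a fresh csv.reader; the first loop has already consumed the old one
    for verb, prep, vbg in x:
        if verb in count and prep in count:
            total_found_dic[(verb, prep)] += 1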
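Wrapped in a function, the filtered count can be tried on its own. This is a sketch: count_known_pairs is a hypothetical name, and the count dictionary and rows below are made-up stand-ins for the dictionary built by the first script and the CSV triples.

```python
from collections import defaultdict


def count_known_pairs(rows, count):
    """Count (verb, prep) pairs, but only when both words are already keys in count.

    count is assumed to be the single-word dictionary built by the first script.
    """
    total_found_dic = defaultdict(int)  # missing keys start at 0
    for verb, prep, vbg in rows:
        if verb in count and prep in count:
            total_found_dic[(verb, prep)] += 1
    return total_found_dic


# Made-up data: a word-count dictionary and a few triples
count = {"prevent": 2, "from": 2, "begin": 1}
rows = [("prevent", "from", "happening"), ("begin", "by", "saying")]
print(dict(count_known_pairs(rows, count)))  # {('prevent', 'from'): 1}
```

The ("begin", "by") row is skipped because "by" is not a key in count.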

Note that it is worth testing with sub-words and printing the results to see what happens. I doubt you are looking for "the", but as an example: a dictionary lookup such as "the" in count matches only the exact key "the", so it will not hit "they", whereas a substring test on a string ("the" in "they") will.
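A quick way to check how the two kinds of membership test behave, using a made-up dictionary:

```python
count = {"they": 3, "prevent": 2}

# Dictionary membership compares whole keys, so "the" does not match "they"
print("the" in count)  # False

# Substring matching only happens when testing inside a string
print("the" in "they")  # True

# To find keys that merely contain "the", scan the keys explicitly
print([key for key in count if "the" in key])  # ['they']
```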