I have various lists being generated by a mapper function, in this format:


>>> mapper("b.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]


>>> mapper("c.txt" , i["b.txt"])
[('thats', 1), ('one', 1), ('small', 1), ('step', 1), ('for', 1), ('a', 1), ('man', 1), ('one', 1), ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]


I want to merge the lists generated by these two calls so that, when I encounter a common element, the merged list stores it as
('for', 2) (in this case, since 'for' is common to both results of the mapper function), while the remaining unique elements are stored in the merged list as they are.

PS: mapper is a function I wrote myself.

Personal preference here is to use a dictionary, with the key being the word, pointing to the number. Convert the first list to a dictionary, then loop through the second list and, if the word is found in the dictionary, add to the number. See http://www.greenteapress.com/thinkpython/html/book012.html#toc120 for background. Post back with any code you are having problems with for additional help.
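
A minimal sketch of the approach described above, assuming both mapper results are lists of (word, count) tuples as in the question. The helper name merge_counts and the shortened sample lists are hypothetical, and repeats inside the first list are summed as well rather than converted directly with dict().

```python
def merge_counts(first, second):
    """Merge two lists of (word, count) tuples into one dict of totals."""
    totals = {}
    # Build the dictionary from the first list, summing repeated words.
    for word, count in first:
        totals[word] = totals.get(word, 0) + count
    # Walk the second list and add to any word already present.
    for word, count in second:
        totals[word] = totals.get(word, 0) + count
    return totals


# Shortened sample data in the same shape as the mapper output.
b_pairs = [('one', 1), ('small', 1), ('step', 1), ('for', 1), ('one', 1)]
c_pairs = [('one', 1), ('giant', 1), ('leap', 1), ('for', 1)]

merged = merge_counts(b_pairs, c_pairs)
print(merged['for'])   # 2
print(merged['one'])   # 3
```

Words that appear in only one list keep their own count, so nothing needs special handling for "unique" elements.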

It may be just me, but it seems something is wrong!

Can you explain a little further?

You want to know the words that exist in the two files, is that it?

Or do you want to count the occurrences in each file?

@Beat_slayer

Well, the complete picture is that I am implementing a MapReduce framework. I didn't find any (in Python) that worked for me, so I decided to code it myself. I have come as far as writing the mapper function and got stuck on this list-to-dictionary conversion.

For simplicity I have mentioned 2 files here. The actual thing will have more than 100 files, each approximately 5 MB (pure text only), and will then run the mapper and reduce functions in a multi-threaded environment. That's the plan as of now.

@woooee: your link is very helpful.

@woooee:

The first part I have coded, converting the list to a dictionary:

>>> l
[('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
>>> dd= {}
>>> i = 0
>>> while i < len(l):
...  s = l[i]
...  dd[s[0]] = s[1]
...  i = i + 1
... 
>>> dd
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}

Now, if I get different dictionaries of different lengths from different files, how do I merge them? Is there a module that does this directly?
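
One option in the standard library, offered as a sketch rather than the definitive answer: collections.Counter (available from Python 2.7) is a dict subclass whose update() method adds counts instead of replacing them, so dictionaries of different lengths can be folded into one. The sample dictionaries below are made up for illustration.

```python
from collections import Counter

# Hypothetical per-file dictionaries of different lengths.
d1 = {'a': 0, 'c': 2, 'b': 1}
d2 = {'a': 3, 'e': 4}
d3 = {'c': 1}

total = Counter()
for d in (d1, d2, d3):
    # Counter.update() adds to existing counts; dict.update() would overwrite.
    total.update(d)

print(dict(total))
```

A Counter also returns 0 for a word it has never seen, which avoids the "is the key present?" check.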

I think this should give some insight, if I'm understanding what you are trying to do.

def merge_dic(merged_dic, wordlist):
    for item in wordlist:
        if item in merged_dic:
            merged_dic[item] += 1
        else:
            merged_dic[item] = 1

file1 = 'this is a dummy sample file for example as sample'
file2 = 'this is another dummy sample file also created for example with \
some samples repeated for example'

list1 = file1.split(' ')
list2 = file2.split(' ')

all_count = {}
file_lists = []
file_lists.extend(list1)
file_lists.extend(list2)

merge_dic(all_count, file_lists)

print 'all_count =', all_count

file_uniques = {}
file_lists = []                 # Converting lists to sets is the fastest and
file_lists.extend(set(list1))   # simplest way I know of to eliminate
file_lists.extend(set(list2))   # duplicates in a list when position doesn't matter

merge_dic(file_uniques, file_lists)

print 'file_uniques =', file_uniques

Happy coding!

## Your way
l = [('a', 0), ('c', 2), ('b', 1), ('e', 4), ('d', 3)]
dd= {}
i = 0
while i < len(l):
      s = l[i]
      dd[s[0]] = s[1]
      i = i + 1
     
print dd
#{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
## shorter way
dd=dict(l)
print dd

"""Output:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
"""

Use a function and pass the file name to it. Some pseudo-code:

def mapper_dict(fname, word_dict):
    ## assumes word is first element after split()
    with open(fname, "r") as fp:      # file is closed when the block ends
        for rec in fp:
            substrs = rec.split()
            if not substrs:           # skip blank lines
                continue
            word = substrs[0]
            if word not in word_dict:
                word_dict[word] = 0
            word_dict[word] += 1
    return word_dict

word_dict = {}
for fname in ["/a/b/abc", "/d/e/def", "/g/h/ghi"]:
    word_dict = mapper_dict(fname, word_dict)

@tonyjv:

Both of our codes have a bug. Unquestionably your method is the shortest; I felt like a moron when I saw that the conversion already existed. But, for example:

>>> l1
[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
>>> d1 = dict(l1)
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

You see, although the pair ('the', 1) appears twice in the list, the dictionary keeps the key only once, and instead of the count for 'the' becoming 2 it stays at 1. Even my initial code gives the same result :(
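
To illustrate what is happening here: dict() does not skip a repeated key. It stores each pair in turn, so a later pair overwrites an earlier one. With every count equal to 1 that looks the same as ignoring the repeat. The sample below uses made-up counts of 1 and 5 so the overwrite is visible.

```python
pairs = [('the', 1), ('quick', 1), ('the', 5)]

d = dict(pairs)
# The second ('the', ...) pair replaces the first; nothing is summed.
print(d['the'])   # 5
```

So the counts have to be accumulated explicitly; dict() alone will never add them up.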


I tried writing it this way, but I don't know why it isn't working:

>>> while i < len(l1):
...  s = l1[i]
...  if s not in d1.keys():
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
... 
>>> d1
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 1, 'dogs': 1}

@woooee and @beat_slayer: I am working on your code (thanks for providing it).

This should be:

## changed to s[0] and d1; d1.keys() is not necessary
if s[0] not in d1:

>>> while i < len(l1):
...  s = l1[i]
...  if s[0] not in d1:
...    d1[s[0]] = s[1]
...  else:
...    d1[s[0]] += 1
...  i = i + 1
... 
>>> d1
{}

Getting an empty dictionary... :(
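
One possible cause, assuming the loop was typed into the same interpreter session as the earlier attempt: i was never reset to 0, so it already equals len(l1) and the while body never runs, leaving a freshly emptied d1 untouched. The sketch below reproduces that assumption with a shortened, made-up list and shows that iterating over the list directly avoids the counter altogether.

```python
l1 = [('the', 1), ('quick', 1), ('the', 1), ('lazy', 1)]

i = len(l1)          # assumed leftover value from the previous loop
d1 = {}
while i < len(l1):   # condition is already false, so nothing is added
    s = l1[i]
    d1[s[0]] = d1.get(s[0], 0) + s[1]
    i = i + 1
print(d1)            # {}

d2 = {}
for word, count in l1:   # no index to reset
    d2[word] = d2.get(word, 0) + count
print(d2['the'])     # 2
```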

How about this?

class Word_Counter():

    def __init__(self):
        self.count = {}

    def add_string(self, s):
        word_list = s.split(' ')
        self.add_list(word_list)

    def add_list(self, wl):
        for item in wl:
            if item in self.count:
                self.count[item] += 1
            else:
                self.count[item] = 1

    def add_mapper(self, ml):
        for item in ml:
            if item[0] in self.count:
                self.count[item[0]] += item[1]
            else:
                self.count[item[0]] = item[1]
    


str1 = 'the quick brown fox jumps over the lazy dog'

d = Word_Counter()

d.add_string(str1)

print d.count

"""
{'brown': 1, 'lazy': 1, 'over': 1, 'fox': 1, 'dog': 1, 'quick': 1, 'the': 2, 'jumps': 1}
"""

list1 = ('the', 'quick', 'blue', 'cat', 'jumps', 'over', 'the', 'lazy', 'turtle') 

d.add_list(list1)

print d.count

"""
{'blue': 1, 'brown': 1, 'lazy': 2, 'turtle': 1, 'over': 2, 'fox': 1, 'dog': 1, 'cat': 1, 'quick': 2, 'the': 4, 'jumps': 2}
"""

map1 = [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]

d.add_mapper(map1)

print d.count

"""
{'blue': 1, 'brown': 2, 'lazy': 3, 'turtle': 1, 'grey': 1, 'jumped': 1, 'over': 3, 'fox': 2, 'dog': 1, 'cat': 1, 'dogs': 1, 'quick': 3, 'the': 6, 'jumps': 2}
"""

words=[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)]
dict_of_words={}
for word,count in words:
     dict_of_words[word] = dict_of_words[word]+count if word in dict_of_words else count

print dict_of_words
"""Output:
{'brown': 1, 'lazy': 1, 'jumped': 1, 'over': 1, 'fox': 1, 'grey': 1, 'quick': 1, 'the': 2, 'dogs': 1}
"""

Using a dictionary, it's very simple.

For example:
I have a list like this: ["thing1", "thing2", "thing3", "thing4", "thing1"]
If I understood you correctly, you want thing1 to have a 2 associated with it.

This code would do it:

li = ["thing1", "thing2", "thing3", "thing4", "thing1"]

def maplist(li):
    d = dict()
    for item in li:
        value = d.setdefault(item, 0)
        value += 1
        d[item] = value
    return d

print maplist(li)

To fit your situation, add this to the above code:

def formatconverter(yourformatli):
    li = []
    for tu in yourformatli:
        li.append(tu[0])

    return maplist(li)

Also, I got too lazy to read the replies, so I don't know if this applies.
