I've got a problem with replacing letters with a specific vector stored in a file. The first file contains a list of entries like "x = 0000101". The second file contains the target words in the 15th column. I tried to use a dictionary containing the data of the first file and to loop through the words in the second one, but did not succeed. Something like:

t = 0
while t < 5000:
    # words: list of lines from the second file, da: dictionary from the first
    line = words[t].split()
    new = []
    for x in line[14]:
        if x in da:
            new.append(da[x])
    t += 1

Could anyone give me a suggestion about how to proceed?

Edit: Added code field vegaseat

Can you give us an idea what the final product is supposed to look like?

Sure, the final product will be a list of words coded in vector style. The word 'yo' from the 15th column in the file would be coded as '0001100 0011100'. The vectors are stored in a separate file and need to be merged with the words...

You could use a list of tuples.

# working with a list of tuples
tuple1 = ('yo', '0001100 0011100')
tuple2 = ('do', '0001101 0011100')
tuple3 = ('mm', '0001111 0011100')

list_of_tuples = [tuple1, tuple2, tuple3]
print list_of_tuples

# optional sort
list_of_tuples.sort()
print list_of_tuples

# search for a word (or a vector)
for tup in list_of_tuples:
    if 'mm' in tup:
        word, vector = tup
        print word, vector
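
If the lookup is always by word, a dictionary keyed on the word avoids scanning the whole list. A minimal sketch, reusing the three example pairs above; the name `vectors_by_word` is only for illustration, and `print` is written with parentheses so the snippet also runs on Python 3:

```python
# Same example data as the list of tuples above
list_of_tuples = [
    ('yo', '0001100 0011100'),
    ('do', '0001101 0011100'),
    ('mm', '0001111 0011100'),
]

# dict() accepts a sequence of (key, value) pairs
vectors_by_word = dict(list_of_tuples)

# direct lookup by word; .get() returns None for unknown words
vector = vectors_by_word.get('mm')
print(vector)
```

Searching by vector instead of by word would still need a loop, or a second dictionary built the other way around.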

Thanks for your quick reply, but I'm afraid it won't be very helpful. I will have the same problem creating the tuples. I'll try to explain my problem again:
One file contains 40 entries like 'c = 0100', 'a = 0001' and 't = 1000'. The other contains over 5000 words coded like '##c#at'. The words need to be transformed into a binary vector like '000000000100000000011000', where # stands for '0000'. As I'm new to Python, I didn't succeed in simply looping through the files and creating the new vectors. Could anyone please give me a hint about how to tackle this problem?
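
For reference, the mapping described above can be sketched on in-memory data. Only the three codes quoted in the post are used, and the helper name `encode` is made up for this illustration; reading the two files is left out here.

```python
# Codes as they would be read from the first file (three of the 40 entries)
codes = {'c': '0100', 'a': '0001', 't': '1000'}
codes['#'] = '0000'  # '#' stands for '0000'

def encode(word, codes):
    """Concatenate the code of every character in word."""
    # A KeyError here means the word uses a character with no known code
    return ''.join(codes[ch] for ch in word)

vector = encode('##c#at', codes)
print(vector)
```

Each of the six characters contributes four digits, so the result is 24 digits long.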

Hi!

Ok, here is a solution ;) It works if the two files look like

a = 0001
c = 0100
t = 1000

and

##c#at
t#a#c#
act###

Here we go:

# open() is the recommended way to open files; file() is an older alias
one = open("one.dat", "r")
two = open("two.dat", "r")

# read in the first file and create a dict:
d = {}
for line in one:
    # we remove the newline and split the line by =
    k,v = line.strip().split(" = ")
    d[k] = v
d["#"] = "0000"

for line in two:
    result = ""
    # remove the newline
    line = line.strip()
    # for every character in the line ...
    for char in line:
        # check if there is an entry for it in the dict
        if char in d:
            # if yes, add the value to the result-string
            result += d[char]
        else: pass
    # print the result
    print result
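
The same idea can be wrapped in functions so that it also covers the layout from the original question, where the word sits in the 15th whitespace-separated column. This is only a sketch under assumptions: `parse_codes`, `encode` and `encode_column` are made-up names, the column layout and the filler data are invented for the demo, and the functions take any iterable of lines (an open file would work too).

```python
def parse_codes(lines):
    """Build a {char: code} dict from lines like 'a = 0001'."""
    codes = {'#': '0000'}  # '#' is fixed to '0000', as stated in the thread
    for line in lines:
        key, sep, value = line.partition('=')
        if sep:  # skip blank or malformed lines
            codes[key.strip()] = value.strip()
    return codes

def encode(word, codes, sep=''):
    """Join the codes of all known characters; unknown ones are skipped."""
    return sep.join(codes[ch] for ch in word if ch in codes)

def encode_column(lines, codes, column=14):
    """Encode the word found in the given 0-based column of each line."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # ignore lines that are too short
            result.append(encode(fields[column], codes))
    return result

codes = parse_codes(['a = 0001\n', 'c = 0100\n', '\n', 't = 1000\n'])
# 14 filler columns followed by the word in the 15th (invented demo data)
rows = ['x ' * 14 + word + '\n' for word in ('##c#at', 't#a#c#', 'act###')]
for vector in encode_column(rows, codes):
    print(vector)
```

Passing `sep=' '` to `encode` gives space-separated vectors like the '0001100 0011100' example earlier in the thread.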

Hope this helps.

Regards, mawe

Forgive my slow reaction but many thanks again!!
