Malinka

Hello Everyone,

I have a list of digits. What is the most efficient way to count the number of tokens between two known digits?

Here is what I came up with:

def dis(l, a1, a2):
    i1 = l.index(a1)
    i2 = l.index(a2)
    distance = i2 - i1 - 1
    return distance

For example, with l = [1, 2, 3, 4, 5]:
>>> print dis(l, 1, 5)
3
>>> print dis(l, 1, 3)
1
>>> print dis(l, 3, 4)
0

Is there a better (=faster) way of doing this?

Thank you,
m.
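For a single query there is probably little to gain: each `list.index()` call is one linear scan, and in CPython that scan runs in C. If you need many distances from the same list, one option is to scan the list once and store each value's first index in a dict. Each later query is then two dictionary lookups instead of two scans. This is a Python 3 sketch and has not been benchmarked. `build_positions` and `dis_cached` are hypothetical names, and missing values raise `KeyError` rather than `ValueError`:

```python
def build_positions(seq):
    """Map each value to the index of its first occurrence (one pass over seq)."""
    positions = {}
    for i, value in enumerate(seq):
        positions.setdefault(value, i)  # keep the first index, like list.index()
    return positions


def dis_cached(positions, a1, a2):
    """Number of elements strictly between the first a1 and the first a2."""
    return positions[a2] - positions[a1] - 1


l = [1, 2, 3, 4, 5]
pos = build_positions(l)  # build once, reuse for every query
print(dis_cached(pos, 1, 5))  # 3
print(dis_cached(pos, 1, 3))  # 1
print(dis_cached(pos, 3, 4))  # 0
```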

Malinka

Hello everybody!

I have to go through each subfolder and extract matching strings from each of the files (File_1, File_2, File_3, File_4, etc.). Unfortunately, I don't know how to do that under Linux.
I have the following structure:

MainFolder:
---Subfolder A
------File_1
------File_2
------File_3
---Subfolder B
------File_4
------File_5
------File_6
---Etc.

My original code only goes through the files in the first subfolder (Subfolder A).

import os, glob
import sys

path = sys.argv[1]
for file in glob.glob(os.path.join(path,'*.*')):
    print "Current file: ", file
    f = open(file, 'rU')
    for line in f:
        words = line.split()
        # do something with words here
    f.close()

How do I go through the rest of the subfolders? I tried os.walk(), but so far I have only managed to list all the subfolders and files; I can't get it to open each file. Can anyone help?

Thank you,
M.
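`os.walk()` yields a `(dirpath, dirnames, filenames)` triple for every directory under the root, at any depth. The names in `filenames` are bare names, which is usually why opening them fails. They must be joined with `dirpath` first. Below is a minimal Python 3 sketch. The function names are hypothetical, and the per-line processing is left as a placeholder, as in the original:

```python
import os
import sys


def iter_files(root):
    """Yield the full path of every file under root, including all subfolders."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # filenames holds bare names; join with dirpath to get an openable path
            yield os.path.join(dirpath, name)


def process_tree(root):
    for path in iter_files(root):
        print("Current file:", path)
        with open(path) as f:
            for line in f:
                words = line.split()
                # extract matching strings from `words` here


if __name__ == "__main__" and len(sys.argv) > 1:
    process_tree(sys.argv[1])
```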

Malinka

Wow, I just fixed this myself :D

This is what does the trick:

import msvcrt

print "Choose a number from 1 to 7: "
userResponse = ''
while userResponse not in ['1','2','3','4','5','6','7']:
    userResponse = msvcrt.getch()   # reads one keypress, no Enter needed
    if userResponse == '1':
        output = "1\n"
        results.write(output)   # results: output file opened earlier
    ...  # and so on up to '7'

But this only works on Windows. What would be a solution on Linux?
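Linux and other POSIX systems have no `msvcrt`. A common approach uses the standard `termios` and `tty` modules instead. They put the terminal into raw mode for one read and then restore the previous settings. Below is a minimal Python 3 sketch. It assumes stdin is an interactive terminal, because `termios.tcgetattr` raises `termios.error` otherwise. `getch` and `read_choice` are hypothetical helper names. In raw mode, Ctrl-C arrives as the character `'\x03'` instead of raising `KeyboardInterrupt`, so the loop handles it explicitly:

```python
import sys
import termios
import tty

VALID_CHOICES = set("1234567")  # a set, so '' (end of input) is never "valid"


def getch():
    """Read one keypress from stdin without waiting for Enter (POSIX terminals only)."""
    fd = sys.stdin.fileno()
    old_settings = termios.tcgetattr(fd)  # remember the current terminal mode
    try:
        tty.setraw(fd)  # raw mode: no line buffering, no echo
        ch = sys.stdin.read(1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)  # always restore
    return ch


def read_choice(read_key=getch):
    """Read keys until one of '1'..'7' is pressed and return it."""
    while True:
        key = read_key()
        if key == '\x03':  # Ctrl-C is just a character in raw mode
            raise KeyboardInterrupt
        if key == '':  # end of input: avoid looping forever
            raise EOFError("no more input")
        if key in VALID_CHOICES:
            return key


# Usage in an interactive terminal:
#     print("Choose a number from 1 to 7: ")
#     user_response = read_choice()
#     results.write(user_response + "\n")
```

Passing the key reader as a parameter keeps the validation loop independent of the terminal code.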

Malinka

Dear Community,

hopefully someone can explain this to me in simple terms:

I would like to read input from the user without waiting for them to press Enter. Would it still be possible to check whether the response is one of the predefined options?

My raw input consists of a single character - a number from 1 to 7.
I run the program in a terminal window in Windows.

I looked online and found that the msvcrt.getch() function is needed to solve this problem, but I don't understand how to apply it to my code.

I ran the following on the command line and it worked:
>>> from msvcrt import getch
>>> r = getch()
However, I don't know how to use this in place of my raw_input() call.

This is part of the code I am using now. The good thing is that it checks whether the input is one of the predefined options. The problem is that the input is stored only after the user presses Enter.

userResponse = ''
while userResponse not in ['1','2','3','4','5','6','7']:
    userResponse = raw_input("Choose a number from 1 to 7: ")
    if userResponse == '1':
        output = "1\n"
        results.write(output)   # results: output file opened earlier
    if userResponse == '2':
        output = "2\n"
        results.write(output)
    ...  # and so on up to '7'

I will appreciate any help!
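Two changes would probably give the behaviour you want. First, read one key per loop iteration instead of calling raw_input(). Second, replace the if-chain with a single write. The single write assumes the elided branches follow the same pattern as '1' and '2', which each write the chosen digit followed by a newline.

Below is a Python 3 sketch, and `record_choice` is a hypothetical name. The key reader is passed in so the same loop works with `msvcrt.getwch` on Windows. In Python 3, `msvcrt.getch` returns bytes, so comparing its result with '1' would never succeed, whereas `getwch` returns a str:

```python
VALID_CHOICES = {'1', '2', '3', '4', '5', '6', '7'}


def record_choice(read_key, results):
    """Read single keys until a valid choice arrives, then write it to results.

    read_key: zero-argument function returning one character (str),
              e.g. msvcrt.getwch on Windows.
    results:  any object with a write() method, e.g. an open text file.
    """
    user_response = ''
    while user_response not in VALID_CHOICES:
        user_response = read_key()  # one key per call, no Enter needed
    # Each branch of the original if-chain wrote "<digit>\n", so one write covers all.
    results.write(user_response + "\n")
    return user_response
```

On Windows you would print the prompt and then call `record_choice(msvcrt.getwch, results)`, where `results` is the output file you already have open.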

Malinka

Hello again,
today I finally succeeded at re-adjusting the program suggested earlier. I wanted to make several changes because:
- the program didn't work when the same word occurred more than once (because index() always returned the first occurrence of that word)
- when a pattern matched successfully, the next iteration started from an element that was already part of the match. That is, given
pattern = [('w1','','w3','','w5')]
sentence =
when the first 5 elements of the sentence were extracted as a candidate pattern, the program continued with w2 (comparing it to 'w1','','w3', etc.). Instead, I wanted to continue with element w6.
- the program compared the 5 elements of a pattern to 5 consecutive elements of a sentence. Now, it compares the first element of a pattern to each subsequent element of the sentence until they match.

Here is the code :)
Please let me know if I can improve it any further!

patterntuples = [('w1', '', 'w3', '', 'w5')]
sentences = ['ww w1 A w3 B w5 w6 w7 w8 w9 w10 w1 w2 w3 w4 w5 w17']

def get_patterns(patterntuples,sentences):
    extracted = []
    for pattern in patterntuples:
        for sent in sentences:
            index = 0 #starting position in the sentence
            splittedline = sent.split(' ')
            while (len(splittedline) - index) >= len(pattern):
                temp = []
                for nr, word in enumerate(splittedline[index:(len(pattern) + index)]):
                    if pattern[nr] in ['',word]:
                        temp.append(word)
                    else:
                        break
                if len(temp) == len(pattern):
                    extracted.append(temp)
                    temp = []
                    index = index + len(pattern)
                else: …
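The listing above is cut off, so this is a compact Python 3 reconstruction of the algorithm described in the list above, not the posted code itself. `''` is a wildcard. A matching window is collected and scanning resumes right after it. After a failed match, the scan is assumed to move forward one word. The `all(...)` over `zip(...)` replaces the `temp` list and its length check. Also, `split()` with no argument tolerates repeated spaces, unlike `split(' ')`:

```python
def get_patterns(patterntuples, sentences):
    """Collect non-overlapping windows of each sentence that match a pattern.

    A pattern element '' is a wildcard; any other element must equal the word.
    """
    extracted = []
    for pattern in patterntuples:
        n = len(pattern)
        for sent in sentences:
            words = sent.split()
            index = 0  # starting position in the sentence
            while index + n <= len(words):
                window = words[index:index + n]
                if all(p in ('', w) for p, w in zip(pattern, window)):
                    extracted.append(window)
                    index += n  # resume after the matched window
                else:
                    index += 1  # assumption: try the next starting word
    return extracted


patterntuples = [('w1', '', 'w3', '', 'w5')]
sentences = ['ww w1 A w3 B w5 w6 w7 w8 w9 w10 w1 w2 w3 w4 w5 w17']
print(get_patterns(patterntuples, sentences))
# [['w1', 'A', 'w3', 'B', 'w5'], ['w1', 'w2', 'w3', 'w4', 'w5']]
```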
Malinka

Dear lukerobi, thank you very much for the detailed answer! Your solution works :) If I find a more efficient way to do this in the future, I will post my solution here.
You were right about the 'w1 w2 w3 w4 w5' pattern - indeed it should have been in the output of my example.

The notation (or a list with two elements that you can use to check whether an element is one or the other, right?) was new to me :)

Thanks again,
M.
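In the code above, the test `pattern[nr] in ['', word]` is a membership check. It is true when the pattern element is either the wildcard `''` or exactly the current word. Here is a small Python 3 illustration, where `element_matches` is a hypothetical name. A tuple works the same way as a list here:

```python
def element_matches(p, word):
    """True when pattern element p is the wildcard '' or exactly equals word."""
    return p in ('', word)  # equivalent to: p == '' or p == word


print(element_matches('', 'A'))     # True  (wildcard)
print(element_matches('w3', 'w3'))  # True  (exact match)
print(element_matches('w3', 'B'))   # False
```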

Malinka

Dear all,

maybe someone can help me find a solution to the following problem:
I have a list of patterns (each of length 5) stored as tuples in a list, e.g.
patterns = [('w1','X1','w1','Y1','w1'), ('w2','w2','X2','w2','Y2'), ('w2','X2','w2','Y2','w2')]
I want to go through all sentences in a text file (one sentence per line) and extract all occurrences of these patterns in each sentence. The problem is that the words (w1, w2) in each pattern must match exactly, while X1, X2, Y1, etc. are placeholders, because what I want to find out is which words occur in those positions.
For each line in the file, I can check whether each element of the pattern occurs in it. But how do I deal with the placeholders X and Y? I can't think of a way to solve this :/ Can anyone help me or point me in the right direction?

Thank you in advance!
Malinka

***
Example:

#tuples with patterns, the unknown element is an empty string ''
patterns = [('w1','','w3','','w5'), ('w7','w8','','w10',''), ('w8','','w10','','w12')]

sent1: w1 A w3 B w5 w6 w7 w8
sent2: w1 w2 w3 w4 w5 w6 w7 w8 C w10 D w12

#extracted patterns with new words instead of empty strings
extracted_patterns =
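One way to handle the placeholders is to treat `''` as a wildcard and compare each pattern with every window of the same length in the sentence. Keep a window whenever all non-empty elements agree. The kept window then holds the actual words in the placeholder positions. Below is a Python 3 sketch using the example data above, and the function names are hypothetical. It checks every starting position, so overlapping matches, like the last two in sent2, are all reported. To read sentences from a file, you could pass an open file object as `lines`, since iterating it yields one line at a time:

```python
def match_window(pattern, window):
    """Return window as a tuple if it fits pattern, else None.

    '' in the pattern accepts any word; every other element must be equal.
    """
    if len(window) != len(pattern):
        return None
    for p, w in zip(pattern, window):
        if p != '' and p != w:
            return None
    return tuple(window)


def extract_patterns(patterns, lines):
    """Scan each line (one sentence per line) for every pattern at every position."""
    extracted = []
    for line in lines:
        words = line.split()
        for pattern in patterns:
            n = len(pattern)
            for start in range(len(words) - n + 1):
                filled = match_window(pattern, words[start:start + n])
                if filled is not None:
                    extracted.append(filled)
    return extracted


patterns = [('w1', '', 'w3', '', 'w5'), ('w7', 'w8', '', 'w10', ''),
            ('w8', '', 'w10', '', 'w12')]
lines = ["w1 A w3 B w5 w6 w7 w8",
         "w1 w2 w3 w4 w5 w6 w7 w8 C w10 D w12"]
for filled in extract_patterns(patterns, lines):
    print(filled)
# ('w1', 'A', 'w3', 'B', 'w5')
# ('w1', 'w2', 'w3', 'w4', 'w5')
# ('w7', 'w8', 'C', 'w10', 'D')
# ('w8', 'C', 'w10', 'D', 'w12')
```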