Hi, I was hoping someone could help out a Python novice like me.

I want to find where my data has three or more 1s in a row. A sample of my input data is below. The first number in each column is a position, and the rest of the numbers are either not significant (0) or significant (1).

78	79	80	81	82	83
0	0	0	0	0	0
1	0	0	1	1	0
0	0	1	1	1	0
0	1	1	1	1	1

This is my code so far:

def sigfind(in_file,num_repeats):
    numbers = []
    for line in in_file:
        numbers = line.split()
        i = 0
        out_file=[]
        while i < len(numbers)-1:
            if [numbers[i]] == [numbers[i+1]] == num_repeats:
                out_file.append(number[i])
                i += num_repeats
            else:
                i += 1
sigfind(in_file,3)

But I am having trouble finding the consecutive numbers as well as outputting them in the right format.

I want the output to look something like the example below. For each row that has three or more 1s in a row, it should give the row number (not counting the first row, which holds the positions), the position where the consecutive 1s start, and the position where they end.

e.g.

row  start_pos  end_pos
3    80         82
4    79         83

I hope this makes sense. Does anyone have any ideas?
Thanks!

All 2 Replies

You want to start testing at the third element and check the prior two. Looking ahead with +1 needs care, because once the last element is reached there is no element at +1. Also, do not use "i", "l", or "O" as single-character variable names, as they can look like numbers.

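## convert the split strings to integers (numbers comes from line.split())
numbers = [int(x) for x in numbers]
for ctr in range(2, len(numbers)):
    # a run of three 1s ends at ctr when it and the two prior elements are all 1
    if numbers[ctr] == 1:
        if numbers[ctr] == numbers[ctr-1] == numbers[ctr-2]:
            print(numbers[ctr], "occurs three times at indexes", end=" ")
            # print the indexes of the run, earliest first
            for x in range(2, -1, -1):
                print(ctr - x, end=" ")
            print()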
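To get the row / start_pos / end_pos output asked for in the question, the same idea can be stretched to runs of any length. This is only a rough sketch, not the one right way to do it: the name find_runs, the min_repeats parameter, and the in-memory sample list are made up for illustration, and it assumes the first line holds the positions and every other line holds only 0s and 1s.

```python
def find_runs(lines, min_repeats=3):
    """Return (row, start_pos, end_pos) for each run of at least min_repeats 1s.

    lines: iterable of whitespace-separated strings; the first one holds positions.
    """
    rows = [line.split() for line in lines if line.strip()]
    positions = rows[0]
    results = []
    # row numbering starts at 1 and skips the positions line
    for row_number, fields in enumerate(rows[1:], start=1):
        run_start = None
        # the extra "0" at the end closes a run that reaches the last column
        for idx, value in enumerate(fields + ["0"]):
            if value == "1":
                if run_start is None:
                    run_start = idx
            else:
                if run_start is not None and idx - run_start >= min_repeats:
                    results.append((row_number, positions[run_start], positions[idx - 1]))
                run_start = None
    return results


# sample data copied from the question (hypothetical in-memory stand-in for the file)
sample = [
    "78\t79\t80\t81\t82\t83",
    "0\t0\t0\t0\t0\t0",
    "1\t0\t0\t1\t1\t0",
    "0\t0\t1\t1\t1\t0",
    "0\t1\t1\t1\t1\t1",
]

print("row  start_pos  end_pos")
for row_number, start, end in find_runs(sample):
    print(f"{row_number:<4} {start:<10} {end}")
```

With a real file, passing the open file object instead of sample should work the same way, since the function only iterates over lines.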

Consider using a pattern. What you want is to match pattern = "1\t1\t1" and then translate the index in the row into the column number.

Note that because of the tabs, and because each value is a single character,

column 1 is index 0
column 2 is index 2
column 3 is index 4
... column N is index 2*(N-1)

so your code does something like

pattern = "1\t1\t1"
for row in in_file:
    try:
        i = row.index(pattern)  # string index where the first match starts
        column = i // 2 + 1     # invert index = 2*(N-1) to get column N
    except ValueError:
        pass # don't mind if there's no match

Figuring out how to find the column "name" from the column number is as easy as split()...
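A small sketch of that last step, assuming every value is a single 0 or 1 separated by exactly one tab (otherwise the index arithmetic does not hold). The function name first_triple is made up for illustration, and it uses str.find instead of index so that a missing match gives -1 rather than an exception. It only reports where the first run of three starts, not where it ends.

```python
PATTERN = "1\t1\t1"


def first_triple(header_line, row_line):
    """Return the position name where the first run of three 1s starts, or None."""
    names = header_line.split()   # position names, one per column
    i = row_line.find(PATTERN)    # -1 when the pattern is absent
    if i == -1:
        return None
    column = i // 2 + 1           # 1-based column, from index = 2*(N-1)
    return names[column - 1]


header = "78\t79\t80\t81\t82\t83"
print(first_triple(header, "0\t0\t1\t1\t1\t0"))
print(first_triple(header, "1\t0\t0\t1\t1\t0"))
```

For the sample rows in the question, the first call should give 80 and the second None.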
