I have a subtitle file and I want to extract the timestamps from it, e.g. from the lines below:

<SYNC Start=106377><P Class=ENCC>


<SYNC Start=107350><P Class=ENCC>

close the door.

I want to extract the timestamps '106377' and '107350' and combine each one with the name of the file, with .bmp at the end, so that a line looks like
grey-anatomy_106377.bmp. I have written the code below, but the output is a single line such as grey-anatomy_106377grey-anatomy_107350 and so on.

timings = []
file = open('grey-anatomy.smi', "r" )

wholefile = file.readlines()

for line in wholefile:

        line = line.strip().lower()

        if line.startswith("<sync"):
            try:
                # split "<sync start=106377><p class=encc>" around '=' and '>'
                prefix, data = line.split("=",1)
                timestamp, postfix = data.split(">",1)
                timings.append(timestamp)
            except ValueError:
                print("Ignoring this timestamp")

timings= 'grey-anatomy_'.join(timings).join('bmp')

fout = open("grey-anatomy1.txt", "w")
fout.write(timings)
fout.close()

I want every parsed entry in the output to end with .bmp, with one entry per line, like:

grey-anatomy_106377.bmp
grey-anatomy_107350.bmp

How can I achieve this?

You need to add a '\n' newline character to the end of each entry.

Try something like this ...

def extract(text, sub1, sub2):
    """
    extract a substring from text between first
    occurrences of substrings sub1 and sub2
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]

text = '<SYNC Start=106377><P Class=ENCC>'

timestamp = extract(text, 'rt=', '><')

filename = 'grey-anatomy_' + timestamp + '.bmp'

# test it ...
print(filename)  # grey-anatomy_106377.bmp
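
To go from one line to the whole file, the same idea can be applied in a loop. This is only a sketch: build_filenames and the stem parameter are names made up for illustration, and the sample list stands in for the lines you would get from file.readlines(). It splits on '=' and '>' as in the question's code.

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]


def build_filenames(lines, stem="grey-anatomy"):
    """Build one '<stem>_<timestamp>.bmp' name per <SYNC ...> line (hypothetical helper)."""
    names = []
    for line in lines:
        line = line.strip()
        if line.lower().startswith("<sync"):
            timestamp = extract(line, "=", ">")
            names.append(stem + "_" + timestamp + ".bmp")
    return names


# Stand-in for the lines read from the .smi file
sample = [
    "<SYNC Start=106377><P Class=ENCC>",
    "",
    "<SYNC Start=107350><P Class=ENCC>",
    "close the door.",
]

names = build_filenames(sample)
# Put the newline between whole entries, and one at the very end
output = "\n".join(names) + "\n"
print(output)
```

Writing output to the file opened with "w" should then give one .bmp name per line.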

Hi JLM669

I know I have to use '\n', but when I use it, all the characters on a line get separated. Where exactly should I use it to avoid this extra spacing?
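
One possible cause, assuming '\n'.join(...) was called on a single string rather than on a list: join treats a string as a sequence of characters, so the separator ends up between every character. The small comparison below is only an illustration; the variable names good and bad are made up for it.

```python
timings = ["106377", "107350"]

# Joining a list of entries puts the newline between whole entries.
good = "\n".join("grey-anatomy_" + t + ".bmp" for t in timings)

# Joining a plain string puts the newline between its characters.
bad = "\n".join("106377")

print(repr(good))
print(repr(bad))
```

If that is what happened, keeping timings as a list until the final join should avoid the extra spacing.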