Be warned: Beginner questions. Parsing. List. Dictionaries. etc.

Question

roe1and 0 Newbie Poster

12 Years Ago

Hi. I am a beginner with Python. I've done some PHP work in the past but I suspect this is not helping with my progress in Python.

I have a text file that basically looks like this:

a=1
b=2
c=3

a=4
b=5
c=6

a=7
c=8

a=9
b=0
c=11

At the end of the day I would like to have a file that looks like this:

A, B, C
1, 2, 3
4, 5, 6
7, , 8
9, 0, 11

I was asked to do something similar at work last week, and while I did manage to get the data into a CSV file (because everybody loves Excel where I work), I'm not sure the way I went about it was the best:

#!/usr/local/bin/python2.6   
import re

fileout_ = open('output.txt', 'w')

listsml_ = {}
listbig_ = []

with open('input.txt', 'r') as thefile_:

    for line_ in thefile_:

        match_a = re.compile(r'^a=(.*?)$').search(line_)
        if match_a != None:
            listsml_['a'] = match_a.group(1)

        match_b = re.compile(r'^b=(.*?)$').search(line_)
        if match_b != None:
            listsml_['b'] = match_b.group(1)

        match_c = re.compile(r'^c=(.*?)$').search(line_)
        if match_c != None:
            listsml_['c'] = match_c.group(1)

            listbig_.append(listsml_)
            listsml_ = {}


    fileout_.write("A, B, C\n")


for line in listbig_:
    
    if line.has_key('a'):
        A = line['a']
    else:
        A = ""

    if line.has_key('b'):
        B = line['b']
    else:
        B = ""

    if line.has_key('c'):
        C = line['c']
    else:
        C = ""
    
    lineout_ = "%s, %s, %s\n" % (A, B, C)
    fileout_.write(lineout_)

fileout_.close()

I can see a few things that trouble me. For instance, "listbig_.append(listsml_)" only runs when the loop reaches a line that matches '^c=(.*?)$', so it won't work if a block only contains a=whatever and b=whatever.

Also, this is my scripting way of doing things. Do step a then step b then... I'm sure there must be an object oriented approach to this task but I do not know where to begin. At this stage I'm just frustrated as I don't know where to go from here. I'll have more of these sorts of projects to do in the future and I'm not sure what the next step would be. Any help appreciated. Thanks

python


TrustyTony 888 pyMod

12 Years Ago

This is how I would do it:

import webbrowser

block = {}
with open('roeland_input.txt') as data, open('roeland_output.txt', 'w') as result:
    collected_data = []
    keys = set()
    for line in data:
        if '=' not in line:
            collected_data.append(block)
            block = {}
        else:
            sym, eq, val = line.partition('=')
            block[sym] = int(val)
            keys.add(sym)
    collected_data.append(block)


    keys = sorted(keys)
    result.write(', '.join(key.upper() for key in keys) + '\n')
    for datadict in collected_data:
        result.write(', '.join(str(datadict[key]) if key in datadict else ''
                               for key in keys) + '\n' )

webbrowser.open('roeland_output.txt')

woooee 814 Nearly a Posting Maven

12 Years Ago

This is a common exercise where you group records based on some indicator, an empty line in this case. This code prints the contents, but you can also output to a file using .join() and appending a newline, "\n".

lit="""a=1
b=2
c=3

a=4
b=5
c=6
 
a=7
c=8
 
a=9
b=10
c=11"""

test_data=lit.split("\n")
this_group=[]
for rec in test_data:
    rec = rec.strip()
    if len(rec):     ## not an empty line
        this_group.append(rec)
    else:            ## is empty
        print_list=[]
        for value in this_group:
            ltr, num = value.split("=")
            print_list.append(num)
        print "   ".join(print_list)
        this_group = []     ## empty list for next group

## final group - prints differently
for value in this_group:
    ltr, num = value.split("=")
    print num+"  ",
print

Also, if you want the a=7, c=8 group to print in the first and third columns, then you will have to test for the letter and place it in the correct column.

test_group=["a=7", "c=8"]
columns=["a", "b", "c"]
print_list = ["*", "*", "*"]
for rec in test_group:
    ltr, num = rec.split("=")
    if ltr in columns:      ## letter found in "columns" (index() raises ValueError otherwise)
        col=columns.index(ltr)
        print_list[col]=num
print "   ".join(print_list)
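Putting the two snippets together, a minimal sketch could group the records on empty lines and place each value in its column before joining. This is only an illustration: the function name groups_to_rows and the fixed columns list are assumptions, not something from the original data.

```python
def groups_to_rows(lines, columns, filler=""):
    """Group 'key=value' lines separated by empty lines into rows."""
    rows = []
    current = None                      # None means "no group started yet"
    for rec in lines:
        rec = rec.strip()
        if rec:                         # not an empty line
            if current is None:
                current = [filler] * len(columns)
            ltr, _, num = rec.partition("=")
            if ltr in columns:          # letters without a column are ignored
                current[columns.index(ltr)] = num
        elif current is not None:       # an empty line closes the group
            rows.append(current)
            current = None
    if current is not None:             # final group has no trailing empty line
        rows.append(current)
    return rows


test_data = "a=1\nb=2\nc=3\n\na=7\nc=8".split("\n")
rows = groups_to_rows(test_data, ["a", "b", "c"])
# One string per row; add "\n" to each when writing to a file.
output_lines = [", ".join(row) for row in rows]
```

Handling the final group inside the same function avoids the separate "prints differently" loop.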

TrustyTony 888 pyMod

12 Years Ago

You could do it like this with os.path.walk, without repeating the huge list of titles:

#!/usr/local/bin/python2.6   
import os, sys, os.path
from time import time

def meth(_, path, files):
    print(path)
    for nfile in files:
        if nfile.endswith(("INI", "ini")):        
            with open(os.path.join(path, nfile), 'r') as thefile:
                header_iter = iter(headers)
                for line in thefile:
                    if '=' in line:
                        print('Processing file %s' % nfile)
                        key, eq, val = line.partition('=')
                        # assuming the details will always be in the same order we can
                        # advance the iterator to this key
                        key = key.strip()
                        if key not in headers:
                            print('Invalid data in file %s, line: %s' % (nfile, line.rstrip()))
                            break
                        for header in header_iter:
                            if header == key:
                                output.write(', %s' % val.strip())
                                break
                            else:
                                output.write(', ')
                output.write('\n')
       
if __name__ == "__main__":
    headers = ["Comment", "SDCard", "Camera", "Version increment",
               "All Interface Languages",
               "30min limit removal", "Maximum ISO limit removal",
               "PAL<->NTSC Menu", "720p30 height", "480p30 width", "480p30 height",
               "E1 Quality", "E1 Table", "E2 Quality", "E2 Table", "E3 Quality",
               "E3 Table", "E4 Quality", "E4 Table", "F1 Quality",
               "F1 Table", "F2 Quality", "F2 Table",
               "F3 Quality", "F3 Table", "F4 Quality", "F4 Table",
               "Video Bitrate 24H", "Video Bitrate 24L",
               "Video Bitrate FSH/SH", "Video Bitrate FH/H",
               "Auto Quantizer for 1080 modes", "Auto Quantizer for 720 modes",
               "720p50 GOP Size", "720p60 GOP Size",
               "1080i50 and 1080p24 GOP Size", "1080i60 GOP Size",
               "Audio encoding bps", "Volume Indicator 8", "Volume Indicator 7",
               "Volume Indicator 6", "Volume Indicator 5", "Volume Indicator 4",
               "Volume Indicator 3", "Volume Indicator 2", "Volume Indicator 1",
               "AGC 3 Setting", "AGC 2 Setting", "AGC 1 Setting", "AGC 0 Setting",
               "Encoder setting 1 720p", "Encoder setting 2 720p",
               "Encoder setting 3 720p", "Encoder setting 4 720p",
               "Encoder setting 1 1080i/p", "Encoder setting 2 1080p",
               "Encoder setting 3 1080p", "Encoder setting 4 1080p",
               "Encoder setting 2 1080i", "Encoder setting 3 1080i",
               "Encoder setting 4 1080i", "Video buffer", "Video buffer 24p",
               "Audio buffer", "1080 progressive", "Initial quantizer",
               "Quantizer for 1080 modes", "Quantizer for 720 modes",
               "Quantizer table", "1080p24 Scaling I", "1080p24 Scaling P",
               "1080p24 Scaling B", "1080p24 Scaling Fallback",
               "1080i Scaling I", "1080i Scaling P",
               "1080i Scaling B", "1080i Scaling Fallback",
               "720p Scaling I", "720p Scaling P", "720p Scaling B",
               "720p Scaling Fallback", "1080p24 GOP Table",
               "1080i60 GOP Table", "720p60 Opt1 GOP Table", "1080i50 GOP Table",
               "720p50 Opt1 GOP Table", "720p60 Opt2 GOP Table", "720p50 Opt2 GOP Table",
               "1080p24 GOPx2", "1080p24 GOPx2 time", "1080i60 GOPx2",
               "1080i60 GOPx2 time", "1080i50 GOPx2", "1080i50 GOPx2 time",
               "720p60 GOPx2", "720p60 GOPx2 time", "720p50 GOPx2", "720p50 GOPx2 time",
               "Table Flag1 I", "Table Flag1 P", "Table Flag1 B",
               "Table High I", "Table High P", "Table High B",
               "1080p24 FB1", "1080p24 FB2", "1080 SB", "720 SB", "1080i60 FB1",
               "60fps FB2", "1080i50 FB1", "50fps FB2", "720p60 FB1", "720p50 FB1",
               "1080p24 Frame Limit", "60fps Frame Limit", "50fps Frame Limit",
               "1080p24 High Top Setting", "1080p24 High Bottom Setting", "1080p24 Low Top Setting",
               "1080p24 Low Bottom Setting", "1080i Top Setting", "1080i Bottom Setting",
               "720p Top Setting", "720p Bottom Setting", "Other Modes Bottom Setting",
               "Limit 90000", "Limit 63000", "Limit 47700", "Limit 15300", "720p30->720pXfps"]
    
    output = open('output.txt', 'w')
    output.write(', '.join(headers) + '\n')

    path = '/'
    start = time()
    os.path.walk(path, meth, None)
    print(time() - start)

I do not have proper input files, so I set it to scan the whole hard disk, which at least exercises the handling of wrongly formatted ini files. If your items are not always in the same order you must use the dictionary approach; otherwise this alternative could also work (I cannot test it).
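For files where the keys do not always come in the same order, the dictionary approach could look roughly like this. It is a sketch only: row_from_lines is a hypothetical helper that is not part of the script above, and the sample values are made up.

```python
def row_from_lines(lines, headers, sep=', '):
    """Build one output row from 'key=value' lines, in header order."""
    values = {}
    for line in lines:
        if '=' in line:
            key, _, val = line.partition('=')
            values[key.strip()] = val.strip()
    # Look every header up in the dict; missing keys become empty cells.
    return sep.join(values.get(header, '') for header in headers)


headers = ["Comment", "SDCard", "Camera"]
lines = ["Camera = y\n", "Comment = x\n"]   # keys deliberately out of order
row = row_from_lines(lines, headers)
```

Keys that are not in headers are silently dropped here, so the 'Invalid data' check would have to be added separately.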


roe1and 0 Newbie Poster · Answer 1 · 2012-02-22T22:32:01+00:00

Wow, that's pretty neat. Thanks for the reply. Besides my Python having none of this:

with open('roeland_input.txt') as data, open('roeland_output.txt', 'w') as result:

(invalid syntax...?) and me never ever having used webbrowser before I think I understand most of this.

There are a few things. My example was a bit oversimplified. In the real data sym can be several separate words and val won't always be an integer. This certainly gives me something to work with though, thanks. I'm off home now but I'll be fiddling with this tomorrow for sure.

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 2 · 2012-02-22T22:52:10+00:00

Multiple context managers in one with statement were added in Python 2.7, so you must have an earlier version. It can be replaced with nested with statements, and because we actually save the data first, the last part is better off in its own with. The int conversion, and the corresponding str when writing out, can be dropped as you do not process the data. Spaces at the beginning of a key do not matter, but you may want to strip the key before using it. An empty value will also appear as , , instead of ,, but that is easy to change by moving the space from the join separator to the front of each value.

webbrowser.open opens whatever file is thrown at it in the default application (not only HTML documents in a web browser).

import webbrowser

block = {}
with open('roeland_input.txt') as data:
    collected_data = []
    keys = set()
    for line in data:
        if '=' not in line:
            collected_data.append(block)
            block = {}
        else:
            sym, eq, val = line.partition('=')
            sym = sym.strip()
            block[sym] = val.strip()
            keys.add(sym)
    collected_data.append(block)
    
with open('roeland_output.txt', 'w') as result:
    keys = sorted(keys)
    result.write(', '.join(key.upper() for key in keys) + '\n')
    for datadict in collected_data:
        result.write(','.join((' %s' % datadict[key]) if key in datadict else ''
                               for key in keys) + '\n' )

webbrowser.open('roeland_output.txt')

""" contents of output:
A, B, C
 1, 2, 3
 4, 5, 6
 7,, 8
 9, 0, 11
"""

roe1and 0 Newbie Poster · Answer 3 · 2012-02-23T18:09:46+00:00

Great. At least now I feel like I'm on the right track. Split the lines at the = symbol and stick the resulting bits of information into a dict. Stick the dict into a list, rinse and repeat. Then spit out the results, adding ', ' in between values. This can work for what I am trying to do. Thanks for the help.

I see you sorted the keys using

keys = sorted(keys)

correct? This will not work for my purposes. The keys are words and unfortunately they don't line up like A, B, C, so is there a way to force the keys to be written into the dict the way they appear in the data, or will I have to sort it afterwards? I'll be working with around 130 different keys.

Thanks again
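One possible way to keep the keys in the order they first appear is to record each new key in a list instead of sorting a set. This is a sketch only: parse_blocks is a made-up name, the sample keys are just for illustration, and the list plus set bookkeeping is one option among several.

```python
def parse_blocks(lines):
    """Return (keys in first-seen order, list of dicts, one per block)."""
    keys = []            # keeps the order in which keys first appear
    seen = set()         # fast membership test for keys already recorded
    blocks = []
    block = {}
    for line in lines:
        if '=' not in line:
            if block:                    # skip repeated blank lines
                blocks.append(block)
            block = {}
        else:
            sym, _, val = line.partition('=')
            sym = sym.strip()
            block[sym] = val.strip()
            if sym not in seen:
                seen.add(sym)
                keys.append(sym)
    if block:                            # last block has no trailing blank line
        blocks.append(block)
    return keys, blocks


sample = ["Video buffer=1", "Audio buffer=2", "", "Camera=x", "Audio buffer=3"]
keys, blocks = parse_blocks(sample)
rows = [', '.join(b.get(k, '') for k in keys) for b in blocks]
```

The dicts themselves stay unordered; only the keys list decides the column order when writing.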

Lucaci Andrew 140 Za s|n · Answer 4 · 2012-02-23T20:37:23+00:00

Hi roe1and. I've looked at your assignment and came up with this. It's not much, but I hope it will help you.
Note that your source file needs to contain the same number of 'a', 'b' and 'c' entries for this to work.

'''
Created on Feb 22, 2012

@author: sin
'''
import time
import sys
import os

class file:
    def __init__(self, filename, newfile):
        """Initializing the segments.
        The __import__ function, which will read the initial file
        The 'filename' and 'newfile' files
        The 'a_Dic', 'b_Dic', 'c_Dic' the dictionary corresponding to
        the keys 'a', 'b', 'c' from 'filename'.
        The 'cl1' and 'vls' lists, which will be used to separate the
        keys from the values, as we want."""
        self.__import__
        self.filename = filename
        self.newfile = newfile
        self.a_Dic = {}
        self.b_Dic = {}
        self.c_Dic = {}
        self.cl1 = []
        self.vls = []
 
    def __import__(self):
        """Isolating the parts, whether they are the 'a', 
        or the 'b', or the 'c', and then combining them 
        with their correspondent values from the initial 
        file"""
        cl = []
        fh = open(self.filename, 'r')
        _fileline = open(self.filename).read().splitlines()
        vls = []
        for j in range(0, len(_fileline)):
            if str(_fileline[j])[2:] == '':continue #ignoring the lines which are empty.
            self.vls.append(int(str(_fileline[j])[2:]))
        for j in range(0, len(_fileline)):
            if str(_fileline[j])[2:] == '':continue
            cl.append(str(_fileline[j])[::-1])
        for i in range(0, len(cl)):
                if len(str(cl[i])) >= 4:
                    self.cl1.append(str(cl[i])[3:])
                else:
                    self.cl1.append(str(cl[i])[2:])
        if len(self.cl1) == len(self.vls):        
            for i in range(0, len(self.cl1)):
                """Assigning unique key-names to the dictionary keys
                Useful when having for example a1, a11, a2, and want
                to print them sorted by keys, it will print a1 first
                than a11, and only after that a2, when it should give
                a1, a2 than a11"""
                if self.cl1[i] == 'a':
                    if i < 10:
                        self.a_Dic[self.cl1[i] + '-a' + str(i)] = self.vls[i]
                    elif 10 <= i < 20:
                        self.a_Dic[self.cl1[i] + '-b' + str(i)] = self.vls[i]
                    elif 20 <= i < 30:
                        self.a_Dic[self.cl1[i] + '-c' + str(i)] = self.vls[i]
                    elif 30 <= i < 40:
                        self.a_Dic[self.cl1[i] + '-d' + str(i)] = self.vls[i]
                    elif 40 <= i < 50:
                        self.a_Dic[self.cl1[i] + '-e' + str(i)] = self.vls[i]
                elif self.cl1[i] == 'b':
                    if i < 10:
                        self.b_Dic[self.cl1[i] + '-a' + str(i)] = self.vls[i]
                    elif 10 <= i < 20:
                        self.b_Dic[self.cl1[i] + '-b' + str(i)] = self.vls[i]
                    elif 20 <= i < 30:
                        self.b_Dic[self.cl1[i] + '-c' + str(i)] = self.vls[i]
                    elif 30 <= i < 40:
                        self.b_Dic[self.cl1[i] + '-d' + str(i)] = self.vls[i]
                    elif 40 <= i < 50:
                        self.b_Dic[self.cl1[i] + '-e' + str(i)] = self.vls[i]
                elif self.cl1[i] == 'c':
                    if i < 10:
                        self.c_Dic[self.cl1[i] + '-a' + str(i)] = self.vls[i]
                    elif 10 <= i < 20:
                        self.c_Dic[self.cl1[i] + '-b' + str(i)] = self.vls[i]
                    elif 20 <= i < 30:
                        self.c_Dic[self.cl1[i] + '-c' + str(i)] = self.vls[i]
                    elif 30 <= i < 40:
                        self.c_Dic[self.cl1[i] + '-d' + str(i)] = self.vls[i]
                    elif 40 <= i < 50:
                        self.c_Dic[self.cl1[i] + '-e' + str(i)] = self.vls[i]
        fh.close()
           
    def __export__(self):
        """Making an 'old way' dictionary sort by key
        and then writing the values of the keys from 
        the dictionaries to 'filename'."""
        k1 = self.a_Dic.keys()
        k1.sort() 
        k2 = self.b_Dic.keys()
        k2.sort()
        k3 = self.c_Dic.keys()
        k3.sort()
        fh = open(self.newfile, 'w') #preparing the file to be written, overwriting what was already in.
        fh.write(str('%15s %10s %10s' % ("A", "B", "C")) + '\n')
        for i in range(0, len(k1)):
            fh.write(str("%15s %10s %10s" % (self.a_Dic[k1[i]], self.b_Dic[k2[i]], self.c_Dic[k3[i]])) + '\n')
        fh.close()
       
            
def run():
    pathname = os.path.dirname(sys.argv[0])
    i = -1
    while i != 'ok':
        fl = raw_input("The input file name (if you don't have a file, type 'exit'): \n") + '.txt'
        if fl == 'exit.txt':
            print "Initialized sys.exit() command. BB!"
            time.sleep(2)
            sys.exit()
        _rawpath = os.path.isfile(pathname + '\\' + fl)
        errors = []
        if _rawpath == True:
            flnw = raw_input("The output file name: ") + '.txt'
            run = file(fl, flnw)
            run.__import__()
            run.__export__()
            time.sleep(1)
            print "Done writing...\nCheck your file", flnw, "to see the output of the initial values."
            time.sleep(1)
            print "Exiting in 3 sec."
            time.sleep(3)
            exit()
        else: 
            er = "There is no such file located at the address: \n" + str(pathname)
            errors.append(er)
            print errors[0]
        if len(errors) == 0:
            i = 'ok'
if __name__ == '__main__':
    run()

An example of source file.txt:

"""
a=1
b=2
c=3

a=4
b=5
c=6

a=7
b=8
c=9

a=10
b=11
c=12
"""

Note that each of the keys 'a', 'b' and 'c' appears the same number of times (4).
The input file name prompt expects the name of your source file without the trailing '.txt'.
At the output file name prompt you can enter any name; that file will be created by the Python script.

"""
My output:

              A          B          C
              1          2          3
              4          5          6
              7          8          9
             10         11         12
"""

roe1and 0 Newbie Poster · Answer 5 · 2012-02-24T19:55:53+00:00

Hi everyone,

I was tinkering with this at home last night and I have managed to use what I have learnt here to write a short script that reads data from separate files into a CSV type file that I will be sharing on Google docs. I'm sure there are still some improvements that could be made and any suggestions are welcome. I am very happy with all the help I have received. Thanks again.

The new script:

#!/usr/local/bin/python2.6   
import os, sys, os.path
from datetime import datetime

output_ = open('/dc/pao/pcift0/data/test/output.txt', 'w')
headers_=["Comment", "SD_Card", "Camera", "Version increment", "All Interface Languages", "30min limit removal", "Maximum ISO limit removal", "PAL<->NTSC Menu", "720p30 height", "480p30 width", "480p30 height", "E1 Quality", "E1 Table", "E2 Quality", "E2 Table", "E3 Quality", "E3 Table", "E4 Quality", "E4 Table", "F1 Quality", "F1 Table", "F2 Quality", "F2 Table", "F3 Quality", "F3 Table", "F4 Quality", "F4 Table", "Video Bitrate 24H", "Video Bitrate 24L", "Video Bitrate FSH/SH", "Video Bitrate FH/H", "Auto Quantizer for 1080 modes", "Auto Quantizer for 720 modes", "720p50 GOP Size", "720p60 GOP Size", "1080i50 and 1080p24 GOP Size", "1080i60 GOP Size", "Audio encoding bps", "Volume Indicator 8", "Volume Indicator 7", "Volume Indicator 6", "Volume Indicator 5", "Volume Indicator 4", "Volume Indicator 3", "Volume Indicator 2", "Volume Indicator 1", "AGC 3 Setting", "AGC 2 Setting", "AGC 1 Setting", "AGC 0 Setting", "Encoder setting 1 720p", "Encoder setting 2 720p", "Encoder setting 3 720p", "Encoder setting 4 720p", "Encoder setting 1 1080i/p", "Encoder setting 2 1080p", "Encoder setting 3 1080p", "Encoder setting 4 1080p", "Encoder setting 2 1080i", "Encoder setting 3 1080i", "Encoder setting 4 1080i", "Video buffer", "Video buffer 24p", "Audio buffer", "1080 progressive", "Initial quantizer", "Quantizer for 1080 modes", "Quantizer for 720 modes", "Quantizer table", "1080p24 Scaling I", "1080p24 Scaling P", "1080p24 Scaling B", "1080p24 Scaling Fallback", "1080i Scaling I", "1080i Scaling P", "1080i Scaling B", "1080i Scaling Fallback", "720p Scaling I", "720p Scaling P", "720p Scaling B", "720p Scaling Fallback", "1080p24 GOP Table", "1080i60 GOP Table", "720p60 Opt1 GOP Table", "1080i50 GOP Table", "720p50 Opt1 GOP Table", "720p60 Opt2 GOP Table", "720p50 Opt2 GOP Table", "1080p24 GOPx2", "1080p24 GOPx2 time", "1080i60 GOPx2", "1080i60 GOPx2 time", "1080i50 GOPx2", "1080i50 GOPx2 time", "720p60 GOPx2", "720p60 GOPx2 time", "720p50 GOPx2", "720p50 GOPx2 time", 
"Table Flag1 I", "Table Flag1 P", "Table Flag1 B", "Table High I", "Table High P", "Table High B", "1080p24 FB1", "1080p24 FB2", "1080 SB", "720 SB", "1080i60 FB1", "60fps FB2", "1080i50 FB1", "50fps FB2", "720p60 FB1", "720p50 FB1", "1080p24 Frame Limit", "60fps Frame Limit", "50fps Frame Limit", "1080p24 High Top Setting", "1080p24 High Bottom Setting", "1080p24 Low Top Setting", "1080p24 Low Bottom Setting", "1080i Top Setting", "1080i Bottom Setting", "720p Top Setting", "720p Bottom Setting", "Other Modes Bottom Setting", "Limit 90000", "Limit 63000", "Limit 47700", "Limit 15300", "720p30->720pXfps"]
output_.write('Comment, SD_Card, Camera, Version increment, All Interface Languages, 30min limit removal, Maximum ISO limit removal, PAL<->NTSC Menu, 720p30 height, 480p30 width, 480p30 height, E1 Quality, E1 Table, E2 Quality, E2 Table, E3 Quality, E3 Table, E4 Quality, E4 Table, F1 Quality, F1 Table, F2 Quality, F2 Table, F3 Quality, F3 Table, F4 Quality, F4 Table, Video Bitrate 24H, Video Bitrate 24L, Video Bitrate FSH/SH, Video Bitrate FH/H, Auto Quantizer for 1080 modes, Auto Quantizer for 720 modes, 720p50 GOP Size, 720p60 GOP Size, 1080i50 and 1080p24 GOP Size, 1080i60 GOP Size, Audio encoding bps, Volume Indicator 8, Volume Indicator 7, Volume Indicator 6, Volume Indicator 5, Volume Indicator 4, Volume Indicator 3, Volume Indicator 2, Volume Indicator 1, AGC 3 Setting, AGC 2 Setting, AGC 1 Setting, AGC 0 Setting, Encoder setting 1 720p, Encoder setting 2 720p, Encoder setting 3 720p, Encoder setting 4 720p, Encoder setting 1 1080i/p, Encoder setting 2 1080p, Encoder setting 3 1080p, Encoder setting 4 1080p, Encoder setting 2 1080i, Encoder setting 3 1080i, Encoder setting 4 1080i, Video buffer, Video buffer 24p, Audio buffer, 1080 progressive, Initial quantizer, Quantizer for 1080 modes, Quantizer for 720 modes, Quantizer table, 1080p24 Scaling I, 1080p24 Scaling P, 1080p24 Scaling B, 1080p24 Scaling Fallback, 1080i Scaling I, 1080i Scaling P, 1080i Scaling B, 1080i Scaling Fallback, 720p Scaling I, 720p Scaling P, 720p Scaling B, 720p Scaling Fallback, 1080p24 GOP Table, 1080i60 GOP Table, 720p60 Opt1 GOP Table, 1080i50 GOP Table, 720p50 Opt1 GOP Table, 720p60 Opt2 GOP Table, 720p50 Opt2 GOP Table, 1080p24 GOPx2, 1080p24 GOPx2 time, 1080i60 GOPx2, 1080i60 GOPx2 time, 1080i50 GOPx2, 1080i50 GOPx2 time, 720p60 GOPx2, 720p60 GOPx2 time, 720p50 GOPx2, 720p50 GOPx2 time, Table Flag1 I, Table Flag1 P, Table Flag1 B, Table High I, Table High P, Table High B, 1080p24 FB1, 1080p24 FB2, 1080 SB, 720 SB, 1080i60 FB1, 60fps FB2, 1080i50 FB1, 50fps FB2, 720p60 FB1, '
              '720p50 FB1, 1080p24 Frame Limit, 60fps Frame Limit, 50fps Frame Limit, 1080p24 High Top Setting, 1080p24 High Bottom Setting, 1080p24 Low Top Setting, 1080p24 Low Bottom Setting, 1080i Top Setting, 1080i Bottom Setting, 720p Top Setting, 720p Bottom Setting, Other Modes Bottom Setting, Limit 90000, Limit 63000, Limit 47700, Limit 15300, 720p30->720pXfps\n')

def walk(dir):
    """ walks a directory, and meth on each file! """
    dir = os.path.abspath(dir)
    for file in [file for file in os.listdir(dir) if not file in [".",".."]]:
        nfile = os.path.join(dir,file)
        if os.path.isfile(nfile):
            meth(nfile)
        else:
            print "%s" % (nfile)
            walk(nfile)

def meth(nfile):
    
    fileExt = nfile[-3:]
    if not fileExt in ["INI", "ini"]: return
    

    with open(nfile, 'r') as thefile_:
        listsml_ = {}

        for line_ in thefile_:

            if '=' in line_:
                key_, eq, val_ = line_.partition('=')
                key_ = key_.strip()
                listsml_[key_] = val_.strip()

        
        for header_ in headers_:
 
            if header_ in listsml_:
                output_.write(', %s' % listsml_[header_])
            else:
                output_.write(', ')

        output_.write('\n')
        
       
if __name__ == "__main__":

    path_ = "/dc/pao/pcift0/data/test/in"
    startTime = datetime.now()
    walk(path_)
    print(datetime.now()-startTime)

The old script was 1230 lines and did the same using regex! Very ugly for sure.

roe1and 0 Newbie Poster · Answer 6 · 2012-02-28T16:29:19+00:00

Thanks again for all the help. The final product looks like this (unless anyone can suggest something more!):

#!/usr/local/bin/python2.6   
import os, sys, os.path
from time import time

def meth(_, path, files):
    print(path)
    headers = []
    with open('in/head.ini') as headersfile:
        for line in headersfile:
            if '=' in line:
                key, eq, val = line.partition('=')
                headers.append(key.strip())
    output = open('output.txt', 'w')
    output.write('|'.join(headers) + '\n')            
    for nfile in files:
        if nfile.endswith(("INI", "ini")):        
            with open(os.path.join(path, nfile), 'r') as thefile:
                header_iter = iter(headers)
                print('Processing file %s' % nfile)
                for line in thefile:
                    if '=' in line:
                        key, eq, val = line.partition('=')
                        key = key.strip()
                        if key not in headers:
                            print('Invalid data in file %s, line: %s' % (nfile, key))
                        else:
                            for header in header_iter:
                                if header == key:
                                    output.write('|%s' % val.strip())
                                    break
                output.write('\n')
       
if __name__ == "__main__":

    path = './in'
    start = time()
    os.path.walk(path, meth, None)
    print(time() - start)

I have decided to do the headers on the fly. The file 'head.ini' will always be included with the input data and will always contain the most up to date headers. Thanks again for all the help I am really chuffed with this. One more question. Does this qualify as pythonic?

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 7 · 2012-02-28T16:36:36+00:00

Looks quite neat to me. For the headers it looks a little strange to read lines with '=' and drop val, but it is a good design if you include a comment:
# For the headers it suffices to copy one set of data that has all the headers used; note that the program always assumes the same ordering of the headers as of the data keys inside one data file.

One problem, though, is that you are processing your headers file in the meth function every time a new path (i.e. directory) is scanned, even though it looks like it is always the same file, as its path is not relative to the directory being scanned. So you could hold the contents in a global variable (as it only needs to be read), or pass it as a parameter instead of the current None (and rename the _ variable to headers). To me it looks better that the output is in the meth function. You will probably want to add logging of error messages later, in addition to printing them in real time. My code for you left the file open; now you have output.txt unclosed in meth, and you are overwriting its contents for each path visited, which is not nice.

Also, the amount of output is probably a little excessive with the debug print 'Processing file...'; better to comment it out after debugging, otherwise any problematic line will go unnoticed.

Also remove the unnecessary imports of os.path and sys.
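As a rough sketch of the "read the headers once and pass them in" idea (untested against the real data): the function names below are made up, it uses os.walk in place of os.path.walk, and it parses each file into a dict rather than using the iterator trick.

```python
import os


def read_headers(header_path):
    """Read the header names once: the text before '=' on each line."""
    headers = []
    with open(header_path) as headersfile:
        for line in headersfile:
            if '=' in line:
                headers.append(line.partition('=')[0].strip())
    return headers


def collect_rows(top, headers):
    """Walk 'top' and build one row (a list of values) per .ini file."""
    rows = []
    for path, dirs, files in os.walk(top):
        dirs.sort()                      # make the traversal order predictable
        for nfile in sorted(files):
            if not nfile.lower().endswith('.ini'):
                continue
            values = {}
            with open(os.path.join(path, nfile)) as thefile:
                for line in thefile:
                    if '=' in line:
                        key, _, val = line.partition('=')
                        values[key.strip()] = val.strip()
            # Headers missing from this file become empty cells.
            rows.append([values.get(h, '') for h in headers])
    return rows


def write_table(out_path, headers, rows, sep='|'):
    """Open the output once, write the header and all rows, then close it."""
    with open(out_path, 'w') as output:
        output.write(sep.join(headers) + '\n')
        for row in rows:
            output.write(sep.join(row) + '\n')
```

It could be called as headers = read_headers('in/head.ini') followed by write_table('output.txt', headers, collect_rows('./in', headers)), so the headers file is read once and output.txt is opened and closed exactly once.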

roe1and 0 Newbie Poster · Answer 8 · 2012-02-28T18:27:33+00:00

Dude, you rock!

#!/usr/local/bin/python2.6   
import os
from time import time

def meth(_, path, files):

    for nfile in files:
        if nfile.endswith(("INI", "ini")):        
            with open(os.path.join(path, nfile), 'r') as thefile:
                header_iter = iter(headers)
                for line in thefile:
                    if '=' in line:
                        key, eq, val = line.partition('=')
                        key = key.strip()
                        if key not in headers:
                            print('Invalid data in file %s, line: %s' % (nfile, key))
                        else:
                            for header in header_iter:
                                if header == key:
                                    output.write('|%s' % val.strip())
                                    break
                                else:
                                    output.write('|')
                                
                output.write('\n')
       
if __name__ == "__main__":

    headers = []
    with open('in/head.txt') as headersfile:
        for lines in headersfile:
            if '=' in lines:
                key, eq, val = lines.partition('=')
                headers.append(key.strip())
    output = open('output.txt', 'w')
    output.write('|'.join(headers) + '\n')  
    path = './in'
    start = time()
    os.path.walk(path, meth, None)
    print(time() - start)
    output.close()

I think that addresses most of your concerns. I don't think I'll bother with error logging for now as I don't think I'll ever work with more than 20 or so files. As for the headers. I've changed the file to a txt file. While it contains all the correct header information, the data contained in this file is not relevant. I also fixed

else:
    output.write('|')

Is there something I missed?
