I'm working on code that loops through a folder, breaks each file name into specific parts, and then reads certain parts of the broken-up name and writes them to a CSV file. The files are named as follows: test_PAQT_B2H.csv, test_PAQT_B4.csv, and test_PINI_B1H.csv. When the folder has just one file type, such as only AQT files, it works fine. But when there are INI files in it as well, the results CSV gets the INI file written multiple times, which messes up the readability of the data. I just want to know how to get it to read each file only once.

import datetime,glob,os,csv,fnmatch,StringIO,smtplib,argparse,math,re

parser = argparse.ArgumentParser(description='Search art folders.')
parser.add_argument('-b', help='The base path', required=False, dest='basePath', metavar='Base directory path',default='/home/hatterx/Desktop/beds')
parser.add_argument('-o', help='File Output Location', required=False, dest ='fileOutput', metavar='File Output', default='/home/hatterx/Desktop/bedsused')
args = parser.parse_args()

filestart=args.basePath
outputCount= args.fileOutput
DT = datetime.datetime.now().strftime("%Y_%m_%d")
dt = datetime.datetime.now().strftime("%Y/%m/%d %I:%M:%S%p")

def fileBreak(pathname):
    filepresent = os.path.isfile(args.fileOutput+'/filecount_'+DT+'.csv')   
    newrow={'Date':'', 'Total Files':'', 'Total Beds':'', 'Total SQFT':'', 'AQT Files':'', 'INI Files':'','AQT Beds':'','INI Beds':'','AQT Total SQFT':'','INI Total SQFT':'', 'AQT Half Beds':'','INI Half Beds':''}
    new_field_names = newrow.keys()

    filecount = {}
    bedcount = {}
    halfbedcount = {}
    sqftFactor = {"AQT":64, "INI":50, "n/a":10}

    for filename in os.listdir(pathname):
        print filename
        Extbreak = re.split('[.]', filename)[0]
        Printbreak = re.split('_p', Extbreak, flags=re.I)[1]
        Typebreak = re.split('_b', Printbreak, flags=re.I)[0]
        Bedbreak = re.split('_b', Extbreak, flags=re.I)[1]
        Halfsearch = re.search('h', Bedbreak, flags=re.I)
        if Halfsearch:
            Numbreak = re.split('h', Bedbreak, flags=re.I)[0]
            #print int(Numbreak)*.5

        else:
            Numbreak = re.split('h', Bedbreak, flags=re.I)[0]
            #print Numbreak


        if Typebreak not in filecount:
            filecount[Typebreak] = 0

        if Typebreak not in bedcount:
            bedcount[Typebreak] = 0

        if Typebreak not in halfbedcount:
            halfbedcount[Typebreak] = 0

        filecount[Typebreak] = filecount[Typebreak]+1
        if Halfsearch:
            halfbedcount[Typebreak] = halfbedcount[Typebreak] + int(Numbreak)*.5
        bedcount[Typebreak] = bedcount[Typebreak] + int(Numbreak)
        for type in filecount:
            print dt, type, str(filecount[type]), str(bedcount[type] - halfbedcount[type]), str(sqftFactor[type] * bedcount[type]-(sqftFactor[type]*halfbedcount[type]))
            with open(args.fileOutput+'/filecount.csv','ab') as f:
                data = [filename]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [dt]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" files: "+str(filecount[type])] 
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" bed count: "+str(bedcount[type] - halfbedcount[type])]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" SQFT: "+str(sqftFactor[type] * bedcount[type]-(sqftFactor[type]*halfbedcount[type]))]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
fileBreak(filestart)

All 5 Replies

Your code is very difficult to understand. I think you can do a lot with a single regular expression, as in this example:

# -*- coding: utf-8 -*-
"""
Created on Fri Jul 25 19:23:53 2014
python 2 or 3
@author: Gribouillis
"""
import re

pattern = re.compile(
    r"^(?P<prefix>(?:[^_.]|_(?!p))+)"
    r"_p(?P<type>(?:[^_.]|_(?!b))+)"
    r"_b(?P<bed>(?P<num>[^h.]+)(?P<half>h?))\.csv",
    flags = re.I
)

if __name__ == "__main__":
    for filename in [
        "test_PAQT_B2H.csv",
        "test_PAQT_B4.csv",
        "test_PINI_B1H.csv",
    ]:
        match = pattern.match(filename)
        print(match.groupdict())

""" my output --->
{'prefix': 'test', 'num': '2', 'type': 'AQT', 'bed': '2H', 'half': 'H'}
{'prefix': 'test', 'num': '4', 'type': 'AQT', 'bed': '4', 'half': ''}
{'prefix': 'test', 'num': '1', 'type': 'INI', 'bed': '1H', 'half': 'H'}
"""
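
As a rough sketch of how the named groups above could feed the per-type totals from the question: the helper name `summarize` and the shape of the result are illustrative assumptions, not part of the original reply. A file marked with `H` is counted as half of its number, which matches the `bedcount - halfbedcount` arithmetic in the question's code.

```python
import re
from collections import defaultdict

# Same pattern as in the reply above.
pattern = re.compile(
    r"^(?P<prefix>(?:[^_.]|_(?!p))+)"
    r"_p(?P<type>(?:[^_.]|_(?!b))+)"
    r"_b(?P<bed>(?P<num>[^h.]+)(?P<half>h?))\.csv",
    flags=re.I,
)


def summarize(filenames):
    """Return {type: {"files": n, "beds": x}} (hypothetical helper)."""
    totals = defaultdict(lambda: {"files": 0, "beds": 0.0})
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            continue  # skip names that do not follow the scheme
        kind = match.group("type").upper()
        beds = int(match.group("num"))
        if match.group("half"):
            beds *= 0.5  # an "H" suffix counts as half of the number
        totals[kind]["files"] += 1
        totals[kind]["beds"] += beds
    return dict(totals)


if __name__ == "__main__":
    names = ["test_PAQT_B2H.csv", "test_PAQT_B4.csv", "test_PINI_B1H.csv"]
    for kind, info in sorted(summarize(names).items()):
        print(kind, info["files"], info["beds"])
```

Each file name is visited exactly once, and the totals are only read after the loop has finished, so no type can be reported more than once.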

Thank you. I have hit a new problem that falls into this same question. I changed the 'ab' to 'wb', and now it only writes the last item that it processes, so it ends up being just one field with the information for INI, where I need it to have both AQT and INI.

The strange thing is that you open the file within a loop. It means that the same file is opened repeatedly by the program. Normally you open the output file once and loop to write a series of records.
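
A minimal sketch of that idea, assuming the totals have already been collected into a dictionary: the function name `write_summary`, the column headers, and the `(files, beds)` tuple layout are made up for illustration. It is written for Python 3; under Python 2 the file would be opened with `'wb'` instead of `"w", newline=""`.

```python
import csv


def write_summary(path, totals):
    """Write one row per type; totals maps type -> (files, beds)."""
    # Open the output file once, outside any loop over files or types.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Type", "Files", "Beds"])
        # Loop only to write records; the file stays open throughout.
        for kind in sorted(totals):
            files, beds = totals[kind]
            writer.writerow([kind, files, beds])


if __name__ == "__main__":
    write_summary("filecount.csv", {"AQT": (2, 5.0), "INI": (1, 0.5)})
```

Because the file is opened a single time in write mode, every type ends up in the output and nothing is overwritten or repeated.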

OK, so how do I fix this? I just had it print out the output all as one data line, and it printed over 90 lines; it was just the files being looped over again and again until it was finished.

OK, I figured it out. I moved it above the for statement, and it does everything I want it to do now.
