RE.Split confusion

Question

abaddon2031 0 Junior Poster in Training

10 Years Ago

I'm working on code that loops through a folder, breaks each file name into specific parts, then reads certain parts of the broken-up name and writes them to a CSV file. The files are formatted as follows: test_PAQT_B2H.csv, test_PAQT_B4.csv, and test_PINI_B1H.csv. When the folder has just one file type, such as only AQT files, it works fine. But when there are INI files in it as well, the results CSV gets the INI file written multiple times, which ruins the readability of the data. I just want to know how to get it to read each file only once.

import datetime,glob,os,csv,fnmatch,StringIO,smtplib,argparse,math,re

parser = argparse.ArgumentParser(description='Search art folders.')
parser.add_argument('-b', help='The base path', required=False, dest='basePath', metavar='Base directory path',default='/home/hatterx/Desktop/beds')
parser.add_argument('-o', help='File Output Location', required=False, dest ='fileOutput', metavar='File Output', default='/home/hatterx/Desktop/bedsused')
args = parser.parse_args()

filestart=args.basePath
outputCount= args.fileOutput
DT = datetime.datetime.now().strftime("%Y_%m_%d")
dt = datetime.datetime.now().strftime("%Y/%m/%d %I:%M:%S%p")

def fileBreak(pathname):
    filepresent = os.path.isfile(args.fileOutput+'/filecount_'+DT+'.csv')   
    newrow={'Date':'', 'Total Files':'', 'Total Beds':'', 'Total SQFT':'', 'AQT Files':'', 'INI Files':'','AQT Beds':'','INI Beds':'','AQT Total SQFT':'','INI Total SQFT':'', 'AQT Half Beds':'','INI Half Beds':''}
    new_field_names = newrow.keys()

    filecount = {}
    bedcount = {}
    halfbedcount = {}
    sqftFactor = {"AQT":64, "INI":50, "n/a":10}

    for filename in os.listdir(pathname):
        print filename
        Extbreak = re.split('[.]', filename)[0]
        Printbreak = re.split('_p', Extbreak, flags=re.I)[1]
        Typebreak = re.split('_b', Printbreak, flags=re.I)[0]
        Bedbreak = re.split('_b', Extbreak, flags=re.I)[1]
        Halfsearch = re.search('h', Bedbreak, flags=re.I)
        if Halfsearch:
            Numbreak = re.split('h', Bedbreak, flags=re.I)[0]
            #print int(Numbreak)*.5

        else:
            Numbreak = re.split('h', Bedbreak, flags=re.I)[0]
            #print Numbreak


        if Typebreak not in filecount:
            filecount[Typebreak] = 0

        if Typebreak not in bedcount:
            bedcount[Typebreak] = 0

        if Typebreak not in halfbedcount:
            halfbedcount[Typebreak] = 0

        filecount[Typebreak] = filecount[Typebreak]+1
        if Halfsearch:
            halfbedcount[Typebreak] = halfbedcount[Typebreak] + int(Numbreak)*.5
        bedcount[Typebreak] = bedcount[Typebreak] + int(Numbreak)
        for type in filecount:
            print dt, type, str(filecount[type]), str(bedcount[type] - halfbedcount[type]), str(sqftFactor[type] * bedcount[type]-(sqftFactor[type]*halfbedcount[type]))
            with open(args.fileOutput+'/filecount.csv','ab') as f:
                data = [filename]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [dt]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" files: "+str(filecount[type])] 
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" bed count: "+str(bedcount[type] - halfbedcount[type])]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
                data = [type+" SQFT: "+str(sqftFactor[type] * bedcount[type]-(sqftFactor[type]*halfbedcount[type]))]
                writer = csv.writer(f)
                for item in data:
                    writer.writerow(data)
fileBreak(filestart)

python

All 5 Replies

Gribouillis 1,391 Programming Explorer

10 Years Ago

Your code is very difficult to understand. I think you can do a lot with a single regular expression, as in this example:

# -*- coding: utf-8 -*-
"""
Created on Fri Jul 25 19:23:53 2014
python 2 or 3
@author: Gribouillis
"""
import re

pattern = re.compile(
    r"^(?P<prefix>(?:[^_.]|_(?!p))+)"
    r"_p(?P<type>(?:[^_.]|_(?!b))+)"
    r"_b(?P<bed>(?P<num>[^h.]+)(?P<half>h?))\.csv",
    flags = re.I
)

if __name__ == "__main__":
    for filename in [
        "test_PAQT_B2H.csv",
        "test_PAQT_B4.csv",
        "test_PINI_B1H.csv",
    ]:
        match = pattern.match(filename)
        print(match.groupdict())

""" my output --->
{'prefix': 'test', 'num': '2', 'type': 'AQT', 'bed': '2H', 'half': 'H'}
{'prefix': 'test', 'num': '4', 'type': 'AQT', 'bed': '4', 'half': ''}
{'prefix': 'test', 'num': '1', 'type': 'INI', 'bed': '1H', 'half': 'H'}
"""

Edited 10 Years Ago by Gribouillis
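
A minimal sketch of how the groupdict() results above could feed the per-type totals the question is after. The tally function and the shape of its return value are illustrative assumptions, not part of the original answer. It follows the question's arithmetic, where a bed number marked with H counts as half its value, and it skips names that do not match the pattern.

```python
import re
from collections import defaultdict

# Same pattern as in the post above.
pattern = re.compile(
    r"^(?P<prefix>(?:[^_.]|_(?!p))+)"
    r"_p(?P<type>(?:[^_.]|_(?!b))+)"
    r"_b(?P<bed>(?P<num>[^h.]+)(?P<half>h?))\.csv",
    flags=re.I,
)


def tally(filenames):
    """Hypothetical helper: return {type: {"files": n, "beds": x}}."""
    totals = defaultdict(lambda: {"files": 0, "beds": 0.0})
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            continue  # not a <prefix>_P<type>_B<num>[H].csv name
        info = match.groupdict()
        beds = int(info["num"])
        if info["half"]:
            beds *= 0.5  # assumption taken from the question: H halves the bed number
        entry = totals[info["type"].upper()]
        entry["files"] += 1
        entry["beds"] += beds
    return dict(totals)


print(tally(["test_PAQT_B2H.csv", "test_PAQT_B4.csv", "test_PINI_B1H.csv"]))
```

Because each file name is visited exactly once and the totals live in one dictionary, nothing is written until the loop has finished.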

Gribouillis 1,391 Programming Explorer

10 Years Ago

The strange thing is that you open the file within a loop. It means that the same file is opened repeatedly by the program. Normally you open the output file once and loop to write a series of records.
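
As a sketch of the "open once, then loop" structure described above: the write_summary function, the column names, and the (files, beds) layout of totals are assumptions made for illustration, while the square-foot factors are copied from the question's sqftFactor. It is written for Python 3; under Python 2 the csv module expects the file to be opened with mode 'wb' instead.

```python
import csv
import os
import tempfile

SQFT_FACTOR = {"AQT": 64, "INI": 50}  # values from the question's sqftFactor


def write_summary(path, totals, stamp):
    """Hypothetical helper: open the output once, then write one row per type."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Type", "Files", "Beds", "SQFT"])
        for kind in sorted(totals):
            files, beds = totals[kind]
            writer.writerow([stamp, kind, files, beds, SQFT_FACTOR[kind] * beds])


# Usage: totals are assumed to have been accumulated before the file is opened.
out_path = os.path.join(tempfile.mkdtemp(), "filecount.csv")
write_summary(out_path, {"AQT": (2, 5.0), "INI": (1, 0.5)}, "2014/07/25")
with open(out_path) as f:
    print(f.read())
```

The point of the design is that the with open(...) block sits outside the loop, so the file is opened a single time and each type produces exactly one row.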

abaddon2031 0 Junior Poster in Training · Answer 1 · 2014-07-25T18:26:01+00:00

Thank you. I have hit a new problem that falls into this same question. I changed the 'ab' to 'wb', and now it only writes the last item that it processes, so it ends up being just one field with the information for INI, where I need it to have both AQT and INI.
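
For illustration, a small sketch of why a write-mode open inside the loop leaves only the last record: every open in write mode truncates the file first. The record strings and the temporary path are made up for the demonstration, and it uses text mode 'w' (Python 3) rather than the 'wb' from the post.

```python
import os
import tempfile

records = ["AQT files: 2", "INI files: 1"]  # made-up sample records
path = os.path.join(tempfile.mkdtemp(), "filecount.csv")

# Reopening in write mode for every record truncates the file each time,
# so only the final record survives.
for record in records:
    with open(path, "w") as f:
        f.write(record + "\n")
with open(path) as f:
    reopened_each_time = f.read().splitlines()

# Opening once and looping inside the with block keeps every record.
with open(path, "w") as f:
    for record in records:
        f.write(record + "\n")
with open(path) as f:
    opened_once = f.read().splitlines()

print(reopened_each_time)
print(opened_once)
```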

abaddon2031 0 Junior Poster in Training · Answer 2 · 2014-07-25T19:28:20+00:00

OK, so how do I fix this? I just had it print the output all as one data line, and it printed over 90 lines. It was just files being looped over again and again until it was finished.

abaddon2031 0 Junior Poster in Training · Answer 3 · 2014-07-25T19:32:39+00:00

OK, I figured it out. I moved it above the for statement, and it does everything I want it to do now.
