Looking for numbers in file name

Question

abaddon2031 0 Junior Poster in Training

10 Years Ago

I am trying to search a directory for subfolders whose names contain numbers. The format is a date, for example: 1-1, 01-1, 1-01, 01-01. The first number only goes up to 12 and the second one goes as high as 31. I'm trying to figure out how to read the date of the files that are in there; then, once it finds the correctly formatted file name, it should kick off the code that goes into that file and does whatever else my code is set to do. If there is a simple way to do this, please let me know, because this has me running in circles. If my code is needed, I will post it.

python

Edited 10 Years Ago by abaddon2031 because: update on project
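A minimal sketch of the matching step described in the question, assuming a name is exactly `month-day` with one or two digits on each side and nothing else; the helper name `parse_month_day` is hypothetical. It returns the date as a pair of integers, or `None` when the name does not fit the format or the numbers are out of range.

```python
import re

# Assumed naming: 1-2 digits for the month, a hyphen, 1-2 digits for the day, nothing else.
MONTH_DAY = re.compile(r'(\d{1,2})-(\d{1,2})')


def parse_month_day(name):
    """Return (month, day) as integers, or None if name is not a valid month-day."""
    match = MONTH_DAY.fullmatch(name)  # the whole name must fit the pattern
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    # The month goes up to 12 and the day up to 31 (days per month are not checked).
    if 1 <= month <= 12 and 1 <= day <= 31:
        return month, day
    return None


for name in ['1-1', '01-1', '1-01', '01-01', '13-01', 'beds']:
    print(name, parse_month_day(name))
```

All four spellings from the question map to `(1, 1)`, so comparing these pairs instead of the raw strings makes the leading zeros irrelevant.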

Latest Post 10 Years Ago Latest Post by Gribouillis

vegaseat 1,735 DaniWeb's Hypocrite

10 Years Ago

Maybe this will help ...

s = "test01-15.dat"

q = s.split('.')
print(q)
print(q[0])

numeric = "".join(n for n in s.split('.')[0] if n in '0123456789-')
print(numeric)
print(numeric.split('-'))

''' result ...
['test01-15', 'dat']
test01-15
01-15
['01', '15']
'''
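A small variation on the snippet above, assuming the name may contain more than one dot: `os.path.splitext()` strips only the last extension, and `int()` turns the pieces `'01'` and `'15'` into the numbers 1 and 15. Note that the character filter keeps every digit and hyphen in the base name, so a prefix that itself contains digits would end up in the result. The helper name is hypothetical.

```python
import os


def month_day_from_filename(filename):
    """Keep only digits and hyphens from the base name, then split into integers.

    Hypothetical helper built on the filter above; it assumes the only digits
    in the base name are the month-day pair.
    """
    base, _ext = os.path.splitext(filename)  # 'archive.test01-15.dat' -> 'archive.test01-15'
    numeric = "".join(ch for ch in base if ch in '0123456789-')
    # Skip empty pieces caused by a leading hyphen, e.g. 'test-8-3' -> '-8-3'.
    return [int(part) for part in numeric.split('-') if part]


print(month_day_from_filename("test01-15.dat"))  # [1, 15]
```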

Gribouillis 1,391 Programming Explorer

10 Years Ago

Another way, returning a list of integers

>>> import re
>>> def getints(string):
...     return [int(x) for x in re.findall(r'\d+', string)]
... 
>>> getints("foo23bar01-14qux2")
[23, 1, 14, 2]

snippsat 661 Master Poster

10 Years Ago

It does for the files, but I just got told they aren't files but subfolders that have the numbers in their names, which makes this so much more confusing for me.

Use os.walk(); it recursively scans all folders and subfolders.
Example.

import os
import re

search_pattern = r'\d'
target_path = os.path.abspath(".") #current folder
for path, dirs, files in os.walk(target_path):
    for folder_name in dirs:
        if re.search(search_pattern, folder_name):  # name contains at least one digit
            print(folder_name)  # Folder with numbers
            print(os.path.join(path, folder_name))  # Full path

Edited 10 Years Ago by snippsat
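If the date folders all sit directly inside one folder, a recursive walk is not needed. A minimal Python 3 sketch, assuming that layout and the `month-day` naming; `list_date_folders` is a hypothetical name. `os.scandir()` lists only the immediate entries, and `fullmatch()` rejects names that merely contain a digit somewhere (such as `beds2`).

```python
import os
import re
import tempfile

# Assumed naming: month-day with one or two digits on each side, e.g. 8-1 or 08-01.
DATE_NAME = re.compile(r'\d{1,2}-\d{1,2}')


def list_date_folders(parent):
    """Return the full paths of the direct subfolders of parent with month-day names."""
    found = []
    with os.scandir(parent) as entries:  # immediate entries only, no recursion
        for entry in entries:
            if entry.is_dir() and DATE_NAME.fullmatch(entry.name):
                found.append(entry.path)
    return sorted(found)


if __name__ == '__main__':
    # Build a throwaway hierarchy to try it on.
    with tempfile.TemporaryDirectory() as base:
        for name in ['08-1', '8-02', 'other', 'beds2']:
            os.mkdir(os.path.join(base, name))
        for path in list_date_folders(base):
            print(os.path.basename(path))  # 08-1, then 8-02
```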

Gribouillis 1,391 Programming Explorer

10 Years Ago

In Python, the largest item of a sequence can be obtained with the max() function, using a key argument to compute the score of an item.

>>> import re
>>> def getints(string):
...  return tuple(int(x) for x in re.findall(r'\d+', string))
... 
>>> L = ['test02-05','test01-15','test03-2','test02-17',]
>>> max(L, key = getints)
'test03-2'
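Applied to the folder names mentioned later in this thread (08-1, 8-02, 8-3, 08-04), the same key picks the latest date. Tuples are compared element by element, so the month decides first and the day only breaks ties; that is why `'test03-2'` beats `'test02-17'` above. Comparing the raw strings instead gives a misleading answer, because `'8'` sorts after `'0'`.

```python
import re


def getints(string):
    return tuple(int(x) for x in re.findall(r'\d+', string))


folders = ['08-1', '8-02', '8-3', '08-04']
print(max(folders, key=getints))     # 08-04
print(sorted(folders, key=getints))  # ['08-1', '8-02', '8-3', '08-04']
print(max(folders))                  # 8-3: plain string comparison ignores the numbers
```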

snippsat 661 Master Poster

10 Years Ago

That 100-line-long fileBreak function is really not good at all.
You should split it up; functions should be around 10-15 lines.

Do not try to do too much in a single function.
Do a couple of tasks and have a clear return value.
This makes code much easier to read and test.

Edited 10 Years Ago by snippsat
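As an illustration of that advice, the filename parsing at the top of fileBreak() could live in its own small function with a clear return value. A sketch, assuming the naming scheme implied by that code (a `_P` or `-P` section with the printer type, then a `_B` or `-B` section with the bed count and an optional `H` for half beds); `parse_print_filename` is a hypothetical name. It returns `None` instead of printing, so the caller decides what to report.

```python
import re


def parse_print_filename(filename):
    """Return (printer_type, bed_count, is_half_bed), or None if the name is malformed.

    Hypothetical helper mirroring the parsing steps in fileBreak(): a name like
    job_PAQT_B2H.pdf holds the printer type after _P, the bed count after _B,
    and an optional H for half beds ('-' works as a separator too).
    """
    stem = filename.split('.')[0]  # text before the first dot, as in fileBreak()
    after_p = re.split(r'[-_]p', stem, flags=re.I)
    if len(after_p) < 2:
        return None  # no _P section
    parts = re.split(r'[-_]b', after_p[1], flags=re.I)
    if len(parts) < 2:
        return None  # no _B section
    printer_type, bed_info = parts[0], parts[1]
    is_half = re.search('h', bed_info, flags=re.I) is not None
    bed_number = re.split('h', bed_info, flags=re.I)[0] if is_half else bed_info
    return printer_type, int(bed_number or '1'), is_half  # empty count means 1 bed


print(parse_print_filename('job_PAQT_B2.pdf'))   # ('AQT', 2, False)
print(parse_print_filename('job-pINI-b3h.pdf'))  # ('INI', 3, True)
print(parse_print_filename('job.pdf'))           # None
```

The counting and CSV writing could then be separate functions that take the parsed tuples, each testable on its own.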


abaddon2031 0 Junior Poster in Training · Answer 1 · 2014-08-01T18:09:53+00:00

It does for the files, but I just got told they aren't files but subfolders that have the numbers in their names, which makes this so much more confusing for me.

abaddon2031 0 Junior Poster in Training · Answer 2 · 2014-08-04T18:18:12+00:00

snippsat, that works great. One last question: how do I get it to return the largest of the numbers? I ran it on the files I have, which are 08-1, 8-02, 8-3, and 08-04, and it returns them all. I just need it to return the largest of the number sets, because at times the subfolders won't get deleted until the end of the month, so there can be as many as 31 subfolders there, and we just need the one for a specified date.

abaddon2031 0 Junior Poster in Training · Answer 3 · 2014-08-04T19:08:10+00:00

I just tried that and had it print out, and it doesn't match like I want it to. There are 3 subfolders for which it just returns the first number, which is the month, when I want it to return the day. I only need it to return the largest date; when I printed it, it returned the largest date out of all the folders that are there.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 4 · 2014-08-04T21:24:17+00:00

Instead of getints(), define your own score function

def score(dirname):
    """return a value extracted from the dirname"""
    value = ??? # your code here
    return value

thedir = max(L, key=score)
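One possible way to fill in `score()`, assuming the goal is the folder whose month-day is closest to a target date, with an exact match scoring highest. The factory name `make_score` and the fixed reference year are assumptions: year 2000 is used only because it is a leap year, so a 2-29 folder still parses. Names that are not a valid month-day get the lowest possible score. Distances do not wrap around the new year (12-31 and 1-1 are 365 days apart here).

```python
import datetime
import re


def make_score(target):
    """Build a hypothetical score function preferring folder dates closest to target."""
    target_md = datetime.date(2000, target.month, target.day)

    def score(dirname):
        match = re.fullmatch(r'(\d{1,2})-(\d{1,2})', dirname)
        if match is None:
            return float('-inf')  # not a month-day name at all
        try:
            folder_md = datetime.date(2000, int(match.group(1)), int(match.group(2)))
        except ValueError:  # e.g. 13-01 or 2-30
            return float('-inf')
        # 0 for an exact match, more negative the further away the folder date is
        return -abs((folder_md - target_md).days)

    return score


folders = ['08-1', '8-02', '8-3', '08-04', 'other']
score = make_score(datetime.date(2014, 8, 6))
print(max(folders, key=score))  # 08-04, two days before 8-6
```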

abaddon2031 0 Junior Poster in Training · Answer 5 · 2014-08-05T15:35:40+00:00

That did the same thing on several of them: it returned the month and not the day. I need it to check the day to see if it is formatted like the date provided by an argument, or if it matches that date in any way. I will post my code to see if that helps, because this has got me really confused right now.

import datetime,glob,os,csv,fnmatch,StringIO,smtplib,argparse,math,re, sys

parser = argparse.ArgumentParser(description='Search art folders.')
parser.add_argument('-b', help='The base path', required=False, dest='basePath', metavar='Base directory path',default='/home/hatterx/Desktop/beds/')
parser.add_argument('-o', help='File Output Location', required=False, dest ='fileOutput', metavar='File Output', default='/home/hatterx/Desktop/bedsused')
parser.add_argument('-d', help='Subfolder Date', required=False, dest ='fileDate', metavar='Subfolder Date', default=datetime.datetime.now().strftime("%m-%d"))
parser.add_argument('--AQT', help='AQT SQFT Factor', required=False, dest ='AQTFactor', metavar='AQT Factor', type=float, default=64)
parser.add_argument('--INI', help='INI SQFT Factor', required=False, dest ='INIFactor', metavar='INI Factor', type=float, default=50)
parser.add_argument('--N/A', help='Not Available SQFT Factor', required=False, dest ='naFactor', metavar='N/A Factor', type=float, default=50)
args = parser.parse_args()


filestart=args.basePath
outputCount= args.fileOutput
DT = datetime.datetime.now().strftime("%Y_%m_%d")
dt = datetime.datetime.now().strftime("%Y/%m/%d %I:%M:%S%p")
fileDate = datetime.datetime.now().strftime("%m-%d")

def fileBreak(pathname):
    filecount = {}
    bedcount = {}
    halfbedcount = {}
    sqftFactor = {"AQT":args.AQTFactor, "INI":args.INIFactor, "n/a":args.naFactor}
    total = {"files":0, "beds":0, "half beds":0, "full bed sqft":0, "half bed sqft":0}

    for filename in os.listdir(pathname):
        fileNameWithNoExtension = re.split('\.', filename)[0]
        printerTypesearch = re.search('[-_]p', fileNameWithNoExtension, flags=re.I)
        if printerTypesearch is None:
            print filename + ' is not formated with _P correctly.'
            continue
        printerRemoval = re.split('[-_]p', fileNameWithNoExtension, flags=re.I)[1]
        bedInfosearch = re.search('[-_]b', printerRemoval, flags=re.I)
        if bedInfosearch is None:
            print filename + ' is not formated with _B correctly.'
            continue
        printerType = re.split('[-_]b', printerRemoval, flags=re.I)[0]
        if printerType not in sqftFactor:
            sqftFactor[printerType]=sqftFactor["n/a"]
        bedInfo = re.split('[-_]b', printerRemoval, flags=re.I)[1]
        halfBedsearch = re.search('h', bedInfo, flags=re.I)
        if halfBedsearch:
            bedNumber = re.split('h', bedInfo, flags=re.I)[0]

        else:
            bedNumber = bedInfo
        if bedNumber == '':
            bedNumber = '1'
        if printerType not in filecount:
            filecount[printerType] = 0

        if printerType not in bedcount:
            bedcount[printerType] = 0

        if printerType not in halfbedcount:
            halfbedcount[printerType] = 0

        filecount[printerType] = filecount[printerType]+1
        total['files'] = total['files'] + 1
        if halfBedsearch:
            halfbedcount[printerType] = halfbedcount[printerType] + int(bedNumber)
            total['half beds'] = total['half beds'] + int(bedNumber)
            total['half bed sqft'] = total['half bed sqft'] + int(bedNumber)* sqftFactor[printerType]*.5
        else:
            bedcount[printerType] = bedcount[printerType] + int(bedNumber)
            total['beds'] = total['beds'] + int(bedNumber)
            total['full bed sqft'] = total['full bed sqft'] + int(bedNumber)* sqftFactor[printerType]

    with open(args.fileOutput+'/Filecount.csv','wb') as f:
        data=['Printer Type', 'File Count']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data = [type,str(filecount[type])]
            writer = csv.writer(f)
            writer.writerow(data)

    with open(args.fileOutput+'/Bedcount.csv','wb') as f:
        data=['Printer Type','Total Beds','Half Beds','Full Beds']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data =[type,str(bedcount[type]+halfbedcount[type]*0.5),str(halfbedcount[type]),str(bedcount[type])]
            writer = csv.writer(f)
            writer.writerow(data)

    with open(args.fileOutput+'/SQFTcount.csv','wb') as f:
        data=['Printer Type','Total SQFT','Half Bed SQFT','Full Bed SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data =[type,str(sqftFactor[type] * bedcount[type]+(sqftFactor[type]*halfbedcount[type]*.5)),str(sqftFactor[type]*halfbedcount[type]*.5),str(sqftFactor[type] * bedcount[type])]
            writer = csv.writer(f)
            writer.writerow(data)

    with open(args.fileOutput+'/FullInfo.csv','wb') as f:
        data=['Date','Printer Type','File Count','Total Beds','Total SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data = [dt,type,str(filecount[type]),str(bedcount[type] + halfbedcount[type]*0.5),str(sqftFactor[type] * bedcount[type]+(sqftFactor[type]*halfbedcount[type]*.5))]
            writer = csv.writer(f)
            writer.writerow(data)

    with open(args.fileOutput+'/TotalInfo.csv','wb') as f:
        data=['Total File Count','Total Beds','Total Full Beds','Total Half Beds','Total SQFT', 'Total Full Bed SQFT', 'Total Half Bed SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        writer = csv.writer(f)
        writer.writerow([total['files'], total['beds']+total['half beds']*.5,total['beds'], total['half beds'],total['half bed sqft']+total['full bed sqft'],total['full bed sqft'],total['half bed sqft']])

print args.fileDate

search_pattern = r'\d'
target_path = os.path.abspath(filestart)
for path, dirs, files in os.walk(target_path):
    for folder_name in dirs:
        if folder_name == args.fileDate:
            fileBreak(os.path.join(path, folder_name))
            print args.fileDate + ' was the correct format'
            sys.exit()
        else:
            if re.findall(search_pattern, folder_name):
                fileBreak(os.path.join(path, folder_name))
                print folder_name + ' was the found format'

            else:
                print 'Proper Folder Format Not Found'
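Two argparse details matter in a script like this one: every `add_argument()` call has to come before `parse_args()` (an option added afterwards is unknown at parse time, so passing it fails with an "unrecognized arguments" error), and an option without `type=` hands command-line values over as strings, while a non-string default such as 64 is used as-is. A cut-down sketch of the second point, using only the `--AQT` and `--INI` options; `build_parser` is a hypothetical name, and explicit argument lists stand in for a real command line.

```python
import argparse


def build_parser():
    """Hypothetical cut-down parser with only two of the factor options."""
    parser = argparse.ArgumentParser(description='Search art folders.')
    # Without type=..., a value typed on the command line arrives as a str,
    # while the default stays an int, so the type depends on how the script was run.
    parser.add_argument('--AQT', dest='AQTFactor', default=64)
    # With type=float, command-line values are converted before use.
    parser.add_argument('--INI', dest='INIFactor', type=float, default=50)
    return parser


args = build_parser().parse_args(['--AQT', '64', '--INI', '50'])
print(repr(args.AQTFactor), repr(args.INIFactor))  # '64' 50.0
```

With the string `'64'`, an expression like `'64' * .5` raises a TypeError, and `2 * '64'` silently produces `'6464'`.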

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 6 · 2014-08-05T17:11:02+00:00

How do I get it to return the largest of the numbers? I ran it on the files I have, which are 08-1, 8-02, 8-3, and 08-04, and it returns them all; I just need it to return the largest of the number sets.

I don't see anywhere in your code where you are looking for the largest of the number sets, whatever that means. There is no call to max(). You must describe the issue more precisely.

abaddon2031 0 Junior Poster in Training · Answer 7 · 2014-08-05T17:25:28+00:00

I tried the max thing and it didn't work, so I reverted to my way of searching, which returns all the files in the subfolder. What I want it to do is look for the date provided by args.fileDate and, if that's not there, compare the files it finds to that date to see if there is a close match.

abaddon2031 0 Junior Poster in Training · Answer 8 · 2014-08-05T18:05:24+00:00

OK, I will keep that in mind. I just really could use some help with the subfolder thing, because that's my last hurdle, and then this code will be finished and ready to be deployed.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 9 · 2014-08-05T20:29:55+00:00

Perhaps you could describe a small hierarchy of folders, then tell us which one is the good one and why, assuming that you are searching a single folder in the hierarchy.

abaddon2031 0 Junior Poster in Training · Answer 10 · 2014-08-05T21:09:16+00:00

Right now the layout is: base directory, then the folder containing the beds, then subfolders for the days of the month that contain the print information. What I want to do is search the subfolders for the current date, no matter how it's formatted, or for a date that has been input through the argument, so that it finds the correct subfolder and can then do its magic of breaking the print file names up and writing out the information.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 11 · 2014-08-06T03:42:18+00:00

This does not tell us which one is the correct subfolder and why. We don't have the names of the folders.

abaddon2031 0 Junior Poster in Training · Answer 12 · 2014-08-06T15:10:30+00:00

The subfolders could be named in any of these four formats: 08-01, 8-01, 08-1, 8-1. It could be any of those, just for different days of the month. So the correct one would be either the one that is 08-01 or whichever of the days matches the current day best. So, for example, today's folder is formatted as 8-6, whereas args.fileDate says today's date is 08-06.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 13 · 2014-08-06T18:53:18+00:00

Here is a program to create an example hierarchy of folders and search for the correct folder. The idea is to extract a pair of numbers from the names and compare this pair to a target pair.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import os
import os.path
import re

def random_folder_names(cnt):
    """this function was used to create random folder names"""
    import random
    def year():
        day = datetime.timedelta(days = 1)
        start = datetime.date(2012, 1, 1)
        for i in range(366):
            yield start
            start += day
    year = list(year())
    def names():
        for day in year:
            m, d = day.month, day.day
            for fmt in ("{}-{}","{:0>2s}-{}","{}-{:0>2s}","{:0>2s}-{:0>2s}"):
                yield fmt.format(str(m), str(d))
    names = sorted(set(names()))
    return random.sample(names, cnt)

def create_example_hierarchy():
    """Create a random hierarchy of directories for testing purposes"""
    import shutil
    base = 'example_base'
    beds = 'beds'
    other = 'other'
    folder_names = [
        '6-5', '6-14', '5-03', '7-8', '5-21',
        '09-02', '03-27', '08-14', '06-30', '4-20',
        '06-13', '07-30', '11-07', '12-01', '10-29',
        '10-03', '12-5', '3-04', '7-26', '10-14',
        '01-14', '3-28', '5-09', '10-21', '6-18'
        ]
    try:
        shutil.rmtree(base)
    except OSError:
        pass
    os.makedirs(os.path.join(base, beds))
    os.makedirs(os.path.join(base, other))
    for name in folder_names:
        os.mkdir(os.path.join(base, beds, name))

def dir_sequence():
    """returns the sequence of subdirs"""
    return next(os.walk('example_base/beds'))[1]

def extract_md(dirname):
    """extracts month and day as a tuple of 2 integers"""
    t = tuple(int(x) for x in re.findall(r'\d+', dirname))
    assert len(t) == 2
    return t

if __name__ == '__main__':
    create_example_hierarchy()
    print "directories:", dir_sequence()
    file_date = '3-4'
    pair = extract_md(file_date)
    correct_dir = [d for d in dir_sequence() if extract_md(d) == pair][0]
    print 'correct_dir:', correct_dir

The output is

directories: ['10-29', '5-03', '12-5', '7-8', '6-14', '5-09', '3-28', '10-03', '06-30', '5-21', '10-14', '09-02', '12-01', '7-26', '07-30', '11-07', '3-04', '10-21', '06-13', '01-14', '4-20', '6-18', '6-5', '03-27', '08-14']
correct_dir: 3-04


abaddon2031 0 Junior Poster in Training · Answer 14 · 2014-08-06T19:27:46+00:00

Thank you for all the help; I actually figured out something simpler.

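import argparse
import datetime
import os

parser = argparse.ArgumentParser(description='Search art folders.')
parser.add_argument('-b', help='The base path', required=False, dest='basePath', metavar='Base directory path', default='/home/hatterx/Desktop/beds/')
parser.add_argument('-d', help='Subfolder Time', required=False, dest='fileTime', metavar='Subfolder Time', default=datetime.datetime.now().strftime("%Y-%m-%d"))
args = parser.parse_args()

fileDate = datetime.datetime.strptime(args.fileTime, "%Y-%m-%d")
month = fileDate.month
day = fileDate.day
# Folders are named month-day, with or without leading zeros: 08-06, 8-06, 08-6, 8-6
for dirn in ['{:02d}-{:02d}'.format(month, day), '{:d}-{:02d}'.format(month, day),
             '{:02d}-{:d}'.format(month, day), '{:d}-{:d}'.format(month, day)]:
    path = os.path.join(args.basePath, dirn)  # look inside the base folder, not the cwd
    print(dirn + ' exists: ' + str(os.path.exists(path)))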

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 15 · 2014-08-06T19:47:08+00:00


Simplicity is in the eye of the beholder ;)