I am currently troubleshooting a utility that I have been working on. The main file of the utility is below. Please note that `flatten_dict` lives in a separate file and `makerows` is a separate function.
My objectives are to:
* Recursively traverse a directory - done
* Find XML and text files only - any thoughts on how to build this into the function below would be very helpful
* Print these files out to the console - done
* Apply `flatten_dict` to each of the text or XML files to
1) parse it into a string of key-value pairs (done), and
2) use `makerows` to write the result out to CSV (done).
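For the second objective (XML and text files only), one possible approach is to filter on the file extension while walking the tree. The following is only a sketch: `iter_matching_files` and `WANTED_EXTENSIONS` are hypothetical names that do not appear in the utility, and it uses `os.walk` rather than the `os.listdir` loop shown below so that subdirectories are visited too.

```python
import os

# Assumed set of extensions to keep; adjust as needed.
WANTED_EXTENSIONS = {'.xml', '.txt'}

def iter_matching_files(directory, extensions=WANTED_EXTENSIONS):
    """Yield full paths of files under `directory` whose extension matches."""
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in filenames:
            # splitext returns (root, ext); compare the extension case-insensitively
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)
```

Each yielded path already includes its directory, so it can be passed straight to a function that opens the file.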
**MAIN CONUNDRUM**: How do I modify the below syntax for writing out to CSV:
writer = csv.writer(open("save.csv", 'wt'))
writer.writerows(self.makerows(flatten_dict(root)))
such that I can write out an individual CSV for each XML/text file that is inputted for processing?
# import the os.path library
import os.path
# import the sys library
import sys
from parsexml2 import flatten_dict, ElementTree
import csv

# The class name
class IterateFiles(object):

    # helper function for generator object for writing out to CSV
    def makerows(self, pairs):
        # write out to CSV
        headers = []
        columns = {}
        for k, v in pairs:
            if k in columns:
                columns[k].extend((v,))
            else:
                headers.append(k)
                columns[k] = [k, v]
        m = max(len(c) for c in columns.values())
        for c in columns.values():
            c.extend('' for i in range(len(c), m))
        L = [columns[k] for k in headers]
        rows = list(zip(*L))
        return rows

    def open_and_parse(self, filename):
        try:
            with open(filename, 'r', encoding='utf-8') as f:
                xml_string = f.read()
                xml_string = xml_string.replace('�', '')  # optional to remove ampersands.
                root = ElementTree.XML(xml_string)
                for item in root:
                    print(root)
                writer = csv.writer(open("save.csv", 'wt'))
                writer.writerows(self.makerows(flatten_dict(root)))
        except:
            raise IOError("it's monday and the sun is shining")

    # A function which iterates through the directory
    def findFiles(self, directory):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename) + \
                              ' is a directory and therefore ignored!')
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        # print(os.path.join(directory,filename))
                        print(os.path.basename(filename))
                        self.open_and_parse(filename)
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')

    def run(self):
        # Set the folder to search
        searchFolder = 'C:\\Users\\samples\\'
        self.findFiles(searchFolder)

# Run the script from command line – note the two underscores
if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
Thanks.
Saran_1
0
Junior Poster in Training
Recommended Answers
You can write

    dest = self.destination_csv(filename)
    with open(dest, 'wt') as fh:
        writer = csv.writer(fh)
        writer.writerows(self.makerows(flatten_dict(root)))

Then you need a method

    def destination_csv(self, filename):
        """Compute a destination filename from a source filename
        for example if filename is C:\foo\bar\baz\awesomedata.xml
        the result could be …
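As a small, self-contained illustration of the write step, here is a sketch that writes a list of rows to a given destination. The helper name `write_csv` is hypothetical, and the `rows` argument stands in for whatever `self.makerows(flatten_dict(root))` returns. Opening the file with `newline=''` is what the `csv` module documentation recommends for Python 3.

```python
import csv

def write_csv(dest, rows):
    """Write an iterable of rows to the CSV file at `dest`."""
    # newline='' lets the csv module control line endings (this avoids
    # blank lines between rows on Windows); the with-block closes the file.
    with open(dest, 'w', newline='', encoding='utf-8') as fh:
        csv.writer(fh).writerows(rows)
```

Because the file is opened once per call, calling this with a different `dest` for each source file yields one CSV per input.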
You did not understand my advice. The function `destination_csv()` is only supposed to take a filename argument (such as `C:\foo\bar\baz\awesomedata.xml`) and return another string, such as `"C:\\Users\\Desktop\\Playground\\Samples\\CSV_Records\\awesomedata.csv"`. It is not at all supposed to open or parse the file.
The directory needs to be made only once. `destination_csv()` does not rename the XML file. It only creates a new destination filename where the csv data will be written without modifying the source XML file. The name of the destination file is built from the name of the source file, …
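One way to make the output directory only once is to create it up front with `os.makedirs(..., exist_ok=True)`, which does nothing if the directory already exists. This is a sketch under assumptions: `ensure_output_dir` is a hypothetical helper, and the `CSV_Reports` default is just a placeholder folder name.

```python
import os

def ensure_output_dir(source_dir, subdir='CSV_Reports'):
    """Create the CSV output folder under `source_dir` if missing; return its path."""
    out_dir = os.path.join(source_dir, subdir)
    # exist_ok=True makes repeated calls harmless
    os.makedirs(out_dir, exist_ok=True)
    return out_dir
```

Calling it once before the loop over files is enough; calling it again later would not raise an error.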
You don't need to change the initial `findFiles()` function, which calls `open_and_parse()`. The `destination_csv()` function must not rename any file nor iterate over a listdir, etc. In your original code, you only need to replace lines 60 and 61 with

    dest = self.destination_csv(filename)
    with open(dest, …
Indentation of line 87 is incorrect. Now here is a hypothetical example which shows how `destination_csv()` should work:

    >>> obj.destination_csv('C:\\Playground\\Samples\\FOO.xml')
    C:\Playground\Samples\CSV_Reports\FOO.csv

Edit: I shortened the path for the example.
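A function with that behaviour could be sketched as follows. This is not the code from the thread, only one possible implementation: it is written as a plain function rather than a method so that it runs on its own, the `subdir` parameter is an assumption, and it uses `ntpath` (the Windows flavour of `os.path`) so the backslash paths are handled the same way on any operating system.

```python
import ntpath  # Windows path rules regardless of the OS running the code

def destination_csv(filename, subdir='CSV_Reports'):
    """Return a CSV path derived from a source path; touches no files."""
    folder, base = ntpath.split(filename)   # directory part and file name
    stem, _ext = ntpath.splitext(base)      # drop the .xml / .txt extension
    return ntpath.join(folder, subdir, stem + '.csv')

print(destination_csv('C:\\Playground\\Samples\\FOO.xml'))
# C:\Playground\Samples\CSV_Reports\FOO.csv
```

The function only manipulates strings: it neither opens the source file nor creates the destination folder.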
All 21 Replies
Gribouillis
1,391
Programming Explorer
Team Colleague