First-timer and first post. I have been given the task of converting large data files to and from a self-defined format. In its "xml" form, each record looks like this:
<power> <name> </name> <level> </level> <kind> </kind> <source> </source> <flavor> </flavor> <type> </type> <keywords> </keywords> <action> </action> <attackTypeAndRange> </attackTypeAndRange> <trigger></trigger> <prerequisiteText> </prerequisiteText> <target> </target> <attack> </attack> <attackModifier> </attackModifier> <defense> </defense> <hit> </hit> <hitDamageModifier> </hitDamageModifier> <miss> </miss> <effect></effect> <notes></notes> </power>
Each set of data contained within the <power></power> tags is considered one database "record". I've written code (see below) that tears through this type of file, strips the tags and the whitespace around the text, then joins the values into one string (with </power> signifying end of line). The code works like a charm, but it's a bit brute-force and inelegant. That became clear when I was asked to preserve the tag structure of the data, i.e. if a tag holds no data or is missing from a record, I have to fill it in with "None" as the value while keeping the tag order. This is so we can re-import the data into a database and keep the order of our data in step with the form inside.
What I would like to do is the following:
Read in the .xml file
Strip the tabs and spaces out, leaving a clean string version of a line of text.
Find some way of comparing the order of the tags I read in to the order of tags in my structure, and fill in any missing tag with its value set to "None".
Join the data, from <name> through <notes>, into a single string (as in my code) and write the result out to a file.
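A rough sketch of how the last two steps might look, written for Python 3. The names TAG_ORDER, TAG_RE, record_to_row and row_to_csv_line are made up for illustration; the tag list is copied from the record layout above. It assumes every value is plain text with no nested tags, and it does not decode entities such as &amp;.

```python
import csv
import io
import re

# Tag order copied from the record layout in the post (without the <power> wrapper).
TAG_ORDER = [
    "name", "level", "kind", "source", "flavor", "type", "keywords",
    "action", "attackTypeAndRange", "trigger", "prerequisiteText",
    "target", "attack", "attackModifier", "defense", "hit",
    "hitDamageModifier", "miss", "effect", "notes",
]

# Matches <tag>text</tag> where the text contains no further tags, so the
# outer <power> element is skipped and only the leaf elements are captured.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def record_to_row(record_text, placeholder="None"):
    """Return the record's values in TAG_ORDER, using placeholder for gaps."""
    found = {}
    for tag, value in TAG_RE.findall(record_text):
        found[tag] = value.strip()
    # Missing tags and empty tags both fall back to the placeholder.
    return [found.get(tag) or placeholder for tag in TAG_ORDER]


def row_to_csv_line(row):
    """Quote every field and join with commas, letting the csv module escape."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()
```

Looking values up by tag name in a dict, and then walking TAG_ORDER, means the output order never depends on the order (or presence) of tags in the input. The csv module is used instead of string replacement so that a value containing a quote or comma is still written correctly.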
Questions about files:
1) Is there any way to read blocks of text that the user defines (in this case, read from <power> to </power>), do my comparison, then load in the next block?
2) Is a list the best way to handle the incoming file data?
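For question 1, one possible approach is a generator that collects lines between a start tag and an end tag and hands back one block at a time. This is only a sketch, written for Python 3: iter_records is a made-up name, and it assumes each record starts on its own line (a closing tag and the next opening tag never share a line).

```python
def iter_records(lines, start="<power>", end="</power>"):
    """Yield one string per record, from the start tag through the end tag."""
    block = None  # None means we are currently outside a record
    for line in lines:
        text = line.strip()
        if block is None:
            if start in text:
                block = [text]  # a new record begins on this line
        elif text:
            block.append(text)  # inside a record: keep non-blank lines
        if block is not None and end in text:
            yield " ".join(block)
            block = None  # wait for the next start tag


# Usage: any iterable of lines works, including an open file object.
sample = ["<power>\n", "\t<name> A </name>\n", "</power>\n"]
for record in iter_records(sample):
    print(record)
```

Because a file object is read line by line, passing it straight to iter_records keeps only the current record in memory, so the whole file never has to sit in one big list. For well-formed XML, the standard library's xml.etree.ElementTree.iterparse is another option worth a look.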
Sorry if I am asking a lot. I've looked through the Python docs and have tried various approaches, but nothing seems to provide what I need.
Hoping someone can help,
import re

# This will read in the xml file and flatten each Power database record
# into one line of quoted, comma-separated values.

# Define the location of your xml filename below
xml_read = open("C:/infile.xml")

# Define the location of your csv save filename below
outFile = open('C:/outfile.csv', 'w')

def remove_tags(data):
    # Remove anything that looks like <...>
    p = re.compile(r'<.*?>')
    return p.sub('', data)

xmlEntry = []

for line in xml_read:
    stripText = line.strip('\t')
    stripTags = remove_tags(stripText)
    stripText = stripTags.strip()
    if stripText == "":
        # Nothing left once the tags are removed (e.g. <power>, </power>
        # or an empty element): emit a line break
        xmlEntry.append('\n')
    else:
        finalText = '"' + stripText + '"'
        xmlEntry.append(finalText)

# Join everything, then turn each "" between adjacent values into ","
csvEntry = ''.join(xmlEntry)
csvLine = csvEntry.replace('""', '","')
print(csvLine)
outFile.write(csvLine)
outFile.close()
xml_read.close()
# Done!