Saran_1 0 Junior Poster in Training

Here is the CSV file that I am working with:

`"A","B","C","D","E","F","G","H","I","J"

"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0`

The below function is annotated. Briefly,the function get_clientresponses_two reads the above CSV, selects column E's data instances (XML data).There are two two generator functions to parse the XML data in **column E **in order to convert the XML tags and their text into a Python dictionary. Specifically, the flatten_dict() function returns an iterable sequence of (key, value) pairs. One can turn this to a list of pairs with list(flatten_dict(root)).

The output, as it is written so far is generate a dictionary. Then, def allocate_and_write_data_ then takes those and creates two different collections. One is a set that is updated using the keys from flatten_dict() . This is to ensure that the element tags from the XML are included in the headers (along with their corresponding values) in the newly written CSV. The code is written such to maintain the integrity of the headers (no duplicates) and allow for new element tags to be converted into headers (along with their values). Moreover, the headers and values that already exist should be flexible enough to be updated with new instances (again - unique, as well). In addition all the other rows are intended to be stored and updated. I then conver the headers into a list and ensure that any missing data instances are accounted for (with a ' ') using the list comprehension data

import csv
from collections import OrderedDict
from xml.etree.ElementTree import ParseError
import collections 
from __future__ import print_function


def get_clientresponses_two(filename = 's.csv'):

    with open(filename, 'rU') as infile:
        reader = csv.DictReader(infile)               # read the file as a dictionary for each row ({header : value})
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]

        client_responses = data['E'] #returns a list
        for client_response in client_responses:
            xml_string = (''.join(client_response))
            xml_string = xml_string.replace('&amp;', '')
            try:
                root = ElementTree.XML(xml_string)
                print(root)   #check that I am getting the root 
                return dict(flatten_dict(root)  ####return generator function with output as XML root as dictionary
            except ET.ParseError:
                print("catastrophic failure")
                continue

def allocate_and_write_data_2(get_clientresponses_two):

    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        header = set()
        results = []
        #     data = {} # this is not needed for the purpose of this organization
        for row in reader:
            xml_data = get_clientresponses_two()
            row.update(xml_data) # just for XML data
            results.append(row)         # everything else
            headers.update(row.keys())  # can't forget headers

    #     print(row) # returns dictionary of key values pairs (headers : values)
    #     print(results) # returns list wrapper for dictionary
    #     print(headers) #returns set of all headers
        headers_list = list(headers)
    #     print(headers_list) #list form of set

        with open('csv_output.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(headers_list)
            for row in results:
                data = [row.get(x, '') for x in headers_list]
                writer.writerow(data)

The output is the following:

C,HPhone,Locator,IISO,E,S,FName,LaName,J,D,MemID,ResponseRequestType,T,Email,I,Ob,G,MemPass,Address,A,PrivateMembers,H,CNum,ResponseT,CID,B,F
1,,,,"<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-15 12.00.00 AM,,,88,,26,[REMOVED],10,,18,0-JAN-10 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-22 12.00.00 AM,,,22,,26,[REMOVED],10,,22,0-JAN-22 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-34 12.00.00 AM,,,32,,26,[REMOVED],10,,22,0-JAN-20 12.00.02 AM

However, my intention is to minimify, refactor and modularize the code. I would like to ideally have the processes still written to memory. Moreover, when I try to iterate through the generator, I receive an error:

TypeError: 'function' object is not iterable. I would like guidance and feedback with specifically how to best rectify this.