Reassigning keys in dictionary of lists and then writing out to CSV file? - Page 3

Question

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2015-06-24T20:07:06+00:00

Because you changed my functions flatten_dict() and flatten_list(). Take the versions I wrote above.

Saran_1 0 Junior Poster in Training · Answer 2 · 2015-06-24T20:20:31+00:00

I did as per your directive. The header output in CSV format should be:

DateTime1 DateTime2 DateTime3

1/1/0001 12:00:00 AM 1/1/0001 12:00:00 AM 1/1/0001 12:00:00 AM

Instead, I received this...do I enumerate as before?

DateTime

1/1/0001 12:00:00 AM

I organized the functions the same way you did.

I used this to call the function:

def main():
    with open('2-Response.xml', 'r', encoding='utf-8') as f:
        xml_string = f.read()
    # Optional: strip NUL character references, which are not valid XML.
    xml_string = xml_string.replace('&#x0;', '')
    root = ElementTree.XML(xml_string)
    for item in root:
        print(item)
    # newline='' lets the csv module control line endings.
    with open("test_out.csv", 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerows(makerows(pairs_from_root(root)))

if __name__ == "__main__":
    main()

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2015-06-24T20:25:33+00:00

In the XML file, there is no datetime1, datetime2, datetime3. The 1, 2, 3, etc. must be added somewhere. That's what I meant when I said earlier that I don't understand your rules for key generation (or header generation).

When I ask what you want instead of 'foo.bar.baz.datetime' and you tell me that you want datetime, you get datetime. If you want datetime1, you must give a very precise and descriptive rule for how 'foo.bar.baz.datetime' becomes datetime1. I cannot invent this rule, nor can Python.
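
To illustrate the point, here is a minimal sketch of what happens when every key is reduced to its last dotted component. The helper names last_component and group_by_header and the 'foo.bar.baz.DateTime' keys are hypothetical; the real keys depend on which flatten functions are in use.

```python
def last_component(key):
    """Keep only the text after the final dot, ignoring a trailing dot."""
    return key.rstrip('.').rsplit('.', 1)[-1]

def group_by_header(pairs):
    """Collect values under each reduced header, in first-seen order."""
    columns = {}
    for key, value in pairs:
        columns.setdefault(last_component(key), []).append(value)
    return columns

# Hypothetical dotted keys: three list items that share one path.
pairs = [
    ("foo.bar.baz.DateTime", "1/1/0001 12:00:00 AM"),
    ("foo.bar.baz.DateTime", "1/1/0001 12:00:00 AM"),
    ("foo.bar.baz.DateTime", "1/1/0001 12:00:00 AM"),
]
columns = group_by_header(pairs)
print(list(columns))             # ['DateTime']
print(len(columns["DateTime"]))  # 3
```

All three values land under the single header DateTime, so they show up as three rows of one column rather than as three columns. Nothing in the keys carries the 1, 2, 3.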

Saran_1 0 Junior Poster in Training · Answer 4 · 2015-06-24T20:30:12+00:00

Just like we applied:

<Response ID="24856-775" RequestType="Moverview">        
        <MonthDayCount>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
        </MonthDayCount>
        <Warnings />
        <SList />
        <LList />
        <EA>Y</EA>
        <EHA>Y</EHA>
        <EBY>Y</EBY>
        <EOTH>Y</EOTH>
        <EIL>Y</EIL>
        <EM>Y</EM>
        <ED>Y</ED>
        <EQ>Y</EQ>
        <ERS>Y</ERS>
        <ECCS>Y</ECCS>
        <EES>Y</EES>
        <UAS>Y</UAS>
        <PA>False</PA>
        <PL>False</PL>
        <PC>False</PC>
        <PCs>False</PCs>
        <PJ>False</PJ>
        <OITC>0</OITC>
        <MG />
        <R />
        <CCGoods />
</Response>

We used the modified function:

def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag + '.'
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag + '.'
        if element:
            # treat like dict - we assume that if the first two tags 
            # in a series are different, then they are all different. 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags 
            # in a series are the same, then the rest are the same. 
            else: 
                # here, we put the list in dictionary; the key is the 
                # tag name the list elements all share in common, and 
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix+element[0].tag+'.')
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix+k, v 
        # this assumes that if you've got an attribute in a tag, 
        # you won't be having any text. This may or may not be a 
        # good idea -- time will tell. It works for the way we are 
        # currently doing XML configuration files... 
        elif element.items(): 
            for k, v in element.items():
                yield eprefix+k, v
        # finally, if there are no child tags and no attributes, extract 
        # the text 
        else:
            yield eprefix.rstrip('.'), element.text

to get this output:

('Response.RequestType', 'Moverview')
('Response.ID', '24856-775')
('Response.MonthDayCount.1', '0')
('Response.MonthDayCount.2', '0')
('Response.MonthDayCount.3', '0')
('Response.MonthDayCount.4', '0')
('Response.MonthDayCount.5', '0')
('Response.MonthDayCount.6', '0')
('Response.MonthDayCount.7', '0')
('Response.MonthDayCount.8', '0')
('Response.MonthDayCount.9', '0')
('Response.MonthDayCount.10', '0')
('Response.MonthDayCount.11', '0')
('Response.MonthDayCount.12', '0')
('Response.MonthDayCount.13', '0')
('Response.MonthDayCount.14', '0')
('Response.MonthDayCount.15', '0')
('Response.MonthDayCount.16', '0')
('Response.MonthDayCount.17', '0')
('Response.MonthDayCount.18', '0')
('Response.MonthDayCount.19', '0')
('Response.MonthDayCount.20', '0')
('Response.MonthDayCount.21', '0')
('Response.MonthDayCount.22', '0')
('Response.MonthDayCount.23', '0')
('Response.MonthDayCount.24', '0')
('Response.MonthDayCount.25', '0')
('Response.Warnings.', None)
('Response.SList.', None)
('Response.LList.', None)
('Response.EA.', 'Y')
('Response.EHA.', 'Y')
('Response.EBY.', 'Y')
('Response.EOTH.', 'Y')
('Response.EIL.', 'Y')
('Response.EM.', 'Y')
('Response.ED.', 'Y')
('Response.EQ.', 'Y')
('Response.ERS.', 'Y')
('Response.ECCS.', 'Y')
('Response.EES.', 'Y')
('Response.UAS.', 'Y')
('Response.PA.', 'False')
('Response.PL.', 'False')
('Response.PC.', 'False')
('Response.PCs.', 'False')
('Response.PJ.', 'False')
('Response.OITC.', '0')
('Response.MG.', None)
('Response.R.', None)
('Response.CCGoods.', None)

We then used the makerows function to write the output so that every tag becomes a key (and therefore a header) and its text, if any, becomes the value. That seems to contradict the logic for nested lists.

My expectation was that

MonthDayCount1 MonthDayCount2 MonthDayCount3 etc.... would be the headers with their values as the "rows"
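
One way to state that expectation as an explicit rule, assuming the dotted keys shown in the output above: drop the first component (the root tag) and concatenate whatever remains. The function name header_from_key is hypothetical, and this is only a sketch of one possible rule.

```python
def header_from_key(key):
    """Drop the root component of a dotted key and join the rest.

    Empty components (from a trailing dot) are ignored. A key with a
    single component is returned unchanged.
    """
    parts = [part for part in key.split('.') if part]
    return ''.join(parts[1:]) or ''.join(parts)

examples = ['Response.MonthDayCount.1', 'Response.EA.', 'Response.ID']
headers = [header_from_key(k) for k in examples]
print(headers)  # ['MonthDayCount1', 'EA', 'ID']
```

Under this rule, deeper paths would be concatenated in full, which may or may not be what is wanted for other XML files.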

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 5 · 2015-06-24T20:41:17+00:00

The question is why MonthDayCount1, MonthDayCount2, etc. and not Int321, Int322, etc. By which rule does Int32 vanish?

Saran_1 0 Junior Poster in Training · Answer 6 · 2015-06-24T20:45:59+00:00

Those were replaced with the flatten_list function, correct?

def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, eprefix)
            # treat like list 
            elif element[0].tag == element[1].tag: 
                yield from flatten_list(element, eprefix)
        elif element.text: 
            text = element.text.strip() 
            if text: 
                yield eprefix, text

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 7 · 2015-06-24T20:48:39+00:00

The code will not define the rules. It works the other way: you define precise rules, then they can be implemented in code. Otherwise the program will work for this example xml file but not with another.
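
As an illustration of going from rule to code, here is a sketch under one assumed rule, which is not necessarily the rule wanted in this thread: an element with at least two children that all share one tag is a list, its items are named parent tag plus a 1-based index, and any other leaf keeps its own tag. The names is_list_like and pairs_by_rule are hypothetical, attributes are ignored, and list items are assumed to contain only text.

```python
import xml.etree.ElementTree as ET

def is_list_like(element):
    """Assumed rule: two or more children, all with the same tag."""
    tags = [child.tag for child in element]
    return len(tags) >= 2 and len(set(tags)) == 1

def pairs_by_rule(parent):
    """Yield (header, text) pairs for the children of parent."""
    for child in parent:
        if is_list_like(child):
            # List: the item tag (Int32, DateTime, ...) is dropped by rule.
            for i, item in enumerate(child, 1):
                yield "{}{}".format(child.tag, i), (item.text or "").strip()
        elif len(child) == 0:
            # Leaf: keep the tag as the header.
            yield child.tag, (child.text or "").strip()
        else:
            # Anything else: descend without recording the parent tag.
            yield from pairs_by_rule(child)

SAMPLE = """<Response ID="24856-775">
  <MonthDayCount><Int32>0</Int32><Int32>0</Int32></MonthDayCount>
  <PaymentBucketDueDate>
    <DateTime>1/1/0001 12:00:00 AM</DateTime>
    <DateTime>1/1/0001 12:00:00 AM</DateTime>
  </PaymentBucketDueDate>
  <PA>False</PA>
  <MG />
</Response>"""

rule_pairs = list(pairs_by_rule(ET.fromstring(SAMPLE)))
for key, value in rule_pairs:
    print(key, value)
```

Because the rule is written down first, it is easy to see where it could fail on another file, for example a list with a single item or two lists with the same parent tag at different depths.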

Saran_1 0 Junior Poster in Training · Answer 8 · 2015-06-24T20:51:16+00:00

Understood. I will try to work backwards and follow your advice.

Saran_1 0 Junior Poster in Training · Answer 9 · 2015-06-25T16:31:09+00:00

I am still trying to strip the chained parent elements from the indexed element names. So far, I have tried the following with this sample:

<Response ID="24856-775" RequestType="Moverview">        
        <MonthDayCount>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
        </MonthDayCount>
        <FeeCount>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
            <Int32>0</Int32>
        </FeeCount>
        <PaymentBucketAmount>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
            <Double>0</Double>
        </PaymentBucketAmount>
        <PaymentBucketDueDate>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
            <DateTime>1/1/0001 12:00:00 AM</DateTime>
        </PaymentBucketDueDate>
        <Warnings />
        <SList />
        <LList />
        <PA>False</PA>
        <PL>False</PL>
        <PC>False</PC>
        <PCs>False</PCs>
        <PJ>False</PJ>
        <OITC>0</OITC>
        <MG />
        <R />
        <CCGoods />
</Response>

Using this:

# xml.etree.cElementTree was removed in Python 3.9; the plain module is equivalent.
import xml.etree.ElementTree as ElementTree
import csv

def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, eprefix)
            # treat like list 
            elif element[0].tag == element[1].tag: 
                yield from flatten_list(element, eprefix)
        elif element.text: 
            text = element.text.strip() 
            if text: 
                yield eprefix.rstrip('.'), text

def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag 
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = prefix + element.tag  
        if element:
            # treat like dict - we assume that if the first two tags 
            # in a series are different, then they are all different. 
            if len(element) == 1 or element[0].tag != element[1].tag: 
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags 
            # in a series are the same, then the rest are the same. 
            else: 
                # here, we put the list in dictionary; the key is the 
                # tag name the list elements all share in common, and 
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix + k, v
        # this assumes that if you've got an attribute in a tag, 
        # you won't be having any text. This may or may not be a 
        # good idea -- time will tell. It works for the way we are 
        # currently doing XML configuration files... 
        elif element.items(): 
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract 
        # the text 
        else:
            yield eprefix, element.text                

def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend((v,))
        else:
            headers.append(k)
            columns[k] = [k, v]
    m = max(len(c) for c in columns.values())
    for c in columns.values():
        c.extend(' ' for i in range(len(c), m))
    L = [columns[k] for k in headers]
    rows = list(zip(*L))
    return rows                   


def main():
    with open('sample.xml', 'r', encoding='utf-8') as f: 
        xml_string = f.read() 
    # Optional: strip NUL character references, which are not valid XML.
    xml_string = xml_string.replace('&#x0;', '')
    root = ElementTree.XML(xml_string) 
    for key, value in flatten_dict(root):
        key = key.rstrip('.').rsplit('.', 1)[-1]
        print(key,value)            

if __name__ == "__main__":
    main()

I receive this output:

ResponseRequestType Moverview
ResponseID 24856-775
ResponseMonthDayCount1 0
ResponseMonthDayCount2 0
ResponseMonthDayCount3 0
ResponseMonthDayCount4 0
ResponseMonthDayCount5 0
ResponseMonthDayCount6 0
ResponseMonthDayCount7 0
ResponseMonthDayCount8 0
ResponseMonthDayCount9 0
ResponseMonthDayCount10 0
ResponseMonthDayCount11 0
ResponseMonthDayCount12 0
ResponseMonthDayCount13 0
ResponseMonthDayCount14 0
ResponseMonthDayCount15 0
ResponseMonthDayCount16 0
ResponseMonthDayCount17 0
ResponseMonthDayCount18 0
ResponseMonthDayCount19 0
ResponseMonthDayCount20 0
ResponseMonthDayCount21 0
ResponseMonthDayCount22 0
ResponseMonthDayCount23 0
ResponseMonthDayCount24 0
ResponseMonthDayCount25 0
ResponseFeeCount1 0
ResponseFeeCount2 0
ResponseFeeCount3 0
ResponseFeeCount4 0
ResponseFeeCount5 0
ResponseFeeCount6 0
ResponsePaymentBucketAmount1 0
ResponsePaymentBucketAmount2 0
ResponsePaymentBucketAmount3 0
ResponsePaymentBucketAmount4 0
ResponsePaymentBucketAmount5 0
ResponsePaymentBucketAmount6 0
ResponsePaymentBucketAmount7 0
ResponsePaymentBucketAmount8 0
ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
ResponseWarnings None
ResponseSList None
ResponseLList None
ResponsePA False
ResponsePL False
ResponsePC False
ResponsePCs False
ResponsePJ False
ResponseOITC 0
ResponseMG None
ResponseR None
ResponseCCGoods None

When I write it out to CSV using:

with open("try2.csv", 'w', newline='') as out:
    csv.writer(out).writerows(makerows(flatten_dict(root)))

I still receive the headers with Response chained to the subelements of the root, with the tags' text as the values (which is just fine). I am perplexed by the algorithm, as my goal is to have only the subelements as the headers (along with their values). I would appreciate any direction on where to start, or help understanding your functions. I still relish the problem-solving aspect of deconstructing your generators for the ActiveState recipe. Thanks!
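
A possible starting point, offered as a sketch rather than a fix to the functions above: since flatten_dict builds every key from parent_element.tag, the root tag could be stripped from each key afterwards, before the rows are written. The helpers strip_root and write_single_row are hypothetical, the sample pairs are copied from the printed output, and the writer assumes every key is unique.

```python
import csv
import io

def strip_root(pairs, root_tag):
    """Drop a leading root tag from each key (hypothetical helper)."""
    for key, value in pairs:
        if key.startswith(root_tag):
            key = key[len(root_tag):]
        yield key, value

def write_single_row(pairs, stream):
    """Write one header row and one value row; assumes unique keys."""
    pairs = list(pairs)
    writer = csv.writer(stream)
    writer.writerow([key for key, _ in pairs])
    writer.writerow(["" if value is None else value for _, value in pairs])

# A few pairs as printed by the script above.
sample = [
    ("ResponseID", "24856-775"),
    ("ResponseMonthDayCount1", "0"),
    ("ResponseWarnings", None),
]
buffer = io.StringIO()  # stands in for a file opened with newline=''
write_single_row(strip_root(sample, "Response"), buffer)
print(buffer.getvalue())
```

In the script, root.tag could be passed as root_tag. Note also that the rsplit('.', 1) call in main() no longer changes anything, because the modified flatten_dict builds keys without dots.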