Splitting a csv up by a certain value

Question

abaddon2031 0 Junior Poster in Training

11 Years Ago

I have a very large CSV file that has different values in certain fields, and I want to split it up so that each distinct value gets its own CSV file. For example, all of the rows whose value is 1110 would be read and written to a file called Pc_Numbers_1110.csv. Is this even possible, and if so, could someone help me get started?

python

Latest Post 11 Years Ago Latest Post by rrashkin

All 4 Replies

rrashkin 41 Junior Poster in Training

11 Years Ago

I think there are two ways to do this, depending on just how big the file really is.

If it can all be read into memory, then I suggest:

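        import csv
        from itertools import groupby
        # '<csv file name>' is a placeholder: put your real file name here
        with open('<csv file name>', newline='') as fid:
            # csv.reader handles quoted fields; blank rows are skipped
            data = [row for row in csv.reader(fid) if row]
        # sort on the key column so equal values end up next to each other
        data.sort(key=lambda x: x[3])
        # groupby yields one (value, rows) pair per run of equal values
        for field, rows in groupby(data, key=lambda x: x[3]):
            with open('pc_Numbers_' + field + '.csv', 'w', newline='') as fid:
                csv.writer(fid).writerows(rows)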

This assumes the field in question is in the fourth position (index 3).

If the file is really too big for this, then you'll have to read and write each line, changing the file name based on the field. Depending on how many files you expect, you may have to open and close them each time. That could make the program very slow.
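A minimal sketch of the line-by-line route, assuming (as in the code above) that the key is in the fourth column and that there is no header row to skip. Rather than opening and closing a file for every line, it keeps the output files open in a dictionary and only closes them when a hypothetical `max_open` cap is reached; `split_streaming` and its parameters are illustrative names, not anything from the thread.

```python
import csv
import os


def split_streaming(src_path, out_dir, key_index=3, max_open=100):
    """Write each row of src_path to pc_Numbers_<value>.csv in out_dir,
    where <value> is the row's entry in column key_index.
    Reads the input one row at a time instead of loading it all."""
    files = {}    # value -> open output file
    writers = {}  # value -> csv.writer wrapping that file
    seen = set()  # values whose output file was already created this run
    try:
        with open(src_path, newline='') as src:
            for row in csv.reader(src):
                if not row:
                    continue  # skip blank lines
                value = row[key_index]
                if value not in writers:
                    if len(files) >= max_open:
                        # at the cap: close every open file before opening another
                        for f in files.values():
                            f.close()
                        files.clear()
                        writers.clear()
                    # first time: create/truncate; after a close: append
                    mode = 'a' if value in seen else 'w'
                    path = os.path.join(out_dir, 'pc_Numbers_' + value + '.csv')
                    files[value] = open(path, mode, newline='')
                    writers[value] = csv.writer(files[value])
                    seen.add(value)
                writers[value].writerow(row)
    finally:
        # make sure every output file is flushed and closed, even on error
        for f in files.values():
            f.close()
```

Closing everything at the cap is the simplest policy; closing only the least recently used file would mean fewer reopenings when there are many distinct values.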

Be a part of the DaniWeb community

abaddon2031 0 Junior Poster in Training · Answer 1 · 2014-04-08T13:02:04+00:00

There are close to 1600 entries in this file that I want to split up. The PC number, which is the field I want to split on, is in column A and starts in cell A2, if that helps.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2014-04-08T19:45:56+00:00

1600 entries should not be a problem on today's PCs. So why do you actually want to do it? Are you really getting a memory error? Working in memory is simpler and faster.

rrashkin 41 Junior Poster in Training · Answer 3 · 2014-04-09T14:07:20+00:00

pyTony's question is valid: why split it up? But if you must, the code above ought to work with a couple of changes. The output file has to be opened for writing, and you probably want to skip the first (header) row:

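        import csv
        from itertools import groupby
        KEY = 3  # index of the key column (fourth column; column A would be 0)
        with open('<csv file name>', newline='') as fid:
            reader = csv.reader(fid)
            next(reader)  # skip the first (header) row
            data = [row for row in reader if row]
        data.sort(key=lambda x: x[KEY])
        for field, rows in groupby(data, key=lambda x: x[KEY]):
            # 'w' truncates, so rerunning the script replaces rather than appends
            with open('pc_Numbers_' + field + '.csv', 'w', newline='') as fid:
                csv.writer(fid).writerows(rows)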
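For the layout described earlier in the thread (PC number in column A, header in row 1, about 1600 rows), here is a minimal in-memory sketch. It swaps the sort-and-group step for a dictionary keyed on the PC number and copies the header row into each output file; both choices, and the names `split_by_column` and `out_dir`, are assumptions rather than anything from the thread. The `Pc_Numbers_<value>.csv` naming follows the example in the question.

```python
import csv
import os
from collections import defaultdict


def split_by_column(src_path, out_dir, key_index=0, prefix='Pc_Numbers_'):
    """Group rows by the value in column key_index (0 = column A) and write
    each group, preceded by the header row, to its own CSV file."""
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)          # row 1 holds the column names
        groups = defaultdict(list)     # value -> rows that have that value
        for row in reader:
            if row:                    # ignore blank lines
                groups[row[key_index]].append(row)
    for value, rows in groups.items():
        path = os.path.join(out_dir, prefix + value + '.csv')
        with open(path, 'w', newline='') as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows)
    return sorted(groups)              # the distinct values that were found
```

For example, `split_by_column('<csv file name>', '.')` would write one file per PC number into the current directory. At this size the whole file fits comfortably in memory, so no sorting or file-handle juggling is needed.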
