I have a very large CSV file with different values in certain fields, and I want to split it up so that each distinct value gets its own CSV file. For example, all of the rows whose value is 1110 would be read and written to a file called Pc_Numbers_1110.csv. Is this even possible, and if so, could someone give me some help getting started?

All 4 Replies

I think there are 2 ways depending on just how big the file really is.

If it can all be read into memory, then I suggest:

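        from itertools import groupby

        csv_path = 'input.csv'  # placeholder: replace with your CSV file name
        with open(csv_path) as fid:
            # split each non-empty line into its fields
            data = [line.split(',') for line in fid.read().splitlines() if line]
        data.sort(key=lambda x: x[3])
        # after sorting, rows with the same value are adjacent,
        # so each group can be written to its own file in one pass
        for field, rows in groupby(data, key=lambda x: x[3]):
            with open('Pc_Numbers_' + field + '.csv', 'w') as fid:
                for d in rows:
                    fid.write(','.join(d) + '\n')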

This assumes the field in question is in the fourth position (index 3).

If the file is really too big for this, then you'll have to read and write each line, changing the file name based on the field. Depending on how many files you expect, you may have to open and close them each time. That could make the program very slow.
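A minimal sketch of that line-by-line approach, assuming the first row is a header and the key is in the first column. The function name `split_csv_streaming` and its parameters are made up for illustration. It uses the standard `csv` module and reopens the output file for every row, which is the slow but safe option when there may be many output files:

```python
import csv
import os

def split_csv_streaming(src_path, out_dir, key_col=0, prefix="Pc_Numbers_"):
    """Stream src_path row by row, writing each row to a per-value file.

    Returns the set of distinct key values seen.
    """
    seen = set()
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if not row:
                continue  # skip blank lines
            key = row[key_col]
            out_path = os.path.join(out_dir, f"{prefix}{key}.csv")
            # start a fresh file the first time a value appears, append afterwards
            mode = "a" if key in seen else "w"
            with open(out_path, mode, newline="") as out:
                writer = csv.writer(out)
                if key not in seen:
                    writer.writerow(header)  # repeat the header in every output file
                writer.writerow(row)
            seen.add(key)
    return seen
```

Only one row is held in memory at a time, so the input size does not matter. If the number of distinct values is small, keeping the output files open in a dict instead of reopening them would likely be faster.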

There are close to 1600 entries in the file that I want to split up. The PC number, which is the field I want to go by, is in column A and starts in cell A2, if that helps.

1600 entries should not be a problem on today's PCs. So why do you actually want to do it? Are you really getting a memory error? Working in memory is simpler and faster.

pyTony's question is valid: why split it up? But if you must, then my code above ought to work with a couple of changes. The output file has to be opened for writing (append mode here), and you probably want the first (header) row to be skipped:

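        key_col = 0  # PC number is in column A per the question; use 3 for the fourth column
        csv_path = 'input.csv'  # placeholder: replace with your CSV file name
        with open(csv_path) as fid:
            lines = fid.read().splitlines()
        # skip the header row and any blank lines
        data = [line.split(',') for line in lines[1:] if line]
        data.sort(key=lambda x: x[key_col])
        for d in data:
            # append mode creates the file on first use and adds to it afterwards,
            # so delete old output files before re-running
            with open('Pc_Numbers_' + d[key_col] + '.csv', 'a') as fid:
                fid.write(','.join(d) + '\n')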
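
For a file of about 1600 rows, the in-memory route could also be written with the standard `csv` module, which handles quoted fields that contain commas (a plain `split(',')` does not). This is only a sketch: the function name `split_csv_in_memory` and its parameters are hypothetical, and it assumes the first row is a header that should be copied into every output file.

```python
import csv
import os
from collections import defaultdict

def split_csv_in_memory(src_path, out_dir=".", key_col=0, prefix="Pc_Numbers_"):
    """Group all rows by the value in key_col, then write one file per value.

    Returns a dict mapping each key value to the number of rows written.
    """
    groups = defaultdict(list)
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if row:  # ignore blank lines
                groups[row[key_col]].append(row)

    for key, rows in groups.items():
        out_path = os.path.join(out_dir, f"{prefix}{key}.csv")
        # "w" overwrites any earlier output, so re-running is safe
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            if header is not None:
                writer.writerow(header)
            writer.writerows(rows)
    return {key: len(rows) for key, rows in groups.items()}
```

Called as `split_csv_in_memory('input.csv')` (with `input.csv` standing in for the real file name), it would write `Pc_Numbers_1110.csv` and so on into the current directory. No sort is needed because the dict collects rows per value directly.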