I have some files containing end-of-day stock data in the following format:

Filename: NYSE_20120116.txt
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120116,36.15,36.36,35.59,36.19,3327400
AA,20120116,10.73,10.78,10.53,10.64,20457600

How can I create one file per symbol? For example, for the company A:

Filename : A.txt
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120116,36.15,36.36,35.59,36.19,3327400
A,20120117,39.76,40.39,39.7,39.99,4157900

(I don't want A.txt to contain the first line (the line with <ticker>) and the first column (the column with the symbol A))

I have tried to do it using a bash script, but the script is extremely slow.

Thank you.

I'm not quite clear. You're saying you want to be able to open the file on the fly, and based on what you call, it will automatically ignore the first line and column? Or do you want to alter all the files in one go?

Collect the info into a dictionary and write each key to its own file.

f = ["""\
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120116,36.15,36.36,35.59,36.19,3327400
AA,20120116,10.73,10.78,10.53,10.64,20457600
""", """\
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120117,26.15,36.36,35.59,36.49,3327400
AA,20120117,10.73,20.78,10.53,10.64,20457600
"""
     ]

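# Maps each ticker symbol to the list of its rows (ticker column removed).
collect = dict()

for day in f:
    # [1:] skips the <ticker>,... header line of each daily file.
    for line in day.splitlines()[1:]:
        # Split off the first column only; d keeps the rest of the row intact.
        key, d = line.split(',', 1)
        collect.setdefault(key, []).append(d)


# Print each symbol followed by its rows, in symbol order.
for key, info in sorted(collect.items()):
    print(key)
    print('\n'.join(info) + '\n')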

Output:

A
20120116,36.15,36.36,35.59,36.19,3327400
20120117,26.15,36.36,35.59,36.49,3327400

AA
20120116,10.73,10.78,10.53,10.64,20457600
20120117,10.73,20.78,10.53,10.64,20457600
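The snippet above prints the grouped rows rather than saving them. The remaining step, writing each key to its own file, could look roughly like the sketch below. The names write_symbol_files and out_dir are made up for illustration, and it assumes every ticker is usable as a file name on your system.

```python
import os


def write_symbol_files(collect, out_dir="."):
    """Write each symbol's rows to <out_dir>/<symbol>.txt.

    Hypothetical helper: `collect` maps a ticker to a list of row strings,
    as built in the snippet above. Assumes tickers are valid file names.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for key, rows in sorted(collect.items()):
        path = os.path.join(out_dir, key + ".txt")
        # Each output file is opened once and written in a single call.
        with open(path, "w") as out:
            out.write("\n".join(rows) + "\n")
        written.append(path)
    return written
```

Calling write_symbol_files(collect, "by_symbol") after the grouping loop would create by_symbol/A.txt and by_symbol/AA.txt for the sample data.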

Edited 4 Years Ago by pyTony

PyTony's way is a nice solution. I just wanted to add that, since you already mentioned trying to do this through bash, have you looked into awk? It is a command-line tool that runs from the shell and is well suited to manipulating text files, extracting rows and columns, and so on. My buddy swears by it for on-the-fly data manipulation.

Maybe I was not so clear.
I have 24 files with names like nyse_20120116.
Each such file contains 3200 stock symbols with their open, high, low, close and volume.
I want to create 3200 files of the form stock_name.txt (for example A.txt, AA.txt), with each file containing all the data for that stock.
I believe it is clear now.
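For that setup (24 daily files, 3200 symbols), one possible sketch reads every daily file once, groups the rows by symbol in memory, and then writes each symbol file once. The function name split_by_symbol and the pattern and directory arguments are assumptions for illustration. It also assumes all daily files share the same prefix, so that sorting the file names orders them by date, and that every ticker is usable as a file name.

```python
import glob
import os
from collections import defaultdict


def split_by_symbol(in_pattern, out_dir):
    """Group rows from all daily files by ticker, then write one file per ticker.

    Hypothetical helper; returns the number of symbol files written.
    """
    rows_by_symbol = defaultdict(list)
    # With a common prefix, sorted names put the YYYYMMDD dates in order.
    for path in sorted(glob.glob(in_pattern)):
        with open(path) as day_file:
            next(day_file, None)  # skip the <ticker>,... header line
            for line in day_file:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                # Drop the ticker column, keep the rest of the row as-is.
                symbol, rest = line.split(",", 1)
                rows_by_symbol[symbol].append(rest)

    os.makedirs(out_dir, exist_ok=True)
    for symbol, rows in rows_by_symbol.items():
        # Each output file is opened exactly once.
        with open(os.path.join(out_dir, symbol + ".txt"), "w") as out:
            out.write("\n".join(rows) + "\n")
    return len(rows_by_symbol)
```

Something like split_by_symbol("nyse_*.txt", "by_symbol") would then produce by_symbol/A.txt, by_symbol/AA.txt and so on. With 24 files of 3200 rows each, the whole data set should fit comfortably in memory, and opening each output file only once is probably where most of the time is saved compared with a shell loop that appends line by line.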
