Hi everyone,

I have a fairly simple problem, but since I haven't used Python in a while, I just can't seem to get things working.
Basically, I have a text file with a number of comma separated fields (attached).
What I want to do is split the string, and extract the "File" item from each line. I then need to write this to a new file. (I also want to skip the first line.)

So my desired output file would just have:
2008308_017_079.tif
2008308_017_080.tif
2008308_017_081.tif
etc...

If anyone out there could help me with this, I'd be very grateful!

Attachments
"Id","File","Easting","Northing","Alt","Omega","Phi","Kappa","Photo","Roll","Line","Roll_line","Orient","Camera"
1800,2008308_017_079.tif,530658.110,5005704.180,2031.100000,0.351440,-0.053710,0.086470,79,2008308,17,308_17,rightX,Jen73900229d
1801,2008308_017_080.tif,531793.060,5005709.230,2033.170000,0.385000,-0.044790,-0.057690,80,2008308,17,308_17,rightX,Jen73900229d
1802,2008308_017_081.tif,532930.810,5005709.150,2032.250000,0.350180,-0.044950,0.271100,81,2008308,17,308_17,rightX,Jen73900229d
1803,2008308_017_082.tif,534066.230,5005706.620,2037.630000,0.345480,-0.036860,0.234700,82,2008308,17,308_17,rightX,Jen73900229d
1804,2008308_017_083.tif,535212.280,5005706.990,2037.470000,0.336650,-0.045540,0.306690,83,2008308,17,308_17,rightX,Jen73900229d
1805,2008308_017_084.tif,536359.740,5005707.850,2033.760000,0.333610,-0.050390,0.086950,84,2008308,17,308_17,rightX,Jen73900229d
1806,2008308_017_085.tif,537494.570,5005708.610,2035.620000,0.343970,-0.052050,0.303690,85,2008308,17,308_17,rightX,Jen73900229d
1807,2008308_017_086.tif,538627.990,5005709.840,2035.100000,0.328450,-0.054550,-0.091990,86,2008308,17,308_17,rightX,Jen73900229d
1808,2008308_017_087.tif,539779.710,5005708.090,2030.540000,0.326280,-0.057570,0.227650,87,2008308,17,308_17,rightX,Jen73900229d
1809,2008308_017_088.tif,540906.110,5005711.370,2032.730000,0.347700,-0.029520,0.389650,88,2008308,17,308_17,rightX,Jen73900229d
2268,2008310_016_008.tif,540912.710,5003700.770,2010.400000,-0.323050,0.056930,179.710620,8,2008310,16,310_16,left+X,Jen73900229d
2269,2008310_016_007.tif,539788.120,5003693.790,2014.890000,-0.345960,0.084340,179.153550,7,2008310,16,310_16,left+X,Jen73900229d
2270,2008310_016_006.tif,538654.060,5003698.770,2027.840000,-0.331110,0.057120,179.118960,6,2008310,16,310_16,left+X,Jen73900229d
2271,2008310_016_005.tif,537504.470,5003715.740,2026.880000,-0.326870,0.043910,178.785490,5,2008310,16,310_16,left+X,Jen73900229d
2272,2008310_016_004.tif,536349.200,5003739.500,2010.940000,-0.329510,0.060200,179.274040,4,2008310,16,310_16,left+X,Jen73900229d
2273,2008310_016_003.tif,535232.560,5003746.840,2009.070000,-0.329740,0.053120,179.544540,3,2008310,16,310_16,left+X,Jen73900229d
2274,2008310_016_002.tif,534088.210,5003743.760,2024.100000,-0.326980,0.045690,179.670860,2,2008310,16,310_16,left+X,Jen73900229d
2275,2008310_016_001.tif,532945.090,5003737.280,2027.930000,-0.359200,0.060830,179.319580,1,2008310,16,310_16,left+X,Jen73900229d
2276,2008310_015_088.tif,536328.710,5001730.620,2019.370000,0.340480,-0.039560,-0.596140,88,2008310,15,310_15,rightX,Jen73900229d
2277,2008310_015_089.tif,537474.370,5001721.580,2007.930000,0.348600,-0.061310,-0.316810,89,2008310,15,310_15,rightX,Jen73900229d
2278,2008310_015_090.tif,538611.770,5001705.930,2008.260000,0.343580,-0.043240,0.696690,90,2008310,15,310_15,rightX,Jen73900229d
2279,2008310_015_091.tif,539738.100,5001707.080,2016.300000,0.351750,-0.027060,0.357080,91,2008310,15,310_15,rightX,Jen73900229d
2280,2008310_015_092.tif,540882.920,5001717.380,2024.100000,0.339980,-0.035750,0.330010,92,2008310,15,310_15,rightX,Jen73900229d
3112,2008313_014_240.tif,538621.930,4999720.280,1997.920000,4.276300,2.002480,0.107910,240,2008313,14,313_14,rightX,Jen73900229d
3113,2008313_014_241.tif,539762.130,4999724.300,1989.260000,0.458230,0.112320,-0.054790,241,2008313,14,313_14,rightX,Jen73900229d
3114,2008313_014_242.tif,540894.990,4999726.760,1994.060000,0.463020,0.106710,-0.033460,242,2008313,14,313_14,rightX,Jen73900229d

This is quite simple with the code snippet I posted today; the only addition needed is stripping the quotes.

# text based data input with data accessible
# with named fields or indexing
from __future__ import print_function ## Python 3 style printing
from collections import namedtuple
import string

filein = open("cb2.txt")
quotes = '\'\"'
datadict = {}

headerline = filein.readline().lower() ## lowercase field names Python style
## first non-letter and non-number is taken to be the separator
separator = headerline.strip(string.lowercase + string.digits + quotes)[0]
print("Separator is '%s'" % separator)

headerline = [field.strip(string.whitespace + quotes) for field in headerline.split(separator)]
Dataline = namedtuple('Dataline',headerline)
print ('Fields are:',Dataline._fields,'\n')

for data in filein:
    data = [f.strip(string.whitespace + quotes) for f in data.split(separator)]
    d = Dataline(*data)
    datadict[d.id] = d ## do hash of id values for fast lookup (key field)

for id in  datadict.keys():
    print(datadict[id].file)

raw_input('Ready') ## let the output be seen when run directly (Python 2: input() would eval the typed text)
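
For anyone reading this on Python 3 (where string.lowercase no longer exists), here is a rough sketch of the same named-field idea. Note that it swaps the hand-rolled separator detection for the standard csv module, which is my substitution and not what the snippet above does. load_records and the trimmed three-column SAMPLE are only illustrative names; the rows themselves come from the attachment.

```python
import csv
import io
from collections import namedtuple

# Trimmed copy of the attachment (first three columns only), inlined so the
# sketch runs without cb2.txt being present.
SAMPLE = '''"Id","File","Easting"
1800,2008308_017_079.tif,530658.110
1801,2008308_017_080.tif,531793.060
'''


def load_records(stream):
    """Return a dict mapping the id field to a namedtuple for each data row."""
    reader = csv.reader(stream)  # csv takes care of the quoted header names
    header = [name.strip().lower() for name in next(reader)]
    Dataline = namedtuple("Dataline", header)
    records = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        d = Dataline(*row)
        records[d.id] = d  # key on the id field for fast lookup
    return records


records = load_records(io.StringIO(SAMPLE))
for key in records:
    print(records[key].file)
```

With a real file you would pass an open file object instead of the io.StringIO wrapper. On Python 3.7 and later the dict keeps the rows in file order, which the Python 2 dict above does not guarantee.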

One solution uses a regular expression. It is not hard to write a regex for this; it only takes a couple of minutes.

import re

text = '''\
"Id","File","Easting","Northing","Alt","Omega","Phi","Kappa","Photo","Roll","Line","Roll_line","Orient","Camera"
1800,2008308_017_079.tif,530658.110,5005704.180,2031.100000,0.351440,-0.053710,0.086470,79,2008308,17,308_17,rightX,Jen73900229d
1801,2008308_017_080.tif,531793.060,5005709.230,2033.170000,0.385000,-0.044790,-0.057690,80,2008308,17,308_17,rightX,Jen73900229d
1802,2008308_017_081.tif,532930.810,5005709.150,2032.250000,0.350180,-0.044950,0.271100,81,2008308,17,308_17,rightX,Jen73900229d
1803,2008308_017_082.tif,534066.230,5005706.620,2037.630000,0.345480,-0.036860,0.234700,82,2008308,17,308_17,rightX,Jen73900229d
1804,2008308_017_083.tif,535212.280,5005706.990,2037.470000,0.336650,-0.045540,0.306690,83,2008308,17,308_17,rightX,Jen73900229d
'''

test_match = re.findall(r'\d{7}\_\d{3}\_\d{3}\.\btif\b',text)
print test_match #Give us a list

#Looping over item in list
for item in test_match:
    print item

'''-->Out
['2008308_017_079.tif', '2008308_017_080.tif', '2008308_017_081.tif', '2008308_017_082.tif', '2008308_017_083.tif']
2008308_017_079.tif
2008308_017_080.tif
2008308_017_081.tif
2008308_017_082.tif
2008308_017_083.tif
'''
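
As a side note, underscores do not need escaping in a regex, so the pattern can be written a little more plainly. Below is a small Python 3 sketch of that variant, assuming every file name follows the 7-3-3 digit layout seen in the attachment; TIF_NAME and the shortened sample string are names I made up for the illustration.

```python
import re

# Seven digits, underscore, three digits, underscore, three digits, ".tif"
TIF_NAME = re.compile(r"\d{7}_\d{3}_\d{3}\.tif\b")

# Two rows from the attachment, trimmed to the first three columns
sample = (
    '"Id","File","Easting"\n'
    "1800,2008308_017_079.tif,530658.110\n"
    "2268,2008310_016_008.tif,540912.710\n"
)

names = TIF_NAME.findall(sample)  # list of every matching file name
for name in names:
    print(name)
```

The header line is skipped automatically here because nothing in it matches the pattern.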

For a simple but inflexible solution, you can just do:

filein = open("cb2.txt")
filein.readline() # drop first line
for line in filein:
    print line.split(',')[1]

Thanks tonyjv and Snippsat,

Your suggestions helped me get back on track.

filein = open("cb2.txt")
filein.readline()

for line in filein:
    namedata = []
    namedata = line.split(",")[1]
    print namedata + "\n"
    fileout = open("copyimg.txt" , "a")
    fileout.write(namedata + "\n")
    fileout.close()

It is better, though, to move the fileout = open(...) call (line 8 of your code) out of the loop, up to line 3, with one less level of indentation. Then mode 'w' is fine instead of 'a'. Of course, the close must then also happen after the loop, not inside it (one indent less).

Also, print adds the newline automatically. If you prefer, you can use it to write to the file as well, like this:

filein = open("cb2.txt")
filein.readline()
fileout = open("copyimg.txt" , "w")

for line in filein:
    namedata = line.split(",")[1]  # second field is the file name
    print namedata
    print >>fileout, namedata  # print adds the newline for us

fileout.close()
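
In case it helps someone on Python 3: the print >>fileout form is gone there, and the rough equivalent is print(..., file=fileout). Here is a minimal sketch of the same loop under that assumption. extract_file_column is a hypothetical helper name, and the demo writes a two-row stand-in for cb2.txt into a temporary directory so it can run on its own.

```python
import os
import tempfile


def extract_file_column(src_path, dst_path):
    """Write the second comma-separated field of each data row to dst_path."""
    names = []
    # "with" closes both files even if something goes wrong in the loop
    with open(src_path) as filein, open(dst_path, "w") as fileout:
        next(filein, None)  # skip the header line
        for line in filein:
            fields = line.rstrip("\n").split(",")
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            print(fields[1], file=fileout)  # print supplies the newline
            names.append(fields[1])
    return names


# Demo on a throwaway copy of two rows from the attachment
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "cb2.txt")
dst = os.path.join(workdir, "copyimg.txt")
with open(src, "w") as f:
    f.write('"Id","File"\n1800,2008308_017_079.tif\n1801,2008308_017_080.tif\n')

names = extract_file_column(src, dst)
with open(dst) as f:
    written = f.read()
print(written, end="")
```

Like the split-based versions above, this assumes no field contains a comma of its own.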

Oh, terrific!
I didn't know that was an option with "print". I know I'll use that method again in the future.
Many thanks for the great help and advice :)
