Hi,

I have many files produced by a piece of software. I want to extract some data from them, and when I open a file with TextEdit, I see that what I need is on the first line. When I run my script, though, it works for only a couple of the files. Below are the script I use and an example of a working and a non-working file. I have spent hours on this, so it would be great if you could have a look.

import glob

path = "/Users/Desktop/files/"  # you might need to modify
ListFiles = glob.glob(path + '*')
print ListFiles

for i in ListFiles:
      print i
      handle = open(i,'r')
      text = handle.readlines()
      print text
      print text[0]
      handle.close()

print text returns a list of lines for both files, but text[0] only works for the second file.
Thanks

Here's what I would do. Is it better?

import glob
filepat = "/Users/Desktop/files/*.txt" # assuming they all have .txt suffix

file_names = glob.glob(filepat)
print file_names

for fn in file_names:
    print fn
    with open(fn,'r') as f: # with is succinct and useful
      f = open(fn,'r')
      line1 = f.readline().strip() # note: Just one line
      print line1

Gris, you forgot about providing the first word.

line[0]

Nice job mate :)

Thanks for the thumbs up, but in this case it is you who misunderstood: the OP's code puts all the lines into a list named text using the readlines() method (note the plural), so text[0] provides the first line, not the first word.

I, on the other hand, accidentally didn't delete the extraneous and wrong line 10 when I copied @aint's program to mine. Sigh. It should be:

import glob
filepat = "/Users/Desktop/files/*.txt" # assuming they all have .txt suffix

file_names = glob.glob(filepat)
print(file_names)

for fn in file_names:
    print(fn)
    with open(fn,'r') as f: # with is succinct and useful
        line1 = f.readline().strip() # Just one line with trailing \n removed
        print(line1)

(while I was there, I also fixed a comment and changed the print calls to be Python 3.x compatible)
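
To make the readlines() versus readline() distinction discussed above concrete, here is a minimal Python 3 sketch. The two-line sample text is made up for illustration and stands in for one of the real data files.

```python
import io

# Hypothetical two-line sample standing in for one of the data files.
sample = io.StringIO("first line\nsecond line\n")

lines = sample.readlines()          # a list holding every line, newlines kept
first_from_list = lines[0]          # the first line (a string), not the first word
first_char = lines[0][0]            # indexing that string gives a single character
first_word = lines[0].split()[0]    # split on whitespace to get the first word

sample.seek(0)                      # rewind so we can read again
first_only = sample.readline().strip()  # reads just one line, newline stripped
```

So indexing the list from readlines() gives a whole line, and getting the first word takes an extra split().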

Hi,
Both versions behave the same as the one I posted, but the problem is still there: it works only on the second txt file, not on the first one.
I have now tried printing all the lines one by one from both files. Although I see the complete list when I say "print text", print text[x] only works for some of the elements of the list. I tried to see what is different, but did not manage to. When I open the file with TextEdit it looks okay, yet the lines in the list are full of \x00\x00...
It would be great if you can try the files I attached.
thanks
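
Runs of \x00 between characters are often a sign that a file is not in a single-byte encoding (UTF-16 is a common cause). That is only a guess here, since the thread does not say what encoding the files use. A rough way to check is to read the first few bytes in binary mode and look for a byte order mark. The sketch below is Python 3; guess_encoding and peek are hypothetical helper names, and the heuristic is deliberately simple (it does not handle UTF-32, for example).

```python
import codecs

def guess_encoding(raw):
    """Guess an encoding from the first bytes of a file (heuristic only)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No byte order mark: a NUL in every other byte hints at UTF-16.
    if raw[1:2] == b"\x00" and raw[0:1] not in (b"", b"\x00"):
        return "utf-16-le"
    if raw[0:1] == b"\x00" and raw[1:2] not in (b"", b"\x00"):
        return "utf-16-be"
    return None  # no hint found; could be ASCII, UTF-8, MacRoman, ...

def peek(path, count=32):
    """Return the first `count` raw bytes of the file at `path`."""
    with open(path, "rb") as handle:
        return handle.read(count)
```

Calling guess_encoding(peek(name)) on a working and a non-working file and comparing the results should show whether the two differ in encoding.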

Hi again,

I have now managed to overcome the problem. The only change I made was to open the files with a codec:

import glob
import codecs


path = "/Users/koray/Desktop/Files/"
ListFiles =glob.glob(path+'*')
print ListFiles

for i in ListFiles:
      print i
      handle = codecs.open(i, "r", "macroman")  # decode the bytes as MacRoman
      text = handle.readlines()
      print text[0]
      handle.close()

Thanks again for replying
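
For anyone on Python 3, the same idea can be written with the built-in open(), which accepts an encoding argument directly. This is a sketch, not the OP's code: first_lines is a hypothetical helper name, and the demo at the bottom writes a throwaway UTF-16 file only to show the call. If the real files turn out to be UTF-16 (an assumption the thread does not confirm), passing "utf-16" instead of "mac_roman" should also get rid of the \x00 characters.

```python
import glob
import os
import tempfile

def first_lines(pattern, encoding):
    """Return {file name: first line} for every file matching `pattern`."""
    result = {}
    for name in sorted(glob.glob(pattern)):
        # The encoding argument tells Python how to decode the file's bytes.
        with open(name, "r", encoding=encoding) as handle:
            result[os.path.basename(name)] = handle.readline().strip()
    return result

# Demo on a temporary, made-up file so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "sample.txt"), "w", encoding="utf-16") as out:
    out.write("header line\nsecond line\n")

demo = first_lines(os.path.join(demo_dir, "*"), "utf-16")
```

For the files in this thread the call would look like first_lines("/Users/koray/Desktop/Files/*", "mac_roman").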
