Hi there, I started Python today. My first mini-project is supposed to find strings in a text file. Here is what I have written:

infile = open("Python/es.txt","r")
text = infile.read()
infile.close()
print text
search = 'du'
index = text.find(search)
if index==-1:
	print "nothing found"
else:
	search, "found at index", index

The file es.txt contains:

m du asdf

I expect an output of "du found at index ..."; however, I get "nothing found".
The "print text" command returns:

■m

in the notepad++ console
and

■m    d u  a s d f

in the command line console.
Any tips on how to fix?

Try print repr(text) to see what the string 'text' actually contains.

Also if your file contains non ascii data, you should try the "rb" opening mode.
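A minimal sketch of both suggestions, assuming the file was saved as UTF-16 with a byte order mark (the spaced-out console output hints at that, but it is a guess). The temporary file here is only a hypothetical stand-in for es.txt:

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for es.txt: the same text, saved as UTF-16 with a BOM.
path = os.path.join(tempfile.mkdtemp(), "es.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "m du asdf".encode("utf-16-le"))

# "rb" returns the raw bytes, with no newline or encoding translation.
with open(path, "rb") as f:
    raw = f.read()

# repr() makes the invisible bytes visible.
print(repr(raw))

# A UTF-16 file usually starts with one of these two byte order marks.
has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Searching the raw bytes fails: a zero byte sits between 'd' and 'u'.
found_in_bytes = raw.find(b"du")

# Decoding first turns the bytes back into the text that was typed.
text = raw.decode("utf-16")
print(text.find("du"))
```

In the repr output you should see `\x00` after every letter, which is why find() on the undecoded data reports nothing.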

Line 10 should be: print search, 'found at index', index
With that change, it works for me.

P.S. Tabs in python files are seen as a newbie mistake. Indents are usually 2 or 3 spaces

As a newbie, use the recommended (and widely used) 4 spaces indentation. You can configure your editor to put 4 spaces when you hit the tab key.

Make sure the encoding of the text file is plain ASCII.

Just to show an alternative print line with string formatting.
As written, it finds car only once.
Try to change the code so it finds both occurrences of car in the text.

text = '''I like to drive my car.
My car is black.'''    

search_word = 'car'
index = text.find(search_word)
if index == -1:
    print "Nothing found"
else:
    print "%s found at index %s" % (search_word, index)

'''-->Out
car found at index 19
'''
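One possible way to approach that exercise, as a sketch rather than the only answer: str.find() accepts an optional start position, so you can keep searching from just past the previous hit. The names indexes and start are my own. The print calls are written with parentheses so the snippet also runs on Python 3.

```python
text = '''I like to drive my car.
My car is black.'''

search_word = 'car'
indexes = []
start = 0
while True:
    # Search again, beginning where the previous match ended.
    index = text.find(search_word, start)
    if index == -1:
        break
    indexes.append(index)
    # Skip past this match, so overlapping matches are not counted.
    start = index + len(search_word)

if indexes:
    print("%s found at indexes %s" % (search_word, indexes))
else:
    print("Nothing found")
```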

Thanks for the replies. I set the default tab-thingie to 4 spaces.
Resaving the txt file as ANSI fixed the problem.
Yep, I forgot the "print" in the last line.

I am ultimately interested in searching for strings containing Unicode (Chinese characters).
Do you know how to adjust the code for that? I guess it has something to do with the 'rb'-mode Gribouillis mentioned.

As long as the Unicode characters can be encoded in UCS2 (two-byte Unicode), the behavior is effectively the same, since internally Python characters are UCS2. You may need to decode the file's contents when you read it, so that the text and the search string are represented the same way. As best I recall, all modern Chinese scripts can be encoded in UCS2, so if my memory is correct, you should have no trouble unless you get into historical texts.
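A rough sketch of what decoding on read can look like, written for Python 3 (on Python 2, io.open takes the same encoding argument). It assumes the file was saved as UTF-16. If yours is UTF-8 or something else, the encoding name has to change to match. The file name and the sample sentences are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file with Chinese text, saved as UTF-16.
path = os.path.join(tempfile.mkdtemp(), "zh.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("我喜欢开车。\n我的车是黑色的。")

# Passing the encoding lets Python decode the bytes into text while reading.
with open(path, "r", encoding="utf-16") as f:
    text = f.read()

search = "车"
index = text.find(search)
if index == -1:
    print("nothing found")
else:
    # The index counts characters in the decoded text, not bytes in the file.
    print("found at index %d" % index)
```

Once the text is decoded, find() works the same way as it does with plain ASCII.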

If you are searching in long texts, you may want to read the files one line (or one chunk) at a time rather than all at once.
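Something along these lines would do the line-at-a-time version. It is only a sketch for Python 3: find_in_lines is a name I made up, and io.StringIO stands in for a real file so the snippet runs without one. With a real file you would pass the open file object instead.

```python
import io


def find_in_lines(lines, search):
    """Yield (line_number, column) for the first hit on each matching line."""
    for line_number, line in enumerate(lines, start=1):
        column = line.find(search)
        if column != -1:
            yield line_number, column


# Stand-in for an open text file; iterating yields one line at a time.
sample = io.StringIO("I like to drive my car.\nMy car is black.\n")
hits = list(find_in_lines(sample, "car"))
print(hits)
```

Only one line is held in memory at a time, but a search string that spans a line break would be missed this way.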

Hmm, I will investigate this further tomorrow. I do not know yet how to handle UCS2. So thanks again and nightynight.

Edit: A quick search yielded that I should probably switch to Python 3.xx for Unicode stuff.

All Python versions handle Unicode (UCS2 encoding). You might want to spend half an hour reading about Unicode (a concept and an ordered list of characters) versus encodings (ways to specify the index of a character) versus scripts/glyphs (what the character looks like). http://en.wikipedia.org/wiki/Unicode or http://www.unicode.org/faq/basic_q.html
