Newbie formatting problem

Question

bjoernh 0 Newbie Poster

15 Years Ago

Hi there, I started Python today. My first mini-project is supposed to find strings in a text file. Here is what I have written:

infile = open("Python/es.txt","r")
text = infile.read()
infile.close()
print text
search = 'du'
index = text.find(search)
if index==-1:
	print "nothing found"
else:
	search, "found at index", index

es.txt contains:

m du asdf

I expect an output of "du found at index ...", but I get "nothing found" instead.
The "print text" command returns:

■m

in the Notepad++ console,
and

■m    d u  a s d f

in the command-line console.
Any tips on how to fix this?

python

Gribouillis 1,391 Programming Explorer

15 Years Ago

Try print repr(text) to see what the string 'text' actually contains.

Also, if your file contains non-ASCII data, you should try the "rb" opening mode.
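To illustrate the repr() suggestion, here is a minimal sketch. It assumes the file was saved as UTF-16 with a byte order mark (BOM), which the ■ and the spaced-out letters in the console output hint at but the thread does not confirm. The bytes are built in memory here instead of being read from es.txt.

```python
import codecs

# Assumption: these are the bytes an editor would write when saving
# "m du asdf" as UTF-16 little-endian with a byte order mark.
raw = codecs.BOM_UTF16_LE + u"m du asdf".encode("utf-16-le")

# repr() exposes the BOM and the NUL byte that follows every letter.
print(repr(raw))

# Searching the raw bytes fails: 'd' and 'u' are separated by a NUL byte.
print(raw.find(b"du"))

# Decoding first (the BOM tells the codec the byte order) gives real text.
text = raw.decode("utf-16")
print(text.find(u"du"))
```

If repr(text) on your own file shows something similar, the search fails because of the encoding and not because of the search logic.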

griswolf 304 Veteran Poster

15 Years Ago

Line 10 should be: print search, 'found at index', index
With that change, it works for me.

P.S. Tabs in Python files are seen as a newbie mistake. Indents are usually 2 or 3 spaces.

Gribouillis 1,391 Programming Explorer

15 Years Ago

griswolf wrote: "P.S. Tabs in Python files are seen as a newbie mistake. Indents are usually 2 or 3 spaces."

As a newbie, use the recommended (and widely used) 4-space indentation. You can configure your editor to insert 4 spaces when you hit the tab key.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2010-05-27T00:02:56+00:00

Make sure the encoding of the text file is plain ASCII.

snippsat 661 Master Poster · Answer 2 · 2010-05-27T00:18:54+00:00

Just to show an alternative print line with string formatting.
As written, it finds 'car' only once.
Try to change the code so it finds both occurrences of 'car' in the text.

text = '''I like to drive my car.
My car is black.'''    

search_word = 'car'
index = text.find(search_word)
if index == -1:
    print "Nothing found"
else:
    print "%s found at index %s" % (search_word, index)

'''-->Out
car found at index 19
'''
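One possible way to approach that exercise, as a sketch only: keep calling find() with a start position just past the previous match. The helper name find_all is made up for this example and is not a built-in.

```python
def find_all(text, word):
    """Return the start index of every occurrence of word in text."""
    positions = []
    index = text.find(word)
    while index != -1:
        positions.append(index)
        # Resume one character past the start of this match,
        # so overlapping matches are found as well.
        index = text.find(word, index + 1)
    return positions


text = '''I like to drive my car.
My car is black.'''

print(find_all(text, 'car'))
```

The second argument of find() is the position to start searching from, which is what lets the loop move forward through the text.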

bjoernh 0 Newbie Poster · Answer 3 · 2010-05-27T01:51:45+00:00

Thanks for the replies. I set the default tab-thingie to 4 spaces.
Resaving the txt file as ANSI fixed the problem.
Yep, I forgot the "print" in the last line.

I am ultimately interested in searching for strings containing Unicode (Chinese characters).
Do you know how to adjust the code for that? I guess it has something to do with the 'rb' mode Gribouillis mentioned.

griswolf 304 Veteran Poster · Answer 4 · 2010-05-27T01:57:12+00:00

As long as the Unicode characters can be encoded in UCS-2 (two-byte Unicode), the behavior is effectively the same, since internally Python characters are UCS-2 (at least on narrow builds, such as the standard Windows installers). You may need to decode the file's contents into a unicode string when you read it, so that the text and the search string are encoded the same way. Best I recall, all modern Chinese scripts can be encoded in UCS-2, so if my memory is correct, you should have no trouble unless you get into historical texts.
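A minimal sketch of decoding the file while reading it, assuming it is saved as UTF-8 (that is an assumption; substitute whatever encoding your file really uses). The file name zh.txt and its contents are made up for illustration. io.open exists from Python 2.6 on and is the same function as the built-in open in Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample file, written here only so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "zh.txt")
with io.open(path, "w", encoding="utf-8") as outfile:
    outfile.write(u"我喜欢我的车")

# Passing the encoding makes read() return decoded text, not raw bytes.
with io.open(path, "r", encoding="utf-8") as infile:
    text = infile.read()

# The search string must be text as well (a u'' literal in Python 2).
search = u"车"
index = text.find(search)
if index == -1:
    print("nothing found")
else:
    print("found at index %d" % index)

# Clean up the temporary sample file.
os.remove(path)
os.rmdir(folder)
```

The index counts characters, not bytes, because both the text and the search string were decoded before searching.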

If you are searching in long texts, you may want to read the files one line (or one chunk) at a time rather than all at once.
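A rough sketch of the line-at-a-time idea. The helper find_in_lines is a made-up name, and io.StringIO stands in for a real file here; an open file object can be passed in the same way, since iterating over it also yields one line at a time.

```python
import io


def find_in_lines(lines, word):
    """Yield (line_number, column) for the first match on each line."""
    for line_number, line in enumerate(lines, 1):
        column = line.find(word)
        if column != -1:
            yield line_number, column


# Stand-in for a large file; only one line is held in memory at a time.
sample = io.StringIO(u"I like to drive my car.\nMy car is black.\n")
hits = list(find_in_lines(sample, u"car"))
print(hits)
```

One limitation of this approach: a search string that spans a line break will not be found, so it suits single-word searches best.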

bjoernh 0 Newbie Poster · Answer 5 · 2010-05-27T02:11:38+00:00

Hmm, I will investigate this further tomorrow. I do not know yet how to handle UCS2. So thanks again and nightynight.

Edit: A quick search yielded that I should probably switch to Python 3.xx for Unicode stuff.

griswolf 304 Veteran Poster · Answer 6 · 2010-05-27T02:33:17+00:00

All Python versions handle Unicode (UCS-2 encoding). You might want to spend half an hour reading about Unicode (a concept and an ordered list of characters) versus encodings (ways to represent a character's index as bytes) versus scripts/glyphs (what the character looks like): http://en.wikipedia.org/wiki/Unicode or http://www.unicode.org/faq/basic_q.html
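To make the character-versus-encoding distinction concrete, here is a small sketch that encodes one character (U+4E2D, chosen only as an example) in two different ways. The variable names are illustrative.

```python
# One Unicode character: the code point U+4E2D.
char = u"\u4e2d"

# The same character as bytes in two different encodings.
utf8_bytes = char.encode("utf-8")
utf16_bytes = char.encode("utf-16-le")

print("characters: %d" % len(char))
print("UTF-8 bytes: %d" % len(utf8_bytes))
print("UTF-16 bytes: %d" % len(utf16_bytes))

# Decoding with the wrong encoding does not fail here,
# it silently produces different characters.
wrong = utf8_bytes.decode("latin-1")
print("characters after wrong decode: %d" % len(wrong))
```

The character is the same in both cases; only the byte representation differs, which is why a file has to be decoded with the encoding it was saved in.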
