os.listdir with Unicode in Python 3

Question

soltak 0 Newbie Poster

13 Years Ago

Hi, I'm using Windows and Python 3. I'm having problems using os.listdir with Unicode. Let's say I have a directory which contains files with Unicode file names. The name and path of the directory itself might or might not be Unicode. When it is Unicode, I can't seem to get listdir to accept it as an argument. It always raises a WindowsError exception complaining that the Unicode string isn't a valid directory path and displaying it very literally in the error message. For example, "C:/aXb", with the X representing some particular Unicode character, would be displayed by the error message as "\ufeffC:/a\u6771b\\*.*". When I output the same string to a UTF-8 text file it displays correctly.

I also tried calling listdir using a converted bytes argument instead, by using the string's encode method with a 'utf-8' argument. This works, but when I output the resulting list to a UTF-8 file (decoding it or writing it in binary format), the filenames show up with all the Unicode characters replaced by question marks.

The only time things work properly is when I move the files to a directory with a non-Unicode name and use that directory as a string argument to listdir.

Hopefully that explains the situation. Does anyone know what I'm doing wrong? It's probably something simple, but despite doing a lot of searching I can't figure it out.

python


Gribouillis 1,391 Programming Explorer

13 Years Ago

Did you try the following?

next(os.walk(directory))

Going by the docs, os.walk seems to work only with str (Unicode) paths, so you shouldn't run into encoding/decoding issues.
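To make the suggestion concrete, here is a minimal sketch of listing one directory level with os.walk. The helper name list_top_level is made up for illustration. Note that os.walk silently skips directories it cannot list unless an onerror callback is given, so with an invalid path a bare next(os.walk(...)) raises StopIteration instead of an OSError; the sketch passes a callback that re-raises so the real error stays visible.

```python
import os


def _reraise(error):
    # os.walk passes the OSError instance to the onerror callback;
    # raising it here aborts the walk and surfaces the real problem.
    raise error


def list_top_level(directory):
    """Return (subdirectory names, file names) found directly in `directory`.

    Hypothetical helper for illustration, not part of the os module.
    """
    # os.walk yields (dirpath, dirnames, filenames) tuples; the first
    # tuple describes the top directory itself.
    _dirpath, dirnames, filenames = next(os.walk(directory, onerror=_reraise))
    return sorted(dirnames), sorted(filenames)
```

With a str argument the returned names are str as well, so no manual decoding is needed.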


soltak 0 Newbie Poster · Answer 1 · 2012-01-30T03:10:55+00:00

Thanks for replying, but that ended up having the same problem. The solution turned out to be something simple, pretty much as I expected: I forgot to strip the UTF-8 BOM from the original string, which I had read in from a file. Hence the "\ufeff" at the start of the path in the error message. For some reason I overlooked that.
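For anyone hitting the same thing, here is a minimal sketch of two ways to get rid of the BOM, assuming the directory path sits on the first line of a UTF-8 text file. The function names are made up for illustration. The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if one is present; strip_bom handles a string that has already been read with plain "utf-8".

```python
def read_path_from_file(filename):
    """Read the first line of a UTF-8 file, ignoring a leading BOM.

    Hypothetical helper: assumes the path is stored on the first line.
    """
    # "utf-8-sig" behaves like "utf-8" but skips a BOM at the start.
    with open(filename, encoding="utf-8-sig") as f:
        return f.readline().strip()


def strip_bom(text):
    """Remove a single leading U+FEFF from an already decoded string."""
    # str.strip() does not remove U+FEFF, since it is not whitespace,
    # so it has to be handled explicitly.
    if text.startswith("\ufeff"):
        return text[1:]
    return text
```

Printing repr(path) before calling os.listdir is a quick way to spot a stray "\ufeff" like the one in the error message.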

I still can't get it to work at all when using bytes arguments though.
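On the bytes side, the question marks are consistent with how bytes paths were handled on Windows at the time: they went through the ANSI code page rather than UTF-8, so characters outside that code page could not survive the round trip. As far as I know, since Python 3.6 bytes paths on Windows are treated as UTF-8. Either way, a minimal sketch that avoids hard-coding 'utf-8' is to let os.fsencode and os.fsdecode pick the filesystem encoding. The helper name listdir_via_bytes is made up for illustration.

```python
import os


def listdir_via_bytes(directory):
    """List `directory` through the bytes API and return str names.

    Hypothetical helper for illustration. `directory` is a str path.
    """
    # os.fsencode converts the str path using the filesystem encoding,
    # which is whatever os.listdir expects for bytes arguments.
    raw_names = os.listdir(os.fsencode(directory))
    # A bytes argument makes os.listdir return bytes names;
    # os.fsdecode reverses the conversion.
    return [os.fsdecode(name) for name in raw_names]
```

On an older Windows Python this would presumably still lose characters the ANSI code page cannot represent, so passing a str path remains the safer choice there.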