Hi, I'm using Windows and Python 3. I'm having problems using os.listdir with Unicode. Let's say I have a directory which contains files with Unicode file names. The name and path of the directory itself might or might not be Unicode. When it is Unicode, I can't seem to get listdir to accept it as an argument. It always raises a WindowsError exception complaining that the string isn't a valid directory path, and the error message shows the path with its non-ASCII characters written as escape sequences. For example, "C:/aXb", with the X representing some particular Unicode character, would be displayed by the error message as "\ufeffC:/a\u6771b\\*.*". When I output the same string to a UTF-8 text file it displays correctly.

I also tried calling listdir using a converted bytes argument instead, by using the string's encode method with a 'utf-8' argument. This works, but when I output the resulting list to a UTF-8 file (decoding it or writing it in binary format), the filenames show up with all the Unicode characters replaced by question marks.

The only time things work properly is when I move the files to a directory with a non-Unicode name and use that directory as a string argument to listdir.

Hopefully that explains the situation. Does anyone know what I'm doing wrong? It's probably something simple, but despite doing a lot of searching I can't figure it out.

All 2 Replies

Did you try

next(os.walk(directory))

According to the documentation, os.walk seems to work only with strings (str, i.e. Unicode), so you shouldn't run into encoding/decoding issues.
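To make the suggestion concrete, here is a minimal sketch of what next(os.walk(directory)) gives you. The helper name top_level_entries is made up for illustration. Note that os.walk silently yields nothing for a path it cannot open, so the sketch supplies a default to next() rather than letting it raise StopIteration.

```python
import os

def top_level_entries(directory):
    """Return (dirnames, filenames) for `directory` only, without recursing.

    Hypothetical helper: os.walk yields (dirpath, dirnames, filenames)
    tuples, and the first tuple describes `directory` itself.
    """
    # By default os.walk ignores errors, so an invalid path produces an
    # empty generator; the default below turns that into two empty lists.
    _, dirnames, filenames = next(os.walk(directory), (directory, [], []))
    return dirnames, filenames
```

Because a bad path comes back as two empty lists here, this form can hide exactly the kind of invalid-path problem described in the question; os.listdir raising an exception is arguably easier to debug.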

Thanks for replying, but that ended up having the same problem. The solution turned out to be pretty much what I expected: I forgot to strip the UTF-8 BOM from the original string, which I had read in from a file. Hence the "\ufeff" at the start of the path in the error message. For some reason I overlooked that.
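For anyone who hits the same thing, here is a small sketch of two ways to get rid of the BOM. The function names and the idea that the path sits on the first line of the file are assumptions for illustration, not details from the original code.

```python
def strip_bom(text):
    """Remove one leading U+FEFF (the BOM) from an already-decoded string."""
    return text[1:] if text.startswith("\ufeff") else text

def read_directory_path(list_file):
    """Read the first line of a UTF-8 text file, ignoring a BOM if present.

    Assumption: the directory path is stored on the first line of the file.
    The 'utf-8-sig' codec decodes UTF-8 and drops a leading BOM, so the
    returned string never starts with '\\ufeff'.
    """
    with open(list_file, encoding="utf-8-sig") as handle:
        return handle.readline().rstrip("\r\n")
```

Opening the file with encoding="utf-8-sig" is usually the simpler choice, since it also works unchanged on files that have no BOM.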

I still can't get it to work at all when using bytes arguments though.
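One likely explanation, offered as a guess rather than a diagnosis: before Python 3.6, bytes paths on Windows went through the ANSI code page instead of UTF-8, so a path encoded with 'utf-8' does not match the real name, and characters outside the code page come back as question marks. Since Python 3.6, bytes paths on Windows are treated as UTF-8. The sketch below avoids hard-coding an encoding by using os.fsencode and os.fsdecode; the helper name list_names_as_text is hypothetical.

```python
import os

def list_names_as_text(directory):
    """List `directory` (given as str or bytes) and return the names as str."""
    names = os.listdir(directory)
    # os.listdir returns bytes names when given a bytes path.
    # os.fsdecode converts them using the filesystem encoding and
    # returns str names unchanged.
    return sorted(os.fsdecode(name) for name in names)

def to_bytes_path(path):
    """Encode a str path with the filesystem encoding, not a fixed codec."""
    return os.fsencode(path)
```

On Windows it is generally safer to pass str paths throughout and only encode when writing the names to a file.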
