I have a HUGE text file (over 60,000 lines) and I'm using Python to do regex fixes on the file.

I want to wrap any lines over 100 characters, and wrap on a space.

I know I can use ".{100}" to find 100 characters but when I do this:

new_text, num = re.subn(r"(.{100})", r"\g<1>\n", file.read())
print(num)

it says 0 matches found.

Any ideas?

All 4 Replies

.{100} matches 100 non-newline characters, unless you pass re.DOTALL. Also, I don't understand your replacement string. I think the 0 matches found means that none of your lines has 100 or more characters. You could print a report of the line lengths in your file like this:

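from collections import defaultdict

def printLineLengths(file):
    # Count how many lines have each length (trailing newline excluded)
    D = defaultdict(int)
    for line in file:
        D[len(line.rstrip("\n"))] += 1
    L = sorted(D.items())
    print("Line lengths in file:")
    for length, count in L:
        print("%d : %d" % (length, count))

# file is the open file object from the question
printLineLengths(file)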
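
To illustrate the point about "." and newlines, here is a small sketch on made-up data (five lines of 60 characters each, not the real file). It only shows how the match count changes with re.DOTALL; it is not a fix for the wrapping itself.

```python
import re

# Made-up sample: five lines of 60 characters, so no single line reaches 100
sample = "\n".join(["x" * 60] * 5)

# Without DOTALL, "." stops at each newline, so 100 in a row is never found
_, n_default = re.subn(r"(.{100})", r"\g<1>\n", sample)

# With DOTALL, "." also matches newlines, so matches run across line breaks
_, n_dotall = re.subn(r"(.{100})", r"\g<1>\n", sample, flags=re.DOTALL)

print(n_default, n_dotall)
```

Note that the DOTALL version cuts every 100 characters regardless of spaces or existing line breaks, which is probably not what is wanted here.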

Why reinvent the wheel? Python has a text wrapping module, textwrap, built right in.
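
A minimal sketch of how that could look for the task in the question, assuming each line of the file should be treated on its own and only lines over 100 characters should be touched. The helper name wrap_long_lines and the choice to leave over-long single words unbroken are my own assumptions, not something stated in the thread.

```python
import textwrap

def wrap_long_lines(text, width=100):
    """Wrap lines longer than `width` at whitespace; keep shorter lines as-is."""
    wrapped = []
    for line in text.split("\n"):
        if len(line) > width:
            # break_long_words=False: a single word longer than width stays whole
            pieces = textwrap.wrap(line, width=width,
                                   break_long_words=False,
                                   break_on_hyphens=False)
            # textwrap.wrap returns [] for whitespace-only input
            wrapped.extend(pieces or [""])
        else:
            wrapped.append(line)
    return "\n".join(wrapped)
```

Usage would be something like new_text = wrap_long_lines(file.read()). Be aware that textwrap expands tabs and drops the whitespace at each break point by default, so the wrapped lines are not a byte-for-byte copy of the original.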

Didn't know about that module. Thanks for the help.

One thing I've noticed is that Python often has a module for mundane tasks that we can sometimes waste time doing ourselves. It's especially surprising if I've been programming in a language like C or C++ for a while.

It really is batteries included.
