
I have a HUGE text file (over 60,000 lines) and I'm using Python to do regex fixes on the file.

I want to wrap any lines over 100 characters, and wrap on a space.

I know I can use ".{100}" to find 100 characters but when I do this:

import re
new_text, num = re.subn(r"(.{100})", r"\g<1>\n", file.read())
print(num)

it says 0 matches found.

Any ideas?


.{100} matches 100 consecutive non-newline characters, unless you pass re.DOTALL, in which case the dot also matches newlines. Also, I don't understand your replacement string. I think 0 matches found means that none of your lines is at least 100 characters long. You could print a report of the line lengths in your file like this:

from collections import defaultdict

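def printLineLengths(file):
    # Map each line length to the number of lines with that length.
    D = defaultdict(int)
    for line in file:
        # Don't count the trailing newline as part of the line.
        D[len(line.rstrip("\n"))] += 1
    L = sorted(D.items())
    print("Line lengths in file:")
    for length, count in L:
        print("%d : %d" % (length, count))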

printLineLengths(file)
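
To make the point about the dot and newlines concrete, here is a small sketch. The sample strings are made up for illustration and are not from the original file; it only shows how the match count changes with and without re.DOTALL.

```python
import re

# Made-up sample data: two 60-character lines, and one 150-character line.
short_lines = "a" * 60 + "\n" + "b" * 60 + "\n"
long_line = "c" * 150 + "\n"

# Without re.DOTALL, "." stops at each newline, so no 100-character run exists.
_, n_default = re.subn(r"(.{100})", r"\g<1>\n", short_lines)

# With re.DOTALL, "." also matches "\n", so a run can span both lines.
_, n_dotall = re.subn(r"(.{100})", r"\g<1>\n", short_lines, flags=re.DOTALL)

# A line that really is over 100 characters matches without any flag.
_, n_long = re.subn(r"(.{100})", r"\g<1>\n", long_line)

print(n_default, n_dotall, n_long)  # prints: 0 1 1
```

Note that even when it matches, this pattern breaks after exactly 100 characters, possibly in the middle of a word, so it does not by itself wrap on a space.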

One thing I've noticed is that Python often has a module for the mundane tasks that we sometimes waste time doing ourselves. It's especially surprising if you've been programming in a language like C or C++ for a while.

It really is batteries included.
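
For the wrapping task in this thread, the standard library's textwrap module looks like a natural fit. That is my assumption, since the post above doesn't name a module. A minimal sketch, where wrap_long_lines and the sample text are hypothetical names made up for illustration:

```python
import textwrap

def wrap_long_lines(text, width=100):
    """Wrap each line longer than `width` at whitespace; keep other lines as-is."""
    wrapped = []
    for line in text.split("\n"):
        if len(line) > width:
            # break_long_words=False: never split inside a word, so a single
            # word longer than `width` stays on its own (over-long) line.
            wrapped.extend(textwrap.wrap(line, width=width, break_long_words=False))
        else:
            wrapped.append(line)
    return "\n".join(wrapped)

# Made-up sample: one short line and one 199-character line of words.
sample = "short line\n" + " ".join(["word"] * 40) + "\n"
result = wrap_long_lines(sample)
print(result)
```

Keep in mind that textwrap.wrap normalizes whitespace by default (tabs are expanded, and whitespace at the break points is dropped), so check the output if exact spacing matters in your file.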
