Is there any problem with manipulating very long strings?
By long, I mean over 100,000 characters.
I'm using examplelongString = data.readline() to read it from an HTML file. I then use examplelongString.find('text to find') to search for stuff.

It works fine for a 60,000-character string I have (it doesn't take more than a second, by the way). But when I try it on a 100,000-character string, it doesn't work. From the debugging I've done so far, it seems to be getting into memory space it shouldn't have access to, although it doesn't give any errors.

Does this sound right? Is there a better way of dealing with long strings?

Maybe your algorithm's running time grows like n**2 or n! (O(n**2) or O(n!)), or memory is tight. Maybe you could iterate with for .... in open(..) somewhere instead of comparing against one big string?
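A minimal sketch of the line-by-line idea, assuming the target text never spans a line break (a match split across two lines would be missed). find_in_file is a hypothetical helper, not part of any library, and the small HTML file is made up for the demo.

```python
import os
import tempfile

def find_in_file(path, target):
    """Hypothetical helper: return (line_number, column) of the first match, or None.

    Reads the file one line at a time instead of holding it as one big string.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle, 1):
            col = line.find(target)
            if col != -1:
                return line_no, col
    return None

# Usage: write a small made-up HTML file, search it, then clean up.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w") as out:
    out.write("<html>\n<body>text to find</body>\n</html>\n")
result = find_in_file(path, "text to find")
print(result)   # (2, 6)
os.remove(path)
```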

Could you post the function that does the most work, and maybe time its performance with different input sizes? You could also do simple profiling, or add one global counter that you increment on every entry to the tightest-loop function and print when the call from outside finishes (for a recursive function):

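## Example of code that soon hits Python's recursion limit
def reverse(a):
    global n
    n += 1                           # count every call to the recursive function
    if a == '':
        return ''
    return a[-1] + reverse(a[:-1])   # one stack frame per character

for size in range(1000, 2000, 1000):
    # build a test string of printable characters
    s = ''
    for i in range(size):
        s += chr(i % 64 + 32)

    n = 0

    print(s)
    print('Reversing')
    print(reverse(s))   # 1001 nested calls exceed the default limit (1000) and raise an error
    print(n)
##    print('Easy way was')
##    print(s[::-1])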
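For the "time it with different sizes" suggestion, a rough sketch using the standard timeit module is below. The sizes and the placement of the target at the very end (so each call scans the whole string) are assumptions chosen for illustration; absolute timings will vary by machine.

```python
import timeit

# Time str.find on strings of growing size; the target sits at the very end,
# so every call has to scan the whole string before finding it.
results = {}
for size in (10000, 60000, 100000, 1000000):
    haystack = "a" * size + "text to find"
    # the lambda is called right away inside this iteration, so it sees the current haystack
    seconds = timeit.timeit(lambda: haystack.find("text to find"), number=100)
    results[size] = seconds
    print("%8d chars: %.6f s per 100 calls" % (size, seconds))
```

If the time grows roughly in proportion to the size, the search itself is linear and is unlikely to be what breaks at 100,000 characters.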

The length of the string shouldn't make any difference unless you run out of memory. You say it doesn't give any error, but how do you know it accesses invalid memory space? (What does that mean, by the way?) You should also try printing the repr() of your string to see whether it contains the text you're looking for.
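A small sketch of the repr() check, using io.StringIO as a made-up stand-in for the HTML file. It also shows something worth ruling out: readline() returns only the text up to and including the next newline, so if the file has more than one line, find() may simply be searching the wrong part of it. Whether that is the cause here is an assumption; the original post doesn't show the file.

```python
import io

# Made-up stand-in for the HTML file: the target text is on the second line.
data = io.StringIO(u"<html>\r\n<body>text to find</body>\r\n</html>\r\n")

first_line = data.readline()             # only up to and including the first '\n'
print(repr(first_line))                  # repr() exposes hidden characters: '<html>\r\n'
print(first_line.find('text to find'))   # -1: the target is not in this line

data.seek(0)
whole = data.read()                      # the entire contents as one string
print(whole.find('text to find'))        # 14
```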

I have seen this kind of behaviour when an operation runs too long, usually because of a coding error or a wrong algorithm. I do not think Python can go beyond its string bounds without very complicated code digging into the internals of the interpreter.
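To illustrate that ordinary string code stays within bounds, here is a short sketch: slicing past the end is clamped to the string's length, and indexing past the end raises IndexError rather than reading other memory. The 100,000-character string is made up for the example.

```python
s = "x" * 100000                 # a made-up 100,000-character string

# Slicing beyond the end is clamped to the string's length.
tail = s[99990:200000]
print(len(tail))                 # 10

# Indexing beyond the end raises IndexError instead of reading other memory.
caught = False
try:
    s[100000]
except IndexError as exc:
    caught = True
    print("IndexError: %s" % exc)
```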

So I am curious what kind of information the program got from the other memory space.

Strings cannot be changed in Python; they are immutable. So if you need to make many changes to a string a, you could be better off doing b = list(a) before those operations and a = "".join(b) after them. Note that str(b) does not rebuild the string; it gives the list's printed form:

>>> a='abc'
>>> b=list(a)
>>> b
['a', 'b', 'c']
>>> a=str(b)
>>> a
"['a', 'b', 'c']"
>>> a="".join(b)
>>> a
'abc'
>>>
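A minimal sketch of the list/join pattern applied to many edits. mask_every_other is a hypothetical function made up for illustration; the point is that each change happens in place in the list, whereas something like a = a[:i] + ch + a[i+1:] would build a brand-new full-length string on every edit.

```python
def mask_every_other(a, ch):
    """Hypothetical example: replace every second character of a with ch."""
    chars = list(a)                      # mutable list, one element per character
    for i in range(0, len(chars), 2):
        chars[i] = ch                    # changed in place; no new string per edit
    return "".join(chars)                # rebuild the string once at the end

print(mask_every_other("abcdef", "*"))   # *b*d*f
```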

These string operations, at least, were instantaneous on a string holding a big word list:

>>> f=open('words.txt').read()
>>> len(f)
1020385
>>> print 'Every 10000th letter:\n',f[0::10000]
Every 10000th letter:
kli
kipksoossiklaimslearlmjkjeutysiruiksoujs
kitbheotytkavsttkimpskioiieyennu
hlauänoaiäuata
ij


l
ok

>>>

Post us some code (along with the input and what it is supposed to do), man!
