Dear all,

I am trying to parse a lot of text. For small amounts of text, the while loop I use to find the positions of all spaces in the text works well:

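markerlist = []
counter = 0
while len(markerlist) < text.count(marker):
    # record the next occurrence of marker at or after counter
    markerlist.append(text.find(marker, counter))
    # continue searching one character past that occurrence
    counter = text.find(marker, counter) + 1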

This iterative process is very, very slow when working with a few megabytes of text. Can someone advise a faster method?

Best,

Wheaton

Something like this should be faster, since it avoids the use of function calls:

text = """\
A list of Pythonyms ...

pythonanism: spending way too much time programming Python
pythoncology: the science of debugging Python programs
pythondemic: something bad that lots of Python programmers do
pythonerous: something hard to do in Python
pythong: a short piece of Python code that works
pythonorean: someone who knows the esoteric technical aspects of Python
pythonus: something you don't want to do in Python
pythonym: one of these words
ptyhoon: someone who is really bad at Python
sython: to use someone else's Python code
pythug: someone who uses python to crash computers
Pythingy: a function or tool that works, but that you don't understand
Pythonian: somebody that insists on using a very early version of python
PythUH? -  block of code that yields a very unexpected output
Pythealot:  a Python fanatic
"""

spaces = 0
for c in text:
    # add to the count of spaces
    if c == " ":
        spaces += 1

print( "This text has %d spaces" % spaces )

Thanks Ene.

But this is actually not the same thing as what I'm trying to do. I am trying to make a list of the positions at which each space occurs. If I wanted to count the spaces I would just do

text.count(' ')

But I need a list of positions in the string.

It's not that hard to do that with the example given, granted the resulting code is a little bit nasty:

spaces = []
position = 0
for c in text:
    if c == ' ':
        spaces.append(position)
    position += 1

Not so hard, is it?

Okay, so this is faster because it doesn't call text.find and text.count?

I used my method originally because I thought iterating through every element would be slower. I don't know much about processing speed though.

Please let me know!
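
For what it's worth, one likely reason the original loop crawls on large inputs is that text.count(marker) is re-evaluated on every pass of the while loop, and each call scans the whole string. The total work therefore grows roughly with the text length times the number of markers. A minimal sketch that keeps text.find but drops the repeated count is below. The function name find_all is made up for illustration, and I have not timed it against your data:

```python
def find_all(text, marker):
    """Return the start index of every occurrence of marker in text."""
    positions = []
    start = text.find(marker)
    while start != -1:  # find returns -1 when there are no more hits
        positions.append(start)
        # resume the search one character past the last hit
        start = text.find(marker, start + 1)
    return positions


text = "According to Sigmund Freud fuenf comes between fear and sex."
print(find_all(text, " "))  # [9, 12, 20, 26, 32, 38, 46, 51, 55]
```

Each find call picks up where the previous one stopped, so the string is scanned only once overall.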

You can use a list comprehension to speed things up ...

# create a list of indexes for spaces in a text

text = "According to Sigmund Freud fuenf comes between fear and sex."
space_ix = [ix for ix, c in enumerate(text) if c == " "]
print(space_ix)  # [9, 12, 20, 26, 32, 38, 46, 51, 55]

vegaseat, that is really quick! Thanks very much. Works like a charm. I've not explored list comprehensions before, but I'm going to go look them up now.
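
One caveat: the comprehension compares single characters, so it only works when the marker is exactly one character long. If the marker might be longer, re.finditer from the standard library is another option. This is only a sketch; find_all_re is a hypothetical name, and it reports non-overlapping matches only:

```python
import re


def find_all_re(text, marker):
    """Return start indexes of non-overlapping occurrences of marker."""
    # re.escape makes the marker match literally, even if it
    # contains regex metacharacters such as "." or "*"
    pattern = re.escape(marker)
    return [match.start() for match in re.finditer(pattern, text)]


text = "According to Sigmund Freud fuenf comes between fear and sex."
print(find_all_re(text, " "))  # [9, 12, 20, 26, 32, 38, 46, 51, 55]
```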
