I was looking for a StringBuilder type of thing to use in Python (I've been working in PHP at work recently, and cStringIO had momentarily slipped my mind), and found StringIO and cStringIO; but in doing so, I also found a post claiming that both performed very poorly in comparison with plain old, naive string concatenation. (!)

Here's the test program they posted to prove this:

import StringIO
import cStringIO
import profile

def test_string(nrep, ncat):
    for i in range(nrep):
        s = ''
        for j in range(ncat):
            s += 'word'

def test_StringIO(nrep, ncat):
    for i in range(nrep):
        s = StringIO.StringIO()
        for j in range(ncat):
            s.write('word')
        s.getvalue()

def test_cStringIO(nrep, ncat):
    for i in range(nrep):
        s = cStringIO.StringIO()
        for j in range(ncat):
            s.write('word')
        s.getvalue()

test_string(10, 10)
test_StringIO(10, 10)
test_cStringIO(10, 10)

profile.run('test_string(10, 1000)')
profile.run('test_StringIO(10, 1000)')
profile.run('test_cStringIO(10, 1000)')

# sample execution and output:
# ~> python stringbuf.py | grep seconds
#     15 function calls in 0.004 CPU seconds
#     50065 function calls in 0.920 CPU seconds
#     10035 function calls in 0.200 CPU seconds
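The profiler's call counts hint at the problem. A fairer way to compare the approaches (a sketch using the standard timeit module, which the post above doesn't use) is to time each one directly, so that no per-call instrumentation gets added:

```python
import timeit

try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # fallback on Python 3, where cStringIO is gone

def concat(n):
    # naive += concatenation
    s = ''
    for _ in range(n):
        s += 'word'
    return s

def buffered(n):
    # StringIO used as a string builder
    buf = StringIO()
    for _ in range(n):
        buf.write('word')
    return buf.getvalue()

# timeit runs each callable as a whole, without instrumenting
# the individual calls inside it
t_concat = timeit.timeit(lambda: concat(10000), number=10)
t_buffered = timeit.timeit(lambda: buffered(10000), number=10)
```

Times gathered this way include no profiler overhead, so they should come out far closer together than the figures above.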

As you can see from their output, the profiler shows a clear preference for naive string concatenation. (Way fewer calls, much less CPU time.)

Well, this seemed naive to me (to pummel a pun). It seemed likely that the profiler was picking apart the calls into the StringIO modules, counting each one individually along with the overhead of making it, while doing nothing of the sort for the built-in, naive concatenation. So I tried simply timing the test functions instead, like this:

# (test functions as above, plus:)
def timeMethod(method, *args):
    from time import time
    t1 = time()
    method(*args)
    t2 = time()
    return t2 - t1

print "test_string:\t%.4f" % timeMethod(test_string, 10, 1000000)
print "test_stringIO:\t%.4f" % timeMethod(test_string, 10, 1000000)
print "test_cStringIO:\t%.4f" % timeMethod(test_string, 10, 1000000)

# sample execution and output:
# -> python test.stringbuf.py
# test_string:    1.0545
# test_stringIO:  1.0005
# test_cStringIO: 0.9869

From this output, it appears to me that I was correct, that the profiler doesn't pick apart built-in calls to the same degree that it picks apart module calls, and that cStringIO is actually slightly faster than naive string concatenation. (Surprise, surprise.)

Surprising to me still, however, is how slight the difference is - it seems like we're looking at about a 6% difference, even after 1,000,000 concatenations of the word 'word'. So it does seem like cStringIO is hardly worth the bother, in most applications.

It seems like Python must be using some sort of StringBuilder-like pattern internally for string concatenation at this point, or at least for appending to the end of a string. I can't imagine that Python is actually making a copy of the entire string for every += call and still coming in at around one second for this test. After all, after 250,000 concatenations of the word 'word', we have a 1-million-character string, right? So at the very least, we're talking about copying a buffer of 1 million bytes or more, 750,000 times. That would be like moving more than 750 gigs of memory from one spot to another (10 times over, in this test, actually). In one second? I don't think so, not on this computer! So Python must not be doing that anymore, if it ever did.
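As I understand it, CPython (since 2.4) can indeed grow a string in place on += when the left-hand side holds the only reference to it, which would explain these numbers; but that is an implementation detail, not a guarantee. The portable StringBuilder idiom in Python is to collect the pieces in a list and join them once at the end. A sketch:

```python
# Portable "StringBuilder" idiom: accumulate the pieces in a list,
# then copy them into the final string in a single join() pass.
parts = []
for _ in range(250000):
    parts.append('word')
s = ''.join(parts)

assert len(s) == 1000000  # the 1-million-character string, built with one copy
```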

Comments

Something is wrong here, don't you think?

print "test_string:\t%.4f" % timeMethod(test_string, 10, 1000000)
print "test_stringIO:\t%.4f" % timeMethod(test_string, 10, 1000000)
print "test_cStringIO:\t%.4f" % timeMethod(test_string, 10, 1000000)

(All three calls time test_string; test_StringIO and test_cStringIO are never actually benchmarked.)

Indeed!