Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

performance - Python string concatenation idiom: need clarification

From http://jaynes.colorado.edu/PythonIdioms.html

"Build strings as a list and use ''.join at the end. join is a string method called on the separator, not the list. Calling it from the empty string concatenates the pieces with no separator, which is a Python quirk and rather surprising at first. This is important: string building with + is quadratic time instead of linear! If you learn one idiom, learn this one.

Wrong: for s in strings: result += s

Right: result = ''.join(strings)"

I'm not sure why this is true. If I have some strings I want to join, it isn't intuitively better to me to put them in a list and then call ''.join. Doesn't putting them into a list create some overhead? To clarify:

Python Command Line:

>>> str1 = 'Not'
>>> str2 = 'Cool'
>>> str3 = ''.join([str1, ' ', str2])  # The more efficient way **A**
>>> print(str3)
Not Cool
>>> str3 = str1 + ' ' + str2  # The bad way **B**
>>> print(str3)
Not Cool

Is A really linear time and B quadratic time?

1 Answer

Yes. In the example you chose the importance isn't clear, because you only have two very short strings, so plain concatenation with + would probably be faster there.

But every time you do a + b with strings in Python, it causes a new allocation and then copies all the bytes from a and b into the new string. If you do this in a loop with lots of strings, the bytes already accumulated have to be copied again and again, and each time the amount that has to be copied gets longer. This gives the quadratic behaviour.
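To make the arithmetic concrete, here is a minimal sketch that only counts how many characters would be copied under the model described above (every + builds a brand-new string). The function names are made up for this illustration, and nothing here measures what a particular interpreter really does.

```python
def chars_copied_by_concat(pieces):
    """Count characters copied if every + allocates a fresh string."""
    copied = 0
    result_len = 0
    for piece in pieces:
        # The new string holds the old result plus the new piece,
        # and all of it is copied into the fresh allocation.
        result_len += len(piece)
        copied += result_len
    return copied


def chars_copied_by_join(pieces):
    """Count characters copied if each piece is copied exactly once."""
    return sum(len(piece) for piece in pieces)


for n in (10, 100, 1000):
    pieces = ['a'] * n
    print(n, chars_copied_by_concat(pieces), chars_copied_by_join(pieces))
# 10 55 10
# 100 5050 100
# 1000 500500 1000
```

For n one-character pieces the first count is n(n+1)/2, which grows quadratically, while the second is just n.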

On the other hand, creating a list of strings doesn't copy the contents of the strings - it just copies the references. This is incredibly fast, and runs in linear time. The join method then makes just one memory allocation and copies each string into the correct position only once. This also takes only linear time.
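The two-pass idea behind join can be sketched roughly like this. This is an illustration only, not CPython's actual implementation: join_sketch is a hypothetical name, and it works on byte strings with a bytearray standing in for the single allocation.

```python
def join_sketch(pieces):
    """Illustrative two-pass join over byte strings (not the real one)."""
    pieces = list(pieces)                # copies references, not contents
    total = sum(len(p) for p in pieces)  # pass 1: measure the final size
    buf = bytearray(total)               # one allocation of the final size
    pos = 0
    for p in pieces:                     # pass 2: copy each piece once
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(buf)


big = b'x' * 1000
refs = [big, big]            # the list stores two references to one object
print(refs[0] is big)        # True: no string contents were copied
print(join_sketch([b'Not', b' ', b'Cool']) == b''.join([b'Not', b' ', b'Cool']))  # True
```

Both passes touch each piece a constant number of times, so the total work stays proportional to the combined length of the pieces.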

So yes, do use the ''.join idiom if you are potentially dealing with a large number of strings. For just two strings it doesn't matter.
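In practice the idiom usually looks like the sketch below: append pieces to a list inside the loop, then join once at the end. The render_rows function and its sample data are hypothetical and exist only to show the shape of the pattern.

```python
def render_rows(rows):
    """Build a report by collecting pieces in a list and joining once."""
    parts = []
    for name, score in rows:
        # Appending stores a reference to the new piece; nothing
        # accumulated so far gets copied.
        parts.append('%s: %d\n' % (name, score))
    return ''.join(parts)   # single pass that assembles the result


report = render_rows([('alice', 3), ('bob', 5)])
print(report)
```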

If you need more convincing, try it for yourself by creating a string from 10M characters:

>>> chars = ['a'] * 10000000
>>> r = ''
>>> for c in chars: r += c
>>> print(len(r))

Compared with:

>>> chars = ['a'] * 10000000
>>> r = ''.join(chars)
>>> print(len(r))

The first method takes about 10 seconds. The second takes under 1 second.
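If you would rather measure than eyeball it, a small timeit script along these lines should work. The helper names and the reduced size (100,000 characters instead of 10M, so it finishes quickly) are choices made for this sketch. Timings depend heavily on the interpreter and version: CPython can sometimes extend the string in place for r += c, which may shrink the gap considerably, but that is an implementation detail and not something to rely on.

```python
import timeit


def build_with_plus(chars):
    """Concatenate one piece at a time with +=."""
    r = ''
    for c in chars:
        r += c
    return r


def build_with_join(chars):
    """Concatenate all pieces in a single join call."""
    return ''.join(chars)


chars = ['a'] * 100000  # smaller than the 10M above, to keep the run short

# Both approaches must produce the same string.
assert build_with_plus(chars) == build_with_join(chars)

plus_time = timeit.timeit(lambda: build_with_plus(chars), number=5)
join_time = timeit.timeit(lambda: build_with_join(chars), number=5)
print('+=   : %.4f s for 5 runs' % plus_time)
print('join : %.4f s for 5 runs' % join_time)
```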
