14.14 让你的程序跑的更快

问题

Your program runs too slow and you’d like to speed it up without the assistance of moreextreme solutions, such as C extensions or a just-in-time (JIT) compiler.

解决方案

While the first rule of optimization might be to “not do it,” the second rule is almostcertainly “don’t optimize the unimportant.” To that end, if your program is running slow,you might start by profiling your code as discussed in Recipe 14.13.More often than not, you’ll find that your program spends its time in a few hotspots,such as inner data processing loops. Once you’ve identified those locations, you can usethe no-nonsense techniques presented in the following sections to make your programrun faster.

Use functionsA lot of programmers start using Python as a language for writing simple scripts. Whenwriting scripts, it is easy to fall into a practice of simply writing code with very littlestructure. For example:

somescript.py

import sysimport csv

with open(sys.argv[1]) as f:
for row in csv.reader(f):

Some kind of processing...

A little-known fact is that code defined in the global scope like this runs slower thancode defined in a function. The speed difference has to do with the implementation oflocal versus global variables (operations involving locals are faster). So, if you want tomake the program run faster, simply put the scripting statements in a function:

somescript.pyimport sysimport csv

def main(filename):with open(filename) as f:for row in csv.reader(f):# Some kind of processing...
main(sys.argv[1])

The speed difference depends heavily on the processing being performed, but in ourexperience, speedups of 15-30% are not uncommon.

Selectively eliminate attribute accessEvery use of the dot (.) operator to access attributes comes with a cost. Under the covers,this triggers special methods, such as getattribute() and getattr(), whichoften lead to dictionary lookups.You can often avoid attribute lookups by using the from module import name form ofimport as well as making selected use of bound methods. To illustrate, consider thefollowing code fragment:

import math

def compute_roots(nums):
result = []for n in nums:

result.append(math.sqrt(n))

return result

Testnums = range(1000000)for n in range(100):

r = compute_roots(nums)

When tested on our machine, this program runs in about 40 seconds. Now change thecompute_roots() function as follows:

from math import sqrt

def compute_roots(nums):

result = []result_append = result.appendfor n in nums:

result_append(sqrt(n))

return result

This version runs in about 29 seconds. The only difference between the two versions ofcode is the elimination of attribute access. Instead of using math.sqrt(), the code usessqrt(). The result.append() method is additionally placed into a local variable result_append and reused in the inner loop.However, it must be emphasized that these changes only make sense in frequently ex‐ecuted code, such as loops. So, this optimization really only makes sense in carefullyselected places.

Understand locality of variablesAs previously noted, local variables are faster than global variables. For frequently ac‐cessed names, speedups can be obtained by making those names as local as possible.For example, consider this modified version of the compute_roots() function justdiscussed:

import math

def compute_roots(nums):
sqrt = math.sqrtresult = []result_append = result.appendfor n in nums:

result_append(sqrt(n))

return result

In this version, sqrt has been lifted from the math module and placed into a localvariable. If you run this code, it now runs in about 25 seconds (an improvement overthe previous version, which took 29 seconds). That additional speedup is due to a locallookup of sqrt being a bit faster than a global lookup of sqrt.Locality arguments also apply when working in classes. In general, looking up a valuesuch as self.name will be considerably slower than accessing a local variable. In innerloops, it might pay to lift commonly accessed attributes into a local variable. For example:

Slowerclass SomeClass:

...def method(self):

for x in s:op(self.value)

Fasterclass SomeClass:

...def method(self):

value = self.valuefor x in s:

op(value)

Avoid gratuitous abstractionAny time you wrap up code with extra layers of processing, such as decorators, prop‐erties, or descriptors, you’re going to make it slower. As an example, consider this class:

class A:def init(self, x, y):self.x = xself.y = y
@propertydef y(self):

return self._y

@y.setterdef y(self, value):

self._y = value

Now, try a simple timing test:

>>> from timeit import timeit
>>> a = A(1,2)
>>> timeit('a.x', 'from __main__ import a')
0.07817923510447145
>>> timeit('a.y', 'from __main__ import a')
0.35766440676525235
>>>

As you can observe, accessing the property y is not just slightly slower than a simpleattribute x, it’s about 4.5 times slower. If this difference matters, you should ask yourselfif the definition of y as a property was really necessary. If not, simply get rid of it andgo back to using a simple attribute instead. Just because it might be common for pro‐grams in another programming language to use getter/setter functions, that doesn’tmean you should adopt that programming style for Python.

Use the built-in containersBuilt-in data types such as strings, tuples, lists, sets, and dicts are all implemented in C,and are rather fast. If you’re inclined to make your own data structures as a replacement(e.g., linked lists, balanced trees, etc.), it may be rather difficult if not impossible to matchthe speed of the built-ins. Thus, you’re often better off just using them.

Avoid making unnecessary data structures or copiesSometimes programmers get carried away with making unnecessary data structureswhen they just don’t have to. For example, someone might write code like this:

values = [x for x in sequence]squares = [x*x for x in values]

Perhaps the thinking here is to first collect a bunch of values into a list and then to startapplying operations such as list comprehensions to it. However, the first list is com‐pletely unnecessary. Simply write the code like this:

squares = [x*x for x in sequence]

Related to this, be on the lookout for code written by programmers who are overlyparanoid about Python’s sharing of values. Overuse of functions such as copy.deepcopy() may be a sign of code that’s been written by someone who doesn’t fully under‐stand or trust Python’s memory model. In such code, it may be safe to eliminate manyof the copies.

讨论

Before optimizing, it’s usually worthwhile to study the algorithms that you’re using first.You’ll get a much bigger speedup by switching to an O(n log n) algorithm than bytrying to tweak the implementation of an an O(n**2) algorithm.If you’ve decided that you still must optimize, it pays to consider the big picture. As ageneral rule, you don’t want to apply optimizations to every part of your program,because such changes are going to make the code hard to read and understand. Instead,focus only on known performance bottlenecks, such as inner loops.You need to be especially wary interpreting the results of micro-optimizations. Forexample, consider these two techniques for creating a dictionary:

a = {‘name' : ‘AAPL',‘shares' : 100,‘price' : 534.22
}

b = dict(name='AAPL', shares=100, price=534.22)

The latter choice has the benefit of less typing (you don’t need to quote the key names).However, if you put the two code fragments in a head-to-head performance battle, you’llfind that using dict() runs three times slower! With this knowledge, you might beinclined to scan your code and replace every use of dict() with its more verbose al‐ternative. However, a smart programmer will only focus on parts of a program whereit might actually matter, such as an inner loop. In other places, the speed difference justisn’t going to matter at all.If, on the other hand, your performance needs go far beyond the simple techniques inthis recipe, you might investigate the use of tools based on just-in-time (JIT) compilationtechniques. For example, the PyPy project is an alternate implementation of the Python

interpreter that analyzes the execution of your program and generates native machinecode for frequently executed parts. It can sometimes make Python programs run anorder of magnitude faster, often approaching (or even exceeding) the speed of codewritten in C. Unfortunately, as of this writing, PyPy does not yet fully support Python3. So, that is something to look for in the future. You might also consider the Numbaproject. Numba is a dynamic compiler where you annotate selected Python functionsthat you want to optimize with a decorator. Those functions are then compiled intonative machine code through the use of LLVM. It too can produce signficant perfor‐mance gains. However, like PyPy, support for Python 3 should be viewed as somewhatexperimental.Last, but not least, the words of John Ousterhout come to mind: “The best performanceimprovement is the transition from the nonworking to the working state.” Don’t worryabout optimization until you need to. Making sure your program works correctly isusually more important than making it run fast (at least initially).