Python High Performance Programming

This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

Python High Performance Programming

Performance Analysis

Time Analysis

  1. Use decorators to automatically measure time:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    from functools import wraps
    import time

    def timefn(fn):
    @wraps(fn)
    def measure_time(*args, **kwargs):
    t1=time.time()
    result=fn(*args, **kwargs)
    t2=time.time()
    print("@timefn:"+fn.__name__+" took "+str(t2-t1)+" seconds")
    return result
    return measure_time

    In IPython, you can use the %timeit magic function to time function runs:

    1
    %timeit <function>
  2. Use the UNIX time command to time:

    1
    /usr/bin/time -p (--verbose) python <script-name>
  3. Use the cProfile module:

    cProfile is a built-in analysis tool in the standard library. It hooks into CPython's virtual machine to measure the time it takes for each function to run. Add the @profile decorator before the function to be analyzed, and run the following command for performance analysis:

    1
    python -m cProfile -s cumulative <script-name> <options>

    You can also save the analysis results to a statistics file and use other tools for analysis:

    1
    python -m cProfile -o <outputfilename> <script-name> <options>
  4. Use line_profiler for line-by-line analysis:

    Install: pip install line_profiler

    Add the @profile decorator before the function to be analyzed, and run the following command for performance analysis:

    1
    kernprof -l -v <script-name>

Space Analysis

  1. Use memory_profiler to analyze memory usage:

    Install: pip install memory_profiler

    Add the @profile decorator before the function to be analyzed, and run the following command for performance analysis:

    1
    python -m memory_profiler <script-name>

    Use mprof to plot memory usage. This can be done in two steps:

    Run the program using mprof, which will generate an mprof*.dat file in the directory:

    1
    mprof run <script-name>

    Plot the graph:

    1
    mprof plot

Lists and Tuples

Data structures such as lists and tuples are called arrays, which are flat lists of data in some intrinsic order.

The main differences between lists and tuples are as follows:

  • Lists are dynamic arrays that are mutable and can be resized. Tuples are static arrays that are immutable and cannot be changed once created.
  • Tuples are cached in the Python runtime environment, which means that there is no need to access the kernel to allocate memory every time a tuple is used.
  • Lists require more memory overhead than tuples.

Lists

The query operation for lists is O(1).

The default search operation for lists (list.index()) is a linear operation, with a worst-case time complexity of O(n). Sorting the list can reduce the search time to O(log n) (using binary search).

Python lists have a built-in sorting algorithm that uses Tim sort. Tim sort sorts with a complexity of O(n) in the best case (O(n log n) in the worst case). It mixes insertion sort and merge sort to achieve this performance (guessing which algorithm is better for a given data set using a probing method).

The bisect module can keep the order of the list unchanged (default from small to large) when inserting into the list. Some documentation for bisect is as follows[2]:

  • bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)

    Find the leftmost point of insertion for x in list a

  • bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)

    Find the rightmost point of insertion for x in list a

  • bisect.bisect(a, x, lo=0, hi=len(a), *, key=None)

    Same as bisect.bisect_left

  • bisect.insort_left(a, x, lo=0, hi=len(a), *, key=None)

    Insert x into list a in sorted order, with the insertion point determined by bisect.bisect_left

  • bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)

    Insert x into list a in sorted order, with the insertion point determined by bisect.bisect_right

  • bisect.insort(a, x, lo=0, hi=len(a), ***, key=None)

    Same as bisect.insort_right

Performance Notes

When writing time-sensitive code using bisect() and insort(), keep the following concepts in mind.

  • Binary search is efficient for searching a certain range of values. For locating a specific value, dictionary performance is better.
  • The time complexity of the insort() function is O(n) because the logarithmic search steps are dominated by the linear insertion steps.
  • These search functions are stateless and discard the results of the key function after they are used. Therefore, if the search function is used in a loop, the key function may be called repeatedly on the same data element. If the key function is not fast, consider wrapping it with functools.cache() to avoid redundant calculations. Alternatively, consider searching a precomputed key array to locate the insertion point (as demonstrated in the example section below).

When allocating memory for a list, a certain amount of reserved space is added. When a list of size N needs to add new elements, Python creates a new list that is large enough to hold the original N elements plus the additional elements. In actual allocation, the size allocated is not N+1, but N+M, where M is calculated as follows:

M = (N >> 3) + (N < 9 ? 3 : 6) + 1

N 0 1-4 5-8 9-16 17-25 26-35 ... 991-1120
M 0 1 2 3 4 5 ... 140