Built-in Optimizations#

Overview#

Pyjion is a JIT compiler. It compiles native CPython bytecode into machine code. Without Pyjion, CPython uses a master evaluation loop (called the frame evaluation loop) to iterate over opcodes and execute them one at a time.
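To see the opcodes that the frame evaluation loop iterates over, the standard-library dis module can list them for any function. This is a minimal sketch using plain CPython only; half is an illustrative function, and the exact opcode names vary between CPython versions.

```python
import dis


def half(x):
    return x / 2


# Each instruction listed here is one step of CPython's evaluation loop.
opnames = [ins.opname for ins in dis.get_instructions(half)]
for name in opnames:
    print(name)
```

The printed list is the input a bytecode-level JIT has to work with: a short, flat sequence of stack-machine instructions.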

The Pyjion compiler has 3 main stages:

Build a “stack table” of the abstract types at each opcode position
Compile CPython opcodes into CIL opcodes
Emit the CIL opcodes into the .NET EE compiler to convert to native machine code/assembly
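The first stage can be pictured with a toy abstract interpreter. The sketch below is an assumption-laden illustration, not Pyjion's implementation: the instruction set, PROGRAM, result_type and build_stack_table are all hypothetical names, and only one binary operation is modelled.

```python
# Toy instruction set (hypothetical, not real CPython bytecode):
# each entry is (opname, argument).
PROGRAM = [
    ("LOAD_CONST", 1),
    ("LOAD_CONST", 2.5),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]


def result_type(left, right):
    """Abstract result type of adding two abstract types."""
    if left is int and right is int:
        return int
    if {left, right} <= {int, float}:
        return float
    return object  # unknown: could be any custom object


def build_stack_table(program):
    """For each position, record the abstract types on the stack before it runs."""
    table = []
    stack = []
    for opname, arg in program:
        table.append(tuple(stack))
        if opname == "LOAD_CONST":
            stack.append(type(arg))
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(result_type(left, right))
        elif opname == "RETURN_VALUE":
            stack.pop()
        else:
            raise ValueError(f"unsupported toy opcode: {opname}")
    return table


stack_table = build_stack_table(PROGRAM)
for position, types in enumerate(stack_table):
    print(position, PROGRAM[position][0], [t.__name__ for t in types])
```

Knowing that position 2 always sees an int and a float on the stack is the kind of observation that lets a later stage pick a specialised code path.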

Guiding principles#

Optimizations must follow these principles:

They must support custom objects (remember that everything can be overridden in Python!)
They must pass the CPython test suite
They must be isolated and testable
They must be atomic and simple
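The first principle is easiest to see with a small example. In this sketch Money and total are hypothetical names; the point is only that the same bytecode serves built-in and user-defined types, so any shortcut that assumes a built-in type needs a fallback.

```python
class Money:
    """A custom type that overrides + and ==."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents


def total(a, b):
    # The same bytecode runs for ints, floats and Money instances,
    # so a shortcut that assumes ints must fall back for other types.
    return a + b


int_total = total(1, 2)
money_total = total(Money(150), Money(250))
print(int_total, money_total.cents)
```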

Compiler design#

The rough design of the compile stage is to emit isolated CIL opcodes for each bytecode in the compiled frame.

Because of the flexibility of Python’s type system, the opcodes will mimic CPython’s eval loop:

Take/put references onto a value stack
Call a C API method
Take/put results onto a value stack
Check for errors
Go to next
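The steps above can be sketched as a tiny interpreter loop. This is a toy model under stated assumptions: the opcodes are hypothetical, the functions from the operator module stand in for the C API methods, and errors are returned rather than raised to make the "check for errors" step visible.

```python
import operator

# Hypothetical toy opcodes; operator functions stand in for C API methods.
BINARY_FUNCS = {
    "BINARY_ADD": operator.add,
    "BINARY_TRUE_DIVIDE": operator.truediv,
}


def run(program):
    """Execute a toy program and return (result, error)."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_CONST":
            stack.append(arg)  # put a reference onto the value stack
        elif opname in BINARY_FUNCS:
            right = stack.pop()  # take references from the value stack
            left = stack.pop()
            try:
                result = BINARY_FUNCS[opname](left, right)  # call the "API"
            except Exception as exc:
                return None, exc  # check for errors
            stack.append(result)  # put the result onto the value stack
        elif opname == "RETURN_VALUE":
            return stack.pop(), None
        else:
            raise ValueError(f"unsupported toy opcode: {opname}")
        # falling out of the branch is "go to next"
    raise ValueError("program ended without RETURN_VALUE")


ok = run([("LOAD_CONST", 4), ("LOAD_CONST", 2),
          ("BINARY_TRUE_DIVIDE", None), ("RETURN_VALUE", None)])
failed = run([("LOAD_CONST", 4), ("LOAD_CONST", 0),
              ("BINARY_TRUE_DIVIDE", None), ("RETURN_VALUE", None)])
print(ok, failed)
```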

There is some added complexity, because .NET CIL requires that the value stack is a LIFO stack, whereas CPython implements a value “stack” but actually uses it like an array.

The calling of the C API method is wrapped by the .NET CEE_CALL opcode, which calls a wrapper method implemented in the intrins.cpp file.

Without further optimizations, Pyjion would perform roughly the same as CPython: it would be exactly as fast or as slow as the C API methods it is calling.

The benefits of Pyjion come from changing the emitted machine code based on deterministic observations about the inputs, constants or environment.

Machine code optimizations#

CIL to machine code/assembly optimizations are done by the .NET/EE compiler. They are configured by CorJitInfo::getJitFlags().

By default, Pyjion will flag the EE compiler to use the CORJIT_FLAG_SPEED_OPT profile. If you want to compile “debuggable” JIT code, use the EE_DEBUG_CODE option in CMake.

Boxing and unboxing of variables#

Early versions of Pyjion included work to box/unbox Python integers and floats into native long/float types. This allows native CIL opcodes like ADD, SUB, MUL, etc. to be used on native values instead of the complex (and slow) PyLongObject/PyFloatObject operations. I found that initial prototype to be unstable, so the boxing and unboxing has been completely rewritten.

Pyjion’s escape analysis is achieved by traversing an instruction graph of the supported unboxed types and opcodes, then tagging the transition stack variables as unboxed. This functionality complements PGC (profile-guided compilation).
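The idea behind unboxing can be sketched in plain Python. This illustrates the concept only and is not Pyjion's implementation: the 64-bit range check, the fallback path and the names fits_int64 and add are assumptions made for the example, reflecting that Python integers are arbitrary precision while native integers are not.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(value):
    # Exact type check: subclasses of int may override arithmetic.
    return type(value) is int and INT64_MIN <= value <= INT64_MAX


def add(a, b):
    """Return (result, path), where path names the route taken."""
    if fits_int64(a) and fits_int64(b):
        result = a + b  # stands in for a native ADD on unboxed values
        if INT64_MIN <= result <= INT64_MAX:
            return result, "unboxed"
    # Anything else goes through the general object protocol.
    return a + b, "boxed"


print(add(1, 2))
print(add(INT64_MAX, 1))
print(add(1.5, 2))
```

The guard is what keeps such a fast path compatible with the guiding principles: values that do not fit the native representation still get the normal Python semantics.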

Tracing/profiling#

Neither tracing nor profiling callbacks are emitted in the compiled code by default. This is an advantage over CPython, which would otherwise check the state of the tracing/profiling flag for every opcode.
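For context, this is what tracing callbacks look like under plain CPython, using the standard sys.settrace hook. The sketch does not use Pyjion; tracer, events and half are illustrative names.

```python
import sys

events = []


def tracer(frame, event, arg):
    # Record events only for the function we care about.
    if frame.f_code.co_name == "half":
        events.append(event)
    return tracer


def half(x):
    return x / 2


previous = sys.gettrace()
sys.settrace(tracer)
try:
    half(2)
finally:
    sys.settrace(previous)  # always restore the earlier trace function

print(events)
```

Every recorded event is a callback the interpreter had to dispatch, which is the overhead that compiled code avoids when these callbacks are not emitted.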

References#

Check out the ECMA CIL reference for a list of what is possible in CIL.
See my book, CPython Internals (ISBN 9781775093350), for a comprehensive guide to the CPython compiler and its design.
Check out Discussions for any discussion about potential optimizations.

Optimization Matrix#

Default Optimization Levels#
Optimization	Level 0	Level 1	Level 2
OPT-1 Inline DECREF operations when the reference count is >1	Off	On	On
OPT-2 Optimize “is” operations to pointer comparisons	Off	On	On
OPT-3 Optimize == and != comparisons for short-integers to pointer comparisons	Off	On	On
OPT-5 Inline frame push/pop instructions	Off	On	On
OPT-6 Shortcut STORE_SUBSCR for known types	Off	On	On
OPT-7 Shortcut BINARY_SUBSCR for known types and indexes	Off	On	On
OPT-9 Inline list iterators into assembly instructions	Off	On	On
OPT-10 Precompute the hashes for LOAD_NAME and LOAD_GLOBAL dictionary lookups	Off	On	On
OPT-12 Pre-load methods for builtin types and bypass LOAD_METHOD	Off	On	On
OPT-13 Pre-load functions for binary operations to known types	Off	On	On
OPT-14 Optimize function calls which use the CALL_FUNCTION opcode	Off	On	On
OPT-15 Optimize the LOAD_ATTR opcode	Off	On	On
OPT-16 Optimize arithmetic by unboxing float, int and bool values	Off	On	On
OPT-17 Inline “is None” and “is not None” statements	Off	On	On
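As an illustration of why a rewrite like OPT-2 is compatible with the guiding principles, the sketch below contrasts == (which user code can override) with is (which compares object identity only). AlwaysEqual is a hypothetical class; in CPython, id() is the object's memory address.

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True  # user code runs on every == comparison


a = AlwaysEqual()
b = AlwaysEqual()

equal = a == b             # dispatches to AlwaysEqual.__eq__
same_object = a is b       # identity only; user code cannot intercept this
same_ids = id(a) == id(b)  # identity expressed as a comparison of ids

print(equal, same_object, same_ids)
```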

Configuring Optimizations#

Python API#

The optimization level can be set using the set_optimization_level(level: int) method:

import pyjion
pyjion.set_optimization_level(0)

The default level is 1. Setting the level to 2 enables all optimizations.

Level 0 disables all optimizations.

Runtime information#

You can see which optimizations were applied by accessing the optimizations property of the JitInfo class returned from pyjion.info(func):

>>> import pyjion
>>> pyjion.enable()
>>> def half(x):
...    return x/2
>>> half(2)
1.0
>>> pyjion.info(half).optimizations
<OptimizationFlags.InlineFramePushPop|InlineDecref: 10>