Built-in Optimizations#
- OPT-1 Inline DECREF operations when the reference count is >1
- OPT-2 Optimize “is” operations to pointer comparisons
- OPT-3 Optimize == and != comparisons for short-integers to pointer comparisons
- OPT-5 Inline frame push/pop instructions
- OPT-6 Shortcut STORE_SUBSCR for known types
- OPT-7 Shortcut BINARY_SUBSCR for known types and indexes
- OPT-9 Inline list iterators into assembly instructions
- OPT-10 Precompute the hashes for LOAD_NAME and LOAD_GLOBAL dictionary lookups
- OPT-12 Pre-load methods for builtin types and bypass LOAD_METHOD
- OPT-13 Pre-load functions for binary operations to known types
- OPT-14 Optimize function calls which use the CALL_FUNCTION opcode
- OPT-15 Optimize the LOAD_ATTR opcode
- OPT-16 Optimize arithmetic by unboxing float, int and bool values
- OPT-17 Inline “is None” and “is not None” statements
Overview#
Pyjion is a JIT compiler. It compiles native CPython bytecode into machine code. Without Pyjion, CPython uses a master evaluation loop (called the frame evaluation loop) to iterate over opcodes.
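To see the opcodes that the evaluation loop (or a JIT) has to work through, the standard-library dis module can list them for any function. This is only an illustration using CPython's own tooling; the half function is an example, and the exact opcode names vary between CPython versions.

```python
import dis

def half(x):
    return x / 2

# Each Instruction corresponds to one bytecode operation in the frame.
opnames = [ins.opname for ins in dis.get_instructions(half)]
for name in opnames:
    print(name)
```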
The Pyjion compiler has 3 main stages:
- Build a “stack table” of the abstract types at each opcode position
- Compile CPython opcodes into CIL opcodes
- Emit the CIL opcodes into the .NET EE compiler to convert to native machine code/assembly
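The first stage can be pictured with a toy abstract interpreter. The sketch below is an assumption-laden illustration, not Pyjion's implementation: the instruction set, the result_type rule and the build_stack_table helper are all hypothetical, and real CPython bytecode has many more cases.

```python
# Hypothetical mini instruction set, not real CPython bytecode.
PROGRAM = [
    ("LOAD_CONST", 1.5),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

def result_type(left, right):
    """Abstract result type of adding two abstract types."""
    numeric = {"int", "float"}
    if left in numeric and right in numeric:
        return "float" if "float" in (left, right) else "int"
    return "object"  # unknown: a compiler would fall back to the generic path

def build_stack_table(program):
    """Return the abstract type stack as it looks before each instruction."""
    table = []
    stack = []
    for opname, arg in program:
        table.append(tuple(stack))
        if opname == "LOAD_CONST":
            stack.append(type(arg).__name__)
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(result_type(left, right))
        elif opname == "RETURN_VALUE":
            stack.pop()
    return table

table = build_stack_table(PROGRAM)
for position, types in enumerate(table):
    print(position, PROGRAM[position][0], types)
```

Knowing that both operands of the add are numeric at that position is the kind of information a later stage can use to pick a specialized code path.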
Guiding principles#
Optimizations must follow these principles:
- They must support custom objects (remember that everything can be overridden in Python!)
- They must pass the CPython test suite
- They must be isolated and testable
- They must be atomic and simple
Compiler design#
The rough design of the compile stage is to emit isolated CIL opcodes for each bytecode in the compiled frame.
Because of the flexibility of Python’s type system, the opcodes will mimic CPython’s eval loop:
- Take/put references onto a value stack
- Call a C API method
- Take/put results onto a value stack
- Check for errors
- Go to the next opcode
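The steps above can be sketched as a tiny stack machine in pure Python. This is a conceptual model only: the API table stands in for C API calls, and the PUSH, ADD and TRUEDIV opcode names and the run helper are hypothetical.

```python
import operator

# Stand-ins for C API calls (hypothetical table, for illustration only).
API = {"ADD": operator.add, "TRUEDIV": operator.truediv}

def run(program):
    """Evaluate (opname, arg) pairs on a value stack, one opcode at a time."""
    stack = []
    for opname, arg in program:
        if opname == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()                    # take references off the value stack
        left = stack.pop()
        try:
            result = API[opname](left, right)  # call the "C API" method
        except Exception as exc:               # check for errors
            return None, exc
        stack.append(result)                   # put the result onto the value stack
        # the loop then moves on to the next opcode
    return stack.pop(), None

value, error = run([("PUSH", 2), ("PUSH", 2), ("TRUEDIV", None)])
print(value, error)
```

A failing operation, such as dividing by zero, returns the exception instead of a value, which mirrors the error check after each call.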
There is some added complexity, because .NET CIL requires that the value stack is a LIFO stack, whereas CPython implements a value “stack” but actually uses it like an array.
The calling of the C API method is wrapped by the .NET CEE_CALL opcode, which calls a wrapper method implemented in the intrins.cpp file.
Without further optimizations, Pyjion would perform roughly the same as CPython (as fast or slow as the C API methods it is calling).
The benefits of Pyjion come from changing the emitted machine code based on deterministic observations about the inputs, constants or environment.
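One example of such an observation is the premise behind OPT-3: CPython keeps a single shared object for each small integer, so equality on those values can be decided by comparing pointers. The snippet below only demonstrates that premise from Python. It relies on a CPython implementation detail (the small-integer cache, which covers -5 to 256), the same_object helper is hypothetical, and none of this is Pyjion's own code.

```python
def same_object(a, b):
    """Pointer comparison: in CPython, `is` compares object addresses."""
    return a is b

# Calling int() at run time avoids compile-time constant folding.
small_a, small_b = int("7"), int("7")
big_a, big_b = int("70000000000000000000"), int("70000000000000000000")

print(same_object(small_a, small_b))  # small ints share one cached object
print(big_a == big_b)                 # equal values ...
print(same_object(big_a, big_b))      # ... but not guaranteed to be one object
```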
Machine code optimizations#
CIL to machine code/assembly optimizations are done by the .NET/EE compiler. They are configured by CorJitInfo::getJitFlags().
By default, Pyjion will flag the EE compiler to use the CORJIT_FLAG_SPEED_OPT profile. If you want to compile “debuggable” JIT code, use the EE_DEBUG_CODE option in CMake.
Boxing and unboxing of variables#
Work was done in early versions of Pyjion to box/unbox Python integers and floats into native long/float types. This allows for native CIL opcodes like ADD, SUB, IMUL, etc. to be used on native integers instead of having to use the complex (and slow) PyLongObject/PyFloatObject operations.
I found this initial prototype to be unstable, so the boxing and unboxing has been completely rewritten.
Pyjion’s escape analysis is achieved by using an instruction graph to traverse the supported unboxed types and opcodes and then tag transition stack variables as unboxed. This functionality complements PGC (profile-guided compilation).
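To give a feel for what "boxed" means, the following snippet compares the size of a Python float object with the size of the raw C double it wraps. It uses only the standard library; the exact object size is a CPython detail and may differ between builds.

```python
import struct
import sys

boxed_size = sys.getsizeof(1.5)      # a full float object, header included
unboxed_size = struct.calcsize("d")  # a raw C double

print(f"boxed float object: {boxed_size} bytes")
print(f"raw C double:       {unboxed_size} bytes")
```

The extra bytes hold the reference count and the type pointer, which is the overhead that unboxed arithmetic avoids touching.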
Tracing/profiling#
Neither tracing nor profiling callbacks will be emitted in the compiled code by default. This is an advantage over CPython, which would otherwise check the state of the tracing/profiling flag for every opcode.
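For reference, this is what a tracing callback looks like in plain CPython, using sys.settrace from the standard library. It is a minimal sketch of the mechanism that compiled code leaves out by default; the half function and the tracer are examples only.

```python
import sys

def half(x):
    return x / 2

events = []

def tracer(frame, event, arg):
    # Record every event CPython reports for frames running half().
    if frame.f_code is half.__code__:
        events.append(event)
    return tracer

previous = sys.gettrace()
sys.settrace(tracer)
try:
    half(2)
finally:
    sys.settrace(previous)  # restore whatever tracer was active before

print(events)
```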
References#
Check out the ECMA CIL reference for a list of what is possible in the CIL.
See my book, CPython Internals (ISBN 9781775093350), for a comprehensive guide to the CPython compiler and design.
Check out Discussions for any discussion about potential optimizations.
Optimization Matrix#
Optimization | Level 0 | Level 1 | Level 2 |
---|---|---|---|
OPT-1 Inline DECREF operations when the reference count is >1 | Off | On | On |
OPT-2 Optimize “is” operations to pointer comparisons | Off | On | On |
OPT-3 Optimize == and != comparisons for short-integers to pointer comparisons | Off | On | On |
OPT-5 Inline frame push/pop instructions | Off | On | On |
OPT-6 Shortcut STORE_SUBSCR for known types | Off | On | On |
OPT-7 Shortcut BINARY_SUBSCR for known types and indexes | Off | On | On |
OPT-9 Inline list iterators into assembly instructions | Off | On | On |
OPT-10 Precompute the hashes for LOAD_NAME and LOAD_GLOBAL dictionary lookups | Off | On | On |
OPT-12 Pre-load methods for builtin types and bypass LOAD_METHOD | Off | On | On |
OPT-13 Pre-load functions for binary operations to known types | Off | On | On |
OPT-14 Optimize function calls which use the CALL_FUNCTION opcode | Off | On | On |
OPT-15 Optimize the LOAD_ATTR opcode | Off | On | On |
OPT-16 Optimize arithmetic by unboxing float, int and bool values | Off | On | On |
OPT-17 Inline “is None” and “is not None” statements | Off | On | On |
Configuring Optimizations#
Python API#
The optimization level can be set using the set_optimization_level(level: int) method:
import pyjion
pyjion.set_optimization_level(0)
The default level is 1. Setting to level 2 will enable all optimizations.
Level 0 disables all optimizations.
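In scripts that should also run where Pyjion is not installed, the call can be wrapped in a small guard. The apply_optimization_level helper below is hypothetical; it assumes only pyjion.set_optimization_level and the three levels listed in the matrix.

```python
def apply_optimization_level(level):
    """Set the Pyjion optimization level; return False if Pyjion is missing."""
    if level not in (0, 1, 2):
        raise ValueError("optimization level must be 0, 1 or 2")
    try:
        import pyjion
    except ImportError:
        return False  # Pyjion is not available; nothing was changed
    pyjion.set_optimization_level(level)
    return True
```

Validating the level before importing keeps the error behaviour the same whether or not Pyjion is present.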
Runtime information#
You can see which optimizations were applied by accessing the optimizations property of the JitInfo class returned from pyjion.info(func):
>>> import pyjion
>>> pyjion.enable()
>>> def half(x):
...     return x/2
...
>>> half(2)
1.0
>>> pyjion.info(half).optimizations
<OptimizationFlags.InlineFramePushPop|InlineDecref: 10>
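The value shown in the output is a combination of bit flags. The sketch below shows how such a value can be tested for individual flags using enum.IntFlag from the standard library. DemoFlags is a hypothetical stand-in, and its numeric values are assumptions chosen so that the two names combine to 10; check Pyjion's own OptimizationFlags for the real definitions.

```python
from enum import IntFlag

class DemoFlags(IntFlag):
    # Hypothetical values, chosen so that the two names combine to 10.
    InlineDecref = 2
    InlineFramePushPop = 8

applied = DemoFlags(10)

# Membership tests check whether a single flag is set in the combined value.
print(DemoFlags.InlineDecref in applied)
print(DemoFlags.InlineFramePushPop in applied)
print(int(applied))
```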