OPT-6 Shortcut STORE_SUBSCR for known types#
Background#
CPython compiles a single opcode for expressions that assign an object to a mapping type (e.g. dict) or a sequence type (e.g. list).
l = [0, 1, 2, 3]
l[2] = 3
print(l)
This function compiles to the following opcodes:
2 0 BUILD_LIST 0
2 LOAD_CONST 1 ((0, 1, 2, 3))
4 LIST_EXTEND 1
6 STORE_FAST 0 (l)
3 8 LOAD_CONST 2 (3)
10 LOAD_FAST 0 (l)
12 LOAD_CONST 3 (2)
14 STORE_SUBSCR
4 16 LOAD_GLOBAL 0 (print)
18 LOAD_FAST 0 (l)
20 CALL_FUNCTION 1
22 POP_TOP
24 LOAD_CONST 0 (None)
26 RETURN_VALUE
This opcode takes 3 values from the stack: Index(2
), Container (l
), Value(3
).
The two most commonly used types for STORE_SUBSCR
are dict
and list
. The PyObject_SetItem
function is optimized toward dict first, then list. It checks these two ‘base’ types:
For a mapping type:
Determine it is a mapping type
Check the type for a type slot for item assignment
Call the type slots (e.g.
PyDict_SetItem
)
For sequence type, the index needs to first be converted into a native integer. Even frame constants are stored as PyLongObject
, even if the value is small (e.g. 10).
Determine it is a sequence type
Convert the index value from a
PyLongObject
to anint
Check the range of the index with the maximum range of the list
Check the type slot for item assignment
Call the type slot (e.g.
PyList_SetItem
)
The third scenario would be the implementation of a custom class, and the mapping or sequence protocol.
Because the type of the container can be determined at runtime, the PyObject_SetItem
function is slow, but generic.
Solution#
This optimization will determine absolutely typed variables using the preprocess stage (e.g. frame constants, outputs from builtin functions like list()
, or return types from opcodes like BUILD_MAP
).
At compile time, it will optimize some common paths:
If the container type is a dict, emit a call to
PyDict_SetItem
inplace ofPyObject_SetItem
- If the container type is a list,
If the index is a frame constant
PyLongObject
within 4 digits, convert it to a native integer at runtime and emit a call toPyList_SetItem
If the index is a dynamic value, convert at runtime and call
PyList_SetItem
without the overhead ofPyObject_SetItem
If the container type cannot be determined (because it was set at runtime), but the index is an frame constant integer, call a custom
PyObject_SetItem
, where the index value is already converted, shortcutting the conversion step
Gains#
List item assignments are faster (unless the list is an argument to the function, in which case the type is unknown)
Sequence item assignments are faster (as above)
Dictionary item assignments are faster (as above)
Item assignments to sequence types where the index is const value between -5 and 256 are much faster as they bypass multiple tests and conversions
Edge-cases#
None. Even if there is a fault in determining the type of the container, it will be checked at runtime using PyDict_Check
, etc., and then redirect back to PyObject_SetItem
if it fails.
Configuration#
This optimization is enabled at level 1 by default. See Optimizations for help on changing runtime optimization settings.