OPT-3 Optimize == and != comparisons for short-integers to pointer comparisons#
Background#
Because of the frequency of use within an application, CPython generates the int objects for the integers -5 to 256 at startup and caches them in a property of the interpreter
states (small_ints
).
When creating an integer from either a const value in a function, as a property of a class, or via the C-API, the integer will be checked if it is in the range within small_ints
using a macro IS_SMALL_INT
.
CPython uses this to improve performance when creating new integers from a C long (or long long
) and shortcuts the value to the short_ints reference.
There are also performance gains within CPython to this optimization when comparing numbers. For example,
def is_three(x):
return x == 3
This function will generate the bytecode sequence LOAD_FAST
, LOAD_CONST
, COMPARE_OP
, RETURN
.
For COMPARE_OP
, the CPython evaluation loop will call PyObject_RichCompare()
, which calls the object’s tp_richcompare
type slot method. For a long object, this is:
static PyObject *
long_richcompare(PyObject *self, PyObject *other, int op)
{
Py_ssize_t result;
CHECK_BINOP(self, other);
if (self == other)
result = 0;
else
result = long_compare((PyLongObject*)self, (PyLongObject*)other);
Py_RETURN_RICHCOMPARE(result, 0, op);
}
CPython is already reasonably efficient at detecting equal numbers as it will check if (self == other)
pointer equivalence. However, this is still relatively slow:
It consumes 3 call stack levels to get to this short comparison
It is a common occurrence to compare short integers
It still requires a comparison of the operator at runtime (
Py_EQ
/Py_NE
)
Solution#
The Pyjion compiler has 3 main stages:
Build a “stack table” of the abstract types at each opcode position
Compile CPython opcodes into CIL opcodes
Emit the CIL opcodes into the .NET EE compiler to convert to native machine code/assembly
This optimization is split into two components.
At the preprocess stage, detect const values which are small integers and mark them as
InternInteger
(a subclass ofInteger
)At the compile stage, detect the comparison of an
InternInteger
with anInteger
(AVK_Integer
)
If this comparison is detected, the compiler will emit the equivalent operation of x is y
. This requires that:
The left or right operand is definitely an interned integer (likely a const value)
The other operand is a absolutely an integer
Gains#
Comparison of small integers would be significantly faster.
Adds no overhead/branching to scenarios that do not match this optimization
Potential Improvements#
To build on this assertion, if the type of the opposite operand to the interned integer could be quickly determined to by a PyLongObject
, the same optimization would work.
This would allow for runtime values and hit a far broader set of scenarios.
Edge-cases#
If the user compiled a custom CPython runtime with a smaller range of NSMALLINTS
, this would fire results when comparing integers.
To circumvent this, Pyjion could inspect the interpreter state and cache the size of small_ints
. However this scenario is very, very unlikely (but anything is possible).
Configuration#
This optimization is enabled at level 1 by default. See Optimizations for help on changing runtime optimization settings.
Examples#
The following test cases document what should and should-not be optimized.
def test_const_compare(self):
def test_f():
a = 1
b = 2
return a == b
self.assertOptimized(test_f)
def test_const_compare_big_left(self):
def test_f():
a = 1000
b = 2
return a == b
self.assertOptimized(test_f)
def test_const_compare_big_right(self):
def test_f():
a = 1
b = 2000
return a == b
self.assertOptimized(test_f)
def test_const_compare_big_both(self):
def test_f():
a = 1000
b = 2000
return a == b
self.assertNotOptimized(test_f)
def test_const_not_integer(self):
def test_f():
a = 2
b = "2"
return a == b
self.assertNotOptimized(test_f)
def test_float_compare(self):
def test_f():
a = 2
b = 1.0
return a == b
self.assertNotOptimized(test_f)