Efficiency

Python’s combination of the names/objects concept and (im)mutability tends to waste memory and CPU time:

  • There might be many different objects all holding the same value. In principle, every time the number 1 occurs in the source code, a new integer object is created.

  • Tying names to other objects may leave objects without a name. Such objects are no longer accessible but persist in memory.

  • Modifying immutable objects requires creating new objects. Thus, even simple integer computations require relatively complex memory management operations.
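
The last point can be made visible with a minimal sketch. The variable names are chosen for illustration only. A second name keeps the original object reachable, so it is possible to check whether the arithmetic operation modified the object or created a new one. A list, which is mutable, is shown for contrast.

```python
# Immutable: incrementing an integer creates a new object.
n = 1000
old_n = n                 # second name for the same integer object
n += 1                    # new integer object, n is rebound to it
int_same = n is old_n

# Mutable: extending a list modifies the existing object in place.
items = [1, 2]
old_items = items         # second name for the same list object
items += [3]              # the list object itself is changed
list_same = items is old_items

print(int_same)           # False
print(list_same)          # True
```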

To mitigate these drawbacks, the Python interpreter uses several optimization strategies. Although such issues are rather technical, we briefly discuss them here because they sometimes yield unexpected results.

Preloaded Integers

To avoid creating an object every time a new integer is used, the Python interpreter (more precisely CPython, the reference implementation) pre-creates integer objects for all integers from -5 to 256. This saves CPU time. The somewhat odd-looking range stems from statistical considerations about integer usage.

In addition, the interpreter takes care that no integer in this range is created twice during program execution. This saves memory. The behavior is demonstrated in the following code snippet:

a = 8
b = 4 + 4
print(id(a))
print(id(b))
140115940688336
140115940688336

Both object IDs are identical, so only one integer object is used. Since integer objects are immutable, this cannot cause any trouble.
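
The limits of the preloaded range can be probed with a small sketch. It assumes CPython; other interpreters or future versions may behave differently. The integers are built from strings at run time via int(), so that the interpreter cannot merge equal values before the program runs. The value 100000 is an arbitrary choice far outside the range.

```python
# Build the integers at run time, so no compile-time optimization applies.
small_a = int("256")
small_b = int("256")
large_a = int("100000")
large_b = int("100000")

same_small = small_a is small_b   # value inside the preloaded range
same_large = large_a is large_b   # value far outside the preloaded range

print(same_small)
print(same_large)
```

On CPython the first comparison is expected to yield True (one shared, preloaded object) and the second False (two separate objects with equal value).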

String Interning

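As with integers, the Python interpreter tries to avoid multiple string objects with the same value. Since the required comparisons may cost too much CPU time, this technique, known as string interning, is only applied to some strings, typically short ones that look like identifiers. The rules controlling which strings get interned are relatively complex and depend on the interpreter version, so the output below may differ on other systems.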

a = 'short'
b = 'sh' + 'ort'
print(id(a))
print(id(b))
140115859848048
140115859848048
a = 'very very long'
b = 'very' + ' very long'
print(id(a))
print(id(b))
140115859848944
140115899208176
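
Interning can also be requested explicitly with sys.intern from the standard library. The following sketch builds two equal strings at run time, so that they start out as separate objects on CPython, and then interns both. The variable names are illustrative.

```python
import sys

# Two equal strings created at run time are separate objects on CPython.
a = "".join(["very", " very long"])
b = "".join(["very", " very long"])
same_before = a is b

# sys.intern returns the one shared object for a given string value.
a = sys.intern(a)
b = sys.intern(b)
same_after = a is b

print(same_before)
print(same_after)   # True
```

Explicit interning can pay off if the same long string value occurs many times, for example as a dictionary key, but it is rarely needed in everyday code.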

Repeated Literals in Source Code

Before executing a Python program, the interpreter checks the syntax and creates a list of all literals. Here, literals are all types of explicit data appearing in the source code, like integers or strings. If some literal appears multiple times and if objects of the corresponding data type are immutable, only one object is created.

# Copy the following Python code to a text file and feed the file to the
# Python interpreter to see the effect of optimization of repeated literals.

a = 'a long string, which usually is not interned'
b = 12345678
c = 'a long string, which usually is not interned'
d = 12345678

print(id(a))
print(id(b))
print(id(c))
print(id(d))

The names a and c will point to the same string object, although the string is too long to be interned by the string interning mechanism. The names b and d will point to the same integer object, although its value is outside the range of preloaded integers.

Care has to be taken when using interactive Python interpreters like Jupyter. If the above code snippet is executed line by line in an interactive interpreter, then four different objects will be created, because the interpreter does not parse the full code in advance.

Important

Executing Python code with an interactive interpreter may yield different results than executing the same code at once with a non-interactive interpreter! In particular, performance measures like memory consumption may differ.

a = 'a long string, which usually is not interned'
b = 12345678
c = 'a long string, which usually is not interned'
d = 12345678

print(id(a))
print(id(b))
print(id(c))
print(id(d))
140115894986160
140115859673552
140115894986352
140115859679312
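
The difference between both modes of execution can be imitated inside a single program with the built-in functions compile and exec. This is only a sketch assuming CPython; the file name "<demo>" and the variable names are placeholders. Compiling the source as one unit corresponds to running a file, compiling each line on its own roughly corresponds to line-by-line interactive execution.

```python
source = "b = 12345678\nd = 12345678\n"

# Compile and run the whole source as one unit (like running a file).
ns_unit = {}
exec(compile(source, "<demo>", "exec"), ns_unit)
same_in_unit = ns_unit["b"] is ns_unit["d"]

# Compile and run each line separately (like typing line by line).
ns_lines = {}
for line in source.splitlines():
    exec(compile(line, "<demo>", "exec"), ns_lines)
same_line_by_line = ns_lines["b"] is ns_lines["d"]

print(same_in_unit)
print(same_line_by_line)
```

On a standard CPython build the first result is expected to be True. The second is typically False, but this depends on the interpreter build and version.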

Garbage Collection

As described above, there might be objects without names. Such objects persist in memory but are no longer accessible. To avoid filling up memory as time passes, the Python interpreter automatically removes nameless objects from memory. This mechanism is known as garbage collection, a feature that is not available in all programming languages. In the C programming language, for instance, the programmer has to free memory explicitly once data isn’t needed anymore.
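
Garbage collection can be observed with the weakref module from the standard library. A weak reference allows checking whether an object still exists without counting as a name for it. The sketch below uses a hypothetical placeholder class Data, because plain integers do not support weak references. The call to gc.collect() is not required on CPython, which removes the object as soon as its last name is gone, but makes the sketch more robust on other interpreters.

```python
import gc
import weakref


class Data:
    """Hypothetical placeholder for some data held in memory."""


obj = Data()
watcher = weakref.ref(obj)          # observes the object without naming it
alive_before = watcher() is not None

obj = None                          # tie the only name to another object
gc.collect()                        # ask the interpreter to clean up now
alive_after = watcher() is not None

print(alive_before)   # True
print(alive_after)    # False
```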

Sometimes, especially when working with large data sets, one wants to get rid of some data in memory to have more memory available for other purposes. One way is to tie all names referring to the no longer needed object to other objects, which is somewhat unintuitive. Alternatively, the del keyword can be used to untie a name from an object.

a = 5000
del a
print(a)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [5], in <cell line: 3>()
      1 a = 5000
      2 del a
----> 3 print(a)

NameError: name 'a' is not defined

The last line leads to an error message because, after executing line 2, the name a is no longer valid. Note that del only deletes the name, not the object. In the following code snippet the object remains in memory because it has another name:

a = 5000
b = a
del a
print(b)
5000
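
That the object is removed only after its last name has been deleted can be checked with a weak reference, which observes an object without acting as a name for it. This is a minimal sketch; the class Data is a hypothetical stand-in for a large data set, and gc.collect() is only included for interpreters that do not clean up immediately.

```python
import gc
import weakref


class Data:
    """Hypothetical stand-in for a large data set."""


a = Data()
b = a                               # second name for the same object
watcher = weakref.ref(a)

del a                               # removes the name a only
gc.collect()
alive_after_first_del = watcher() is not None

del b                               # removes the last remaining name
gc.collect()
alive_after_second_del = watcher() is not None

print(alive_after_first_del)    # True
print(alive_after_second_del)   # False
```

To actually free the memory of a large object, every name referring to it has to be deleted or tied to another object.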