overview
Ever wondered how Python really works under the hood? This guide will walk
you through Python's compilation and interpretation processes — from
source code to execution. It's written for anyone with basic Python
knowledge, but I would love feedback from readers of all levels!
Note
Throughout this guide, I refer to CPython, the most commonly adopted
implementation, which you can download from the official distribution
page.
what happens when you run code?
Whether you are just starting on your programming journey or have years of
experience under your belt, the moment between writing a script and
hitting the run button is always a thrilling one. It's a split second
where inner monologues can range anywhere from "How exciting, it is
ah-live!" to "I am confident this time it will work" and "I swear I am
switching careers".
But what really happens in that split second after you hit the run button?
What goes on in the background that makes the code either come alive or
abruptly terminate? While the implementation differs from language to
language, the core concept is the same: human-readable code has to be
translated to machine-readable code. And this is where compilers and
interpreters become our friends.
human to machine
Compilers and interpreters both have the same goal: to convert
user-written code to machine-readable code. They just go about it in
different ways:
- compilers scan the whole module/file before running the code; errors
  such as invalid syntax are caught before any code is executed.
- interpreters read and immediately execute the code line by line, and
  terminate at the first error that is encountered.
Since an interpreter has to parse and run the code line by line every time
the program runs, compiled languages are generally considered faster at
executing code.
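To make the difference in error timing concrete, here is a small sketch that uses Python's built-in compile() and exec(). The two source strings and the log list are made up for illustration; the point is only to show when each kind of error surfaces.

```python
# Hypothetical two-line programs, used only to show *when* each error surfaces.
log = []

# 1) A syntax error is found while the whole source is being compiled,
#    so not even the first (valid) line gets a chance to run.
broken_syntax = "log.append('line 1 ran')\nvalue = (1 +\n"
try:
    compile(broken_syntax, "<broken_syntax>", "exec")
except SyntaxError as err:
    print("caught before execution:", type(err).__name__)

# 2) A runtime error only shows up once execution reaches the faulty line,
#    so the first line has already run by then.
broken_runtime = "log.append('line 1 ran')\nvalue = 1 / 0\n"
try:
    exec(compile(broken_runtime, "<broken_runtime>", "exec"), {"log": log})
except ZeroDivisionError as err:
    print("caught during execution:", type(err).__name__)

# Only the second program managed to run its first line.
print(log)
```

The first program never touches log, while the second one appends to it before failing on the division.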
does Python interpret or compile code?
Python actually does both! It is a hybrid, as it first compiles and then
interprets the code.
During compilation, pure Python scripts are translated to machine-readable
bytecode files. Bytecode is platform-independent, meaning that it can be
run on any machine with a compatible interpreter. After compilation, the
interpreter analyzes and executes the bytecode files.
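The two phases can be tried by hand with the built-in compile() and exec() functions. This is only a sketch of the idea: the source string and the "<mochi_demo>" filename are placeholders I made up, and running a real script involves more machinery than this.

```python
# A tiny stand-in for a script; the content is made up for this demo.
source = 'name = "Mochi"\ngreeting = name + " says woof"\n'

# Phase 1: compile the source text into a code object that holds bytecode.
code = compile(source, "<mochi_demo>", "exec")
print(type(code).__name__)          # the compiled unit is a "code" object
print(type(code.co_code).__name__)  # its raw bytecode is stored as bytes
print(code.co_names)                # names the bytecode refers to

# Phase 2: hand the code object to the interpreter, which executes it.
namespace = {}
exec(code, namespace)
print(namespace["greeting"])
```

Nothing in the source string runs during the compile() call; the assignments only take effect once exec() executes the code object.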
Let's see how it all plays out, step by step!
how is Python code run?
Let's imagine we are building a virtual representation of our pet dog,
Mochi. In the script mochi.py, we store methods and variables that
describe Mochi's features and characteristics. We can bring Mochi to
virtual life by running the script, that is, by typing python mochi.py in
the terminal. The python command kicks off the compilation process in the
background, and the mochi.py argument indicates which file it should start
from.
During compilation, our mochi.py script is translated into bytecode. If no
errors are found, the interpreter analyzes and executes the bytecode:
run: python mochi.py in the terminal
→ compile: Python code is turned into bytecode
→ interpret: the Virtual Machine executes the bytecode
compiling multiple files
In our current setup, all of our logic is stored in one big script,
mochi.py. What happens when we add more files to our application?
How does Python know which files to compile and in what order?
Let's refactor part of our mochi.py code into multiple files:
- dog.py: we isolate the logic that defines a generic dog class
- utils.py: here we can store the bark function, which randomizes the
  number of times Mochi barks
To ensure the refactored code is still available in mochi.py, we link the
new files via import statements. It is exactly these import statements
that help Python understand the relationship between scripts, and define
the order in which files are compiled and executed.
from dog import Dog
from utils import bark
name = "Mochi"
dog = Dog(name, "chocolate")
print(f"{dog.name}, {dog.color}, {name} barks: ", bark())
After refactoring, our mochi.py file imports the dog.py and utils.py
files, and the utils.py file imports the random package from the standard
library:
mochi.py
├── dog.py
└── utils.py
    └── random
Now we can safely run the program again.
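If you want to try the multi-file setup without creating the files by hand, the sketch below writes all three into a temporary folder and runs mochi.py from there. The contents of dog.py and utils.py are my own minimal stand-ins (they are assumptions, chosen only to match how mochi.py uses Dog and bark), and run_mochi is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents: dog.py and utils.py are minimal stand-ins
# that match how mochi.py uses them; mochi.py is the listing from above.
FILES = {
    "dog.py": (
        "class Dog:\n"
        "    def __init__(self, name, color):\n"
        "        self.name = name\n"
        "        self.color = color\n"
    ),
    "utils.py": (
        "import random\n"
        "\n"
        "def bark():\n"
        "    return random.randint(1, 10)\n"
    ),
    "mochi.py": (
        "from dog import Dog\n"
        "from utils import bark\n"
        'name = "Mochi"\n'
        'dog = Dog(name, "chocolate")\n'
        'print(f"{dog.name}, {dog.color}, {name} barks: ", bark())\n'
    ),
}


def run_mochi():
    """Write the three files to a temp folder and run `python mochi.py` there."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, content in FILES.items():
            Path(tmp, filename).write_text(content)
        result = subprocess.run(
            [sys.executable, "mochi.py"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


output = run_mochi()
print(output)
```

Because the script's own folder is the first place Python searches for modules, the two import statements in mochi.py find dog.py and utils.py without any extra configuration.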
ready, set, action!
Now that we understand how to run Python scripts and how files are linked
to each other, let's take a closer look at how Python's compilation and
interpretation processes work behind the scenes.
compilation in Python
As soon as a Python script is run, the compilation process automatically
starts running in the background. During this stage, each Python (.py)
file is translated into bytecode, which is machine-friendly code; for
imported modules, that bytecode is also saved to a .pyc file.
Compilation consists of a few steps:
· tokenization: the source code is split into tokens; each word is
  labeled according to its role
· parsing: tokens are used to build an Abstract Syntax Tree (AST); any
  syntax errors are raised during this step
· scope analysis: the AST is traversed and a scope (local, global,
  nonlocal) is assigned to each name
· optimization: lightweight optimizations are applied
· bytecode generation: bytecode is generated from the AST
· caching: bytecode for imported modules is written to a .pyc file in
  __pycache__ for reuse on future runs (the entry script may not always be
  cached).
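The optimization step is the easiest one to overlook, so here is a small sketch of one optimization I am fairly confident about: constant folding, where CPython evaluates an expression made only of literals at compile time. The variable name and the "<folding_demo>" filename are placeholders, and the exact instructions printed will vary between Python versions.

```python
import dis

# An expression built only from literals can be evaluated at compile time.
code = compile("seconds_per_day = 60 * 60 * 24", "<folding_demo>", "exec")

# The folded result should appear among the code object's constants...
print(code.co_consts)

# ...and the instruction list should contain no multiplication at all.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```

In other words, by the time the interpreter sees this line, the arithmetic has already been done once, during compilation.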
Phew! That's quite a few steps for a process that only takes a few
milliseconds.
Let's analyze each step in more detail!
tokenization
Tokenization is the first step in the code abstraction; here the
tokenizer labels each sequence of characters with an internally defined
token. We can view this in action by tokenizing our mochi.py file with the
tokenize module:
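import token
import tokenize

# tokenize.tokenize() expects a readline callable that returns bytes
with open("mochi.py", "rb") as f:
    for tok in tokenize.tokenize(f.readline):
        # left-align the token text, then show the name of its token type
        print(f"{tok.string:<10} -> {token.tok_name[tok.type]}")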
Here is part of the raw output of tokenizing the mochi.py file (this is
what you get when printing each tok object directly):
TokenInfo(type=63 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='from', start=(1, 0), end=(1, 4), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='dog', start=(1, 5), end=(1, 8), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='import', start=(1, 9), end=(1, 15), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='Dog', start=(1, 16), end=(1, 19), line='from dog import Dog\n')
...
A formatted output shows how each word from the Python file is mapped to a
specific token, depending on its role. For example, the first import
statement, from dog import Dog, becomes:
utf-8      -> ENCODING
from       -> NAME
dog        -> NAME
import     -> NAME
Dog        -> NAME
...
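The import line only produces NAME tokens, so here is a second small sketch that tokenizes an assignment from an in-memory string, which also shows the OP and STRING token types. It uses tokenize.generate_tokens(), the text-based sibling of tokenize.tokenize(); the pairs variable is just a name I picked for the demo.

```python
import io
import token
import tokenize

source = 'name = "Mochi"\n'

# generate_tokens() reads text rather than bytes, so no ENCODING token
# is emitted at the start of the stream.
pairs = [
    (tok.string, token.tok_name[tok.type])
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for string, kind in pairs:
    # repr() keeps the newline token visible instead of printing a blank line
    print(f"{string!r:<10} -> {kind}")
```

Besides the three visible pieces of the assignment, the stream ends with a NEWLINE token and an ENDMARKER token.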
parsing
During the parsing step, the stream of tokens built during tokenization is
organized in an Abstract Syntax Tree (AST). Here is the code that
pretty-prints a file's AST:
import ast

with open("mochi.py", "r") as f:
    source = f.read()

# build the tree, then pretty-print it (the indent argument needs Python 3.9+)
tree = ast.parse(source)
print(ast.dump(tree, indent=4))
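And here is what the formatted AST looks like. Note that this dump
corresponds to a slightly different version of mochi.py than the listing
shown earlier (it imports jumps instead of bark, assigns 'good doggo' to
name, and ends with a dog.bark() call), but the overall structure is the
same: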
Module(
    body=[
        ImportFrom(
            module='dog',
            names=[
                alias(name='Dog')],
            level=0),
        ImportFrom(
            module='utils',
            names=[
                alias(name='jumps')],
            level=0),
        Assign(
            targets=[
                Name(id='name', ctx=Store())],
            value=Constant(value='good doggo')),
        Assign(
            targets=[
                Name(id='dog', ctx=Store())],
            value=Call(
                func=Name(id='Dog', ctx=Load()),
                args=[
                    Name(id='name', ctx=Load()),
                    Constant(value='chocolate')],
                keywords=[])),
        Expr(
            value=Call(
                func=Name(id='print', ctx=Load()),
                args=[
                    JoinedStr(
                        values=[
                            FormattedValue(
                                value=Attribute(
                                    value=Name(id='dog', ctx=Load()),
                                    attr='name',
                                    ctx=Load()),
                                conversion=-1),
                            Constant(value=', '),
                            FormattedValue(
                                value=Attribute(
                                    value=Name(id='dog', ctx=Load()),
                                    attr='color',
                                    ctx=Load()),
                                conversion=-1),
                            Constant(value=', '),
                            FormattedValue(
                                value=Name(id='name', ctx=Load()),
                                conversion=-1),
                            Constant(value=' jumps ')]),
                    Call(
                        func=Name(id='jumps', ctx=Load()),
                        args=[],
                        keywords=[])],
                keywords=[])),
        Expr(
            value=Call(
                func=Attribute(
                    value=Name(id='dog', ctx=Load()),
                    attr='bark',
                    ctx=Load()),
                args=[],
                keywords=[]))],
    type_ignores=[])
This output might look intimidating, but it is simply what Python uses
internally to understand the code structure before generating bytecode.
Each node represents a language construct (statements, expressions,
functions, literals) and is identified by a specific type (Module, Assign,
Name, etc.).
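Reading a full dump gets tiring quickly, so a gentler way to explore a tree is to count its node types. The sketch below does that for the mochi.py listing (inlined as a string so it runs on its own), and then shows that a syntax error is raised by the parser itself, before anything executes. The node_counts name and the deliberately broken snippet are mine.

```python
import ast
from collections import Counter

# The mochi.py listing, inlined so the example is self-contained.
source = '''from dog import Dog
from utils import bark
name = "Mochi"
dog = Dog(name, "chocolate")
print(f"{dog.name}, {dog.color}, {name} barks: ", bark())
'''

# ast.walk() visits every node in the tree exactly once.
tree = ast.parse(source)
node_counts = Counter(type(node).__name__ for node in ast.walk(tree))
for kind in ("ImportFrom", "Assign", "Call"):
    print(kind, node_counts[kind])

# A syntax error is reported while parsing; nothing has been executed yet.
try:
    ast.parse("dog = Dog(name,")
except SyntaxError as err:
    print("parsing failed:", err.msg)
```

The three Call nodes are Dog(...), print(...) and bark(); parsing never needs dog.py or utils.py to exist, because imports are only resolved when the code runs.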
scope analysis
Python needs to know where each variable lives and which parts of your
program can see it. Recall that scopes in Python can be:
· global: names defined at the top level of a module
· enclosing/nonlocal: names found in the scope of any enclosing functions
· local: names defined within a function or method
During scope analysis, Python walks through the AST and builds a stack of
scope frames. The stack helps to keep track of the nested scope levels.
During the AST traversal, the stack is used to backtrack and understand
which scope level a node belongs to.
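The standard library's symtable module exposes the result of this analysis, which makes it possible to peek at the scope assigned to each name. The snippet being analyzed (greeting, outer, inner, count) is a made-up example, and child_named is a small hypothetical helper, not part of symtable.

```python
import symtable

# A made-up snippet with one name in each kind of scope.
source = '''
greeting = "woof"

def outer():
    count = 0

    def inner():
        nonlocal count
        count += 1
        return greeting * count

    return inner
'''


def child_named(table, name):
    """Return the nested symbol table with the given name (helper for this demo)."""
    return next(c for c in table.get_children() if c.get_name() == name)


module_table = symtable.symtable(source, "<scope_demo>", "exec")
outer_table = child_named(module_table, "outer")
inner_table = child_named(outer_table, "inner")

# count is local to outer(), but inner() reaches it through the enclosing scope.
print("count local in outer:   ", outer_table.lookup("count").is_local())
print("count free in inner:    ", inner_table.lookup("count").is_free())
# greeting is never assigned inside inner(), so it resolves to the module level.
print("greeting global in inner:", inner_table.lookup("greeting").is_global())
```

The nesting of the tables (module, then outer, then inner) mirrors the nesting of the scopes in the source.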
bytecode generation & caching
In the final stages of compilation, the AST is used to generate bytecode.
Each Python file is compiled individually, and even if two files contain
identical code, they each get their own bytecode file and their own entry
in __pycache__.
Bytecode is sensitive to the Python version: the same .py file compiled on
Python 3.11 vs 3.12 will produce slightly different bytecode files.
Imported modules are cached in __pycache__, so repeated runs don't need to
recompile unless the source changes.
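One visible consequence of the version sensitivity is that the cache file name carries a tag for the interpreter that produced it. The sketch below asks Python where it would cache mochi.py, then compiles a throwaway file with py_compile to show a real .pyc being written. The demo.py file and its contents are made up, and the exact paths printed depend on your interpreter and environment.

```python
import importlib.util
import py_compile
import sys
import tempfile
from pathlib import Path

# The tag that identifies this interpreter and version in cache file names.
print(sys.implementation.cache_tag)

# Where the bytecode for mochi.py would be cached (nothing is written here).
print(importlib.util.cache_from_source("mochi.py"))

# Compile a throwaway file to see an actual .pyc appear.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "demo.py")
    src.write_text('print("woof")\n')
    pyc_path = Path(py_compile.compile(str(src), doraise=True))
    pyc_name = pyc_path.name
    pyc_size = pyc_path.stat().st_size

print(pyc_name, pyc_size, "bytes")
```

Since two Python versions use two different tags, their cache files can sit side by side in the same __pycache__ folder without overwriting each other.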
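The dis (disassembler) module renders bytecode in a human-readable form.
Passing it an imported module disassembles the functions and classes found
in that module's namespace:
import dis
# importing mochi also runs the script once
import mochi
dis.dis(mochi)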
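And here is what the disassembly looks like on Python 3.11 (opcodes such
as PRECALL are specific to that version). Since dis.dis() walks the
module's namespace, the output covers the names that mochi.py imports, in
this case the Dog class and a jumps helper: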
Disassembly of Dog:
Disassembly of __init__:
  3           0 RESUME                   0

  4           2 LOAD_FAST                1 (name)
              4 LOAD_FAST                0 (self)
              6 STORE_ATTR               0 (name)

  5          16 LOAD_FAST                2 (color)
             18 LOAD_FAST                0 (self)
             20 STORE_ATTR               1 (color)
             30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

Disassembly of bark:
  7           0 RESUME                   0

  8           2 LOAD_GLOBAL              1 (NULL + print)
             14 LOAD_CONST               1 ('woof')
             16 PRECALL                  1
             20 CALL                     1
             30 POP_TOP
             32 LOAD_CONST               0 (None)
             34 RETURN_VALUE

Disassembly of jumps:
  4           0 RESUME                   0

  5           2 LOAD_GLOBAL              1 (NULL + random)
             14 LOAD_ATTR                1 (randint)
             24 LOAD_CONST               1 (1)
             26 LOAD_CONST               2 (10)
             28 PRECALL                  2
             32 CALL                     2
             42 RETURN_VALUE
As you can see, our original mochi.py file, written solely in Python, is
now a set of instructions in bytecode, and it is ready to be interpreted
and executed!
interpretation in Python
Now that Python code has been compiled into bytecode, the interpreter
steps in to actually run it.
The internal Virtual Machine (VM) reads one bytecode instruction at a time
and executes it. The Virtual Machine runs an internal evaluation loop,
which fetches the next instruction, decodes it, and dispatches it to the
corresponding C code that actually performs the work. Many checks happen
during interpretation, including type checks, name lookups, and error
handling.
If the code compiled correctly and no errors come up while it runs, the
code finally comes alive!
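To get a feel for what an evaluation loop does, here is a toy version written in Python. It is a deliberately simplified sketch and not how CPython is implemented: the instruction names are borrowed loosely from real opcodes, while the run function, the program format, and the barks variable are all made up for this example.

```python
# A toy stack machine; NOT CPython's real loop, just the same basic idea.
def run(program):
    stack = []   # operand stack shared by all instructions
    names = {}   # storage for variables
    pc = 0       # index of the next instruction to execute

    while pc < len(program):
        opname, arg = program[pc]  # fetch the next instruction
        pc += 1

        # decode the instruction and dispatch to the code that handles it
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "STORE_NAME":
            names[arg] = stack.pop()
        elif opname == "LOAD_NAME":
            if arg not in names:  # a lookup check that happens at run time
                raise NameError(f"name {arg!r} is not defined")
            stack.append(names[arg])
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return None


# Roughly "barks = 2; return barks + 3", written as toy instructions.
program = [
    ("LOAD_CONST", 2),
    ("STORE_NAME", "barks"),
    ("LOAD_NAME", "barks"),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)
```

The real loop works on compact bytes rather than tuples and handles far more instructions, but the fetch, decode, and dispatch rhythm is the part this sketch tries to capture.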
observations
Compilers and interpreters are a very broad and deep topic, but I hope I
was able to give a bird's-eye view of what happens in the background when
running a Python script.
While learning about these topics, these observations came to mind:
- AST traversal time complexity: the AST is a general (non-binary) tree,
  and a traversal visits each node once, so the time complexity is O(n),
  where n is the number of language constructs, functions included. Since
  time complexity is linear, I wondered whether refactoring and splitting
  functions, a common and recommended practice to improve readability and
  scope definition, might impact the traversal time of the AST. However,
  even though refactoring increases the number of functions and nodes in
  the AST, it doesn't fundamentally impact the compilation runtime, since
  most operations complete within milliseconds.
- circular imports: I now have a better understanding of circular imports
  in Python. Circular imports occur when two files try to import each
  other, for example, file A imports file B, and file B imports file A.
  The circular loop causes the interpreter to load a module that hasn't
  finished executing yet (A -> B -> back to A), which can raise an
  ImportError when B asks for a name that A has not defined yet. I found
  it interesting that this type of error is raised during interpretation
  rather than compilation.
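The circular import case is easy to reproduce. The sketch below creates two tiny modules that import a name from each other, confirms that both compile without complaint, and then triggers the error by importing one of them. The module names circ_a and circ_b, their contents, and the trigger_circular_import helper are all made up for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two made-up modules that each need a name from the other.
FILES = {
    "circ_a.py": "from circ_b import b_value\na_value = 1\n",
    "circ_b.py": "from circ_a import a_value\nb_value = 2\n",
}

# Both files compile fine on their own: the problem is not a syntax problem.
for filename, content in FILES.items():
    compile(content, filename, "exec")


def trigger_circular_import():
    """Import circ_a from a temp folder and return the ImportError message."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, content in FILES.items():
            Path(tmp, filename).write_text(content)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("circ_a")
        except ImportError as err:
            return str(err)
        finally:
            # leave the interpreter state as we found it
            sys.path.remove(tmp)
            for name in ("circ_a", "circ_b"):
                sys.modules.pop(name, None)
    return None


message = trigger_circular_import()
print(message)
```

The error appears only once the import statements are executed: circ_b asks for a_value while circ_a is still paused on its first line, so the name does not exist yet.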
final thoughts
So far, learning about compilers and interpreters in Python has been a
very rewarding journey. It is a vast topic, and I am looking forward to
deepening and refining my understanding.
As always, I welcome your feedback. Perhaps you can share what your
experience has been while learning about compilation and interpretation in
Python?
Till next time!