python compiler & interpreter

overview

Ever wondered how Python really works under the hood? This guide will walk you through Python's compilation and interpretation processes — from source code to execution. It's written for anyone with basic Python knowledge, but I would love feedback from readers of all levels!

Note
* Throughout this guide, I refer to CPython, the most commonly adopted implementation, which you can download from the official distribution page.

what happens when you run code?

Whether you are just starting on your programming journey or have years of experience under your belt, the moment between writing a script and hitting the run button is always a thrilling one. It's a split second where inner monologues can range anywhere from "How exciting, it is ah-live!", to "I am confident this time it will work", and "I swear I am switching careers".

But what really happens in that split second after you hit the run button? What goes on in the background that makes the code either come alive or abruptly terminate? While implementation differs from language to language, the core concept is the same: human-readable code has to be translated to machine-readable code. And this is where compilers and interpreters become our friends.

human to machine

Compilers and interpreters share the same goal: to convert user-written code into code a machine can run. They just go about it in different ways:

· compilers translate the whole module/file before any of it runs; syntax errors are caught before the code is executed.
· interpreters read and execute the code statement by statement, and terminate at the first error that is encountered.

Because an interpreter has to analyze the code while it is running it, compiled languages are generally considered faster at executing code.

does Python interpret or compile code?

Python actually does both! It is a hybrid, as it first compiles and then interprets the code.

During compilation, Python source code is translated into bytecode: a compact, lower-level set of instructions written for Python's virtual machine rather than for the CPU. Bytecode is platform-independent, meaning that it can be run on any machine with a compatible interpreter. After compilation, the interpreter reads and executes the bytecode.
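The two phases can also be triggered separately from within Python. Here is a minimal sketch using the built-in compile() and exec() functions; the source string and the "<demo>" filename are made up for illustration:

```python
# Phase 1: compile source text into a code object.
# The raw bytecode is stored in its co_code attribute.
source = "greeting = 'woof ' * 2"
code = compile(source, "<demo>", "exec")

print(type(code).__name__)           # code
print(type(code.co_code).__name__)   # bytes

# Phase 2: hand the code object to the interpreter for execution.
namespace = {}
exec(code, namespace)
print(repr(namespace["greeting"]))   # 'woof woof '
```

Nothing is executed until exec() is called: compile() only produces the instructions.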

Let's see how it all plays out, step by step!

how is Python code run?

Let's imagine we are building a virtual representation of our pet dog, Mochi. In the script mochi.py, we store methods and variables that describe Mochi's features and characteristics. We can bring Mochi to virtual life by running the script and typing:

 python mochi.py

in the terminal. The python command starts the interpreter, which kicks off the compilation process in the background, and the mochi.py argument indicates which file it should start from.

During compilation, our mochi.py script is translated into bytecode. If no errors are found, the interpreter analyzes and executes the bytecode:

run: python mochi.py in terminal
→
compile: Python code is turned into bytecode
→
interpret: Virtual Machine executes bytecode

compiling multiple files

In our current setup, all of our logic is stored in one big script - mochi.py. What happens when we add more files to our application? How does Python know which files to compile and in what order?

Let's refactor part of our mochi.py code into multiple files:

· dog.py - we isolate the logic that defines a generic dog class
· utils.py - here we can store the bark function, which randomizes the number of times Mochi barks

To ensure the refactored code is still available in mochi.py, we link the new files via import statements. It is exactly these import statements that help Python understand the relationship between scripts, and define the order in which files are compiled and executed.

# mochi.py

from dog import Dog
from utils import bark

name = "Mochi"
dog = Dog(name, "chocolate")

print(f"{dog.name}, {dog.color}, {name} barks: ", bark())

After refactoring, our mochi.py file imports the dog.py and utils.py files, and the utils.py file imports the random module from the standard library:

├── mochi.py             # brings mochi to life 
  ├── dog.py             # generic class for building good doggos
  └── utils.py           # randomizes number of barks
    └── random           # import from standard library

Now we can safely run the program again.
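To see how import statements drive the order in which files are loaded, here is a small sketch. It writes simplified stand-ins for the three files into a temporary folder (each one only prints a message, which is my assumption, not the real content of the files) and runs the entry script in a separate process:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the real files: each one only announces itself.
files = {
    "dog.py": 'print("loading dog")\n',
    "utils.py": 'import random\nprint("loading utils")\n',
    "mochi.py": 'import dog\nimport utils\nprint("running mochi")\n',
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in files.items():
        Path(tmp, name).write_text(text)
    # Equivalent to typing "python mochi.py" inside the temporary folder.
    result = subprocess.run(
        [sys.executable, "mochi.py"],
        cwd=tmp, capture_output=True, text=True, check=True,
    )

order = result.stdout.splitlines()
print(order)  # ['loading dog', 'loading utils', 'running mochi']
```

Each imported file runs to completion at the point where its import statement is reached, before the importing file continues.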

ready, set, action!

Now that we understand how to run Python scripts and how files are linked to each other, let's take a closer look at how Python's compilation and interpretation processes work behind the scenes.

compilation in Python

As soon as a Python script is run, the compilation process automatically starts running in the background. During this stage, each Python (.py) file is translated into bytecode, which for imported modules is saved as a .pyc file. Bytecode is a compact instruction format that Python's Virtual Machine can execute.

Compilation consists of a few steps:
· tokenization: the source code is split into tokens; each token is labeled according to its role
· parsing: tokens are used to build an Abstract Syntax Tree (AST); any syntax errors are raised during this step
· scope analysis: the AST is traversed and each name is assigned a scope (local, global, nonlocal)
· optimization: lightweight optimizations, such as constant folding, are applied
· bytecode generation: bytecode is generated from the AST as an in-memory code object
· caching: bytecode for imported modules is written to a .pyc file in __pycache__ for reuse on future runs (the entry script is normally not cached).

Phew! That's quite a few steps for a process that only takes a few milliseconds.
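The optimization step has no section of its own below, so here is a quick way to observe it. This is a small sketch of constant folding as CPython performs it; the variable name and the "<demo>" filename are made up:

```python
# Compile an assignment whose right-hand side is made only of constants.
code = compile("seconds_per_day = 60 * 60 * 24", "<demo>", "exec")

# The product was computed at compile time and stored among the
# code object's constants, ready to be loaded at run time.
print(code.co_consts)
print(86400 in code.co_consts)  # True
```

The exact content of co_consts differs between Python versions, but the precomputed value 86400 is among the constants.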

Let's analyze each step in more detail!

tokenization
Tokenization is the first step in the code abstraction; here the tokenizer labels each sequence of characters with an internally defined token type. We can view this in action by tokenizing our mochi.py file with the tokenize module:

# tokenizing the mochi.py file
import tokenize
import token

with open("mochi.py", "rb") as f:
    tokens = tokenize.tokenize(f.readline)
    for tok in tokens:
        print(f"{tok.string:<10} -> {token.tok_name[tok.type]}")

Printing each tok object directly shows the raw output instead. Here is part of it for the mochi.py file:

TokenInfo(type=63 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='from', start=(1, 0), end=(1, 4), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='dog', start=(1, 5), end=(1, 8), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='import', start=(1, 9), end=(1, 15), line='from dog import Dog\n')
TokenInfo(type=1 (NAME), string='Dog', start=(1, 16), end=(1, 19), line='from dog import Dog\n')
...

A formatted output shows how each word from the Python file is mapped to a specific token, depending on its role. For example, the first import statement becomes:

# first line in mochi.py
from dog import Dog

# each word is labeled with a token type

utf-8      -> ENCODING
from       -> NAME
dog        -> NAME
import     -> NAME
Dog        -> NAME

...

parsing
During the parsing step, the stream of tokens built during tokenization is organized in an Abstract Syntax Tree (AST). Here is the code that pretty-prints a file's AST:


import ast

with open("mochi.py", "r") as f:
    source = f.read()

# Parse source code into an AST
tree = ast.parse(source)

# Print a textual representation
print(ast.dump(tree, indent=4))

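And here is what the formatted AST looks like. Note that this output was generated from a slightly different revision of mochi.py than the listing above: it imports jumps rather than bark from utils, assigns 'good doggo' to name, and ends with a call to dog.bark():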

Module(
    body=[
        ImportFrom(
            module='dog',
            names=[
                alias(name='Dog')],
            level=0),
        ImportFrom(
            module='utils',
            names=[
                alias(name='jumps')],
            level=0),
        Assign(
            targets=[
                Name(id='name', ctx=Store())],
            value=Constant(value='good doggo')),
        Assign(
            targets=[
                Name(id='dog', ctx=Store())],
            value=Call(
                func=Name(id='Dog', ctx=Load()),
                args=[
                    Name(id='name', ctx=Load()),
                    Constant(value='chocolate')],
                keywords=[])),
        Expr(
            value=Call(
                func=Name(id='print', ctx=Load()),
                args=[
                    JoinedStr(
                        values=[
                            FormattedValue(
                                value=Attribute(
                                    value=Name(id='dog', ctx=Load()),
                                    attr='name',
                                    ctx=Load()),
                                conversion=-1),
                            Constant(value=', '),
                            FormattedValue(
                                value=Attribute(
                                    value=Name(id='dog', ctx=Load()),
                                    attr='color',
                                    ctx=Load()),
                                conversion=-1),
                            Constant(value=', '),
                            FormattedValue(
                                value=Name(id='name', ctx=Load()),
                                conversion=-1),
                            Constant(value=' jumps ')]),
                    Call(
                        func=Name(id='jumps', ctx=Load()),
                        args=[],
                        keywords=[])],
                keywords=[])),
        Expr(
            value=Call(
                func=Attribute(
                    value=Name(id='dog', ctx=Load()),
                    attr='bark',
                    ctx=Load()),
                args=[],
                keywords=[]))],
    type_ignores=[])

This output might look intimidating, but it is simply what Python uses internally to understand the code structure before generating bytecode. Each node represents a language construct (statements, expressions, functions, literals) and is identified by a specific type (Module, Assign, Name, etc.).
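Parsing is also where syntax errors surface. As a small illustration (the broken source string is made up), a mistake on the second line stops the whole file from being parsed, so the valid first line never gets a chance to run:

```python
import ast

# Line 1 is valid; line 2 contains a stray "=".
broken_source = 'print("hello")\nname = = "Mochi"\n'

try:
    ast.parse(broken_source)
    error = None
except SyntaxError as exc:
    error = exc

print(type(error).__name__, "on line", error.lineno)  # SyntaxError on line 2
```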

scope analysis
Python needs to know where each variable lives and which parts of your program can see it. Recall that scopes in Python can be:
· built-in: names that Python provides out of the box, such as print and len
· global: names defined at the top level of a module
· enclosing/non-local: names found in the scope of any enclosing functions
· local: names defined within a function or method

During scope analysis, Python walks through the AST and records, in a symbol table, which scope each name belongs to. To keep track of nested scope levels, it maintains a stack of scopes: entering a function or class pushes a new scope, and leaving it pops back to the enclosing one. During the AST traversal, the stack is used to backtrack and understand which scope level a node belongs to.
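The result of scope analysis can be inspected with the standard library's symtable module. Here is a minimal sketch; the sample source and the child() helper are made up for illustration:

```python
import symtable

source = """
counter = 0

def outer():
    total = 0
    def inner():
        nonlocal total
        total += counter
        return total
    return inner
"""

def child(table, name):
    """Hypothetical helper: return the nested scope called `name`."""
    return next(c for c in table.get_children() if c.get_name() == name)

module_table = symtable.symtable(source, "<demo>", "exec")
outer_table = child(module_table, "outer")
inner_table = child(outer_table, "inner")

# total is defined in outer, so it is local there...
print(outer_table.lookup("total").is_local())      # True
# ...and inner reaches it through the enclosing scope.
print(inner_table.lookup("total").is_free())       # True
# counter is not defined in any function, so inner treats it as global.
print(inner_table.lookup("counter").is_global())   # True
```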

bytecode generation & caching
In the final stages of compilation, the AST is used to generate bytecode. Each Python file is compiled individually: even if two files contain identical code, they each get their own bytecode file and their own entry in __pycache__.

Bytecode is sensitive to the Python version: the same .py compiled on Python 3.11 vs 3.12 will produce slightly different bytecode files. Imported modules are cached in __pycache__, so repeated runs don't need to recompile unless the source changes. By default, Python detects a change by comparing the source file's modification time and size with the values recorded in the .pyc file.
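The caching step can be triggered by hand with the py_compile module. This is a sketch that works in a temporary folder, with a made-up one-line dog.py; it shows that the name of the cached file includes a tag for the Python version that produced it:

```python
import importlib.util
import py_compile
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp, "dog.py")
    source_path.write_text("name = 'Mochi'\n")

    # Where the import system would look for the cached bytecode.
    expected = importlib.util.cache_from_source(str(source_path))

    # Compile explicitly; py_compile returns the path of the file it wrote.
    pyc_path = py_compile.compile(str(source_path), doraise=True)
    pyc_exists = Path(pyc_path).exists()

print(Path(pyc_path).name)           # e.g. dog.cpython-312.pyc, depending on the version
print(sys.implementation.cache_tag)  # the version tag used in the file name
```

Because the version tag is part of the file name, several Python versions can keep their own cached bytecode side by side in the same __pycache__ folder.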

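The dis (disassembler) module renders bytecode in a human-readable form. Calling dis.dis on an imported module disassembles the functions and classes found in that module's namespace: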


import dis
import mochi

dis.dis(mochi)

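And here is the bytecode for the Dog class and the jumps function that mochi.py imports. The PRECALL instruction suggests this output comes from Python 3.11; instruction names and offsets vary between versions: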

Disassembly of Dog:
Disassembly of __init__:
  3           0 RESUME                   0

  4           2 LOAD_FAST                1 (name)
              4 LOAD_FAST                0 (self)
              6 STORE_ATTR               0 (name)

  5          16 LOAD_FAST                2 (color)
             18 LOAD_FAST                0 (self)
             20 STORE_ATTR               1 (color)
             30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

Disassembly of bark:
  7           0 RESUME                   0

  8           2 LOAD_GLOBAL              1 (NULL + print)
             14 LOAD_CONST               1 ('woof')
             16 PRECALL                  1
             20 CALL                     1
             30 POP_TOP
             32 LOAD_CONST               0 (None)
             34 RETURN_VALUE


Disassembly of jumps:
  4           0 RESUME                   0

  5           2 LOAD_GLOBAL              1 (NULL + random)
             14 LOAD_ATTR                1 (randint)
             24 LOAD_CONST               1 (1)
             26 LOAD_CONST               2 (10)
             28 PRECALL                  2
             32 CALL                     2
             42 RETURN_VALUE

As you can see, our original mochi.py file - written solely in Python - is now a set of instructions in bytecode, and it is ready to be interpreted and executed!

interpretation in Python

Now that Python code has been compiled into bytecode, the interpreter steps in to actually run it.

The internal Virtual Machine (VM) reads one bytecode instruction at a time and executes it. The Virtual Machine runs an internal evaluation loop, which fetches the next instruction, interprets it, and dispatches it to the corresponding C function that actually performs the work. Many checks happen during interpretation, including type checks, scope lookups, and error handling.
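To make the fetch, interpret, and dispatch cycle more concrete, here is a toy evaluation loop. It is a deliberately simplified sketch and not CPython's actual implementation: the run function and its three instructions are invented for illustration, with names borrowed from real bytecode.

```python
def run(instructions):
    """Toy evaluation loop: fetch an instruction, decode it, act on a stack."""
    stack = []
    for opname, arg in instructions:       # fetch the next instruction
        if opname == "LOAD_CONST":         # decode and dispatch
            stack.append(arg)
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise ValueError("program ended without RETURN_VALUE")

# Roughly what "return 2 + 3" could look like as instructions.
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)  # 5
```

The real loop is written in C and handles many more instructions, but the shape is the same: values are pushed onto a stack, and each instruction consumes or produces stack entries.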

If there are no errors and everything compiled correctly, the code finally comes alive!

observations

Compilers and interpreters are a very broad and deep topic, but I hope I was able to give a bird's eye overview of what happens in the background when running a Python script.

While learning about these topics, these observations came to mind:

AST traversal time complexity: the AST is a general (non-binary) tree, and visiting each node once takes O(n) time, where n is the number of language constructs, which includes functions. Since time complexity is linear, I wondered whether refactoring and splitting functions - a common and recommended practice to improve readability and scope definition - might impact the traversal time of the AST. However, even though refactoring increases the number of functions and nodes in the AST, the growth is proportional to the amount of code, so it doesn't meaningfully impact compilation time: most operations complete within milliseconds.
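A quick way to check this is to count AST nodes before and after a refactor. In this sketch, the two source strings and the count_nodes helper are made up for illustration:

```python
import ast

# The same logic, written as one function and split into two.
single = "def f(x):\n    return x * 2 + 1\n"
split = (
    "def double(x):\n    return x * 2\n\n"
    "def f(x):\n    return double(x) + 1\n"
)

def count_nodes(source):
    """Visit every AST node exactly once and count them."""
    return sum(1 for _ in ast.walk(ast.parse(source)))

single_count = count_nodes(single)
split_count = count_nodes(split)
print(single_count, split_count)
```

The split version produces more nodes, but only by a small constant amount per extra function, which is consistent with linear growth.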

circular imports: I now have a better understanding of circular imports in Python. Circular imports occur when two files try to import each other - for example, file A imports file B, and file B imports file A. The circular loop causes the interpreter to load a module that hasn't finished executing yet (A -> B -> back to A). If B then asks for a name that A has not defined yet (for example, via from A import name), an ImportError is raised. I found it interesting that this type of error is raised during interpretation rather than compilation.
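Here is a small sketch that reproduces the error. The two modules, circ_a and circ_b, are hypothetical and are written into a temporary folder, then imported in a separate process:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that import a name from each other.
files = {
    "circ_a.py": "from circ_b import b_value\na_value = 1\n",
    "circ_b.py": "from circ_a import a_value\nb_value = 2\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in files.items():
        Path(tmp, name).write_text(text)
    # Import circ_a in a fresh interpreter started inside the temporary folder.
    result = subprocess.run(
        [sys.executable, "-c", "import circ_a"],
        cwd=tmp, capture_output=True, text=True,
    )

# The last line of the traceback names the error.
print(result.stderr.strip().splitlines()[-1])
```

Both files compile without complaint; the error only appears once circ_b runs and asks for a_value before circ_a has reached the line that defines it.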

final thoughts

So far, learning about compilers and interpreters in Python has been a very rewarding journey. It is a vast topic, and I am looking forward to deepening and refining my understanding.

As always, I welcome your feedback - perhaps you can share what your experience has been while learning about compilation and interpretation in Python?

Till next time!