Interpreted languages, such as Python, typically take raw source code and generate bytecode. Bytecode consists of encoded instructions that sit at a lower level than source code but are not as close to the hardware as machine code, the native instructions that assembly language represents in human-readable form.
Bytecode is often executed within the interpreter (which is a type of virtual machine), though it can also be compiled further into machine code. Bytecode is used primarily to allow easy cross-platform compatibility. Python, Java, Ruby, Perl, and similar languages provide bytecode interpreters for different architectures, while the source code stays the same.
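To make the idea of bytecode concrete, the standard-library dis module can list the instructions Python generates for a function. This is a minimal sketch; the function add is a made-up example, and the exact instruction names differ between Python versions, so none are assumed here.

```python
import dis


def add(a, b):
    """Return the sum of two values."""
    return a + b


# The compiled bytecode lives on the function's code object as raw bytes.
raw = add.__code__.co_code
print(type(raw), len(raw))

# dis.dis prints a human-readable listing of those instructions.
dis.dis(add)

# The same information is available programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

The listing is far shorter and simpler than machine code for the same function would be, because each bytecode instruction is carried out by the interpreter rather than directly by the CPU.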
While Python automatically compiles source code into bytecode, there are some options and features that can be used to modify how the interpreter works with bytecode. These options mainly affect bytecode size and load time, which is worth knowing because interpreted languages are generally slower than compiled languages.
- To create bytecode, simply execute a Python program via python <program>.py.
- When running a Python command from the command line, there are a couple of switches that can reduce the size of the compiled bytecode. Be aware that some programs may depend on the statements these switches remove in order to function correctly, so only use them if you know what to expect.
  - -O removes assert statements from the compiled code. These statements provide some debugging help when testing the program, but generally aren't required for production code.
  - -OO removes both assert statements and __doc__ strings for even more size reduction.
- Loading programs from bytecode into memory is faster than with source code, but actual program execution is no faster (due to the nature of the Python interpreter).
- The compileall module can generate bytecode for all modules within a directory. More information on the module can be found at https://docs.python.org/3.6/library/compileall.html.
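The effect of -O and -OO can be observed without restarting the interpreter, because the built-in compile() function accepts an optimize argument whose levels 1 and 2 correspond to those two switches. The sketch below is illustrative only; the check function and the build helper are hypothetical names invented for this example.

```python
SOURCE = '''
def check(value):
    """Docstring that level 2 (-OO) drops."""
    assert value > 0, "value must be positive"
    return value
'''


def build(optimize):
    """Compile SOURCE at the given optimization level and return check()."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check"]


plain = build(0)      # like running without switches
no_assert = build(1)  # like -O
no_docs = build(2)    # like -OO

try:
    plain(-1)
except AssertionError as error:
    print("level 0 kept the assert:", error)

print("level 1 result:", no_assert(-1))      # the assert is gone
print("level 1 docstring:", no_assert.__doc__)
print("level 2 docstring:", no_docs.__doc__)
```

This also shows why the switches deserve caution: code that relies on an assert for input validation, or that reads __doc__ at runtime, behaves differently once those statements are stripped.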
When source code (.py) is read by the Python interpreter, the bytecode is generated and stored in __pycache__ as <module_name>.<version>.pyc (for example, spam.cpython-36.pyc). The .pyc extension indicates that it is compiled Python code. This naming convention is what allows bytecode compiled by different versions of Python to exist simultaneously on the system.
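The naming scheme can be checked by compiling a throwaway directory with compileall and asking importlib where it expects the cached file to live. This is a sketch under assumptions: the module name greet is hypothetical, and the printed file name depends on the interpreter that runs it.

```python
import compileall
import importlib.util
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "greet.py"  # hypothetical module
    src.write_text("def hello():\n    return 'hello'\n")

    # Compile every .py file under the directory; quiet=1 hides the file listing.
    ok = compileall.compile_dir(tmp, quiet=1)

    # Ask the import system where the cached bytecode for this source belongs.
    expected = pathlib.Path(importlib.util.cache_from_source(str(src)))
    found = expected.exists()
    cached_name = expected.name

print("compiled without errors:", bool(ok))
print("cache file:", cached_name)
print("cache file exists:", found)
```

The middle part of the printed file name is the version tag of the running interpreter, which is what keeps caches from different Python versions from overwriting each other.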
When source code is modified, Python automatically compares the source file's modification time with the one recorded in the cached bytecode and, if the cache is out of date, recompiles it. However, a script that is run directly from the command line is not stored in __pycache__ and is recompiled every time. In addition, if there is no source module, Python does not check the cache at all; a bytecode-only package has no cache associated with it.
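The staleness check can be seen in action by caching a module, editing its source, and importing it again. The sketch below assumes a hypothetical module named stale_demo and a helper load_fresh written for this example; the edit also changes the file's length so the result does not depend on how quickly the two writes happen.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile


def load_fresh(name):
    """Import `name` as if for the first time in this process."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "stale_demo.py"  # hypothetical module
    src.write_text("VALUE = 1\n")
    py_compile.compile(str(src), doraise=True)  # populate the bytecode cache

    sys.path.insert(0, tmp)
    try:
        first = load_fresh("stale_demo").VALUE
        # Edit the source; the cached bytecode no longer matches it.
        src.write_text("VALUE = 2  # edited\n")
        second = load_fresh("stale_demo").VALUE
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("stale_demo", None)

print(first, second)
```

The second import picks up the edited value even though a cached .pyc for the old source was present, which is the automatic recompilation described above.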
Because bytecode is platform-independent (due to being run through the platform's interpreter), Python code can be released either as .py source files or as .pyc bytecode. This is where bytecode-only packages come into play; to provide a bit of obfuscation and (subjective) security, Python programs can be released without the source code, with only the pre-compiled .pyc files provided. In this case, the .pyc files are placed directly in the source directory, where the .py files would otherwise be, rather than in __pycache__.
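A bytecode-only release can be simulated by compiling a module to a .pyc that sits beside the source, deleting the source, and importing what is left. This is a rough sketch, not a packaging recipe; the module name secret_mod and its answer function are invented for the example.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "secret_mod.py"  # hypothetical module
    src.write_text("def answer():\n    return 42\n")

    # Write the .pyc into the source directory itself, not into __pycache__.
    pyc = src.with_suffix(".pyc")
    py_compile.compile(str(src), cfile=str(pyc), doraise=True)

    # Remove the source so that only the bytecode remains.
    src.unlink()

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("secret_mod")
        result = module.answer()
        loaded_from = pathlib.Path(module.__file__).name
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("secret_mod", None)

print(result, loaded_from)
```

Note that a .pyc file is tied to the Python version that produced it, so a bytecode-only release has to be built for each interpreter version it is meant to run on.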