
Python Modules and Packages
Created At: Jan. 23, 2025, 3:31 a.m.
Updated At: Jan. 23, 2025, 7:03 a.m.
Understanding Python Modules
When you exit and re-enter the Python interpreter, any definitions you made (like functions and variables) are lost. For longer programs, it's better to write your code in a text editor and run it as a script. As your program grows, you can split it into multiple files for easier maintenance and reuse of functions without copying their definitions into each script.
Python supports this through modules, which are files containing Python definitions and statements. These modules can be imported into other modules or into the main script, making the code more modular and reusable. A module's file name is the module name with the suffix .py appended, and within the module, its name is available as the value of the global variable __name__.
For example, create a file named fibo.py with functions to compute Fibonacci numbers:
# Fibonacci number module

def fib(n):  # write Fibonacci series up to n
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()

def fib2(n):  # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
You can import this module in the Python interpreter using:
import fibo
Now, you can call its functions:
fibo.fib(1000)
fibo.fib2(100)
fibo.__name__
If you use a function often, you can assign it to a local name:
fib = fibo.fib
fib(500)
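The steps above can also be tried end to end in one script. The following is a minimal sketch, not part of the original walkthrough: it writes a small module to a temporary directory, makes that directory importable, and imports it. The module name fibo_demo is hypothetical and chosen only to avoid clashing with a real fibo.py on your machine.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Source for a small module. "fibo_demo" is a hypothetical name.
MODULE_SOURCE = textwrap.dedent('''
    def fib2(n):
        """Return the Fibonacci series up to n."""
        result = []
        a, b = 0, 1
        while a < n:
            result.append(a)
            a, b = b, a + b
        return result
''')

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fibo_demo.py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, tmp)        # make the directory importable
    importlib.invalidate_caches()  # let the import system notice the new file
    try:
        import fibo_demo
        module_name = fibo_demo.__name__  # the file name without ".py"
        series = fibo_demo.fib2(100)
    finally:
        sys.path.remove(tmp)

print(module_name)
print(series)
```

The module's __name__ is derived from the file name, which is why the import statement uses fibo_demo rather than fibo_demo.py.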
More on Modules
A module can contain executable statements and function definitions. These statements are executed only the first time the module is imported. Each module has its own private namespace, acting as the global namespace for all functions defined within it. This means the module’s author can freely use global variables without worrying about accidental name collisions with those defined by users or other modules.
By keeping global variables contained within their respective modules, Python helps maintain cleaner and more organized code, making it easier to manage and debug large projects.
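To illustrate the private namespace described above, here is a small sketch that builds two in-memory modules with types.ModuleType instead of files on disk. The names mod_a, mod_b, counter, and bump are made up for this example. Each module has a global called counter, and so does the importing script, yet none of them interfere with each other.

```python
import types

def make_module(name, source):
    """Build an in-memory module by executing source in its own namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

# Both modules use the same global name, "counter".
SOURCE = (
    "counter = {start}\n"
    "def bump():\n"
    "    global counter\n"
    "    counter += 1\n"
    "    return counter\n"
)

mod_a = make_module("mod_a", SOURCE.format(start=0))
mod_b = make_module("mod_b", SOURCE.format(start=100))
counter = -1  # a global in the importing script's own namespace

a_result = mod_a.bump()  # touches only mod_a.counter
b_result = mod_b.bump()  # touches only mod_b.counter

print(a_result, b_result, counter)
```

Inside bump(), the global statement refers to the namespace of the module where the function was defined, not to the namespace of whoever calls it.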
Modules can also import other modules, and it is customary to place all import statements at the beginning of a module or script. If placed at the top level of a module (outside any functions or classes), these imports are added to the module’s global namespace.
There is a variant of the import statement that imports names from a module directly into the importing module’s namespace:
from fibo import fib, fib2
fib(500)
This does not introduce the module name from which the imports are taken in the local namespace (so in the example, fibo is not defined).
There is even a variant to import all names that a module defines:
from fibo import *
fib(500)
This imports all names except those beginning with an underscore (_). Most Python programmers avoid this practice as it introduces an unknown set of names into the interpreter, potentially hiding existing definitions. However, it can be useful for saving typing in interactive sessions.
You can also use as to give an imported module an alias:
import fibo as fib
fib.fib(500)
This is effectively the same as import fibo, but the module is available as fib.
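The import variants above differ mainly in which names they bind. The sketch below makes that visible by running each statement in a fresh namespace and listing what appears there. The module shapes_demo and its contents are hypothetical, and it is registered in sys.modules directly only so the example needs no file on disk.

```python
import sys
import types

# Register a hypothetical in-memory module so the imports below can find it.
demo = types.ModuleType("shapes_demo")
exec("def area(w, h):\n    return w * h\n\n_cache = {}\n", demo.__dict__)
sys.modules["shapes_demo"] = demo

def names_bound_by(statement):
    """Run an import statement in an empty namespace and list the new names."""
    namespace = {}
    exec(statement, namespace)
    return sorted(n for n in namespace if n != "__builtins__")

plain = names_bound_by("import shapes_demo")               # binds the module name
selected = names_bound_by("from shapes_demo import area")  # binds only "area"
star = names_bound_by("from shapes_demo import *")         # skips "_cache"
alias = names_bound_by("import shapes_demo as sd")         # binds only the alias

del sys.modules["shapes_demo"]  # clean up the demo module

print(plain, selected, star, alias)
```

Note that the star import leaves out _cache because its name begins with an underscore, and that neither from form binds the module name itself.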
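For efficiency, each module is imported only once per interpreter session. If you edit a module, you must either restart the interpreter or reload the module with importlib.reload():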
import importlib
importlib.reload(fibo)
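The difference between importing again and reloading can be sketched as follows. The module name settings_demo and its VALUE variable are invented for this example. Bytecode writing is switched off for the duration of the demo, as a precaution so that a cached .pyc cannot mask an edit made within the same second.

```python
import importlib
import os
import sys
import tempfile

saved_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # precaution: keep a stale .pyc out of the picture

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings_demo.py")  # hypothetical module
    with open(path, "w") as f:
        f.write("VALUE = 1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import settings_demo
        first = settings_demo.VALUE

        # Edit the module's source on disk.
        with open(path, "w") as f:
            f.write("VALUE = 'edited'\n")

        import settings_demo  # no effect: the module is already loaded
        after_reimport = settings_demo.VALUE

        importlib.reload(settings_demo)  # re-executes the module's source
        after_reload = settings_demo.VALUE
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("settings_demo", None)
        sys.dont_write_bytecode = saved_flag

print(first, after_reimport, after_reload)
```

A second import statement only looks the module up in sys.modules, so the edit becomes visible only after the reload.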
Running Python Modules as Scripts
When you run a Python module with:
python fibo.py <arguments>
the code in the module is executed just as if you had imported it, but with __name__ set to "__main__". By adding this code at the end of your module:
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
you can make the file usable both as a script and as an importable module, because the code that parses the command line only runs if the module is executed as the "main" file:
python fibo.py 50
0 1 1 2 3 5 8 13 21 34
If the module is imported, the code is not run:
import fibo
This technique is often used to provide a convenient user interface to a module or for testing purposes (running the module as a script executes a test suite).
By leveraging the if __name__ == "__main__": idiom, Python allows you to write modules that can be both reusable as imports and directly executable scripts, enhancing modularity and usability in your code.
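One way to observe the guard in action without leaving Python is the standard library's runpy module, which executes a file under a chosen __name__. This is only an illustrative sketch: the file name fibo_script_demo.py and the helper run_as are assumptions made for the example, and setting sys.argv by hand stands in for real command-line arguments.

```python
import contextlib
import io
import os
import runpy
import sys
import tempfile
import textwrap

SCRIPT = textwrap.dedent('''
    def fib(n):
        a, b = 0, 1
        while a < n:
            print(a, end=' ')
            a, b = b, a + b
        print()

    if __name__ == "__main__":
        import sys
        fib(int(sys.argv[1]))
''')

def run_as(run_name, path, argument):
    """Execute the file with __name__ set to run_name and return its stdout."""
    saved_argv = sys.argv
    sys.argv = [path, argument]  # simulate: python <path> <argument>
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            runpy.run_path(path, run_name=run_name)
    finally:
        sys.argv = saved_argv
    return buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fibo_script_demo.py")  # hypothetical file name
    with open(path, "w") as f:
        f.write(SCRIPT)
    as_script = run_as("__main__", path, "50")          # guard fires
    as_import = run_as("fibo_script_demo", path, "50")  # guard is skipped

print(repr(as_script))
print(repr(as_import))
```

When the run name is anything other than "__main__", the functions are still defined but nothing is printed, which matches what happens on a normal import.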
How Python Finds Modules
When a module named spam is imported, the Python interpreter first looks for a built-in module with that name. These module names are listed in sys.builtin_module_names. If it doesn't find one, it searches for a file named spam.py in a list of directories specified by the sys.path variable. This search path is initialized from several locations:
- The directory containing the input script (or the current directory if no file is specified).
- PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
- The installation-dependent default (including a site-packages directory, handled by the site module).
More details can be found in the Python documentation on the initialization of the sys.path module search path.
Note: On file systems that support symlinks, the directory containing the input script is calculated after the symlink is followed. Thus, the directory containing the symlink is not added to the module search path.
After initialization, Python programs can modify sys.path. The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory, which can lead to errors unless the replacement is intended. For more information, see the section on Standard Modules in the Python documentation.
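The lookup order described in this section can be explored with importlib.util.find_spec, which reports where a module would be loaded from without importing it. The sketch below assumes that no module called spam_demo already exists on your system. That name, and the ANSWER variable in the file, are hypothetical.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Built-in modules are compiled into the interpreter and are found first.
sys_is_builtin = "sys" in sys.builtin_module_names
builtin_origin = importlib.util.find_spec("sys").origin

with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, "spam_demo.py")  # hypothetical module
    with open(module_path, "w") as f:
        f.write("ANSWER = 42\n")

    # The directory is not on sys.path yet, so the module cannot be found.
    before = importlib.util.find_spec("spam_demo")

    sys.path.insert(0, tmp)  # the front of the list is searched first
    importlib.invalidate_caches()
    try:
        spec = importlib.util.find_spec("spam_demo")
        found_in_tmp = os.path.samefile(spec.origin, module_path)
    finally:
        sys.path.remove(tmp)

print(sys_is_builtin, builtin_origin, before, found_in_tmp)
```

Because directories at the front of sys.path win, a file placed there with the same name as a standard library module would shadow it, which is the pitfall mentioned above.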
Speeding Up Module Loading in Python
To accelerate module loading, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file and generally includes the Python version number. For example, in CPython release 3.3, the compiled version of spam.py would be cached as __pycache__/spam.cpython-33.pyc. This naming convention allows compiled modules from different releases and Python versions to coexist.
Automatic Recompilation and Platform Independence
Python compares the modification date of the source file with the compiled version to decide whether the cache is out of date and the module needs to be recompiled. This happens automatically. The compiled modules are also platform-independent, so the same library can be shared among systems with different architectures.
Exceptions to Caching
Python does not check the cache in two circumstances:
1. It always recompiles and does not store the result for modules loaded directly from the command line.
2. It does not check the cache if there is no source module. To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module.
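To see the cache naming scheme on your own interpreter, the sketch below asks importlib.util.cache_from_source where the compiled file for a source file would live, then compiles a throwaway file with py_compile to confirm the location. The file name spam.py follows the example in the text, and the exact tag in the output depends on the Python version you run.

```python
import importlib.util
import os
import py_compile
import sys
import tempfile

# The tag that appears in cache file names, e.g. "cpython-312" on CPython 3.12.
cache_tag = sys.implementation.cache_tag

# Where the cached bytecode for "spam.py" would be stored.
predicted = importlib.util.cache_from_source("spam.py")

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "spam.py")
    with open(source, "w") as f:
        f.write("print('spam')\n")

    # Compile explicitly. A normal import would do this automatically.
    compiled = py_compile.compile(source)
    compiled_exists = os.path.exists(compiled)
    same_location = compiled == importlib.util.cache_from_source(source)

print(cache_tag)
print(predicted)
print(compiled_exists, same_location)
```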
Expert Tips
- Use the -O or -OO switches on the Python command to reduce the size of a compiled module. The -O switch removes assert statements, while the -OO switch removes both assert statements and __doc__ strings. Be cautious with these options, as some programs may rely on having these available.
- “Optimized” modules have an opt- tag and are usually smaller. Future releases may change the effects of optimization.
- The module compileall can create .pyc files for all modules in a directory.
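The effect of the optimization levels in the tips above can be reproduced in-process with the optimize argument of the built-in compile() function, where 1 corresponds to -O and 2 to -OO. This is a sketch for illustration only: the function checked and the helper load are invented for the example.

```python
import importlib.util

SOURCE = '''
def checked(x):
    """Return x, which must be positive."""
    assert x > 0, "x must be positive"
    return x
'''

def load(optimize):
    """Compile SOURCE at the given optimization level and return the function."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["checked"]

plain, opt1, opt2 = load(0), load(1), load(2)

# Level 0 keeps the assert statement, so a bad argument raises.
try:
    plain(-1)
    plain_raised = False
except AssertionError:
    plain_raised = True

opt1_result = opt1(-1)   # level 1 (-O): the assert is gone
opt1_doc = opt1.__doc__  # the docstring is still present
opt2_doc = opt2.__doc__  # level 2 (-OO): the docstring is removed too

# Optimized cache files carry an "opt-" tag in their name.
opt1_name = importlib.util.cache_from_source("spam.py", optimization=1)

print(plain_raised, opt1_result, opt1_doc, opt2_doc)
print(opt1_name)
```

This is why code that relies on assert for input validation, or on __doc__ at run time, can behave differently under -O and -OO.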
Performance Considerations
A program does not run faster when read from a .pyc file compared to a .py file; the only advantage of .pyc files is the speed with which they are loaded.
For more detailed information, including a flow chart of the process, refer to PEP 3147.
Python Standard Modules and the sys Built-in Module
Python comes with a comprehensive library of standard modules. Some modules are built into the interpreter to provide access to operations that are not part of the core language but are essential for efficiency or interfacing with operating system primitives, such as system calls. The availability of these modules depends on the underlying platform. For example, the winreg module is only available on Windows systems.
One particularly noteworthy module is sys, which is built into every Python interpreter. The variables sys.ps1 and sys.ps2 define the primary and secondary prompt strings used in interactive mode:
>>> import sys
>>> sys.ps1
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'C> '
C> print('Yuck!')
Yuck!
C>
These variables are only defined if the interpreter is in interactive mode.
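Because sys.ps1 exists only in interactive mode, its presence is sometimes used as a quick check for whether code is running at a prompt. The helper below is a small sketch of that idea, and the name is_interactive is an assumption made for this example, not something the sys module provides.

```python
import sys

def is_interactive():
    """Return True if the interpreter defines a primary prompt (sys.ps1)."""
    return hasattr(sys, "ps1")

# getattr with a default avoids an AttributeError when running as a script.
prompt = getattr(sys, "ps1", None)

print(is_interactive(), repr(prompt))
```

Run as a script, this reports that no prompt is defined. Typed at the interactive prompt, it reports the current prompt string.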
The sys.path variable is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path taken from the PYTHONPATH environment variable, or from a built-in default if PYTHONPATH is not set. You can modify sys.path using standard list operations:
>>> import sys
>>> sys.path.append('/ufs/guido/lib/python')
This dynamic modification of sys.path allows you to control where Python looks for modules, facilitating customization of the module search path. For more detailed information, refer to the Python Library Reference.
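Since sys.path is an ordinary list, changes to it can be scoped and undone. One possible pattern, sketched here under the assumption that a temporary change is what you want, wraps the append in a context manager so the path is restored afterwards. The names extra_import_dir and greeting_demo are hypothetical.

```python
import contextlib
import importlib
import os
import sys
import tempfile

@contextlib.contextmanager
def extra_import_dir(directory):
    """Temporarily append a directory to the module search path."""
    sys.path.append(directory)
    importlib.invalidate_caches()
    try:
        yield
    finally:
        sys.path.remove(directory)

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module that lives outside the normal search path.
    with open(os.path.join(tmp, "greeting_demo.py"), "w") as f:
        f.write("MESSAGE = 'hello from an added directory'\n")

    with extra_import_dir(tmp):
        import greeting_demo
        message = greeting_demo.MESSAGE

    path_restored = tmp not in sys.path

print(message, path_restored)
```

The module stays loaded after the directory is removed from sys.path, because already-imported modules are kept in sys.modules.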
Exploring the dir() Built-in Function in Python
The built-in dir() function in Python is a handy tool to discover the names a module defines. It returns a sorted list of strings representing the names.
For instance:
import fibo, sys
print(dir(fibo))
# Output (abridged; other double-underscore names omitted): ['__name__', 'fib', 'fib2']
print(dir(sys))
# Output: ['__breakpointhook__', '__displayhook__', '__doc__', '__excepthook__', ...]
When used without arguments, dir() lists the names you have currently defined:
a = [1, 2, 3, 4, 5]
import fibo
fib = fibo.fib
print(dir())
# Output: ['__builtins__', '__name__', 'a', 'fib', 'fibo', 'sys']
This list includes all types of names: variables, modules, functions, etc.
However, dir() does not list the names of built-in functions and variables. For that, you can use the builtins module:
import builtins
print(dir(builtins))
# Output: ['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', ...]
Using dir(), you can easily inspect the contents of modules and the current environment, making it a powerful tool for understanding and debugging your code.
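In practice, the list returned by dir() for a module also contains bookkeeping names that start with an underscore, so it is common to filter them out. The sketch below does this for an in-memory module built with types.ModuleType. The name fibo_inspect_demo and its stub functions are placeholders for illustration.

```python
import builtins
import types

# Build a small in-memory module with two stub functions.
mod = types.ModuleType("fibo_inspect_demo")
exec("def fib(n):\n    pass\n\ndef fib2(n):\n    pass\n", mod.__dict__)

# dir() returns a sorted list; keep only the public names.
public = [name for name in dir(mod) if not name.startswith("_")]

has_dunder_name = "__name__" in dir(mod)    # bookkeeping names are listed too
print_is_builtin = "print" in dir(builtins)  # built-ins live in the builtins module

print(public, has_dunder_name, print_is_builtin)
```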
Summary
In summary, using scripts and modules in Python helps in organizing, maintaining, and reusing code effectively. Happy coding!