Mod_Python – Integrating Python with Apache

Gregory Trubetskoy

Abstract

Mod_python [1] is an Apache server [2] module that embeds the Python
interpreter within the server and provides an interface to Apache
server internals as well as a basic framework for simple application
development in this environment. The advantages of mod_python are
versatility and speed.

This paper describes mod_python with a focus on its implementation,
philosophy and challenges.

It is intended for an audience already familiar with web
application development in general and Apache in particular, and
preferably with mod_python itself. Knowledge of C and some
understanding of Python internals are helpful as well.

Project Goal

Quite simply, the goal is the integration of Python and Apache.
Apache is a sort of Swiss Army knife of web serving, especially the
upcoming 2.0 version, which does not limit itself to HTTP but can
serve any protocol for which there exists a module. Mod_python aims to
provide direct access to the riches of this functionality for Python
developers.

While speed is definitely a key benefit of mod_python and is taken
very seriously during design decisions, it would be wrong to identify
it as the sole reason for mod_python’s existence.

At least for now, providing “inline Python” type
functionality à la PHP [15] is not a goal
of this project. This is because the integration with Apache can
still use a lot of improvement, and there does not seem to be a clear
consensus within the Python community on how to embed Python code in
HTML, with quite a few modules floating around, each doing it its
own way.

Project Status

Mod_python was initially released in April 2000 as a replacement
for an earlier project called Httpdapy [3] (1998), which in turn was a
port to Apache of Nsapy [4] (1997). Nsapy was based on an embedding
example by Aaron Watters in the Internet Programming with Python [5]
book.

Mod_python is stable enough to be used in production. The latest
stable version at the time of this writing is 2.7.6. This version is
written for version 1.3 of the Apache server. All of the development
effort these days is focused on the next major version of mod_python,
3.0, which will support the upcoming Apache 2.0.

Quick Intro

Mod_python consists of two components – an Apache dynamically
loadable module mod_python.so (this module can also be
statically linked into Apache) and a Python package mod_python.

Assuming that mod_python is loaded into Apache, consider this
configuration excerpt:

DocumentRoot /foo/bar

AddHandler python-program .py
PythonHandler hello

The following script named hello.py resides in the /foo/bar
directory:

from mod_python import apache

def handler(req):
    # Send the HTTP headers, then the response body.
    req.send_http_header()
    req.write("hello %s" % req.remote_host)
    return apache.OK

A request to http://yourdomain/somefile.py would result
in a page showing “hello 1.2.3.4” where
1.2.3.4 is the IP of the client.

Just about every mod_python script begins with “from
mod_python import apache”. apache is a
module inside the mod_python package that provides the
interface to Apache constants (such as OK) and many
useful functions. Note also the Request object req,
which provides information about the current request, the connection
and an interface to more internal Apache functions, in this example
the send_http_header() method to send HTTP headers and the write()
method to send data back to the client.
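
Because a handler is just a function that takes a request object, its logic can be tried outside Apache against a stand-in. The sketch below is only an illustration under stated assumptions: FakeRequest and the module-level OK constant are hypothetical substitutes for the real Request object and apache.OK, and remote_host is filled in by hand.

```python
# Stand-ins for illustration only; inside Apache these come from mod_python.
OK = 0  # Apache defines the OK status as 0


class FakeRequest:
    """Hypothetical stub mimicking the few Request members used above."""

    def __init__(self, remote_host):
        self.remote_host = remote_host
        self.headers_sent = False
        self.body = []

    def send_http_header(self):
        self.headers_sent = True

    def write(self, data):
        self.body.append(data)


def handler(req):
    req.send_http_header()
    req.write("hello %s" % req.remote_host)
    return OK


req = FakeRequest("1.2.3.4")
status = handler(req)
```

After the call, status holds the OK value and the collected body is the greeting for the supplied address.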

Apache Modules, Request Phases and Mod_python

Apache processes incoming requests in phases. A phase is
one of a series of small tasks that each need to take place to
service a request. For example, there is a phase during which a URI
is mapped to a file on disk, a phase during which authentication
happens, a phase to generate the content, etc. Altogether, Apache 1.3
has 10 phases (11 if you consider clean-ups a phase).

The key architectural feature of the Apache server is that it can
allow a module to process any phase of a request. This way a
module can augment the server behavior in any way whatsoever. (Module
in this context does not refer to a Python module; an Apache module
is usually a shared library or DLL that gets loaded at server
startup, though modules can also be statically linked with the
server.)

Mod_python is an Apache module. What makes it different from most
other Apache modules is that it doesn’t do anything by itself; it
provides the ability to do in Python what Apache modules written in C
do. To put it another way, it delegates phase processing
to user-written Python code.

This figure shows a diagram of Apache request processing.

Each Apache module can provide a handler function for any of the
request processing phases. There are 4 types of return values
possible for every handler.

  1. DECLINED means the module declined to handle this phase;
    Apache moves on to the next module in the module list.

  2. OK means that this phase has been processed; Apache will move
    on to the next phase without giving any more modules an opportunity
    to handle this phase.

  3. An error return (which is any HTTP [7]
    error constant) will cause Apache to produce an error page and jump
    to the Logging phase.

  4. A special value of DONE means the whole request has been
    serviced; Apache will jump to the Logging phase.
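
The four return values can be modelled in a few lines of plain Python. This is a minimal sketch of the rules as stated above, not Apache's actual dispatch code: the numeric values of OK, DECLINED and DONE follow Apache's httpd.h, while run_phase, run_request, the sample handlers and the dictionary used as a request are hypothetical names introduced for illustration.

```python
# Status values as defined in Apache's httpd.h.
OK = 0
DECLINED = -1
DONE = -2
HTTP_FORBIDDEN = 403


def run_phase(handlers, req):
    """Ask each module's handler in turn until one does not decline."""
    for handler in handlers:
        status = handler(req)
        if status != DECLINED:
            return status
    return DECLINED  # every module declined this phase


def run_request(phases, req):
    """Run phases in order; DONE or an error status jumps to logging."""
    trace = []
    for name, handlers in phases:
        status = run_phase(handlers, req)
        trace.append((name, status))
        if status not in (OK, DECLINED):
            break  # DONE or an HTTP error: skip the remaining phases
    trace.append(("logging", OK))
    return trace


def add_header(req):
    req["X-Seen"] = "yes"
    return DECLINED  # did some work, but lets other modules handle the phase


def deny(req):
    return HTTP_FORBIDDEN


def content(req):
    req["body"] = "hi"
    return OK


trace = run_request([("access", [add_header, deny]), ("content", [content])], {})
```

Here the access phase ends with an error status, so the content phase never runs and the trace goes straight to logging.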

The DECLINED return is somewhat deceiving, because many modules
actually perform some action and then return DECLINED to give other
modules an opportunity to handle the phase. The example below
illustrates how the DECLINED return can be used in a handler that
inserts a silly reply header into every request:

from mod_python import apache

def fixup(req):
    # Add the header, then decline so other modules still see this phase.
    req.headers_out["X-Grok-this"] = "Python-Psychobabble"
    return apache.DECLINED

At this point it should be a bit clearer how this functionality
differs from the CGI environment. Comparing CGI with mod_python is not
very meaningful, because the scope of CGI is much narrower. One
difference is that CGI is intended exclusively for dynamic content
generation, which is not a requirement for mod_python scripts. For
example, consider a mod_python script that implements a custom
logging mechanism for the entire server, which plays no role in
content generation.

Apache Objects

Apache request processing makes use of a few important C
structures, access to which is available through mod_python.

request_rec – the Request Record

request_rec is probably the largest and most
frequently encountered structure. It contains all the information
associated with processing a request (about 50 members total).

Mod_python provides a wrapper around request_rec, a
built-in type mp_request. The mp_request
type is not meant to be used directly. Instead, each mod_python
handler gets a reference to an instance of a Request
class, a regular Python class which is a wrapper around mp_request
(which is a wrapper around request_rec). This is so that
mod_python users can attach their own attributes to the Request
instance as a way to maintain state across different phases.
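
The reason for the extra layer can be shown with a small sketch. It assumes a stand-in for the built-in type (BuiltinRequest, with made-up fields) that, like many C-implemented types, does not accept new attributes; the Request class here is a simplified illustration of the wrapping idea, not mod_python's own class.

```python
class BuiltinRequest:
    """Stand-in for the C-level mp_request type (hypothetical fields)."""
    __slots__ = ("uri", "method")  # no instance dict: new attributes are refused

    def __init__(self, uri, method):
        self.uri = uri
        self.method = method


class Request:
    """Regular Python class that delegates to the wrapped built-in object."""

    def __init__(self, wrapped):
        self._req = wrapped

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall through to the wrapped object.
        return getattr(self._req, name)


raw = BuiltinRequest("/index.py", "GET")
req = Request(raw)

# An earlier phase can stash state on the wrapper ...
req.user_id = 42

# ... while the underlying object rejects the same assignment.
try:
    raw.user_id = 42
    raw_accepts_attributes = True
except AttributeError:
    raw_accepts_attributes = False
```

A later phase that receives the same Request instance can read req.user_id back, while req.uri and req.method still come from the wrapped object.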

The Request class provides methods for sending
headers and writing data to the client.

conn_rec – the Connection Record

conn_rec keeps all the information associated with
the connection. It is a separate structure from request_rec
because HTTP [7] allows for multiple requests
to be serviced over the same connection.

The connection record is accessible in mod_python through the
mp_conn built-in type, a reference to which is always
available via the connection member of the Request
object (req.connection).

server_rec – the Server Record

server_rec keeps all the information associated with
the virtual server, such as the server name, its IP, port number,
etc. It is available via the server member of the
Request object (req.server).

ap_table – Apache table

All key/value lists (for example RFC 822 [9]
headers) in Apache are stored in tables. A table is a
construct very similar to a Python dictionary, except that both keys
and values must be strings, key lookups are case insensitive and a
table can have duplicate keys. Internally, Apache tables differ from
Python dictionaries in that lookups do not use hashing, but rather
a simple sequential search (although there was a proposal to use
hashing in Apache 2.0).

Mod_python provides a wrapper for tables, an mp_table
object, which acts very much like a Python dictionary. If there are
duplicate keys, mp_table will return a list. To allow
addition of duplicate keys, mp_table provides an add()
method.

Here is some code to illustrate how mp_table acts:

from mod_python import apache

def handler(req):
    t = apache.make_table()
    t["Set-Cookie"] = "Foo: bar;"
    t.add("Set-Cookie", "Bar: foo;")   # add() allows a duplicate key

    s = t["Set-Cookie"]   # s is ["Foo: bar;", "Bar: foo;"]

    return apache.DECLINED
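
For readers without a running server, the table semantics described above can be approximated in pure Python. The Table class below is a toy model written for this illustration, not the mp_table implementation; in particular, having plain assignment replace all existing entries for a key is an assumption of the sketch.

```python
class Table:
    """Toy model of an Apache table: string keys and values, case-insensitive
    lookups, duplicate keys allowed, sequential search instead of hashing."""

    def __init__(self):
        self._items = []  # (key, value) pairs in insertion order

    def __setitem__(self, key, value):
        # Assumption: assignment replaces every existing entry for the key.
        lowered = key.lower()
        self._items = [(k, v) for k, v in self._items if k.lower() != lowered]
        self._items.append((key, value))

    def add(self, key, value):
        # add() keeps existing entries, so duplicates become possible.
        self._items.append((key, value))

    def __getitem__(self, key):
        lowered = key.lower()
        found = [v for k, v in self._items if k.lower() == lowered]  # linear scan
        if not found:
            raise KeyError(key)
        return found[0] if len(found) == 1 else found


t = Table()
t["Set-Cookie"] = "Foo: bar;"
t.add("set-cookie", "Bar: foo;")
cookies = t["SET-COOKIE"]
```

Looking up the key in any letter case returns both values as a list, while a key with a single entry returns a plain string.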

Subinterpreters

The Python C API has a function to initialize a sub-interpreter,
Py_NewInterpreter(). Here is an excerpt from the Python/C
API Reference Manual [6] documenting this function:

Create a new sub-interpreter. This is an (almost) totally
separate environment for the execution of Python code. In particular,
the new interpreter has separate, independent versions of all
imported modules, including the fundamental modules __builtin__ ,
__main__ and sys . The table of loaded modules (sys.modules) and the
module search path (sys.path) are also separate. The new environment
has no sys.argv variable. It has new standard I/O stream file objects
sys.stdin, sys.stdout and sys.stderr (however these refer to the same
underlying FILE structures in the C library).

This valuable feature of Python is not available from within
Python itself, so most Python users are not even aware of it. But it
makes good sense to take advantage of this functionality for
mod_python, where one Apache process can be responsible for any
number of unrelated applications at the same time. By default,
mod_python creates a subinterpreter for each virtual server, but this
behavior can be altered.

When a subinterpreter is created, a reference to it is saved in a
Python dictionary keyed by subinterpreter names, which are always
strings. This dictionary is internal to mod_python.

During phase processing, prior to executing the user Python code,
mod_python has to decide which interpreter to use. By default, the
interpreter name is the name of the virtual server, which is
available via the req->server->server_hostname Apache
variable. If PythonInterpPerDirectory is On,
the name of the interpreter is the directory being accessed
(from req->filename); with PythonInterpPerDirective On, it is
the directory where the Python*Handler directive currently
in effect is specified (which can be some parent directory). The
interpreter name can also be forced using the PythonInterpreter
directive.

Once mod_python has a name for the interpreter, it checks the
dictionary of subinterpreters for this name. If the name exists,
mod_python switches to that subinterpreter; otherwise a new
subinterpreter is created.
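
The naming rules and the get-or-create lookup can be sketched as follows. This is an illustration only: the real logic lives in C inside mod_python, the precedence order among the directives is an assumption of this sketch, plain dictionaries stand in for subinterpreters, and the function and parameter names are hypothetical.

```python
import posixpath

_interpreters = {}  # interpreter name -> interpreter (plain dicts stand in here)


def interpreter_name(config, server_hostname, filename, directive_dir):
    """Pick the interpreter name following the rules in the text.

    The order in which the directives are checked is an assumption.
    """
    if config.get("PythonInterpreter"):
        return config["PythonInterpreter"]            # forced name
    if config.get("PythonInterpPerDirectory") == "On":
        return posixpath.dirname(filename) + "/"      # directory being accessed
    if config.get("PythonInterpPerDirective") == "On":
        return directive_dir                          # where Python*Handler was set
    return server_hostname                            # default: virtual server


def get_interpreter(name):
    """Return the cached interpreter for name, creating it on first use."""
    if name not in _interpreters:
        _interpreters[name] = {"name": name, "modules": {}}
    return _interpreters[name]


name = interpreter_name({}, "www.example.com", "/foo/bar/hello.py", "/foo/")
interp = get_interpreter(name)
```

With an empty configuration the name is the virtual server's host name, and repeated calls to get_interpreter() with that name return the same object.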

Phase Processing Inside Mod_python

After mod_python has been given control by Apache to process a
phase of a request, it steps through the following actions. (This is
a simplified list.)

  • Determine the interpreter to use by looking at directives
    currently in effect, possibly the server name and the directory.

  • Get/Create a subinterpreter.

  • Get/Create a CallBack object. The CallBack object is a
    Python object whose methods provide all the functionality
    implemented in Python.

  • Create an mp_request object. (For performance
    reasons, mp_conn and mp_server objects are
    created on demand, so if the user code never refers to them they
    are never created.)

  • Call CallBack.Dispatch() passing it a reference
    to mp_request and the name of the phase being
    processed.

  • (From here on all the processing is done in Python rather
    than C)

  • Instantiate a Request object, a wrapper around
    mp_request.

  • Set up sys.path by prepending (if not already there) the
    directory being accessed.

  • Import (or, if the modification date is later than the last
    import, reload) the Python module specified in the configuration.

  • Locate the handler function/object inside the module.

  • Call the user function/object, passing it a reference to the
    Request object.

  • Return the return value to mod_python.

  • (At this point execution moves back from Python to C)

  • Mod_python returns the return value and control to Apache.
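
The Python half of the list above can be approximated with the standard importlib module. This is a minimal sketch under stated assumptions, not the code of the CallBack object: import_or_reload, dispatch and the _mtimes cache are hypothetical names, and the default handler name "handler" is taken from the Quick Intro example.

```python
import importlib
import os
import sys

_mtimes = {}  # module name -> file modification time seen at the last (re)load


def import_or_reload(name, directory):
    """Import module `name` from `directory`, reloading it if the file changed."""
    if directory not in sys.path:
        sys.path.insert(0, directory)  # prepend the directory being accessed
    mtime = os.path.getmtime(os.path.join(directory, name + ".py"))
    module = sys.modules.get(name)
    if module is None:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
    elif mtime > _mtimes.get(name, 0):
        module = importlib.reload(module)
    _mtimes[name] = mtime
    return module


def dispatch(req, module_name, directory, handler_name="handler"):
    """Locate the handler inside the configured module and call it with req."""
    module = import_or_reload(module_name, directory)
    handler = getattr(module, handler_name)
    return handler(req)
```

Calling dispatch(req, "hello", "/foo/bar") would import /foo/bar/hello.py on the first request, reuse the loaded module afterwards, and reload it once the file on disk is newer than the copy in memory.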

Memory Management and Cleanups

Memory management is always a challenge for long running
processes. One has to be very careful to always remember to free all
memory allocated during request processing, no matter what errors
take place.

To combat this problem, Apache provides memory pools. The
Apache API has a rich set of functions for allocating memory,
manipulating strings, lists, etc., and each of these functions always
takes a pool pointer. For example, instead of allocating memory using
malloc() et al, Apache modules allocate memory using
ap_palloc() and passing it a pool pointer. All memory
allocated in such a way can then be freed at once by destroying the
pool. Apache creates several pools with varying lifetimes, and
modules can create their own pools as well. The pool probably used
the most is the request pool, which is created for every request and
is destroyed at the end of the request.

Unfortunately, the Python interpreter cannot use Apache pools. So
for the most part, the mod_python programmer is at the mercy of the
Python reference counting and garbage collecting mechanism (or lack
thereof). In most cases it works just fine. In those cases where you
do see the Apache process growing, the simplest solution is to
configure the server to recycle itself every few thousand requests
using the MaxRequestsPerChild directive.

Apache provides APIs to execute cleanup functions just before a
pool is destroyed. A cleanup is registered by calling the
ap_register_cleanup() C function which takes three
arguments: a pool pointer, a function pointer, and a void pointer to
some arbitrary data. Just before the pool is destroyed, the function
will be called and passed the pointer as the only argument.
Mod_python uses cleanups internally to destroy mp_request
and mp_table objects.

Cleanups are available to mod_python users via
req.register_cleanup() and
req.server.register_cleanup(). The former runs after
every request; the latter runs when the server exits.
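
The relationship between a pool and its cleanups can be modelled in a few lines. The Pool class below is a toy written for this illustration and does not allocate memory or call into Apache; running cleanups in reverse registration order and continuing past a failing cleanup are assumptions of the sketch.

```python
class Pool:
    """Toy model of an Apache pool: a lifetime with registered cleanups."""

    def __init__(self):
        self._cleanups = []

    def register_cleanup(self, func, data):
        # Mirrors the (function, data) pair described above.
        self._cleanups.append((func, data))

    def destroy(self):
        # Assumption: most recently registered cleanup runs first,
        # and one failing cleanup does not stop the others.
        while self._cleanups:
            func, data = self._cleanups.pop()
            try:
                func(data)
            except Exception:
                pass


released = []
request_pool = Pool()
request_pool.register_cleanup(released.append, "db connection")
request_pool.register_cleanup(released.append, "temp file")
request_pool.destroy()  # end of the request
```

Once the pool is destroyed, both resources have been released without the handler having to remember to do so on every error path.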

Standard Handlers

As an astute reader has probably noticed, mod_python (or rather
Apache) associates a handler with a directory (SetHandler) or
a file type (AddHandler), but not a specific file. In the
quick example at the beginning of this paper it really doesn’t matter
what file is being accessed in the “/foo/bar” directory.
As long as its name ends with .py, the same hello handler will
be invoked, always yielding the same result. In fact, the file
referred to in the URI doesn’t even need to exist.

A natural question would then be “Why can’t I access multiple
mod_python scripts in one directory?” (or “This isn’t very
useful!”). The answer here is that mod_python expects there to
be an intermediate layer between it and the application. This layer
(handler) is up to the user’s imagination, but a couple of functional
handlers (standard handlers) are bundled with mod_python.

CGI Handler (mod_python.cgihandler)

This handler is for users who want to use their existing CGI code
with mod_python. This handler sets up a fake CGI environment and runs
the user program. A couple of interesting implementation challenges
were encountered here.

At first, this handler set up the CGI environment through
the standard os.environ object. For whatever reason
(Python bug?) this frequent environment manipulation introduced a
memory leak (about a kilobyte per request), so as a quick hack,
os.environ was replaced with a regular dictionary
object. This works fine for the most part, but is a problem for
scripts that use the environment as a way to communicate with
subsequently called programs, notably some database interfaces which
expect database server information in an environment variable.

Another problem was that, since cgihandler uses import/reload to
run a module, “indirect” module imports by the “main”
module would become no-ops after the first hit. This became a problem
for users who expected the top level code in those indirectly
imported modules to be executed for every hit. To solve this problem,
cgihandler now examines the sys.modules variable before
and after importing the user scripts, and in the end, deletes any
newly appeared modules from sys.modules, causing those
modules to be imported again next time.
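
The sys.modules bookkeeping can be illustrated with a short sketch. It shows the technique described above in isolation and is not the cgihandler source; run_fresh is a hypothetical name, and the module is assumed to be importable from sys.path.

```python
import importlib
import sys


def run_fresh(module_name):
    """Import module_name, then forget every module that the import pulled in."""
    before = set(sys.modules)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Anything that appeared during the import is removed again, so the
        # next call re-executes the top level code of those modules too.
        for name in set(sys.modules) - before:
            del sys.modules[name]
```

Modules that were already loaded before the call, such as the standard library modules used by the server itself, are left alone, so only the script and its own helper modules are re-imported on the next hit.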

Last but not least, the CGI specification [14]
strongly recommends that the server set the current directory to the
directory in which the script is located. There is no thread-safe way
of changing the current directory, so in a multithreaded environment
(e.g. Win32) the cgihandler uses a thread lock that is held for as
long as the script runs, essentially forcing the server to process one
cgihandler request at a time.

Given all of the above problems, the cgihandler is not a
recommended development environment, but is regarded as a stopgap
measure for users who have a lot of legacy CGI code. It should be
used with caution and only if really necessary.

Publisher Handler (mod_python.publisher)

The publisher handler is probably the best way to start writing
web applications using mod_python. The functionality of the publisher
handler was inspired by ZPublisher, a component of Zope [10].

The idea is that a URI is mapped to some object inside a module,
the “/” in the URI having the same meaning as a “.”
in Python. So http://somedomain/somedir/module/object/method would
invoke method method of object object inside module module in
directory somedir, and the return value of the method would be sent
to the client.

Here is a “hello world” example:

def hello(req, who="nobody"):
    return "Hello, %s!" % who

If the file containing this code is called myapp.py in
directory somedir, then the hello function can
be accessed via http://somedomain/somedir/myapp/hello, which should
result in a page showing “Hello, nobody!”, whereas
http://somedomain/somedir/myapp/hello?who=John should result in
“Hello, John!”.

Note that the first argument is a Request object,
which means all the advanced mod_python functionality is still
available when using the publisher handler.
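
The URI-to-object mapping can be sketched with getattr. This is a simplified illustration of the idea rather than the publisher's implementation: resolve and publish are hypothetical names, the query string is assumed to be already parsed into a dictionary, and refusing names that start with an underscore is an assumption added for safety.

```python
import types


def resolve(module, path):
    """Walk 'object/method' style path segments through attributes of module."""
    obj = module
    for segment in path.strip("/").split("/"):
        if segment.startswith("_"):
            raise LookupError(segment)  # assumption: never expose private names
        obj = getattr(obj, segment)
    return obj


def publish(module, path, req, query):
    """Call the resolved object with req plus the query-string arguments."""
    target = resolve(module, path)
    return target(req, **query)


# A module object standing in for myapp.py.
myapp = types.ModuleType("myapp")


def hello(req, who="nobody"):
    return "Hello, %s!" % who


myapp.hello = hello

page_default = publish(myapp, "/hello", None, {})
page_john = publish(myapp, "/hello", None, {"who": "John"})
```

The two calls correspond to requesting /somedir/myapp/hello with and without the who=John query string.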

Debugging

Debugging mod_python applications can be difficult. Mod_python
provides support for the Python debugger (pdb) via the
PythonEnablePdb configuration directive, but its usability is
limited because the debugger is an interactive tool that uses
standard input and output and therefore can only be used when Apache
is running in foreground mode (-X switch in Apache 1.3 or
-DONE_PROCESS in 2.0).

Mod_python sends any traceback information to the server log, and
with the PythonDebug directive set to On (the default is Off),
the traceback information is also sent to the client.

For programmers who like to use the print statement
as a debugging tool, the technique favored by the author is to
instead raise a variable, optionally surrounded by “`”
(back quotes), from any point in the code with the PythonDebug
directive On. This makes the value of the variable appear in the
browser and is as effective as print.
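
In current versions of Python only exception instances can be raised, so the same trick needs a small wrapper. The helper below is a sketch of that adaptation; dump is a hypothetical name, and the snippet only shows that the value ends up in the traceback text, which is what PythonDebug would forward to the client.

```python
import traceback


def dump(value):
    """Abort the handler and surface repr(value) in the traceback."""
    raise RuntimeError(repr(value))


def handler(req):
    settings = {"user": "john", "retries": 3}
    dump(settings)  # execution stops here; the value shows up in the traceback


try:
    handler(None)
except RuntimeError:
    report = traceback.format_exc()
```

The captured report ends with a RuntimeError line carrying the repr of the dictionary.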

Threads

Mod_python is thread-safe and runs fine on Win32, where Apache is
multithreaded.

One should be careful to make sure that any extension modules that
an application uses are thread-safe as well. For example, many
database access drivers on Windows are not thread safe, and some kind
of a thread lock needs to be used to make sure no two threads try to
run the driver code in parallel.
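
One way to apply such a lock is a decorator around every call into the driver. This is a minimal sketch of the pattern; locked is a hypothetical helper and run_query is a made-up stand-in for a real driver call.

```python
import threading

_driver_lock = threading.Lock()


def locked(func):
    """Decorator: allow only one thread at a time inside func."""
    def wrapper(*args, **kwargs):
        with _driver_lock:
            return func(*args, **kwargs)
    return wrapper


@locked
def run_query(sql):
    # Hypothetical stand-in for a call into a driver that is not thread safe.
    return "result of " + sql
```

A single shared lock is the simplest choice; it serializes all driver calls, so it trades throughput for safety.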

Interestingly, the Python interpreter itself isn’t completely
thread safe, and to run multiple threads it maintains a thread lock
that is released every 10 Python bytecode instructions to let other
threads run. The negative impact of that, if any, is most likely
negligible.

On Design and Implementation

Mod_perl

Those familiar with mod_perl [11] will
notice that some functionality of mod_python is remarkably similar to
mod_perl; for example, the names of the Apache configuration
directives are exactly the same, except that the word Perl is replaced
with Python.

It would be wrong not to say that much of mod_python
functionality, especially in the area of Apache configuration, was
intentionally made functionally similar to mod_perl. Under the
hood they have next to nothing in common, mainly because Perl and
Python interpreters are quite different.

There were good reasons for similarities though. First, there is
no sense in reinventing the wheel – mod_perl has encountered and
solved many problems just as applicable to mod_python. Second, since
both projects had similar goals, except the language of choice was
different, it made sense to keep the outside look consistent,
especially the Apache configuration. Oftentimes the person who has to
deal with the Apache config is a system administrator, not a
programmer, and consistency makes the system administrator’s job
easier.

Python vs C

In a web application environment speed and low overhead are
extremely important. Many people don’t appreciate how important
this really is until their site gets featured on a high-volume
site (the so-called “/. effect”) and, instead of getting
lots of hard-earned publicity, they get a bunch of frustrated web
surfers trying to reach a site so overloaded that no one can access
it.

Considering this angle, C always wins over Python. If the author
of mod_python had more time, a much larger percentage of mod_python
would be implemented in C. But given the length of time it takes to
write quality C code, initially a decision was made to implement in C
only those parts which cannot be done in Python.

SWIG

SWIG [13] was given some consideration as a
tool to provide the mapping to Apache C structures (such as
request_rec). The main advantages of SWIG are the speed and ease with
which an interface to a C library can be created. There are a few
problems with it, though: the resulting C code is not necessarily
meant to be easy to read, and SWIG itself becomes yet another tool
that is required for compilation in an already pretty complicated
build environment. Altogether, for a long-term project like
mod_python, where quality is more important than the timeline, SWIG
does not seem to be the right choice.

Future Direction and Apache 2.0

As has been mentioned before, the main focus of development today
is compatibility with Apache 2.0. Apache 2.0 is architecturally quite
a bit different from its predecessor (1.3), so much so that it would
not be very easy or practical to try to write code that works with
both 1.3 and 2.0. It is possible, but the code becomes a tangle of
#ifdef statements because the majority of the API
functions have been renamed. So the next major version of mod_python
will support Apache 2.0 only.

Apache 2.0 is actually a combination of two software packages. One
is the server itself, the other is the underlying library, the Apache
Portable Runtime (APR) [12]. The APR is a general purpose library
designed to provide functionality common in daemons of all kinds and
to abstract the OS specifics (thus “Portable”). Future
versions of mod_python will eventually provide an interface to a large
part, or perhaps all, of the APR.

Another big improvement in 2.0 is the introduction of filters
and connection handlers. The alpha version of
mod_python 3.0 already supports filters. (A filter would be the right
place to implement inline Python.) A connection handler is a handler
at a level below HTTP. Using a connection handler one could implement
an entirely different protocol, e.g. FTP. At the time of this writing
mod_python 3.0 alpha does not support connection handlers, but such
support is in the plans.

References


[1] Mod_python. http://www.modpython.org/

[2] Apache HTTP Server. http://httpd.apache.org/

[3] Httpdapy. http://www.ispol.com/home/grisha/httpdapy

[4] Nsapy. http://www.ispol.com/home/grisha/nsapy

[5] Aaron Watters, Guido van Rossum, James C. Ahlstrom, Internet
Programming with Python, M&T Books, 1996.

[6] Guido van Rossum, Fred L. Drake, Jr, Python/C API Reference
Manual, PythonLabs. http://www.python.org/doc/current/api/

[7] R. Fielding, UC Irvine, J. Gettys, J. Mogul, DEC, H. Frystyk,
T. Berners-Lee, MIT/LCS, “Hypertext Transfer Protocol -- HTTP/1.1”,
RFC 2068, IETF, January 1997. http://www.ietf.org/rfc/rfc2068.txt

[9] Crocker, D., “Standard for the Format of ARPA Internet Text
Messages”, STD 11, RFC 822, UDEL, August 1982.
http://www.ietf.org/rfc/rfc822.txt

[10] Zope. http://www.zope.org/

[11] Mod_perl, Apache/Perl Integration. http://perl.apache.org/

[12] Apache Portable Runtime. http://apr.apache.org/

[13] Simplified Wrapper and Interface Generator. http://www.swig.org/

[14] Ken A L Coar, The WWW Common Gateway Interface Version 1.1.
http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt

[15] PHP. http://www.php.net/
