PHP - Optimisation
From LXF Wiki
Practical PHP Programming
(Original version written by Paul Hudson for Linux Format magazine issue 41.)
What's faster than PHP code? Surely nothing! We show you how to make your scripts run 326x faster!
Everyone knows that PHP is faster than a speeding ticket, but can it be made to go faster? C programmers have for years trumpeted the fact that their language is extremely fast and therefore capable of handling performance-critical tasks. However, very often you'll find that when C programmers /really/ need performance, they use inline assembly code.
Open up your Linux kernel source (you /do/ have the kernel source to hand, right?), and pick what you consider to be a CPU-intensive operation. I chose arch/i386/lib/mmx.c, the code that handles MMX/3dNow! instructions on compatible chips. Inside this file you'll see lots of quite complicated C code, but also extended stretches of assembly code wherever speed is critical. In fact, if you change directory to the root of the Linux kernel tree, try this command:
grep "__asm__" ./ -rl | wc -l
That searches all the files in the Linux source distribution for instances of assembly, and counts the number of files that match. In 2.5.65, the number of files in the kernel source that use assembly once or more is the rather ominous number of 666! So, C programmers using assembly is quite a widespread thing.
PHP programmers, although blessed with a naturally fast language, can also use a lower-level language for speed critical operations - although in our case, C is next on the food chain. While it's possible to use assembly code from PHP (through C, as C programmers do), there's more than enough speed improvement just switching to C, so that's what I will be covering here.
Please note that, within the extent of the space available, this is a no-holds-barred article - prior knowledge of C is required, knowledge of assembly would be good, and /very/ good knowledge of PHP is mandatory. Furthermore, in order to provide the most detailed description of how things work, this tutorial has been split into two parts. I hope you will agree it's worth it!
The C Perspective
PHP itself is written in C, as are Flex and Bison, the lexer and parser generators that PHP uses to build its own lexer and parser. The process of executing PHP code works by matching various parts of code against pre-defined lists of acceptable grammar. For example:
T_IF T_LEFTBRACK T_CONDITION T_RIGHTBRACK T_LEFTBRACE T_STATEMENT T_RIGHTBRACE
In that piece of pseudo-grammar, T stands for "Token". It will match a statement that starts with "if", then an opening bracket, followed by any boolean condition, followed by a closing bracket, then an opening brace "{", a statement, then a closing brace "}". Sound familiar? PHP uses the same sort of rules -- although on a much more complicated level -- to parse your code.
PHP has hundreds of such rules, and, when it matches them, it calls appropriate internal C functions to handle the statement. For example, when PHP matches the following rule (this is direct from the PHP source code):
T_DOLLAR_OPEN_CURLY_BRACES T_STRING_VARNAME '[' expr ']' '}'
The Zend Engine will, amongst other things, call "fetch_array_begin(&$$, &$2, &$4 TSRMLS_CC)", which uses items 2 and 4 of the rule (T_STRING_VARNAME and expr) to read and return an array item. So, as you should be able to guess, that particular handler is for accessing arrays inside strings, e.g. ${foo['bar']}
Because the code to execute your script is just compiled C code, it means that no matter how fast your PHP code is, it still has to be interpreted then executed as normal C code. PHP is not compiled to native machine code at any point, so there is never any chance of it out-performing C, or generally even coming close to the performance of C.
So, the way to make your PHP code faster is to replace chunks of it with pure, compiled C. In PHP, this can be done in three ways: writing your own module, editing the PHP source code, or editing the Zend Engine.
Writing a module for PHP is the accepted way to add functionality, and there are many modules available in PHP to do all sorts of tasks. However, modules are the slowest way to add functionality, particularly if calls to dl() are required to dynamically load the module each time a script needs it.
Writing functions directly into the PHP source code is faster than using modules, but only really possible if you're working on your own server. Finally, writing functions directly into the Zend Engine provides the biggest performance boost, but basically confines your script to your own machine - not many would be willing to patch their Zend Engine code to try out your code! There is actually a surprising boost for shifting code into the Zend Engine - when Andrei Zmievski converted strlen() into a ZE statement as opposed to a function, he reported a 25% speed boost.
With such a big gain to be offered, you're probably thinking /everything/ should be put directly into the Zend Engine. However, it's important to realise that there's a big trade-off between speed and manageability, and generally modules come out top because they operate more than fast enough for most needs.
C vs PHP
To give you an idea of quite how much faster C is compared to PHP, I wrote a very simple C extension and compared it with its PHP equivalent.
Here's the PHP script:
<?php
    $start = time();

    for ($count = 1; $count < 1000000; ++$count) {
        $j = 0;
        for ($i = 0; $i < 999; ++$i) {
            $j += $i;
        }
    }

    echo "PHP time: ", time() - $start, " (number: $j)\n";

    $start = time();

    for ($count = 1; $count < 1000000; ++$count) {
        $result = lxf_hardwork();
    }

    echo "C time: ", time() - $start, " (number: $result)\n";
?>
lxf_hardwork() is the module function I've written in C. Don't worry about how to create and install modules yet - we'll get to that later. For now, here's the source code to the lxf_hardwork() function:
PHP_FUNCTION(lxf_hardwork)
{
    int i = 0;
    int j = 0;

    for (i = 0; i < 999; ++i) {
        j += i;
    }

    RETURN_LONG(j);
}
PHP_FUNCTION and RETURN_LONG are both C macros to avoid lots of complicated code in source files, and they can be ignored for now. The rest of the code simply performs exactly the same thing as the PHP code, just in C - as you can see, the two are very similar linguistically.
Executing the PHP script first runs through two loops adding up numbers, then runs through another loop and calls our C function. This could have been optimised further by putting the outer loop into the C code also, but leaving it inside PHP allowed me to tweak the number of iterations without a recompile.
When the script is run, it outputs how long both PHP and C took to execute the loops. If you're not sitting down, I suggest you grab onto something before reading on!
PHP took a total of 1,956 seconds to run through the loops. The C code, in comparison, took just /five seconds/ to do exactly the same. Of course, when you consider the loop is only 999,000,000 iterations and that this is an 800MHz PIII able therefore to do 800,000,000 operations a second, five seconds sounds quite a lot. However, the loop in the lxf_hardwork() function compiles down to the following assembly:
.L319:
    addl %eax,%edx
    incl %eax
    cmpl $998,%eax
    jle .L319
Starting from the label .L319, the code adds i to j, increments i by one, compares it against 998, and, if it's less than or equal to 998, re-does the loop. So, there are actually four instructions in there, one of which is a jump, which is a branch instruction and therefore incurs more of a speed hit than the others. So, albeit somewhat simplified, I hope you can see that five seconds really isn't all that much - it's as fast as the computer could go!
In the example code above, we saw a 326x speed improvement when switching to C. Naturally the example is hardly from a real world piece of code, but suffice to say that converting to C is likely to give a huge performance boost no matter what you choose to do with it.
Before we begin
If you're still reading, you're hopefully all set to write your own PHP extension. Extension writing in PHP is actually fairly easy, because the PHP team have put a lot of work into making the process as streamlined and fool-proof as possible. Furthermore, as you'll discover, the Zend Engine is a remarkable piece of software that really takes much of the hard work away from programmers. You will need to have the PHP source code on your system.
For the purpose of this tutorial, we'll be creating an extension for PHP that handles tar files. To do this, our extension will use the libtar library created by Mark D. Roth, available from http://freshmeat.net/projects/libtar/. libtar is available under the BSD licence, so we're free to use it for our needs. You'll need to have the libtar development files on your system.
Just to make sure we're all reading from the same songsheet, I want to briefly discuss the tar format. TAR (short for Tape ARchive) was designed to handle tape backups, but has been in general use for quite some time. Put simply, a tar file is a concatenation of files that are not compressed. Using tar, many files become one file, which can then be compressed using gzip or bzip2. Tar files by themselves are uncompressed, and approximately equal in size to the sum of the files they hold.
First steps
To get you started with a module, PHP includes "ext_skel", which creates the skeleton of an extension. To run ext_skel, go into the ext directory of the PHP source code, then type:
./ext_skel --extname=tar
ext_skel creates a tar.h file and a tar.c file to contain our code, tar.php to test that installation of the module has worked, config.m4, which is part of PHP's automatic build system (explained later), and also a default test file for your extension.
First off, open up tar.c and browse around. My version is 183 lines, encompassing pre-written code to do all sorts of tasks common to extensions. As you can see, using ext_skel saves you quite a lot of work!
Now, onto the config.m4. This is a pretty horrifying file at first, but it does require changing at least once. The m4 file is used by PHP's buildconf script to generate the configure script so that all the modules are configured by end users in one central location. Our config.m4 file needs just one or two minor changes to get it working.
Firstly, when running configure, modules are enabled using either --enable-x or --with-x. The difference here is that the --enable-x syntax is used when no special headers or libraries are required to compile the extension, whereas --with-x is for modules that reference external files. As our tar extension requires libtar to compile, we need to use --with-tar.
To achieve this, look for the line "dnl PHP_ARG_WITH(tar, for tar support,". The "dnl" part is a comment, so this line is ignored. To enable --with-tar support, remove the dnl from this line. Then, delete the next line altogether ("dnl Make sure that..."), and remove the dnl on the line after that ("dnl [ --with-tar...."). So, the lines should look like this:
PHP_ARG_WITH(tar, for tar support,
[  --with-tar             Include tar support])
There are a few other tweaks that need to be made to the file before we're finished with it, but it gets complicated - here's how your file should look:
dnl $Id$
dnl config.m4 for extension tar

PHP_ARG_WITH(tar, for tar support,
[  --with-tar             Include tar support])

if test "$PHP_TAR" != "no"; then
  SEARCH_PATH="/usr/local /usr"
  SEARCH_FOR="/include/libtar.h"
  if test -r $PHP_TAR/$SEARCH_FOR; then # path given as parameter
    TAR_DIR=$PHP_TAR
  else # search default path list
    AC_MSG_CHECKING([for tar files in default path])
    for i in $SEARCH_PATH ; do
      if test -r $i/$SEARCH_FOR; then
        TAR_DIR=$i
        AC_MSG_RESULT(found in $i)
      fi
    done
  fi

  if test -z "$TAR_DIR"; then
    AC_MSG_RESULT([not found])
    AC_MSG_ERROR([Please reinstall the tar distribution])
  fi

  PHP_ADD_INCLUDE($TAR_DIR/include)

  LIBNAME=tar
  LIBSYMBOL=tar_open

  PHP_CHECK_LIBRARY($LIBNAME,$LIBSYMBOL,
  [
    PHP_ADD_LIBRARY_WITH_PATH($LIBNAME, $TAR_DIR/lib, TAR_SHARED_LIBADD)
    AC_DEFINE(HAVE_TARLIB,1,[ ])
  ],[
    AC_MSG_ERROR([wrong tar lib version or lib not found])
  ],[
    -L$TAR_DIR/lib -lm -ldl
  ])

  PHP_SUBST(TAR_SHARED_LIBADD)

  PHP_NEW_EXTENSION(tar, tar.c, $ext_shared)
fi
Near the top you can see the "PHP_ARG_WITH(tar, for tar support," line. Other important lines are:
SEARCH_FOR="/include/libtar.h"
This locates the header file required for libtar, which is libtar.h. Also, these two lines are crucial:
LIBNAME=tar
LIBSYMBOL=tar_open
LIBNAME is used as part of the GCC compile line. In this case, -ltar is used. LIBSYMBOL should be set to a symbol contained in the LIBNAME library. tar_open() is a function contained in libtar, so that's what I've used for LIBSYMBOL. If you're wondering why this is important, configure actually writes out a short C program that calls the LIBSYMBOL function, then tries to compile and link that program against LIBNAME using GCC. If the compilation succeeds error free, it means that libtar.so exists and contains the reference we're looking for, which means it's a legit copy of libtar and not, for example, a file that is "Lopsided Igloo Bureau for Tuning All Radios". In other words, these three crucial lines all make sure the system is capable of compiling our extension.
Configure, compile, install, and test
Now we're done with config.m4 - cd back to the PHP source directory and type "./buildconf". This generates the configure script for PHP, and will include our new tar extension if all is well.
To make sure buildconf succeeded, type "./configure --help" and look for the line --with-tar. If the m4 file was good, you should see the line somewhere in there, and also "Include tar support" in the column next to it. On my screen, the "Include tar support" column is one character off to the left compared to the others. If you recall, the default m4 file had a line in there saying "Make sure that the comment is aligned" - this is what that comment was referring to. If your comment is out of alignment, add or remove spaces in the config.m4 file (line five, if you've used the above m4 file) to correct it.
--with-tar is there, although the description is a character out to the left
The next step is to run:
./configure --with-tar
You may want to add other PHP extensions to your configure line if you use them, however the above is enough to test our new extension.
As the output from configure flies by, you should see the following three lines somewhere in there:
checking for tar support... yes
checking for tar files in default path... found in /usr
checking for tar_open in -ltar... yes
The first line signals "yes" if --with-tar was specified on the command line. If --with-tar was used, configure checks for the location of the header file we specified (libtar.h), and outputs where it found it, which is line two. The final line is our library check, and makes sure that the tar_open symbol is in libtar.so. If any of these tests fail, configure will stop with a warning, and you can read config.log to see where the problem is.
Once configure is complete, type *make* to compile PHP and the tar extension. make is likely to take quite some time, depending on the speed of your computer.
Once make has finished, cd into ./sapi/cgi - this is where the PHP CGI SAPI is placed once built, pending installation. Type ./php -m to have PHP output a list of modules available - you should see "tar" in there, probably between "standard" and "tokenizer". If so, you've successfully compiled your first PHP module!
To perform a slightly better test, cd into the PHP source directory and run these commands:
su
make install
exit
php -f ext/tar/tar.php
tar.php was created by ext_skel and calls the function confirm_tar_compiled(), which is a default function defined in tar.h and tar.c that simply confirms the module compiled correctly. So, if your tar module works fine, you should see the message "Congratulations! You have successfully modified ext/tar/config.m4. Module tar is now compiled into PHP."
Conclusion
Having had a special 8-page PHP tutorial last month, it just plain wasn't possible to run another long tutorial this month -- at least not without renaming the mag "PHP Format"! So, this tutorial will be continued next month.
At this point, you've got a working extension to PHP - although it doesn't do much. Next month we'll look at how to use libtar in the extension by writing a function tar_list(). If you want to create your own extension in the future, simply repeat the steps covered in this issue - next issue will be libtar specific.
Over-optimisation
There's a fine line - be sure you don't cross it!
Can optimisation ever be taken too far? Reading through the /gcc/ man page, one can see all sorts of optimisation flags that can be passed in to theoretically make code run faster. For example, -ffast-math will "cheat" on some mathematics calls to make code run faster, whereas -funroll-loops will try to cut down the number of fixed loop iterations by increasing the code size. However, these optimisations need to be used with great care.
Without wishing to get into too much depth -- this article is after all about PHP -- optimisation in programming is generally a trade-off between size of code and speed of code - sometimes it's faster to use more CPU instructions than it is to use fewer, which results in larger executable size. However, a key exception to this rule is tight loops of code, where having more instructions inside the loop will make the CPU's instruction cache overflow, causing a speed hit.
Optimising compilers such as GCC, when instructed to optimise with the -Ox flag, generally aim to achieve maximum performance with the least increase in code size. However, with certain flags being used, "optimisation" can result in substantially slower code. For example, compiling PHP with *-g -O3 -ffast-math -fomit-frame-pointer -fexpensive-optimizations* executed the first script in 51 seconds. Adding -funroll-loops to that actually makes the script take 60 seconds to execute.
Porting code from PHP to C can often give a huge performance boost to your applications, but you need to be careful - switching to C makes it much easier to shoot yourself in the foot, or, worse, shoot your whole leg off!
Special Warning
It's not often you hear me say this, but the PHP manual is /not/ the best place to check for information on writing extensions. The reason for this is that the information in the online manual is an edited version of one of the chapters from Web Application Development with PHP 4.0 (Ratschiller & Gerken, New Riders). Although WAD is a good book in itself, it's quite old - much of what is in there just doesn't apply any more. The online version available in the PHP manual has a number of edits to bring the work up to speed, but the end result is that some information is correct and some is not - read with caution!

