PHP for Platform Engineers

August 3, 2018, by kex

Contents

  1. Basic Checklist
  2. Overview of the Ecosystem
  3. History & Going Forward
  4. The PHP Interpreter
  5. Performance
    1. Caching & Opcodes
  6. Serving PHP

Basic checklist:

As Platform Engineers we don’t necessarily need to learn how to write PHP code. However, when we design ways to deliver PHP code and troubleshoot the issues that can arise, we should have a good understanding of how the language works, along with some of its quirks. In this post I will try to dump what I know about PHP and how I approach the delivery of PHP. To get started, here is a list of basic PHP facts:

  • PHP is an interpreted language
  • It is a high level language
  • It is a server-side language that can be embedded within HTML or run as its own process
  • There are different ways to deliver/serve PHP; which one you choose depends primarily on your intentions for the platform
  • PHP is primarily a language used to write web applications. There are people who use PHP to do system configuration/general purpose scripting; they are bad people
  • PHP stands for ‘PHP: Hypertext Preprocessor’ (because everyone loves recursive acronyms, right?)

Overview of Ecosystem:

According to w3techs.com, the vast majority of websites whose server-side language can be detected are running PHP as of writing this. For this reason alone it is important that you have some idea of how to serve PHP, what your options are and, most importantly, which way is best.

The reason for PHP’s massive popularity is primarily the popularity of content management systems and frameworks written in PHP. It is widely considered to be one of the easiest languages to ‘get up and go’ with due to the quality of tooling. There are many fantastic tools written for PHP; a few of the more common ones are listed below with a brief description:

  • WordPress
    • The most popular Content Management System on the web
    • Can be set up and configured for most user needs without requiring code changes
    • Powers around 30% of all websites on the internet (March 2018)
    • Highly modular, plethora of plugins available that can solve almost any problem
    • Lots of support available online/frequent updates due to high adoption rate
    • Wide range of jobs that focus on just using WordPress
  • Drupal
    • A Content Management System that some consider to be more robust/enterprise ready than WordPress
    • Is well suited to small/medium websites
    • Highly modular
  • Laravel
    • The most popular PHP framework by a massive margin
    • Has a great support network due to its popularity
    • Uses the Model – View – Controller architectural pattern (MVC)
    • Puts a focus on enabling developers to iterate quickly and easily
  • Symfony
    • A popular framework; Drupal 8 is built on Symfony components
    • Places a focus on using Open Source packages/modules in its design
    • Uses the MVC architectural pattern
    • Uses a liberal license (MIT)
  • Zend
    • Beefy enterprise framework
    • Similar to an MVC framework but doesn’t have a native implementation of the ‘Model’ component; this is usually handled by modules such as Doctrine
    • Zend is considered to be more difficult for new developers to jump into than other frameworks such as Laravel due to its unique tooling
    • Has a large footprint compared to other frameworks
    • Solid support network and highly modular
  • CakePHP
    • Places an emphasis on rapid development without losing flexibility
    • Focus on MVC methodology
    • Includes a native ORM

History & Going Forward:

PHP was first created in 1994 as a simple set of CGI binaries written in C. It was written to track visits to websites and quickly grew as more functionality was requested by the community.

In 1995 the project was open sourced, and lots more functionality was rapidly added to the language over the following years. For a while PHP was considered to be a template language: code was embedded within an HTML document and executed server-side when the page was requested. PHP had a rapid iteration/release cycle and evolved with the rapidly changing web environment of the late 90s/early 2000s. Over this time the language core and functionality were rewritten multiple times, until the release of PHP 5 in 2004.

According to the website w3techs.com, over 80% of websites running PHP are still on PHP 5. A quick glance at the CentOS/RHEL 6 repos shows that the primarily supported version is still PHP 5.3 (with security patches retroactively applied); this is what you will get if you run ‘yum install php’.

Work on PHP version 6 started in the mid-2000s and was abandoned around 2010. There were a number of large issues that caused the project to fail (the primary one being an attempt to implement Unicode support in the language core); as a result, most of the other features that were roadmapped for the PHP 6 release were back-ported into PHP 5.

PHP version 7 is the current version of PHP. It features huge performance improvements and new features that help bring PHP up to the standard of other modern web languages, such as anonymous classes, return type declarations, major changes to exception handling and many others. We won’t go too in depth as to what these changes are here, as they are primarily focussed on the development workflow.

Looking forward, we can see the PHP 8 release slowly appearing on the horizon. One of the highly anticipated features slated for PHP 8 is JIT or ‘Just in Time’ compilation. JIT compilation happens while the program is running and compiles the code ‘just in time’ to a lower level instruction set. This can lead to massive performance increases, as we can compile the code down to the CPU’s native instruction set and get rid of the overhead introduced by feeding everything through an interpreter. An early and well-known implementation of JIT compilation for PHP came from Facebook in the form of a tool called HHVM; you can read more about that here. JIT is a familiar concept for Java developers, as the JVM compiles frequently executed bytecode to native machine code at runtime rather than interpreting it on every method call.

The PHP Interpreter

The interpreter for PHP is a compiled binary. You can find the package ‘php-cli’ on most Linux distributions and run it ad-hoc to test small scripts, or just compile it from source with your favourite C compiler. The interpreter works by parsing a PHP script when it is executed, translating the code into a lower level representation, then executing that ‘lower level’ code. In more explicit terms:

  1. Read PHP file into memory
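  2. Break the PHP code down into tokens (lexing), discarding comments and whitespace
  3. Parse the tokens, resolve names such as functions where possible, and compile the result to opcodes (from PHP 7 onwards this goes via an abstract syntax tree)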
  4. Execute the opcodes
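
To make the four steps a bit more concrete, here is a minimal sketch of the same pipeline for a toy language that only understands integers, ‘+’, ‘*’ and ‘#’ comments. It is written in Python purely as a modelling language, and every name in it (tokenize, compile_to_opcodes, execute, the PUSH/ADD/MUL opcodes) is made up for illustration; the real PHP engine is far more involved and does not look like this internally.

```python
import re

PRECEDENCE = {"+": 1, "*": 2}
OPCODE_FOR = {"+": "ADD", "*": "MUL"}


def tokenize(source):
    """Step 2: drop comments and whitespace, split the rest into tokens."""
    code = re.sub(r"#[^\n]*", "", source)
    return re.findall(r"\d+|[+*]", code)


def compile_to_opcodes(tokens):
    """Step 3: turn infix tokens into a flat list of stack-machine opcodes."""
    opcodes, pending = [], []
    for token in tokens:
        if token.isdigit():
            opcodes.append(("PUSH", int(token)))
        else:
            # Emit any waiting operator that binds at least as tightly.
            while pending and PRECEDENCE[pending[-1]] >= PRECEDENCE[token]:
                opcodes.append((OPCODE_FOR[pending.pop()], None))
            pending.append(token)
    while pending:
        opcodes.append((OPCODE_FOR[pending.pop()], None))
    return opcodes


def execute(opcodes):
    """Step 4: run the opcodes on a simple value stack."""
    stack = []
    for name, arg in opcodes:
        if name == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if name == "ADD" else left * right)
    return stack.pop()


def run(source):
    """Steps 2-4 for a script that has already been read into memory (step 1)."""
    return execute(compile_to_opcodes(tokenize(source)))


print(run("2 + 3 * 4  # a comment the tokenizer throws away"))  # prints 14
```

The thing to notice is that tokenize and compile_to_opcodes do exactly the same work every time they are given the same source; only execute actually depends on the request.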

This diagram shows a more ‘realistic’ example of how the steps above are run, with many assumptions:

Interpreter Diagram

The process listed above is very inefficient. If we run through it every time a request comes in, we end up making the CPU repeat the same work and pushing/pulling from memory unnecessarily. In the next section we will address how we can fix that problem, along with a few other tweaks/notes about PHP performance.

Performance

Caching & Opcodes

Before we start addressing larger scale issues that affect performance we need to start with the basics: the easiest way to improve PHP performance is to use opcode caching. Since PHP has to ‘compile’ code into opcodes every time it executes a script, we can cache those opcodes to avoid making the CPU process the same tasks over and over. Instead we offload that task to memory; pulling the opcodes from RAM eliminates much of the computational overhead required to run scripts.

Normally we will use either Opcache or APC. APC is primarily used with PHP versions before 5.5; in version 5.5 and later Opcache is bundled with PHP and is typically enabled by default in distribution packages. Opcache typically offers better performance than APC, but the difference is small. Both caches work by storing the compiled opcodes in shared memory the first time a script is run. This process is transparent to the user and will greatly increase the performance of your applications.

So how does the cache fit into our previous workflow? Whenever a PHP script is executed the opcode cache is checked before anything is compiled. If the cache contains the opcodes needed, PHP will execute them directly rather than recompiling the entire script. If the cache does not contain the requested opcodes, PHP compiles the script, stores the resulting opcodes in the cache and then executes them. It is worth repeating here that in version 5.5 and later Opcache is included in core and usually enabled by default; you can configure it within the php.ini file (depending on your method of delivery, more on this later). Normally you do not need to configure Opcache and can just ‘fire and forget’. The caveat to this is shared environments, where Opcache has been noted to cause some issues. If we implemented Opcache into the diagram above it might look something like this:

Opcache example
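
The same check-then-compile flow can be sketched in a few lines. This is a toy model in Python, not how Opcache is implemented: the OpcodeCache class, its hits/misses counters and the compile_fn argument are all hypothetical names, and the cache here lives in a plain dictionary rather than shared memory. The modification-time check is my own simplification of the idea behind the real ‘opcache.validate_timestamps’ and ‘opcache.revalidate_freq’ settings.

```python
import os


class OpcodeCache:
    """Hypothetical in-memory opcode cache keyed by script path."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # stands in for the expensive compile step
        self._entries = {}           # path -> (mtime_ns, opcodes)
        self.hits = 0
        self.misses = 0

    def load(self, path):
        """Return opcodes for path, compiling only if the cache cannot help."""
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            # Cache hit: the file has not changed since it was compiled.
            self.hits += 1
            return entry[1]
        # Cache miss (or stale entry): compile, store, then hand back.
        self.misses += 1
        with open(path, encoding="utf-8") as handle:
            opcodes = self._compile(handle.read())
        self._entries[path] = (mtime, opcodes)
        return opcodes
```

If you skip the modification-time comparison you get a faster lookup, but the cache will keep serving old opcodes after a deploy until something clears it.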

Although APC is largely considered to be deprecated as of the integration of Opcache in PHP 5.5, you will sometimes run into applications that depend upon functionality within APC that is not present in Opcache, such as its user data cache. For the purposes of this guide I don’t feel it is worth going into too much detail about APC: if you are running a version of PHP older than 5.5 then you should be upgrading that as soon as possible rather than trying to bodge a fix with APC. You can find out more about how APC actually works in this blog post.

MPM

If you are using Apache to serve PHP you will need to consider which MPM you are going to use. MPM stands for ‘Multi-Processing Module’, which is a largely meaningless term without context, so let’s cover the most common implementations:

MPM Prefork:

  • Does not use threads; each child process handles a single connection at a time
  • Is very fast at serving single requests or lower amounts of traffic compared to the alternatives. This is because the server will have processes sitting and waiting to receive requests; the downside to this approach is that memory is allocated and not used while the processes are not doing anything
  • Is the only safe way to run mod_php unless your PHP build and every extension it loads are thread safe
  • Forks child processes from the master process ahead of time, and forks more as and when they are needed to serve content
  • Is the MPM you will get by default on many distributions, particularly when mod_php is installed
  • Isolates requests within processes
  • Many modules and libraries do not work with threading at all. If this is the case with your application then you can only use this MPM.

MPM Worker:

  • Uses threads rather than processes to serve requests
  • Able to serve more requests with lower use of system resources
  • Keeps multiple processes and threads available to serve requests as and when needed
  • Assigns threads to ‘connections’ and not requests. This is a big distinction, as connections have keepalives which can hang on to a thread for a long period of time unless configured specifically not to do so
  • Follows a ‘master-slave’ design; a master process is created which spawns child processes, each of which runs:
    • Worker threads that serve requests
    • A listener thread which passes connections to those worker threads when requests come in
  • Can be considered less safe, as many PHP libraries are not thread safe; this means issues with one request can affect other requests running within the same process
  • Is much less taxing on memory than prefork, as each thread has considerably less overhead than a process

MPM Event:

  • Is functionally the same as the worker MPM except…
  • Has a dedicated listener thread in each child process to look after keepalive connections. This means that worker threads only handle active ‘requests’, and as soon as a worker is done processing a request it hands the connection back to that thread.
  • Handles SSL connections in the same way as the Worker MPM mentioned above (a thread stays tied to the connection) due to the security implications of handing off the keepalive of an SSL connection

Default Apache installations will use the MPM Prefork configuration as it is standard with mod_php, which is fine for most use cases. However, if you are expecting the server to receive large amounts of traffic I would recommend using the Event MPM.
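
To get a feel for why the threaded MPMs cope better with high traffic, it helps to do the memory arithmetic. The sketch below is Python used as a calculator; the function names are hypothetical and every number in the example (4096 MB of RAM, 64 MB per prefork process, 20 MB per threaded child, 2 MB per thread) is invented for illustration, not measured. The 25 threads per child matches Apache’s documented default for ‘ThreadsPerChild’. Measure your own processes before sizing anything.

```python
def max_prefork_workers(available_mb, per_process_mb):
    """Prefork: one process per connection, so memory caps the process count."""
    return available_mb // per_process_mb


def max_threaded_workers(available_mb, per_process_mb, per_thread_mb,
                         threads_per_child):
    """Worker/Event: each child process carries several threads."""
    child_mb = per_process_mb + per_thread_mb * threads_per_child
    children = available_mb // child_mb
    return children * threads_per_child


# Illustrative, made-up figures only.
print(max_prefork_workers(4096, 64))            # prints 64
print(max_threaded_workers(4096, 20, 2, 25))    # prints 1450
```

The comparison is not quite like for like: a prefork process with mod_php carries the PHP interpreter with it, whereas with a threaded MPM PHP normally runs outside Apache and its memory has to be budgeted separately.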

Versions

PHP 7 is the ‘current’ version of PHP; however, statistics from w3techs suggest that over 80% of websites currently running PHP are still using PHP 5.x. There are a few reasons for this:

  1. Most enterprises will run RHEL/CentOS. These flavours of Linux place an emphasis on ‘tried and tested’ packages, and as a consequence the main repositories for these distributions contain PHP version 5.x. There are a few ways to change this:
    1. If you’re managing your own yum repository, just compile the code from source or download an RPM, and stick it on your repository server to distribute out to other machines via the package manager.
    2. Compile the desired version binary from source and distribute it with whichever config management tool you’re using in your environment.
    3. Use a pre-existing package repository. I would recommend using Remi, as he is a somewhat reputable source and keeps his packages up to date. Just download the yum repo file and put it in your ‘/etc/yum.repos.d/’ directory, then run ‘yum search php’ to see the list of available versions.
  2. Many tools/packages that were written for PHP 5 do not support PHP 7. This is often found in smaller projects, or internally facing services that wouldn’t benefit as much from the performance upgrades introduced in PHP 7. This is also very bad news if your application’s dependencies have not been updated to be compatible with PHP 7; it can quickly become a large chunk of political work if you have a large number of external dependencies.
  3. Due to PHP 7 removing a number of features that were deprecated in PHP 5, some code bases require more refactoring than others. This means that the amount of work required to upgrade can be hard to quantify. If you’re pushing to upgrade from 5 to 7 you will need to check with your development team how much work will need to be done.
  4. A significant number of WordPress/Drupal installations are running on platforms that the user does not control. A lot of these users are non-technical, so there is little incentive for smaller web hosting companies to upgrade their backend platform if their clients do not push for it. Many smaller hosting providers have made their money by hosting blogging platforms specifically, and often don’t see the business benefit of upgrading.

You can view the support matrix for PHP versions here. It’s worth considering that PHP 5 security support ends very soon, and has already been extended beyond the normal timeframe. This poses a huge risk to a large number of organisations hosting external facing applications on versions older than 7; if you are using PHP 5 to host external facing services, please ensure you are implementing a path towards PHP 7.

There are a number of articles available online that show the performance difference between PHP 5 and PHP 7. In every case PHP 7 will outperform 5, both in terms of throughput (the number of transactions processed per second) and speed (how quickly each request is processed). The main reasons for these improvements are to do with the processes followed by the interpreter:

  • Lower memory usage overall
  • Caching mechanism changed to also make use of on-disk caching of opcodes
  • Code syntax changes allow programmers to be more terse, which means there is less for the interpreter to compile and less to do when a script is first run

Serving PHP

OK, so at this point we can move on to the more important stuff: how do we stick the code on a server and get it to work?

First of all we need to look at the different delivery options for PHP. We only have two real options:

  • mod_php: The most popular way of serving PHP. This is a module for the Apache webserver which loads the PHP interpreter into each Apache process and passes scripts to it for execution.
  • FastCGI: A binary protocol based on the Common Gateway Interface (but faster!). This protocol was designed to allow web servers to hand requests to external applications rather than just serve static assets. A FastCGI application can listen on either a local socket or a TCP port, which means you can run your PHP interpreter on a different machine to your webserver if you so wish. This setup is usually the only option for serving PHP if you’re not using Apache, but it does also work with Apache using the mod_proxy_fcgi module.
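
Since ‘binary protocol’ can sound more mysterious than it is, here is a rough sketch of what a webserver puts on the wire when it starts a FastCGI request. It is written in Python for brevity and only builds the bytes; it does not open a socket or talk to a real PHP process. The constants and record layout follow the FastCGI specification as I understand it (an 8-byte header, then the body), while the helper names fcgi_record, begin_request and name_value are my own inventions.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1   # record type: start of a request
FCGI_PARAMS = 4          # record type: CGI-style environment variables
FCGI_RESPONDER = 1       # role: behave like a normal CGI program


def fcgi_record(record_type, request_id, content=b""):
    """Prefix content with the 8-byte FastCGI record header (no padding)."""
    header = struct.pack("!BBHHBx", FCGI_VERSION_1, record_type,
                         request_id, len(content), 0)
    return header + content


def begin_request(request_id, keep_conn=False):
    """BEGIN_REQUEST body: 2-byte role, 1-byte flags, 5 reserved bytes."""
    body = struct.pack("!HB5x", FCGI_RESPONDER, 1 if keep_conn else 0)
    return fcgi_record(FCGI_BEGIN_REQUEST, request_id, body)


def name_value(name, value):
    """Encode one PARAMS pair; lengths under 128 fit in a single byte each."""
    name_b, value_b = name.encode(), value.encode()
    if len(name_b) > 127 or len(value_b) > 127:
        raise ValueError("this sketch only handles short names and values")
    return bytes([len(name_b), len(value_b)]) + name_b + value_b


# The path below is a placeholder, not something from a real deployment.
params = name_value("SCRIPT_FILENAME", "/var/www/index.php")
message = begin_request(1) + fcgi_record(FCGI_PARAMS, 1, params)
print(message.hex())
```

Because all of this is just bytes on a socket, it makes no difference to the webserver whether the PHP interpreter on the other end is on the same box or a different one.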

Breaking down what we’ve already covered we will need a few things:

  1. PHP code
  2. The PHP interpreter
  3. A Linux server
  4. A web server

Let’s look at a couple of scenarios: one where we have a fairly bog standard VPS, and one where we’re implementing a containerised solution.

Getting Started:

Before we start working we need to pick a webserver. Here we will look at two of the biggest players in the webserver game: nginx and Apache. To see the bigger differences between the two web servers check out my post here. In relation to PHP projects we have a few things to consider:

As mentioned above the usual way to serve PHP is with mod_php and Apache. Let’s look at some positives and negatives of doing it this way:

  • PHP does not work well in multithreaded environments. PHP 7 does not natively support multithreading, and likely never will. There has been much debate over the years as to whether the ‘threadsafe’ version of PHP is actually threadsafe or not, which is something worth considering. You can read more about that here
  • As the PHP interpreter is embedded within the Apache process you can end up with some very large, long running processes by default, which is not ideal as we expose ourselves to memory leaks. Under heavy load you will find that the Linux kernel’s Out Of Memory Killer will happily wipe out these processes.
  • Apache will dynamically load content from the files on the server even if they are changed. This can lead to strange errors if the opcache still contains ‘older’ cached opcodes. We can reload/restart the Apache process to wipe out the opcache and fix this issue; however, this can significantly slow down response times while the opcache is repopulated.
  • It is simple to set up and run