Thursday, January 7, 2010

Nginx Configuration and Popular Web Applications

The Pandion website runs on a Slicehost VPS with just 256 MB RAM. A few weeks ago I moved it from Apache2+mod_php to Nginx+fcgi_php+xcache. The result has been a drastic improvement in performance and stable memory usage.

Why Apache Cannot Scale

Apache handles each connection in a separate thread. But too many threads in one process slow things down, so Apache spawns more processes. Each process, however, uses a lot of memory because it includes every enabled Apache module, including the PHP interpreter. This adds up to 2-5 MB per process, and sometimes over a hundred processes may be running; at a hundred processes that is already 200-500 MB on a machine with 256 MB of RAM. At least once a month Apache would require so many processes that the server ran out of free memory and simply crashed. The primitive solution would be to add memory, but that costs money and only works up to a point. The elegant solution lies in a more modern networking API.

Why Nginx Can Scale

Nginx uses epoll, the event-notification interface in the Linux 2.6 kernel, which lets a single process handle huge numbers of connections with non-blocking I/O. The Slicehost VPS has two dual-core Opteron CPUs, so I run four Nginx worker processes, four fcgi_php workers and four MySQL workers. The server is now happily humming along at a stable 195-205 MB memory usage. The only fluctuation is from caching in MySQL and XCache/fcgi_php. CPU load is nigh constant at a ridiculous 0.00. Awesome!
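
For reference, the worker setup described above might look roughly like this at the top of nginx.conf. This is only a sketch: the worker_connections value is an assumption of mine, not a number taken from the Pandion server.

```nginx
# One worker process per CPU core (two dual-core CPUs = four cores)
worker_processes 4;

events {
 # Use the Linux 2.6 epoll event-notification interface
 use epoll;
 # Assumed value: maximum simultaneous connections per worker
 worker_connections 1024;
}
```

Each worker runs its own event loop, so memory use stays roughly flat as the number of connections grows.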

DIY Configuration

One downside of Nginx is that there are no official, off-the-shelf configuration scripts for Drupal, phpMyAdmin, phpBB and other popular web applications. Luckily Nginx has a configuration language that is much saner than Apache's, so I wrote my own Nginx configurations for each of these tools.

Nginx Build Settings

The VPS runs Ubuntu 9.10 (Karmic Koala), which offers Nginx 0.7.62 through apt-get. Because Nginx is still in beta and development is extremely active, I chose to build it myself from the latest (as of this writing) 0.8.31 source code. I won't go over this process as there are many good tutorials available. But for reference, these were the build arguments:

./configure \
 --prefix=/usr \
 --sbin-path=/usr/sbin/nginx \
 --conf-path=/etc/nginx/nginx.conf \
 --pid-path=/var/run/nginx.pid \
 --lock-path=/var/lock/nginx.lock \
 --error-log-path=/var/log/nginx/error.log \
 --http-log-path=/var/log/nginx/access.log \
 --http-client-body-temp-path=/var/lib/nginx/body \
 --http-proxy-temp-path=/var/lib/nginx/proxy \
 --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
 --user=www-data \
 --group=www-data \
 --with-http_gzip_static_module \
 --with-http_stub_status_module \
 --without-select_module \
 --without-poll_module \
 --with-cpu-opt=opteron \
 --with-md5-asm \
 --with-sha1-asm \
 --with-zlib-asm=pentiumpro \
 --add-module=/home/sebastiaan/agentzh-headers-more-nginx-module-db9913e

Notice that the last line includes a custom module called headers-more. Nginx currently does not offer very good built-in control over HTTP output headers. I wanted to strip various headers from the HTTP response (Server, X-Powered-By, Date, Expires, Last-Modified) and use the Cache-Control header to fine-tune proxy server and web browser behaviour.
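
As a rough illustration of what the module makes possible, the two directives below clear a few of those headers and set a Cache-Control policy. Treat it as a sketch: the max-age value is a made-up example, and I am assuming the directives sit in the http or server context.

```nginx
# Remove a subset of the response headers listed above
more_clear_headers "X-Powered-By" "Expires" "Last-Modified";

# Example policy only: let proxies and browsers cache responses for an hour
more_set_headers "Cache-Control: public, max-age=3600";
```

The -s 200 flag used in the configurations further down restricts a directive to responses with that status code.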

Drupal

Many Drupal sites already run on Nginx. In fact even Acquia, the Drupal consulting firm by Drupal founder Dries Buytaert, provides high performance Nginx-based Drupal hosting. Because there are many ways to use PHP with Nginx it's not easy to create a 100% compatible configuration script for Drupal. But this configuration works for me:

server {
 listen 80;
 # Strip "www." from the URL and redirect the old domain
 server_name www.pandion.im pandion.be www.pandion.be;
 rewrite ^/(.*) $scheme://pandion.im/$1 permanent;
}

server {
 listen 80;
 server_name pandion.im;
 root /home/sebastiaan/sites/pandion.im;
 index index.php index.xml index.html;

 # Serve static files directly; everything else goes to Drupal
 try_files $uri @drupal;

 # Clean URLs for Drupal
 location @drupal {
  rewrite ^/(.*)$ /index.php?q=$1 last;
 }

 # Serve PHP scripts
 location ~ \.php$ {
  # Prevent caching of admin panel
  if ($query_string ~ q=admin) {
   more_set_headers -s 200 "Cache-Control: no-cache, no-store, max-age=0";
  }
  # Forward to FastCGI daemon
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_pass 127.0.0.1:10005;
 }
}

phpMyAdmin

This is my Nginx port of the default Apache .htaccess file used by phpMyAdmin. It enables PHP, prevents caching and implements security measures to prevent access to unauthorised resources.

server {
 listen 80;
 server_name phpmyadmin.pandion.im;
 root /usr/share/phpmyadmin;
 index index.php;

 # Serve PHP scripts
 location ~ \.php$ {
  # Prevent caching
  more_set_headers -s 200 "Cache-Control: no-cache, no-store, max-age=0";
  # Forward to FastCGI daemon
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_pass 127.0.0.1:10005;
 }

 # Authorize for setup
 location /setup {
  auth_basic "phpMyAdmin Setup";
  auth_basic_user_file /etc/phpmyadmin/htpasswd.setup;
 }

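 # Disallow web access to directories that don't need it.
 # The ^~ modifier stops Nginx from checking regex locations,
 # so .php files in these directories are denied as well.
 location ^~ /libraries {
  deny all;
 }
 location ^~ /setup/lib {
  deny all;
 }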
}
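
One thing to watch in the block above: Nginx checks regex locations after it has found the best prefix match, so a request for a .php file under /setup ends up in the PHP location, where the auth_basic settings do not apply. A possible way around this, which I have not tested on 0.8.31, is to nest a PHP location inside the /setup location so that it inherits the password prompt:

```nginx
# Sketch: keep PHP requests under /setup behind the password prompt
location ^~ /setup {
 auth_basic "phpMyAdmin Setup";
 auth_basic_user_file /etc/phpmyadmin/htpasswd.setup;

 # Nested location: inherits auth_basic from the enclosing block
 location ~ \.php$ {
  more_set_headers -s 200 "Cache-Control: no-cache, no-store, max-age=0";
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_pass 127.0.0.1:10005;
 }
}
```

This would replace the plain "location /setup" block rather than sit next to it.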

phpBB

The archived forums of Pandion are locked and running the outdated phpBB 2. I did not test this with phpBB 3 so caveat emptor.

server {
 listen 80;
 server_name forums.pandion.im;
 root /home/sebastiaan/sites/forums.pandion.im;
 index index.php index.htm;

 # Serve PHP scripts
 location ~ \.php$ {
  # Prevent caching of dynamic content
  more_set_headers -s 200 "Cache-Control: no-cache, no-store, max-age=0";
  # Forward to FastCGI daemon
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_pass 127.0.0.1:10005;
 }

 # Security: ^~ keeps the PHP regex location from overriding this block
 location ^~ /cache {
  deny all;
 }
}
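
The FastCGI lines are identical in all three server blocks above, so they could be moved into a shared snippet and included where needed. The file name below is hypothetical, and this is a sketch of the idea rather than the setup running on the Pandion server:

```nginx
# Hypothetical file: /etc/nginx/php_fastcgi.conf
fastcgi_index index.php;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_pass 127.0.0.1:10005;

# Then, inside each server block:
location ~ \.php$ {
 include /etc/nginx/php_fastcgi.conf;
}
```

Per-site extras such as the more_set_headers line would stay in the individual location blocks.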

CoralCDN

The free (as in freedom and beer!) Coral Content Distribution Network is used to deliver the Pandion installer. CoralCDN automatically caches the file on any of hundreds of webservers distributed globally. Users automatically download from the Coral server nearest to them.

This configuration is ported from the official Apache rewrite rules. Requests for the file are redirected to CoralCDN by appending ".nyud.net" to the hostname. Seeding requests from CoralCDN servers, identified by their user agent header, are served by Nginx itself. If CoralCDN cannot handle the request (e.g. because a bandwidth limit was reached), it bounces the request back to us with "coral-no-serve" appended to the query string, and Nginx serves the file directly to the client.

# CoralCDN caching of the download package
location = /pandion_setup.msi {
 if ($http_user_agent ~ ^CoralWebPrx) {
  break;
 }
 if ($query_string ~ (^|&)coral-no-serve$) {
  break;
 }
 rewrite ^/(.*) $scheme://$host.nyud.net/$1? redirect;
}

1 comment:

Mark B said...

Have you updated the CoralCDN configuration to work with Nginx 1.8?