Verticals: Web Front End
The web front-end vertical covers the outward-facing part of a web server system, i.e., the components that service HTTP requests arriving over the external network.
We put ourselves in the shoes of the provider of a medium-to-large web service who is looking to replace some parts of their deployment with ARM servers. The aim is to:
- determine which parts of such a stack are most suitable for replacement with ARM server parts, and
- find and ameliorate performance bottlenecks before someone tries to do this for real.
Currently, no baseline has been defined, nor is there an optimization target. This makes it very hard to put claims about performance into perspective. As LEG we need to:
- gain an understanding of the various kinds of workloads;
- gain an understanding of the performance limits of the various target platforms (CPU, memory, I/O);
- identify the expected bottleneck for a given reference load (e.g., saturating the network pipe);
- find other bottlenecks and remove them.
This is a very high-level view of what needs to be done, but it should help us tie each optimization to a specific workload on a specific target.
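The steps listed above can be sketched as a simple capacity calculation: divide each platform limit by the per-request cost of the workload, and the resource with the lowest sustainable request rate is the expected bottleneck. The following is only an illustration; the function name and every number in it are hypothetical placeholders, not measurements of any LEG target.

```python
def find_bottleneck(limits, cost_per_request):
    """Return (resource, max_requests_per_second) for the resource that saturates first.

    limits:           capacity of each resource per second (e.g. cycles/s, bytes/s)
    cost_per_request: amount of each resource one request consumes
    """
    rates = {
        name: limits[name] / cost
        for name, cost in cost_per_request.items()
        if cost > 0  # a resource the workload does not use cannot be the bottleneck
    }
    resource = min(rates, key=rates.get)
    return resource, rates[resource]


# Hypothetical platform limits (per second); placeholders, not measured values.
limits = {
    "cpu_cycles": 4 * 1.4e9,     # four cores at 1.4 GHz
    "network_bytes": 1e9 / 8,    # 1 Gbit/s link
    "storage_bytes": 200e6,      # 200 MB/s of storage reads
}

# Hypothetical cost of one request of the reference workload.
cost_per_request = {
    "cpu_cycles": 200_000,
    "network_bytes": 50_000,
    "storage_bytes": 50_000,
}

resource, rate = find_bottleneck(limits, cost_per_request)
print(f"expected bottleneck: {resource} at about {rate:.0f} requests/s")
```

With real measurements substituted for the placeholders, a result other than the expected bottleneck (for example, the CPU saturating before the network pipe) would point at something worth optimizing.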
At the moment, the following platforms are targeted by LEG for optimization work.
ARMv8 Foundation Model
The ARMv8 emulator can run a full LAMP stack. This is useful for validation from a functional point of view, but it is unclear whether this platform is useful for modeling system performance as well.
Calxeda Highbank 1000 (Cortex-A9)
This is a server platform developed by Calxeda. It is targeted by the armhf port of Debian/Ubuntu, which is expected to contain reasonably well-optimized compiled code.
This is a mobile 32-bit ARM platform developed by Samsung. It is also targeted by the armhf port, with the side note that the resulting code is not optimal with respect to integer division. (This is a potential bottleneck for SSL processing.)
This is a non-exhaustive list of imaginable workloads for a web front-end system. Only a few of them will likely be relevant, though.
Load balancer
Distributes incoming requests over multiple web servers and relays the responses back to the clients. I/O bound on the incoming responses by definition (if deployed correctly). No caching is involved, so the volume of network data entering the system equals the volume of data leaving the system.
LEG ref implementation: haproxy
According to the website, it is possible to saturate a 10 Gbps pipe using haproxy with 20% CPU load on a 2.66 GHz Core 2 Duo. This indicates that this particular workload would be suitable for porting to ARM, but it requires server-class peripherals and interconnects (jumbo frames, TCP checksumming in hardware, etc.). [http://haproxy.1wt.eu/10g.html]
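To put the haproxy figure into perspective, it can be converted into a rough CPU budget per byte forwarded. The sketch below takes the pipe as 10 gigabits per second and assumes that the 20% load is an average over both cores of the Core 2 Duo; the source does not state how the load was counted, so the result is an order-of-magnitude estimate only. The function name is our own.

```python
def cycles_per_byte(link_bits_per_second, clock_hz, cores, cpu_load):
    """CPU cycles spent per byte forwarded at full link utilization."""
    bytes_per_second = link_bits_per_second / 8
    cycles_per_second = clock_hz * cores * cpu_load
    return cycles_per_second / bytes_per_second


# Figures quoted above; the core count and the meaning of "20%" are assumptions.
budget = cycles_per_byte(
    link_bits_per_second=10e9,  # 10 gigabits per second
    clock_hz=2.66e9,            # 2.66 GHz
    cores=2,                    # Core 2 Duo, assuming the load covers both cores
    cpu_load=0.20,
)
print(f"{budget:.2f} cycles per byte")
```

Under these assumptions the budget comes out below one cycle per byte, which is only plausible if the payload is moved largely by the network hardware rather than touched by the CPU. That is consistent with the requirement for server-class peripherals noted above.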
Apache serving static content
System running an Apache instance that serves files as-is, i.e., without per-request processing. Typically I/O bound, but the bottleneck can reside either in persistent storage I/O or in network I/O.
Apache serving static content over SSL
Similar to the previous item, but with on-the-fly encryption. Likely CPU bound because of the symmetric encryption of the payload. Depending on the amount of data served per connection, there is a slight likelihood that asymmetric crypto (server authentication using RSA) has an impact as well. (Note that for the lock icon to appear in your browser, each tiny .gif or .jpeg needs to be served over SSL, and each is retrieved using a separate request.)
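The dependence on the amount of data served per connection can be made concrete with a toy cost model: the handshake is a fixed cost per connection, while symmetric encryption scales with the payload. The sketch below assumes one full handshake per connection (no session resumption or connection reuse), and both cost constants are hypothetical placeholders rather than measured values for any platform.

```python
def handshake_share(payload_bytes, handshake_cycles, bulk_cycles_per_byte):
    """Fraction of per-connection crypto cycles spent on the handshake."""
    bulk_cycles = payload_bytes * bulk_cycles_per_byte
    return handshake_cycles / (handshake_cycles + bulk_cycles)


# Hypothetical placeholder costs; real values must be measured on the target.
HANDSHAKE_CYCLES = 5_000_000   # fixed asymmetric-crypto cost per connection
BULK_CYCLES_PER_BYTE = 20      # symmetric encryption cost per payload byte

for size in (2_000, 250_000, 10_000_000):
    share = handshake_share(size, HANDSHAKE_CYCLES, BULK_CYCLES_PER_BYTE)
    print(f"{size:>10} bytes per connection: handshake share {share:.1%}")
```

Whatever the real constants turn out to be, the shape is the same: the smaller the payload per connection, the larger the share of the fixed handshake cost, which is why many tiny images served over separate connections could shift the bottleneck.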
Reverse proxy
This is a system that caches duplicate responses to equivalent requests. Intended to alleviate CPU/memory bottlenecks elsewhere, so typically I/O bound, especially when residing on the same system as the Apache instance. Cached data can reside in RAM or on persistent storage.
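The caching behaviour described here can be sketched as a small in-memory cache with least-recently-used eviction. This is a minimal illustration and not how any particular proxy implements it: the class name, the choice of (method, path) as the equivalence key, and the `fetch` callback standing in for the backend web server are all assumptions made for the example. Real caches also honour expiry and cache-control headers, which are omitted.

```python
from collections import OrderedDict


class ResponseCache:
    """In-memory response cache with least-recently-used eviction."""

    def __init__(self, fetch, capacity=128):
        self._fetch = fetch            # hypothetical callback that asks the backend
        self._capacity = capacity      # maximum number of cached responses
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, method, path):
        key = (method, path)           # requests with the same key count as equivalent
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        response = self._fetch(method, path)  # only misses reach the backend
        self._entries[key] = response
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return response
```

Every hit is served without touching the backend, so the backend's CPU and memory load drops with the hit rate while the proxy itself mostly moves bytes.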
Reverse proxy over SSL
Similar to previous item, but likely CPU bound due to encryption.
Apache serving dynamic PHP content
In addition to static content, dynamic content is generated on-the-fly by presentation logic implemented in PHP, using data from external sources (e.g., an SQL database on another host). All induced load can be tied directly to web requests. High likelihood of being CPU bound, as plain vanilla PHP scripts are parsed for every request.
Apache serving dynamic environment
Extension of the previous scenario where the line between presentation logic and business logic is blurred, e.g., webmail, web chat, etc.
Full LAMP stack
Setup where the primary data store, business logic, presentation logic and the web server all reside on the same machine.
LEG/Engineering/vertical-web (last modified 2013-02-18 15:47:41)