Multi-node testing until LAVA does it

Introduction

The optimization activities of the Linaro Enterprise Group depend on being able to benchmark to assess the impact of any work done, and for network-service type workloads (among others), tests inevitably involve more than one node (at a minimum, one to run load generation, one to run the service).

At the time of writing LAVA does not support multi-node tests (it is scheduled for 2013q3), so I have written a slightly hackish tool to allow us to run multi-node tests on calxeda nodes before LAVA gains this support (and also to try out some ways of constructing multi-node tests).

This page attempts to describe how this tool works. Please bear in mind that this is a work in progress: as I use the tool more, I am continually improving it, and am not particularly worrying about backwards compatibility. If you use the tool, please talk to me about what you are doing!

The code lives at http://bazaar.launchpad.net/~mwhudson/+junk/highbank-bench-scripts/files. The basic idea is that you pass the "fakedispatcher.py" script the URL of a bzr branch that contains a YAML file called test.yaml that describes the test (if this sounds like how lava-test-shell works, this isn't a complete coincidence).

Getting started: a simple YAML file

A simple but not completely trivial test.yaml that runs a web server on one host and accesses it from another and considers that the test passed if this succeeds would be:

metadata:
  name: example-multi-node-tests

devices:
  - role: client
    count: 1
  - role: server
    count: 1

initialization:
  "*":
     - "lava-set-up-hosts-file"
  "server":
     - "apt-get install -y apache2-mpm-worker"
     - "touch /var/www/resource"
  "client":
     - "apt-get install -y curl"

testcases:
  download-file:
    run:
      client:
        - "lava-result-from-cmd curl -O http://server01/resource"

There's quite a lot to take in here -- although hopefully the overall pattern makes some kind of sense -- so lets go bit by bit.

metadata:
  name: example-multi-node-tests

This defines the "test run id" that will be used if and when the results of running the test are uploaded to LAVA. The other metadata fields that lava-test-shell tests can put here are ignored currently.

devices:
  - role: client
    count: 1
  - role: server
    count: 1

This defines the nodes that the test will use. The idea is that you request a certain number of nodes of different "roles", and different code runs on each node based on its role (more on this below). Each node gets a 'test name' based on role and index of the form ${role_name}XX, so in this case the nodes will be named client01 and server01. If we had asked for 3 client nodes, their names would have been client01, client02 and client03. You get the idea.

initialization:
  "*":
     - "lava-set-up-hosts-file"

The initialization section defines the code that runs after nodes have booted (and optionally reinstalled) and before tests start running. Normal things to do here would be install and configure packages. Each key of this section is a glob that is matched against role names, so this "*" defines code that runs on all nodes, irrespective of role.

The lava-set-up-hosts-file script is a helper script installed by the tool on each node (there are a few of these, see below) that adds each node in the tests ip address and 'test name' to the /etc/hosts file of the node. This means that the client test code can just pass server01 as the hostname to curl.

  "server":
     - "apt-get install -y apache2-mpm-worker"
     - "touch /var/www/resource"
  "client":
     - "apt-get install -y curl"

The role-specific node initialization code is simple enough: install needed packages, create a resource for the client node to download over http in the test.

testcases:
  download-file:
    run:
      client:
        - "lava-result-from-cmd curl -O http://server01/resource"

Finally we reach the code that defines the test cases. From the outside in:

  • The keys in the testcases section define the names of the test cases.

  • The keys under the test case names are test "phases": setup, run and teardown.
  • The keys under the phase are globs that match roles.
  • Finally under the role glob is the code that runs for each phase on each node.

This code is "compiled" into a shell script that synchronizes things so the setup code completes on all nodes before the run code runs, and that that completes before the teardown code runs and that that completes before the next test cases' setup code runs.

The only line of actual test code here uses another helper script: lava-result-from-cmd executes the command that you pass it as records the test case as passing or failing dependent upon the return code of the command.

This file needs to live in a bzr branch; for this case I've put the branch at lp:~mwhudson/+junk/example-multi-node-tests

Invocation

Check out the branch lp:~mwhudson/+junk/highbank-bench-scripts. The code needs to know the name or address of some calxeda ecme nodes -- currently it just uses a fixed list in the source, so it will likely only work in the lab -- and only for one person at once!!

The simplest invocation is just "./fakedispatcher <url-of-branch>". This will run the tests described in test.yaml in the branch and create a LAVA results bundle in ./results.bundle.

MichaelHudsonDoyle/MultiNodeTesting (last modified 2013-04-18 22:54:50)