Notes for Ceph Analysis

Pre-requisite libraries

  • boost
  • jerasure
  • jerasure-gf-complete
  • rocksdb
  • tcmalloc

Ceph QA test suite

Ceph QA test suite results are available in the analysis of the Ceph QA failures.

Optimisation Opportunities

  • The crc32c instruction could be used to speed up Ceph (https://github.com/ceph/ceph/pull/3604); a quick check for hardware support is sketched below.

  • Ceph has LTTng (https://lttng.org/) tracepoints built in. These should be exploited to ascertain performance characteristics; an example tracing session is sketched after this list.

  • The Ceph pre-requisite libraries should be analysed for individual hotspots.
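
As a first sanity check for the crc32c point above, hardware CRC32 support can be confirmed from the shell. This is only a sketch: the feature-flag names ("crc32" on arm64, "sse4_2" on x86_64) are the usual /proc/cpuinfo spellings, not anything Ceph-specific.

    # Check whether the CPU advertises a hardware CRC32 instruction.
    # On arm64 the feature flag is "crc32"; on x86_64 the relevant flag is "sse4_2".
    if grep -qE '\b(crc32|sse4_2)\b' /proc/cpuinfo; then
        echo "hardware crc32c available"
    else
        echo "no hardware crc32c - Ceph falls back to a software implementation"
    fi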

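For the LTTng tracepoints, a minimal userspace tracing session might look like the following. This is generic lttng CLI usage rather than anything Ceph-specific; which events actually show up depends on the tracepoint providers the Ceph build has enabled, so check lttng list --userspace while a daemon is running.

    # List the userspace tracepoints currently registered (run while a Ceph daemon is up).
    lttng list --userspace

    # Create a session, enable all userspace events, and trace a workload.
    lttng create ceph-trace
    lttng enable-event --userspace --all
    lttng start
    # ... run the workload of interest ...
    lttng stop
    lttng view | less
    lttng destroy
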
Quick(ish) Start from source code

Checkout and build

  1. git clone --recursive https://github.com/ceph/ceph.git

  2. cd ceph

  3. git checkout v0.91

  4. ./autogen.sh

  5. ./configure

  6. make -j6

wait a (long) while...

Start a test instance

  1. cd ./src

  2. CEPH_NUM_MON=1 CEPH_NUM_OSD=3 CEPH_NUM_MDS=1 ./vstart.sh -n

This will fire up a test cluster with 1 monitor, 3 OSDs and 1 MDS.
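
Once it is up, the cluster can be inspected with the freshly built ceph tool from the same directory. A small sketch, assuming vstart.sh has written its ceph.conf into the src directory (its default behaviour):

    # Check that the monitor, OSDs and MDS have all come up.
    ./ceph -c ceph.conf -s

    # Tear the test cluster down again when finished.
    ./stop.sh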

A warning about btrfs

/!\ /!\ /!\ As of 3.18-rc2, btrfs does not support mounting filesystems whose sector size differs from the kernel's PAGE_SIZE. When a btrfs filesystem is created, the sector size defaults to the PAGE_SIZE of the currently running kernel. This means that on arm64, switching between 4K and 64K pages can cause mounting problems with btrfs. There is an RFC to fix this issue: http://comments.gmane.org/gmane.comp.file-systems.btrfs/38808.
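
As a hedged illustration of the interaction: the running kernel's page size can be read with getconf, and mkfs.btrfs accepts an explicit sector size, so a filesystem that has to move between 4K- and 64K-page kernels needs those values to agree. The device name below is only a placeholder.

    # Page size of the currently running kernel (4096 or 65536 on arm64).
    getconf PAGESIZE

    # Create the filesystem with an explicit sector size; it must match the
    # PAGE_SIZE of whichever kernel will later mount it.
    mkfs.btrfs --sectorsize 4096 /dev/sdX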

Mounting the Ceph Filesystem

/!\ Warning: the kernel-mode Ceph filesystem driver has been found to be unstable. I managed to completely crash my desktop system with the Ceph kernel modules. I thoroughly recommend using FUSE instead; this is outlined below.

  • Create /etc/ceph/ceph.conf on the client machine:

    [mon.a]
            host = {monitor-hostname}
            mon addr = {monitor-ip-address}:6789
  • Copy the monitor keyring /etc/ceph/ceph.client.admin.keyring over to the client machine (an example copy command is given after this list).

  • Mount the filesystem as follows:

    ceph-fuse --debug_client 10 --debug_ms 1 --log-to-stderr -k /root/ceph.client.admin.keyring /mnt/ceph/
    (I've kept the debug flags as they may be useful).
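
For the keyring copy step above, something along these lines should work; the hostname is the same placeholder used in ceph.conf, and the destination path is an assumption — adjust it to wherever the -k option will point.

    # Copy the admin keyring from the monitor host to the client (placeholder hostname).
    sudo scp root@{monitor-hostname}:/etc/ceph/ceph.client.admin.keyring /etc/ceph/

    # Make sure the mount point exists before running ceph-fuse.
    sudo mkdir -p /mnt/ceph

To unmount later, fusermount -u /mnt/ceph does the job.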

Quick Start (old instructions)

These instructions adapted from http://eu.ceph.com/docs/wip-4600-bobtail/start/quick-start/ and http://radialmind.blogspot.co.uk/2014/06/running-ceph-on-standard-ubuntu-1404.html:

  • Install Ceph

    sudo apt-get install ceph ceph-mds ceph-deploy
  • Create the following configuration file /etc/ceph/ceph.conf, substituting in hostname and ip-address:

    [osd]
            osd journal size = 1000
            filestore xattr use omap = true
    
            # Execute $ hostname to retrieve the name of your host,
            # and replace {hostname} with the name of your host.
            # For the monitor, replace {ip-address} with the IP
            # address of your host.
    
    [mon.a]
    
            host = {hostname}
            mon addr = {ip-address}:6789
    
    [osd.0]
            host = {hostname}
    
    [mds.a]
            host = {hostname}
  • Create empty directories for ceph-fs:

    sudo mkdir /var/lib/ceph/osd/ceph-0 \
        /var/lib/ceph/mon/ceph-a \
        /var/lib/ceph/mds/ceph-a
  • Create the ceph-fs:

    cd /etc/ceph && \
    sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
  • Start the Ceph cluster:

    sudo service ceph start

Some extra tweaks are needed to make the Ceph cluster operational; these are only needed once:

  • Create the osd-0 data:

    ceph-osd -i 0
  • Set the replication to one node:

    sudo ceph osd pool set data size 1
    sudo ceph osd pool set metadata size 1
    sudo ceph osd pool set rbd size 1
  • We should then get a healthy Ceph cluster:

    sudo ceph --status
        cluster 08020104-5d23-4d29-bc3c-1e8a35d5fd58
         health HEALTH_OK
         monmap e1: 1 mons at {a=10.1.99.172:6789/0}, election epoch 1, quorum 0 a
         mdsmap e4: 1/1/1 up {0=a=up:active}
         osdmap e8: 1 osds: 1 up, 1 in
          pgmap v16: 192 pgs, 3 pools, 1884 bytes data, 20 objects
                1082 MB used, 99308 MB / 102400 MB avail
                     192 active+clean

Some blog posts

Some bedtime reading
