
Rook/Ceph Background Research

Rook Implementation Research Notes

Videos

Tutorials

Blog Posts

Questions

  • How can I set up encryption at rest in Ceph?
  • How do I craft a backup strategy for Ceph?
  • Can backup be managed at the Ceph level?

Fun tools I found along the way

Benchmarking

Benchmark disks that will eventually make up the OSDs

IOPS Benchmark

The first step is to get a baseline understanding of each disk’s IOPS.

BLOCK_DEVICE=/dev/sdl

# Get a baseline on the disk's IOPS
fio --filename=${BLOCK_DEVICE} \
    --direct=1 \
    --fsync=1 \
    --rw=randwrite \
    --bs=4k \
    --numjobs=1 \
    --iodepth=1 \
    --runtime=60 \
    --time_based \
    --group_reporting \
    --name=4k-sync-write-test

Take note of the IOPS returned. Repeat the benchmark several times for each drive of the same model to build a good picture of how it performs, noting any variance between runs. It’s important that IOPS do not vary wildly between the disks in the system; a helper for looping the test over several drives is sketched below.
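A small wrapper along these lines can make the repeated runs easier to track (a sketch only; the device list and output directory are placeholders, and jq is assumed to be installed for pulling numbers out of fio’s JSON reports):

# Hypothetical helper: run the same 4k sync-write test against several drives
# and keep fio's JSON reports so the IOPS variance can be compared afterwards.
OUTPUT_DIR=./fio-results
mkdir -p "${OUTPUT_DIR}"

for BLOCK_DEVICE in /dev/sdl /dev/sdm /dev/sdn; do
    fio --filename=${BLOCK_DEVICE} \
        --direct=1 \
        --fsync=1 \
        --rw=randwrite \
        --bs=4k \
        --numjobs=1 \
        --iodepth=1 \
        --runtime=60 \
        --time_based \
        --group_reporting \
        --output-format=json \
        --output="${OUTPUT_DIR}/$(basename ${BLOCK_DEVICE})-4k-randwrite.json" \
        --name=4k-sync-write-test
done

# Pull the write IOPS out of each report for a quick side-by-side comparison.
jq '.jobs[0].write.iops' "${OUTPUT_DIR}"/*.json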

Throughput Benchmark

The next step is to run a throughput benchmark on each of the drives. The key metric in this test is bandwidth (BW). Repeat this test for each drive.

BLOCK_DEVICE=/dev/sdl

# Get a baseline on the disk's sequential write throughput (watch the BW figure)
fio --filename=${BLOCK_DEVICE} \
    --direct=1 \
    --fsync=1 \
    --rw=write \
    --bs=1M \
    --numjobs=1 \
    --iodepth=1 \
    --runtime=60 \
    --time_based \
    --group_reporting \
    --name=1M-sync-throughput-test
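If it helps to keep the numbers for later comparison, the same test can emit a machine-readable report. This is a sketch only, assuming jq is available and using an example output filename; recent fio versions report the bw field in KiB/s.

# Example only: re-run the throughput test with JSON output, then extract the
# write bandwidth from the report.
fio --filename=${BLOCK_DEVICE} \
    --direct=1 \
    --fsync=1 \
    --rw=write \
    --bs=1M \
    --numjobs=1 \
    --iodepth=1 \
    --runtime=60 \
    --time_based \
    --group_reporting \
    --output-format=json \
    --output=sdl-1M-throughput.json \
    --name=1M-sync-throughput-test

# Write bandwidth in KiB/s (bw_bytes holds the same value in bytes/s).
jq '.jobs[0].write.bw' sdl-1M-throughput.json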

Benchmark the network that our storage cluster will communicate over.

Make sure the network bandwidth is where it should be, the retransmission rate is low, and the congestion window is not too small.

Select a node to be the server in this test.

Install iperf3 on client and server

sudo apt install -y iperf3

On the Server

First we are going to start iperf3 on the server in listen mode, bound to the storage network address; it listens on the default port, 5201.

SERVER_IP_ADDRESS=192.168.200.2

# Run iperf3 in server mode, bound to the address of the storage network interface
iperf3 -B ${SERVER_IP_ADDRESS} -s

On the Client

Next we need to configure iperf3 on the client to connect to the server. Note that we are explicitly telling iperf3 which interface to use for the test. This is important because the Ceph cluster may have multiple interfaces configured for frontend/backend traffic.

SERVER_IP_ADDRESS=192.168.200.2
CLIENT_IP_ADDRESS=192.168.200.3

# Test from client ==> server
iperf3 -B ${CLIENT_IP_ADDRESS} -c ${SERVER_IP_ADDRESS}

# Test from server ==> client
iperf3 -B ${CLIENT_IP_ADDRESS} -c ${SERVER_IP_ADDRESS} -R

There are two metrics we are interested in looking at.

  • Retransmissions: the number of times the sender had to resend a segment before it was acknowledged. Smaller numbers are better.
  • Congestion Window: the amount of data that can be in flight without waiting for an ACK. Larger numbers are better, ideally around 1 MB.

Repeat this test on all nodes and interfaces that will be part of the Ceph cluster.
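Looping the client side over the rest of the cluster keeps that repetition manageable (a sketch; the node addresses are examples and each target is assumed to already be running iperf3 in server mode bound to its storage address):

# Hypothetical sweep: test from this client to every other node, in both
# directions, binding to the interface Ceph will actually use.
CLIENT_IP_ADDRESS=192.168.200.3

for SERVER_IP_ADDRESS in 192.168.200.2 192.168.200.4 192.168.200.5; do
    echo "=== ${CLIENT_IP_ADDRESS} ==> ${SERVER_IP_ADDRESS} ==="
    iperf3 -B ${CLIENT_IP_ADDRESS} -c ${SERVER_IP_ADDRESS}

    echo "=== ${SERVER_IP_ADDRESS} ==> ${CLIENT_IP_ADDRESS} ==="
    iperf3 -B ${CLIENT_IP_ADDRESS} -c ${SERVER_IP_ADDRESS} -R
done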

Get an idea of any packet errors or drops.

netstat -i | column -t

This will give us an overview of the following:

  • RX-OK: Received packets that were OK.
  • RX-ERR: Received packets with errors.
  • RX-DRP: Received packets that were dropped.
  • TX-OK: Transmitted packets that were OK.
  • TX-ERR: Transmitted packets with errors.
  • TX-DRP: Transmitted packets that were dropped.

It’s good to monitor these statistics over time to see whether the network is experiencing drops or errors; a simple way to capture them periodically is sketched below.
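One low-effort way to do that is to append a timestamped snapshot of the counters on an interval, either from cron or a background loop (a sketch; the log path and interval are arbitrary choices):

# Append a timestamped interface-counter snapshot every 5 minutes so growth in
# the error/drop columns can be spotted later.
while true; do
    {
        date --iso-8601=seconds
        netstat -i | column -t
        echo
    } >> /var/log/netstat-interfaces.log
    sleep 300
done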

Perform a ping test to get a feel for latency between the nodes.

1
2
3
SERVER_IP_ADDRESS=192.168.200.2

ping ${SERVER_IP_ADDRESS}

Ceph performance hinges on network latency being as low as possible.
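To get comparable numbers rather than eyeballing the stream of replies, a fixed-count quiet ping against each node and its rtt summary line works well (a sketch; the node IPs are examples):

# Example only: 100 pings per node, keeping just the rtt min/avg/max/mdev summary.
for NODE_IP in 192.168.200.2 192.168.200.3 192.168.200.4; do
    echo -n "${NODE_IP}: "
    ping -c 100 -i 0.2 -q ${NODE_IP} | tail -n 1
done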

Tuning the system for low latency

In this section we want to disable some of the deeper C-states (idle states) and tune the CPU’s P-states for peak performance, so the cores can reach and hold their boost clocks.

Using the tuned tool we can apply multiple meaningful tunings at once by enabling a specific tuning profile. For Ceph, we want to use the network-latency profile.

Make sure the BIOS isn’t going to clobber your settings

Using tuned to tune the system

# List tuning profiles
tuned-adm list

# take a look at latency-performance configurations
egrep -v '^$|^\#|\[' /usr/lib/tuned/latency-performance/tuned.conf

# take a look at the network-latency configurations
egrep -v '^$|^\#|\[' /usr/lib/tuned/network-latency/tuned.conf

# set network-latency as the active profile.
tuned-adm profile network-latency

# confirm network-latency profile is the active profile
cat /etc/tuned/active_profile

Verify the tuning profile has been applied

After applying the tuned profile and restarting the machine, make sure all of the CPU cores spend essentially 100% of their idle time in the C1 state and not in any of the deeper C-states, which add latency. The profile also reduces the impact of power management, reduces task migrations, and reduces the amount of outstanding dirty pages kept in memory.

These changes will result in a higher power draw!

turbostat sleep 5
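turbostat’s per-core C-state residency columns are the main thing to look at. A couple of quick cross-checks are sketched below; they assume the cpupower utility (from the linux-tools packages) and lsof are installed:

# Show the idle states the kernel exposes and their exit latencies.
cpupower idle-info

# Confirm the active cpufreq policy/governor (the latency profiles typically
# pin this to "performance").
cpupower frequency-info --policy

# The tuned latency profiles cap C-state exit latency by holding
# /dev/cpu_dma_latency open; lsof should show the tuned daemon on it.
lsof /dev/cpu_dma_latency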