Benchmarking: Disk Bandwidth analysis of i instances | Part 2 FIO

By Flux7 Labs
February 11, 2014

In a previous post we discussed CoreMark, an industry standard for benchmarking CPU performance. In this post we’ll run IO benchmarks on i instances using the Flexible IO (FIO) tool.

Disk Bandwidth Benchmarking Through FIO

FIO is a tool for benchmarking and stress testing IO. It’s a generic tool that can test different types of drives, IO operations, and IO engines. FIO is open source, and you can install it by compiling the source.

Different types of workloads (termed “jobs”) can be submitted to FIO in a simple text format. Detailed descriptions of each parameter used to describe an FIO job can be found in the FIO documentation.

For benchmarking, we chose to test instance stores with the four workload types that are the most common disk IO patterns for most applications:

  • Sequential Read

  • Sequential Write

  • Sequential Read-write mix

  • Random Read-write mix

The job descriptor file that we used for our testing is shown below:

In this file, two jobs are defined and named ssd-1 and ssd-2. Every job inherits the parameters defined in the global section. Here is a short description of each parameter:

  • rw → Type of IO operation. Possible values:

    • read → sequential read

    • write → sequential write

    • randread → random read

    • randwrite → random write

    • readwrite/rw → sequential read-write mix

    • randrw → random read-write mix

  • size → Size of the file used for testing.

  • ioengine → Type of IO engine used.

  • iodepth → Number of IO units to keep in flight against the file.

  • bs → Block size.

  • numjobs → Number of job clones to run in parallel.

  • log_avg_msec → Averages log entries over the given period, in milliseconds. This reduces the resolution of the log, but keeps the log file from growing too large since, by default, FIO logs every disk operation.

  • write_bw_log → If used, writes a bandwidth log for the jobs to a file.

  • write_lat_log → Same as write_bw_log, except that this stores IO submission, completion, and total latencies.

  • write_iops_log → Same as write_bw_log, but writes IOPS.

  • group_reporting → Used for writing a final report per-group instead of per-job. It’s particularly useful when the “numjobs” option is used.
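To make the trade-off behind log_avg_msec concrete, here is a minimal Python sketch of the idea: per-operation samples are grouped into fixed time windows and each window is reduced to one averaged entry. The function name, the sample data, and the windowing details are assumptions for illustration only. FIO's own implementation may differ.

```python
from collections import defaultdict
from statistics import mean


def average_log(samples, log_avg_msec):
    """Collapse (time_ms, value) samples into one averaged entry per window.

    Hypothetical helper: it only illustrates trading log resolution for size.
    """
    windows = defaultdict(list)
    for time_ms, value in samples:
        # Integer division assigns each sample to its time window.
        windows[time_ms // log_avg_msec].append(value)
    # Report each window at its end time with the mean of its samples.
    return [
        ((index + 1) * log_avg_msec, mean(values))
        for index, values in sorted(windows.items())
    ]


# Made-up bandwidth samples: five per-operation entries become two log lines.
samples = [(100, 200), (400, 220), (900, 180), (1200, 400), (1700, 420)]
averaged = average_log(samples, 1000)
print(averaged)
```

A larger window produces fewer log lines but hides short spikes, which is the loss of resolution mentioned above.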

The job description above actually produces 8 FIO processes, which run in parallel, i.e., 4 clones each of ssd-1 and ssd-2.
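As a rough illustration of the layout described above, this Python sketch assembles a hypothetical job file with a global section and one job per SSD mount. Only the section names (ssd-1, ssd-2), the parameter names, and the 4 clones per job come from the description. The mount paths and the values for size, ioengine, iodepth, bs, and log_avg_msec are assumptions, not the settings used in these tests.

```python
import configparser
import io


def build_job_file(rw, mounts, numjobs=4):
    """Return the text of an FIO-style job file (all values are illustrative)."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser["global"] = {
        "rw": rw,                 # e.g. read, write, rw, randrw
        "size": "1g",             # assumed test file size
        "ioengine": "libaio",     # assumed IO engine
        "iodepth": "32",          # assumed queue depth
        "bs": "4k",               # assumed block size
        "numjobs": str(numjobs),  # clones per job
        "log_avg_msec": "1000",   # assumed averaging window
        "group_reporting": None,  # flag without a value
    }
    for index, mount in enumerate(mounts, start=1):
        name = f"ssd-{index}"
        parser[name] = {
            "directory": mount,       # hypothetical mount point
            "write_bw_log": name,     # log file prefix
            "write_lat_log": name,
            "write_iops_log": name,
        }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


mounts = ["/mnt/ssd1", "/mnt/ssd2"]  # hypothetical paths
job_text = build_job_file("randrw", mounts)
total_processes = 4 * len(mounts)    # numjobs clones for each job section
print(job_text)
```

With two job sections and numjobs set to 4, the total comes to the 8 parallel processes mentioned above.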

We initially ran FIO as a single process on different AWS instances. However, we soon realized that a single process did not expose the differences in disk IO performance between instance types. We therefore switched to a multi-process approach in which the processes were evenly distributed across the drive mounts offered by each instance. For example, an i2.xlarge instance has a single SSD mount and 4 vCPUs, so we ran 4 FIO processes on that drive. In contrast, an i2.2xlarge has 2 SSD mounts and 8 vCPUs, so we ran 8 FIO processes, 4 on each of the two disk mounts.
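The distribution rule can be sketched in a few lines of Python. The sketch assumes one FIO process per vCPU, which is what the two examples above imply but is not stated as a general rule. The function name and mount paths are hypothetical.

```python
def processes_per_mount(vcpus, mounts):
    """Spread one process per vCPU as evenly as possible across the mounts."""
    if not mounts:
        raise ValueError("at least one mount is required")
    base, extra = divmod(vcpus, len(mounts))
    # The first `extra` mounts take one additional process when the
    # division is not exact.
    return {
        mount: base + (1 if index < extra else 0)
        for index, mount in enumerate(mounts)
    }


single = processes_per_mount(4, ["/mnt/ssd1"])
double = processes_per_mount(8, ["/mnt/ssd1", "/mnt/ssd2"])
print(single, double)
```

For a single mount with 4 vCPUs this yields 4 processes on that drive, and for 2 mounts with 8 vCPUs it yields 4 processes per mount.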

In a methodology similar to that used for our CoreMark benchmark, we ran the benchmark 10 times and discarded the 2 fastest and 2 slowest results. From the remaining 6 results, we calculated an average disk IO bandwidth. We also calculated the standard deviation of all 10 results. Since the benchmarking covered 4 different disk operations, we calculated the geomean of the four averages to obtain a single number that makes the instances easier to compare. A summary of the results is as follows:
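The aggregation steps can be sketched in Python as follows. The bandwidth numbers are made up for illustration and are not the measured results. The use of the sample standard deviation (rather than the population form) is an assumption, since the text does not say which one was used.

```python
from statistics import geometric_mean, mean, stdev


def summarize_runs(results):
    """Return (trimmed average, standard deviation) for 10 benchmark runs."""
    if len(results) != 10:
        raise ValueError("expected exactly 10 results")
    ordered = sorted(results)
    trimmed = ordered[2:-2]  # drop the 2 slowest and the 2 fastest runs
    # The standard deviation uses all 10 results, as described above.
    return mean(trimmed), stdev(results)


def overall_score(averages):
    """Combine the per-workload averages into one number via the geomean."""
    return geometric_mean(averages)


# Illustrative bandwidth values for one workload (not measured data).
runs = [100, 102, 98, 101, 99, 150, 50, 100, 103, 97]
average, deviation = summarize_runs(runs)

# Illustrative per-workload averages: seq read, seq write, seq mix, random mix.
score = overall_score([400.0, 300.0, 250.0, 120.0])
print(average, deviation, score)
```

The geomean is less dominated by one large value than an arithmetic mean would be, so a single very fast workload does not hide weak results on the others.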


The highest IO bandwidth came from i2.4xlarge, but the highest bandwidth-per-price was found in i2.xlarge, which justifies the latter’s price. The i2.8xlarge showed a dip in performance, which we plan to investigate in future experiments.

Even though i2.xlarge, offering a single 800 GB SSD drive, exhibits the best performance-per-dollar, its overall performance is quite low for today’s big data requirements. That makes it unsuitable for many applications. What follows is another comparison of IO performance in terms of latency.

i2.2xlarge shows the lowest latency compared to the other instance types, while hs1.8xlarge, which does not offer an SSD drive, exhibits the highest. Did you read the post on CPU Performance on i instances?
