Exploring Parallelism in IO Operations

By Flux7 Labs
April 8, 2014

In our earlier posts, we used the FIO tool to benchmark I/O on various EC2 instances. In this post we explore the effects of parallelism in I/O operations on a single EC2 instance. In other words, we try to find the number of parallel I/O-bound processes that yields the best I/O throughput.

For our experiments we chose m3.large as the EC2 instance. m3.large is a general purpose instance type; its details are available in Amazon's EC2 instance type documentation. To run the I/O operations we used the instance storage, which is a 32 GB SSD drive. We used Ubuntu Precise (12.04) as our operating system. We started with one process and gradually increased the number of processes. We stopped after running 13 parallel processes because I/O performance was gradually decreasing, and it was evident that we had already crossed the threshold for optimum performance. As in our previous experiments, we ran four types of I/O operations:

  1. Sequential Read

  2. Sequential Write

  3. Sequential Read Write mix

  4. Random Read Write mix
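The test matrix described above (four workloads, 1 to 13 parallel processes) could be generated along the following lines. This is only a sketch under assumptions: the post does not say how the parallel processes were launched, so the use of fio's `numjobs` option, the `/mnt` target directory, the `1g` file size and the helper names `fio_command` and `benchmark_matrix` are all illustrative choices, not the original setup.

```python
import shlex

# fio --rw values corresponding to the four workloads listed above.
WORKLOADS = {
    "sequential-read": "read",
    "sequential-write": "write",
    "sequential-read-write-mix": "rw",
    "random-read-write-mix": "randrw",
}


def fio_command(workload, num_procs, directory="/mnt", size="1g"):
    """Build one fio command line.

    directory and size are assumed values, not taken from the post.
    """
    return [
        "fio",
        f"--name={workload}-{num_procs}",
        f"--rw={WORKLOADS[workload]}",
        f"--numjobs={num_procs}",  # assumption: parallelism via fio's numjobs
        f"--directory={directory}",
        f"--size={size}",
        "--group_reporting",
    ]


def benchmark_matrix(max_procs=13):
    """Return a command for every (process count, workload) pair."""
    return [
        fio_command(workload, n)
        for n in range(1, max_procs + 1)
        for workload in WORKLOADS
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in benchmark_matrix():
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch harmless on a machine without fio installed; each printed line could be run by hand or passed to a process launcher.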

We ran each benchmark 10 times and discarded the two highest and the two lowest results as outliers. We then averaged the remaining 6 results to get our final number.
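The averaging step can be sketched as a trimmed mean. The function name `trimmed_mean` and the sample values are hypothetical and are not measurements from these experiments.

```python
def trimmed_mean(samples, trim=2):
    """Average the samples after dropping the `trim` lowest and `trim` highest."""
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    kept = sorted(samples)[trim:len(samples) - trim]
    return sum(kept) / len(kept)


# Hypothetical bandwidths in Kbytes/sec for ten runs (not measured data).
runs = [100.0, 90.0, 110.0, 105.0, 95.0, 500.0, 10.0, 102.0, 98.0, 101.0]
print(trimmed_mean(runs))
```

With ten samples and `trim=2`, six values remain, which matches the procedure described above.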

We also calculated the geometric mean (geomean) of the average bandwidths obtained for the different I/O operations, which gives one final number that is easier to compare.
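For reference, the geometric mean of n positive values is the n-th root of their product. A minimal sketch follows; the function name `geomean` and the four bandwidth values are placeholders, not results from the experiments.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-workload average bandwidths in Kbytes/sec (not measured data).
averages = [200000.0, 150000.0, 120000.0, 90000.0]
print(geomean(averages))
```

Summing logarithms instead of multiplying the values directly avoids overflow when the inputs are large.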

The results are quite interesting. The following chart gives the bandwidth we got for different numbers of processes running in parallel. We got the highest bandwidth, 144,299.1027 Kbytes/sec, with 9 parallel processes, although the bandwidth we obtained with 8 parallel processes was quite close.

The following table gives the exact figures we got for different numbers of parallel processes.


We also measured CPU and elapsed times using the time command. We measured the times on each run and calculated the average for each of the I/O operations. We then calculated the geomean to get a single number each for system time, user time and elapsed (real) time. The following graph gives the results. The system time and the elapsed time increase linearly until the number of parallel processes reaches 8, after which the system CPU time decreases. The user time also increases until the number of parallel processes reaches 8; after that it stays almost the same.
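The three figures reported by the time command can also be collected programmatically. The sketch below is an illustration for a Unix-like system, not the method used in the experiments: the helper name `time_command` is hypothetical, and a trivial Python child process stands in for an actual benchmark run.

```python
import os
import subprocess
import sys


def time_command(cmd):
    """Run cmd and return (user, system, elapsed) seconds for the child."""
    before = os.times()
    subprocess.run(cmd, check=True)  # waits for the child to finish
    after = os.times()
    return (
        after.children_user - before.children_user,
        after.children_system - before.children_system,
        after.elapsed - before.elapsed,
    )


if __name__ == "__main__":
    # Stand-in workload; a real run would pass a benchmark command line instead.
    user, system, elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"user={user:.2f}s system={system:.2f}s elapsed={elapsed:.2f}s")
```

User and system time are CPU time charged to the child process, while elapsed time is wall-clock time, so for an I/O-bound workload the elapsed figure can be much larger than the other two combined.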

The table below gives the details of system, user and elapsed time for different numbers of parallel processes.

Considering the bandwidth results together with the timing analysis, it appears that using 8 parallel processes on 2 vCPUs results in optimal use of I/O on this instance. It would also be interesting to run similar experiments on EC2 instances that offer conventional (non-SSD) disks for their instance store.