
Section 10: I/O Profiling

Another important factor to consider when profiling bioinformatics tools and pipelines is Input/Output (I/O) throughput. Adding the "-v" option to the /usr/bin/time utility built into Hummingbird makes it additionally report how much data the process read from and wrote to the file system (the "File system inputs" and "File system outputs" counters), a measurement Hummingbird does not currently collect.

The I/O overhead depends on the type of disk attached to the cloud instance: whether it is persistent or non-persistent, and whether it is a standard hard disk drive (HDD) or a solid-state drive (SSD), among the options available in the cloud platform's fault-tolerant storage architecture. A local SSD can increase I/O performance further because of its lower latency. Disk types can be compared and selected using metrics such as Input/Output Operations per Second (IOPS). Other factors that can limit I/O performance include the I/O block size, network egress limits, and the number of vCPUs in the cloud instance (https://cloud.google.com/compute/docs/disks, https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disk-performance.html, https://azure.microsoft.com/en-us/pricing/details/managed-disks/). Understanding how these factors limit I/O throughput requires dedicated benchmarking studies, which are out of scope for the current version of our framework. Users should therefore check with their cloud provider which disk types and other resources are available to maximize I/O throughput.
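A minimal sketch of collecting these counters outside Hummingbird, assuming GNU time is installed at /usr/bin/time and supports `-v` and `-o FILE`. The helper names (`parse_time_verbose`, `io_bytes`, `profile_io`) are hypothetical and not part of Hummingbird. The conversion to bytes assumes the Linux behaviour where these counters come from `getrusage()` and are expressed in 512-byte blocks; they only count I/O that reaches the storage layer, so reads served from the page cache are not included.

```python
import os
import subprocess
import tempfile
from typing import Dict, List

# Counters of interest in the report printed by GNU time's verbose mode (-v).
IO_FIELDS = ("File system inputs", "File system outputs")

# Assumption: on Linux these counters come from getrusage() in 512-byte units.
BLOCK_BYTES = 512


def parse_time_verbose(report: str) -> Dict[str, int]:
    """Return the file system I/O counters found in a `/usr/bin/time -v` report."""
    counters: Dict[str, int] = {}
    for line in report.splitlines():
        # Report lines look like "\tFile system outputs: 4096"; split on the last ": ".
        key, _, value = line.strip().rpartition(": ")
        if key in IO_FIELDS:
            counters[key] = int(value)
    return counters


def io_bytes(counters: Dict[str, int]) -> Dict[str, int]:
    """Convert block counters to approximate byte counts (512-byte blocks assumed)."""
    return {key: count * BLOCK_BYTES for key, count in counters.items()}


def profile_io(cmd: List[str]) -> Dict[str, int]:
    """Run `cmd` under GNU time with -v and return its file system I/O counters."""
    with tempfile.TemporaryDirectory() as tmp:
        report_path = os.path.join(tmp, "time_report.txt")
        # -o writes the report to a file so it is not mixed with the tool's own stderr.
        subprocess.run(["/usr/bin/time", "-v", "-o", report_path, *cmd], check=True)
        with open(report_path) as fh:
            return parse_time_verbose(fh.read())
```

For example, `io_bytes(profile_io(["gzip", "-k", "reads.fastq"]))` (with a hypothetical input file) would give approximate bytes read and written by that one command. Writing the report with `-o` keeps the parser independent of whatever the profiled tool prints to stderr.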
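To illustrate why the I/O block size matters alongside IOPS when comparing disk types, the sketch below uses the general relation throughput ≈ IOPS × I/O size, optionally bounded by a per-disk throughput limit if the provider publishes one. The function name and the example values (3,000 IOPS, 16 KiB and 256 KiB blocks, a 125 MiB/s limit) are illustrative assumptions, not figures from any provider; the pages linked above list the actual limits for each disk type.

```python
from typing import Optional


def estimated_throughput_mib_s(
    iops: float,
    block_size_kib: float,
    throughput_limit_mib_s: Optional[float] = None,
) -> float:
    """Estimate sustained disk throughput in MiB/s as IOPS x I/O block size.

    If a per-disk throughput limit is given, the result is capped at that
    limit, since whichever bound is lower determines the achievable rate.
    """
    if iops < 0 or block_size_kib <= 0:
        raise ValueError("iops must be non-negative and block_size_kib positive")
    mib_s = iops * block_size_kib / 1024  # KiB/s -> MiB/s
    if throughput_limit_mib_s is not None:
        mib_s = min(mib_s, throughput_limit_mib_s)
    return mib_s
```

With these hypothetical numbers, small 16 KiB requests reach only about 47 MiB/s, so the disk is IOPS-bound, while 256 KiB requests would exceed 750 MiB/s and are instead held to the 125 MiB/s throughput limit.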