Introduction
esxtop is a great tool for performance analysis of all types. But with only latency and throughput statistics, esxtop cannot provide the full picture of a storage workload's profile. Furthermore, esxtop only provides latency numbers for Fibre Channel and iSCSI storage; latency analysis of NFS traffic is not possible with esxtop.
Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats. vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel. This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:
- IO size
- Seek distance
- Outstanding IOs
- Latency (in microseconds)
- More!
Running vscsiStats
Collecting and analyzing data with vscsiStats requires two steps:
- Start statistics collection.
- View accrued statistics.
Documentation on command-line parameters is available by running '/usr/lib/vmware/bin/vscsiStats -h'.
Starting and Stopping vscsiStats Collection
The tool is started with the following command:
/usr/lib/vmware/bin/vscsiStats -s -w <world_group_id>
This command starts the process that accrues statistics. The world group ID must correspond to a running virtual machine; the IDs of all running VMs can be obtained by running '/usr/lib/vmware/bin/vscsiStats -l'.
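As a minimal sketch, the full start-up sequence looks like this (the world group ID 12345 below is hypothetical; substitute one reported by the -l command):

# List running VMs and their world group IDs
/usr/lib/vmware/bin/vscsiStats -l

# Start collection for the VM whose world group ID is 12345 (hypothetical)
/usr/lib/vmware/bin/vscsiStats -s -w 12345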
vscsiStats stops running after about 30 minutes. If the analysis is needed for a longer period, re-issue the start command above within that window; each restart defers the timeout and termination by another 30 minutes.
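If collection needs to outlast the timeout, one rough approach is a keep-alive loop that re-issues the start command inside the 30-minute window. This is only a sketch: it assumes the host shell provides while and sleep (the busybox shell in tech support mode does), and 12345 is again a hypothetical world group ID.

# Re-issue the start command every 25 minutes to keep deferring the timeout
while true; do
    /usr/lib/vmware/bin/vscsiStats -s -w 12345
    sleep 1500
done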
Because results accumulate for as long as collection runs, the histograms include all data since collection was started. To reset all counters to zero, run '/usr/lib/vmware/bin/vscsiStats -r'.
Viewing Statistics
Counters are displayed by using the following command:
/usr/lib/vmware/bin/vscsiStats -p <histo_type> [-c]
The histogram type specifies either all of the statistics or a single group of them. Options include all, ioLength, seekDistance, outstandingIOs, latency, and interarrival.
Results can be produced in a more compact, comma-delimited format by adding the optional -c flag shown above.
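For example, to save the latency histogram as comma-delimited text for later work in a spreadsheet (the output path is arbitrary):

# Dump the latency histogram in compact form and redirect it to a file
/usr/lib/vmware/bin/vscsiStats -p latency -c > /tmp/latency.csv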
Using vscsiStats Results
Use Case 1: Identifying Sequential IO
Storage arrays can process sequential IO much faster than random IO. You can therefore improve the performance of a sequential workload by placing it on a dedicated LUN to allow the array to optimize access. vscsiStats can help you identify your sequential workloads even if you don't understand anything about the application in the VM.
Take the following graph as an example, which I generated by running '/usr/lib/vmware/bin/vscsiStats -p seekDistance':
This graph shows that most of the commands are being issued a great distance from the previous command. It looks like all of the commands were 50,000 or more logical blocks away from the previous command. When I looked at the raw data, I saw that over 99% of the commands were more than 128 blocks away from the previous command. That's random access if I've ever seen it. Here's the opposite example:
In this case the logical block number (LBN) of each command is most frequently exactly one larger than the previous command. That's the signature of a heavily sequential workload. It shouldn't surprise you to learn that both of these profiles were generated by Iometer using random and sequential writes, respectively.
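If you'd rather compute the sequential fraction than eyeball a graph, a quick awk sketch can do it from the compact output. Note the assumption here: I'm treating the -c output as rows of the form 'count,bucket limit', which may vary by version, so verify the format on your host before trusting the numbers.

# Sketch: sum the counts in the near-sequential buckets (within 128 blocks)
# and report them as a percentage of all IOs. Assumes "count,bucket limit"
# rows in the -c output; check your host's format first.
/usr/lib/vmware/bin/vscsiStats -p seekDistance -c | \
  awk -F, '/^[0-9]+,-?[0-9]+$/ {
      total += $1
      if ($2 >= -128 && $2 <= 128) near += $1
  }
  END { if (total > 0) printf "%.1f%% of IOs within 128 blocks\n", 100 * near / total }'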
Use Case 2: Optimizing for IO Sizes
The IO size is an important characteristic of storage profiles. Storage vendors have published a variety of best practices to help customers tune their storage to a particular IO size. As an example, it may make sense to match an array's stripe size to the workload's average IO size. vscsiStats can provide a histogram of IO sizes to help with this process. The following graph was generated by '/usr/lib/vmware/bin/vscsiStats -p ioLength':
From these results I can see that about a quarter of the commands came in IOs smaller than 4k. About half of the commands were exactly 4k in size. The small number of remaining IOs were larger than 4k. This signature is typical of a VMDK formatted to 4k blocks and supporting OS and application execution. The storage array should be optimized for 4k blocks if this disk's performance is a priority.
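The same kind of quick check works for IO sizes; this sketch reports the most heavily populated bucket, under the same 'count,bucket limit' format assumption as above.

# Sketch: find the IO-size bucket with the highest count
/usr/lib/vmware/bin/vscsiStats -p ioLength -c | \
  awk -F, '/^[0-9]+,[0-9]+$/ { if ($1 > max) { max = $1; bucket = $2 } }
  END { if (max > 0) printf "Busiest IO size bucket: <= %d bytes (%d IOs)\n", bucket, max }'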
Use Case 3: Storage Latency Analysis (Including NFS!)
esxtop is a terrific tool for latency-based storage analysis. Fibre Channel and iSCSI HBAs have device and kernel latencies in esxtop's storage panel. Software iSCSI initiators show up as vmhba32 (ESX 3.5 and earlier) or vmhba33 (ESX 4.0 and later). But esxtop does not provide latency statistics for NFS stores.
Because vscsiStats collects its results at the point where the guest interacts with the hypervisor, it is agnostic to the underlying storage implementation. Latency statistics can therefore be collected with this tool for any storage configuration, including NFS.
The above graph shows that the server in my office with a single direct-attached SCSI disk is performing as I would expect. About half of all the operations are completing in under 5 ms. The other half take 5-15 ms to complete. A few commands took longer than 15 ms, but the number is so small that it doesn't concern me. Similar results can be seen with NFS arrays.
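To capture a clean latency picture for a specific test, the whole workflow is just a reset, a wait, and a dump. A sketch, assuming collection was already started with -s -w as above (the five-minute wait and file name are arbitrary):

# Zero the counters, let the workload run, then capture the latency histogram
/usr/lib/vmware/bin/vscsiStats -r
sleep 300
/usr/lib/vmware/bin/vscsiStats -p latency -c > /tmp/latency.csv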
vscsiStats on ESXi
vscsiStats can be installed on ESXi hosts after putting the host into tech support mode. More information on this process is available in Scott's post on the subject at vPivot.
Additional Resources
My colleagues Ajay Gulati, Chethan Kumar, and Irfan Ahmad presented Storage Workload Characterization and Consolidation in Virtualized Environments at VPACT '09. This paper serves as an excellent example of vscsiStats in action.
I learned vscsiStats by reviewing Irfan's VMworld 2007 presentation (vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server) and playing with the tool. Check out his presentation if you'd like more detail.