Troubleshooting vSphere Storage

By: Mike Preston

Overview of this book

Virtualization has created a new role within IT departments everywhere: the vSphere administrator. vSphere administrators have long managed more than just the hypervisor, and they have quickly had to adapt to become a "jack of all trades" in their organizations. More and more tier 1 workloads are being virtualized, making the infrastructure underneath them all the more important. Because of this, along with the holistic nature of vSphere, administrators are expected to know what to do when problems occur.

This practical, easy-to-understand guide gives the vSphere administrator the knowledge and skill set needed to identify, troubleshoot, and solve issues that relate to storage visibility, storage performance, and storage capacity in a vSphere environment.

This book first gives you the fundamental background knowledge of storage and virtualization. From there, you will explore the tools and techniques that you can use to troubleshoot common storage issues in today's data centers. You will learn the steps to take when storage seems slow or when storage availability is limited. The book covers the most common storage transports, such as Fibre Channel, iSCSI, and NFS, and explains what to do when you can't see your storage, where to look when your storage is experiencing performance issues, and how to react when you reach capacity. You will also learn about the tools that ESXi contains to help you with this, and how to identify key issues within the many vSphere logfiles.

esxtop statistics


esxtop collects an abundance of statistics on every aspect of how your ESXi host and VMs are performing. Although it can also display CPU, memory, and networking metrics, I've outlined only the storage-related metrics here. The thresholds listed are simply suggested values that I've seen used in many whitepapers, VMware documentation, and other resources. A threshold can be met for many different reasons, so these are certainly not hard numbers.

Have a look at the following table:

| Statistic | Description | Suggested threshold |
|---|---|---|
| CMDS/s | Number of commands issued per second. | varies |
| READS/s | Number of read commands issued per second. | varies |
| WRITES/s | Number of write commands issued per second. | varies |
| MBREAD/s | Megabytes read per second. | varies |
| MBWRTN/s | Megabytes written per second. | varies |
| DAVG/cmd | Latency observed by the device driver: the round-trip latency from the HBA to the storage array. Sustained values at or above the threshold usually indicate a performance issue with the underlying storage. | 25 ms |
| KAVG/cmd | Latency observed inside the VMkernel. The value should always be very low, if not 0, unless queuing is observed. | 1 ms |
| QAVG/cmd | Latency observed inside the queue. This is part of KAVG/cmd. Sustained values indicate an issue with queuing or queue depth. | 1 ms |
| GAVG/cmd | Round-trip latency as observed by the guest OS. Normally the sum of DAVG and KAVG (QAVG is already counted within KAVG). | 25 ms |
| AQLEN | The storage adapter queue length: the maximum number of active commands the adapter is configured for. | n/a |
| LQLEN | The LUN queue depth: the maximum number of active commands the LUN can have. | n/a |
| WQLEN | The world queue depth: the maximum number of active commands the world can contain. | n/a |
| ACTV | The number of commands that are currently active within the VMkernel. | varies |
| QUED | The number of commands that are currently queued in the VMkernel waiting for processing. Sustained values at or above the threshold may indicate a need to increase queue depth or an issue with the underlying storage array. | 1 |
| %USD | The percentage of queue depth used by active commands. Normally sits close to 0 unless queuing is occurring. | 1 |
| LOAD | The ratio of active plus queued commands to the queue depth. Should stay below 1 unless queuing is occurring. | 1 |
| ABRTS/s | The number of commands aborted per second. Normally indicates that the underlying storage is unable to meet the demands of your workloads. | 1 |
| RESETS/s | The number of commands reset per second. | 1 |
| RESV/s | The number of SCSI reservations issued per second. | n/a |
| CONS/s | The number of SCSI reservation conflicts occurring per second. Sustained high values could indicate that action is needed to balance metadata-heavy operations. | 20 |
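
To make the table easier to apply, here is a minimal Python sketch that compares one set of readings against the suggested thresholds listed above. It is not part of esxtop: the function name `flag_metrics`, the dictionary layout, and the sample readings are all assumptions made for illustration, and the thresholds are only the suggested values from the table, not hard limits.

```python
# Suggested thresholds copied from the table above (latencies in milliseconds).
# These are rules of thumb, not hard limits.
SUGGESTED_THRESHOLDS = {
    "DAVG/cmd": 25,
    "KAVG/cmd": 1,
    "QAVG/cmd": 1,
    "GAVG/cmd": 25,
    "QUED": 1,
    "%USD": 1,
    "LOAD": 1,
    "ABRTS/s": 1,
    "RESETS/s": 1,
    "CONS/s": 20,
}


def flag_metrics(sample, thresholds=SUGGESTED_THRESHOLDS):
    """Return {metric: (value, threshold)} for metrics at or above their threshold.

    `sample` is a hypothetical dict mapping esxtop counter names to readings;
    counters missing from the sample are skipped.
    """
    flagged = {}
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value >= limit:
            flagged[metric] = (value, limit)
    return flagged


# Hypothetical readings for a single device, typed in by hand for illustration.
sample = {
    "DAVG/cmd": 31.4,
    "KAVG/cmd": 0.02,
    "QAVG/cmd": 0.0,
    "GAVG/cmd": 31.42,
    "QUED": 0,
    "CONS/s": 3,
}

for metric, (value, limit) in flag_metrics(sample).items():
    print(f"{metric}: {value} (suggested threshold {limit})")
```

A single reading that crosses a threshold is not conclusive on its own. Most of the descriptions above refer to sustained values, so in practice you would run a check like this across several consecutive samples before drawing any conclusions.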