Troubleshooting CentOS

As a matter of principle, most people will tend to suggest that all system information can be categorized as either hardware-or-software based. This approach certainly serves to simplify things, but throughout the course of this chapter I will go some way to infer that there are instances in which the interplay of both (hardware and software) can be the reason for the issues at hand.

So, before you begin troubleshooting a system, always consider that the need gathering information about a system is the recommended approach to gaining additional insight and familiarity. Look at it this way: the practice of gathering hardware information is not necessarily required, but an investigation of this type may assist you in the search for an eventual diagnosis.

To begin, we will start by running a simple CPU-based hardware report with the following command:

# cat /proc/cpuinfo

As you will see, the purpose of this command is to output all information related to the CPU model, family, architecture, the cache, and much more. The /proc approach is always a good tradition, but using the following command is generally considered to be a better practice and far easier to use:

# lscpu

This command will query the system and output all relevant information associated with the CPU in the following manner:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
...

On the other hand, rather than querying absolutely everything, you can specify criteria by using grep (a subject that we will return to a little later in this chapter) in order to obtain any pertinent information, like this:

# lscpu | grep op-mode

So, having done this and recorded the results for future reference, we will now continue our investigation by running a simple hardware report with the lspci command in the following way:

# lspci

The result of this command may output something similar to the following information:

00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G35 Express PCI Express Root Port (rev 02)
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
00:0a.0 PCI bridge: Digital Equipment Corporation DECchip 21150
00:0e.0 RAM memory: Red Hat, Inc Virtio memory balloon
00:1d.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)

The lspci command provides all the relevant information concerning the PCI devices of your server, which in turn, can be expanded by employing either the -v option or the alternative -vv / -vvv option(s), depending on the level of detail you require:

# lspci -v
# lspci -vv
# lspci -vvv

By default, the above commands will provide all the information required by you to confirm whether a device is supported by any of the modules currently installed on your system or not. It is expected that you should only need to do this when hardware upgrades have been implemented, when the system has just been installed, or if you are attempting to familiarize yourself with a new environment. However, in order to simplify this exercise even further, you will be glad to know that a "tree view mode" is also available. The purpose of this facility is to output the associated device ID and show how these values are associated with the relevant bus.

To do this, type the following command:

# lspci -t

As a troubleshooter, you will be aware that every device must maintain a unique identifier as CentOS, like all other operating systems, will use that identifier to bind a driver to that device. The lspci command works by scanning the /sys tree for all connected devices, which can also include the connection port, the device type, and class, to name but a few. Having done this, the lspci command will then consult /usr/share/hwdata/pci.ids to provide the human-readable entries it displays.

For example, you can display the kernel drivers/modules by typing the following lspci command with the -k option like this:

# lspci -k

Naturally, during any hardware-based troubleshooting investigation you will want to review the system logs for additional clues, but as we have seen, both the lscpu and lspci commands are particularly useful when attempting to discover more about the necessary hardware information present on your system.

You can learn more about these commands by reviewing the respective on-board manuals at any time:

$ man lscpu
$ man lspci

Meanwhile, if you want to practice more, a simple test would be to insert a USB thumb drive and to analyze the findings yourself by paying close attention to the enumeration found within /var/log/messages.

Note

Remember, if you do try this, you are looking at how the system reacted once the USB drive was inserted; you are not necessarily looking at the USB drive itself; the information about which can be obtained with lsusb.

On the other hand, in the same way that we can use grep with lscpu, if you are already feeling comfortable with this type of investigation, then you may like to know that you can also use grep with the lspci command to discover more about your RAID controller in the following way:

# lspci | grep -i raid

Now, I am sure you will not be surprised to learn that there are many more commands associated with obtaining hardware information. This includes (but is not limited to) lsmod, dmidecode hdparm, df -h, or even lsblk and the many others that will be mentioned throughout the course of this book. All of them are useful, but for those who do not want to commit them to memory, a significant amount of information can be found by simply reading the files found within the /proc and /sys directories like this:

# find /proc | less
# find /sys | less

Consequently, and before we move on, you should now be aware that when you are dealing with hardware analysis, perfecting your skills is about practice and exposure to a server over the long term. My reason for stating this is based on the notion that a simple installation procedure can serve to identify these problems almost immediately, but without that luxury, and as time goes by, it is possible that the hardware will need replacing or servicing. RAID Battery packs will fail, memory modules will fail, and, on some occasions, it could be that a particular driver has not fully loaded during the most recent reboot. In this situation, you may find that the kernel is flooding the system with random messages to such an extent that it suggests an entirely different issue is causing the problem. So yes, hardware troubleshooting requires a good measure of patience and observation, and it is for this reason that a quick review of both the lscpu and lspci commands has formed our introduction to troubleshooting CentOS 7.

Troubleshooting CentOS

By : Jonathan Hobson

Troubleshooting CentOS

By: Jonathan Hobson

Overview of this book

Related Content you might be interested in

Current Title:

Troubleshooting CentOS

Gathering hardware information

Note