ACI uses inter-fabric messaging (IFM) to communicate between the different nodes. IFM messages are carried over TCP and secured with 1024-bit SSL encryption; the keys are stored in secure storage and are signed by the Cisco Manufacturing Certificate Authority (CMCA).
Issues with IFM can prevent fabric nodes from communicating and from joining the fabric. We will cover this in greater depth in the SSL Troubleshooting recipe in Chapter 9, Troubleshooting ACI, but we can look at the output of the checks on a healthy system:
apic1# netstat -ant | grep :12
tcp 0 0 10.0.0.1:12151 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12215 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12471 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12279 0.0.0.0:* LISTEN
<truncated>
tcp 0 0 10.0.0.1:12567 10.0.248.29:49187 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:45965 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.31:47784 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.29:49942 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:42946 ESTABLISHED
tcp 0 0 10.0.0.1:50820 10.0.248.31:12439 ESTABLISHED
apic1# openssl s_client -state -connect 10.0.0.1:12151
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
SSL_connect:SSLv3 read server hello A
depth=1 O = Cisco Systems, CN = Cisco Manufacturing CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server key exchange A
SSL_connect:SSLv3 read server certificate request A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client certificate A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL3 alert read:fatal:handshake failure
SSL_connect:failed in SSLv3 read server session ticket A
139682023904936:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1300:SSL alert number 40
139682023904936:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:177:
---
Certificate chain
 0 s:/CN=serialNumber=PID:APIC-SERVER-L1 SN:TEP-1-1, CN=TEP-1-1
   i:/O=Cisco Systems/CN=Cisco Manufacturing CA
 1 s:/O=Cisco Systems/CN=Cisco Manufacturing CA
   i:/O=Cisco Systems/CN=Cisco Manufacturing CA
---
Server certificate
-----BEGIN CERTIFICATE-----
<truncated>
-----END CERTIFICATE-----
subject=/CN=serialNumber=PID:APIC-SERVER-L1 SN:TEP-1-1, CN=TEP-1-1
issuer=/O=Cisco Systems/CN=Cisco Manufacturing CA
---
No client certificate CA names sent
---
SSL handshake has read 2171 bytes and written 210 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: zlib compression
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : DHE-RSA-AES256-GCM-SHA384
    Session-ID:
    Session-ID-ctx:
    Master-Key: 419BF5E19D0A02AA0D40BDF380E8E959A4F27371A87EFAD1B
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Compression: 1 (zlib compression)
    Start Time: 1481059783
    Timeout   : 300 (sec)
    Verify return code: 19 (self signed certificate in certificate chain)
---
apic1#
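These two checks lend themselves to a quick scripted sanity test: each IFM service listens on its own port in the 12000 range, so we can count listeners and established sessions from netstat-style output. The following is a minimal sketch, not a Cisco-provided tool; it runs against a captured sample (the port numbers are taken from the output above, and the /tmp file name is purely illustrative):

```shell
#!/bin/sh
# Captured sample of "netstat -ant" output from an APIC
# (illustrative; a real system will show many more lines).
cat > /tmp/netstat_sample.txt <<'EOF'
tcp 0 0 10.0.0.1:12151 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12215 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12567 10.0.248.29:49187 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:45965 ESTABLISHED
EOF

# Count IFM listeners (local port in the 12000 range) and
# established sessions towards fabric nodes.
listeners=$(awk '$4 ~ /:12[0-9]+$/ && $6 == "LISTEN"' /tmp/netstat_sample.txt | wc -l | tr -d ' ')
sessions=$(awk '$4 ~ /:12[0-9]+$/ && $6 == "ESTABLISHED"' /tmp/netstat_sample.txt | wc -l | tr -d ' ')
echo "IFM listeners: $listeners, established sessions: $sessions"
```

On a live APIC you would feed `netstat -ant` straight into the awk filters rather than a captured file.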
IFM is essential to the success of the discovery process. A fabric node is only considered active when the APIC and the node can exchange heartbeats through IFM. IFM remains important after discovery, too, as the APIC also uses it to push policies to the fabric leaf nodes.
The fabric discovery process has three stages and uses IFM, LLDP (Link Layer Discovery Protocol), DHCP (Dynamic Host Configuration Protocol), and TEPs (tunnel endpoints):
- Stage 1: The leaf node that is directly connected to the APIC is discovered.
- Stage 2: A second discovery brings in any spines connected to the initial "seed" leaf.
- Stage 3: The remaining leaf nodes and any other APICs in the cluster are discovered.
The process can be visualized as follows:
Figure 9
The node can transition through a number of different states during the discovery process:
- Unknown: Node discovered but no node ID policy configured
- Undiscovered: Node ID configured but not yet discovered
- Discovering: Node discovered but no IP address assigned
- Unsupported: Node is not a supported model
- Disabled: Node has been decommissioned
- Inactive: No IP connectivity
- Active: Node is active
Using the acidiag fnvread command, you can see the current state. In the following command output, the leaf node is in the unknown state (note that I have removed the final column in the output, which was LastUpdMsg, the value of which was 0):
apic1# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State
---------------------------------------------------------------------
0 0 TEP-1-101 0.0.0.0 unknown unknown
Total 1 nodes
apic1#
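Because acidiag fnvread emits a fixed-layout table, it is easy to flag any node whose State column is not active. The snippet below is a sketch under that assumption, parsing a captured sample of the output shown above (the /tmp file name and parsing logic are illustrative, not part of ACI):

```shell
#!/bin/sh
# Captured sample of "acidiag fnvread" output (the Serial Number
# column was empty for this undiscovered node, as in the output above).
cat > /tmp/fnvread_sample.txt <<'EOF'
ID Pod ID Name Serial Number IP Address Role State
---------------------------------------------------------------------
0 0 TEP-1-101 0.0.0.0 unknown unknown
Total 1 nodes
EOF

# Skip the header and separator, match only data rows (numeric ID),
# and print the Name of any node whose last field (State) is not "active".
not_active=$(awk 'NR > 2 && $1 ~ /^[0-9]+$/ && $NF != "active" { print $3 }' /tmp/fnvread_sample.txt)
echo "Nodes not active: $not_active"
```

On a live APIC the same awk filter can be applied directly to the command output, which is handy when watching a fabric come up node by node.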
During fabric registration and initialization, a port may transition to an out-of-service state. In this state, the only traffic permitted is DHCP and CDP or LLDP. There can be a number of reasons for transitioning to this state, but these are generally due to human error, such as incorrect cabling or LLDP not being enabled; again, these are covered in the Layer-2 troubleshooting recipe in Chapter 9, Troubleshooting ACI.
There are a couple of ways in which we can check the health of our controllers and nodes. We can use the CLI to check LLDP (show lldp neighbors), or we can use the GUI (System | Controllers | Node | Cluster as Seen by Node):
Figure 10
This shows us the APIC, and we can look at our leaf nodes from the Fabric menu. In the output from acidiag fnvread, we saw a node named TEP-1-101. This is a leaf node, as we can see from the GUI (Fabric | Inventory | Fabric Membership):
Figure 11
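The show lldp neighbors check mentioned earlier can also be scripted to confirm that an APIC is actually visible over LLDP, which is the usual first question when a leaf fails to register. The sketch below assumes a typical tabular neighbor listing; the exact column layout varies by release, so both the sample and the filter are illustrative:

```shell
#!/bin/sh
# Illustrative sample of "show lldp neighbors" output from a leaf;
# the column layout is an assumption and may differ per release.
cat > /tmp/lldp_sample.txt <<'EOF'
Device ID            Local Intf      Hold-time  Capability  Port ID
apic1                Eth1/1          120                    eth2-1
spine201             Eth1/49         120        BR          Eth1/1
EOF

# Confirm an APIC appears as a neighbor on at least one local port.
apic_ports=$(awk '$1 ~ /^apic/ { print $2 }' /tmp/lldp_sample.txt)
if [ -n "$apic_ports" ]; then
    echo "APIC seen on: $apic_ports"
else
    echo "No APIC neighbors - check cabling and LLDP"
fi
```

If no APIC neighbor appears, the likely culprits are the human errors noted above: miscabling or LLDP being disabled on the relevant interface.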
We will look at the GUI in the next section.