CFM
Load cfm.init.cfg
Set Eth16 and Eth17 on the switch as follows:
An EVPN-VPWS service is already configured between CSR5 and CSR6.
Configure CFM as follows:
The CE devices should be in a domain called CUSTOMER at level 7
CSR5 should be MEP ID 5 and CSR6 should be MEP ID 6
Use port mode for the CUSTOMER domain and use service number 123
CSR1 and CSR3 should be in a domain called PROVIDER at level 4
CSR1 should be MEP ID 101 and CSR3 should be MEP ID 103
Use EVC mode for CSR1 and CSR3 with service number 56
When CSR5 and CSR6 do an ethernet traceroute to each other, both CSR1 and CSR3 should reply and appear in the trace
Use a continuity-check interval of 1 second for all domains
Answer
Explanation
CFM is defined in IEEE 802.1ag. It is a standardized way to monitor continuity and isolate faults on a layer 2 Ethernet circuit by using layer-3-like tools such as ping and traceroute. CFM operates end-to-end (UNI-to-UNI), unlike other Ethernet OAM tools such as Ethernet LMI, which operates only on a single UNI link (CE-PE).
In CFM, a layer 2 domain is bounded by end devices at the same level. The broadest domain must have the highest level number, and the narrowest domain must have the lowest. In this lab, the CUSTOMER domain is the broadest and has level 7. The PROVIDER domain is nested within it, so it has a smaller level number: level 4. On a point-to-point service such as this one, each domain is bounded by two maintenance endpoints (MEPs). Optionally, a domain can also have maintenance intermediate points (MIPs), which can respond to CFM pings and traceroutes.
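The nesting rule can be checked mechanically. The following is a minimal Python sketch, not anything a router runs: the `Domain` class and `check_nesting` function are hypothetical names introduced here for illustration. It assumes domains are listed from broadest to narrowest and relies only on the rule above plus the fact that 802.1ag levels range from 0 to 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    level: int  # maintenance domain level, 0-7 in 802.1ag


def check_nesting(outer_to_inner):
    """Return a list of problems; an empty list means the levels nest correctly.

    outer_to_inner: domains ordered from broadest to narrowest.
    """
    problems = []
    for d in outer_to_inner:
        if not 0 <= d.level <= 7:
            problems.append(f"{d.name}: level {d.level} is outside 0-7")
    # Each inner domain must use a strictly lower level than the one enclosing it.
    for outer, inner in zip(outer_to_inner, outer_to_inner[1:]):
        if inner.level >= outer.level:
            problems.append(
                f"{inner.name} (level {inner.level}) must be lower than "
                f"{outer.name} (level {outer.level})"
            )
    return problems


lab = [Domain("CUSTOMER", 7), Domain("PROVIDER", 4)]
print(check_nesting(lab))  # []
```

Swapping the two levels (CUSTOMER at 4, PROVIDER at 7) would return one problem, because the nested domain would no longer sit below the domain that encloses it.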
Once the domains are defined, the maintenance association (MA) is configured, which defines the service within the domain. This allows you to have multiple services that are monitored within the same domain. The MA can be defined by a string, number, VLAN ID, or VPN ID. In this lab we use a number to define each MA.
Finally we must understand the idea of directionality in CFM. A MEP has a direction - either up or down. Down is used on the CE device, and means that the CFM message is sent directly out the wire. Up is used on the PE device, and means that the CFM message is bridged internally “upwards.” In VPWS, the message is then transmitted to the remote PE on the other end of the VPWS service.
Let’s examine the configuration for the CUSTOMER domain. First we must enable CFM globally. We then define the CFM domain and give it a level. Within the domain we can have multiple MAs, each defined by a service number, string, VLAN ID, or VPN ID. For the CUSTOMER domain we use an MA with number 123. Then we need to set the MA mode to either port or EVC. Here it is port-based because these are CE devices. Port-based mode implicitly uses the down direction. Then we enable the continuity check and set an interval. By default it is 10 seconds.
Lastly, we enable CFM on the physical interface. We give CSR5 an MPID of 5 and CSR6 an MPID of 6.
We can confirm that it is working by checking the remote MEPs on the CEs. CSR5 sees CSR6 as a remote MEP.
This MAC address matches the MAC on CSR6 Gi1. When a MEP operates in the down direction, the physical MAC is used for CFM.
Continuity check messages (CCMs) run at a 1 second interval. These are unidirectional messages, and there is no reply. A remote MEP is considered down if three consecutive CCMs are missed. Since CCMs are unidirectional, this means that the stream of CCMs from the remote endpoint has stopped. So if we shut Gi2 down on CSR6, after 3 seconds we should see an error logged on CSR5:
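The loss-detection rule described above can be sketched in a few lines of Python. This is an illustration of the "three missed CCMs" rule as stated here, not the router's actual timer logic. The function name and parameters are hypothetical, and real implementations may expire the remote MEP slightly later than exactly three intervals.

```python
def remote_mep_is_down(last_ccm_rx, now, interval_s=1.0, missed_limit=3):
    """Return True once no CCM has arrived for `missed_limit` intervals.

    last_ccm_rx and now are timestamps in seconds on the same clock.
    """
    return (now - last_ccm_rx) >= missed_limit * interval_s


# Last CCM from the remote MEP arrived at t=10.0 with a 1 second interval.
print(remote_mep_is_down(10.0, 12.5))  # False
print(remote_mep_is_down(10.0, 13.0))  # True
```

With the default 10 second interval, the same rule would take 30 seconds to declare the remote MEP down, which is why the lab lowers the interval to 1 second.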
Let’s enable Gi2 again. Here’s what the continuity check messages look like. Each one is essentially a multicast heartbeat message that includes the level, domain name, and maintenance association ID (number 123 in this case).
Note that the MAC addresses here and in subsequent pcap screenshots are different, as these pcaps were taken while running the lab at a different time.
We can also manually run ping and traceroute operations. First let’s check that CSR5 can “ping” CSR6 at layer 2. We can either use the MAC address of CSR6, or just the MEP ID of CSR6 (which is usually easier). These messages are unicast to the L2 address of the MEP, so the MEP must be discovered first via CCMs before you can use the MPID.
A CFM “ping” uses loopback messages (LBMs) and loopback replies (LBRs).
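To see why the MPID only works after discovery, it helps to picture the remote MEP table as a map from MPID to MAC address that is filled in by received CCMs. The sketch below is a simplified Python model. The `RemoteMepDb` class is a hypothetical name, and the MAC address is a placeholder from the documentation range, not a value from this lab.

```python
class RemoteMepDb:
    """Toy model of a per-MA remote MEP table learned from CCMs."""

    def __init__(self):
        self._by_mpid = {}

    def learn_ccm(self, mpid, src_mac):
        # Every received CCM (re)teaches us which MAC belongs to the MPID.
        self._by_mpid[mpid] = src_mac

    def resolve(self, mpid):
        # Returns None when no CCM from that MPID has been seen yet.
        return self._by_mpid.get(mpid)


db = RemoteMepDb()
print(db.resolve(6))                  # None: nothing learned yet
db.learn_ccm(6, "00:00:5e:00:53:06")  # placeholder MAC for the remote MEP
print(db.resolve(6))                  # 00:00:5e:00:53:06
```

A loopback addressed by MAC skips the lookup entirely, which is why a MAC address can be used as the target even for a device that never appears in this table.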
Next we’ll enable CFM for the PROVIDER domain. This is very similar to what we did for the CEs, except we must use EVC mode. We then need to associate the EVC with the service instance. We must remove the service instance and then re-add it with the EVC. This also requires us to re-add the service instance to the VPWS.
Within the PROVIDER domain, CSR1 and CSR3 see each other as MEPs. Since their level is 4, they transparently pass CFM messages at any higher level (level 5 or above), but not messages at level 4 itself. This allows the CE CFM messages to still work end-to-end.
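The level comparison a MEP applies to each CFM frame can be written as a small decision function. This Python sketch uses a hypothetical function name. The "forward" and "process" cases follow the description above. The "drop" case for lower levels reflects standard 802.1ag behavior, which this lab does not exercise, so treat it as an assumption here.

```python
def mep_action(mep_level, frame_level):
    """Decide what a MEP does with a CFM frame, based only on levels."""
    if frame_level > mep_level:
        return "forward"  # belongs to a broader domain: pass transparently
    if frame_level == mep_level:
        return "process"  # belongs to this MEP's own domain
    return "drop"         # lower-level frame must not leak past the MEP


# PROVIDER MEPs on CSR1 and CSR3 sit at level 4.
print(mep_action(4, 7))  # forward (CUSTOMER frames from the CEs)
print(mep_action(4, 4))  # process (PROVIDER frames)
```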
When using CFM in EVC mode, the direction is implicitly up. This uses a bridging function, so the PEs do not use their physical MAC. Instead they use a “bridge-brain MAC” which is essentially a virtual MAC that represents the bridging function.
The PEs also need additional configuration to act as a maintenance intermediate point (MIP) for the CUSTOMER domain at level 7. This allows the PE to respond to loopback (ping) and link trace (traceroute) messages.
Now CSR5 can trace to CSR6 and see the PEs in the trace.
Here is what the CFM traceroute looks like. The link trace message (LTM) is sent to a multicast address, and each node responds with a unicast link trace reply (LTR).
CSR5 can also individually ping the PEs using their bridge-brain MAC, but not their MPID, as their MPID is only relevant to the PROVIDER domain. (The CEs do not discover the PEs via their MPID in the CUSTOMER domain.)
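The multicast destination in these captures encodes the domain level. In 802.1ag, the group address is 01:80:C2:00:00:3y, where y is the level for CCMs and the level plus 8 for LTMs. The helper below is a small Python sketch with a hypothetical function name that derives the address you should expect to see in a capture.

```python
def cfm_group_mac(level, linktrace=False):
    """Return the 802.1ag group MAC for a CCM (default) or an LTM."""
    if not 0 <= level <= 7:
        raise ValueError("CFM level must be between 0 and 7")
    # Last octet is 0x30 + level for CCMs and 0x38 + level for LTMs.
    last_octet = 0x30 + level + (8 if linktrace else 0)
    return f"01:80:c2:00:00:{last_octet:02x}"


print(cfm_group_mac(7))                  # 01:80:c2:00:00:37 (CUSTOMER CCM)
print(cfm_group_mac(7, linktrace=True))  # 01:80:c2:00:00:3f (CUSTOMER LTM)
print(cfm_group_mac(4))                  # 01:80:c2:00:00:34 (PROVIDER CCM)
```

Filtering a capture on these addresses is a quick way to separate CUSTOMER traffic from PROVIDER traffic when both domains are active on the same link.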
The CFM traceroute allows us to easily isolate a fault. For example, let’s bring down Gi2 on CSR3. A trace will only get as far as CSR1, which means that the end-to-end VPWS service is not available. From CSR5’s perspective, the issue is within the service provider domain, not the customer domain.
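The reasoning used to read a partial trace can be sketched as a comparison between the hops you expect and the hops that replied. This is a hypothetical Python helper for illustration only. It assumes you already know the expected path from CSR5 (CSR1, CSR3, then CSR6) and have collected the set of nodes that returned an LTR.

```python
def locate_fault(expected_path, replied):
    """Return (last hop that replied, first hop that did not).

    The second element is None when every expected hop replied.
    """
    last_ok = None
    for hop in expected_path:
        if hop not in replied:
            return last_ok, hop
        last_ok = hop
    return last_ok, None


path_from_csr5 = ["CSR1", "CSR3", "CSR6"]
print(locate_fault(path_from_csr5, {"CSR1"}))  # ('CSR1', 'CSR3')
```

A result of `('CSR1', 'CSR3')` places the break between the two PEs, which is inside the provider's domain.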
The customer can raise a support case with the service provider. The service provider can confirm that the issue is within their domain, as CSR1 cannot ping CSR3 via CFM. From here, the provider can use tools such as MPLS OAM to further isolate the issue.
Further Reading