MC-LAG

MC-LAG allows a single CE to be multi-homed at layer 2 to two separate PEs using LACP. The two PEs use the same LACP system ID so that they appear to the CE as a single device. The PEs communicate via ICCP (Inter-Chassis Communication Protocol) to determine which PE is active and which is standby. The standby PE uses LACP to signal its member link as standby, so the CE does not forward over it.

In mLACP (multichassis LACP), the CE is the dual-homed device (DHD) and the PEs are the points of attachment (POAs).

The POAs must have a better system priority (lower value) than the DHD. This ensures that the POAs, not the DHD, decide which links are active and which are standby. With MC-LAG, only one POA is active for a given bundle at any time.
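The rule above is the standard LACP system identifier comparison. The system priority is compared first (lower wins), the system MAC breaks a tie, and the winning side controls link selection. The following is a minimal Python sketch. It assumes a hypothetical DHD that uses the common default priority of 32768 and a made-up MAC.

```python
from typing import NamedTuple


class SystemId(NamedTuple):
    """An LACP system identifier: system priority plus system MAC."""
    priority: int  # lower value is better
    mac: str       # dotted format, e.g. "0000.1111.2222"


def mac_to_int(mac: str) -> int:
    """Convert a dotted MAC such as 0000.1111.2222 to an integer."""
    return int(mac.replace(".", ""), 16)


def controlling_system(a: SystemId, b: SystemId) -> SystemId:
    """Return the side that controls link selection: the lower priority
    wins, and the lower MAC breaks a tie."""
    def key(s: SystemId):
        return (s.priority, mac_to_int(s.mac))
    return a if key(a) < key(b) else b


# Both POAs share this ID (mlacp system priority / mlacp system mac).
poas = SystemId(priority=1, mac="0000.1111.2222")
# Hypothetical DHD values, for illustration only.
dhd = SystemId(priority=32768, mac="00aa.bbcc.ddee")

print(controlling_system(poas, dhd) == poas)  # True: the POAs decide
```

If the DHD were instead configured with a better (lower) priority than the POAs, it would win this comparison. That is the case where the POAs' port priorities stop mattering.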

Failure Scenarios

The ports or link between the DHD and active POA fail
The active POA fails altogether
The active POA loses connectivity to the core

How the failover occurs is determined by the switchover type (explained later).

ICCP

MC-LAG requires ICCP so that the PEs can sync state with each other and detect failure events. ICCP runs over a targeted LDP session between the two PEs, which can be directly connected or multiple hops apart.

Below is the config for PE1, where the two PEs are directly connected.

#PE1
interface GigabitEthernet0/0/0/0
 ipv4 address 10.0.0.1/30
!
interface Loopback0
 ipv4 address 1.1.1.1/32
!
router static
 address-family ipv4 unicast
  2.2.2.2/32 10.0.0.2
!
mpls ldp
 router-id 1.1.1.1
 discovery targeted-hello accept
 interface GigabitEthernet0/0/0/0
!
redundancy iccp
 group 1
  member neighbor 2.2.2.2
  mlacp system mac 0000.1111.2222       ! Same on all POAs
  mlacp system priority 1               ! Same on all POAs
  mlacp node 1                          ! Unique per device in the group
  backbone interface GigabitEthernet0/0/0/1
  backbone interface GigabitEthernet0/0/0/2   ! If all backbone interfaces go down,
                                              ! a switchover occurs

Bundle Config

Once ICCP is set up, you can configure multichassis bundles.

interface Bundle-Ether100
 mac-address 0001.0001.0001            ! Must be the same on both PEs
 bundle wait-while <msec>
 lacp switchover suppress-flaps <msec>
 mlacp iccp-group 1
 mlacp port-priority <value>           ! The PE you want to be active should
                                       ! have a lower priority value

The wait-while timer is how long the router gives LACP to negotiate a working link. After a member link starts receiving LACP information from the partner, the link waits this long before it is attached to the bundle.

The suppress-flaps timer defines how long a link can be down without bringing down the bundle. A flap is suppressed as long as the link comes back up within this time. The config guide says to make this value larger than the wait-while timer. In its example, suppress-flaps is 300 ms and wait-while is 100 ms.
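Flap suppression comes down to comparing the outage length with the configured window. The following is a minimal sketch using the timer values from the config guide example. It assumes an outage is hidden if the link returns at or before the window expires; the exact boundary handling on the router is an assumption.

```python
WAIT_WHILE_MS = 100       # bundle wait-while, from the config guide example
SUPPRESS_FLAPS_MS = 300   # lacp switchover suppress-flaps, same example


def bundle_goes_down(link_down_ms: int, suppress_flaps_ms: int) -> bool:
    """Return True if a member-link outage of link_down_ms is long enough
    to bring the bundle down; shorter outages are suppressed as flaps."""
    return link_down_ms > suppress_flaps_ms


# The guide's recommendation: suppress-flaps should exceed wait-while.
assert SUPPRESS_FLAPS_MS > WAIT_WHILE_MS

for outage in (50, 250, 400):
    print(outage, "ms outage -> bundle down:",
          bundle_goes_down(outage, SUPPRESS_FLAPS_MS))
```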

Coupled vs. Decoupled Modes

Coupled mode means that the pseudowire status is coupled to the status of the bundle. If the bundle is standby, the pseudowire is standby. If the bundle is active, the pseudowire is active.

This allows two-way pseudowire redundancy (each end has redundancy), which results in four pseudowires in total: one active and three standby. A pseudowire is active only if both sides advertise active.
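The four-pseudowire count can be checked by enumerating a full mesh between two redundant PE pairs. The following is a minimal sketch. The remote PE names PE3 and PE4 are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Each end of the service has an active and a standby PE.
# PE3/PE4 are hypothetical names for the remote pair.
LOCAL_PES = {"PE1": "active", "PE2": "standby"}
REMOTE_PES = {"PE3": "active", "PE4": "standby"}


def pseudowire_states(local: Dict[str, str],
                      remote: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Return the state of every pseudowire in a full mesh between two PE pairs.

    In coupled mode each PE advertises its bundle state on its pseudowires,
    and a pseudowire is active only if both ends advertise active.
    """
    states = {}
    for (lpe, lstate), (rpe, rstate) in product(local.items(), remote.items()):
        both_active = lstate == "active" and rstate == "active"
        states[(lpe, rpe)] = "active" if both_active else "standby"
    return states


for pw, state in pseudowire_states(LOCAL_PES, REMOTE_PES).items():
    print(pw, state)
```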

Decoupled mode means that the status of the bundle does not affect the status of the pseudowire. This is the default case for VPLS on ASR9000.

Forcing switchover

You can force a switchover by using this command on the active PE:

mlacp switchover Bundle-Ether 100

The bundle status on this PE will now read “mLACP hot standby”.

Note that this command only works with the default non-revertive behavior (explained later). In revertive mode, the primary PE would simply take over again immediately.

Handling a down ICCP peer

If the standby PE is lost during normal operation, the bundle continues to operate on the active PE. However, if the active PE then reboots, ICCP finds no peer when it starts up, and the bundle stays down indefinitely.

To prevent this situation, you can set a timeout value for the ICCP connection. The bundle will be enabled once the timeout has elapsed.

redundancy iccp
 group 1
  mlacp connect timeout 120
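The effect of the timeout can be modeled as a simple condition: whether a peer has connected, and how long ICCP has been waiting. This is a simplified sketch that ignores active/standby selection once the peer is up. The function name is hypothetical, and the timeout is assumed to be in seconds.

```python
from typing import Optional


def bundle_can_come_up(peer_connected: bool,
                       seconds_waiting: float,
                       connect_timeout: Optional[float]) -> bool:
    """Decide whether the mLACP bundle may be brought up after ICCP starts.

    With a peer, normal mLACP negotiation takes over. Without a peer, the
    bundle waits indefinitely unless a connect timeout is configured, in
    which case it comes up once the timeout has elapsed.
    """
    if peer_connected:
        return True
    if connect_timeout is None:
        return False  # no timeout configured: the bundle stays down
    return seconds_waiting >= connect_timeout


print(bundle_can_come_up(False, 600, None))  # False: stuck without a timeout
print(bundle_can_come_up(False, 130, 120))   # True: the timeout has elapsed
```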

Hot vs Cold Standby

Normally the standby POA should show “mLACP hot standby” rather than “mLACP cold standby.” Hot standby means the standby POA can take over without a flap if the active POA goes down. Cold standby means the link is down, so a failover results in a flap. Cold standby is typically caused by missing config on the POA, such as “lacp switchover suppress-flaps.”

Switchover Types

When a switchover occurs (the standby POA becomes active), it can be done in two ways:

The standby POA can lower its port priority value so that its ports are chosen as best
The standby POA can use a “brute force” mechanism, in which it stops running LACP on its links so that they cannot be selected
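Both mechanisms work by changing which member link wins LACP selection. The sketch below simplifies selection to choosing the running link with the lowest (port priority, port number). The port-priority values and port numbers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MemberLink:
    poa: str
    port_priority: int        # lower value is preferred
    port_number: int          # tie-breaker; must differ between POAs
    lacp_running: bool = True


def select_active(links: List[MemberLink]) -> Optional[MemberLink]:
    """Return the link chosen as active: only links still running LACP are
    candidates, and the lowest (port priority, port number) wins."""
    candidates = [link for link in links if link.lacp_running]
    if not candidates:
        return None
    return min(candidates, key=lambda link: (link.port_priority, link.port_number))


poa1 = MemberLink("POA1", port_priority=100, port_number=1)
poa2 = MemberLink("POA2", port_priority=200, port_number=2)
print(select_active([poa1, poa2]).poa)  # POA1 is active initially

# Dynamic priority management: POA2 lowers its port-priority value.
print(select_active([poa1, replace(poa2, port_priority=50)]).poa)  # POA2

# Brute force: POA1, now going standby, stops running LACP on its link.
print(select_active([replace(poa1, lacp_running=False), poa2]).poa)  # POA2
```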

There are also two fallback behaviors:

revertive - the bundle has a primary and a secondary POA. If the secondary POA is active and the primary comes back, the primary becomes active again. The bundle “reverts” back to the primary.
non-revertive - there is no primary or secondary POA. If a switchover occurs and the previously active POA comes back, there is no switchback. This is usually recommended because it causes the least amount of churn.
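The difference between the two behaviors shows up when the failed POA recovers. The following is a minimal event-replay sketch. POA1 and POA2 are hypothetical names. In non-revertive mode, "primary" only means the POA that starts out active.

```python
from typing import List, Tuple


def simulate(events: List[Tuple[str, str]], revertive: bool,
             primary: str = "POA1", secondary: str = "POA2") -> List[str]:
    """Replay ("fail" | "recover", poa) events and return the active POA
    after each event."""
    up = {primary: True, secondary: True}
    active = primary
    history = []
    for kind, poa in events:
        up[poa] = kind == "recover"
        if not up[active]:
            # The active POA failed: switch over if the other POA is up.
            other = secondary if active == primary else primary
            if up[other]:
                active = other
        elif revertive and up[primary]:
            # Revertive: go back to the primary as soon as it is available.
            active = primary
        history.append(active)
    return history


events = [("fail", "POA1"), ("recover", "POA1")]
print(simulate(events, revertive=False))  # ['POA2', 'POA2']
print(simulate(events, revertive=True))   # ['POA2', 'POA1']
```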

The switchover types are as follows:

Default
- Dynamic priority management with non-revertive behavior
brute-force
- Brute-force mechanism with revertive behavior
- Configured with “mlacp switchover type brute-force” under interface Bundle-Ether100
revertive
- Dynamic priority management with revertive behavior
- The primary POA is the one with the lower value configured with the port-priority command. If the priorities match, the POA with the lower mLACP node ID is primary.
- Configured with “mlacp switchover type revertive” under interface Bundle-Ether100
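The primary-POA rule is a two-key comparison: port priority first, then mLACP node ID. The following is a minimal sketch. The POA names and values are illustrative only.

```python
from typing import Dict, Tuple


def primary_poa(poas: Dict[str, Tuple[int, int]]) -> str:
    """poas maps a POA name to (port_priority, mlacp_node_id).

    The lower port priority wins; on a tie, the lower mLACP node ID wins.
    Comparing the tuples directly applies exactly that order.
    """
    return min(poas, key=lambda name: poas[name])


print(primary_poa({"PE1": (100, 1), "PE2": (200, 2)}))  # PE1: lower port priority
print(primary_poa({"PE1": (100, 2), "PE2": (100, 1)}))  # PE2: tie, lower node ID
```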

Note that if the DHD has a better LACP system priority, the port priorities set by the POAs are ignored, so brute force is the only mechanism that works. The system raises an alarm when this happens.

Testing failover

You can simply shut down the bundle interface to force a switchover, but then the LACP states can no longer be monitored.

interface Bundle-Ether100
 shutdown

A better way is to use the “bundle shutdown” command. This keeps links in LACP standby mode. This can only be used with dynamic priority management (not brute-force) in either revertive or non-revertive mode.

interface Bundle-Ether100
 bundle shutdown

Split Brain Scenarios

If the ICCP connection goes down, both POAs consider themselves active and bring their LACP links up. The only way to deal with this is to limit the number of active links on the DHD. For example, if the DHD uses one link to each PE, set its maximum active links to 1 so that only one link is bundled.
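Capping the active links on the DHD limits how many of the offered links are bundled, even when both POAs claim to be active. The following sketch simplifies real LACP selection by ranking links on port priority alone. The priorities are hypothetical.

```python
from typing import List, Optional, Tuple

Link = Tuple[str, int]  # (POA name, port priority)


def dhd_bundled_links(offered: List[Link],
                      max_active: Optional[int] = None) -> List[Link]:
    """Return the links the DHD bundles out of those the POAs bring up.

    Simplification: links are ranked by port priority only, and at most
    max_active of them are kept when that limit is configured.
    """
    ranked = sorted(offered, key=lambda link: link[1])
    return ranked if max_active is None else ranked[:max_active]


# Split brain: ICCP is down, so both POAs bring their link up.
offered = [("PE1", 100), ("PE2", 200)]
print(len(dhd_bundled_links(offered)))                # 2: both links forward
print(len(dhd_bundled_links(offered, max_active=1)))  # 1: only one link forwards
```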

NAKs

When the PEs communicate over ICCP, sync messages keep objects such as bundles in sync. If there is an issue, such as a clashing node ID or a different bundle number on each PE, a resync is requested. If the resync cannot resolve the problem, the message is NAKed.

When a message is NAKed, the object it refers to is disabled. For example, the bundle would be disabled for LACP. To re-enable the object, its config must change in a way that causes a new Config TLV to be sent, which resets the NAKed state.
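The NAK lifecycle can be summarized as a tiny state machine: a problem first triggers a resync, the object is disabled only if the resync fails, and only a config change (a new Config TLV) clears that state. This is a simplified sketch; the class and method names are hypothetical.

```python
class SyncedObject:
    """Simplified model of an object (such as a bundle) synced over ICCP."""

    def __init__(self, name: str):
        self.name = name
        self.naked = False  # a NAKed object is disabled (e.g. for LACP)

    @property
    def enabled(self) -> bool:
        return not self.naked

    def on_sync_problem(self, resolved_by_resync: bool) -> None:
        """A resync is requested first; only if it does not fix the
        problem is the message NAKed and the object disabled."""
        if not resolved_by_resync:
            self.naked = True

    def on_config_change(self) -> None:
        """A config change sends a new Config TLV, which resets the NAKed state."""
        self.naked = False


bundle = SyncedObject("Bundle-Ether100")
bundle.on_sync_problem(resolved_by_resync=False)
print(bundle.enabled)  # False: disabled until its config changes
bundle.on_config_change()
print(bundle.enabled)  # True
```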

Note on Node ID: The node ID must be unique, because it is used in a formula to produce the LACP port number. The port numbers cannot be the same on each POA, otherwise the DHD would believe that two of its interfaces are connecting to the same port on the remote device.
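The following sketch shows why a clashing node ID breaks LACP. Both POAs share one system ID, so the DHD can tell their ports apart only by port number. The port-number formula below is HYPOTHETICAL and for illustration only; the actual formula the router uses is not given here.

```python
from typing import List

# Same mLACP system priority/MAC on both POAs (from the ICCP config above).
SHARED_SYSTEM_ID = (1, "0000.1111.2222")


def lacp_port_number(node_id: int, local_index: int) -> int:
    """HYPOTHETICAL formula for illustration only: mix the mLACP node ID
    into the port number so that different nodes yield different numbers."""
    return (node_id << 12) | local_index


def partner_ports_unique(node_ids: List[int]) -> bool:
    """Each POA contributes one link using local index 1. The DHD identifies
    the partner port as (system ID, port number); a duplicate looks like two
    of its interfaces connected to the same remote port."""
    seen = [(SHARED_SYSTEM_ID, lacp_port_number(n, 1)) for n in node_ids]
    return len(seen) == len(set(seen))


print(partner_ports_unique([1, 2]))  # True: node IDs differ
print(partner_ports_unique([1, 1]))  # False: clashing node IDs
```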

Syslog messages

MLACP_CANNOT_SWITCHOVER: Could not perform mLACP switchover/switchback requested by user for bundle <name>: <reason>

A switchover cannot be performed because the mLACP peer is down, the bundle is not active on the node that is being switched from, or the switchover behavior config is not the default (non-revertive).

MLACP_CONNECT_FAILED: Failed to connect to another mLACP device in ICCP Group <id>. Reason: <reason>

The peer might not be configured, or there might be a version mismatch between the two devices.

MLACP_SYSTEM_ID_ARBITRATION: The system ID for ICCP group <id> has been established by arbitration

The peers in the ICCP group are configured with different mLACP system IDs (a misconfiguration). One value must be chosen for the bundle to operate, so the system picks one through arbitration. Because the system ID changes on a switchover, the bundle must flap when one occurs.

MLACP_BUNDLE_MAC_ARBITRATION: The MAC address for <bundle name> has been selected through arbitration.

Same as above, but for the bundle MAC address (the mac-address configured under the bundle interface) rather than the mLACP system ID.

MLACP_CORE_ISOLATION: <bundle name> marked as isolated due to not being able to connect to the core.

All backbone interfaces configured in the ICCP group are down, so the bundle is marked as isolated and switches over to the standby POA.

ICCP-SM (Service Multihoming)

With ICCP service multihoming, the CE device uses either two separate bundle interfaces (one to each PE) or no bundling at all. The CE configures all VLANs as active on all bundles/links. The POAs manually distribute the VLANs across the two bundles/links. For each VLAN, one POA is active and the other is standby (not forwarding).

The CE device initially floods traffic on both links. However, it only receives traffic on one link, so it learns the MAC address on that link. If a POA fails, the other POA activates the standby VLANs and sends a MAC flush to the CE, forcing it to relearn MACs on the new link. The flush uses STP TCN by default, or optionally MVRP.
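The CE-side behavior can be sketched as a small MAC table. Unknown destinations are flooded on both links, learned destinations use a single link, and a MAC flush empties the table. This is a simplified sketch: a real CE would typically flush only the affected entries (for example, per VLAN). The link names and MAC address are hypothetical.

```python
from typing import Dict, List


class CeMacTable:
    """Simplified MAC learning on a CE with one link to each PE."""

    def __init__(self, links: List[str]):
        self.links = links
        self.table: Dict[str, str] = {}

    def learn(self, mac: str, link: str) -> None:
        """Record the link on which a frame from `mac` arrived."""
        self.table[mac] = link

    def egress_links(self, mac: str) -> List[str]:
        """Known MAC: send on the learned link. Unknown MAC: flood on all links."""
        link = self.table.get(mac)
        return [link] if link is not None else list(self.links)

    def flush(self) -> None:
        """Handle a MAC flush (e.g. triggered by an STP TCN) from the POAs."""
        self.table.clear()


ce = CeMacTable(["to-PE1", "to-PE2"])
remote = "aaaa.bbbb.cccc"          # hypothetical remote host MAC
print(ce.egress_links(remote))     # ['to-PE1', 'to-PE2']: flooded
ce.learn(remote, "to-PE1")         # replies only arrive via the active POA
print(ce.egress_links(remote))     # ['to-PE1']
ce.flush()                         # PE1 failed; PE2 activates the VLAN
ce.learn(remote, "to-PE2")
print(ce.egress_links(remote))     # ['to-PE2']
```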

ICCP-SM does not require a single dual-homed device (DHD). Instead, you can use a dual-homed network (DHN), in which different devices in the access network connect to each PE. This works because LACP is only used between each PE and its directly connected CE; mLACP is not used. You can also forgo LACP altogether.

Note that because this requires MAC learning/bridging, this feature is only supported in VPLS.

The advantage of ICCP-SM (also called “Pseudo mLACP”) is that you can use a DHN instead of a DHD. It also provides per-VLAN active/standby redundancy, which lets both links pass traffic, instead of per-link active/standby, which leaves one link unused.

l2vpn
 redundancy group iccp 1
  multi-homing node-id 1
  mac-flush stp-tcn               ! STP TCN is the default; MVRP is the other option
  interface Bundle-Ether1
   primary vlan 1-10              ! The opposite would be configured on the other PE
   secondary vlan 11-20
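The per-VLAN split above means each PE forwards its primary VLANs and takes over the other PE's VLANs only when that PE fails. The following is a minimal sketch. It mirrors the PE1 config and assumes PE2 has the opposite configuration.

```python
from typing import AbstractSet, Optional

# PE1 is primary for VLANs 1-10 (secondary for 11-20); PE2 is assumed to
# have the opposite configuration.
PRIMARY_VLANS = {"PE1": range(1, 11), "PE2": range(11, 21)}


def forwarding_pe(vlan: int, failed: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the PE that forwards a VLAN: its primary PE if that PE is up,
    otherwise the other PE (the secondary). None if no PE can forward it."""
    for primary, vlans in PRIMARY_VLANS.items():
        if vlan in vlans:
            secondary = "PE2" if primary == "PE1" else "PE1"
            for pe in (primary, secondary):
                if pe not in failed:
                    return pe
            return None
    return None  # VLAN not configured for multihoming


print(forwarding_pe(5), forwarding_pe(15))  # PE1 PE2: both links carry traffic
print(forwarding_pe(5, failed={"PE1"}))     # PE2 takes over VLAN 5
```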

MCEC on IOS

IOS calls MC-LAG MCEC (Multi-chassis EtherChannel). Only a few IOS devices support MCEC, and its behavior differs from IOS-XR in a few ways:

The default mode is revertive, while on IOS-XR it is non-revertive.
The peer is monitored using ICRM (interchassis redundancy manager), which can use BFD
The interchassis group is defined under the redundancy config section
A portchannel is associated with the MCEC using the command interchassis group id under the portchannel config
The PE needs the command status peer topology dual-homed under a pw-class which is applied to the xconnect neighbor