MST-AG
A customer of an L2 service may want redundant connections from a single site to two PEs. This creates the possibility of a loop. Traditionally, STP is used to allow redundant L2 connections while blocking some of the ports, which produces a loop-free topology in which BUM traffic cannot cause a flooding storm.
Let’s take a simple VPLS example. Below is a single site that connects to two PEs:
We cannot simply run STP, because the two PEs are not directly connected. Additionally, BPDUs are not transported across pseudowires. This leaves the VPLS network open to loops. Imagine that a CE at another site sends a BUM frame. PE1 on the left and PE2 will both receive it and send it to the access devices. The access devices will flood the traffic back to the PEs (the BUM frame from PE1 floods back to PE2, and vice versa), and the PEs will again flood the traffic out the pseudowires to other PEs. We need one link in this access network to be blocked to break this loop.
One solution could be for the PEs to run STP and have a direct connection between them, or have a virtual connection over a pseudowire which transports BPDUs. This would break the loop, but it requires the extra pseudowire just for BPDU propagation, and it also requires a lot of state on the PEs if they have a lot of access domains they connect to. The PEs would have to maintain STP states for every single VPLS domain. This solution isn’t very scalable.
Another solution could be for the PEs to tunnel the BPDUs without participating in STP. They could tunnel the access network’s BPDUs between each other over a special pseudowire. This would result in a loop-free topology, but if a PE-CE link goes down, failover takes a full 6 seconds (for RSTP/MST) because 3 hellos need to be missed, which implies a 2-second hello interval. This is because the PEs aren’t participating in the protocol, just transporting the BPDUs transparently. Additionally, this means that after a topology change, traffic can be blackholed for up to 5 minutes (presumably the MAC aging time), since the PEs are not flushing the MAC table in response to a TCN.
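The two numbers in the paragraph above are simple timer arithmetic. Here is a minimal sketch of it. The 2-second hello and 300-second MAC aging time are assumptions on my part (they are common defaults and they match the 6 seconds and 5 minutes quoted above), and the function names are made up for illustration.

```python
# Assumed timer values: common defaults, not stated explicitly in these notes.
HELLO_SECONDS = 2        # assumed RSTP/MST hello time
MISSED_HELLOS = 3        # hellos that must be missed before the neighbor is considered gone
MAC_AGING_SECONDS = 300  # assumed MAC aging time


def detection_delay(hello=HELLO_SECONDS, missed=MISSED_HELLOS):
    """Seconds before the access network notices a dead PE-CE link
    when the PEs only tunnel BPDUs and do not react themselves."""
    return hello * missed


def worst_case_blackhole(aging=MAC_AGING_SECONDS):
    """Seconds a stale MAC entry can survive if nobody flushes on a TCN."""
    return aging


print(detection_delay())             # 6
print(worst_case_blackhole() / 60)   # 5.0 (minutes)
```

Lowering the hello timer shortens the detection delay, but the stale MAC problem only goes away once the PEs flush in response to TCNs.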
The best solution is called MST Access Gateway. The PEs simply send statically configured BPDUs every hello period into the access network. Both PEs are configured with an identical virtual root bridge. This is scalable because the PEs are not actually participating in MST. They do not need to run an MST state machine. The only additional functionality they need is to respond to received TCNs by flushing the MAC table and withdrawing MACs. The PEs send MAC withdrawals within the VPLS domain to the remote PEs in response to a received TCN.
Above, each PE statically sends a BPDU reporting connectivity of 0 cost to a root bridge with priority 0 and a bridge ID of 0. This can never be beaten. Each PE has a different priority, which can be configured per-instance, so that a different topology is used for each instance in the access network.
This setup forces the access network to block a link. The PEs will never block a link. Their PE-CE links will always be DP, since the PEs are closest to the virtual root bridge.
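To see why the access network is the one that ends up blocking, here is a rough sketch of the BPDU comparison a single dual-homed CE would make. This is a simplified model, not a real MST implementation: the `BridgeId` and `Bpdu` types, the MAC values, and the port cost of 20000 are all assumptions for illustration, and the port ID tie-breaker is left out.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int
    mac: int


class Bpdu(NamedTuple):
    # Field order matters: tuples compare left to right, and lower is better.
    root: BridgeId
    root_path_cost: int
    sender: BridgeId


VIRTUAL_ROOT = BridgeId(0, 0)  # priority 0, bridge ID 0: cannot be beaten

# Static BPDUs sent by the two PEs (MAC values are made up).
pe1_bpdu = Bpdu(VIRTUAL_ROOT, 0, BridgeId(0, 0x1))
pe2_bpdu = Bpdu(VIRTUAL_ROOT, 0, BridgeId(4096, 0x2))


def ce_port_roles(received, own_id, port_cost=20000):
    """Pick port roles on a CE given the BPDU received on each uplink.

    Simplification: every uplink has the same cost, so the best received
    BPDU identifies the root port directly.
    """
    root_port = min(received, key=lambda port: received[port])
    best = received[root_port]
    # What the CE itself would advertise on its other ports.
    own_bpdu = Bpdu(best.root, best.root_path_cost + port_cost, own_id)
    roles = {}
    for port, bpdu in received.items():
        if port == root_port:
            roles[port] = "root"
        elif bpdu < own_bpdu:
            # The neighbor is closer to the root, so the CE must block.
            roles[port] = "alternate (blocking)"
        else:
            roles[port] = "designated"
    return roles


roles = ce_port_roles(
    {"to_PE1": pe1_bpdu, "to_PE2": pe2_bpdu},
    own_id=BridgeId(8192, 0xCE),
)
print(roles)
```

Because both PEs claim a cost of 0 to the virtual root, the CE can never advertise something better on either uplink, so the PE side is always designated and the CE blocks its uplink toward the PE with the worse bridge priority.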
PEs can only enable MST-AG on physical interfaces or untagged subinterfaces. The MST protocol only allows for sending untagged BPDUs.
Failover time with MST-AG is not sub-50 ms. A link failure results in an outage of about 100 ms, and a node failure can cause an outage of 2-3 seconds (I believe with Hello set to 1 second). MST-AG protects against failure of a link in the access network, or a PE-CE link, and failure of a PE or access node. It can also protect against a PE losing connectivity to the core (which is mentioned later and requires extra configuration).
In this setup, there is no special PE-PE pseudowire or link. However, if a topology change happens in the access network and the access network is partitioned, the PE that receives the TCN may have to propagate the TCN to the other PE which is now partitioned from the access network. This can be done by creating a VPWS and adding the untagged interface participating in MST-AG to the VPWS (shown later).
Before we take a look at the configuration for MST-AG, let’s review MST. MST is a standards-based variation of STP that is based on RSTP. A group of switches in the same MST region share the following configuration:
The same region name
The same revision number (administratively configured)
The same VLAN-to-instance mapping (verified through an MD5 digest of the mapping that is sent in the BPDU)
MST BPDUs are sent untagged and contain the root bridge/priorities for each instance. VLANs are manually mapped to instances. This mapping is statically set and must be the same on all bridges in the region. Having multiple instances allows you to use a different root for each instance, so that all links are used in the topology. This is a form of load balancing.
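The region check can be sketched as follows. Note that this is an illustration only: as far as I know the real 802.1Q digest is an HMAC-MD5 with a fixed key over the mapping table, while this sketch uses a plain MD5 as a stand-in. The table layout (4096 two-byte entries) and the helper names are my assumptions.

```python
import hashlib


def mapping_digest(vlan_to_instance):
    """Digest of the VLAN-to-instance table (plain MD5 as a stand-in).

    Unmapped VLANs default to instance 0.
    """
    table = bytearray()
    for vlan in range(4096):
        table += vlan_to_instance.get(vlan, 0).to_bytes(2, "big")
    return hashlib.md5(bytes(table)).hexdigest()


def region_id(name, revision, vlan_to_instance):
    """The three values that must match for two bridges to share a region."""
    return (name, revision, mapping_digest(vlan_to_instance))


sw1 = region_id("ACCESS", 1, {10: 1, 20: 2})
sw2 = region_id("ACCESS", 1, {10: 1, 20: 2})
sw3 = region_id("ACCESS", 1, {10: 1, 20: 1})  # VLAN 20 mapped differently

print(sw1 == sw2)  # True: same region
print(sw1 == sw3)  # False: the digest differs, so sw3 is in another region
```

The point of the digest is that bridges never have to exchange the full mapping table. A single mismatched VLAN changes the digest and puts the bridge in a different region.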
Instance 0 is the Internal Spanning Tree (IST). This is used to speak STP with bridges outside of the MST region. The entire MST region is presented as if it’s a single switch to the outside world. This can get complex, but I don’t think this would come up in the CCIE-SP exam, since it seems outside of the scope of MST-AG.
Guidelines for configuring MST-AG:
Both PE devices should have a port path cost of 0
This is what the doc says, but I don’t see how to configure this - lowest internal cost option is 1
One PE should have a higher bridge priority and ID than the other. This allows you to control which redundant link is blocked (when there is only a single CE).
One PE should have bridge priority 0
One PE should have bridge priority 4096
All access devices should have priority greater than or equal to 8192
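The priority guidelines above are easy to sanity-check in a script. Below is a minimal sketch. The function name and the dictionary inputs are hypothetical, and the path cost guideline is left out because I could not find a way to configure it.

```python
def check_mstag_priorities(pe_priorities, access_priorities):
    """Return a list of guideline violations (an empty list means OK).

    pe_priorities:     {"PE1": 0, "PE2": 4096}
    access_priorities: {"CE1": 8192, ...}
    """
    problems = []
    if sorted(pe_priorities.values()) != [0, 4096]:
        problems.append(
            "PEs should use bridge priorities 0 and 4096, got "
            f"{sorted(pe_priorities.values())}"
        )
    for name, priority in sorted(access_priorities.items()):
        if priority < 8192:
            problems.append(
                f"{name} has priority {priority}, should be 8192 or higher"
            )
    return problems


print(check_mstag_priorities({"PE1": 0, "PE2": 4096}, {"CE1": 32768}))
print(check_mstag_priorities({"PE1": 0, "PE2": 0}, {"CE1": 4096}))
```

The first call returns an empty list. The second reports both the duplicate PE priority and the access device that could be mistaken for a bridge as close to the root as a PE.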
To configure MST-AG TCN propagation, you simply put the AC in a VPWS with the other PE. TCNs will be flooded out the VPWS.
If a PE loses connectivity to the core, it should stop sending BPDUs indicating it is directly connected to the virtual root. You can configure this as follows. All core interfaces that are defined must go down for the router to consider the tracked object down.
Above, if both Gi0/0/0/1 and Gi0/0/0/2 go down, the router will start sending “startup BPDUs”. These indicate a worse priority than under normal circumstances. You can set the startup root priority/bridge ID as follows:
When the links come back up, the router will continue sending these startup BPDUs for the preempt delay period (10 seconds above). This is also used to delay sending BPDUs when the access circuit comes up as well. (When the AC first comes up, startup BPDUs are sent for the preempt delay period).
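The tracking and preempt delay behavior described above boils down to a small decision function. This is a sketch of the logic as I understand it, not of how the router implements it. The `PeState` type and its field names are made up, and the 10-second default is the preempt delay value mentioned above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeState:
    core_links_up: Dict[str, bool]   # tracked core interfaces and their state
    seconds_since_recovery: float    # time since the core (or the AC) came back up
    preempt_delay: float = 10.0      # preempt delay in seconds


def tracked_object_up(core_links_up):
    """The tracked object is down only when ALL core interfaces are down."""
    return any(core_links_up.values())


def bpdu_to_send(state):
    """Return "startup" (worse priority) or "normal" (connected to the virtual root)."""
    if not tracked_object_up(state.core_links_up):
        return "startup"
    if state.seconds_since_recovery < state.preempt_delay:
        # Links are back, but keep advertising the worse BPDU for a while.
        return "startup"
    return "normal"


one_link_down = PeState({"Gi0/0/0/1": False, "Gi0/0/0/2": True}, 60.0)
isolated = PeState({"Gi0/0/0/1": False, "Gi0/0/0/2": False}, 60.0)
just_recovered = PeState({"Gi0/0/0/1": True, "Gi0/0/0/2": True}, 3.0)

print(bpdu_to_send(one_link_down))   # normal
print(bpdu_to_send(isolated))        # startup
print(bpdu_to_send(just_recovered))  # startup
```

The delay gives the PE time to rebuild its core-facing state before it attracts traffic from the access network again.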
If the PEs have a L3 VLAN mixed with other L2 VLANs, the L3 VLANs must be in their own instance, in which the PEs are set as “edge-mode.”
When the PEs terminate the L2 domain as an L3 router, the layer 2 loop is broken. (The PE will not continue to flood an L2 BUM frame when it has an L3 IP address on that VLAN). The PEs typically participate in a gateway redundancy mechanism, such as VRRP. The problem is that if you leave the L3 VLAN in the same instance as other L2 VLANs, one link in the access network will be blocking, and the PEs will not see each other over VRRP.
Edge-mode configures the PE to stop listening for TCNs and to advertise the worst possible path to the worst possible root. (It’s not clear whether this happens automatically or whether you need to manually configure high priorities for this instance). Some device in the access network ends up becoming the root, and no links in the access network will be blocking (unless there is a loop within the access network itself).
Below, the L2 VLANs are in their own MSTI, in which the PEs report connectivity to a virtual root. The L3 VLANs use a different MSTI, and the access device at the bottom becomes the root, and no links are blocked.
This runs MSTP in 802.1ad mode, in which a different MAC address is used, and BPDUs with 802.1Q MACs are forwarded transparently.
PVST-AG is normal per-VLAN spanning tree, but with access gateway. This is very similar to MST-AG. The router sends static BPDUs on a per-VLAN basis which report connectivity to a virtual root.
One difference with PVST-AG compared to MST-AG is that topology change propagation is not supported in PVST-AG. Also, TCNs received on a single VLAN will affect all VLANs and BDs on that physical interface.
Additionally, only a single access device (CE) can be attached to the PEs in PVST-AG. This is because TCN propagation is not possible. TCNs can arrive with any VLAN tag in PVST, so you can’t simply put all subinterfaces into a VPWS to the remote PE. For this reason, you cannot have more than one CE, as that creates the possibility of having a partitioned access network with no TCN propagation capability.
Configuration can be quite intensive, as each VLAN needs to be manually defined. Also, Q-in-Q subinterfaces are not supported. Only single-tagged dot1q interfaces are allowed. Physical interfaces and L2 interfaces with encap default are not allowed.