FRR Link Protection (XE, BFD)
Load mpls.te.frr.init.cfg
A TE tunnel using the path CSR8-CSR9-CSR10-CSR5 is set up on CSR8. Configure FRR link protection on CSR9 so that if the CSR9-CSR10 link goes down, traffic is rerouted around the failure and merged back onto the primary path at CSR10. Use BFD to detect the failure.
Answer
Explanation
MPLS-TE FRR (fast reroute) allows for sub-50 msec convergence during link or node failures in the network. Without it, the TE path would have to be torn down, the IGP would flood the change and update the TED, and the headend would then calculate a new path via CSPF and signal it via RSVP. This would cause a noticeable, although brief, outage for drop-sensitive traffic.
Using FRR, backup tunnels are precalculated. The backup tunnels are applied to the interface of the protected link, or the interface facing the protected node. When this link goes down, the local router immediately switches traffic for the protected LSP onto the backup LSP.
In link protection, the backup tunnel terminates on the NHOP itself (the router on the other end of the link). The PLR (point of local repair) already knows the NHOP’s label for the primary LSP, because the PLR is directly connected to the NHOP. The PLR pushes two labels when protecting the primary LSP: the top label is the label for the repair tunnel, and the second label is the NHOP’s label for the primary LSP. The penultimate hop of the repair tunnel will naturally perform PHP, exposing the NHOP’s local label for the primary LSP. Because the NHOP uses a per-platform label space, this label can arrive via any interface, which allows FRR to function properly.
In node protection, the backup tunnel terminates on the NNHOP, which is the NHOP’s nexthop. The PLR learns the NNHOP’s label for the primary LSP via the label record option in the RSVP RESV message. We will see this in action in the next lab.
In both link and node protection, link failure is either detected physically, or via RSVP Hello. If relying on the RSVP Hello, you will likely want to use BFD. This allows you to quickly reconverge. Without BFD or RSVP Hellos, you would need to wait about two minutes for RSVP to detect the failure by default, due to the LSP refresh failing. Note that RSVP Hellos are not used by default. To use them you need to activate RSVP Hellos globally, and at the link level:
If you wish to use BFD, you simply add the bfd keyword to the end of both commands.
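A minimal sketch of both variants on CSR9, assuming the protected link is GigabitEthernet2.590 (the interface named later in this lab). The BFD timer values are illustrative assumptions and are not taken from the lab.

```
! Variant 1: RSVP Hellos only, enabled globally and on the link
ip rsvp signalling hello
!
interface GigabitEthernet2.590
 ip rsvp signalling hello
!
! Variant 2: the same two commands with the bfd keyword added.
! The bfd interval values are illustrative assumptions.
ip rsvp signalling hello bfd
!
interface GigabitEthernet2.590
 ip rsvp signalling hello bfd
 bfd interval 50 min_rx 50 multiplier 3
```

The router on the other end of the link (CSR10 in this lab) needs the matching configuration for the BFD session to come up.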
FRR uses three main steps:
1. The headend must request FRR protection on the LSP.
   - This allows some LSPs to use FRR and others not, which lets you protect only important LSPs in case your backup paths run over links that do not have enough bandwidth to protect all LSPs.
   - When you activate FRR protection on the LSP, the RSVP PATH message carries two flags: Local protection requested, and Label recording requested. Label recording is only necessary for node protection, but it is always set when requesting FRR protection on the LSP.
2. The PLR must configure a backup tunnel and apply it to the protected link.
   - If using link protection, the backup tunnel terminates on the NHOP, and should use an explicit-path avoiding the link.
   - If using node protection, the backup tunnel terminates on the NNHOP, and should use an explicit-path avoiding the node.
3. The PLR must detect the failure, either physically, via RSVP Hellos, or via BFD.
   - RSVP Hellos are not sent by default. You must activate them explicitly. You can optionally use BFD with RSVP, which we have done in this lab.
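The first two steps can be sketched as follows for this lab's link-protection case. This is a hedged outline rather than the lab's actual answer: the primary tunnel number on CSR8, the explicit-path name, the use of Loopback0, and both IP addresses are hypothetical placeholders. Only the backup tunnel number (Tunnel0) and the protected interface (GigabitEthernet2.590) come from the lab text.

```
! --- CSR8 (headend): request FRR protection on the primary LSP ---
! Tunnel0 is a hypothetical tunnel number for the primary tunnel.
interface Tunnel0
 tunnel mpls traffic-eng fast-reroute
!
! --- CSR9 (PLR): backup tunnel to the NHOP that avoids the protected link ---
! 192.0.2.10 stands in for CSR10's address on the CSR9-CSR10 link (placeholder).
ip explicit-path name AVOID-CSR9-CSR10-LINK enable
 exclude-address 192.0.2.10
!
! 198.51.100.10 stands in for CSR10's TE router ID (placeholder).
interface Tunnel0
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 198.51.100.10
 tunnel mpls traffic-eng path-option 1 explicit name AVOID-CSR9-CSR10-LINK
!
! --- CSR9 (PLR): apply the backup tunnel to the protected link ---
interface GigabitEthernet2.590
 mpls traffic-eng backup-path Tunnel0
```

Excluding the neighbor's interface address on the link keeps the backup path off that link only. For node protection, the explicit path would exclude the NHOP's router ID instead, and the tunnel destination would be the NNHOP.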
Once the PLR detects failure it does a few things:
- Protects the primary LSP by switching data traffic onto the backup tunnel.
- Sends a PathErr message to inform the headend that a fault has occurred, but that the tunnel is being locally protected.
  - The headend knows not to tear down the tunnel in this case. Instead it can try to reoptimize the tunnel and perform make-before-break so that there is no outage.
- Changes the primary tunnel’s SENDER_TEMPLATE in the PATH message used for refreshing to its own IPv4 address. This forces the NHOP’s RESV messages to arrive back at the PLR, and allows the primary LSP to continue being refreshed over the backup tunnel.
  - If the headend cannot find an alternate path for the primary LSP, the LSP will remain signaled over the backup path indefinitely.
Verification
First, CSR8 must request FRR protection on the TE tunnel. We can verify this at CSR8 itself:
We can also see at CSR8 that the record route option has already recorded each hop’s label (in parentheses):
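Two commands that typically expose this information on the headend are sketched here. The tunnel number is a hypothetical placeholder, since the lab text does not name the primary tunnel interface.

```
! Tunnel detail, including the FRR request and the recorded route with labels
CSR8# show mpls traffic-eng tunnels tunnel 0
!
! PATH state, including the session attribute flags
! (local protection and label recording)
CSR8# show ip rsvp sender detail
```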
Next, CSR9 and CSR10 will run BFD, with FRR as the registered client. Both must configure RSVP Hellos to use BFD. Additionally, CSR9 must create the backup tunnel to CSR10 that excludes the CSR9-CSR10 link. The command mpls traffic-eng backup-path tun0, applied to the protected interface, instructs CSR9 to reroute any FRR-protected LSPs that traverse the link onto backup tunnel0 if the link goes down.
The backup tunnel uses CSR9-XRv12-CSR10:
We can see that BFD is up and the registered client is TE/FRR:
We can also see that tun0 is ready to protect the CSR8 TE tunnel if interface Gi2.590 goes down:
Interestingly, we can also see that protection is available from the headend. This is included in the RRO:
Another verification command we can use on CSR9 is shown below. Currently there is one protected LSP, zero LSPs active (undergoing backup protection), and zero interfaces currently undergoing backup protection.
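The checks in this part of the lab can be run from CSR9 with commands along these lines. This is a hedged list of standard IOS XE show commands, not necessarily the exact ones used for the original captures, and no output is reproduced here.

```
! BFD session state and registered clients
CSR9# show bfd neighbors details
!
! FRR database: protected LSPs, backup tunnel and label, ready/active state
CSR9# show mpls traffic-eng fast-reroute database
!
! RSVP view of protected LSPs and their backup tunnels
CSR9# show ip rsvp fast-reroute
!
! Backup tunnels, the interfaces they protect, and protected LSP counts
CSR9# show mpls traffic-eng tunnels backup
```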
If the interface goes down, CSR9 will impose two labels and send the packet to XRv12: CSR10’s TE label for the primary path on the bottom, and XRv12’s TE label for the backup path on top. XRv12 will pop the top label, leaving only CSR10’s primary TE label on the packet when CSR10 receives it.
CSR9 is the point of local repair, or PLR. CSR10 is the merge point, or MP. This is where the backup tunnel merges back onto the primary tunnel.
CSR10’s TE label (or FRR label) is 16.
The topmost label is 24002 as seen in the LFIB:
To test this out, shut down Gi2.590 on CSR10. CSR9 should show that the backup tunnel is now “active”.
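A minimal sketch of the test, assuming the subinterface is named GigabitEthernet2.590 on both routers:

```
! On CSR10: fail the protected link from the far end
interface GigabitEthernet2.590
 shutdown
```

```
! On CSR9: the FRR entry for the primary LSP should move from ready to active
CSR9# show mpls traffic-eng fast-reroute database
```

Shutting the interface on CSR10 rather than on CSR9 keeps CSR9's own interface administratively up, so CSR9 has to detect the failure through BFD rather than through a local interface-down event.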
On CSR8, we see that “reroute is pending.” CSR8 knows about the failure, but knows that the path is protected, so it can leave the LSP up and perform make-before-break.
A few seconds later this disappears, and we see “path option 1 reoptimization in progress”:
A few seconds afterwards, CSR8 finds a new path. The tunnel History section shows that the last error was a PathErr from CSR9 (the PLR) indicating that the tunnel was locally repaired:
On CSR9, the protection is now gone. This is because CSR8 has calculated and signaled a new path, which no longer uses Gi2.590 on CSR9.
If CSR8 could not signal an alternate path, the backup tun0 would protect the primary tunnel indefinitely. The path signaling for the primary tunnel would continue to ride over the backup tunnel.