SR-TE PCE Redundancy
Load isis.sr.vpnv4.and.te.enabled.cfg
Configure R9 and R10 as PCEs and R1 as a PCC.
On R1, configure the PCEP keepalive/hold timers as 15 seconds/45 seconds. R10 should be the primary PCE, and R9 should be used as backup. R1 should invalidate delegated paths after 30 seconds when all PCEP sessions are lost.
On R1, configure a policy that uses color 10 and endpoint 7.7.7.1 with a dynamic path. This should be PCE-computed.
Answer
Explanation
It is always recommended to have PCE redundancy. If the headends are relying on a PCE for inter-domain paths, or disjoint paths, then loss of the PCE could result in loss of end-to-end LSPs. It is quite simple to add PCEs to the network to achieve redundancy. We will see in this lab that the PCC automatically provides sync for all PCEs.
A PCEP session uses keepalive and dead timers that are similar in nature to BGP's. The defaults are a 30-second keepalive and a 120-second dead timer. However, unlike BGP, the timers aren’t negotiated. Each peer signals its own keepalive and dead values, so they can be asymmetric between the peers. The following commands change the timers:
The PCE can also set the keepalive interval, but not the dead interval. The PCE’s dead interval is always 120 seconds. The PCE also has a minimum peer keepalive setting, which protects against peers that send keepalives too frequently. By default this is 20 seconds. Unless this is lowered to 15 seconds or less, the session with R1 will never come up in this lab, because R1 signals a 15-second keepalive.
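The minimum peer keepalive check can be pictured with a small sketch. This is a simplified model of the behavior described here, not the router's actual implementation; the function name and the exact comparison are assumptions made for illustration.

```python
DEFAULT_MIN_PEER_KEEPALIVE = 20  # seconds, the PCE default mentioned in the text


def pce_accepts_keepalive(peer_keepalive: int,
                          min_peer_keepalive: int = DEFAULT_MIN_PEER_KEEPALIVE) -> bool:
    """Return True if the PCC's signaled keepalive is not faster than the PCE allows.

    Hypothetical helper: models only the threshold check, nothing else in PCEP.
    """
    return peer_keepalive >= min_peer_keepalive


# R1 signals a 15-second keepalive in this lab.
accepted_with_default = pce_accepts_keepalive(15)        # PCE minimum left at 20
accepted_when_lowered = pce_accepts_keepalive(15, 15)    # PCE minimum lowered to 15
```

With the default minimum the 15-second keepalive is rejected, which matches the session never coming up; lowering the minimum to 15 lets it through.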
In addition, similar to BGP, the PCC uses a sort of “nexthop tracking.” If the address of the PCE is removed from the RIB, the PCEP session is immediately brought down.
When all PCEs are lost, candidate-paths that use PCEP are brought down after the delegation-timeout expires. This is by default 60 seconds on the PCC, and can be changed with the following command. If you use a timeout of 0, any PCE-calculated paths will be indefinitely retained and kept up upon loss of all PCEs.
PCE Failover Mechanism
The PCC chooses one primary or “best” PCE based on the precedence value. The lowest precedence value is best. In case of a tie, it appears that the lowest IP address wins. The default precedence is the worst possible value (255).
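The selection rule can be sketched in a few lines of Python. This is an illustrative model only: the `PcePeer` class, the loopback addresses 9.9.9.1 and 10.10.10.1, and the precedence values 10 and 20 are assumptions rather than values taken from the lab, and the IP address tie-break reflects observed behavior rather than documented behavior.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional

DEFAULT_PRECEDENCE = 255  # worst possible value, per the text


@dataclass(frozen=True)
class PcePeer:
    name: str
    address: str                          # hypothetical loopback address
    precedence: int = DEFAULT_PRECEDENCE
    session_up: bool = True


def best_pce(peers: Iterable[PcePeer]) -> Optional[PcePeer]:
    """Return the preferred PCE among peers whose PCEP session is up."""
    candidates = [p for p in peers if p.session_up]
    if not candidates:
        return None
    # Lowest precedence wins; lowest IP address breaks a tie (observed behavior).
    return min(candidates,
               key=lambda p: (p.precedence, ipaddress.ip_address(p.address)))


r9 = PcePeer("R9", "9.9.9.1", precedence=20)      # assumed backup precedence
r10 = PcePeer("R10", "10.10.10.1", precedence=10)  # assumed primary precedence
primary = best_pce([r9, r10])
```

Marking a peer's session as down and calling `best_pce` again returns the next best PCE, which is the same decision the PCC makes on failover.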
We can see the results of the PCE election on the PCC by simply showing the peers:
When the PCC configures a PCE-delegated policy, it sends a PCEP Report to every PCE it has a session with, but only sets the D (delegate) flag on the Report towards the primary PCE. Only the primary PCE will respond with an Update (due to the D flag), and the PCC will reply again to all PCEs with a PCEP Report (as an ACK). This report also only has the D flag set towards the primary PCE.
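The reporting behavior can be modeled as follows. This is a minimal sketch, assuming each connected PCE has a distinct precedence; the dictionary layout and the `build_reports` helper are hypothetical and do not reflect the actual PCEP message encoding.

```python
from typing import Dict, List


def build_reports(policy: str, precedence_by_pce: Dict[str, int]) -> List[dict]:
    """Build one Report per connected PCE; only the primary gets the D flag."""
    if not precedence_by_pce:
        return []
    # Lowest precedence is the primary PCE (ties are not modeled here).
    primary = min(precedence_by_pce, key=precedence_by_pce.get)
    return [
        {"to": pce, "policy": policy, "d_flag": pce == primary}
        for pce in precedence_by_pce
    ]


# Precedence values are assumed for illustration.
reports = build_reports("color-10-endpoint-7.7.7.1", {"R9": 20, "R10": 10})
```

Every PCE receives a Report, so each one learns of the policy, but only the Report towards R10 carries the D flag.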
We can verify this by inspecting the LSP that R1 has delegated to R10 (the best PCE). On R10, we see the D flag is set.
On R9, we see that the D flag is not set, and the computed path is empty. This shows that R9 learns of the policy via R1’s redundant PCEP Reports, even though R9 is not responsible for computation and update of the LSP.
Upon loss of the primary PCE, the PCC will immediately re-delegate all local policies to the next best PCE. We can test this out by shutting down R10’s loopback. R1, using “nexthop tracking,” will immediately bring down the PCEP session with R10 and use R9 as the primary PCE.
R1 immediately uses R9 as the next best PCE:
The LSP is now delegated to R9, and R9 has calculated the path.
When R10 is available again, R1 will re-delegate its policies to R10.
Any paths that can’t be redelegated for some reason are called “orphan paths.” They are invalidated after the delegation-timeout timer expires, which is 60 seconds by default. This happens most clearly when the PCC loses all PCEP sessions.
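The orphan path timer can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and treating the path as invalid exactly at the timeout (rather than just after it) is a modeling choice, not something confirmed by the lab. The special case for 0 follows the earlier note that a timeout of 0 retains PCE-calculated paths indefinitely.

```python
DEFAULT_DELEGATION_TIMEOUT = 60  # seconds, the PCC default per the text


def orphan_path_is_valid(seconds_since_loss: float,
                         delegation_timeout: int = DEFAULT_DELEGATION_TIMEOUT) -> bool:
    """Return True while an orphan path should still be kept up.

    A delegation timeout of 0 means the path is retained indefinitely.
    """
    if delegation_timeout == 0:
        return True
    # Assumption: the path is invalidated once the timer reaches the timeout.
    return seconds_since_loss < delegation_timeout


# With the 30-second timeout this lab asks for on R1:
still_valid = orphan_path_is_valid(29, delegation_timeout=30)
expired = orphan_path_is_valid(30, delegation_timeout=30)
```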
Note that the behavior of the PCC informing all connected PCEs of all policies means that the PCEs don’t have to run some type of sync session between themselves. The PCCs are able to keep all PCEs in sync. However, we will see in the next lab that there are situations in which you might want to add a sync session between PCEs.
Failover for PCE-initiated policies
Let’s say that R10 configures a PCE-initiated policy that it pushes to R1. Since R1 does not have this policy in its local configuration, the policy is invalidated when R10 is lost. R9 is not aware of this policy, so it cannot “re-push” it to R1. Even if you run the sync protocol between R9 and R10, the policy is still lost.
PCE-initiated policies have a less sensitive timer. The PCC will allow re-delegation after 3 minutes, and will time them out completely after 10 minutes. (If a higher-level application programs all PCEs with the policy, then it is possible both R9 and R10 have this policy, so R9 would be allowed to push this to R1. R1 would accept R9’s pushed policy after 3 minutes, once its re-delegation timer expires). This can be controlled using the following commands: