SR inter-IGP using PCE
Load sr.inter.igp.pce.init.cfg
Nodes R1-R4 are running ISIS, and nodes R5-R8 are running OSPF. Node R3 is running both ISIS and OSPF.
Achieve an end-to-end LSP between R1 and R7 without redistributing routes into each IGP.
R10 is a PCE, not shown in the diagram. It belongs to both IGPs.
On R1 and R7, use an ODN policy with color 10 that simply uses the IGP metric and requests PCE computation.
A PCE (Path Computation Element) computes policies that require full network topology visibility, such as inter-domain policies and disjoint-path policies. The headend acts as a PCC (Path Computation Client) and requests the path calculation from the PCE. The PCE is active and stateful, so it maintains the delegated policy, updating it as needed whenever the IGP topology changes.
First, we must configure a TE RID on all nodes in the network. This is missing from the init file. Without this, the PCE won’t be able to calculate a path, as none of the nodes can be “placed” in the topology graph without a TE RID.
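A minimal sketch of the TE RID configuration (the IGP process names and the use of Loopback0 as the TE RID are my assumptions about the init file; R3 needs it under both IGPs):

```
! ISIS nodes (R1-R4)
router isis 1
 address-family ipv4 unicast
  mpls traffic-eng router-id Loopback0
!
! OSPF nodes (R5-R8); R3 carries both stanzas
router ospf 1
 mpls traffic-eng router-id Loopback0
```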
Next, we enable PCE server functionality on R10. This is done with a single command. The address is the local address on which the PCE listens for incoming TCP connections on the PCEP port (4189).
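Assuming R10's loopback address is 10.10.10.10 (the address itself is my assumption):

```
! R10: enable the SR-PCE server, listening on the loopback address
pce
 address ipv4 10.10.10.10
```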
Now R10 needs to populate its SR-TED with both IGP topologies. R10 already belongs to both IGPs, so we simply use the distribute link-state command under each IGP. We must make sure to use separate topology IDs for each IGP so that they are kept separate in R10’s consolidated SR-TED. As a reminder, when no instance-id is specified, the instance id is 0. This is not a problem when the entire network is only a single IGP instance.
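For example (the specific instance IDs are arbitrary, as long as they differ):

```
! R10: feed both IGP topologies into the SR-TED with distinct instance IDs
router isis 1
 distribute link-state instance-id 101
!
router ospf 1
 distribute link-state instance-id 102
```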
Each PCC simply configures the PCE under the segment-routing traffic-eng configuration:
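On R1, something like the following (R7 is analogous with source address 7.7.7.1; the PCE address 10.10.10.10 is the same assumption as above):

```
segment-routing
 traffic-eng
  pcc
   source-address ipv4 1.1.1.1
   pce address ipv4 10.10.10.10
```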
We should now see that the PCEP session is established between the PCCs and the PCE. On the PCCs we can use the following command:
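```
! on the PCC
show segment-routing traffic-eng pcc ipv4 peer
```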
Above, the PCE is stateful, which appears to always be the case (there does not appear to be a way to run XR as a stateless PCE). A stateful PCE keeps track of policies, which allows it to update the PCCs' paths and to instantiate new paths on a PCC. Instantiation means the PCE creates the policy itself and then pushes the policy to the client. The default precedence is 255, which is used for PCE redundancy; the lowest precedence number is the best PCE.
On the PCE we can verify PCC sessions using a similar command:
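```
! on the PCE (R10)
show pce ipv4 peer
```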
The detail keyword on either the PCC or PCE provides some details of the PCEP session statistics:
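```
! PCC side
show segment-routing traffic-eng pcc ipv4 peer detail
! PCE side
show pce ipv4 peer detail
```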
The PCE should have an SR-TED consisting of all nodes in both IGPs. For example, verify that both 1.1.1.1 and 7.7.7.1 are present in R10’s SR-TED:
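```
! R10: dump the consolidated SR-TED; nodes from both IGPs should appear
show pce ipv4 topology
! or filter for the two endpoints
show pce ipv4 topology | include "1.1.1.1|7.7.7.1"
```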
We can take this a step further and verify that R10, acting as PCE, can calculate a SR-TE policy between R1 and R7 using the following command:
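I believe SR-PCE can compute a test path on demand with something along these lines, though I am not certain of the exact syntax on every release:

```
! R10: ask the PCE to compute a path between the two endpoints
show pce ipv4 path destination 7.7.7.1 source 1.1.1.1
```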
We can now configure R1 and R7 with an ODN policy that uses the PCE. The pcep keyword means to use PCEP (Path Computation Element Protocol) for computation, as opposed to headend (local) computation. Note that this is not strictly necessary for ODN policies, as ODN policies use two candidate paths by default: preference 200 for local computation and preference 100 for PCE computation. The PCE computation then takes effect when local computation fails.
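```
! R1 and R7: ODN template for color 10, IGP metric, PCE computation
segment-routing
 traffic-eng
  on-demand color 10
   dynamic
    pcep
    !
    metric
     type igp
```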
We’ll color CE routes on R1 and R7 so that each PE will request path computation for the ODN policy from the PCE.
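A sketch of the coloring (the set and policy names are hypothetical; the attach point would typically be inbound on the CE-facing BGP neighbor or on the VRF export policy):

```
extcommunity-set opaque COLOR10
 10
end-set
!
route-policy COLOR-CE-ROUTES
 set extcommunity color COLOR10
 pass
end-policy
```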
We should see that the policy is up on each PE. Only R1 is shown for brevity. Note that the PCE included the SID descriptor (ex. 7.7.7.1) along with the label (ex. 16007). This is how R1 knows the resolution of the label even though R1 is not itself resolving each label.
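```
! R1: verify the ODN policy instantiated for color 10
show segment-routing traffic-eng policy color 10
```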
The workflow for a PCE policy works as follows:
The PCC sends a PCEP Report that contains the name, constraints, and optimization metric for the SR-TE policy, but with an empty SID list. The delegate flag is set in the LSP object, indicating that it wants to delegate this policy to the PCE.
The PCE, noticing the empty SID list, interprets this as a request for path computation. The PCE computes the path and signals it to the PCC in a PCEP Update.
The PCC installs the policy in its FIB and then sends a PCEP Report, echoing back the SID list and details of the policy. This is used as an acknowledgment mechanism so the PCE knows the PCC was able to install the policy. The PCEP Report allows the PCE to track the policy in its SR-TED.
If the PCE cannot calculate the policy, it sends back an empty PCEP Update with the delegate flag cleared.
If at any time the topology changes, the PCE recalculates the policy and if anything has changed, signals the changes to the PCC in a PCEP Update. The PCC replies with a PCEP Report.
So as you can see, the basic PCEP functionality is implemented with PCEP Reports (sent by PCCs) and PCEP Updates (sent by the PCE).
We can see the number of SR-TE policies (LSPs) the PCE is tracking for each peer using the following command:
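```
! R10: one entry per delegated LSP, grouped by PCC
show pce lsp
```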
Using the detail keyword, we can get details of a given LSP:
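```
show pce lsp detail
```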
Notice above that there is a Reported path section and a Computed path section. The computed path is the path that the PCE calculated and signaled to the PCC. The reported path is the path that was seen in the PCEP Report as an ACK from the PCC. So the reported path should be equal to the computed path.
You can also see above that any aspect of an SR-TE policy (such as metric margin, BSID value, metric of the path, name of the path) can be signaled via PCEP.
These LSPs are actually part of the SR-TED itself. The SR-TED is not only fed via the local IGP and BGP-LS, but also via PCEP. It is important for the PCE to track LSPs in its SR-TED so that it can enable features such as disjoint LSPs. The calculation of a new LSP might be based on the state of other existing LSPs.
As a note, we can now remove “distribute link-state” from all other nodes besides R10. R10 is the only node which requires a populated SR-TED. However, you can optionally still allow headends to compute intra-domain paths and only ask the PCE for inter-domain path calculation. In that case, you would leave “distribute link-state” configured on every node.
Finally, let’s confirm connectivity between CE101 and CE107. Currently we have an issue: the PEs are not selecting each other’s VPNv4 routes as valid, due to RIB failure:
Interestingly though, the color still triggered the ODN policy to come up, and a BSID was allocated. However, recursion on the BSID cannot overcome the inaccessible next hop. To solve this, we can use a Null0 static route for the remote PE loopback on each PE. BGP then no longer flags the next hop as inaccessible, since a route does exist in the RIB (albeit via Null0), and BGP continues on, recursing the route via the BSID.
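For example, on R1 (R7 mirrors this with 1.1.1.1/32):

```
! R1: static route to the remote PE loopback so BGP next-hop validation passes
router static
 address-family ipv4 unicast
  7.7.7.1/32 Null0
```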
(Note, in IOS-XR 7.x, we can instead use the BGP knobs bgp bestpath igp-metric sr-policy and nexthop validation color-extcomm sr-policy.)
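As I recall, both knobs live directly under the BGP process, though the exact placement may vary by release (the AS number is hypothetical):

```
router bgp 100
 ! prefer/validate next hops resolved via SR policies
 bgp bestpath igp-metric sr-policy
 nexthop validation color-extcomm sr-policy
```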
The VPNv4 route is now available for use:
The CEs have reachability to each other via a PCE-computed end-to-end TE LSP!
A PCE’s northbound interface allows applications to program the PCE. The PCE’s southbound interface interacts with PCCs, generally over PCEP, to program their policies.
A PCE exposes REST APIs such as the following:
http://<sr-pce-ip-addr>:8080/topo/subscribe/json - get topology info
http://<sr-pce-ip-addr>:8080/lsp/subscribe/json - get LSP info
I’m not clear whether these are available on XRv as PCE.
There are three types of computation designs:
Centralized
Only an SDN controller programs policies. This is a “vertical” model. The PCE is responsible for pushing all policies to all routers. None of the routers do any local path computation.
Distributed
All routers calculate paths themselves. No PCE/controller is used.
Hybrid
Routers calculate paths themselves when they can (intra-domain, non-disjoint) but use a PCE when necessary (inter-domain paths and disjointness). The routers instantiate the policies locally, whereas in the centralized model the PCE/controller instantiates the policies. This hybrid approach is called a "horizontal" model and is generally what is recommended.
The SR-TED on R10 is fed by each IGP. The SR-TED consolidates everything learned via the local IGPs and BGP-LS (which we will see next) into a single graph, so a unique instance ID must be used for each IGP to keep the two topologies separate once consolidated.
The distribute link-state command also has an optional throttle parameter. The default throttle is 50ms for ISIS and 5ms for OSPF. This is how long the router will wait before distributing an IGP topology change into SR-TED or BGP-LS. (Similar to throttling SPF runs, for example).
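For example (I believe the keyword order is as follows, but verify on your release):

```
! wait 100ms after an IGP change before updating SR-TED/BGP-LS
router isis 1
 distribute link-state instance-id 101 throttle 100
```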
All routers belonging to the same IGP must distribute using the same instance ID, even if they are in different IGP levels/areas. The instance ID is for completely separate IGP instances, not separate IGP areas.
The PCE treats all metrics and link attributes from the different IGP instances as if they were global, since they are all consolidated into a single graph. This could create an issue if different operators manage the different IGPs and link metrics, link affinities, etc. are not comparable between the IGPs.
Nodes that connect to different IGPs must have the same TE RID in both IGPs so that the PCE can identify the node as belonging to both. However, a different IGP RID should be used in each IGP to prevent the possibility of a duplicate RID if routers from the separate IGPs inadvertently form an adjacency and the LSPs/LSAs are leaked between the IGPs.