PIM RPF Troubleshooting #2
Load pim.rpf.tshoot2.cfg
CSR2 has joined two SSM groups:
(7.1.7.1, 232.1.1.1)
(2007:7:1:7::1, FF35::1)
CSR1 has joined two SSM groups:
(9.2.5.2, 232.1.1.1)
(2009:9:2:5::2, FF35::1)
When CSR1 pings 232.1.1.1 or FF35::1, it does not receive a response from CSR2. Explain the issue and find a way to fix it.
Answer
Explanation
Currently, multicast traffic sourced from R1 is not working. We'll verify that the last-hop router (LHR), R5, has (S, G) state with R1 as the source:
Indeed, we have state and the correct RPF interface. Next we check R10 (output not shown), which also has the correct state and RPF interfaces.
Next we look at XR1 and notice that the RPF interface is wrong.
Note that a faster way to find the fault is to run mtrace from as close to the receiver as possible. This pinpoints where the break in the PIM signaling occurs. The syntax is mtrace <sender IP> <receiver IP> <group IP>. Notice that the RPF interface is creating a loop: R10 points to XR1, but XR1 points back to R10.
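The loop that mtrace exposes can be illustrated with a small Python model. This is a sketch, not how mtrace is implemented: each router is assumed to have a single RPF neighbor for source 7.1.7.1, and the R5 -> R10 -> XR1 path and the first-hop router name FHR are hypothetical. The walk stops either at a router with no RPF neighbor or when it revisits a router, which is exactly the R10/XR1 loop.

```python
def trace_rpf_path(rpf_neighbor, start):
    """Follow RPF neighbors from `start` toward the source.

    Stops at a router with no RPF neighbor (the source side) or when a
    router repeats. Returns the routers visited and whether a loop was found.
    """
    path = [start]
    seen = {start}
    current = start
    while current in rpf_neighbor:
        nxt = rpf_neighbor[current]
        path.append(nxt)
        if nxt in seen:  # revisiting a router means the RPF chain loops
            return path, True
        seen.add(nxt)
        current = nxt
    return path, False  # reached a router with no RPF neighbor


# Assumed RPF neighbors for source 7.1.7.1; XR1's entry is the faulty one.
broken = {"R5": "R10", "R10": "XR1", "XR1": "R10"}
# Hypothetical loop-free version: XR1 points toward a first-hop router "FHR".
fixed = {"R5": "R10", "R10": "XR1", "XR1": "FHR"}

for name, table in (("broken", broken), ("fixed", fixed)):
    path, looped = trace_rpf_path(table, "R5")
    print(name, " -> ".join(path), "| loop:", looped)
# broken R5 -> R10 -> XR1 -> R10 | loop: True
# fixed R5 -> R10 -> XR1 -> FHR | loop: False
```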
Looking at the RPF check on XR1, we don't see many clues explaining why this happens, except for the administrative distance of the route. The distance is 20 (eBGP), which suggests that the route may come from multicast BGP.
As a side note, on IOS-XR the show pim rpf command shows RPF results only for entries that are actively in the MRIB table. To see the RPF result for an arbitrary source, use show pim rpf hash or show pim ipv6 rpf hash.
On IOS-XR, enabling BGP ipv4/multicast or ipv6/multicast creates a multicast RPF table. It can be viewed with show route ipv4|ipv6 multicast.
We see on XR1 that several BGP routes appear in this table, as well as a few OSPF routes which are inherited from the unicast RIB. This is very similar to IOS-XE’s show ip route multicast.
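A simplified Python model shows how a BGP route can win in such a table: when several protocols offer the same prefix, the route with the lowest administrative distance is kept (eBGP 20 beats OSPF 110 with Cisco defaults), and the RPF lookup then takes the longest matching prefix. The prefix 7.1.7.0/24 and the "OSPF neighbor" label are hypothetical, and this sketch ignores platform details such as how IOS-XR actually populates its multicast topology.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str          # destination prefix, e.g. "7.1.7.0/24" (hypothetical)
    protocol: str        # source protocol: "bgp" or "ospf"
    admin_distance: int  # Cisco defaults: eBGP 20, OSPF 110
    next_hop: str        # RPF neighbor label


def build_table(candidates):
    """Keep one route per prefix: the one with the lowest administrative distance."""
    best = {}
    for route in candidates:
        net = ipaddress.ip_network(route.prefix)
        if net not in best or route.admin_distance < best[net].admin_distance:
            best[net] = route
    return best


def rpf_lookup(table, source):
    """Return the route for the longest prefix containing `source`, or None."""
    addr = ipaddress.ip_address(source)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


candidates = [
    Route("7.1.7.0/24", "ospf", 110, "OSPF neighbor"),  # inherited from the unicast RIB
    Route("7.1.7.0/24", "bgp", 20, "R10"),              # ipv4/multicast route from R10
]

print(rpf_lookup(build_table(candidates), "7.1.7.1"))
# Route(prefix='7.1.7.0/24', protocol='bgp', admin_distance=20, next_hop='R10')
```

Dropping the BGP entry from candidates makes the same lookup return the OSPF route instead.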
If we look at 7.1.7.1 specifically, we see that it matches a BGP route via R10:
This happens because XR1 and R10 run BGP ipv4/multicast, and R10 redistributes OSPF into ipv4/multicast. On top of that, R10 sets next-hop-self on these routes.
On R10, these routes are injected into the BGP ipv4/multicast table with their next hop from OSPF. Even though R10 and XR1 are eBGP peers, R10 does not change the next hop of routes whose existing next hop lies on the subnet used for peering. In this case, by default R10 would not change the next hop of the routes that point to XR1. This is called a third-party next hop.
However, R10 is also configured with next-hop-self, which rewrites these next hops to R10's own address as well. We can remove next-hop-self so that R10 goes back to leaving these third-party next hops unchanged.
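The next-hop rule R10 applies can be sketched as a single Python function. This is a simplified model, not Cisco's implementation, and the peering subnet 192.0.2.0/24 and the R10 and XR1 addresses are hypothetical documentation addresses. A route whose current next hop is on the peering subnet keeps it (third-party next hop), any other route gets the speaker's own address, and next-hop-self forces the speaker's address in every case.

```python
import ipaddress


def advertised_next_hop(route_next_hop, local_address, peering_subnet, next_hop_self=False):
    """Return the next hop an eBGP speaker sets on a route sent to its peer.

    Simplified model: with next-hop-self, always use the speaker's own address.
    Otherwise keep the route's existing next hop if it lies on the subnet
    shared with the peer (third-party next hop), else use the speaker's address.
    """
    if next_hop_self:
        return local_address
    if ipaddress.ip_address(route_next_hop) in ipaddress.ip_network(peering_subnet):
        return route_next_hop
    return local_address


PEERING_SUBNET = "192.0.2.0/24"  # hypothetical R10-XR1 link
R10_ADDR = "192.0.2.10"          # hypothetical R10 address on that link
XR1_ADDR = "192.0.2.1"           # hypothetical XR1 address on that link

# OSPF route R10 learned from XR1: its next hop is XR1 itself.
print(advertised_next_hop(XR1_ADDR, R10_ADDR, PEERING_SUBNET))                      # 192.0.2.1
print(advertised_next_hop(XR1_ADDR, R10_ADDR, PEERING_SUBNET, next_hop_self=True))  # 192.0.2.10
# OSPF route learned from a neighbor off the peering subnet.
print(advertised_next_hop("198.51.100.5", R10_ADDR, PEERING_SUBNET))                # 192.0.2.10
```

In this model, once next-hop-self is removed, routes that R10 learned from XR1 reach XR1 carrying XR1's own address as the next hop.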
XR1 now has only the routes for which it is not the next hop.
This is because XR1 rejects routes that list XR1 itself as the next hop:
XR1 now falls back to the OSPF route for 7.1.7.1 and traffic works again.
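XR1's rejection of routes that name it as the next hop, and the resulting fallback to OSPF, can be modeled in Python as a filter followed by a lowest-distance selection. The prefix 7.1.7.0/24, the XR1 address 192.0.2.1, and the OSPF next hop 203.0.113.7 are hypothetical; the sketch only mirrors the behavior described here, not how IOS-XR actually validates BGP next hops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    prefix: str
    protocol: str
    admin_distance: int
    next_hop: str


def accept_bgp(routes, local_addresses):
    """Drop BGP routes whose next hop is one of this router's own addresses."""
    return [
        r for r in routes
        if not (r.protocol == "bgp" and r.next_hop in local_addresses)
    ]


def best_route(routes, prefix):
    """Return the lowest-administrative-distance route for `prefix`, or None."""
    same_prefix = [r for r in routes if r.prefix == prefix]
    return min(same_prefix, key=lambda r: r.admin_distance, default=None)


XR1_ADDRESSES = {"192.0.2.1"}  # hypothetical XR1 interface address

candidates = [
    # Third-party next hop: R10 left XR1's own address on the route.
    Candidate("7.1.7.0/24", "bgp", 20, "192.0.2.1"),
    # OSPF route inherited from the unicast RIB (hypothetical next hop).
    Candidate("7.1.7.0/24", "ospf", 110, "203.0.113.7"),
]

# Without the next-hop check, the BGP route would win on distance.
print(best_route(candidates, "7.1.7.0/24").protocol)  # bgp
# With the check, the BGP route is rejected and OSPF is used for RPF.
print(best_route(accept_bgp(candidates, XR1_ADDRESSES), "7.1.7.0/24").protocol)  # ospf
```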
IPv6/multicast does not appear to use this third-party next-hop mechanism. R10 is not setting next-hop-self on ipv6/multicast routes, yet XR1 sees all of the BGP routes with R10 as the next hop. The easiest fix is to turn off BGP ipv6/multicast or to stop redistributing OSPFv3 routes into BGP.
XR1 falls back to the OSPFv3 route and traffic works again.