Conntrack entry mismatch leads to policy enforcement on reply packet for service loopback case #19406

Closed
2 tasks done
skmatti opened this issue Apr 12, 2022 · 18 comments
Labels
area/loadbalancing Impacts load-balancing and Kubernetes service implementations info-completed The GH issue has received a reply from the author kind/bug This is a bug in the Cilium logic. pinned These issues are not marked stale by our issue bot. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@skmatti
Contributor

skmatti commented Apr 12, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Steps to reproduce:

minikube start --network-plugin=cni --cni=false --memory=4096 --kubernetes-version=v1.22.8

cilium install --version=v1.10.5

kubectl apply -f - <<EOF
apiVersion: v1
kind: ReplicationController
metadata:
  name: guestbook
  labels:
    k8s-app.guestbook: web
spec:
  replicas: 1
  selector:
    k8s-app.guestbook: web
  template:
    metadata:
      labels:
        k8s-app.guestbook: web
    spec:
      containers:
      - image: gcr.io/google-samples/gb-frontend:v6
        name: guestbook
        ports:
        - containerPort: 80
          name: http-server
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: guestbook
  labels:
    k8s-app.guestbook: web
spec:
  ports:
  - port: 81
    protocol: TCP
    targetPort: http-server
  selector:
    k8s-app.guestbook: web
  type: ClusterIP
EOF

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-ingress-on-tcp-80
specs:
- endpointSelector:
    matchLabels:
      k8s-app.guestbook: web
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
EOF

It is important that the service port (81) is different from the container port (80); the issue does not occur if they are the same.

$ kubectl get svc guestbook
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
guestbook   ClusterIP   10.98.138.60   <none>        81/TCP    42m
$ kubectl get pods -o wide --selector=k8s-app.guestbook=web
NAME              READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
guestbook-8b46v   1/1     Running   0          43m   10.0.0.208   minikube   <none>           <none>

Send traffic to itself via cluster IP:

$ kubectl exec guestbook-8b46v -- curl 10.98.138.60:81

Cilium monitor output:

$ cilium monitor --related-to 1221
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 1221 flow 0x95218252 identity 22887->22887 state established ifindex lxccbaae0c0d570 orig-ip 169.254.42.1: 169.254.42.1:55518 -> 10.0.0.208:80 tcp SYN
Policy verdict log: flow 0xfef8e31c local EP ID 1221, remote ID 22887, proto 6, ingress, action deny, match none, 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK
xx drop (Policy denied) flow 0xfef8e31c to endpoint 1221, identity 22887->22887: 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK
Policy verdict log: flow 0x8caf00f9 local EP ID 1221, remote ID 22887, proto 6, ingress, action deny, match none, 10.98.138.60:81 -> 10.0.0.208:55518 tcp SYN, ACK

As per my understanding, a pod sending packets to itself via a cluster IP should skip policy enforcement altogether. Although the first packet (pod -> itself via cluster IP) skips policy enforcement, the reply packet does not and is incorrectly subjected to it. This happens because the reply packet does not match any conntrack entry.

$ cilium bpf ct list global | grep 10.0.0.208
TCP IN 169.254.42.1:55522 -> 10.0.0.208:80 expires=17529096 RxPackets=3 RxBytes=222 RxFlagsSeen=0x02 LastRxReport=17529035 TxPackets=6 TxBytes=444 TxFlagsSeen=0x12 LastTxReport=17529035 Flags=0x0008 [ LBLoopback ] RevNAT=7 SourceSecurityID=22887 IfIndex=0
TCP OUT 10.0.0.208:55522 -> 10.98.138.60:80 expires=17529096 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=3 TxBytes=222 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0008 [ LBLoopback ] RevNAT=7 SourceSecurityID=22887 IfIndex=0
TCP OUT 10.98.138.60:81 -> 10.0.0.208:55522 service expires=17529096 RxPackets=0 RxBytes=7 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0000 [ ] RevNAT=7 SourceSecurityID=0 IfIndex=0
ICMP OUT 10.0.0.208:0 -> 10.98.138.60:0 related expires=17529093 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=74 TxFlagsSeen=0x02 LastTxReport=17529035 Flags=0x0018 [ LBLoopback SeenNonSyn ] RevNAT=7 SourceSecurityID=22887 IfIndex=0

As per my understanding, the second entry is causing the problem:
10.0.0.208:55522 -> 10.98.138.60:80 - the destination IP address is the cluster IP, but the port is the container port (80) instead of the service port (81).

The reply packet, on the other hand, has 10.98.138.60:81 -> 10.0.0.208:55518, which does not match the conntrack entry and is therefore incorrectly subjected to network policy enforcement.
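
To make the mismatch concrete, here is a minimal standalone C sketch of the lookup comparison. The struct and matching helper are illustrative only (not Cilium's actual conntrack code), and the addresses and ports are copied from the ct list output above (which shows ephemeral port 55522; the monitor output above is from a different run, hence 55518 there).

/* Illustrative only: a simplified conntrack "tuple" and exact-match
 * lookup, not Cilium's real data structures. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct tuple {
    const char *saddr;
    const char *daddr;
    uint16_t sport;
    uint16_t dport;
};

static int match(const struct tuple *entry, const struct tuple *key)
{
    return !strcmp(entry->saddr, key->saddr) &&
           !strcmp(entry->daddr, key->daddr) &&
           entry->sport == key->sport &&
           entry->dport == key->dport;
}

int main(void)
{
    /* CT OUT entry created on the forward path: destination is the
     * cluster IP, but with the container port (80) rather than the
     * service port (81). */
    struct tuple entry = { "10.0.0.208", "10.98.138.60", 55522, 80 };

    /* Lookup key derived from the reply VIP:81 -> pod:55522, i.e. the
     * reverse of pod:55522 -> VIP:81 -- a flow that was never recorded. */
    struct tuple key = { "10.0.0.208", "10.98.138.60", 55522, 81 };

    printf("reply matches forward entry: %s\n",
           match(&entry, &key) ? "yes" : "no -> treated as New, policy applies");
    return 0;
}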

Cilium Version

1.10.5

Kernel Version

x86_64

Kubernetes Version

1.22.8

Sysdump

cilium-sysdump-20220411-230655.zip

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@skmatti skmatti added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Apr 12, 2022
@skmatti
Contributor Author

skmatti commented Apr 12, 2022

This issue is related to #11593 and #15321

@Weil0ng
Contributor

Weil0ng commented Apr 12, 2022

maybe @pchaigno

@pchaigno pchaigno added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Apr 12, 2022
@pchaigno
Member

Did you check if the same bug also affects master?

@skmatti
Contributor Author

skmatti commented Apr 13, 2022

Yes, I tried v1.12.0-rc1, which was created yesterday, and it still has the issue.

@skmatti
Contributor Author

skmatti commented Apr 13, 2022

I enabled debug logs, which confirm that the reply packet is treated as a new packet:

CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: Conntrack lookup 1/2: src=10.105.185.194:81 dst=10.0.0.92:53112
CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=0
CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: Policy evaluation would deny packet from 2899 to 2899
Policy verdict log: flow 0x41fb41bb local EP ID 1143, remote ID 2899, proto 6, ingress, action deny, match none, 10.105.185.194:81 -> 10.0.0.92:53112 tcp SYN, ACK

Cluster IP - 10.105.185.194
Pod IP - 10.0.0.92

@pchaigno
Member

So if I understand correctly, the issue here is that we're not using the correct port when creating the conntrack entry? Did you check the code to see why that is?

@skmatti
Contributor Author

skmatti commented Apr 15, 2022

I am using v1.12.0-rc1 for debugging. The conntrack entries are created correctly with this version.

Cluster IP - 10.105.185.194
Pod IP - 10.0.0.92

$ cilium bpf ct list global
Defaulted container "cilium-agent" out of: cilium-agent, ebpf-mount (init), clean-cilium-state (init)
TCP OUT 10.105.185.194:81 -> 10.0.0.92:53162 service expires=17762859 RxPackets=0 RxBytes=4 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x02 LastTxReport=17762801 Flags=0x0000 [ ] RevNAT=4 SourceSecurityID=0 IfIndex=0 
TCP OUT 169.254.42.1:53162 -> 10.0.0.92:80 expires=17762859 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=5 TxBytes=370 TxFlagsSeen=0x02 LastTxReport=17762801 Flags=0x0008 [ LBLoopback ] RevNAT=4 SourceSecurityID=2899 IfIndex=0 
TCP IN 169.254.42.1:53162 -> 10.0.0.92:80 expires=17762859 RxPackets=5 RxBytes=370 RxFlagsSeen=0x02 LastRxReport=17762801 TxPackets=10 TxBytes=740 TxFlagsSeen=0x12 LastTxReport=17762801 Flags=0x0008 [ LBLoopback ] RevNAT=4 SourceSecurityID=2899 IfIndex=0 
ICMP OUT 169.254.42.1:0 -> 10.0.0.92:0 related expires=17762843 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=74 TxFlagsSeen=0x02 LastTxReport=17762785 Flags=0x0018 [ LBLoopback SeenNonSyn ] RevNAT=4 SourceSecurityID=2899 IfIndex=0 

But the issue is with the conntrack lookup:

CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: Conntrack lookup 1/2: src=10.105.185.194:81 dst=10.0.0.92:53112
CPU 04: MARK 0x41fb41bb FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=0

flags=0, which means the lookup was initialized from

ret = ct_lookup4(get_ct_map4(&tuple), &tuple, ctx, l4_off, CT_EGRESS,
but it should not have reached this line. Instead, the conntrack lookup should have been initialized from
ret = lb4_local(get_ct_map4(&tuple), ctx, l3_off, l4_off,
because it is a service packet.

Somehow the packet fails the lb4_lookup_service check and misses lb4_local.

Does this summary look accurate? If so, how do I proceed with determining why the service check fails for the reply packet?

@pchaigno
Member

flags=0 means tuple->flags == TUPLE_F_OUT, which itself means dir == CT_INGRESS. I think you are looking at

ret = ct_lookup4(get_ct_map4(&tuple), &tuple, ctx, l4_off, CT_INGRESS, &ct_state,
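
(As an aside, for decoding the flags= values in these debug messages: below is a tiny standalone decoder. The constant values are from my reading of Cilium's bpf/lib/common.h and may differ between versions, and the direction mapping follows the explanation above; treat this as a sketch, not the actual implementation.)

#include <stdio.h>

/* Assumed values from bpf/lib/common.h; verify against the version in use. */
#define TUPLE_F_OUT      0  /* per the comment above: lookup from the CT_INGRESS path */
#define TUPLE_F_IN       1  /* lookup from the CT_EGRESS path */
#define TUPLE_F_RELATED  2  /* related (ICMP error) flow */
#define TUPLE_F_SERVICE  4  /* service (VIP) entry */

static const char *decode(int flags)
{
    switch (flags) {
    case TUPLE_F_OUT:     return "TUPLE_F_OUT (ingress-side CT lookup)";
    case TUPLE_F_IN:      return "TUPLE_F_IN (egress-side CT lookup)";
    case TUPLE_F_RELATED: return "TUPLE_F_RELATED";
    case TUPLE_F_SERVICE: return "TUPLE_F_SERVICE (service map entry)";
    default:              return "unknown";
    }
}

int main(void)
{
    int seen[] = { 4, 1, 0 };  /* flags values that appear in the debug output in this thread */
    for (unsigned int i = 0; i < sizeof(seen) / sizeof(seen[0]); i++)
        printf("flags=%d -> %s\n", seen[i], decode(seen[i]));
    return 0;
}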

@skmatti
Contributor Author

skmatti commented Apr 15, 2022

I see, thanks. But why would it go through nodeport? It is a ClusterIP service. Maybe it is coming from

ret = ct_lookup4(get_ct_map4(&tuple), &tuple, ctx, l4_off, CT_INGRESS, &ct_state,

@pchaigno
Member

Oh, right. That makes more sense.

So it's ingressing into the container and we're doing a CT lookup with the VIP instead of the loopback IP. So why wasn't the packet SNATed to the loopback IP on egress?

@skmatti
Contributor Author

skmatti commented Apr 15, 2022

Here are the full conntrack lookups for the traffic.

There was a lookup with the loopback IP and the verdict was Reply. Then reverse SNAT and DNAT happen, and a conntrack lookup is initiated again:

$ kubectl -n kube-system exec cilium-dbvqk -- cilium monitor --related-to 1143 -v
Defaulted container "cilium-agent" out of: cilium-agent, ebpf-mount (init), clean-cilium-state (init)
Listening for events on 12 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 1/2: src=10.0.0.92:53162 dst=10.105.185.194:81
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=4
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Service backend slot lookup: slot=2, dport=81
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack create: proxy-port=0 revnat=4 src-identity=0 lb=0.0.0.0
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Loopback SNAT from=10.0.0.92 to=169.254.42.1
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Successfully mapped addr=10.0.0.92 to identity=2899
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 1/2: src=169.254.42.1:53162 dst=10.0.0.92:80
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=1
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack create: proxy-port=0 revnat=4 src-identity=2899 lb=169.254.42.1
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Attempting local delivery for container id 1143 from seclabel 2899
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 1/2: src=169.254.42.1:53162 dst=10.0.0.92:80
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=0
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: CT entry found lifetime=17762843, revnat=4
CPU 04: MARK 0xe97cb587 FROM 1143 DEBUG: CT verdict: Established, revnat=4
-> endpoint 1143 flow 0xe97cb587 , identity 2899->2899 state established ifindex lxc9ea2309f4588 orig-ip 169.254.42.1: 169.254.42.1:53162 -> 10.0.0.92:80 tcp SYN
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Successfully mapped addr=169.254.42.1 to identity=2
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Conntrack lookup 1/2: src=10.0.0.92:80 dst=169.254.42.1:53162
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=1
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: CT entry found lifetime=17762843, revnat=4
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: CT verdict: Reply, revnat=4
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Reverse NAT lookup, index=4
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Performing reverse NAT, address=10.105.185.194 port=81
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Loopback reverse SNAT from=169.254.42.1 to=10.0.0.92
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Attempting local delivery for container id 1143 from seclabel 2899
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Conntrack lookup 1/2: src=10.105.185.194:81 dst=10.0.0.92:53162
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Conntrack lookup 2/2: nexthdr=6 flags=0
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: CT verdict: New, revnat=0
CPU 04: MARK 0xee59446 FROM 1143 DEBUG: Policy evaluation would deny packet from 2899 to 2899
Policy verdict log: flow 0xee59446 local EP ID 1143, remote ID 2899, proto 6, ingress, action deny, match none, 10.105.185.194:81 -> 10.0.0.92:53162 tcp SYN, ACK
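
To summarize the trace, here is a short illustrative C walk-through of the tuple rewrites on the reply path. The struct and helper are hypothetical (they only model the rewrites, not Cilium's code), and the addresses and ports are copied from the debug output above.

#include <stdio.h>

struct flow { const char *src; int sport; const char *dst; int dport; };

static void show(const char *stage, struct flow f)
{
    printf("%-34s %s:%d -> %s:%d\n", stage, f.src, f.sport, f.dst, f.dport);
}

int main(void)
{
    /* 1. Reply as it leaves the pod, addressed to the loopback source
     *    it saw on the request. This lookup hits the CT entry (Reply). */
    struct flow reply = { "10.0.0.92", 80, "169.254.42.1", 53162 };
    show("reply at pod egress (CT Reply hit)", reply);

    /* 2. RevNAT index 4 restores the service VIP and port as the source... */
    reply.src = "10.105.185.194"; reply.sport = 81;
    /* ...and loopback reverse SNAT restores the real destination pod IP. */
    reply.dst = "10.0.0.92";
    show("after revNAT + reverse SNAT", reply);

    /* 3. The ingress CT lookup now uses this VIP tuple, for which no
     *    entry was ever created -> verdict New -> policy is evaluated. */
    show("ingress CT lookup key (no entry)", reply);
    return 0;
}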

@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jun 15, 2022
@github-actions

github-actions bot commented Jul 7, 2022

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as completed Jul 7, 2022
@joestringer joestringer reopened this Sep 27, 2022
@joestringer joestringer added pinned These issues are not marked stale by our issue bot. and removed needs/triage This issue requires triaging to establish severity and next steps. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Sep 27, 2022
@joestringer
Member

I wonder if the next step could be to create a new test in the https://github.com/cilium/cilium-cli connectivity test suite that configures a service with only one pod backing it, then initiates requests from that pod to the service with policy applied. This could make the issue easier to reproduce, which should also make it easier to debug and figure out a solution.

@aditighag
Member

aditighag commented Sep 27, 2022

If I understand the issue correctly, we already have a test case to verify this scenario - https://github.com/cilium/cilium/blob/master/test/k8s/services.go#L170-L170.

It is important that the service port (81) is different from the container port (80). Issue does not occur if they are the same.

If this is indeed relevant, we should extend the test case. @skmatti Here is context for how we skip policy enforcement for the service loopback + policy case - 52cd6da. Hopefully this helps with debugging/fixing the issue further.
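
As a conceptual illustration of that skip (loosely based on what 52cd6da describes; this is not Cilium's actual code, and the names below are hypothetical): ingress policy is bypassed only when the CT lookup finds an entry carrying the loopback flag, which is exactly what fails in this issue because the reply's lookup returns New.

#include <stdbool.h>
#include <stdio.h>

struct ct_state { bool loopback; };           /* set when the forward path applied loopback SNAT */
enum ct_verdict { CT_NEW, CT_ESTABLISHED, CT_REPLY };

static bool ingress_policy_required(enum ct_verdict verdict,
                                    const struct ct_state *state)
{
    /* Hairpin (pod -> own service VIP -> same pod) traffic is exempt from
     * ingress policy, but only when the CT entry is found and carries the
     * loopback flag recorded on the forward path. */
    if (verdict != CT_NEW && state->loopback)
        return false;
    return true;
}

int main(void)
{
    struct ct_state hit  = { .loopback = true };
    struct ct_state miss = { .loopback = false };  /* lookup found nothing */

    /* Expected: reply matches the loopback entry -> policy is skipped. */
    printf("CT hit with loopback -> policy %s\n",
           ingress_policy_required(CT_REPLY, &hit) ? "enforced" : "skipped");

    /* This issue: reply's CT lookup misses -> verdict New -> policy runs. */
    printf("CT miss (New)        -> policy %s\n",
           ingress_policy_required(CT_NEW, &miss) ? "enforced" : "skipped");
    return 0;
}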

@sypakine
Contributor

This may be fixed by #27798

@julianwiedmann julianwiedmann added the area/loadbalancing Impacts load-balancing and Kubernetes service implementations label Aug 31, 2023
@julianwiedmann
Member

This may be fixed by #27798

Agreed. @skmatti could you please re-test?

@julianwiedmann julianwiedmann added the need-more-info More information is required to further debug or fix the issue. label Sep 7, 2023
@skmatti
Contributor Author

skmatti commented Dec 12, 2023

The issue is resolved on Cilium 1.12.15, so it was most likely fixed by #27798.

-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state new ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp SYN
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK
-> endpoint 2244 flow 0xabe2de97 , identity 43665->43665 state established ifindex lxc54076bf7fdd8 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK, FIN
-> endpoint 2244 flow 0xabe2de97 , identity world->43665 state established ifindex 0 orig-ip 169.254.42.1: 169.254.42.1:51850 -> 10.244.1.210:8080 tcp ACK, FIN

@skmatti skmatti closed this as completed Dec 12, 2023
@github-actions github-actions bot added info-completed The GH issue has received a reply from the author and removed need-more-info More information is required to further debug or fix the issue. labels Dec 12, 2023