
[FeGW] TCP diameter Gx/Gy links failure #7496

Closed
emersonacuna opened this issue Jun 11, 2021 · 8 comments · Fixed by #7637
Labels
priority: p1 priority_p1 type: bug Something isn't working

Comments

emersonacuna commented Jun 11, 2021

Federated LTE Network

  • Version:
  • Affected Component: Federation gateway
  • Affected Subcomponent: Federation gateway, NMS
  • Deployment Environment: local docker-compose (orc8r), Openstack AGW/FeG

Describe the Issue

The Gx/Gy Diameter links are configured over TCP to a DRA. These links go down at random intervals, and the only way we have found to bring them back is to manually restart the Docker services. The links do not drop at the same time, and there is no pattern to when they fail.

session_proxy    | E0603 02:46:04.104037       1 diameter_client.go:150] diameter error on 10.68.152.61:5112: read tcp 10.30.26.57:3871->10.68.152.61:5112: use of closed network connection
session_proxy    | E0603 11:52:27.999359       1 diameter_client.go:150] diameter error on 10.68.152.61:5111: read tcp 10.30.26.57:3870->10.68.152.61:5111: use of closed network connection

The S6a link is configured over SCTP and has never shown this issue.

Expected behavior
The links should remain established at all times; otherwise, alarms are raised on the DRA side.

Additional context
Attached are the session_proxy container logs and the Gx and Gy packet traces. Note the following:
Gy (TCP port 5112) down time -> 02/06 21:46
Gx (TCP port 5111) down time -> 03/06 06:52
Each pcap file name ends with the name of the interface that went down during that capture

GxGy_down.zip

@emersonacuna emersonacuna added the type: bug Something isn't working label Jun 11, 2021
emakeev (Contributor) commented Jun 11, 2021

This does not only affect Gx/Gy over TCP; the same logic is also used for S6a/SWx and other Diameter-based protocols, over TCP or SCTP.
The connection recovers "on demand", and we currently do not track connection health.
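In Diameter, connection health is commonly tracked with Device-Watchdog-Request/Answer (DWR/DWA) exchanges, whose failure-detection algorithm RFC 6733 takes from RFC 3539. The sketch below is a simplified, hypothetical state tracker loosely modeled on that algorithm (the INITIAL and REOPEN states are omitted); it is not the FeG's diameter client code, and the `Watchdog` type and its method names are invented for illustration.

```go
package main

import "fmt"

// ConnState is a simplified peer health state, loosely modeled on RFC 3539.
type ConnState int

const (
	StateOkay ConnState = iota
	StateSuspect
	StateDown
)

func (s ConnState) String() string {
	switch s {
	case StateOkay:
		return "OKAY"
	case StateSuspect:
		return "SUSPECT"
	default:
		return "DOWN"
	}
}

// Watchdog is a hypothetical health tracker for one Diameter peer connection.
type Watchdog struct {
	state   ConnState
	pending bool // a DWR was sent and no answer has arrived yet
}

func (w *Watchdog) State() ConnState { return w.state }

// OnTimer is called once per watchdog interval (Tw). It returns true when
// the caller should send a DWR to the peer.
func (w *Watchdog) OnTimer() bool {
	switch w.state {
	case StateOkay:
		if w.pending {
			// The previous DWR went unanswered for a full interval.
			w.state = StateSuspect
			return false
		}
		w.pending = true
		return true
	case StateSuspect:
		// Still nothing after another interval: declare the link down so the
		// caller can close the socket and reconnect proactively.
		w.state = StateDown
	}
	return false
}

// OnAnswer is called when any answer (for example a DWA) arrives from the peer.
func (w *Watchdog) OnAnswer() {
	if w.state != StateDown {
		w.state = StateOkay
		w.pending = false
	}
}

// Reset is called after the caller has re-established the connection.
func (w *Watchdog) Reset() { *w = Watchdog{} }

func main() {
	w := &Watchdog{}
	send := w.OnTimer() // first interval: a DWR should be sent
	fmt.Println(send, w.State())
	w.OnTimer() // DWR unanswered for one interval: SUSPECT
	fmt.Println(w.State())
	w.OnAnswer() // late DWA arrives: back to OKAY
	fmt.Println(w.State())
}
```

With a tracker like this, a DOWN transition gives the client a concrete signal to tear down and re-dial the peer before a request fails, rather than discovering the dead socket on the next read or write.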

joary (Contributor) commented Jun 15, 2021

Is there a workaround that could be taken on this issue?

From a naive perspective, the Diameter client could signal a service restart when it detects use of a closed network connection.

ekowtaylor (Contributor) commented

@uri200, @themarwhal, any thoughts or further insights on this issue?

@ekowtaylor ekowtaylor added the priority: p1 priority_p1 label Jun 16, 2021
uri200 (Contributor) commented Jun 16, 2021

Hey @ekowtaylor, this is not a simple feature. It will require code changes in all of our Diameter clients, plus testing to verify that HA is not impacted (I am not even sure that is possible). This should be planned for 1.7, with a backport to 1.6 if it is ready earlier.

uri200 (Contributor) commented Jun 17, 2021

@emersonacuna, the issue may be coming from the VM where the FeG is running. From what I can see, this message is triggered when a socket has been closed and the FeG then tries to write to it. The reason the socket was closed is not known to the FeG, because nothing in the log indicates an error.

Can you include the syslog when this issue happens?

Can you also tell me which version you are using (1.5? 1.5.1?) so I can try to build a custom binary?

uri200 (Contributor) commented Jun 17, 2021

@emersonacuna, can you confirm that the FeG is running at IP 10.30.26.57?

emersonacuna (Author) commented Jun 17, 2021

Hi @uri200, yes. The FeG IPs are: FeG Active 10.30.26.55 / FeG Backup 10.30.26.57. All the information I sent you is from the FeG Backup.

BTW, our current version is 1.4, but we are planning to upgrade to 1.5.1.

emakeev added a commit to emakeev/magma that referenced this issue Jun 17, 2021
emakeev added a commit to emakeev/magma that referenced this issue Jun 18, 2021
emakeev added a commit to emakeev/magma that referenced this issue Jun 18, 2021
emakeev added a commit to emakeev/magma that referenced this issue Jun 18, 2021
emakeev added a commit to emakeev/magma that referenced this issue Jun 18, 2021
emakeev added a commit that referenced this issue Jun 19, 2021
…) (#7637)

Signed-off-by: Evgeniy Makeev <evgeniym@fb.com>
themarwhal pushed a commit that referenced this issue Jun 19, 2021
…) (#7637)

Signed-off-by: Evgeniy Makeev <evgeniym@fb.com>
themarwhal pushed a commit that referenced this issue Jun 19, 2021
…) (#7637)

Signed-off-by: Evgeniy Makeev <evgeniym@fb.com>
themarwhal pushed a commit that referenced this issue Jun 19, 2021
…) (#7637)

Signed-off-by: Evgeniy Makeev <evgeniym@fb.com>
uri200 (Contributor) commented Jun 22, 2021

@ekowtaylor, this issue has been closed. We are waiting for the update to be installed on @emersonacuna's setup to confirm the fix.

rmeleromira pushed a commit to rmeleromira/magma that referenced this issue Jul 24, 2021
…ma#7496) (magma#7637)

Signed-off-by: Evgeniy Makeev <evgeniym@fb.com>
Signed-off-by: Ramon Melero <ramonmelero@fb.com>
m-trojanowski pushed a commit to openEPC/magma that referenced this issue Oct 20, 2021
m-trojanowski pushed a commit to openEPC/magma that referenced this issue Oct 20, 2021

6 participants