Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field index out of range while writing messages #66

Closed
divyanshupundir opened this issue Aug 16, 2023 · 17 comments · Fixed by #72
Closed

Field index out of range while writing messages #66

divyanshupundir opened this issue Aug 16, 2023 · 17 comments · Fixed by #72
Labels
bug Something isn't working

Comments

@divyanshupundir
Copy link

I have an application that needs to route messages between a few endpoints based on some criteria. When I was using older versions of gomavlib it used to work great. But after updating recently, I have started seeing crashes. The routing works fine for some time but the crash occurs randomly in a few seconds.

I have added the crash logs:

panic: reflect: Field index out of range

goroutine 10 [running]:
reflect.Value.Field({0x58bb40?, 0xc0001d56e0?, 0x2c?}, 0x560ca0?)
        /usr/local/go/src/reflect/value.go:1224 +0xb2
github.com/bluenviron/gomavlib/v2/pkg/message.(*ReadWriter).Write(0xc00009cd00, {0x5fe340?, 0xc0001d56e0}, 0x1)
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/pkg/message/readwriter.go:479 +0x22a
github.com/bluenviron/gomavlib/v2.(*Node).encodeMessage(0xc0004d0120, {0x5a4720?, 0xc0002b8db0})
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:388 +0x24b
github.com/bluenviron/gomavlib/v2.(*Node).run(0xc0004d0120)
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:280 +0x390
created by github.com/bluenviron/gomavlib/v2.NewNode
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:242 +0x9f5
exit status 2
@aler9 aler9 added the bug Something isn't working label Aug 16, 2023
@aler9
Copy link
Member

aler9 commented Aug 16, 2023

Hello,are you using the library directly or through mavp2p? if you are using the library directly, which dialect are you using?

@divyanshupundir
Copy link
Author

I am using the library directly. Mavp2p is not involved.

We're using a custom dialect that extends common and ardupilotmega along with some custom messages.

@divyanshupundir
Copy link
Author

Also, I've made sure that I'm using the dialect-import v2.0.4 to generate the go files along gomavlib v2.0.4 for the runtime.

@aler9
Copy link
Member

aler9 commented Aug 17, 2023

The crash you posted appears because the message you pass to WriteMessage is not the same of the one contained inside Dialect{}.
Are you sure that this is not the case?

@divyanshupundir
Copy link
Author

Even I thought so and had tried to debug it from my side before posting it here.

Before this I was using this version of gomavlib:

github.com/aler9/gomavlib@v0.0.0-20220505052330-0c59caeb7516

After seeing the crash in 2.0.4, I shifted back to the version mentioned above, regenerated the old go files and everything was working fine.

I've been using the above mentioned version for over a year now and it works stably.

But I'll try to have a look at my custom messages again to see if I've made a mistake.

@aler9
Copy link
Member

aler9 commented Aug 17, 2023

The difficulty of debugging this issue also lays in a design flaw of the library: WriteMessageAll() can't return errors since writing is asynchronous, so the only way to report errors is to crash the library

@aler9
Copy link
Member

aler9 commented Aug 17, 2023

Anyway, the crash needs to be replicated before it can be solved.

@divyanshupundir
Copy link
Author

I am using the WriteFrameAll() function, I guess that has the same flaw.

Anyway, the crash needs to be replicated before it can be solved.

I understand. I'll try to reproduce the problem and share the code.

@divyanshupundir
Copy link
Author

divyanshupundir commented Aug 17, 2023

package main

import (
	"fmt"
	"log"
	"math"
	"time"

	"github.com/bluenviron/gomavlib/v2"
	"github.com/bluenviron/gomavlib/v2/pkg/dialects/common"
)

func main() {

	nodeA := newNode(8120)
	defer nodeA.Close()

	nodeB := newNode(8220)
	defer nodeB.Close()

	nodeC := newNode(8320)
	defer nodeC.Close()

	go func() {
		fmt.Println("nodeA: starting routing")

		for event := range nodeA.Events() {
			switch evt := event.(type) {

			case *gomavlib.EventChannelOpen:
				log.Println("nodeA: channel open")

			case *gomavlib.EventChannelClose:
				log.Println("nodeA: channel close")

			case *gomavlib.EventParseError:
				log.Println("nodeA: parse error")

			case *gomavlib.EventFrame:
				frame := evt.Frame

				nodeB.WriteFrameAll(frame)
				nodeC.WriteFrameAll(frame)
			}
		}
	}()

	go func() {
		fmt.Println("nodeB: starting routing")

		for event := range nodeB.Events() {
			switch evt := event.(type) {

			case *gomavlib.EventChannelOpen:
				log.Println("nodeB: channel open")

			case *gomavlib.EventChannelClose:
				log.Println("nodeB: channel close")

			case *gomavlib.EventParseError:
				log.Println("nodeB: parse error")

			case *gomavlib.EventFrame:
				frame := evt.Frame

				nodeA.WriteFrameAll(frame)
				nodeC.WriteFrameAll(frame)
			}
		}
	}()

	go func() {
		fmt.Println("nodeC: starting routing")

		for event := range nodeC.Events() {
			switch evt := event.(type) {

			case *gomavlib.EventChannelOpen:
				log.Println("nodeC: channel open")

			case *gomavlib.EventChannelClose:
				log.Println("nodeC: channel close")

			case *gomavlib.EventParseError:
				log.Println("nodeC: parse error")

			case *gomavlib.EventFrame:
				frame := evt.Frame

				nodeA.WriteFrameAll(frame)
				nodeB.WriteFrameAll(frame)
			}
		}
	}()

	for {
		time.Sleep(time.Duration(math.MaxInt64))
	}

}

func newNode(port uint32) *gomavlib.Node {
	node, err := gomavlib.NewNode(gomavlib.NodeConf{
		Endpoints: []gomavlib.EndpointConf{
			gomavlib.EndpointTCPServer{Address: fmt.Sprintf("0.0.0.0:%d", port)},
		},
		Dialect:          common.Dialect,
		OutVersion:       gomavlib.V2,
		OutSystemID:      2,
		OutComponentID:   5,
		HeartbeatDisable: true,
	})
	if err != nil {
		log.Panicf("cannot create mavlink node: %s", err)
	}

	return node
}

@divyanshupundir
Copy link
Author

divyanshupundir commented Aug 17, 2023

I noticed that the crash does not occur when there were only 2 nodes. It started occurring when I created 3 nodes. In my application I have multiple such nodes.

If you have mavlink-router and some SITL instance installed on your system, then you can use these commands to connect to the application:

make px4_sitl jmavsim
mavlink-routerd -t 0 127.0.0.1:14550 -p 127.0.0.1:8120
mavlink-routerd -t 0 -p 127.0.0.1:8220
mavlink-routerd -t 0 -p 127.0.0.1:8320

@divyanshupundir
Copy link
Author

divyanshupundir commented Aug 17, 2023

The logs:

nodeC: starting routing
nodeB: starting routing
nodeA: starting routing
2023/08/17 18:36:11 nodeB: channel open
2023/08/17 18:36:11 nodeA: channel open
2023/08/17 18:36:11 nodeC: channel open
panic: reflect: Field index out of range

goroutine 37 [running]:
reflect.Value.Field({0x570040?, 0xc000332840?, 0x18?}, 0x54ca00?)
        /usr/local/go/src/reflect/value.go:1224 +0xb2
github.com/bluenviron/gomavlib/v2/pkg/message.(*ReadWriter).Write(0xc00045a440, {0x5d8f00?, 0xc000332840}, 0x1)
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/pkg/message/readwriter.go:479 +0x22a
github.com/bluenviron/gomavlib/v2.(*Node).encodeMessage(0xc000412120, {0x584200?, 0xc000329ec0})
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:388 +0x24b
github.com/bluenviron/gomavlib/v2.(*Node).run(0xc000412120)
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:280 +0x390
created by github.com/bluenviron/gomavlib/v2.NewNode
        /home/divyanshu/go/pkg/mod/github.com/bluenviron/gomavlib/v2@v2.0.4/node.go:242 +0x9f5
exit status 2

@divyanshupundir
Copy link
Author

I'm sorry, I made a small mistake in the code I had posted earlier. I have updated the comment.

@aler9
Copy link
Member

aler9 commented Aug 17, 2023

The reason why the crash happens is that passing the same Frame to multiple nodes for writing causes a race condition, in which the frame is encoded by two nodes in parallel. This is present at least since v2.0.0.

@divyanshupundir
Copy link
Author

Oh, that makes sense. I guess because the writing is happening in different coroutines, wrapping the WriteFrameAll() in a mutex lock won't fix it.

Can you please suggest a way in which I can send the frames to multiple nodes??

@aler9
Copy link
Member

aler9 commented Aug 18, 2023

this is fixed in v2.1.0.

@divyanshupundir
Copy link
Author

Thanks a lot @aler9 for fixing this!

Copy link

This issue is being locked automatically because it has been closed for more than 6 months.
Please open a new issue in case you encounter a similar problem.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants