How interactive debugging works (old way)

Rich Chiodo edited this page Jul 26, 2022 · 15 revisions

The Interactive Window is used for running cells in a Python script. One of the things you can do with these cells is debug them.

This page describes how that debugging is implemented under the covers.

What does debug cell do?

image

This sequence diagram is explained below:

Inject debugpy

The first step is to inject debugpy into the kernel. Because we don't know the process id of the kernel, we need another way to get the kernel ready for attach. We do this by running this code in the kernel:

import debugpy
debugpy.listen(('localhost', 0))

That causes debugpy to start a server and return the port it is listening on. We use this port to generate a launch config. It looks something like this, with the port replaced by the one debugpy returned:

        {
            "type": "python",
            "request": "attach",
            "connect": { "host": "localhost", "port": 5678 },
            "justMyCode": true
        },
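As a rough sketch of this step, the helper below turns the endpoint reported by the kernel into an attach configuration. The name `build_attach_config` is hypothetical and not taken from the extension. The sketch assumes that `debugpy.listen` returns the `(host, port)` pair it ends up listening on.

```python
def build_attach_config(host, port):
    """Build an attach-style launch config for a debugpy server (illustrative only)."""
    return {
        "type": "python",
        "request": "attach",
        "connect": {"host": host, "port": port},
        "justMyCode": True,
    }


# In the kernel, the endpoint would come from something like:
#     import debugpy
#     host, port = debugpy.listen(("localhost", 0))  # port 0 lets the OS pick a free port
# Here a fixed endpoint stands in for that result.
config = build_attach_config("localhost", 5678)
```

Passing port 0 to `listen` is what makes the generated config necessary: the real port is only known after the server has started.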

Where does debugpy come from?

Debugpy ships with the Python extension. Before we attach, we add the on-disk path where the Python extension keeps debugpy, so that the kernel can import it. In a remote situation, debugging is disabled anyway, so we don't have to worry about supporting this for remote.
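One plausible way to make debugpy importable in the kernel is to extend `sys.path` before importing it. This is a minimal sketch under that assumption. The helper name and the directory are hypothetical, and the real location depends on where the Python extension is installed.

```python
import sys


def add_debugpy_to_path(debugpy_parent_dir):
    """Make `import debugpy` resolvable from a directory supplied by the caller."""
    # Avoid adding the same entry twice if cells are debugged repeatedly.
    if debugpy_parent_dir not in sys.path:
        sys.path.append(debugpy_parent_dir)


# Hypothetical location, used only for illustration.
add_debugpy_to_path("/hypothetical/python-extension/lib")
```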

Attaching

The launch config generated in the previous step is used to attach the debugger to the running debugpy server (much like done here for launching debugging of a python file).

VS Code then transitions to debug mode. It just sits there waiting for an event from the debuggee.

Cell File names

The next step is called out as 'Replace kernel's run cell handler'.

What is that code doing? It replaces IPython's run_cell method with a new one so that we can set an environment variable BEFORE we run a cell.

Specifically this code here:

                predicted_name = __VSCODE_compute_hash(args[1], args[0].execution_count)
                os.environ["IPYKERNEL_CELL_NAME"] = predicted_name

Internally, the IPython kernel (ipykernel) uses the environment variable IPYKERNEL_CELL_NAME to set the name of the pseudo file associated with any code that's run.
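The patching idea can be sketched without IPython at all. In the example below, `FakeShell` is a stand-in for the kernel's shell, and `compute_cell_name` is a hypothetical replacement for `__VSCODE_compute_hash`. The hashing scheme and the file name format are assumptions, not the extension's actual ones.

```python
import hashlib
import os


def compute_cell_name(code, execution_count):
    """Hypothetical name scheme: derived from the cell contents and execution count."""
    digest = hashlib.sha1(code.encode("utf-8")).hexdigest()[:12]
    return f"cell_{execution_count}_{digest}.py"


class FakeShell:
    """Minimal stand-in for the kernel's shell; not IPython's real class."""

    def __init__(self):
        self.execution_count = 1
        self.names_seen = []

    def run_cell(self, code):
        # Record the name that was in effect when the cell started running.
        self.names_seen.append(os.environ.get("IPYKERNEL_CELL_NAME"))
        self.execution_count += 1


def patch_run_cell(shell_class):
    """Replace run_cell with a wrapper that sets the cell name first."""
    original = shell_class.run_cell

    def run_cell_with_name(*args, **kwargs):
        # args[0] is the shell and args[1] is the cell source.
        predicted_name = compute_cell_name(args[1], args[0].execution_count)
        os.environ["IPYKERNEL_CELL_NAME"] = predicted_name
        return original(*args, **kwargs)

    shell_class.run_cell = run_cell_with_name


patch_run_cell(FakeShell)
shell = FakeShell()
shell.run_cell("print('hello')")
```

Because the wrapper runs inside the same call that executes the cell, the variable is already set by the time the cell's code is compiled, and no extra cell has to be executed to set it.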

Why the special run cell hook?

If we need an environment variable to be set when running a cell, why not just execute some code in the kernel? This diagram might explain why:

image

If we used cells to change the environment variable, those cells would themselves run with IPYKERNEL_CELL_NAME set. In the example above, if CELL_2 calls into code in CELL_1, the debugger can't tell whether that code belongs to the original cell 1 or to the second "cell 1", the cell where the variable was set.

To work around this problem, we instead patch the kernel so it sets the variable itself as it executes code.
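The problem can be reproduced with a toy model. `ToyKernel` below is purely illustrative: it files every executed cell under whatever name the environment variable holds at that moment, which is a simplification of what a real kernel does.

```python
import os


class ToyKernel:
    """Toy model: each executed cell is filed under the name held in the environment."""

    def __init__(self):
        self.sources = {}  # pseudo file name -> code last run under that name

    def execute(self, code):
        name = os.environ.get("IPYKERNEL_CELL_NAME", "<default>")
        self.sources[name] = code
        exec(code, {"os": os})


kernel = ToyKernel()
os.environ.pop("IPYKERNEL_CELL_NAME", None)  # start from a clean slate

cell_1 = "x = 1"
cell_2 = "y = 2"

# Naive approach: run an extra "setter" cell before each user cell.
kernel.execute('os.environ["IPYKERNEL_CELL_NAME"] = "cell_1.py"')
kernel.execute(cell_1)
setter_2 = 'os.environ["IPYKERNEL_CELL_NAME"] = "cell_2.py"'
kernel.execute(setter_2)  # still runs while the name is "cell_1.py"
kernel.execute(cell_2)

# The setter has displaced the user's code under the name "cell_1.py".
clobbered = kernel.sources["cell_1.py"] == setter_2
```

In this model the second setter cell becomes a second "cell 1", so anything that later looks up `cell_1.py` finds the setter rather than the user's code.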

Remapping file names

As you can see in the Python code above, IPYKERNEL_CELL_NAME is set to the hash of the cell contents plus the execution count. However, in the Interactive Window, we're running code like so:

image

This means that when the debugger returns a stack frame, its source member will point to something like ipython_34343434343434.py. That is obviously not the same as manualTestFile.py, which is where we want the IP indicator to be in VS Code when the stop event fires.

Debugpy allows us to send a custom message that makes it remap ipython_34343434343434.py to manualTestFile.py. This means that when the source locations in the stack frame responses reach VS Code, it will just open the correct file.

This custom message is sent in the Remap source files event in the sequence diagram.
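The effect of the remapping can be illustrated with a small pure-Python function. This is not the message debugpy accepts. The frame shape, the mapping format, and the line offset handling are all assumptions made for the sake of the example.

```python
def remap_frame(frame, source_map):
    """Return a copy of a stack frame with its pseudo file swapped for the real file."""
    mapping = source_map.get(frame["path"])
    if mapping is None:
        return dict(frame)  # not a cell pseudo file; leave it untouched
    real_path, first_line_in_file = mapping
    return {
        "path": real_path,
        # Line 1 of the pseudo file corresponds to the cell's first line in the real file.
        "line": first_line_in_file + frame["line"] - 1,
    }


# Hypothetical mapping: the cell starts at line 10 of manualTestFile.py.
source_map = {"ipython_34343434343434.py": ("manualTestFile.py", 10)}
frame = {"path": "ipython_34343434343434.py", "line": 3}
remapped = remap_frame(frame, source_map)
```

Frames from files that are not in the map pass through unchanged, which matches the intent that only cell pseudo files are redirected.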

Enable thread tracing

So the debugger is attached and the file paths are correct when execution happens. What is the enable thread tracing step?

When debugpy is injected into a Python process, it watches the execution of every line to see whether it should hit a breakpoint. This is really slow. That is okay while we're debugging a cell, but what about afterwards?

What if the user debugs one cell but then wants to run the kernel for a number of cells afterwards? We don't want the overhead of debugpy watching every line executing.

Debugpy has a global flag that controls whether it does this watching. The enable thread tracing step turns that flag on so that debugpy starts watching execution again.

It's not shown in the diagram above, but after debugging is complete, the flag is turned off.
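To get a feel for why a toggle matters, the sketch below uses the standard library's `sys.settrace` to count executed lines only while a flag is enabled. It illustrates the general cost model of per-line tracing. It is not how debugpy is implemented, and `LineWatcher` is a hypothetical name.

```python
import sys


class LineWatcher:
    """Counts executed lines, but only while `enabled` is True."""

    def __init__(self):
        self.enabled = False
        self.lines_seen = 0

    def _trace(self, frame, event, arg):
        if event == "line":
            self.lines_seen += 1
        return self._trace  # keep tracing inside this frame

    def run(self, func):
        previous = sys.gettrace()
        if self.enabled:
            sys.settrace(self._trace)
        try:
            return func()
        finally:
            sys.settrace(previous)  # restore whatever was installed before


def cell():
    total = 0
    for i in range(3):
        total += i
    return total


watcher = LineWatcher()
untraced_result = watcher.run(cell)  # flag off: no per-line callbacks
lines_while_off = watcher.lines_seen

watcher.enabled = True
traced_result = watcher.run(cell)    # flag on: every line triggers a callback
lines_while_on = watcher.lines_seen
```

Both runs return the same value. The difference is that the traced run pays for a Python-level callback on every line, which is the overhead the flag avoids for ordinary cell execution.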

Add breakpoint to source

Before we execute the user's cell, we want execution to stop on the first line of the cell. You might think, why not just send a breakpoint during the attach initialization (or before we enable tracing)?

We didn't do that. Instead we put a breakpoint() instruction in the code that's about to be executed. The reason was to prevent VS Code from knowing about the breakpoint and showing it in the UI when the response for the setBreakpoints request was received.
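A minimal sketch of the idea: prefix the cell source with a `breakpoint()` call. To keep the example runnable without a debugger, `sys.breakpointhook` is temporarily replaced with a recorder. The helper name is hypothetical, and how the extension accounts for the extra line is not shown here.

```python
import sys


def add_initial_breakpoint(cell_code):
    """Prefix the cell with a breakpoint() call so execution stops at the top."""
    return "breakpoint()\n" + cell_code


# Stand-in for a debugger: record that the hook fired instead of stopping.
hits = []
original_hook = sys.breakpointhook
sys.breakpointhook = lambda *args, **kwargs: hits.append("stopped")
try:
    namespace = {}
    exec(add_initial_breakpoint("x = 1 + 1"), namespace)
finally:
    sys.breakpointhook = original_hook  # always restore the real hook
```

Since the stop comes from the code itself rather than from a breakpoint registered by the client, there is nothing for the editor to render in the gutter.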

Execute the cell

Finally, the cell is ready to execute. We execute it normally, and because its first line is a breakpoint, execution stops right after that point.

Stop event fires

Once the stop event fires, debugging the cell is just like debugging any Python code. Variable and stack frame requests are made. The user can step in and step out.

After debugging

Then what happens after the user steps off the end of the cell? From debugpy's point of view, the process is now just running non-user code, so it keeps going until user code is hit.

By disabling the thread tracing flag, debugpy will stop listening and normal cells can be executed without debugpy breaking into them.

image

This is different from normal notebook debugging, where debugpy detaches from the kernel altogether.

What about remote?

As mentioned above, this isn't supported in remote. Why is that? Two reasons mainly:

  1. Debugpy is loaded from the Python extension, and we didn't want to sync it to the remote machine and load it into the remote kernel.
  2. Debugpy starts a debug server listening on a port. We didn't think users would want to open another port on their machine.