Skip to content
This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Fix race condition when killing Chrome on Windows when disconnecting #703

Merged
merged 2 commits into from
Jul 18, 2018

Conversation

mrcrane
Copy link
Contributor

@mrcrane mrcrane commented Jul 17, 2018

No description provided.

for (let i = 0 ; i < 10; i++) {
// Check to see if the process is still running
let tasklistOutput = execSync(`tasklist /FI "PID eq ${chromePID}"`).toString();
if (!tasklistOutput.includes(chromePID.toString())) {
Copy link
Contributor

@digeff digeff Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the chromePID is a substring of another process PID? Is it possible that the output of the command is not in the format that we expect?

Copy link
Contributor Author

@mrcrane mrcrane Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When tasklist finds the process it will return output that looks like
Image Name PID Session Name Session# Mem Usage
========== === ============ ======== =========
chrome.exe 16116 Console 1 273,300 K

When tasklist doesn't find the process it outputs
INFO: No tasks are running which match the specified criteria.

In both cases the exit code is 0

Because the tasklist is run with a PID filter it will only return one or zero processes. Any process that with a PID that is a superstring will be filtered out.

I implemented with a search of PID in the output because it seemed like the most robust way to differentiate between success and failure outputs. Searching for the presence of the PID is independent of the format.

Let me know if you have any better suggestions. I can add comments as documentation for what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to use csv output which is slightly better, but still not perfect.


for (let i = 0 ; i < 10; i++) {
// Check to see if the process is still running
let tasklistOutput = execSync(`tasklist /FI "PID eq ${chromePID}"`).toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we log this execution too?

@@ -347,6 +332,37 @@ export class ChromeDebugAdapter extends CoreDebugAdapter {
this._chromeProc = null;
}

private async killChromeOnWindows(chromePID: number): Promise<void> {
Copy link
Contributor

@digeff digeff Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have enough telemetry to figure out if this method/logic is working as expected?

@rakatyal
Copy link
Contributor

Why do we have an updated package-lock.json for this change?

// The command will fail if process was not found. This can be safely ignored.
}

for (let i = 0 ; i < 10; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use for this for loop? How is this better than using a timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows the pattern in findNewlyLaunchedChromeProcess() at the bottom of the file. It will try up to 10 times and wait 200ms between tries. Total time will be 2000ms in addition to command execution time. It does use a timeout to wait.

I think it's readable but I am open to changing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but you could also use the retryAsync helper in core's utils.ts if you want.

@rakatyal
Copy link
Contributor

Should we store some time performance telemetry for the disconnect request? PZ gives us 5 seconds to respond to a command and we should make sure we are not near that number after this change.

@mrcrane
Copy link
Contributor Author

mrcrane commented Jul 17, 2018

The package-lock.json changes happen when I run npm install. I assume that is because of a previous check-in. I can leave it out but it keeps coming back so I decided to add it to the PR.

@mrcrane
Copy link
Contributor Author

mrcrane commented Jul 18, 2018

I addressed the logging concerns.

I still need to add telemetry, but I think it would be best to tie into the existing ClientRequest telemetry. In this case: "ClientTelemetry/disconnect". This would already cover measuring the disconnect end-to-end times, and we could see if suddenly every disconnect time started spiking.

Some issues I'm having with that:

  • I actually can't find any "ClientRequest/disconnect" telemetry in Kusto even though I see all the other requests. Maybe something is preventing that from being reported?
  • I'd like to use the TelemetryPropertyCollector to augment the telemetry, but it's not passed through properly to the disconnect handler. This would require changes in -core to fix.

Thoughts? @roblourens

Copy link
Contributor

@digeff digeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a user story to finish adding telemetry to killChromeOnWindows

// If the process is found, tasklist will output CSV with one of the values being the PID. Exit code will be 0.
// If the process is not found, tasklist will give a generic "not found" message instead. Exit code will also be 0.
// If we see an entry in the CSV for the PID, then we can assume the process was found.
if (!tasklistOutput.includes(`"${chromePID}"`)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant template string :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's looking for the quotes around the PID as well as the PID itself...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops too early for code review

@roblourens
Copy link
Member

Was the bug here that Chrome takes a bit of time to shutdown, so we are force-killing it unnecessarily?

@roblourens
Copy link
Member

I actually can't find any "ClientRequest/disconnect" telemetry in Kusto even though I see all the other requests

Maybe we lose telemetry from right before the process shuts down, let me check that out...

@roblourens
Copy link
Member

roblourens commented Jul 18, 2018

Actually, I think that telemetry after disconnect will not be sent (looking at vscode's code at least). The event is sent to the client which then passes it on to the telemetry service, but I think the client will stop handling this telemetry after disconnect. Does VS do the same?

@mrcrane
Copy link
Contributor Author

mrcrane commented Jul 18, 2018

Correct, taskkill sends the request for Chrome to shut down but then in some cases the subsequent taskkill /F would happen before Chrome could shutdown gracefully.

@roblourens
Copy link
Member

@rakatyal and @digeff if you sign off I'll merge this

@digeff
Copy link
Contributor

digeff commented Jul 18, 2018

Ship it!

@rakatyal
Copy link
Contributor

Looks good!

@roblourens
Copy link
Member

Thanks, next time you could leave an "approved" review just because I'll wait if I think you guys are still looking at it. I think anybody can create reviews on a PR.

@roblourens roblourens merged commit 83decfc into microsoft:master Jul 18, 2018
@roblourens roblourens added this to the July 2018 milestone Aug 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants