Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ForEach-Object -Parallel situationally drops pipeline input #12801

Closed
mklement0 opened this issue May 26, 2020 · 17 comments · Fixed by #18138
Closed

ForEach-Object -Parallel situationally drops pipeline input #12801

mklement0 opened this issue May 26, 2020 · 17 comments · Fixed by #18138
Labels
Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Fixed The issue is fixed. WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module

Comments

@mklement0
Copy link
Contributor

mklement0 commented May 26, 2020

Note: I can only reproduce this reliably on macOS 10.15.4.
I see it occasionally on Ubuntu 18.04, and can never reproduce it on Windows.

It may be connected to using Start-Sleep inside the script block; not sure if the fact that a custom class is used across thread boundaries is relevant [update: it is]

Steps to reproduce

class Foo { static [object] Echo($val) { return $val } }
$cls = [Foo]
while ($true) { 
  $res = 1..10 | ForEach-Object -Parallel {
    Start-Sleep -ms 100; ($using:cls)::echo($_) 
  }  |  Sort-Object
  Compare-Object $res (1..10) | Should -BeNullOrEmpty
  Write-Host -NoNewline . 
}

Expected behavior

The loop should run indefinitely, because the test should alway succeed.

Actual behavior

The loop eventually - after a varying number of iterations - breaks due to intermittent test failures, stemming from a missing output; e.g.:

Expected $null or empty, but got @(@{InputObject=5; SideIndicator==>})

That is, not all input objects were echoed.

Environment data

PowerShell Core 7.1.0-preview.3  on macOS 10.15.4
@mklement0 mklement0 added the Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a label May 26, 2020
@SeeminglyScience
Copy link
Collaborator

not sure if the fact that a custom class is used across thread boundaries is relevant.

Yeah. It tries to marshal back to the original pipeline execution thread, but since a pipeline is already running it just runs in the current thread while pretending it's the pipeline thread. As a result a whole bunch of state corruption and other weird side effects are possible.

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented May 27, 2020

You can see it by doing this:

class Foo {
    static [object] Echo($val) {
        return [PSCustomObject]@{
            ThreadId = [Threading.Thread]::CurrentThread.ManagedThreadId
            RunspaceId = [runspace]::DefaultRunspace.Id
        }
    }
}

$cls = [Foo]
while ($true) { 
    1..10 | ForEach-Object -Parallel {
        Start-Sleep -ms 100
        ($using:cls)::echo($_) 
    }
}

You'll get a whole bunch of different thread ID's with the same runspace. Also occasionally some other evidence of state corruption like "Global scope cannot be removed" errors (note this is on Windows too, I'm guessing your Windows machine just has hardware that makes the race conditions less frequently hit).

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented May 27, 2020

That's a long winded way of saying it's basically a dupe of #4003. Though a significantly more likely to happen in the wild repro.

@SeeminglyScience
Copy link
Collaborator

/cc @PaulHigin @daxian-dbw

A possible "solution" is to try to block PowerShell classes from being carried over. Sort of like how script blocks currently work.

If #4003 was fixed, this would just dead lock. Personally I'd prefer if runspace affinity was just removed from classes but I understand it's next to impossible to know the impact of a change like that. Something should probably be done though, many folks are going to turn to classes as a way to "get around" the script block problem. It might even appear to work for awhile.

@mklement0
Copy link
Contributor Author

Thanks for the detailed analysis, @SeeminglyScience.

If removing runspace affinity is too risky, perhaps another option is to simply recreate classes in the target runspace, similar to what is already being discussed for script blocks - #12378 - and in line with making the planned opt-in for transferring runspace state comprehensive - #12240.

@vexx32
Copy link
Collaborator

vexx32 commented May 27, 2020

I mean doing that would already be effectively removing runspace affinity anyway -- you'd just be binding their affinity to the new runspace. It would be largely indistinguishable from always having the code execute in whichever thread it's called and just looking for the default runspace for that thread. More or less about the same there, I'd guess.

@SeeminglyScience
Copy link
Collaborator

Just straight up recreating it is a little sketchy though because you'd lose state from static properties. Property accessors don't (currently) invoke any PowerShell so they aren't subject to the same problems necessarily. Not that you should use them for state sharing in all your threads, but more that it would be confusing if it "appeared" to be gone randomly. Also recreating the classes would create a lot of confusion around type identity.

@mklement0
Copy link
Contributor Author

While disallowing cross-thread class use is definitely the safest option, it also takes away potentially useful functionality.

Given the "less solid" nature of PowerShell classes my guess is that loss of type identity would be less of an issue - it's a case where documenting the issue may suffice (just as users will have to understand that script blocks that are recreated in a different thread no longer form closures around the original runspace's variables, if any); the .FullName properties will be identical, but -is and -as won't work (perhaps a little white lie could fix that)?

Good point about static properties, however - that sounds more problematic.
But couldn't we special-case this to copy any static property values after recreating (just like script blocks will be special-cased, as discussed in #12378 (comment))?

I'd prefer if runspace affinity was just removed from classes but I understand it's next to impossible to know the impact of a change like that.

Has anyone ever looked into this?
Isn't the current behavior never useful, or are there legitimate scenarios where existing code relies on it?

@iSazonov iSazonov added the WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module label Jun 1, 2020
@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Jun 5, 2020

NOTE: The examples below should never be purposefully relied on, they are incredibly dangerous and can corrupt state in unexpected ways and/or cause general instability.

Isn't the current behavior never useful, or are there legitimate scenarios where existing code relies on it?

It's probably accidently relied on. Apparently a class method converted to a delegate can run on any thread, even one without a runspace. I was having trouble coming up with a scenario where this would appear useful because I assumed you needed a thread with a default runspace other than the one it was originally bound to. Knowing this, it makes a lot of .NET interop appear to work as expected.

class Test {
    static [psobject] GetInfo() {
        return [PSCustomObject]@{
            ThreadId = [Threading.Thread]::CurrentThread.ManagedThreadId
            RunspaceId = [runspace]::DefaultRunspace.Id
        }
    }
}

[Threading.Tasks.Task[psobject]]::
    Run([Func[psobject]][Test]::GetInfo).
    GetAwaiter().
    GetResult()

[Test]::GetInfo()

Gives some variant of:

ThreadId RunspaceId
-------- ----------
      19          1
      15          1

Here's something that really surprised me, if you don't block until completion, it'll actually properly marshal it to the right thread:

class Test {
    static [psobject] GetInfo() {
        return [PSCustomObject]@{
            ThreadId = [Threading.Thread]::CurrentThread.ManagedThreadId
            RunspaceId = [runspace]::DefaultRunspace.Id
        }
    }
}

$task = [Threading.Tasks.Task[psobject]]::Run([Func[psobject]][Test]::GetInfo)
while (-not $task.AsyncWaitHandle.WaitOne(200)) { }
$task.GetAwaiter().GetResult()

[Test]::GetInfo()
ThreadId RunspaceId
-------- ----------
      15          1
      15          1

So the answer is yes, it probably accidently makes a lot of complicated .NET interop work where it would otherwise throw. Any API where you pass a delegate and that delegate gets called on another thread.

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Jun 5, 2020

Personally I think these behaviors would be the most ideal for method invocations.

If called on:

  1. The origin runspace thread - run with SessionState affinity like normal
  2. A thread with a default runspace but one where the class or instance was not defined - run in the current thread's runspace without SessionState affinity. Maybe with TopLevelSessionState affinity
  3. A thread with no default runspace - throw ScriptBlockDelegateInvokedFromWrongThread like ScriptBlock.Invoke currently does

Though 2 and 3 would be breaking changes, I think they're worth it considering how dangerous they can be.

cc @rjmholt @PaulHigin @daxian-dbw if any of y'all want to weigh in.

@vexx32
Copy link
Collaborator

vexx32 commented Jun 5, 2020

With #3, is it more desirable to throw there, or take the time to spin up a runspace to execute in anyway, so as to "make" it work without corrupting the state of another runspace?

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Jun 5, 2020

With #3, is it more desirable to throw there, or take the time to spin up a runspace to execute in anyway, so as to "make" it work without corrupting the state of another runspace?

I've thought about that a lot but there's a lot of questions

  1. What kind of state should it have? InitialSessionState.CreateDefault()?
  2. Is it better to do that transparently? There's a potential for a significant amount of extra overhead in a way that isn't super visible to the user.
  3. If you throw instead, you know exactly what happened. If you create a new runspace, it's next to impossible for even the above average user to know their delegate was invoked in a different thread.
  4. When would the runspace be disposed? Directly after invocation?

As much as the error has been frustrating for me in the past, it's also been pretty helpful in informing me that I basically just shouldn't use that API. It's a hard call. I'd like the option, but I'm also not super sure how you'd go about surfacing that option. Or if it'd be worth the development time.

@SeeminglyScience
Copy link
Collaborator

Another consequence of this is that if you call a second method from the first one, you get a dead lock.

class Test {
    static [void] FirstMethod() {
        [Console]::WriteLine('First method starting')
        [Test]::SecondMethod()
        [Console]::WriteLine('First method finishing')
    }

    static [void] SecondMethod() {
        [Console]::WriteLine('Second method starting')
        [Console]::WriteLine('Second method finishing')
    }
}

[System.Threading.Tasks.Task]::Run([action][Test]::FirstMethod).GetAwaiter().GetResult()

The first method isn't actually running on the pipeline thread, so the second method call also tries to use PowerShell eventing to retake the pipeline thread. The first method will never complete, because it's waiting on the second method, which is waiting on the first method.

I accidently ran into this trying to use the PSDataCollection<>.DataAdding event with PowerShell class methods as event handlers.

@chrisbues
Copy link

Well, just ran into this today with trying to reference a class instance created in the parent script.

@PaulHigin
Copy link
Contributor

PowerShell classes are not thread safe. This is true whether they are run in ForEach-Object -Parallel or any of the PowerShell threading APIs. It would be worth looking into whether this can be detected through a using reference.

@CanineBarbie
Copy link

@SeeminglyScience Thanks for taking the time to look at #17957 .

As someone who isn't a developer, but uses PS a lot, classes have been great. Their introduction helped me structure my work a lot better.

I have two questions. It does not seem that the documentation pages for "about_Classes" or "about_ThreadJob" mention anywhere that PS classes are not thread safe. I'm not sure if this does in fact appear anywhere in the PS documentation?

Secondly, is there any chance that this might change in future? Or will it be a case of PowerShell not really being the correct tool for the job, and people who might need to use classes with threading moving away from PS, to complied C# code or A.N.Other solution?

Cheers again.

@daxian-dbw
Copy link
Member

I'd prefer if runspace affinity was just removed from classes but I understand it's next to impossible to know the impact of a change like that.

Has anyone ever looked into this?
Isn't the current behavior never useful, or are there legitimate scenarios where existing code relies on it?

More precisely, it's affinity to session state. A runspace could have multiple session states.
The current behavior is useful for classes or class instances exposed from modules. The implementation of static/instance methods of a powershell class may depend on states within the module scope, such as functions and variables. The affinity to session state makes sure those states is accessible when calling the static/instance methods of the powershell class.

I think a cleaner and safer solution is to introduce an attribute to explicitly opt out of session state affinity for a powershell class. And for such a class definition, its instances object and static method calls simply ignores the session state affinity, which means they will run in the current default runspace, with the current session state in that runspace.

[OptOutSessionStateAffinity()]
class Foo
{
    static [object] Echo($val) { return $val } }
}

Of course, once declaring the attribute to the class, the class implementation should not depend on any states of a particular session state, because there is no guarantee the dependencies would be available at runtime.

I will work on this idea a bit more to see if there are any potential blockers.


At the meanwhile, there is a way today to create a powershell class instance without any session state affinity (no such workaround for static members though). See #3651 (comment) for the details.

Note that, we officially support this "workaround": #3871

@ghost ghost added the In-PR Indicates that a PR is out for the issue label Sep 20, 2022
@ghost ghost added Resolution-Fixed The issue is fixed. and removed In-PR Indicates that a PR is out for the issue labels Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Fixed The issue is fixed. WG-Cmdlets-Core cmdlets in the Microsoft.PowerShell.Core module
Projects
None yet
8 participants