Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient Response Stream hangs #74568

Closed
iSazonov opened this issue Aug 25, 2022 · 46 comments
Closed

HttpClient Response Stream hangs #74568

iSazonov opened this issue Aug 25, 2022 · 46 comments
Assignees
Milestone

Comments

@iSazonov
Copy link
Contributor

There have already been two similar requests #36822 and #11594.
In #36822 there is even an excellent repro #36822 (comment).

I discovered the issue in PowerShell:

  1. Start download large file
Invoke-WebRequest https://github.com/PowerShell/PowerShell/releases/download/v7.3.0-preview.7/PowerShell-7.3.0-preview.7-win-x64.zip -OutFile C:\temp\PowerShell-7.3.0-preview.7-win-x64.zip
  1. Emulate network disaster by means of two options:
    2.1 Close Wi-Fi connection on notebook
    2.2 Disconnect Wi-Fi router from Internet
  2. PowerShell cmdlet hangs on CopyToAsync()
  • for option 2.1 until Wi-Fi recovered (if network was locked over ~1 min download is interrupted)
  • for option 2.2 infinitly even after the Internet connection is restored

It is not a problem for interactive scenario since user can press Ctrl-C but it is an unpleasant problem for script and (PowerShell SDK custom) host scenarios.
Since it does not leave a trace in the logs and is eliminated by rebooting, the technical support concludes that it is a hardware problem, which is a false path.
PowerShell user also does not get a message that the file is not fully downloaded.

Note that it is the web client that we are considering. Why is it important to have a timeout?

The infinite timeout of sockets is a fundamental constant and this will never be changed. This is useful for low-level connections which can re-establish the connection without any extra effort.
But how do web services behave? In fact they close the connection if the client doesn't send requests for some time so as not to waste resources. This timeout is quite short.
But if any web server closes a session by a short timeout, then it doesn't make sense for any web client to have an infinite timeout on socket level - reconnection on the socket level doesn't work. On the contrary, the client must receive a timeout signal to either end the session on his side or try to resume it at a higher level.
This makes us think that any web client should have a default timeout greater than the typical timeout of any web server. If there is any doubt that there should be a default timeout, then at least the developer should be able to set it.


In which APIs is it better to add this timeout? I don't have the exact answer.

Obviously the timeout must be in SocketsHttpHandler.

But this is not enough since this class is sealed. Thus PowerShell historically uses HttpClientHandler which is a SocketsHttpHandler wrapper. We could migrate PowerShell to SocketsHttpHandler but HttpClientHandler uses some internal helper classes to map to SocketsHttpHandler. So perhaps it makes sense to add ReadWriteSocketTimeout to HttpClientHandler too (or make the helper classes public).

Note that the WebRequest API supports this timeout (although it doesn't seem to be public either).

// Set up a ConnectCallback so that we can control Socket-specific settings, like ReadWriteTimeout => socket.Send/ReceiveTimeout.
handler.ConnectCallback = async (context, cancellationToken) =>
{
var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
try
{
socket.NoDelay = true;
if (parameters.Async)
{
await socket.ConnectAsync(context.DnsEndPoint, cancellationToken).ConfigureAwait(false);
}
else
{
using (cancellationToken.UnsafeRegister(s => ((Socket)s!).Dispose(), socket))
{
socket.Connect(context.DnsEndPoint);
}
// Throw in case cancellation caused the socket to be disposed after the Connect completed
cancellationToken.ThrowIfCancellationRequested();
}
if (parameters.ReadWriteTimeout > 0) // default is 5 minutes, so this is generally going to be true
{
socket.SendTimeout = socket.ReceiveTimeout = parameters.ReadWriteTimeout;
}
}
catch
{
socket.Dispose();
throw;
}
return new NetworkStream(socket, ownsSocket: true);
};

This uses a callback, which is not convenient. In addition, it duplicates the standard code from the SocketsHttpHandler, which can be changed in the future but will not be automatically inherited by the application.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 25, 2022
@ghost
Copy link

ghost commented Aug 25, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

There have already been two similar requests #36822 and #11594.
In #36822 there is even an excellent repro #36822 (comment).

I discovered the issue in PowerShell:

  1. Start download large file
Invoke-WebRequest https://github.com/PowerShell/PowerShell/releases/download/v7.3.0-preview.7/PowerShell-7.3.0-preview.7-win-x64.zip -OutFile C:\temp\PowerShell-7.3.0-preview.7-win-x64.zip
  1. Emulate network disaster by means of two options:
    2.1 Close Wi-Fi connection on notebook
    2.2 Disconnect Wi-Fi router from Internet
  2. PowerShell cmdlet hangs on CopyToAsync()
  • for option 2.1 until Wi-Fi recovered (if network was locked over ~1 min download is interrupted)
  • for option 2.2 infinitly even after the Internet connection is restored

It is not a problem for interactive scenario since user can press Ctrl-C but it is an unpleasant problem for script and (PowerShell SDK custom) host scenarios.
Since it does not leave a trace in the logs and is eliminated by rebooting, the technical support concludes that it is a hardware problem, which is a false path.
PowerShell user also does not get a message that the file is not fully downloaded.

Note that it is the web client that we are considering. Why is it important to have a timeout?

The infinite timeout of sockets is a fundamental constant and this will never be changed. This is useful for low-level connections which can re-establish the connection without any extra effort.
But how do web services behave? In fact they close the connection if the client doesn't send requests for some time so as not to waste resources. This timeout is quite short.
But if any web server closes a session by a short timeout, then it doesn't make sense for any web client to have an infinite timeout on socket level - reconnection on the socket level doesn't work. On the contrary, the client must receive a timeout signal to either end the session on his side or try to resume it at a higher level.
This makes us think that any web client should have a default timeout greater than the typical timeout of any web server. If there is any doubt that there should be a default timeout, then at least the developer should be able to set it.


In which APIs is it better to add this timeout? I don't have the exact answer.

Obviously the timeout must be in SocketsHttpHandler.

But this is not enough since this class is sealed. Thus PowerShell historically uses HttpClientHandler which is a SocketsHttpHandler wrapper. We could migrate PowerShell to SocketsHttpHandler but HttpClientHandler uses some internal helper classes to map to SocketsHttpHandler. So perhaps it makes sense to add ReadWriteSocketTimeout to HttpClientHandler too (or make the helper classes public).

Note that the WebRequest API supports this timeout (although it doesn't seem to be public either).

// Set up a ConnectCallback so that we can control Socket-specific settings, like ReadWriteTimeout => socket.Send/ReceiveTimeout.
handler.ConnectCallback = async (context, cancellationToken) =>
{
var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
try
{
socket.NoDelay = true;
if (parameters.Async)
{
await socket.ConnectAsync(context.DnsEndPoint, cancellationToken).ConfigureAwait(false);
}
else
{
using (cancellationToken.UnsafeRegister(s => ((Socket)s!).Dispose(), socket))
{
socket.Connect(context.DnsEndPoint);
}
// Throw in case cancellation caused the socket to be disposed after the Connect completed
cancellationToken.ThrowIfCancellationRequested();
}
if (parameters.ReadWriteTimeout > 0) // default is 5 minutes, so this is generally going to be true
{
socket.SendTimeout = socket.ReceiveTimeout = parameters.ReadWriteTimeout;
}
}
catch
{
socket.Dispose();
throw;
}
return new NetworkStream(socket, ownsSocket: true);
};

This uses a callback, which is not convenient. In addition, it duplicates the standard code from the SocketsHttpHandler, which can be changed in the future but will not be automatically inherited by the application.

Author: iSazonov
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@karelz
Copy link
Member

karelz commented Aug 25, 2022

Triage: There may be opportunity to provide more user-friendly timeouts for ResponseStream read/write. Currently it has to go via ConnectCallback or write your own streams, which is rather ugly.
We should figure out what's the best option.

Note: There is difference between overall timeout and idle timeout on the read/write.

@iSazonov
Copy link
Contributor Author

SocketsHttpHandler return NetworkStream instance

stream = new NetworkStream(socket, ownsSocket: true);

NetworkStream docs says about ReadTimeout(WriteTimeout):

This property affects only synchronous reads performed by calling the Read method. This property does not affect asynchronous reads performed by calling the BeginRead method.

I think it is true for ReadAsync/WriteAsync too.

HttpClient doesn't use NetworkStream directly but by means of some stream wrappers. So if NetworkStream doesn't throw on socket timeout these wrappers don't too.
Then NetworkStream is a wrapper for Socket class.

So I assume Socket would be the right place to throw on socket timeout.

Since it is all about async and streams I would welcome comments from @stephentoub and @adamsitnik experts.

@stephentoub
Copy link
Member

stephentoub commented Aug 26, 2022

I think it is true for ReadAsync/WriteAsync too.

No, it's not.

@iSazonov
Copy link
Contributor Author

I think it is true for ReadAsync/WriteAsync too.

No, it's not.

Oh, then I don't understand why NetworkStream/Socket is not throw on timeout. Is this expected or bug?

@stephentoub
Copy link
Member

stephentoub commented Aug 26, 2022

then I don't understand why NetworkStream/Socket is not throw on timeout. Is this expected or bug?

I don't understand the question. ReadTimeout/WriteTimeout does not apply to asynchronous operations so the async operations don't timeout in response to them. Read/WriteAsync are cancelable via the token passed in.

The only time it would apply to the async operations on a Stream is if the Stream doesn't actually implement the async methods, and NetworkStream does.

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 26, 2022

I don't understand the question.

If I understand correctly socket pinvoke read operation return a timeout error code in the Issue scenario
I guess for Windows it is WSAETIMEDOUT
So if it is not Success I'd expect an exception but we infinitely wait the task completes.

@stephentoub
Copy link
Member

For sync operations, not async operations. You said the timeout applies to ReadAsync/WriteAsync as well, and I said it doesnt. It doesn't apply at the .NET level nor on Windows level with overlapped I/O.

@stephentoub
Copy link
Member

Note that PowerShell can implement timeouts via cancellation per read/write operation just as well as SocketsHttpHandler could internally.

@iSazonov
Copy link
Contributor Author

For sync operations, not async operations. You said the timeout applies to ReadAsync/WriteAsync as well, and I said it doesnt. It doesn't apply at the .NET level nor on Windows level with overlapped I/O.

In PowerShell I migrate to SocketHttpHandler and implement callback as in HttpWebRequest.cs metioned above. Only I set ReceiveTimeout and SendTimeout for socket in async branch. I see in debbuger the socket is closed if network fails (I hope it is not debugger side effect) but CopyToAsync() is still blocked on copyTask.Wait().
The PowerShell code (without my additions for sockets)
https://github.com/PowerShell/PowerShell/blob/0c63a549528ceb8eef68cef3a441a63401a7f291/src/Microsoft.PowerShell.Commands.Utility/commands/utility/WebCmdlet/StreamHelper.cs#L286-L309

Note that PowerShell can implement timeouts via cancellation per read/write operation just as well as SocketsHttpHandler could internally.

I see no other way than to check for changes in the length of the target stream in a separate task. But it is a workaround, not a solution to this problem, which looks dangerous.

@stephentoub
Copy link
Member

stephentoub commented Aug 26, 2022

but CopyToAsync() is still blocked on copyTask.Wait().

And in the debugger you see that the cancellation token passed to both CopyToAsync and Wait has had cancellation requested? What's the type of the Stream? What version of. NET?

@iSazonov
Copy link
Contributor Author

The cancellation token works well if PowerShell user press Ctrl-C.
In experiment I do the cancellation token is not involved.

It is .Net 7.0 Preview.3
It is ContentLegthReadStream -> SslStream -> NetworkStream

Repro steps (there was an error in my previous post, sorry):

  1. Start PowerShell cmdlet
  2. PowerShell show progress bar is updated.
  3. Disconnect network
  4. Progress bar looks like freeze since stream.Position is not changed.
  5. Debugger show that the while cycle is working infinitely because the copyTask is never moved to completion status.
    If I expand input stream properties I see socket in Closed status.

Since the socket is in Closed status I'd expect the task should complete and since it is unexpected socket close I'd expect an exception.

@stephentoub
Copy link
Member

In experiment I do the cancellation token is not involved.

It should be. If PowerShell wants to time out the operation, it has the ability to do so via the token.

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 26, 2022

I don't understand what timeout you mean. The scenario is about network disaster. If network is disconnected read operation should return an I/O error on the socket unexpected closing and the task should throw an exception and then Wait() return AggregateException.

@stephentoub
Copy link
Member

You were saying you wanted to be able to set a read/write timeout. I'm telling you how.

If your concern is the socket connection failing at the OS level or below, it's up to the OS to tell us that. It will eventually, but it can take many seconds or even minutes depending on the OS and its configuration. There's nothing short of cancellation/timeout to be done about that at the .NET level.

@iSazonov
Copy link
Contributor Author

I may not have been accurate, sorry, but it's not about read/write timeout operations, it's about timeout on the socket level itself and this timeout works - OS close the socket after network disaster and .Net see this. A problem is that NetworkStream continue think that the socket lives and copy task is never completed.

@stephentoub
Copy link
Member

A problem is that NetworkStream continue think that the socket lives and copy task is never completed

That is an OS issue. If the OS isn't reporting that the connection has aborted, there's nothing NetworkStream/Socket can do about that other than user-initiated timeout/cancellation.

@iSazonov
Copy link
Contributor Author

But the socket is already in Closed status. If a socket goes into this state, it means that the task knows about it and should terminate.

@stephentoub
Copy link
Member

But the socket is already in Closed status

What member specifically are you looking at that says "Closed"?

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 26, 2022

Connected (on srceenshot network is in up state, if network is in down state this has False value after ~1min)
image

image

@iSazonov
Copy link
Contributor Author

I think that if the OS has closed the socket, no doubt ReadAsync task should end - it has nothing to do over a closed socket.
Since OS can not know the reason for closing the socket, an exception cannot be generated unless the connection is closed by timeout.
I think in any case it is ContentLegthReadStream to know if full content is read and generate an exception if not and I guess it does this.

@iSazonov
Copy link
Contributor Author

Maybe problem in the method https://source.dot.net/#System.Net.Http/System/Net/Http/SocketsHttpHandler/HttpConnection.cs,1719
Comment says "Throws IOException on EOF. This is only called when we expect more data." but it is not throw if network is disconnected.

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 26, 2022

May issue is in SslStream...

Update: tried HTTP download - the same issue.

@stephentoub
Copy link
Member

stephentoub commented Aug 27, 2022

Connected (on srceenshot network is in up state, if network is in down state this has False value after ~1min)

I expect this is a case of Schrodinger's cat. My guess is you're only seeing that as Connected=false because you queried the properties of the Socket in the debugger. Also, note that this Connected property doesn't actually mean the connection has been dropped. It's set to false if many properties that would be evaluated by the debugger throw an exception.

I think that if the OS has closed the socket, no doubt ReadAsync task should end

It does. If you believe otherwise, please share an isolated repro, without PowerShell involved, that demonstrates this not happening once the OS deems the connection has been lost.

but it is not throw if network is disconnected.

Yes, it does. If the socket is closed, the OS will alert the Socket to that fact and the Read{Async} will end.

@iSazonov
Copy link
Contributor Author

If remove using CTS in the repro it outputs infinitly:

Code
  • static async Task Main(string[] args) { var httpClient = new HttpClient();
            var request = new HttpRequestMessage(HttpMethod.Get, "https://speed.hetzner.de/10GB.bin");
            var response = await httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
            Console.WriteLine($"Content-Length: {response.Content.Headers.ContentLength.Value}");
    
    
            var responseStream = await response.Content.ReadAsStreamAsync();
            var memoryStream = new MemoryStream();
    
            var t = responseStream.GetType();
            var _connection = t.GetField("_connection", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)?.GetValue(responseStream)!;
            var t_connection = _connection.GetType();
            var _stream = (System.Net.Security.SslStream)t_connection.GetField("_stream", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)?.GetValue(_connection)!;
            var t_stream = _stream.GetType();
            var innerStream = t_stream.GetProperty("InnerStream", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)?.GetValue(_stream)!;
            var t_innerStream = innerStream.GetType();
            var innerStream_connection = t_innerStream.GetField("_connection", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)?.GetValue(innerStream)!;
            var t_innerStream_connection = innerStream_connection.GetType();
            var network_stream = (System.Net.Sockets.NetworkStream)t_innerStream_connection.GetField("_stream", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic)?.GetValue(innerStream_connection)!;
            var s = network_stream.Socket;
            s.ReceiveTimeout = 10; // This doesn't work
            s.SendTimeout = 10; // This doesn't work
    
            long previousLength = 0;
            var cts = new CancellationTokenSource();
            cts.CancelAfter(2000);
    
            _ = Task.Run(async () =>
            {
                while (true)
                {
                    Console.WriteLine($"[{DateTimeOffset.Now.ToString(CultureInfo.InvariantCulture)}] Memory stream length: {memoryStream.Length} Socket: Blocking = {s.Blocking} Connected = {s.Connected} SafeHandle.IsClosed = {s.SafeHandle.IsClosed} SafeHandle.IsInvalid = {s.SafeHandle.IsInvalid}");
                    if (previousLength != memoryStream.Length)
                    {
                        previousLength = memoryStream.Length;
                        cts.CancelAfter(2000);
                    }
    
                    await Task.Delay(1000);
                }
            });
    
            await responseStream.CopyToAsync(memoryStream, cts.Token);
        }
    
[08/29/2022 15:27:44 +05:00] Memory stream length: 17579708 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:45 +05:00] Memory stream length: 20741820 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:46 +05:00] Memory stream length: 24035004 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:47 +05:00] Memory stream length: 27754172 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:48 +05:00] Memory stream length: 31506108 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:49 +05:00] Memory stream length: 33357500 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:50 +05:00] Memory stream length: 33357500 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False
[08/29/2022 15:27:51 +05:00] Memory stream length: 33357500 Socket: Blocking = True Connected = True SafeHandle.IsClosed = False SafeHandle.IsInvalid = False

So the socket is never closed. The fact that a read operation never terminates if there is no network connectivity does not make sense. So the original request to add a timeout for each read operation is still valid.

There is a related discussion #916 (comment)
It doesn't give me any understanding why it's impossible to use the current CTS in ReadAsync and create a linked CTS with a timer based on it, like it is done elsewhere.
I hope there is a way to pass the Timeout from HttpClient to NetworkStream and Socket.

@stephentoub
Copy link
Member

stephentoub commented Aug 29, 2022

So the original request to add a timeout for each read operation is still valid.

As I've already stated, PowerShell itself can already do this via a CancellatioToken. Are you asking for this again because of convenience or because there's a technical blocker?

why it's impossible to use the current CTS in ReadAsync and create a linked CTS with a timer based on it

It's not impossible. It's expensive (relatively). And it's exactly what PowerShell itself can do if it wants a timeout.

@iSazonov
Copy link
Contributor Author

This is neither a convenience nor a technical interlock. It is something that is misleading and previous discussions show it. If HttpClient has a Timeout property and it doesn't work as users expect, it's really costly for them because first they have to take the time to understand the problem and then still come up with a workaround. Same thing happens with PowerShell web cmdlets - users keep asking why Timeout doesn't work sometimes.

It's not impossible. It's expensive (relatively). And it's exactly what PowerShell itself can do if it wants a timeout.

Impossible or still expensive? This was already implemented for connect and send. Why not for the read? If we look on ContentLengthReadStream -> SslStream -> NetworkStream -> Socket - all the classes have timeout properties except ContentLegthReadStream. The properties are used for sync operations and I hope they could be reused for async too.

And it's exactly what PowerShell itself can do if it wants a timeout.

The CTS is for whole ReadAsync method but the request is to have timeout for every internal io operation. Otherwise it creates a clumsy workaround as shown in the code example above.

@stephentoub
Copy link
Member

stephentoub commented Aug 30, 2022

If HttpClient has a Timeout property and it doesn't work as users expect

This is the first time I've seen you mentioning HttpClient.Timeout. How is it not working as you expect? That does create a CancellationToken that is passed to the underlying stream and socket operations. It applies to the duration of the HttpClient.SendAsync operation, so if you pass HttpCompletionOptions.ResponseContentRead, it'll apply to the entire request and response, including all of the response body. If you pass ResponseHeadersRead, it'll apply up through the reading the response headers; at that point, you have the response Stream, and you can pass whatever you want as a CancellationToken to the individual Stream.CopyToAsync or Stream.ReadAsync methods you make on that stream. I've seen zero evidence of HttpClient.Timeout not behaving in this manner.

This was already implemented for connect and send. Why not for the read?

Connections are typically relatively rare. You create a connection once and then use it over and over and over and over. So the overhead associated with a connect operation isn't nearly as impactful. I assume for "send" you mean the HttpClient.SendAsync; similarly, that's for the entirety of the request/response as discussed above. Per read is much, much finer grained, with overheads adding up significantly more.

all the classes have timeout properties except ContentLegthReadStream

As has already been discussed, the timeouts on SslStream, NetworkStream, and Socket are for synchronous operations only. They have no impact on asynchronous operations, which is what you're asking for. And HttpClient is designed for asynchronous operations; sync support was only added recently, in limited capacity, and for very specific limited scenarios.

The CTS is for whole ReadAsync method but the request is to have timeout for every internal io operation. Otherwise it creates a clumsy workaround as shown in the code example above.

I don't know what distinction you're making between Stream.ReadAsync on the response stream and "every internal io operation". One ReadAsync on the response stream is one IO operation. Which is why I've repeatedly said that you can implement this on top. Here, I'll do it for you:

static class StreamExtensions
{
    public static async ValueTask<int> ReadAsync(this Stream stream, Memory<byte> buffer, int millisecondsTimeout)
    {
        using var cts = new CancellationTokenSource(millisecondsTimeout);
        return await stream.ReadAsync(buffer, cts.Token).ConfigureAwait(false);
    }
}

With that as you're reading the response stream, you can have whatever timeout you want for each individual read. Or if you wanted a CopyToAsync operation that applied the same timeout to each read it issues while being efficient about reusing the CancellationTokenSource:

public static async Task CopyToAsync(this Stream source, Stream destination, int perReadMillisecondsTimeout)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    var cts = new CancellationTokenSource();
        
    while (true)
    {
        if (!cts.TryReset())
        {
            cts.Dispose();
            cts = new CancellationTokenSource();
        }

        cts.CancelAfter(perReadMillisecondsTimeout);
        int bytesRead = await source.ReadAsync(buffer, cts.Token).ConfigureAwait(false);
        if (bytesRead == 0)
        {
            break;
        }

        await destination.WriteAsync(buffer.AsMemory(0, bytesRead)).ConfigureAwait(false);
    }

    cts.Dispose();
    ArrayPool<byte>.Shared.Return(buffer);
}

PowerShell can use code like that above, tweaked to your heart's content. This is as good as SocketsHttpHandler itself could do it (there are possible further tweaks to minimize overheads for this or that case, but fundamentally SocketsHttpHandler doesn't have any magic sauce and is just doing what PowerShell itself can do on top).

@MihaZupan
Copy link
Member

For reference, YARP has logic similar to what Stephen described above: StreamCopier.CopyAsync.
It also includes instrumentation for Telemetry - you could choose to do something similar if your aim is to track the progress as well (even in cases where the destination stream's Length may not be supported).

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 31, 2022

Thanks for workarounds. I can not use them because this is ContentLengthReadStream - internal helper class in HttpConnection. ContentLengthReadStream uses CopyToContentLengthAsync from HttpConnection and it is not trivial.

This is as good as SocketsHttpHandler itself could do it (there are possible further tweaks to minimize overheads for this or that case, but fundamentally SocketsHttpHandler doesn't have any magic sauce and is just doing what PowerShell itself can do on top).
if you wanted a CopyToAsync operation that applied the same timeout to each read it issues while being efficient about reusing the CancellationTokenSource:

Yes, this is I want. And HttpClient really does such magic for some whole operations - it convert Timeout property to linked CTS with the timeout. I hope there is a way in HttpClient to specify CTS to each read in CopyToAsync.

@stephentoub
Copy link
Member

. I can not use them because this is ContentLengthReadStream - internal helper class in HttpConnection. ContentLengthReadStream uses CopyToContentLengthAsync from HttpConnection and it is not trivial.

That's an implementation detail that doesn't affect your ability to use the examples posted. You do not need to copy it nor can or should you. That same Stream's ReadAsync encompasses all of the logic to make it work as well.

@iSazonov
Copy link
Contributor Author

:-) From the point of view of an expert like you, it looks simple. I would have to throw out some of the standard code and think about what I could break. But then what is the purpose of .Net Runtime, and in particular HttpClient, if not to allow a wider range of developers to create simple and reliable code without going into the details of the implementation, which is created by more qualified experts?
From my perspective as a consumer, if HttpClient.Timeout works for all but one operation, it's a bug. It's like if the button on my floor in the lift didn't work.
There are of course many classes and dependencies, but the path from HttpClient.Timeout to ContentLengthReadStream does not look hopeless HttpClient -> SocketsHttpHandler -> HttpConnectionSettings - > HttpConnection -> ContentLengthReadStream

@stephentoub
Copy link
Member

From my perspective as a consumer, if HttpClient.Timeout works for all but one operation, it's a bug

Can you help me understand what about the docs for HttpClient.Timeout suggests it's a per-read/write timeout?

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 1, 2022

Gets or sets the timespan to wait before the request times out.

The same timeout will apply for all requests using this HttpClient instance.

You can certainly argue about what a request is, but in my understanding it is any request to the server, and since it comes from HttpClient, I expect that all its settings apply to all derived actions.
In other words, if the stream was created in isolation as a new AnyWebStream(url), then the expectation is that it works the way AnyWebStream itself is implemented. But if some stream is created inside an HttpClient, the expectation is that Timeout acts on all its operations, including this internal stream too, not just on send or post.

Also the mention in the documentation of DNS inaccessibility looks strange in the sense that it's some third-party service while the main service is inaccessible. Common sense says that the main service is what the HttpClient consumer thinks about and expects Timeout to apply to the unavailability of the main service in any HttpClient operation, including retrieving Content.

@stephentoub
Copy link
Member

Thanks. That didn't really answer my question, though. I'm confused about what exactly we're discussing here...

In some parts of the conversation, it seems like your concern is that the OS is notifying the request that the socket has been closed but the operation isn't respecting that. I believe that's been shown to not be the case; in these cases, as far as what the OS is reporting, the socket is still alive and well and we have no reason to terminate the read.

In some parts of the conversation, it seems like your concern is lack of an ability to specify a timeout per Socket.ReadAsync/WriteAsync that's performed as part of the HTTP request/response (e.g. each read gets 5 seconds to complete regardless of how many previous reads have been issued and how long each of those has taken). There is no built-in mechanism for that, but as has been exemplified numerous times, it's straightforward to layer on top of the CancellationToken at various layers of granularity.

In some parts of the conversation, it seems like your concern is that HttpClient.Timeout only applies to the operations performed on the HttpClient and not operations on a Stream it creates. That is correct. The timeout applies until the moment that the Task the HttpClient gives you back has completed. After that, you control any cancellation/timeout via the CancellationToken you pass to whatever operations you subsequently call.

In some parts of the conversation, it seems like the concern is lack of detailed documentation about HttpClient.Timeout. That's fair, and the docs can be improved.

In some parts of the conversation, it seems like the concern is that PowerShell isn't behaving well, and you're placing blame for that on HttpClient. I believe we've shown that this is on PowerShell to fix by correctly passing in a CancellationToken with whatever timeout semantics are desired.

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 2, 2022

I'm confused about what exactly we're discussing here...

This is a search for an answer to why HttpClient.Timeout does not work in the particular scenario as expected.
I insist that it is the expected behavior of users that this timeout should work. This certainty comes from previous discussions, it has been reinforced during this discussion. Furthermore, yesterday I posted a PR with a workaround in the PowerShell repository, and the first comment from the MSFT PowerShell team member was whether this is a bug in .Net, indicating that even the much experienced developers perceive HttpClient.Timeout just as much as less experienced ones like me.

It's not just user expectations that this timeout should work, but also the implementation itself. These streams like ContentLengthReadStream are internal HttpClient classes that are spawned only in HttpClient and nowhere else. Such streams must have the same timeout. This does not contradict the basic Stream class concept, which initially supports such timeouts (abstract CanTimeout property).

As for the documentation, it simply says that this timeout exists, which supports user expectations that this timeout works for all operations without exceptions. Of course the documentation can be updated that this doesn't work, but it will still raise questions and be perplexing as to why this useful API requires workarounds when it could have been implemented internally.

@stephentoub
Copy link
Member

We're talking past each other. Maybe someone else will be able to assist better.

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 2, 2022

@stephentoub Many thanks for your help! I get better understanding how this work internally - great experience for me.

@karelz
Copy link
Member

karelz commented Sep 6, 2022

Triage:

  • We agree that Stream.ReadTimeout and Stream.WriteTimeout should NOT apply to async operations.
  • We agree HttpClient.Timeout should NOT extend to stream read/write operations. It would break WebSockets, YARP and other customers.
  • We agree that handling these timeouts only via cancellation is less than ideal and perhaps we should have something more user friendly exposed. E.g. something like HttpClient.ReadWriteTimeout or HttpResponseStream.ReadWriteTimeout (though we might not want to do that one) ... which may very well lead back to 1st case above (Stream.ReadTimeout or Stream.WriteTimeout).

@iSazonov is there anything else that you think needs attention / response from triage?

Given that there is way too long discussion here in many angles, we should fork the 3rd case above into separate issue and discuss solution there. Then close this issue. All assuming we didn't miss something else in the discussion.

@karelz karelz self-assigned this Sep 6, 2022
@karelz karelz added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 6, 2022
@ghost
Copy link

ghost commented Sep 6, 2022

This issue has been marked needs-author-action and may be missing some important information.

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 7, 2022

@iSazonov is there anything else that you think needs attention / response from triage?

No, 3rd case cover all. Who will open new issue? I feel we need to discuss timeouts for CopyToAsync() in common since it has a loop with base operations (ReadAsync and WriteAsync) which we can not control with CopyToAsync() interface as needed. Then the solution for HttpClient will be clear too.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Sep 7, 2022
@karelz
Copy link
Member

karelz commented Sep 7, 2022

@iSazonov why do you think we need to discuss Stream.CopyToAsync specifically?
I assume that once ReadAsync and WriteAsync are solved, that one should just built on them and therefore be addressed as well. Am I missing something?

I will file the issue.

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 7, 2022

@karelz We can use CTS with timeout for ReadAsync and WriteAsync in any scenario without problem - the timeout will terminate these methods. But if we use CST with timeout for CopyToAsync we can not control every ReadAsync and WriteAsync in the loop - the timeout will will only apply to the whole cycle.
(Of course we could add an explicit timeout to these methods too.)

@karelz
Copy link
Member

karelz commented Sep 7, 2022

Created #75188. Closing.

@karelz karelz closed this as completed Sep 7, 2022
@karelz karelz added this to the 8.0.0 milestone Sep 7, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 7, 2022
@MihaZupan
Copy link
Member

ith timeout for CopyToAsync we can not control every ReadAsync and WriteAsync in the loop

Note that even if such a helper existed, it isn't what PowerShell should be using anyway if you are also tracking progress.
You are currently relying on destination.Length, but not all Streams support querying the Length (as indicated by CanSeek).

@iSazonov
Copy link
Contributor Author

iSazonov commented Sep 7, 2022

Note that even if such a helper existed, it isn't what PowerShell should be using anyway if you are also tracking progress.

This could help us to avoid the bug with hang we caught and make the code more reliable. (Progress bar is implemented in PowerSHell specific way in another thread.)

You are currently relying on destination.Length, but not all Streams support querying the Length (as indicated by CanSeek).

The destination is always a file in PowerShell in the scenario.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 7, 2022
@karelz karelz removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants