-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strategy for dealing with Stack Overflows #696
Comments
Option 1 for sure. Some guards got added at one point but I thought they'd all been removed by now. |
Ok, I am working on this. It's not a simple as removing the |
I use nexTick, then setImmediate later on with all my "next" callbacks when using async because I ran into issues a very long time ago. Does Async now have these built in? I'm assuming there is a performance hit if the library does it and I do it as well. If they are leaving, then I have to change nothing, but if they are staying (if they exist) then I should probably remove mine. As for my suggestion it would making it a switch. I've always hated having to call my callbacks wrapped up, it makes for ugly code. If it were a switch to turn on hidden setImmediate calls that would be slick. |
I've added an |
Something like this, http://dbaron.org/log/20100309-faster-timeouts, would be a good strategy for browsers. |
I want to incorporate some Also, even the fastest implementation ( |
Implementing a postMessage/nextTick function based on a variable MAX_STACK_SIZE option would be nice. |
I'd suggest not guarding against stack overflows by default. There could be an opt-in option that can either be turned on per method call or with an overall switch that wraps every async function in ensureAsync.
if options.safe (or something else) is true, then wrap iterator in ensureAsync. |
Agreed, it'd be nice to be able to detect the SO and provide a console warn to use a sync version ( |
With |
I ran into this issue as well recently. I am trying to parse a certain file format using node.js streams. As a consequence, I'm unable to parse the entire file at once, because a node.js stream will only give me buffers of which I don't know the size beforehand. As a result, I also don't know if sufficient bytes are present to do the parsing of a specific entry in the file. This is why I used the async module which uses callbacks that get called once sufficient bytes become available. If sufficient bytes are already available in the current buffer, then the callback is fired synchronously. This works fine and provides a neat way of abstracting away whether I have to wait for another buffer chunk to be received or not. However, I if receive very large buffer chunks at once from the stream; it is possible that certain loops that I created using the async module are completely synchronous. Consider a trivial example where I have the entire file to be parsed as 1 buffer chunk, in that case all my async-stuff will be synchronous under the hood because all bytes are available - which is what I want because I would like to use my parser for synchronously parsing a file as well. As a result, I sometimes receive stack overflow errors. I managed however to solve the issue by implementing my own "whilst" method. The philosophy is that I detect whether a callback is fired synchronously or asynchronously. If the callback is fired synchronously, I use a while loop which doesn't increase the call stack. If the callback is fired asynchronously, I use a recursive function call as normal. Stack overflow errors aren't to be expected here because the stack is cleared between two process ticks. This looks as follows: const whilst = function(test, it, cb) {
let ctx = this, main;
let strategy = main = function() {
return test.call(ctx);
};
let quit = function() {return false;};
let results = [];
const go = function() {
// Loop synchronously as much as we can in a while loop instead of
// going deeper down the call stack.
while (strategy()) {
let sync = true;
let pre = false;
it.call(ctx, function(err, ...rest) {
// This is where the magic happens: in case we're still
// synchronous, the "sync" variable will _still_ be set to
// true. Don't do anything now, the loop will continue out of
// itself by calling the test again. However, if the callback
// was fired asynchronously, the "sync" variable will already
// be false. In that case we need to re-enter the
// while(strategy()) loop with the test function being called.
results = rest;
if (sync) {
pre = true;
}
else {
strategy = main;
go();
}
});
// In case we're async, "pre" will stil be false. This means that
// the loop needs to be quit, which can be done by changing
// strategies to quit.
if (!pre) {
strategy = quit;
}
sync = false;
}
// If we've reached this point, but the strategy was not set to
// "quit", this means that the synchronous test simply returned false.
// Call the final callback to exit the loop.
if (strategy === main) {
cb(null, ...results);
}
};
go();
}; I can imagine that this concept would be possible to apply to other methods vulnerable to stack overflows as well. As such I think that in theory it is possible to adapt the entire async library to be able to handle synchronous callbacks without stack overflows. Whether this is desirable in a library whose purpose is for working with "true" asynchronous functions is an open question. |
What you described is very similar to |
@aearly, perhaps my explanation was not entirely clear, but what I tried to do was the exact opposite of let i = 0;
let order = [];
whilst(function() {
return i < 1e6;
}, function(cb) {
i++;
cb(null);
}, function(err) {
order.push(1);
});
order.push(2);
// Right now, order === [1, 2];, because the callbacks are called synchronously. This is already the default behavior for the |
Stack overflow issues have come up frequently with async. Most of the time, they come from using synchronous iteration functions with large inputs, e.g.
async.eachSeries(Array(1000000), function () { callback(); }, done)
. (#66, #75, #110, #117, #133, #296, #413, #510, #573, #588, #590, #604, #622) Some of those issues aren't caused by large arrays exactly, but most of the time they boil down to using a sync iteration function.A common way to fix the problem is to use an
async.nextTick()
internally to re-set the call-stack for the next step of iteration. This masks the problem, and slows performance. In node and later IE's, you can usesetImmediate
orprocess.nexTick
which is very fast, but in other browsers it is implemented assetTimeout(fn, 0)
, which actually ends up being aboutsetTimeout(fn, 4)
. It adds unnecessary delay to your processing, if your iteration functions are actually asynchronous. ThissetImmediate
-type guarding was added towaterfall
and it appears to cause a pretty big performance hit: #40, #152, #513, #621I feel like we need to come up with a comprehensive and consistent strategy for dealing with these competing issues.
Option 1: Don't guard against stack overflows, leave it to the user
I propose removing all
async.nextTick
s andsetImmediate
s from async library functions, to remove the performance hit. This would be a pretty simple change. We can still offer theasync.nextTick
andasync.setImmediate
functions to help people fix their sync functions, when they do need to use them in awaterfall
or something similar.We would need to document what we actually mean by synchronous functions , based on the confusion in #629. A
RangeError
is a pretty good indicator of using a sync function.This would be a breaking change, a 1.0-type feature. We could also provide a wrapper function that will make sync functions work, to ease migration.
stackSafe
from #516 is an interesting idea.Option 2: Guard against stack overflows, but use a better
setImmediate
There is a pretty comprehensive polyfill for setImmediate that handles a variety of browsers and JS execution environments: YuzuJS/setImmediate. It uses the native
setImmediate
in node and later IE's,postMessage
tricks in modern browsers, andscript.onreadystatechange
tricks in older IE's. It gives sub-millisecond deferrals for 99%+ of JS execution environments.If we were to use this for the expressed purpose of preventing stack overflows, we should use it everywhere. There should be tests for every async library function with huge arrays and sync functions. This would be a bit of work.
It would require a build-step (e.g.
browserify --standalone
) for the standalone distribution of async. Async would now have an npm dependency, The other option is to just copy the setImmediate code in, which is compatible with the MIT license I believe.Option 3: Have a "dev-mode" that can detect sync iterators
Based on an environment variable or
async._devMode
flag, we could conditionally add some guards to iteration functions to see if they callback on the same event loop tick. Stack overflows would still happen, but you would get warnings that would make the source of the error easier to find. This would be tricky to implement, but it is a compromise between Options 1 and 2.Implementing it without causing a performance hit will be difficult, and would probably require a build that can strip dead code. I'm reminded of how
envify
and Uglify can strip outif (process.env.NODE_ENV !== "production") { ... }
blocks when browserifying, but I'm not sure if having bothasync.js
andasync.dev.js
is desirable.This is also a breaking, 1.0-type feature.
Option 4: Do nothing
Just leave things as-is. Some functions guard against stack overflows, others don't. We have a mis-match of philosophies.
waterfall
is slower than is has to be,eachSeries
is fast, but willRangeError
with a large array and a sync iteration function. We will continue to receive bug reports about stack overflows and poor performance.So -- thoughts anyone? (@caolan @beaugunderson ) I am in favor of Option 1, but I am also biased because I also have experience with diagnosing the errors caused by sync iterators. 😃
The text was updated successfully, but these errors were encountered: