Strategy for dealing with Stack Overflows #696

aearly · 2015-01-11T00:24:19Z

Stack overflow issues have come up frequently with async. Most of the time, they come from using synchronous iteration functions with large inputs, e.g. async.eachSeries(Array(1000000), function () { callback(); }, done). (#66, #75, #110, #117, #133, #296, #413, #510, #573, #588, #590, #604, #622) Some of those issues aren't caused by large arrays exactly, but most of the time they boil down to using a sync iteration function.

A common way to fix the problem is to use an async.nextTick() internally to re-set the call-stack for the next step of iteration. This masks the problem, and slows performance. In node and later IE's, you can use setImmediate or process.nexTick which is very fast, but in other browsers it is implemented as setTimeout(fn, 0), which actually ends up being about setTimeout(fn, 4). It adds unnecessary delay to your processing, if your iteration functions are actually asynchronous. This setImmediate-type guarding was added to waterfall and it appears to cause a pretty big performance hit: #40, #152, #513, #621

I feel like we need to come up with a comprehensive and consistent strategy for dealing with these competing issues.

Option 1: Don't guard against stack overflows, leave it to the user

I propose removing all async.nextTicks and setImmediates from async library functions, to remove the performance hit. This would be a pretty simple change. We can still offer the async.nextTick and async.setImmediate functions to help people fix their sync functions, when they do need to use them in a waterfall or something similar.

We would need to document what we actually mean by synchronous functions , based on the confusion in #629. A RangeError is a pretty good indicator of using a sync function.

This would be a breaking change, a 1.0-type feature. We could also provide a wrapper function that will make sync functions work, to ease migration. stackSafe from #516 is an interesting idea.

Option 2: Guard against stack overflows, but use a better `setImmediate`

There is a pretty comprehensive polyfill for setImmediate that handles a variety of browsers and JS execution environments: YuzuJS/setImmediate. It uses the native setImmediate in node and later IE's, postMessage tricks in modern browsers, and script.onreadystatechange tricks in older IE's. It gives sub-millisecond deferrals for 99%+ of JS execution environments.

If we were to use this for the expressed purpose of preventing stack overflows, we should use it everywhere. There should be tests for every async library function with huge arrays and sync functions. This would be a bit of work.

It would require a build-step (e.g. browserify --standalone) for the standalone distribution of async. Async would now have an npm dependency, The other option is to just copy the setImmediate code in, which is compatible with the MIT license I believe.

Option 3: Have a "dev-mode" that can detect sync iterators

Based on an environment variable or async._devMode flag, we could conditionally add some guards to iteration functions to see if they callback on the same event loop tick. Stack overflows would still happen, but you would get warnings that would make the source of the error easier to find. This would be tricky to implement, but it is a compromise between Options 1 and 2.

Implementing it without causing a performance hit will be difficult, and would probably require a build that can strip dead code. I'm reminded of how envify and Uglify can strip out if (process.env.NODE_ENV !== "production") { ... } blocks when browserifying, but I'm not sure if having both async.js and async.dev.js is desirable.

This is also a breaking, 1.0-type feature.

Option 4: Do nothing

Just leave things as-is. Some functions guard against stack overflows, others don't. We have a mis-match of philosophies. waterfall is slower than is has to be, eachSeries is fast, but will RangeError with a large array and a sync iteration function. We will continue to receive bug reports about stack overflows and poor performance.

So -- thoughts anyone? (@caolan @beaugunderson ) I am in favor of Option 1, but I am also biased because I also have experience with diagnosing the errors caused by sync iterators. 😃

The text was updated successfully, but these errors were encountered:

caolan · 2015-01-12T08:09:02Z

Option 1 for sure. Some guards got added at one point but I thought they'd all been removed by now.

aearly · 2015-01-14T03:29:52Z

Ok, I am working on this. It's not a simple as removing the setImmediates -- some methods relying on it to make things easier.

SethHamilton · 2015-02-17T03:15:19Z

I use nexTick, then setImmediate later on with all my "next" callbacks when using async because I ran into issues a very long time ago. Does Async now have these built in? I'm assuming there is a performance hit if the library does it and I do it as well. If they are leaving, then I have to change nothing, but if they are staying (if they exist) then I should probably remove mine.

As for my suggestion it would making it a switch. I've always hated having to call my callbacks wrapped up, it makes for ugly code. If it were a switch to turn on hidden setImmediate calls that would be slick.

aearly · 2015-02-22T02:07:16Z

I've added an ensureAsync function wrapper to my acomb helper library that will use a setImmediate call with your callback if (and only if) it needs to:

https://github.com/aearly/acomb#ensureAsync

matbee-eth · 2015-04-14T21:27:28Z

Something like this, http://dbaron.org/log/20100309-faster-timeouts, would be a good strategy for browsers.

aearly · 2015-04-18T19:30:20Z

I want to incorporate some postMessage tricks like that, but getting a cross browser/cross platform implementation that is fast in the most places is quite complicated: https://github.com/YuzuJS/setImmediate/blob/master/setImmediate.js

Also, even the fastest implementation (process.nextTick) is still slower than a simple function call, so it's not something we'd want to do by default everywhere.

matbee-eth · 2015-04-20T18:08:58Z

Implementing a postMessage/nextTick function based on a variable MAX_STACK_SIZE option would be nice.

charlierudolph · 2015-07-23T00:55:48Z

I'd suggest not guarding against stack overflows by default.

There could be an opt-in option that can either be turned on per method call or with an overall switch that wraps every async function in ensureAsync.

async.each(arr, iterator[, options][, callback])

if options.safe (or something else) is true, then wrap iterator in ensureAsync.

megawac · 2015-08-17T21:32:59Z

Agreed, it'd be nice to be able to detect the SO and provide a console warn to use a sync version (window.onerror&process.on('uncaughtException') hack :P)

aearly · 2016-03-09T21:18:11Z

With each, auto, and waterfall now without a deferral, I think this issue is safe to close. The only functions that have deferrals are ensureAsync (its core functionality), memoize, forever, and queue (which need it to work properly).

sebamarynissen · 2018-08-16T12:55:44Z

I ran into this issue as well recently. I am trying to parse a certain file format using node.js streams. As a consequence, I'm unable to parse the entire file at once, because a node.js stream will only give me buffers of which I don't know the size beforehand. As a result, I also don't know if sufficient bytes are present to do the parsing of a specific entry in the file. This is why I used the async module which uses callbacks that get called once sufficient bytes become available. If sufficient bytes are already available in the current buffer, then the callback is fired synchronously.

This works fine and provides a neat way of abstracting away whether I have to wait for another buffer chunk to be received or not. However, I if receive very large buffer chunks at once from the stream; it is possible that certain loops that I created using the async module are completely synchronous. Consider a trivial example where I have the entire file to be parsed as 1 buffer chunk, in that case all my async-stuff will be synchronous under the hood because all bytes are available - which is what I want because I would like to use my parser for synchronously parsing a file as well. As a result, I sometimes receive stack overflow errors.

I managed however to solve the issue by implementing my own "whilst" method. The philosophy is that I detect whether a callback is fired synchronously or asynchronously. If the callback is fired synchronously, I use a while loop which doesn't increase the call stack. If the callback is fired asynchronously, I use a recursive function call as normal. Stack overflow errors aren't to be expected here because the stack is cleared between two process ticks. This looks as follows:

const whilst = function(test, it, cb) {
  
  let ctx = this, main;
  let strategy = main = function() {
    return test.call(ctx);
  };
  let quit = function() {return false;};
  let results = [];
  const go = function() {

    // Loop synchronously as much as we can in a while loop instead of 
    // going deeper down the call stack.
    while (strategy()) {
      let sync = true;
      let pre = false;
      it.call(ctx, function(err, ...rest) {

        // This is where the magic happens: in case we're still 
        // synchronous, the "sync" variable will _still_ be set to 
        // true. Don't do anything now, the loop will continue out of 
        // itself by calling the test again. However, if the callback 
        // was fired asynchronously, the "sync" variable will already 
        // be false. In that case we need to re-enter the
        // while(strategy()) loop with the test function being called.
        results = rest;
        if (sync) {
          pre = true;
        }
        else {
          strategy = main;
          go();
        }

      });

      // In case we're async, "pre" will stil be false. This means that 
      // the loop needs to be quit, which can be done by changing 
      // strategies to quit.
      if (!pre) {
        strategy = quit;
      }
      sync = false;

    }

    // If we've reached this point, but the strategy was not set to 
    // "quit", this means that the synchronous test simply returned false. 
    // Call the final callback to exit the loop.
    if (strategy === main) {
      cb(null, ...results);
    }

  };
  go();

};

I can imagine that this concept would be possible to apply to other methods vulnerable to stack overflows as well. As such I think that in theory it is possible to adapt the entire async library to be able to handle synchronous callbacks without stack overflows. Whether this is desirable in a library whose purpose is for working with "true" asynchronous functions is an open question.

aearly · 2018-08-16T18:58:03Z

What you described is very similar to ensureAsync -- a wrapper for functions that always ensures the callback is called on the next tick.

sebamarynissen · 2018-08-16T19:10:05Z

@aearly, perhaps my explanation was not entirely clear, but what I tried to do was the exact opposite of ensureAsync: if all callbacks are called synchronously, then the entire whilst loop executes synchronously. This means that

let i = 0;
let order = [];
whilst(function() {
  return i < 1e6;
}, function(cb) {
  i++;
  cb(null);
}, function(err) {
  order.push(1);
});
order.push(2);

// Right now, order === [1, 2];, because the callbacks are called synchronously.

This is already the default behavior for the whilst function in async, but my approach above avoids stackoverflow errors for synchronous while loops with more iterations than the JS call stack can handle.

aearly mentioned this issue Jan 12, 2015

eachSeries - stack size limitation #700

Closed

aearly mentioned this issue Jan 20, 2015

Removing internal setImmediate's #704

Closed

honzajavorek mentioned this issue Apr 9, 2015

Remove async.waterfall apiaryio/boutique.js#51

Merged

aearly added the question label May 19, 2015

This was referenced May 19, 2015

Use setImmediate to call eachSeries iterator to prevent stack overflow #604

Closed

Risk of stack overflow in eachSeries/forEachSeries #588

Closed

forEachOf from npm #760

Closed

charlierudolph mentioned this issue Jul 23, 2015

I guess I found one bug in async.whilst #860

Closed

megawac mentioned this issue Aug 17, 2015

Performance regression in eachSeries after 1.0.0 #883

Closed

megawac added the performance label Aug 17, 2015

This was referenced Oct 22, 2015

async.queue's task callbacks recursion bug #938

Closed

Arg validation in .waterfall() is now asynchronous #570

Closed

aearly added this to the 2.0 milestone Oct 25, 2015

aearly added the breaking-change label Oct 25, 2015

This was referenced Dec 14, 2015

Updated async version to ^1.5.0 appvation/express-jwt#1

Closed

Updated package.json. async ^1.5.0 auth0/express-jwt#106

Merged

ex1st mentioned this issue Dec 17, 2015

strongly request add interval setting for all Api #982

Closed

This was referenced Mar 7, 2016

Refactor auto to not need a deferral #1049

Merged

Waterfall multiple callback defense #1050

Merged

aearly closed this as completed Mar 9, 2016

ex1st mentioned this issue Jul 22, 2016

Async.each / Async.eachSeries #1250

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strategy for dealing with Stack Overflows #696

Strategy for dealing with Stack Overflows #696

aearly commented Jan 11, 2015

caolan commented Jan 12, 2015

aearly commented Jan 14, 2015

SethHamilton commented Feb 17, 2015

aearly commented Feb 22, 2015

matbee-eth commented Apr 14, 2015

aearly commented Apr 18, 2015

matbee-eth commented Apr 20, 2015

charlierudolph commented Jul 23, 2015

megawac commented Aug 17, 2015

aearly commented Mar 9, 2016

sebamarynissen commented Aug 16, 2018

aearly commented Aug 16, 2018

sebamarynissen commented Aug 16, 2018 •

edited

Loading

Strategy for dealing with Stack Overflows #696

Strategy for dealing with Stack Overflows #696

Comments

aearly commented Jan 11, 2015

Option 1: Don't guard against stack overflows, leave it to the user

Option 2: Guard against stack overflows, but use a better setImmediate

Option 3: Have a "dev-mode" that can detect sync iterators

Option 4: Do nothing

caolan commented Jan 12, 2015

aearly commented Jan 14, 2015

SethHamilton commented Feb 17, 2015

aearly commented Feb 22, 2015

matbee-eth commented Apr 14, 2015

aearly commented Apr 18, 2015

matbee-eth commented Apr 20, 2015

charlierudolph commented Jul 23, 2015

megawac commented Aug 17, 2015

aearly commented Mar 9, 2016

sebamarynissen commented Aug 16, 2018

aearly commented Aug 16, 2018

sebamarynissen commented Aug 16, 2018 • edited Loading

Option 2: Guard against stack overflows, but use a better `setImmediate`

sebamarynissen commented Aug 16, 2018 •

edited

Loading