Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoreload reliability improvements #14500

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

jaysarva
Copy link

@jaysarva jaysarva commented Aug 20, 2024

Introducing Deduperreload, an autoreload implementation that does not produce duplicate versions of objects upon reload. This work was done as part of a Summer 2024 internship at Databricks. We hope the community can also benefit from this!

The goal of this autoreload implementation is to allow for any modification to a function/method and have those changes be implemented directly without having to refresh the repl.
Here’s our pipeline:

  1. Check whether we are able to use deduperreload by analyzing the changes to the AST of each module
  2. If we cannot use deduperreload, we should default back to the original autoreload.
  3. If we can use deduperreload, we do.

Note that this functionality only takes effect for CPython.

What types of changes are supported by deduperreload?

Modifications to any function/method are supported. If some other change occurs (e.g. modification to a class member variable, or some module-level variable), we fall back to superreload.
Furthermore, if the AST is not modified (in a meaningful way), reloading will not occur, even if the file has been modified (e.g. adding a new line, etc.)

How does deduperreload work?

Instead of calling reload() to generate a new copy of the entire module (and everything in it), we intelligently find functions/methods that have been modified and just update the attributes of these existing function/method object. We also support adding/removing new functions/methods in a similar manner.

Why is dedupereload important?

With the original implementation of autoreload, function objects get duplicated due to directly calling the reload() function. This can cause many problems, especially with decorators, enums, etc. as it is possible for both the old function object and the new reloaded function object to exist at the same time.

Fixes #14395

@jaysarva jaysarva marked this pull request as ready for review August 21, 2024 21:10
@smacke
Copy link
Contributor

smacke commented Aug 21, 2024

cc @jasongrout

Very very cool stuff!

@Carreau
Copy link
Member

Carreau commented Aug 29, 2024

Thanks, that is great ! Do you want to wrote a bit about this in the what's new ?

https://github.com/ipython/ipython/tree/main/docs/source/whatsnew (and feel free to mention this is internship at databrick and link to any relavant inofrmation and your profile and those of people who helped you.).

I don't have much time to review, and this is big, but if you work with someone like @jasongrout and they +1 then I'm happy to get than in.

I might do a release tomorrow, I'm not sure this will make the cut, but otherwise at the release at end of september.

@smacke
Copy link
Contributor

smacke commented Sep 4, 2024

Hi Matthias, I was Jay's intern mentor for this project, for which we were fortunate to have advising from @jasongrout (will make sure to secure his +1 at some point :P ). I can help out with shepherding this PR thru as well as adding the relevant release notes to the IPython docs. Internally we're also discussing a potential blog post that goes into a few more details -- if that ends up happening, I'll be sure to link it as well. Regardless, definitely no rush on the release -- I'll follow up here around the end of September.

@Carreau
Copy link
Member

Carreau commented Sep 6, 2024

No worries. If you tell me Jason's was involved and think it's ok, I trust you. My guess is a coordinated blog post/release would be the best, so I can make a release out of the usual schedule if that helps you. I'm currently travelling but should have some time again at the end of next week.

@phihung
Copy link

phihung commented Oct 5, 2024

Hi.
I've been playing with the new reload. It works really well. Thanks a lot for the efforts.
I found a potential issue: Sometime, update_sources function may take up to 10s to read 2K python files (from pandas, pytorch, transformers, etc). It happens randomly, maybe my disk was busy doing some things else at the time.

One potential fix is an option to ignore anything from /site-packages/ directory

if "/site-packages/" in fname:
    self.source_by_modname[new_modname] = ""
    continue

@smacke
Copy link
Contributor

smacke commented Oct 16, 2024

Hi @phihung, thanks a ton for the report. I'll take a look to see what's going on and I agree that excluding /site-packages/ from deduperreload as a workaround may be the right move

@smacke
Copy link
Contributor

smacke commented Oct 17, 2024

@jaysarva opened jaysarva#1 to address @phihung's above comment

@@ -233,7 +238,7 @@ def filename_and_mtime(self, module):

return py_filename, pymtime

def check(self, check_all=False, do_reload=True):
def check(self, check_all=False, do_reload=True, use_deduper_reload=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

I'd like to see if there is a way to turn off use_duper_reload at runtime; just in case it breaks something maybe have the default as in instance varaible that cn be change with a parameter to the magic.

I'm og with true being the default, but scare we'd break someone without options to turn it off without changing version

@@ -285,6 +297,8 @@ def check(self, check_all=False, do_reload=True):
file=sys.stderr,
)
self.failed[py_filename] = pymtime
if use_deduper_reload:
self.deduper_reloader.update_sources()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. (I put ok in there for myself to know how far I've read).

DefinitionAst = (ast.FunctionDef, ast.AsyncFunctionDef)


def get_module_file_name(module: ModuleType | str) -> str:
Copy link
Member

@Carreau Carreau Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note (not a request to change). Those |, or list[] fails on older version of Python when using runtime types extraction of type (https://docs.python.org/3/library/typing.html#typing.get_type_hints). Som in some case from __future__ import annotations is not sufficient to use the new annotation syntax.

"""

qualified_name: tuple[str, ...]
abstract_syntax_tree: ast.AST
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Comment on lines +84 to +89

def __init__(self) -> None:
self.children: dict[str, AutoreloadTree] = {}
self.defs_to_reload: dict[str, ast.AST] = {}
self.defs_to_delete: set[str] = set()
self.new_nested_classes: dict[str, ast.AST] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think usually you want to the type annotation at class level for proper type checking in subclasses I believe; it used to be that mypy was not able to infer type annotation of attribute defined in init. (or emit a AutoreloadTree.children attribute might not be set/ None does not have method XXX).

I don't think that's a big problem here, but something to keep in mind depending on what

Detects what functions/methods can be reloaded by recursively comparing the old/new AST of module-level classes, module-level classes' methods, recursing through nested classes' methods.
If other changes are made, original autoreload algorithm is called directly.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wrap at 80 chars wide.

Comment on lines +214 to +215
Returns `true` if we can run our targeted autoreload algorithm safely.
Returns `false` if we should instead use IPython's original autoreload implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns `true` if we can run our targeted autoreload algorithm safely.
Returns `false` if we should instead use IPython's original autoreload implementation.
Returns
-------
`True` if we can run our targeted autoreload algorithm safely.
`False` if we should instead use IPython's original autoreload implementation.

We try to follow numpydoc standard when we can.

Returns `true` on success. Returns `false` on failure.
Currently, only returns `true` as we do not block on failure to build this graph.
"""
return self._gather_dependents(new_ast.body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Member

@Carreau Carreau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience I'm slowly reading through this. I will do a release soon, and might not have the time to fully review. I've dropped some comments for myself when I come back to it.

denylist site-packages and dist-packages from deduperreload
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Defining custom torch function causes "maximum recursion depth" in autoreload
4 participants