Autoreload reliability improvements #14500

jaysarva · 2024-08-20T01:58:17Z

Introducing Deduperreload, an autoreload implementation that does not produce duplicate versions of objects upon reload. This work was done as part of a Summer 2024 internship at Databricks. We hope the community can also benefit from this!

The goal of this autoreload implementation is to let any modification to a function/method take effect directly on the live objects, without having to refresh the REPL.
Here’s our pipeline:

1. Check whether we are able to use deduperreload by analyzing the changes to the AST of each module.
2. If we cannot use deduperreload, we fall back to the original autoreload.
3. If we can use deduperreload, we do.

Note that this functionality only takes effect for CPython.

What types of changes are supported by deduperreload?

Modifications to any function/method are supported. If some other change occurs (e.g. modification to a class member variable, or some module-level variable), we fall back to superreload.
Furthermore, if the AST is not modified in a meaningful way (e.g. only a blank line was added), no reload occurs, even though the file itself has been modified.
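
To illustrate the kind of AST check described above, here is a minimal sketch using only the standard `ast` module. It is not the code from this PR: the helper names `ast_changed` and `only_functions_changed` are hypothetical, and the sketch only looks at top-level functions, so a change to a method inside a class would be treated as unsupported here.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def ast_changed(old_src: str, new_src: str) -> bool:
    """Return True if the two sources parse to different ASTs.

    ast.dump() leaves out line and column numbers by default, so edits
    that only add blank lines or comments do not count as a change.
    """
    return ast.dump(ast.parse(old_src)) != ast.dump(ast.parse(new_src))


def only_functions_changed(old_src: str, new_src: str) -> bool:
    """Return True if every top-level non-function statement is unchanged.

    Hypothetical simplification: class bodies are compared as a whole.
    """
    def non_function_statements(src: str) -> list:
        return [
            ast.dump(node)
            for node in ast.parse(src).body
            if not isinstance(node, FUNC_TYPES)
        ]

    return non_function_statements(old_src) == non_function_statements(new_src)


old = "X = 1\n\ndef f():\n    return 1\n"
blank_line_added = "X = 1\n\n\n\ndef f():\n    return 1\n"
body_changed = "X = 1\n\ndef f():\n    return 2\n"
variable_changed = "X = 2\n\ndef f():\n    return 1\n"

no_reload_needed = not ast_changed(old, blank_line_added)
can_patch_in_place = only_functions_changed(old, body_changed)
must_fall_back = not only_functions_changed(old, variable_changed)
```

In this sketch, the blank-line edit needs no reload, the function-body edit is a candidate for in-place patching, and the module-level variable edit falls back to a full reload.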

How does deduperreload work?

Instead of calling reload() to generate a new copy of the entire module (and everything in it), we find the functions/methods that have been modified and update only the attributes of the existing function/method objects. We also support adding new functions/methods and removing existing ones in a similar manner.
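
As a rough illustration of updating an existing function object instead of replacing it, the sketch below copies a few attributes from a new function onto the old one. The helper `patch_function` and the choice of attributes are assumptions for this example, not the PR's actual patching logic. Assigning `__code__` like this only works when both functions have the same number of free variables.

```python
def patch_function(old_func, new_func):
    """Copy the behavior of new_func onto the existing old_func object."""
    old_func.__code__ = new_func.__code__
    old_func.__defaults__ = new_func.__defaults__
    old_func.__kwdefaults__ = new_func.__kwdefaults__
    old_func.__doc__ = new_func.__doc__
    old_func.__dict__.update(new_func.__dict__)


def greet(name="world"):
    return f"hello {name}"


# A reference held somewhere else, e.g. in a callback registry.
alias = greet


def greet_v2(name="there"):
    return f"hi {name}!"


patch_function(greet, greet_v2)

# The object identity is unchanged, but every holder sees the new behavior.
result = alias()
```

Because `alias` and `greet` are still the same object, no stale copy of the old function is left behind.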

Why is deduperreload important?

With the original implementation of autoreload, function objects get duplicated due to directly calling the reload() function. This can cause many problems, especially with decorators, enums, etc. as it is possible for both the old function object and the new reloaded function object to exist at the same time.
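
The duplication problem can be reproduced with the standard library alone. The sketch below writes a throwaway module (the name `colors_demo_mod` is made up for this example), imports it, and calls `importlib.reload()`; the enum member captured before the reload no longer matches the one defined afterwards.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a throwaway module on disk so that it can be imported and reloaded.
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "colors_demo_mod.py").write_text(
    "import enum\n\nclass Color(enum.Enum):\n    RED = 1\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

colors = importlib.import_module("colors_demo_mod")
old_red = colors.Color.RED  # reference captured before the reload

importlib.reload(colors)    # re-executes the module, creating a new Color class
new_red = colors.Color.RED

same_object = old_red is new_red                      # False
compares_equal = old_red == new_red                   # False: Enum equality is identity-based
still_instance = isinstance(old_red, colors.Color)    # False: old_red belongs to the old class
```

Any code that stored `old_red` before the reload now holds a value that fails comparisons and `isinstance` checks against the reloaded class.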

Fixes #14395

…ions/deduperreload/deduperreload_patching.py

smacke · 2024-08-21T22:15:01Z

cc @jasongrout

Very very cool stuff!

Carreau · 2024-08-29T08:31:40Z

Thanks, that is great! Do you want to write a bit about this in the what's new?

https://github.com/ipython/ipython/tree/main/docs/source/whatsnew (and feel free to mention that this was an internship at Databricks, and to link to any relevant information, your profile, and those of the people who helped you).

I don't have much time to review, and this is big, but if you work with someone like @jasongrout and they +1 then I'm happy to get that in.

I might do a release tomorrow; I'm not sure this will make the cut, but otherwise it will go into the release at the end of September.

smacke · 2024-09-04T22:31:17Z

Hi Matthias, I was Jay's intern mentor for this project, for which we were fortunate to have advising from @jasongrout (will make sure to secure his +1 at some point :P ). I can help out with shepherding this PR thru as well as adding the relevant release notes to the IPython docs. Internally we're also discussing a potential blog post that goes into a few more details -- if that ends up happening, I'll be sure to link it as well. Regardless, definitely no rush on the release -- I'll follow up here around the end of September.

Carreau · 2024-09-06T15:27:44Z

No worries. If you tell me Jason was involved and thinks it's ok, I trust you. My guess is that a coordinated blog post/release would be best, so I can make a release outside the usual schedule if that helps you. I'm currently travelling but should have some time again at the end of next week.

phihung · 2024-10-05T13:34:20Z

Hi.
I've been playing with the new reload. It works really well. Thanks a lot for the efforts.
I found a potential issue: sometimes the update_sources function may take up to 10s to read 2K Python files (from pandas, pytorch, transformers, etc.). It happens randomly; maybe my disk was busy doing something else at the time.

One potential fix is an option to ignore anything from the /site-packages/ directory:

if "/site-packages/" in fname:
    self.source_by_modname[new_modname] = ""
    continue
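
A small, self-contained variant of the same check is sketched below. The helper name `in_site_packages` is hypothetical and not part of the PR; the only difference from the substring test above is that it also matches Windows-style paths, where the separator is a backslash.

```python
def in_site_packages(fname: str) -> bool:
    """Return True if any path component of fname is 'site-packages'."""
    # Normalize backslashes so Windows and POSIX paths are handled alike.
    parts = fname.replace("\\", "/").split("/")
    return "site-packages" in parts


posix_hit = in_site_packages("/usr/lib/python3.12/site-packages/pandas/core.py")
windows_hit = in_site_packages("C:\\venv\\Lib\\site-packages\\torch\\nn.py")
project_file = in_site_packages("/home/me/project/mod.py")
```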

smacke · 2024-10-16T22:09:50Z

Hi @phihung, thanks a ton for the report. I'll take a look to see what's going on, and I agree that excluding /site-packages/ from deduperreload as a workaround may be the right move.

smacke · 2024-10-17T16:41:55Z

@jaysarva opened jaysarva#1 to address @phihung's above comment

Carreau · 2024-10-25T08:06:50Z

IPython/extensions/autoreload.py

@@ -233,7 +238,7 @@ def filename_and_mtime(self, module):

        return py_filename, pymtime

-    def check(self, check_all=False, do_reload=True):
+    def check(self, check_all=False, do_reload=True, use_deduper_reload=True):


Ok.

I'd like to see if there is a way to turn off use_deduper_reload at runtime; just in case it breaks something, maybe have the default as an instance variable that can be changed with a parameter to the magic.

I'm ok with True being the default, but I'm scared we'd break someone who has no option to turn it off without changing version.
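
One way to read this suggestion, as a hypothetical sketch only: keep the flag as an instance attribute that `check()` consults, so that it can be flipped on a live object. The class `ReloaderSketch` and its `strategy_log` attribute are invented for illustration and are not the real reloader in IPython/extensions/autoreload.py; how a magic parameter would set the attribute is left out.

```python
class ReloaderSketch:
    """Hypothetical stand-in for the reloader, showing only the toggle."""

    def __init__(self, use_deduper_reload=True):
        # Instance-level default, changeable at runtime.
        self.use_deduper_reload = use_deduper_reload
        self.strategy_log = []  # records which strategy each check() would use

    def check(self, check_all=False, do_reload=True):
        if not do_reload:
            return
        if self.use_deduper_reload:
            self.strategy_log.append("deduperreload")
        else:
            self.strategy_log.append("superreload")


reloader = ReloaderSketch()
reloader.check()                     # uses the default (True)
reloader.use_deduper_reload = False  # turned off at runtime, no version change needed
reloader.check()
```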

Carreau · 2024-10-25T08:07:34Z

IPython/extensions/autoreload.py

@@ -285,6 +297,8 @@ def check(self, check_all=False, do_reload=True):
                            file=sys.stderr,
                        )
                    self.failed[py_filename] = pymtime
+        if use_deduper_reload:
+            self.deduper_reloader.update_sources()


ok. (I put ok in there for myself to know how far I've read).

Carreau · 2024-10-25T08:09:34Z