Tags: google/xarray-beam
[xarray-beam] Fix docs & bug in make_template. Reusing dask.array names for arrays with different shapes was the indirect cause of the error on our docs pages. I've also updated the build dependencies for our docs to fix other build issues, and added a Python 3.11 run of our unit tests. Fixes #76. PiperOrigin-RevId: 536022594
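Dask keys its task graph by array name, so two arrays that share a name are treated as the same data even if their shapes differ. The hypothetical helper below (`template_array_name` is not part of xarray-beam or dask) is a minimal sketch, assuming one wants deterministic placeholder names: it derives the name from the shape and dtype so arrays with different shapes can never collide.

```python
import hashlib
from typing import Tuple


def template_array_name(prefix: str, shape: Tuple[int, ...], dtype: str) -> str:
    """Return a deterministic, dask-style name that depends on shape and dtype.

    Hypothetical helper: two arrays get the same name only if they also have
    the same shape and dtype, so graph keys cannot clash across shapes.
    """
    # Hash a canonical representation of the array's defining properties.
    payload = repr((tuple(shape), str(dtype))).encode()
    token = hashlib.sha256(payload).hexdigest()[:32]
    return f"{prefix}-{token}"
```

Identical inputs still produce identical names, which preserves the deduplication that deterministic naming is meant to provide.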
Use multi-stage rechunking. As described in pangeo-data/rechunker#89, this can yield significant performance benefits for rechunking large arrays. PiperOrigin-RevId: 518325665
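A simplified sketch of the multi-stage idea, assuming intermediate chunk sizes are spaced geometrically between the source and target chunks along each dimension. `multistage_chunk_plan` is a hypothetical name, and this is not the rechunker or xarray-beam implementation; a real planner would also need to respect memory limits, which this omits.

```python
from typing import List, Sequence, Tuple


def multistage_chunk_plan(
    source: Sequence[int], target: Sequence[int], num_stages: int
) -> List[Tuple[int, ...]]:
    """Return num_stages + 1 chunk shapes from `source` to `target`.

    Each consecutive pair of shapes defines one rechunking stage. Intermediate
    sizes are interpolated geometrically per dimension and rounded to ints.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of dims")
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    plan = [tuple(source)]
    for i in range(1, num_stages):
        frac = i / num_stages
        # Geometric interpolation: s * (t / s) ** frac, never smaller than 1.
        stage = tuple(
            max(1, round(s * (t / s) ** frac)) for s, t in zip(source, target)
        )
        plan.append(stage)
    plan.append(tuple(target))
    return plan
```

For example, going from chunks of `(1, 1000)` to `(1000, 1)` in three stages gives `(1, 1000) → (10, 100) → (100, 10) → (1000, 1)`. In a single step, each output chunk gathers pieces from 1000 input chunks; with these stages, each output chunk gathers from only 10, which keeps the fan-in of every shuffle small.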
Support datasets with differently chunked variables in DatasetToChunks. There are two major internal changes: 1. Key objects from DatasetToChunks can now include different dimensions for different variables when using split_vars=True. This makes it easier to handle large datasets with many variables and different chunking per variable. 2. Inputs inside the DatasetToChunks pipeline can now be sharded across many tasks. This is important for scalability to large datasets, especially with this change, because the refactor above multiplies the number of inputs by the number of variables when split_vars=True. Otherwise, the machine launching the pipeline can run into performance issues (e.g., slow speed, running out of memory) when the number of inputs reaches the millions. See the new integration test for a concrete use case resembling real model output. Also revise the warning message in the README to be a bit friendlier. Fixes #43. PiperOrigin-RevId: 471948735
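The two changes can be illustrated with a minimal sketch. Plain `(offsets, variable_name)` tuples stand in for xarray-beam keys, and `split_var_keys` and `shard_keys` are hypothetical helpers, not the DatasetToChunks internals. Round-robin sharding is an assumption chosen for simplicity.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Mapping, Tuple

# Hypothetical stand-in for a key: (chunk offsets by dimension, variable name).
ChunkKey = Tuple[Dict[str, int], str]


def split_var_keys(
    var_sizes: Mapping[str, Mapping[str, int]],
    var_chunks: Mapping[str, Mapping[str, int]],
) -> Iterator[ChunkKey]:
    """Yield one key per chunk of each variable, using only that variable's dims."""
    for name, sizes in var_sizes.items():
        dims = list(sizes)
        # Chunk start offsets along each of this variable's own dimensions,
        # using this variable's own chunk sizes.
        starts = [range(0, sizes[d], var_chunks[name][d]) for d in dims]
        for combo in itertools.product(*starts):
            yield dict(zip(dims, combo)), name


def shard_keys(
    keys: Iterable[ChunkKey], num_shards: int, shard_index: int
) -> List[ChunkKey]:
    """Return every num_shards-th key, starting at shard_index (round-robin)."""
    return list(itertools.islice(keys, shard_index, None, num_shards))
```

Here a 2D `surface` variable chunked along `lat` and a 3D `levels` variable chunked along `level` each get keys over their own dimensions only. In a Beam pipeline, each worker could generate the keys for its own shard, so the launching machine only creates `num_shards` small inputs rather than enumerating millions of keys up front.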