Releases: klarna-incubator/mleko

v4.3.0

08 Jun 19:29

v4.3.0 (2024-06-08)

✨ Features

  • model: Add check for fitted model in LGBMModel fingerprint. (f6a0933)

🐛 Bug Fixes

  • tuning: Add optional enqueue_trials parameter to fingerprint of OptunaTuner. (80fa374)
  • transformer: Update LabelEncoder to use PyArrow implementation of unique to prevent vaex bug from crashing the transformer. (85059d7)
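
The fingerprint entries in this release concern cache invalidation: a cache keyed on a fingerprint only invalidates correctly if every parameter that affects the result is hashed. The following is a minimal sketch of that idea, not mleko's actual implementation; the `fingerprint` helper and every parameter name except `enqueue_trials` are hypothetical.

```python
import hashlib
import json


def fingerprint(params: dict) -> str:
    """Hash a parameter dict into a stable hex digest (hypothetical helper)."""
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical tuner parameters; only enqueue_trials is named in the changelog.
base = {"n_trials": 20, "direction": "maximize"}
with_trials = {**base, "enqueue_trials": [{"learning_rate": 0.1}]}

fp_base = fingerprint(base)
fp_with_trials = fingerprint(with_trials)
fp_reordered = fingerprint({"direction": "maximize", "n_trials": 20})
```

If `enqueue_trials` were left out of the hashed parameters, `fp_base` and `fp_with_trials` would collide and a stale cached result could be returned.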

v4.2.0

21 May 14:17

v4.2.0 (2024-05-21)

✨ Features

  • transformer: Update ExpressionTransformer to use TypedDict instead of tuples. (3950abd)
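
To illustrate why a TypedDict can be preferable to a tuple for this kind of configuration, here is a small sketch. The `ExpressionSpec` type and its field names are assumptions made for illustration and may not match the fields mleko actually uses.

```python
from typing import TypedDict


class ExpressionSpec(TypedDict):
    """Hypothetical shape for one expression entry; not mleko's actual type."""

    expression: str
    type: str
    is_meta: bool


# Tuple form: the meaning of each value depends on its position.
as_tuple = ("price * quantity", "numerical", False)

# TypedDict form: each field is named and can be checked by a static type checker.
as_typed_dict: ExpressionSpec = {
    "expression": "price * quantity",
    "type": "numerical",
    "is_meta": False,
}
```

At runtime a TypedDict is an ordinary `dict`, so the change affects readability and static checking rather than behavior.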

v4.1.0

18 May 16:28

v4.1.0 (2024-05-18)

✨ Features

  • tuning: Add support for enqueuing trials in OptunaTuner. (9e0b6b2)
  • data splitting: Add support for stratification on multiple features in the RandomSplitter. (d745434)
  • transformer: Add metadata option for the ExpressionTransformer that allows for creation of meta features not tracked in the DataSchema. (f16ea8b)
  • transformer: Add ExpressionTransformer for creating features using the vaex expression system. (c0faf74)
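
Stratifying on multiple features can be thought of as stratifying on a composite key built from those features. The sketch below shows the idea with the standard library only; `stratified_split` and the example columns are hypothetical and do not reflect the RandomSplitter API.

```python
import random
from collections import defaultdict


def stratified_split(rows, features, test_fraction=0.25, seed=0):
    """Split rows into (train, test), stratifying on a tuple of features."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        # The composite key is what makes this multi-feature stratification.
        strata[tuple(row[f] for f in features)].append(row)

    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Four rows for each (country, label) combination, 16 rows in total.
rows = [
    {"id": i, "country": country, "label": label}
    for i, (country, label) in enumerate(
        (country, label)
        for country in ("SE", "DE")
        for label in (0, 1)
        for _ in range(4)
    )
]
train, test = stratified_split(rows, ["country", "label"])
```

Each combination of `country` and `label` contributes the same share of rows to the test set, which a split stratified on a single column would not guarantee.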

v4.0.0

09 May 10:08

v4.0.0 (2024-05-09)

⛔️ BREAKING CHANGES

  • exporter: Add S3Exporter that implements cached S3 exporting of files from the local disk. (d17b2d2)
  • exporter: Add BaseExporter and LocalExporter implementations that support exporting data to disk, along with corresponding Pipeline steps. (6ce13cf)

✨ Features

  • exporter: Add LocalManifest support for LocalExporter which simplifies caching logic and enables S3 manifest translations. (2199ff0)
  • exporter: Add support for multiple data export using LocalExporter. (ff988b6)
  • data source: Add support for reading manifest files from S3 buckets in S3Ingester. (9c68a9b)
  • pipeline: Add disable_cache parameter to Pipeline execution. (da1e31a)

🐛 Bug Fixes

  • data cleaning: Fix newline characters breaking CSV reading using Arrow. (3a7e594)
  • tuning: Delete logging of storage URI to minimize risk of accidentally logging credentials. (054692d)
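
The CSV fix above concerns newline characters inside field values. The changelog does not describe how the fix was made, so the sketch below only illustrates the underlying problem using Python's standard `csv` module rather than Arrow. PyArrow's CSV reader does expose a `newlines_in_values` parse option for this situation, but whether the fix relies on it is not stated here.

```python
import csv
import io

# One record contains a quoted field with an embedded newline.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive line splitting treats the embedded newline as a record boundary.
naive_records = raw.strip().split("\n")[1:]

# A CSV parser keeps the quoted field intact. newline="" leaves newline
# handling to the csv module, as its documentation recommends.
parsed_records = list(csv.DictReader(io.StringIO(raw, newline="")))
```

The naive split yields three fragments for two logical records, while the parser returns two rows with the multi-line comment preserved.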

🛠️ Code Refactoring

  • data source: Extract shared S3 logic to utils which can be then used by S3Exporter. (97a7974)

v3.2.0

18 Apr 20:13

v3.2.0 (2024-04-18)

✨ Features

  • tuning: Add support for RDSStorage using the OptunaTuner. (cc06ddd)

🐛 Bug Fixes

  • data source: Fix bug where a dataset_id containing path components broke local metadata file creation. (17c4866)
  • model: Add verbosity parameter to BaseModel to set log level in the base class. (0a3828f)

v3.1.0

12 Apr 11:14

v3.1.0 (2024-04-12)

✨ Features

  • model: Add optional memoization to datasets during model training. (#209) (2ca4465)
  • model: Add optional memoization to datasets during model training. (6a955dc)

v3.0.0

05 Apr 08:27

v3.0.0 (2024-04-05)

⛔️ BREAKING CHANGES

  • model: Update LGBMModel to use dependency injection, now expects a lightgbm.LGBMModel as argument. (7250f34)
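
The breaking change above moves estimator construction out of the wrapper: the caller builds and configures the estimator, then passes it in. The following sketch shows the general dependency-injection pattern with a stand-in estimator. `FakeEstimator` and `ModelWrapper` are hypothetical and are not mleko or LightGBM classes.

```python
class FakeEstimator:
    """Stand-in for an injected estimator such as a lightgbm.LGBMModel subclass."""

    def fit(self, X, y):
        # Toy "training": remember the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ModelWrapper:
    """Hypothetical wrapper that receives a configured estimator instead of building one."""

    def __init__(self, estimator):
        self._estimator = estimator

    def fit_predict(self, X, y):
        return self._estimator.fit(X, y).predict(X)


wrapper = ModelWrapper(FakeEstimator())
predictions = wrapper.fit_predict([[1], [2], [3]], [1.0, 2.0, 3.0])
```

With this pattern the wrapper no longer needs to mirror every estimator hyperparameter in its own constructor, and any compatible estimator can be swapped in for testing.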

🐛 Bug Fixes

  • Switch vaex file format to Arrow instead of HDF5 for better type support. (ac8e500)
  • data cleaning: Fix bug where boolean columns are stored as numerical in the data schema due to int8 conversion. (da358d8)

v2.2.0

22 Mar 17:06

v2.2.0 (2024-03-22)

✨ Features

  • filter: Add ImblearnResamplingFilter which is a wrapper for imblearn over- and under-samplers. (77a3d7d)
  • filter: Add ExpressionFilter and base class for simple DataFrame filtering using vaex expressions. (dc679ff)
  • cache: Add disable_cache argument to all cached functions to completely bypass all caching functionality. (fbdfc5d)
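
As a rough illustration of what a `disable_cache` argument on a cached function can look like, here is a minimal in-memory sketch. The `cached` decorator is hypothetical and much simpler than mleko's caching; only the `disable_cache` name comes from the changelog.

```python
import functools


def cached(func):
    """Hypothetical in-memory cache decorator with a bypass switch."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, disable_cache=False):
        if disable_cache:
            # Bypass entirely: neither read from nor write to the cache.
            return func(*args)
        if args not in store:
            store[args] = func(*args)
        return store[args]

    return wrapper


calls = []


@cached
def square(x):
    calls.append(x)  # Record every real computation.
    return x * x


first = square(3)                        # Computed and stored.
second = square(3)                       # Served from the cache.
third = square(3, disable_cache=True)    # Recomputed, cache untouched.
```

Here `calls` ends up containing two entries, one for the first call and one for the bypassed call.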

📝 Documentation

  • Update CHANGELOG.md format to include missing categories. (d97b32c)

v2.1.0

24 Feb 15:57

v2.1.0 (2024-02-24)

🐛 Bug Fixes

  • data cleaning: Fix meta_columns not being forcefully cast to correct data type in CSVToVaexConverter. (b42b9ed)

🧪 Tests

  • Fix test cases generating cache directory outside temporary directory. (ba57fbf)

v2.0.0

07 Feb 21:33

v2.0.0 (2024-02-07)

⛔️ BREAKING CHANGES

  • pipeline: Refactor PipelineStep to use TypedDict for both inputs and outputs. (2eb623c)

🐛 Bug Fixes

  • data cleaning: Rename empty column name to _empty to prevent vaex crashes. (da72b75)
  • data cleaning: Cast boolean columns to int8 during cleaning to reduce label encoding needs. (d94f7c9)
  • Add reserved keyword column name replacement to prevent evaluation errors from vaex. (3969ffd)
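
The column-name fixes above can be pictured as a small sanitizing step applied before expressions are evaluated. In the sketch below, the `_empty` replacement comes from the changelog, while the `sanitize_column_name` helper and the underscore prefix for reserved keywords are assumptions; the actual replacement scheme in mleko may differ.

```python
import keyword


def sanitize_column_name(name: str) -> str:
    """Return a column name that is safe to use in a Python-style expression."""
    if name == "":
        # Replacement named in the changelog entry above.
        return "_empty"
    if keyword.iskeyword(name):
        # Assumed scheme: prefix reserved keywords with an underscore.
        return f"_{name}"
    return name


columns = ["", "class", "amount", "lambda"]
sanitized = [sanitize_column_name(c) for c in columns]
```

A column called `class` or `lambda` would otherwise be parsed as a keyword rather than a name when it appears in an expression.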

🛠️ Code Refactoring

  • Improve error logging messages, and update codebase to new black format. (a29ad45)
  • cache: Break out cache handler retrieval method. (aba9e41)

🤖 Continuous Integration

  • Remove TypeGuard and PyUpgrade from build and pre-commit. (d374406)
  • Add custom template for release notes to follow changelog structure. (30518c0)