Is maggy applicable to my use case? #69

Open
blazejdolicki opened this issue Oct 30, 2020 · 12 comments


@blazejdolicki

Hi, I've just found this library and it seems great, but I wanted to quickly double-check whether it's applicable to my use case. Namely, I have a large amount of tabular data stored in Spark DataFrames (so the data is distributed across multiple machines) on Databricks, and I'm using a Spark ML model. Will I be able to run trials in parallel in such a setting using maggy?

@moritzmeister
Contributor

Hey!

Thanks for your interest! Maggy is very applicable to your use case. However, at this point in time it is very much tied to Hopsworks. If you want to try it out on Hopsworks, you can get access to a free demo instance on hopsworks.ai, or you can deploy an entire Hopsworks instance to your own AWS account.

We are working on making Maggy more general, but it will take a few more weeks for it to be ready for use on any Spark cluster.
We are planning to release a standalone version of Maggy for the Data+AI Summit Europe in two weeks, or shortly thereafter.

Please come back and check the repo for any new releases!

In the meantime, if you want to know more about Maggy as a research project, we have some blog posts (here and here), and also a paper at the MLOps Workshop of this year's MLSys conference.

Hope that answers your questions!
I will ping you here once we have made another release!

@blazejdolicki
Author

Thanks for a comprehensive response and all the references. In this case, I will wait for the standalone version. Looking forward to it!

@blazejdolicki
Author

Hi @moritzmeister, I was wondering what the status of Maggy is. Did you manage to make it standalone? :)

@crakama

crakama commented Dec 12, 2020

@moritzmeister I am also interested in this response.

@moritzmeister
Contributor

Hi @blazejdolicki, @crakama! Thanks for your interest! We're working on it but it's not there yet. Hope to get it done by mid January.

@crakama

crakama commented Jan 17, 2021

@moritzmeister Thank you for the response. Will it be published somewhere?

@crakama

crakama commented Feb 9, 2021

Any updates on this?

@moritzmeister
Contributor

Hey @crakama, sorry I must've missed your previous message! With increasing interest we are now working towards a major 1.0 release. So we are getting there but it will still take some time.

@moritzmeister
Contributor

Hi @blazejdolicki, hi @crakama,

It took us a while, but we just published a 1.0.0rc0 release candidate on PyPI. Give it a try on your Spark clusters. I suspect there are still a few bugs, but we are working on fixing them before the main release.

To get people started, there are a bunch more example notebooks now in
https://github.com/logicalclocks/maggy/tree/master/examples

Also we are redesigning/rewriting the documentation, so keep an eye on www.maggy.ai :)
And we wrote some Blog posts, which can serve as a kind of documentation until then:

Feel free to open new issues here on GitHub if you have specific questions or encounter any bugs.
I am closing this issue here. Looking forward to any feedback you might have!

@blazejdolicki
Author

Thanks for letting us know!

@crakama

crakama commented Jun 6, 2021 via email

@moritzmeister moritzmeister reopened this Jun 7, 2021
@moritzmeister
Contributor

Hi @crakama,

That is very strange. What kind of Python environment are you using, and which version of pip? The release is on PyPI: https://pypi.org/project/maggy/1.0.0rc0/. Alternatively, I also uploaded the wheel to the release tag here on GitHub: https://github.com/logicalclocks/maggy/releases/tag/1.0.0rc0

I just tested it with Python 3.8.8 and pip 21.1.1, and I can install it as expected:

(test-maggy) moritzmeister@ ~ () $ pip install maggy==1.0.0rc0
Collecting maggy==1.0.0rc0
  Downloading maggy-1.0.0rc0-py3-none-any.whl (154 kB)
     |████████████████████████████████| 154 kB 8.2 MB/s 
Collecting scikit-optimize==0.7.4
  Using cached scikit_optimize-0.7.4-py2.py3-none-any.whl (80 kB)
Collecting numpy==1.19.2
  Using cached numpy-1.19.2-cp38-cp38-manylinux2010_x86_64.whl (14.5 MB)
Collecting statsmodels==0.11.0
  Using cached statsmodels-0.11.0-cp38-cp38-manylinux1_x86_64.whl (8.7 MB)
Collecting scipy==1.4.1
  Using cached scipy-1.4.1-cp38-cp38-manylinux1_x86_64.whl (26.0 MB)
Collecting scikit-learn>=0.19.1
  Downloading scikit_learn-0.24.2-cp38-cp38-manylinux2010_x86_64.whl (24.9 MB)
     |████████████████████████████████| 24.9 MB 13.3 MB/s 
Collecting pyaml>=16.9
  Using cached pyaml-20.4.0-py2.py3-none-any.whl (17 kB)
Collecting joblib>=0.11
  Using cached joblib-1.0.1-py3-none-any.whl (303 kB)
Collecting patsy>=0.5
  Using cached patsy-0.5.1-py2.py3-none-any.whl (231 kB)
Collecting pandas>=0.21
  Downloading pandas-1.2.4-cp38-cp38-manylinux1_x86_64.whl (9.7 MB)
     |████████████████████████████████| 9.7 MB 8.5 MB/s 
Collecting python-dateutil>=2.7.3
  Using cached python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting pytz>=2017.3
  Using cached pytz-2021.1-py2.py3-none-any.whl (510 kB)
Collecting six
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting PyYAML
  Using cached PyYAML-5.4.1-cp38-cp38-manylinux1_x86_64.whl (662 kB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Installing collected packages: six, numpy, threadpoolctl, scipy, PyYAML, pytz, python-dateutil, joblib, scikit-learn, pyaml, patsy, pandas, statsmodels, scikit-optimize, maggy
Successfully installed PyYAML-5.4.1 joblib-1.0.1 maggy-1.0.0rc0 numpy-1.19.2 pandas-1.2.4 patsy-0.5.1 pyaml-20.4.0 python-dateutil-2.8.1 pytz-2021.1 scikit-learn-0.24.2 scikit-optimize-0.7.4 scipy-1.4.1 six-1.16.0 statsmodels-0.11.0 threadpoolctl-2.1.0
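
For comparing another environment against the Python and pip versions mentioned above, here is a minimal sketch that reports the interpreter version and the installed versions of pip and maggy. It uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper names `installed_version` and `environment_report` are illustrative and not part of Maggy.

```python
import importlib.metadata
import platform


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def environment_report(packages=("pip", "maggy")):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    # Print one line per entry, e.g. to paste into an issue comment.
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same environment where `pip install` fails shows whether the interpreter and pip are the ones you expect.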


3 participants