v1.0 drops embeddings_utils.py breaking semantic text search #676

Closed
mrbullwinkle opened this issue Nov 6, 2023 · 13 comments

mrbullwinkle commented Nov 6, 2023

Describe the bug

Previous versions of the OpenAI Python library included embeddings_utils.py, which provided functions such as cosine_similarity that are used for semantic text search with embeddings. With that module gone in v1.0, existing code that imports it fails, including OpenAI's own cookbook example: https://cookbook.openai.com/examples/semantic_text_search_using_embeddings

Are there plans to add this support back in, or should we just create our own cosine_similarity function based on the one that was present in embeddings_utils (below)?

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

To Reproduce

The cookbook example (https://cookbook.openai.com/examples/semantic_text_search_using_embeddings) cannot be converted to v1.0 without removing its dependency on embeddings_utils.py.

Code snippets

from openai.embeddings_utils import get_embedding, cosine_similarity

# search through the reviews for a specific product
def search_reviews(df, product_description, n=3, pprint=True):
    product_embedding = get_embedding(
        product_description,
        engine="text-embedding-ada-002"
    )
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, product_embedding))

    results = (
        df.sort_values("similarity", ascending=False)
        .head(n)
        .combined.str.replace("Title: ", "")
        .str.replace("; Content:", ": ")
    )
    if pprint:
        for r in results:
            print(r[:200])
            print()
    return results


results = search_reviews(df, "delicious beans", n=3)

OS

Windows

Python version

Python v3.10.11

Library version

openai-python==1.0.0rc2

mrbullwinkle added the bug label Nov 6, 2023
@RobertCraigie
Collaborator

Thanks for calling this out, @mrbullwinkle. We're working on updating the cookbook repository to include the functions provided in embeddings_utils.py directly so that you can copy them into your own project.

This is a better approach than the current embeddings_utils: you include only the dependencies for the functions you actually want, whereas the current approach forces you to install dependencies you'll never use.
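
To illustrate the copy-only-what-you-need idea, here is a minimal sketch of a cosine similarity helper that relies on nothing beyond the Python standard library. It is not the cookbook's implementation (the version quoted in the issue uses NumPy); the length check, the zero-vector check, and the type hints are assumptions added for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, standard library only."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: treat a zero vector as an error rather
    # than returning NaN, since the similarity is undefined in that case.
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

This is slower than the NumPy version on long vectors, so it mostly makes sense when avoiding the extra dependency matters more than speed.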

@mrbullwinkle
Author

@RobertCraigie, makes sense. Thank you for the super fast response!

RobertCraigie removed the bug label Nov 6, 2023
CristianPQ commented Nov 7, 2023

@mrbullwinkle
In the meantime, I hope this piece of code can help you.

source: https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line

import numpy as np

# Assumes `client` is an already-configured OpenAI (or AzureOpenAI) v1 client.

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_embedding(text, model="text-embedding-ada-002"):  # on Azure, model = "deployment_name"
    return client.embeddings.create(input=[text], model=model).data[0].embedding

def search_docs(df, user_query, top_n=4, to_print=True):
    embedding = get_embedding(
        user_query,
        model="text-embedding-ada-002" # model should be set to the deployment name you chose when you deployed the text-embedding-ada-002 (Version 2) model
    )
    df["similarities"] = df.ada_v2.apply(lambda x: cosine_similarity(x, embedding))

    res = (
        df.sort_values("similarities", ascending=False)
        .head(top_n)
    )
    if to_print:
        display(res)  # display() is available in Jupyter/IPython notebooks
    return res


res = search_docs(df_bills, "Can I get information on cable company tax revenue?", top_n=4)
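
As a variation on the snippet above, the per-row `apply` can be replaced by a single matrix product when all embeddings fit in memory. The following is only a sketch: `top_n_by_cosine` is a hypothetical helper name, not something from the tutorial or the library, and it assumes the embeddings are equal-length lists of floats with no zero vectors.

```python
import numpy as np


def top_n_by_cosine(embeddings, query_embedding, top_n=4):
    """Return (indices, scores) of the top_n rows most similar to the query."""
    matrix = np.asarray(embeddings, dtype=float)      # shape (num_docs, dim)
    query = np.asarray(query_embedding, dtype=float)  # shape (dim,)
    # Cosine similarity of every row against the query in one pass.
    scores = (matrix @ query) / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    )
    # Sort descending by score and keep the first top_n positions.
    order = np.argsort(-scores)[:top_n]
    return order, scores[order]
```

With the DataFrame from the snippet above, this could be used roughly as `idx, sims = top_n_by_cosine(df_bills["ada_v2"].tolist(), embedding)` followed by `df_bills.iloc[idx]`, assuming the `ada_v2` column holds the embedding lists.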

@mrbullwinkle
Author

mrbullwinkle commented Nov 7, 2023

@CristianPQ you beat me to adding a ref to that code sample to this issue after I added it earlier today. I am the author of that code sample/article. ☺️

patham9 commented Nov 23, 2023

Very unprofessional that such functions just get removed. It is a moving API one cannot rely on, very annoying.

@bhaniraj

@RobertCraigie, has support for the functions provided in embeddings_utils.py been added to the cookbook repo?

@rattrayalex
Collaborator

@logankilpatrick, can you help here?

@julianlevy

Hi folks - any updates on the cookbook yet?

@jawwadturabi

Still can't see the updates on the cookbook.

@logankilpatrick
Contributor

Hey folks, we migrated these over to the cookbook's own utils folder ~3 months ago: https://github.com/openai/openai-cookbook/blob/main/examples/utils/embeddings_utils.py. If you find any notebooks that are out of sync and not using the built-in utils, please open an issue on the cookbook repo.

@the-rich-piana

Why the frick would you guys remove this. This API is god awful.

@davidgilbertson

FWIW, if you're using embeddings from the OpenAI API, you can get cosine similarity with just a @ b, since they're already normalized to length 1 (assuming a and b are NumPy arrays, tensors, or similar, not plain lists).
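
A quick way to see why the shortcut works: for unit-length vectors the denominator of the cosine formula is 1, so only the dot product remains. The sketch below uses random unit vectors as stand-ins for API embeddings, and `is_unit_length` is a hypothetical helper for checking the normalization assumption on your own data before relying on `a @ b`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for API embeddings: random vectors scaled to unit length.
a = rng.normal(size=1536)
b = rng.normal(size=1536)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)


def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# For unit-length vectors the denominator is 1, so the two results agree.
assert np.isclose(a @ b, cosine_similarity(a, b))


def is_unit_length(v, tol=1e-3):
    """Check the normalization assumption before relying on the shortcut."""
    return abs(np.linalg.norm(v) - 1.0) < tol
```

If `is_unit_length` fails for your vectors (for example after averaging or truncating embeddings), fall back to the full formula.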

@davidgilbertson

Also BTW there is still a reference to embeddings_utils in the actual user guide in the docs here: https://platform.openai.com/docs/guides/embeddings/use-cases
