sentencesearch returns images #442
Comments
I created a new project and crawled just this one website: https://www.tagesschau.de/inland/innenpolitik/spd-miersch-generalsekretaer-100.html
This looks very suspicious to me. The similarity score between ANY image and ANY text is never higher than the score between one text and another (commonly known as the modality gap). Could this be a display bug, or did some sdoc IDs get mixed up? Have you checked the raw output in the Swagger UI?
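A rough way to check this on the raw output, as a minimal sketch: the `Hit` fields and the `"text"` / `"image"` labels below are assumptions for illustration, not the actual response schema, and the scores in the usage note are made up. Under a strict reading of the modality gap, any image hit that scores at least as high as some text hit is worth a closer look.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical shape of one search hit; the real response fields may differ.
    sdoc_id: int
    doctype: str  # assumed labels: "text" or "image"
    score: float


def find_suspicious_image_hits(hits: list[Hit]) -> list[Hit]:
    """Return image hits that score at least as high as the lowest text hit."""
    text_scores = [h.score for h in hits if h.doctype == "text"]
    if not text_scores:
        # Nothing to compare against, so nothing can be flagged.
        return []
    lowest_text = min(text_scores)
    return [h for h in hits if h.doctype == "image" and h.score >= lowest_text]
```

Feeding it the hits copied from the Swagger UI response (for example `find_suspicious_image_hits([Hit(1, "text", 0.82), Hit(2, "image", 0.91)])`) would flag sdoc 2 and point to an ID mix-up rather than a display problem.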
Calling the endpoint
returns
sdoc_id 2 is an image. Even images have text documents (the automatically generated caption). However, this caption should only be returned once, since it usually consists of a single sentence. Maybe there is a bug related to embedding the caption?
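To see whether an image's caption really comes back more than once, one could count the hits per sdoc. This is only a sketch: it assumes each hit is a dict with an `"sdoc_id"` key and that the set of image sdoc IDs is known from elsewhere; both are illustrative assumptions, not the real API.

```python
from collections import Counter


def repeated_image_sdocs(hits: list[dict], image_sdoc_ids: set[int]) -> dict[int, int]:
    """Map each image sdoc that appears more than once to its hit count."""
    counts = Counter(hit["sdoc_id"] for hit in hits)
    return {
        sdoc_id: n
        for sdoc_id, n in counts.items()
        if sdoc_id in image_sdoc_ids and n > 1
    }
```

A non-empty result for an image with a one-sentence caption would suggest the caption is being embedded or indexed several times.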
This similarity search should only return sentences / text files, right?