Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not deduplicate different units at the same address #1472

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

Multiple units at the same address are currently being deduplicated.

This can lead to situations where the response is showing much fewer results than was requested.
In particular this is not a great experience for /v1/reverse but also for the forward-geocoding APIs

One thing to consider before merging this:

  • If the labels we generate don't include the unit number then the results will contain 10+ entries with the exact same label.

So the pro here is that we can show all the units in the building and the con is that these results will dominate results for queries near these multi-unit buildings/campuses.

I really wish there was something analogous to a GROUP BY in elasticsearch, but this is unfortunately not possible.

An alternative solution would be to increase the querySize variable from what it is now (usually double ?size) so that more results are returned and then continue to consider these as duplicates 🤷‍♂️

Thoughts/Feels?

@Joxit
Copy link
Member

Joxit commented Aug 3, 2020

IMO, this is a complex choice and depends on the use-case.

As you said, having all units of a building will flood the result, especially when your use case is the address only.

In the other hands... When you need the unit... 🤷‍♂️

I suppose, for the forward geocoding, when you are looking for the unit, all units should be displayed.

And for the reverse... 🤷‍♂️ maybe a meta category address:unit or dedicated query parameter ? But it may be a pain for the integration...

@missinglink missinglink marked this pull request as draft August 4, 2020 12:41
@missinglink
Copy link
Member Author

I've demoted this PR from "ready-to-merge" to "draft".

This was originally written to resolve an issue one of our clients was having, they have since solved the issue themselves by performing deduplication of the data before importing it.

As noted by @Joxit, each deployment may wish to configure this differently, so we'd probably need to put it behind a config flag before merging it (if we decide to do so).

@missinglink
Copy link
Member Author

@rowanwins, I would be fine with reopening this PR and pursuing getting it merged if you find it useful.

Since it's a dramatic change for existing users we would have to put it behind a feature flag which defaulted to being off.
That means it wouldn't be available on geocode.earth for instance, but would allow developers such as yourself to continue working on unit number functionality in their local environment/own deployments.

If at a later date we could show that this change doesn't make results super noisy then we could make a subsequent PR which changes that feature flag to be on by default, pending a plus one from the core contributors, which would make it available by default on all installations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants