
allow_no_match doesn't mean allow_multiple_match #249

Open
jpmckinney opened this issue Feb 4, 2017 · 19 comments

@jpmckinney
Member

jpmckinney commented Feb 4, 2017

Why was this change made? 0c2cb44

If you don't want to raise any errors at all, add a parameter like dont_raise_errors or something. I depend on multiple matches raising errors. allow_no_match is internal to Pupa; I can't set it in my code. In the cases where allow_no_match=True, it makes sense to allow no matches. It doesn't make sense for all those cases to also mean 'allow multiple matches'.
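A minimal sketch of the separation being asked for, using a hypothetical in-memory resolver (resolve_name, ResolutionError and the list-of-dicts people store are illustrations, not Pupa's API): allow_no_match only tolerates zero matches, while a separate dont_raise_errors flag, the parameter name suggested above, silences every failure, including multiple matches.

```python
class ResolutionError(Exception):
    """Raised when a name cannot be resolved to exactly one person."""


def resolve_name(name, people, allow_no_match=False, dont_raise_errors=False):
    """Return the id of the single person called `name`.

    Hypothetical resolver: allow_no_match tolerates *zero* matches only;
    dont_raise_errors tolerates any failure, including multiple matches.
    """
    ids = {p["id"] for p in people if p["name"] == name}
    if len(ids) == 1:
        return ids.pop()
    if not ids:
        # Zero matches: acceptable if either flag says so.
        if allow_no_match or dont_raise_errors:
            return None
        raise ResolutionError(f"no person named {name!r}")
    # More than one candidate: ambiguity stays loud unless explicitly silenced.
    if dont_raise_errors:
        return None
    raise ResolutionError(f"multiple people named {name!r}: {sorted(ids)}")


people = [
    {"id": "p1", "name": "Smith"},
    {"id": "p2", "name": "Smith"},
    {"id": "p3", "name": "Jones"},
]

print(resolve_name("Jones", people))                       # p3
print(resolve_name("Brown", people, allow_no_match=True))  # None
try:
    resolve_name("Smith", people, allow_no_match=True)
except ResolutionError as exc:
    print("error:", exc)
```

Under this reading, allow_no_match=True on an ambiguous name still raises, which is the behavior being defended here.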

@jpmckinney
Member Author

@jamesturk

@jpmckinney jpmckinney added the bug label Feb 4, 2017
@jamesturk
Member

jamesturk commented Feb 4, 2017 via email

@jamesturk
Member

jamesturk commented Feb 4, 2017 via email

@fgregg
Contributor

fgregg commented Feb 4, 2017 via email

@jamesturk
Member

Hmm, I'm not sure I understand the use case? It is still unable to match, just for a different reason.

@fgregg
Contributor

fgregg commented Feb 4, 2017 via email

@jpmckinney
Member Author

jpmckinney commented Feb 4, 2017

Same as @fgregg. I never want to allow multiple matches without a loud error.

(The earlier behavior had a bug in that if allow_no_match was set to True, it sometimes still raised errors if it couldn't match because of multiple results)

That's not a bug. That's what 'allow no matches' means! allow_no_match doesn't mean allow_zero_or_multiple_matches.

@jamesturk
Member

jamesturk commented Feb 4, 2017 via email

@jamesturk
Member

I get that reading, but look at how it is used: allow_no_match means 'allow this method to return None', not 'allow the DB to have no results'.
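To make this reading concrete, here is a sketch of the behavior described in this comment, again with a hypothetical in-memory resolver rather than Pupa's actual code: with allow_no_match=True the method may return None whenever it cannot produce exactly one id, and the reason (zero or multiple matches) is logged instead of raised.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("resolver")


class UnresolvedError(Exception):
    """Raised when a name does not resolve to exactly one id."""


def resolve_name(name, people, allow_no_match=False):
    """Hypothetical resolver where allow_no_match means "may return None".

    Zero matches and multiple matches are both just a failure to resolve:
    logged and turned into None when allowed, raised otherwise.
    """
    ids = sorted({p["id"] for p in people if p["name"] == name})
    if len(ids) == 1:
        return ids[0]
    if ids:
        reason = f"multiple people named {name!r}: {ids}"
    else:
        reason = f"no person named {name!r}"
    if allow_no_match:
        logger.warning("could not resolve: %s", reason)
        return None
    raise UnresolvedError(reason)


people = [
    {"id": "p1", "name": "Smith"},
    {"id": "p2", "name": "Smith"},
    {"id": "p3", "name": "Jones"},
]

print(resolve_name("Jones", people))                       # p3
print(resolve_name("Smith", people, allow_no_match=True))  # None (warning logged)
print(resolve_name("Brown", people, allow_no_match=True))  # None (warning logged)
```

The only difference from the stricter reading is whether the multiple-match branch is allowed to reach return None.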

@fgregg
Contributor

fgregg commented Feb 4, 2017

So what should happen if an attempt is made to resolve a bill sponsor 'Smith' and there are two matches?

I would like pupa to throw an error and grind to a halt.

@jamesturk
Member

That'll need to be a setting; that absolutely won't work for the volume of ambiguous legislators. We have hundreds.

@jpmckinney
Member Author

jpmckinney commented Feb 4, 2017

I guess it should be named allow_failure to avoid ambiguity.

There seem to be two use cases. One where you disambiguate objects in code by adding birth dates to legislators, for example, and one where you disambiguate using the features in admin views.

I have always disambiguated objects in code, but I have far fewer related objects requiring resolution.
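Disambiguating in code can be sketched as narrowing a name lookup with an extra field the scraper knows. Here the field is a hypothetical birth_date key on in-memory person dicts, and resolve_person is an illustrative helper, not part of Pupa. When the scraper supplies the birth date, two legislators who share a name resolve to different ids; this only helps when the scraped record actually carries that extra field.

```python
def resolve_person(people, **spec):
    """Return the id of the one person whose fields all match spec.

    Hypothetical helper: spec is any subset of the keys stored on each person,
    e.g. name alone, or name plus birth_date to separate namesakes.
    """
    ids = [p["id"] for p in people if all(p.get(k) == v for k, v in spec.items())]
    if len(ids) != 1:
        raise LookupError(f"{len(ids)} matches for {spec}")
    return ids[0]


people = [
    {"id": "p1", "name": "John Smith", "birth_date": "1960-03-01"},
    {"id": "p2", "name": "John Smith", "birth_date": "1972-11-15"},
]

# Name alone is ambiguous...
try:
    resolve_person(people, name="John Smith")
except LookupError as exc:
    print(exc)  # 2 matches for {'name': 'John Smith'}

# ...but adding a birth date picks out exactly one legislator.
print(resolve_person(people, name="John Smith", birth_date="1972-11-15"))  # p2
```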

@fgregg
Contributor

fgregg commented Feb 4, 2017

If you don't throw an error, you will have thousands of pieces of legislation where you don't know who sponsored it.

@jamesturk
Member

You can't add a birth date to a sponsor, voter, etc., as all you'll have is their last name.

It is completely OK within the DB schema for there to be sponsors, etc. with no leg_id set. That was an intended feature from the get-go (to not force perfect resolution if there's ambiguity), which is why this behavior was considered broken.

@jamesturk
Member

Very true, but that ambiguity is often true upstream as well. Plus, we're already allowing None to be returned if it wasn't a recognized person.

Consider these cases:

  • States where, when Joe & John Smith vote, both are listed as "J.Smith"
  • States where a sponsor isn't a known entity (we've seen bills with sponsors listed that are executive agencies, Boy Scout troops, etc., and not legislators)
  • States where there's a typo in a sponsor's name "John Smih"

I'm pretty sure that all three should be treated the same: record the textual name and fail to resolve the person_id. The first case doesn't seem any different from the other two.
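A sketch of that rule, assuming a hypothetical roster matched by exact name and a plain dict for the sponsorship record (neither is Pupa's data model, and the names are made-up examples of the three cases): every sponsor keeps its textual name, and person_id is filled in only when exactly one legislator matches, so an ambiguous name, a non-legislator and a typo all end up in the same state.

```python
from collections import Counter

# Hypothetical roster of known legislators: (id, name as it appears upstream).
ROSTER = [("p1", "J.Smith"), ("p2", "J.Smith"), ("p3", "John Smith")]
NAME_COUNTS = Counter(name for _, name in ROSTER)


def make_sponsorship(name):
    """Record the textual sponsor name; resolve person_id only on a unique match."""
    person_id = None
    if NAME_COUNTS[name] == 1:  # Counter returns 0 for names not in the roster
        person_id = next(pid for pid, n in ROSTER if n == name)
    return {"name": name, "person_id": person_id}


for raw in ["J.Smith", "Boy Scout Troop 12", "John Smih", "John Smith"]:
    print(make_sponsorship(raw))
# {'name': 'J.Smith', 'person_id': None}
# {'name': 'Boy Scout Troop 12', 'person_id': None}
# {'name': 'John Smih', 'person_id': None}
# {'name': 'John Smith', 'person_id': 'p3'}
```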

@jpmckinney
Member Author

jpmckinney commented Feb 4, 2017

Ah, true, we only call _make_pseudo_id with a name, which is what resolve_json_id resolves.

I think I was conflating the resolution here with other DB resolution (SameNameError).
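Why the name-only pseudo id leaves nothing to disambiguate with can be sketched as follows, assuming a pseudo id is a '~'-prefixed JSON encoding of the lookup spec. The helpers make_pseudo_id and parse_pseudo_id below are hypothetical stand-ins written for this illustration, not imports from Pupa. Two different legislators scraped as "J.Smith" produce byte-for-byte identical pseudo ids, so whatever resolves them later only ever sees the name.

```python
import json


def make_pseudo_id(**spec):
    """Hypothetical stand-in: encode a lookup spec as a '~'-prefixed JSON string."""
    # sort_keys makes the id deterministic regardless of keyword order
    return "~" + json.dumps(spec, sort_keys=True)


def parse_pseudo_id(pseudo_id):
    """Decode a pseudo id back into the spec used for the database lookup."""
    if not pseudo_id.startswith("~"):
        raise ValueError("pseudo id must start with '~'")
    return json.loads(pseudo_id[1:])


# A sponsor or voter scraped as "J.Smith" carries nothing but the name,
# so Joe Smith's and John Smith's votes produce the same pseudo id.
joe_vote = make_pseudo_id(name="J.Smith")
john_vote = make_pseudo_id(name="J.Smith")
print(joe_vote)                   # ~{"name": "J.Smith"}
print(joe_vote == john_vote)      # True
print(parse_pseudo_id(joe_vote))  # {'name': 'J.Smith'}
```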

@jpmckinney
Member Author

Leaving open in case @fgregg has use case for old behavior.

@fgregg
Copy link
Contributor

fgregg commented Feb 4, 2017

@jamesturk

So, to me, there are two main differences.

Interaction between types of bugs and under- and overmatching

Bugs in scraper code and bugs on websites can both lead to under- and overmatching. In my experience, overmatching is likely a bug in the code, and undermatching is likely a bug in the website.

Correct behavior sometimes depends on allowing zero matches

I just wrote some code to allow for the resolution of posts that do not have district identifiers within memberships. These types of memberships have the same signature as memberships without posts. In order to allow for both "post-less" memberships and memberships associated with posts without districts, I'm taking advantage of the no-match behavior.

In this case, I am not making concessions to a messy reality. I've probably overloaded the intent of allow_no_match to allow for accurate representation. I have no case, and can't think of a case, where the correct representation would depend on allow_many_matches.
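The membership case can be sketched with hypothetical data structures (posts as dicts with an organization id and an optional district label; resolve_post and AmbiguousPostError are illustrative names, not Pupa's code). A membership arriving without a district identifier resolves to the organization's district-less post if one exists, or to no post at all, which relies on tolerating zero matches; two candidate posts remain a hard error.

```python
class AmbiguousPostError(Exception):
    """Raised when a membership could belong to more than one post."""


def resolve_post(posts, organization_id, label=None):
    """Hypothetical post lookup: zero matches -> None, many -> error.

    label=None is how a membership without a district identifier arrives,
    so it matches a post whose label is also None, if the organization has one.
    """
    matches = [p["id"] for p in posts
               if p["organization_id"] == organization_id and p["label"] == label]
    if len(matches) > 1:
        raise AmbiguousPostError(f"{len(matches)} posts for {organization_id}/{label}")
    return matches[0] if matches else None


posts = [
    {"id": "post-mayor", "organization_id": "city", "label": None},
    {"id": "post-ward-1", "organization_id": "council", "label": "Ward 1"},
]

print(resolve_post(posts, "city"))               # post-mayor (post without a district)
print(resolve_post(posts, "committee-finance"))  # None (post-less membership)
print(resolve_post(posts, "council", "Ward 1"))  # post-ward-1
```

Here zero matches is a correct outcome rather than a concession, while multiple matches can only mean bad data or a bug.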

@jamesturk
Member

jamesturk commented Feb 4, 2017 via email

Development

No branches or pull requests

3 participants