Refactor labeler.py #1065

Merged 29 commits on Sep 2, 2022
Changes from 21 commits
Commits
70f8f31
Use conventions for fit(), fit_transform(), etc.
NickCrews Jun 12, 2022
7f3842b
Remove redundant setting of self.X, self.y
NickCrews Jun 12, 2022
e4e8697
Move BlockLearner type annotation to class
NickCrews Jun 12, 2022
56e2685
Fix mypy
NickCrews Jun 12, 2022
685a0db
Show error codes from mypy runs
NickCrews Jun 12, 2022
cd8782e
Tighten typing on ClassifierProtocol
NickCrews Jun 17, 2022
d59d538
Don't store data_model in BlockLearner
NickCrews Jun 17, 2022
5bad271
Make BlockLearner.predict private
NickCrews Jun 17, 2022
0b0a117
Remove unneeded type hint BlockLearner.candidates
NickCrews Jun 17, 2022
3ded0bd
Delegate DisagreementLearner.learn_predicates()
NickCrews Jun 17, 2022
925030b
Rename RLRLearner -> MatchLearner, use composition
NickCrews Jun 17, 2022
ced6f4e
Remove unneeded ActiveLearner alias in api
NickCrews Jun 17, 2022
e3442ef
Remove fit() from DisagreementLearner API
NickCrews Jun 17, 2022
5386d6f
Simplify DisagreementLearner.__init__()
NickCrews Jun 17, 2022
62ad7ca
Fixup: rename classifier to matcher
NickCrews Jun 17, 2022
24a2609
Simplify DisagreementLearner.candidate_scores
NickCrews Jun 17, 2022
d77c118
Test more-public interface of labeler
NickCrews Jun 17, 2022
3e36957
Rename _cached_labels -> _Cached_scores
NickCrews Jun 18, 2022
1b230ac
Overhaul labeler inheritance
NickCrews Jun 18, 2022
2605b29
Privatize DisagreementLearner.learners
NickCrews Jun 18, 2022
f9d88ec
Fixup: linting
NickCrews Jun 18, 2022
c21398f
Fixup: don't say HasCandidates.candidates is RO
NickCrews Jul 9, 2022
3dc3a96
Only have pytest config in pyproject.toml
NickCrews Jul 9, 2022
5b330cf
Remove unused .coveragerc
NickCrews Jul 9, 2022
81bf3f5
Always generate a html coverage report
NickCrews Jul 9, 2022
5d3a100
ValueError if candidate_scores() used before fit()
NickCrews Jul 9, 2022
83b5293
Validate args to Learner.fit()
NickCrews Jul 9, 2022
918b824
Fixup mypy error
NickCrews Jul 9, 2022
1212a7b
Merge branch 'main' into labeler-rename
NickCrews Sep 1, 2022
8 changes: 6 additions & 2 deletions dedupe/_typing.py
@@ -81,10 +81,14 @@ class TrainingData(TypedDict):


 class Classifier(Protocol):
-    def fit(self, X: object, y: object) -> None:
+    """Takes an array of pairwise distances and computes the likelihood they are a pair."""
+
+    def fit(self, X: numpy.typing.NDArray[numpy.float_], y: LabelsLike) -> None:
         ...

-    def predict_proba(self, X: object) -> numpy.typing.NDArray[numpy.float_]:
+    def predict_proba(
+        self, X: numpy.typing.NDArray[numpy.float_]
+    ) -> numpy.typing.NDArray[numpy.float_]:
         ...

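For readers unfamiliar with `typing.Protocol`, here is a minimal sketch of what the tightened `Classifier` protocol buys: any object with matching `fit` and `predict_proba` methods is accepted, with no inheritance required. To stay stdlib-only, the sketch swaps the numpy array types from the diff for plain nested sequences. `FloatMatrix`, `Labels`, `MeanThresholdClassifier`, and `train_and_score` are hypothetical names invented for this illustration and are not part of dedupe.

```python
from typing import Protocol, Sequence, runtime_checkable

# Assumption: plain sequences stand in for numpy.typing.NDArray and LabelsLike.
FloatMatrix = Sequence[Sequence[float]]
Labels = Sequence[int]


@runtime_checkable
class Classifier(Protocol):
    """Scores rows of pairwise distances by how likely they are a matching pair."""

    def fit(self, X: FloatMatrix, y: Labels) -> None:
        ...

    def predict_proba(self, X: FloatMatrix) -> Sequence[float]:
        ...


class MeanThresholdClassifier:
    """Hypothetical toy classifier. It does not inherit from Classifier."""

    def __init__(self) -> None:
        self.threshold = 0.5

    def fit(self, X: FloatMatrix, y: Labels) -> None:
        # Place the threshold halfway between the average mean distance
        # of labeled matches (1) and labeled non-matches (0).
        means = [sum(row) / len(row) for row in X]
        match = [m for m, label in zip(means, y) if label == 1]
        distinct = [m for m, label in zip(means, y) if label == 0]
        self.threshold = (
            sum(match) / len(match) + sum(distinct) / len(distinct)
        ) / 2

    def predict_proba(self, X: FloatMatrix) -> Sequence[float]:
        # Crude score: 1.0 if the row's mean distance is under the threshold.
        return [1.0 if sum(row) / len(row) < self.threshold else 0.0 for row in X]


def train_and_score(clf: Classifier, X: FloatMatrix, y: Labels) -> Sequence[float]:
    # A type checker accepts any structurally compatible object here.
    clf.fit(X, y)
    return clf.predict_proba(X)


X = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [1.0, 0.7]]
y = [1, 0, 1, 0]
scores = train_and_score(MeanThresholdClassifier(), X, y)
```

With `X: object`, as in the old signatures, a type checker would also accept calls that pass something other than a distance array. The narrower annotations let mypy flag such calls.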
8 changes: 2 additions & 6 deletions dedupe/api.py
@@ -1298,8 +1298,6 @@ class Dedupe(ActiveMatching, DedupeMatching):
     entity.
     """

-    ActiveLearner = labeler.DedupeDisagreementLearner
-
     def prepare_training(
         self,
         data: Data,
@@ -1341,7 +1339,7 @@ def prepare_training(
         # existing training data, so add them to data dictionary
         examples, y = flatten_training(self.training_pairs)

-        self.active_learner = self.ActiveLearner(
+        self.active_learner = labeler.DedupeDisagreementLearner(
             self.data_model,
             data,
             index_include=examples,
@@ -1361,8 +1359,6 @@ class Link(ActiveMatching):
     Mixin Class for Active Learning Record Linkage
     """

-    ActiveLearner = labeler.RecordLinkDisagreementLearner
-
     def prepare_training(
         self,
         data_1: Data,
@@ -1410,7 +1406,7 @@ def prepare_training(
         # existing training data, so add them to data dictionaries
         examples, y = flatten_training(self.training_pairs)

-        self.active_learner = self.ActiveLearner(
+        self.active_learner = labeler.RecordLinkDisagreementLearner(
             self.data_model,
             data_1,
             data_2,
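One behavioral consequence of dropping the `ActiveLearner` class attribute may be worth noting for downstream code: a subclass that overrode the attribute to swap in its own learner would no longer have any effect. The sketch below illustrates this with hypothetical stand-in classes (`LearnerA`, `LearnerB`, `WithAlias`, `Direct`, and their subclasses are invented for this example and are not dedupe's real classes or signatures).

```python
class LearnerA:
    name = "A"


class LearnerB:
    name = "B"


class WithAlias:
    # Old pattern: the learner class is looked up through a class attribute.
    ActiveLearner = LearnerA

    def prepare_training(self) -> None:
        self.active_learner = self.ActiveLearner()


class SubclassWithAlias(WithAlias):
    # Overriding the attribute swaps the learner used by prepare_training.
    ActiveLearner = LearnerB


class Direct:
    # New pattern: the learner class is named directly at the call site.
    def prepare_training(self) -> None:
        self.active_learner = LearnerA()


class SubclassDirect(Direct):
    # Has no effect: prepare_training never reads this attribute.
    ActiveLearner = LearnerB


old_style = SubclassWithAlias()
old_style.prepare_training()

new_style = SubclassDirect()
new_style.prepare_training()
```

After the change, swapping the learner would require overriding `prepare_training` itself or assigning `active_learner` afterwards. Whether any downstream code relied on the old attribute is not stated in this pull request.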