Update id_column and id_column_type names in PyTest (#347)
* update id_column names in pytest

Signed-off-by: Sarah Yurick <[email protected]>

* edit

Signed-off-by: Sarah Yurick <[email protected]>

* edit config

Signed-off-by: Sarah Yurick <[email protected]>

---------

Signed-off-by: Sarah Yurick <[email protected]>
sarahyurick authored Nov 7, 2024
1 parent 01bda47 commit 1e3ccc4
1 changed file with 6 additions and 4 deletions: tests/test_semdedup.py
```diff
@@ -64,15 +64,17 @@ def test_sem_dedup(
     cache_dir = os.path.join(tmpdir, "test_sem_dedup_cache")
     config = SemDedupConfig(
         cache_dir=cache_dir,
-        id_col_name="id",
-        id_col_type="int",
-        input_column="text",
         seed=42,
         n_clusters=3,
         eps_thresholds=[0.10],
         eps_to_extract=0.10,
     )
-    sem_duplicates = SemDedup(config=config)
+    sem_duplicates = SemDedup(
+        config=config,
+        input_column="text",
+        id_column="id",
+        id_column_type="int",
+    )
     result = sem_duplicates(dedup_data)
     result_df = result.df.compute()
     duplicate_docs = [2, 3, 4, 200, 300]
```
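The test configures `eps_thresholds=[0.10]` and `eps_to_extract=0.10`. As a rough illustration of what an epsilon threshold means in semantic deduplication, here is a simplified, pure-Python sketch of the rule described in the SemDeDup method: within each cluster, a document is flagged when its cosine similarity to an earlier document in the same cluster exceeds `1 - eps`. This is an assumption-laden toy, not NeMo Curator's implementation (which also handles embedding, k-means clustering, ordering, and caching); the function `semdedup_sketch`, the toy embeddings, and the cluster assignments are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semdedup_sketch(
    embeddings: Dict[int, List[float]],
    clusters: Dict[int, int],
    eps: float,
) -> List[int]:
    """Hypothetical helper: return ids flagged as semantic duplicates.

    Documents in each cluster are visited in ascending id order; a document is
    flagged if its maximum cosine similarity to any earlier document in the
    same cluster is greater than 1 - eps.
    """
    # Group document ids by their (precomputed) cluster assignment.
    by_cluster: Dict[int, List[int]] = {}
    for doc_id, cluster_id in clusters.items():
        by_cluster.setdefault(cluster_id, []).append(doc_id)

    duplicates: List[int] = []
    for members in by_cluster.values():
        members.sort()
        for j, doc_j in enumerate(members):
            # The first member of a cluster has no earlier neighbour to compare against.
            max_sim = max(
                (cosine(embeddings[doc_i], embeddings[doc_j]) for doc_i in members[:j]),
                default=-1.0,
            )
            if max_sim > 1 - eps:
                duplicates.append(doc_j)
    return sorted(duplicates)


# Toy data (hypothetical): 2-D "embeddings" and a fixed cluster assignment.
embeddings = {
    1: [1.0, 0.0],
    2: [0.99, 0.05],  # near-copy of doc 1
    3: [0.0, 1.0],
    4: [0.02, 1.0],   # near-copy of doc 3
    5: [0.7, 0.7],
}
clusters = {1: 0, 2: 0, 3: 1, 4: 1, 5: 0}

flagged = semdedup_sketch(embeddings, clusters, eps=0.10)
print(flagged)  # → [2, 4]
```

A larger `eps` lowers the similarity bar (`1 - eps`), so more documents are flagged: with `eps=0.5`, doc 5 (similarity about 0.74 to doc 2) would also be removed in this toy setup.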
