(WIP) Triple classification #106
base: master
Conversation
…sec on fb15k, implemented an alternative way to find the thresholds, different slight changes
…which were not used in other implementations, include unique condition while sampling negatives from list to ensure same probability, Change find_thresholds so that the smallest score which gives the highest accuracy is used as threshold
…sary specifications in config files
…sholds, added comments for triple classification specification in default file, Included specification of evaluating on either test or valid data depending on the task (Test or validation during train)
…e_metrics, updated comment documentation, easier way to retrieve labels per relations in generate function, minor simplifications and error fixings
34ec99c to 8a4416f
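For context, the threshold rule from the commit messages above ("the smallest score which gives the highest accuracy is used as threshold") can be sketched as follows. This is a minimal illustration, not the PR's actual find_thresholds code; the per-relation scores and 0/1 labels passed in are assumptions about its inputs.

import torch

def find_threshold(scores: torch.Tensor, labels: torch.Tensor) -> float:
    # Sketch only: `scores` are model scores for one relation's triples,
    # `labels` are aligned 0/1 truth labels (both assumed 1-D long/float).
    best_acc, best_threshold = -1.0, float("-inf")
    # Try each observed score as a candidate threshold in ascending order;
    # the strict '>' keeps the smallest score whenever several candidates
    # reach the same highest accuracy.
    for t in torch.sort(torch.unique(scores)).values.tolist():
        acc = ((scores >= t).long() == labels).float().mean().item()
        if acc > best_acc:
            best_acc, best_threshold = acc, t
    return best_threshold

With per-relation thresholds, such a routine would be applied once per relation on validation scores and the resulting thresholds reused on the evaluation split.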
@@ -11,8 +11,11 @@ lookup_embedder.dim: 100
lookup_embedder.initialize: xavier_uniform_
eval:
  type: triple_classification
  metrics_per.relation: False
  triple_classification_random_seed: False
  triple_classification.random_seed: False
triple_classification.random_seed is not used anymore
@@ -423,6 +423,14 @@ valid:

## EVALUATION ##################################################################

triple_classification:
  random_seed: False
same here: random_seed can be removed
@@ -31,6 +36,7 @@ def __init__(self, config: Config, configuration_key: str, dataset: Dataset):

def _prepare(self,):
    train_data = self.dataset.split("train")
    # TODO probably outdated as it refers to out-commented code
correct
@@ -20,7 +20,12 @@ def __init__(self, config: Config, configuration_key: str, dataset: Dataset):
    self.o_entities = None
    uni_sampler_config = config.clone()
    # uni_sampler_config.set("negative_sampling.num_samples.s", self.get_option("num_samples.s"))
    # TODO this is redundant as uniform.sample() is called with "num_samples" here in self.sample()
Yes, that is correct.
    # TODO maybe changing the API of KGEsampler.sample() to also accept a param "filter"
    # as it is the case already with "num_samples"
    # then we would not rely here on configuration options which actually
    # belong to a training job
Absolutely. Currently the negative sampling config is mixing the Job and the Sampler. This should be split up in a separate PR.
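To make the proposed split concrete, a rough sketch of the sampler side is below. The class is an illustrative stand-in, and the attribute names, defaults, and the filter parameter itself are assumptions following the TODO above, not current repo code.

from typing import Optional
import torch

class KgeSampler:  # illustrative stand-in, not the repo's actual class
    def __init__(self, num_samples: int, filter_positives: bool):
        # Configured values remain, but only as per-call fallbacks.
        self.num_samples = num_samples
        self.filter_positives = filter_positives

    def sample(
        self,
        positive_triples: torch.Tensor,
        slot: int,
        num_samples: Optional[int] = None,
        filter: Optional[bool] = None,  # the param proposed in the TODO
    ) -> torch.Tensor:
        # Per-call overrides, mirroring how "num_samples" already works,
        # so an evaluation job can pass filter=True explicitly instead of
        # reading configuration options that belong to a training job.
        if num_samples is None:
            num_samples = self.num_samples
        if filter is None:
            filter = self.filter_positives
        ...  # actual sampling logic elided

This keeps job-level options in the job and lets the sampler be driven entirely through its call signature.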
@@ -121,7 +141,8 @@ def _prepare(self):

    self.config.log("Generate data with corrupted and true triples...")

    if self.eval_split == "test":
        # TODO maybe should be generalized to allow for other splits as valid_wo_unseen
Yes, I agree; see my comment on #115.
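For illustration, the generalization could look roughly like this. It assumes every supported evaluation split has a matching "<split>_negatives" counterpart, a naming convention inferred from the diff below, not a guarantee of the dataset framework.

# Hypothetical sketch: derive the negatives split from the configured
# eval split instead of branching on a hard-coded "test".
positives = self.dataset.split(self.eval_split)
negatives = self.dataset.split(self.eval_split + "_negatives")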
positives_test = self.dataset.split("test")
negatives_test = self.dataset.split("test_negatives")
self.tune_data = torch.cat((positives_test, negatives_test)).to(
Rename to eval_data.
negatives_test = self.dataset.split("test_negatives")
self.tune_data = torch.cat((positives_test, negatives_test)).to(
    self.device)
self.tune_labels = torch.cat(
Rename to eval_labels.
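Applied together, the two renames would make this block read roughly as follows. This is a sketch: the 1/0 label construction is inferred from the concatenation order in the diff, not copied from the PR.

positives_test = self.dataset.split("test")
negatives_test = self.dataset.split("test_negatives")
# Renamed from tune_data/tune_labels as suggested above.
self.eval_data = torch.cat((positives_test, negatives_test)).to(self.device)
self.eval_labels = torch.cat(
    (torch.ones(len(positives_test)), torch.zeros(len(negatives_test)))
).to(self.device)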
I didn't want this PR to go to waste, so I fixed it up to its current state. It works now except for the following, which would have to be addressed:
There is an incomplete attempt of mine to integrate the TC datasets WN11 and FB13 (where the valid and test triples carry a truth label) into our dataset framework. The question is how we want to proceed here. My plan was to have a load_labels() function similar to load_triples().
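A minimal sketch of what such a load_labels() could look like; the file layout (one integer label per line), the 0/1 encoding, and the example path are all assumptions, since WN11/FB13 would first have to be mapped into whatever format the dataset framework settles on.

import torch

def load_labels(filename: str) -> torch.Tensor:
    # Hypothetical helper: one 0/1 label per line, aligned with the
    # triples of the same split as returned by load_triples().
    with open(filename) as f:
        labels = [int(line.strip()) for line in f if line.strip()]
    return torch.tensor(labels, dtype=torch.long)

# Hypothetical usage, mirroring load_triples():
# valid_labels = load_labels("data/wn11/valid_labels.del")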