(WIP) Triple classification #106

Open

samuelbroscheit wants to merge 19 commits into master from triple_classification

Member

samuelbroscheit commented May 25, 2020

I didn't want this PR to go to waste, so I fixed it up to its current state. It works now, except for the following, which still has to be addressed:

There is an incomplete attempt from me to integrate the TC datasets WN11 and FB13 (where valid and test have a label) into our dataset framework. The question is how we want to proceed. My plan was to have a load_labels() function similar to load_triples().
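To make the load_labels() idea concrete, here is a minimal sketch of what such a function could look like. It is not the code in this PR: the signature, the four-column tab-separated layout (subject, relation, object, label) and the 1 / -1 label encoding are all assumptions made for illustration.

```python
import io
from typing import List, TextIO, Tuple

Triple = Tuple[str, str, str]


def load_labels(
    handle: TextIO, delimiter: str = "\t"
) -> Tuple[List[Triple], List[int]]:
    """Hypothetical helper: parse labeled triples, one per line.

    Assumes four fields per line (subject, relation, object, label),
    where a positive label marks a true triple and a negative label
    marks a corrupted one.
    """
    triples: List[Triple] = []
    labels: List[int] = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 fields, got {len(fields)}"
            )
        s, p, o, label = fields
        triples.append((s, p, o))
        # Map the assumed 1 / -1 encoding to 1 / 0.
        labels.append(1 if int(label) > 0 else 0)
    return triples, labels


# Usage with an in-memory file standing in for a labeled valid/test split.
sample = io.StringIO("a\tr1\tb\t1\nb\tr1\tc\t-1\n")
triples, labels = load_labels(sample)
```

Returning the labels as a separate list keeps the triples in the same shape as an unlabeled split, so code that only needs the triples would not have to change.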

atschalz and others added 19 commits

June 4, 2020 15:36


          Implemented triple classification

49ed5ef


          Got rid of unnecessary code lines, improved classification time to ~15 sec on fb15k, implemented an alternative way to find the thresholds, various minor changes

a6aec4f


          Integrate _prepare function, delete conditions in generate_negatives which were not used in other implementations, include unique condition while sampling negatives from list to ensure same probability, change find_thresholds so that the smallest score which gives the highest accuracy is used as threshold

af64a54


          Fixed some minor Todos, improved documentation

c18e5f8


          Make printing out predictions per relation optional, delete unnecessary specifications in config files

b90c9d5


          Improved in-code documentation, removed accuracy output from get_thresholds, added comments for triple classification specification in default file, included specification of evaluating on either test or valid data depending on the task (test or validation during train)

f9d8feb


          Vectorized prediction function, got rid of unnecessary part in _compute_metrics, updated comment documentation, easier way to retrieve labels per relation in generate function, minor simplifications and error fixes

ef7a7f8


          Update

1d92974


          Moved sampling function to sampler.py, updated code documentation

fc660a1


          final updates

51c8ab2


          Improve and update code

1e609a7


          config

b0b7791


          TC works now for datasets without neg samples

162ff38


          Fix neg sampling with filtering for unseen sp, po in train


          Remove unneeded stuff

4df0c3b


          Init supporting TC datasets

bb5a575


          config

2b98780


          Add preprocess functionality for wn11

5b1a5b4


          Allow to use labels for triple classification from data

8a4416f
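One of the commits above changes find_thresholds so that the smallest score giving the highest accuracy is used as the threshold. The following is a minimal sketch of that selection rule, not the code from this branch. It assumes that a triple is predicted true when its score is greater than or equal to the threshold and that one threshold is chosen per relation; both function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def find_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the smallest score that maximizes accuracy.

    Assumption: a triple is classified as true iff score >= threshold.
    """
    best_acc, best_thr = -1.0, float("inf")
    # Candidate thresholds are the observed scores, in ascending order.
    for thr in sorted(set(scores)):
        correct = sum((s >= thr) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        # Strict comparison: on ties the earlier (smaller) threshold is kept.
        if acc > best_acc:
            best_acc, best_thr = acc, thr
    return best_thr


def find_thresholds_per_relation(
    scores: Sequence[float],
    labels: Sequence[int],
    relations: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Hypothetical wrapper: group by relation and pick one threshold each."""
    grouped: Dict[Hashable, Tuple[List[float], List[int]]] = {}
    for s, y, r in zip(scores, labels, relations):
        rel_scores, rel_labels = grouped.setdefault(r, ([], []))
        rel_scores.append(s)
        rel_labels.append(y)
    return {r: find_threshold(ss, ys) for r, (ss, ys) in grouped.items()}


thresholds = find_thresholds_per_relation(
    scores=[0.1, 0.4, 0.6, 0.9, 0.2, 0.2, 0.8],
    labels=[0, 0, 1, 1, 1, 0, 1],
    relations=[0, 0, 0, 0, 1, 1, 1],
)
```

This loop is quadratic in the number of scores per relation. Sorting once and using cumulative counts would scale better, but the short form makes the tie-breaking rule easier to see.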

rgemulla force-pushed the triple_classification branch from 34ec99c to 8a4416f Compare

June 5, 2020 11:41

samuelbroscheit commented

examples/toy-complex-train-tripleclass.yaml

@@ -11,8 +11,11 @@ lookup_embedder.dim: 100
               lookup_embedder.initialize: xavier_uniform_
               eval:
                 type: triple_classification
-                metrics_per.relation: False
-                triple_classification_random_seed: False
+              triple_classification.random_seed: False

Member Author

samuelbroscheit Jun 21, 2020

triple_classification.random_seed is not used anymore

kge/config-default.yaml

@@ -423,6 +423,14 @@ valid:
               ## EVALUATION ##################################################################
+              triple_classification:
+                random_seed: False

Member Author

samuelbroscheit Jun 21, 2020

same here: random_seed can be removed

kge/job/triple_classification.py

                   def _prepare(self,):
                       train_data = self.dataset.split("train")
+                      #TODO probably outdated as it refers to out-commented code

Member Author

samuelbroscheit Jun 21, 2020

correct

kge/job/triple_classification.py

                       self.o_entities = None
                       uni_sampler_config = config.clone()
                       # uni_sampler_config.set("negative_sampling.num_samples.s", self.get_option("num_samples.s"))
+                      # TODO this is redundant as uniform.sample() is called with "num_samples" here in self.sample()

Member Author

samuelbroscheit Jun 21, 2020

yes that is correct

kge/job/triple_classification.py

Comment on lines +25 to +28

+                      # TODO maybe changing the API of KGEsampler.sample() to also accept a param "filter"
+                      #  as it is the case already with "num_samples"
+                      #  then we would not rely here on configuration options which actually
+                      #  belong to a training job

Member Author

samuelbroscheit Jun 21, 2020

Absolutely. Currently the negative sampling config is mixing the Job and the Sampler. This should be split up in a separate PR.
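As a rough illustration of the proposed split, the sketch below passes both the number of samples and the filter flag as call arguments, so the sampler does not need to read options that belong to a training job. The class and parameter names are hypothetical and only stand in for the real sampler API.

```python
import random
from typing import List, Optional, Set, Tuple

Triple = Tuple[int, int, int]


class UniformObjectSampler:
    """Hypothetical sampler that corrupts the object slot uniformly."""

    def __init__(
        self,
        num_entities: int,
        known_triples: Set[Triple],
        seed: Optional[int] = None,
    ) -> None:
        self.num_entities = num_entities
        self.known_triples = known_triples
        self.rng = random.Random(seed)

    def sample(
        self, positive: Triple, num_samples: int, filter_known: bool = False
    ) -> List[Triple]:
        """Draw up to num_samples corrupted triples for one positive.

        With filter_known=True, candidates that are known true triples
        are rejected and redrawn. Attempts are bounded so the loop ends
        even if almost every candidate is filtered out.
        """
        s, p, _ = positive
        negatives: List[Triple] = []
        attempts, max_attempts = 0, 100 * max(num_samples, 1)
        while len(negatives) < num_samples and attempts < max_attempts:
            attempts += 1
            candidate = (s, p, self.rng.randrange(self.num_entities))
            if filter_known and candidate in self.known_triples:
                continue
            negatives.append(candidate)
        return negatives


known = {(0, 0, 1), (0, 0, 2)}
sampler = UniformObjectSampler(num_entities=4, known_triples=known, seed=0)
negatives = sampler.sample((0, 0, 1), num_samples=5, filter_known=True)
```

With this shape, an evaluation job can request filtered negatives directly without cloning and rewriting the negative sampling configuration.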

kge/job/triple_classification.py

@@ -121,7 +141,8 @@ def _prepare(self):
                       self.config.log("Generate data with corrupted and true triples...")
-                      if self.eval_split == "test":
+                      # TODO maybe should be generalized to allow for other splits as valid_wo_unseen

Member Author

samuelbroscheit Jun 21, 2020

Yes, I agree; see my comment in #115.

kge/job/triple_classification.py

+                          positives_test = self.dataset.split("test")
+                          negatives_test = self.dataset.split("test_negatives")
+                          self.tune_data = torch.cat((positives_test, negatives_test)).to(

Member Author

samuelbroscheit Jun 21, 2020

eval_data

kge/job/triple_classification.py

+                          negatives_test = self.dataset.split("test_negatives")
+                          self.tune_data = torch.cat((positives_test, negatives_test)).to(
+                              self.device)
+                          self.tune_labels = torch.cat(

Member Author

samuelbroscheit Jun 21, 2020

eval_labels
