
Train on spanbert large, but get F1 1 point lower than presented in paper #16

Open
yangjingyi opened this issue Jul 16, 2021 · 2 comments



yangjingyi commented Jul 16, 2021

Hi,

I used the spanbert_large model with the default parameters in the config file, and I got Avg. F1 78.27, lower than the Avg. F1 of 79.9 reported in the paper. The config is as follows:

# Experiment-specific settings (presumably overriding the base values repeated below)
num_docs = 2802
bert_learning_rate = 1e-05
task_learning_rate = 0.0003
max_segment_len = 512
ffnn_size = 3000
cluster_ffnn_size = 3000
max_training_sentences = 3
bert_tokenizer_name = bert-base-cased

# Base configuration
max_top_antecedents = 50
max_training_sentences = 5
top_span_ratio = 0.4
max_num_extracted_spans = 3900
max_num_speakers = 20
max_segment_len = 256

# Learning

bert_learning_rate = 1e-5
task_learning_rate = 2e-4
loss_type = marginalized # {marginalized, hinge}
mention_loss_coef = 0
false_new_delta = 1.5 # For loss_type = hinge
adam_eps = 1e-6
adam_weight_decay = 1e-2
warmup_ratio = 0.1
max_grad_norm = 1 # Set 0 to disable clipping
gradient_accumulation_steps = 1

# Model hyperparameters.

coref_depth = 1 # when 1: no higher order (except for cluster_merging)
higher_order = attended_antecedent # {attended_antecedent, max_antecedent, entity_equalization, span_clustering, cluster_merging}
coarse_to_fine = true
fine_grained = true
dropout_rate = 0.3
ffnn_size = 1000
ffnn_depth = 1
cluster_ffnn_size = 1000 # For cluster_merging
cluster_reduce = mean # For cluster_merging
easy_cluster_first = false # For cluster_merging
cluster_dloss = false # cluster_merging
num_epochs = 24
feature_emb_size = 20
max_span_width = 30
use_metadata = true
use_features = true
use_segment_distance = true
model_heads = true
use_width_prior = true # For mention score
use_distance_prior = true # For mention-ranking score

# Other.

conll_eval_path = dev.english.v4_gold_conll # gold_conll file for dev
conll_test_path = test.english.v4_gold_conll # gold_conll file for test
genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
eval_frequency = 1000
report_frequency = 100
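For reference, these settings follow the repo's HOCON experiments.conf layout, where a named experiment block inherits from a shared base and the experiment's keys take precedence over the duplicated base values. Below is a minimal sketch of inspecting the merged values with pyhocon; the file name experiments.conf and the experiment name train_spanbert_large_ml0 are assumptions, so substitute whichever block you actually train.

```python
from pyhocon import ConfigFactory

# Parse the HOCON config; an experiment block inherits from the base,
# so keys defined in the experiment override the base values.
conf = ConfigFactory.parse_file("experiments.conf")
exp = conf["train_spanbert_large_ml0"]  # assumed experiment name

# Sanity-check the values that differ between the two blocks above.
for key in ("max_segment_len", "max_training_sentences",
            "ffnn_size", "cluster_ffnn_size", "bert_tokenizer_name"):
    print(key, "=", exp.get(key))
```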


sm354 commented Feb 17, 2022

Hi @yangjingyi. I am getting ~76 average F1 with spanbert_large (bert_pretrained_name_or_path = SpanBERT/spanbert-large-cased), both when training a new spanbert_large model and when evaluating the model provided in this repo (https://cs.emory.edu/~lxu85/train_spanbert_large_ml0_d2.tar) with the "train_spanbert_large_ml0_d2" config, i.e. coref_depth = 2 (+AA), to match the Joshi et al. 2020 setup. Were you able to reproduce the ~79 avg F1 just by evaluating the model provided in this repo?

@lxucs do you know of any possible reason for not being able to reproduce the spanbert (+AA) scores (using the model provided here: https://cs.emory.edu/~lxu85/train_spanbert_large_ml0_d2.tar)?


sm354 commented Feb 17, 2022

It turned out that the data needs to be tokenized with "bert-base-cased" while the model is "SpanBERT/spanbert-large-cased". I am able to reproduce ~79 now. Earlier I was using "SpanBERT/spanbert-large-cased" for tokenization as well, which gave ~76.
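For context, SpanBERT was trained with BERT's cased wordpiece vocabulary rather than a vocabulary of its own, so the tokenizer and the encoder should come from different checkpoints. Here is a minimal sketch of the pairing that worked, using the Hugging Face transformers API rather than the repo's own preprocessing scripts:

```python
from transformers import AutoModel, BertTokenizer

# SpanBERT reuses BERT's cased vocabulary, so tokenize with bert-base-cased
# while loading the encoder weights from the SpanBERT checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("SpanBERT/spanbert-large-cased")

inputs = tokenizer("Barack Obama was born in Hawaii.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_wordpieces, 1024) for the large model
```

Pairing the tokenizer with the encoder's actual vocabulary matters because a mismatched tokenizer maps text to the wrong wordpiece ids, which quietly degrades scores (here, the ~3 F1 gap) instead of raising an error.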
