Hi,
I used the spanbert_large model with the default parameters in the config file, and I got an Avg. F1 of 78.27, lower than the Avg. F1 of 79.9 reported in the paper.
The config is as follows:
```
num_docs = 2802
bert_learning_rate = 1e-05
task_learning_rate = 0.0003
max_segment_len = 512
ffnn_size = 3000
cluster_ffnn_size = 3000
max_training_sentences = 3
bert_tokenizer_name = bert-base-cased

max_top_antecedents = 50
max_training_sentences = 5
top_span_ratio = 0.4
max_num_extracted_spans = 3900
max_num_speakers = 20
max_segment_len = 256

# Learning
bert_learning_rate = 1e-5
task_learning_rate = 2e-4
loss_type = marginalized  # {marginalized, hinge}
mention_loss_coef = 0
false_new_delta = 1.5  # For loss_type = hinge
adam_eps = 1e-6
adam_weight_decay = 1e-2
warmup_ratio = 0.1
max_grad_norm = 1  # Set 0 to disable clipping
gradient_accumulation_steps = 1

# Model hyperparameters.
coref_depth = 1  # when 1: no higher order (except for cluster_merging)
higher_order = attended_antecedent  # {attended_antecedent, max_antecedent, entity_equalization, span_clustering, cluster_merging}
coarse_to_fine = true
fine_grained = true
dropout_rate = 0.3
ffnn_size = 1000
ffnn_depth = 1
cluster_ffnn_size = 1000  # For cluster_merging
cluster_reduce = mean  # For cluster_merging
easy_cluster_first = false  # For cluster_merging
cluster_dloss = false  # cluster_merging
num_epochs = 24
feature_emb_size = 20
max_span_width = 30
use_metadata = true
use_features = true
use_segment_distance = true
model_heads = true
use_width_prior = true  # For mention score
use_distance_prior = true  # For mention-ranking score

# Other.
conll_eval_path = dev.english.v4_gold_conll  # gold_conll file for dev
conll_test_path = test.english.v4_gold_conll  # gold_conll file for test
genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
eval_frequency = 1000
report_frequency = 100
```
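A note on the duplicated keys above (max_segment_len, ffnn_size, etc.): the repo loads experiments.conf with pyhocon, where each experiment block extends a shared base block via HOCON substitution, so the values in the first group override the base values listed after them (e.g. max_segment_len = 512 wins over 256). A minimal sketch of that override behavior; the block names here are illustrative, not the repo's exact config:

```python
# Minimal pyhocon sketch of HOCON inheritance: an experiment block that
# extends a base block and overrides some of its keys.
from pyhocon import ConfigFactory

hocon = """
base {
  max_segment_len = 256
  ffnn_size = 1000
}
# ${base} pulls in every key of base; keys repeated below override it.
spanbert_large = ${base} {
  max_segment_len = 512
  ffnn_size = 3000
}
"""

config = ConfigFactory.parse_string(hocon)
print(config["spanbert_large"]["max_segment_len"])  # 512 (override wins)
print(config["spanbert_large"]["ffnn_size"])        # 3000
```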
Hi @yangjingyi. I am getting ~76 average F1 with spanbert_large (bert_pretrained_name_or_path = SpanBERT/spanbert-large-cased). I get ~76 F1 both when training a new spanbert_large model and when evaluating the model provided in this repo (https://cs.emory.edu/~lxu85/train_spanbert_large_ml0_d2.tar) with the "train_spanbert_large_ml0_d2" config, i.e. coref_depth = 2 (+AA), to match Joshi et al. 2020. Were you able to reproduce the ~79 avg F1 by just evaluating the model provided in this repo?

It turned out that the data needs to be tokenized with "bert-base-cased" while the model is "SpanBERT/spanbert-large-cased". I am able to reproduce ~79 now. Earlier I was using "SpanBERT/spanbert-large-cased" for tokenization as well, which gave ~76.
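For anyone hitting the same ~76 plateau: the fix above amounts to pairing BERT's cased wordpiece vocabulary with the SpanBERT weights, which is what bert_tokenizer_name = bert-base-cased plus bert_pretrained_name_or_path = SpanBERT/spanbert-large-cased does in the config. A standalone sketch with Hugging Face transformers (illustrative, not the repo's actual loading code):

```python
# Sketch: SpanBERT reuses BERT's cased vocabulary, so the tokenizer and
# the encoder weights come from two different pretrained names.
from transformers import AutoModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")          # bert_tokenizer_name
encoder = AutoModel.from_pretrained("SpanBERT/spanbert-large-cased")  # bert_pretrained_name_or_path

inputs = tokenizer("Alice said she would submit the paper.", return_tensors="pt")
outputs = encoder(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_wordpieces, 1024])
```

Tokenizing with the SpanBERT name instead picks up a mismatched vocabulary, which is consistent with the ~3 F1 drop reported above.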