Dear experts,
Thanks for building the model for Chinese.
When I tried to use your model to calculate semantic similarity (see below):
import torch
from torch.nn.functional import cosine_similarity
from transformers import BertTokenizerFast, AutoModel

tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
albert_model = AutoModel.from_pretrained('ckiplab/albert-tiny-chinese')

def encode_text(text):
    text_code = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    input_ids = text_code['input_ids']
    attention_mask = text_code['attention_mask']
    token_type_ids = text_code['token_type_ids']
    print('input_ids', input_ids)
    print('attention_mask', attention_mask)
    print('token_type_ids', token_type_ids)
    with torch.no_grad():
        output = albert_model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
    embed = output.pooler_output
    return embed

cs1 = cosine_similarity(encode_text('蘋果'), encode_text('鳳梨'))  # "apple" vs. "pineapple" (related)
cs2 = cosine_similarity(encode_text('蘋果'), encode_text('塑膠'))  # "apple" vs. "plastic" (unrelated)
I expected cs1 > cs2, but that is not the case. How should I interpret results where unrelated words score higher similarity than related ones? What can I do to improve the semantic-relatedness results from your model? Thanks!
Sincerely,
Veda
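[Editor's note] One thing that may help, offered as a sketch rather than the model authors' recommendation: pooler_output comes from a pooler head that is generally not trained for semantic similarity, so mean-pooling the token embeddings in last_hidden_state (masked by attention_mask) is a common alternative. A minimal, self-contained helper with a toy tensor so the masking behavior can be checked without downloading the model:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1), avoid div-by-zero
    return summed / counts

# Toy check: the padded position (last row) must not affect the result.
h = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]])
m = torch.tensor([[1, 1, 0]])
print(mean_pool(h, m))  # averages only the first two token vectors -> [[2.0, 3.0]]
```

In the script above, this would replace `embed = output.pooler_output` with `embed = mean_pool(output.last_hidden_state, attention_mask)`; whether it improves cs1 vs. cs2 for these particular word pairs would need to be verified empirically.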