md5sum mismatch #1

senwu · 2020-08-07T22:55:54Z

Thanks for sharing this awesome work. I really appreciate it!

I want to use this revised TACRED dataset for my study, but my MD5 checksums don't match the ones listed in the README.

Here are my MD5 checksums:

1c090c0e3861d6ecccfd199fdf439bed  train.json
393e7200a63ffd10a16072cbbee464dd  dev.json
d287fb2377747b74e6feae2e2bcd9264  dev_rev.json
aba500ef2f60c32bc41e366383e8cda8  test.json
4c9dfcb4c8d523420dbf0f34858362f3  test_rev.json

Also, from the patch files, I found 1590 samples for dev and 936 samples for test. (These numbers don't seem to match the ones reported in the paper?)

Please let me know if I am doing anything wrong. Thanks!

ChristophAlt · 2020-08-11T08:45:27Z

@senwu Thanks for your interest in our work.

Let's see if we can narrow down the issue.

The checksums of the original TACRED files (train.json, dev.json, and test.json) match, so those are fine. Could you tell me a little more about your setup, e.g., operating system and Python version? It could be that writing JSON behaves differently on different platforms, e.g., with respect to line endings.
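To illustrate why identical data can still produce different checksums, here is a minimal sketch. The sample record and the serialization variants are made up for illustration and are not taken from the patch script. One variant mimics Python 2's documented behaviour of leaving trailing whitespace when `indent` is set, but whether that is what happens here is an assumption, not a confirmed cause.

```python
import hashlib
import json


def md5_of_text(text):
    """Return the MD5 hex digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical record, only loosely shaped like a TACRED sample.
sample = [{"id": "example-0", "token": ["café", "opened"], "relation": "no_relation"}]

# The same data written with different (all valid) JSON serialization choices.
variants = {
    "default": json.dumps(sample),
    "compact": json.dumps(sample, separators=(",", ":")),
    "unescaped": json.dumps(sample, ensure_ascii=False),
    "indented_lf": json.dumps(sample, indent=2),
    "indented_crlf": json.dumps(sample, indent=2).replace("\n", "\r\n"),
    # Mimics the old default item separator ", " combined with indent,
    # which leaves trailing spaces at line ends.
    "indented_trailing_space": json.dumps(sample, indent=2, separators=(", ", ": ")),
}

digests = {name: md5_of_text(text) for name, text in variants.items()}

for name, digest in digests.items():
    print(digest, name)
```

Every variant parses back to the same Python object, yet each one hashes differently, so a checksum mismatch alone does not show that the content differs.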

Your observation is correct: there are fewer samples in the patch files than reported in the paper. The reason is that the number of "revised" samples also includes those that were assigned a second label by our annotators. As the TACRED format does not support multiple labels per sample, we chose not to patch those instances.

liviosoares · 2020-09-29T13:59:12Z

Just wanted to report, in case it's helpful, that I originally had the same MD5 checksum problem as @senwu when using Python 2. When running with Python 3, the checksums were consistent with the ones published by @ChristophAlt.

I have not spent time identifying whether the problem is just JSON formatting differences or whether there are other, potentially important content differences.
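For anyone who wants to check this, a rough sketch of one way to separate the two cases: compare the files byte for byte via MD5, then compare the parsed JSON. The helper names and the temporary files in `demo()` are hypothetical. In practice the two paths would point at the same revised file generated under each Python version.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path):
    """MD5 hex digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_json(path):
    """Parse a UTF-8 encoded JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def compare_json_files(path_a, path_b):
    """Return (same_bytes, same_content) for two JSON files."""
    same_bytes = file_md5(path_a) == file_md5(path_b)
    same_content = load_json(path_a) == load_json(path_b)
    return same_bytes, same_content


def demo():
    """Write the same made-up records with two formattings and compare them."""
    records = [{"id": "example-0", "relation": "no_relation"}]
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.json")
        path_b = os.path.join(tmp, "b.json")
        with open(path_a, "w", encoding="utf-8") as handle:
            json.dump(records, handle)
        with open(path_b, "w", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
        return compare_json_files(path_a, path_b)


print(demo())
```

If `same_bytes` is false but `same_content` is true, the mismatch is only in formatting. If both are false, the files differ in actual content and would need a closer look.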

ChristophAlt · 2020-10-02T15:58:21Z

@liviosoares Thank you! Your feedback is very much appreciated. I'll try to identify the root cause of the problem.
