How I fixed many errors in my png annotation import #224
what about |
At one point, I was not getting the same results as Lucas. To get the same results, I needed to make sure everything was set up the same way, namely:
I've now received a new version of the annotation, and in this one the tier that needs to be imported is called "final". Otherwise, the names are the same as before, so I should be able to reuse the import scripts to overwrite raw & converted, just by changing which tier is analyzed. I found two scripts:
To execute this script, I had to unlock the files:
Prep work
Before starting, I make sure I have the latest version and create a new branch. Png is special because it's shared between LAAC and EL1k. The idea is that EL1k users won't update png: it will be maintained by us, and some of the updates will be shared with EL1k. We therefore work on LAAC-LSCP's version and push some of the work to EL1k.
So I need to get the laac version. I download it:
And right now, given the state of affairs, I need to get on the eaf branch, and create a branch from there:
The first command threw:
so to be safe I did git log and compared the history with that online.
Mis-named annotation path
To begin with, I mis-named the path: right now it's mc, rather than eaf/mc.
I followed renaming annotations, which in this case meant:
This more complex version of push is only necessary because it's my first push since I created the branch.
Finding which file is missing and why
There are 24 .eaf files but only 23 were imported:
This was because I had made a mistake and assigned the same section of the audio to one of the files. I fixed it by correcting the assignment table.
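For anyone hitting the same 24-versus-23 mismatch, here is a minimal sketch of how the missing file could be located. It assumes that metadata/annotations.csv has a "set" column and a "raw_filename" column, and that the raw files sit in a folder such as annotations/eaf/mc/raw; the function name is made up for illustration and is not part of any package.

```python
import csv
from pathlib import Path


def find_unimported(raw_dir, index_csv, set_name, column="raw_filename"):
    """Return names of .eaf files in raw_dir that have no row in index_csv.

    Assumption: index_csv has a "set" column and a column (default
    "raw_filename") holding the name of the raw file each row came from.
    """
    on_disk = {p.name for p in Path(raw_dir).iterdir() if p.name.endswith(".eaf")}
    with open(index_csv, newline="") as handle:
        indexed = {
            Path(row[column]).name
            for row in csv.DictReader(handle)
            if row.get("set") == set_name and row.get(column)
        }
    # Files present on disk but never registered in the index
    return sorted(on_disk - indexed)
```

Calling something like find_unimported("annotations/eaf/mc/raw", "metadata/annotations.csv", "eaf/mc") would then list the file that did not make it into the index. It does not say why the file was skipped, so the assignment table still has to be checked by hand.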
Contents of files from the same child are identical
This is clearly a bug in the import script.
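One way to confirm this kind of problem without opening every file is to hash the converted files and group the ones with identical bytes. This is only a sketch: it assumes the converted annotations are .csv files in a single folder, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_files(folder, suffix=".csv"):
    """Return lists of file names in folder whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        # Skip directories and pointers whose content is not available
        if path.name.endswith(suffix) and path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only the groups that contain more than one file
    return [names for names in by_digest.values() if len(names) > 1]
```

An empty list means no two files in the folder have the same content. Any non-empty group is worth comparing against the child IDs in the file names.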
Purging duplicates from annotations.csv
The safest way to fix this is to remove all the annotations, fix all errors and import again.
I did that and then looked at the number of lines in annotations.csv (wc -l metadata/annotations.csv) and the number of files in each of the converted folders (ls annotations/*/converted/ | wc -l).
There are 309 lines in annotations.csv but only 206 files in annotations/*/converted/. This could mean there's some extra stuff in annotations.csv. The extra lines are vtc or alice, so they should not affect my current analyses. Plus, there can be extra lines related to how vtc and alice are imported, which means that I shouldn't just look at the number of lines. Instead, I can do things like validating the project (which would detect duplicates if there are any). So this is NOT an issue in the end.
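Since the raw totals are hard to compare, a per-set breakdown can be easier to read. The sketch below counts index rows per set and files per converted folder. It assumes annotations.csv has a "set" column and that each set keeps its files in annotations/<set>/converted, possibly nested as with eaf/mc; both function names are made up for illustration.

```python
import csv
from collections import Counter
from pathlib import Path


def rows_per_set(index_csv):
    """Count index rows per annotation set (assumes a "set" column)."""
    with open(index_csv, newline="") as handle:
        return Counter(row["set"] for row in csv.DictReader(handle))


def files_per_set(annotations_dir):
    """Count entries in every converted/ folder, keyed by the set's relative path."""
    root = Path(annotations_dir)
    counts = Counter()
    for converted in root.rglob("converted"):
        if converted.is_dir():
            set_name = converted.parent.relative_to(root).as_posix()
            # Count by name so that unfetched pointers are included too
            counts[set_name] = sum(
                1 for p in converted.iterdir() if not p.name.startswith(".")
            )
    return counts
```

Two things to keep in mind when comparing with the shell commands: wc -l also counts the header line of the CSV, and a glob like annotations/*/converted/ only matches sets one level deep, so a nested set such as eaf/mc would not be included.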
I then made edits to my import_eaf.py script and tried the import again:
One bug that took me a while to find was:
That was weird, since:
(ChildProjectVenv) (base) Mac-mini:png2019 acristia$ ls annotations/eaf/mc/raw/c23_20190822_2015.eaf
but this is just a pointer! So I needed to remember to do:
and then try my script:
This gave me some warnings like:
I also opened a couple of the converted files, and they looked correct.
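The "just a pointer" surprise above is easy to miss because ls lists the file either way. Assuming the dataset stores annexed files as symlinks (the usual git-annex behaviour on file systems that support them), a file whose content has not been retrieved shows up as a symlink with a missing target. A small check along these lines, with a hypothetical function name, can flag such files before running an import:

```python
from fnmatch import fnmatch
from pathlib import Path


def unfetched_pointers(folder, pattern="*.eaf"):
    """Return names of symlinks in folder whose target does not exist.

    Assumption: annexed files are symlinks, so a dangling symlink means the
    file content has not been retrieved yet.
    """
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if fnmatch(p.name, pattern) and p.is_symlink() and not p.exists()
    )
```

If the list is non-empty, the content of those files needs to be retrieved before the import script can read them.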
I then checked that annotations are functional within the package:
This showed errors like:
That means the .its need to be re-imported.
I tried
which yielded
This is because I had forgotten to get the raw its files, so
I got some errors and warnings that I think should be ignored and may be changed in the future (see #229).
In a nutshell, at this point, my annotations did not pass validation. I hesitated about whether to push (which would allow others to see what I had done and help me fix mistakes). @lucasgautheron said "In general, it is fine to push code that does not pass the tests in a feature branch as long as the master branch and tagged commits (i.e. releases) pass the tests". Since I'm working on my own branch, I pushed it with datalad save . -m "cleaning" followed by datalad push. I wish I hadn't done that, since it saves everything (including the annotations) and that takes a LONG time. Instead, I should have done:
Notice that this will update metadata and scripts, but not annotations/.