
How to generate "the preprocessed TED dataset" #62

Open
EmmaThompson123 opened this issue Jan 2, 2025 · 3 comments

@EmmaThompson123
How can I generate "the preprocessed TED dataset" for new videos? Is it enough to execute the steps in https://github.com/youngwoo-yoon/youtube-gesture-dataset up to make_ted_dataset.py?

@youngwoo-yoon (Collaborator) commented Jan 3, 2025

Yes, that's correct. You need to review the results of each step to make sure the scripts work well with the new videos.
And you need to convert 2D poses to 3D poses. I used VideoPose3D for this.
Or it might be better to use a more recent 3D pose estimator.
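
For illustration only, here is a minimal sketch of that lifting step. It assumes the pickle files written by make_ted_dataset.py, and lift_2d_to_3d is a hypothetical placeholder standing in for VideoPose3D (or whichever 3D pose estimator you choose):

    import pickle

    def lift_2d_to_3d(skeleton_2d):
        # Placeholder: run VideoPose3D (or another 3D pose estimator) on the
        # 2D keypoints of one frame and return the corresponding 3D keypoints.
        raise NotImplementedError

    with open('ted_gesture_dataset_train.pickle', 'rb') as f:
        dataset = pickle.load(f)

    for video_entry in dataset:
        for clip in video_entry['clips']:
            # keep the original 2D poses and add 3D poses under the key
            # expected later by data_preprocessor.py
            clip['skeletons_3d'] = [lift_2d_to_3d(s) for s in clip['skeletons']]

    with open('ted_gesture_dataset_train_3d.pickle', 'wb') as f:
        pickle.dump(dataset, f)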

@EmmaThompson123 (Author)

It seems https://github.com/youngwoo-yoon/youtube-gesture-dataset cannot generate the ted_db. I checked make_ted_dataset.py, and it only stores these key-value pairs:

                    dataset[-1]['clips'].append({'words': clip_word_list,
                                                 'skeletons': clip_skeleton,
                                                 'start_frame_no': start_frame_no, 'end_frame_no': end_frame_no,
                                                 'vid': vid
                                                 })

I also checked data_preprocessor.py in Gesture-Generation-from-Trimodal-Context and found that it needs these keys:

    def _sample_from_clip(self, vid, clip):
        clip_skeleton = clip['skeletons_3d']
        clip_audio = clip['audio_feat']
        clip_audio_raw = clip['audio_raw']
        clip_word_list = clip['words']
        clip_s_f, clip_e_f = clip['start_frame_no'], clip['end_frame_no']
        clip_s_t, clip_e_t = clip['start_time'], clip['end_time']

As you can see, they do not match. And the files under the ted_db folder look like this:
[screenshot of the files in the ted_db folder]

But from the code of make_ted_dataset.py, it seems to save files like this:

    print('writing to pickle...')
    with open('ted_gesture_dataset_train.pickle', 'wb') as f:
        pickle.dump(dataset_train, f)
    with open('ted_gesture_dataset_train_small.pickle', 'wb') as f:  # for debugging
        pickle.dump(dataset_train[0:10], f)
    with open('ted_gesture_dataset_val.pickle', 'wb') as f:
        pickle.dump(dataset_val, f)
    with open('ted_gesture_dataset_test.pickle', 'wb') as f:
        pickle.dump(dataset_test, f)

So I doubt whether https://github.com/youngwoo-yoon/youtube-gesture-dataset can generate files in the same format as ted_db.
My scenario is: take a folder of videos as input, extract some features (such as audio mel spectrograms, poses, etc.), and feed them into the evaluation code at https://github.com/alvinliu0/HA2G/blob/5e1fd3343db7aa587db68a78397a1bbfea165132/scripts/synthesize_hierarchy.py#L450-L458 to compute metric values.

@youngwoo-yoon (Collaborator) commented Jan 7, 2025

Sorry for the confusion. The two formats are similar but not exactly the same, so you need to update the code to fill in the missing information. You need to run a 3D pose estimator to get 'skeletons_3d'. The other keys, 'audio_raw' and the start/end times derived from the frame numbers, are straightforward. 'audio_feat' was not used in the final model of the trimodal paper.
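
As a rough sketch of that updating step (not the authors' actual preprocessing code): assuming a fixed frame rate FPS, a hypothetical load_audio_segment helper, and that 'skeletons_3d' has already been filled in by the 3D pose estimator, the remaining keys could be added like this. The augmented clips would then still need to be written into whatever container format data_preprocessor.py reads.

    import pickle

    FPS = 25  # assumption: frame rate used when the 2D poses were extracted

    def load_audio_segment(vid, start_time, end_time):
        # Placeholder: load the raw audio samples of this clip from the
        # downloaded video, e.g. with librosa.load() plus slicing.
        raise NotImplementedError

    with open('ted_gesture_dataset_train_3d.pickle', 'rb') as f:
        dataset = pickle.load(f)

    for video_entry in dataset:
        for clip in video_entry['clips']:
            # derive clip times from the stored frame numbers
            clip['start_time'] = clip['start_frame_no'] / FPS
            clip['end_time'] = clip['end_frame_no'] / FPS
            clip['audio_raw'] = load_audio_segment(clip['vid'],
                                                   clip['start_time'],
                                                   clip['end_time'])
            clip['audio_feat'] = None  # not used by the final trimodal model

    with open('ted_gesture_dataset_train_with_db_keys.pickle', 'wb') as f:
        pickle.dump(dataset, f)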
