This organization brings together tools developed in the Grosenick lab at Cornell that apply modern AI/ML methods (facial posture tracking, utterance embedding, and vocal prosody representation) to track affective state in one or more people recorded on video.
Our pipeline for processing video files to perform speaker diarization and extract combined multimodal features (acoustic, linguistic, and facial) is now available in the MultimodalFeatureSync repository. See that repo's README for usage and license details. The code has been alpha tested and is ready to try!
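To give a rough sense of what "combined multimodal features" means here, below is a minimal, hypothetical sketch of time-aligning per-frame facial features, frame-level acoustic features, and utterance-level linguistic features with speaker labels onto one timeline. It does not use the MultimodalFeatureSync API (see that repo's README for the actual interface); all column names and sampling rates are illustrative.

```python
# Hypothetical sketch: time-aligning multimodal feature streams with pandas.
# This is NOT the MultimodalFeatureSync API; values and columns are made up.
import pandas as pd

# Acoustic features sampled every 10 ms (e.g., pitch)
acoustic = pd.DataFrame({
    "time_s": [0.00, 0.01, 0.02, 0.03],
    "f0_hz": [120.0, 122.5, 121.0, 119.8],
})

# Facial features sampled at ~30 fps (e.g., action-unit intensity)
facial = pd.DataFrame({
    "time_s": [0.000, 0.033, 0.066],
    "au12_intensity": [0.1, 0.4, 0.6],
})

# Utterance-level linguistic features with a speaker label from diarization
utterances = pd.DataFrame({
    "time_s": [0.0],
    "speaker": ["SPEAKER_00"],
    "text_embedding_dim0": [0.12],
})

# Align everything onto the acoustic timeline (nearest earlier match per stream)
combined = pd.merge_asof(acoustic.sort_values("time_s"),
                         facial.sort_values("time_s"), on="time_s")
combined = pd.merge_asof(combined,
                         utterances.sort_values("time_s"), on="time_s")
print(combined)
```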
Coming soon: in fall 2024 we will also openly release our integrated code for embedding the data streams produced by MultimodalFeatureSync to track affect as a latent variable in individuals and dyads!