CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming
The CS-lol dataset is released under the CC-BY-NC-SA-4.0 license. It is therefore freely available for academic or individual research, but commercial use is not permitted.
The script YouTube-vtt-to-srt.py is derived from the ytb-vtt-to-srt project, which is licensed under MPL v2.0.
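For readers unfamiliar with the two formats, the kind of conversion the script's name suggests (WebVTT to SubRip/SRT) can be illustrated with a minimal Python sketch. This is not the code of ytb-vtt-to-srt or of YouTube-vtt-to-srt.py; the function name `vtt_to_srt` is hypothetical, and the sketch assumes `hh:mm:ss.mmm` timestamps (as in YouTube captions), strips inline tags, and does not de-duplicate the rolling lines found in YouTube auto-captions.

```python
import re

# WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:03.500 align:start position:0%"
# Assumption: hours are always present (hh:mm:ss.mmm), as in YouTube caption files.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3}) --> (\d{2}:\d{2}:\d{2})\.(\d{3})")
# Inline tags in auto-generated captions, e.g. "<00:00:01.234>", "<c>", "</c>"
TAG = re.compile(r"<[^>]+>")


def vtt_to_srt(vtt_text: str) -> str:
    """Convert WebVTT caption text to SRT (simplified sketch, hypothetical helper)."""
    cues = []
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        m = TIMING.search(lines[i])
        if not m:
            # Skip the WEBVTT header, metadata lines, and blank lines.
            i += 1
            continue
        # SRT uses a comma instead of a dot before the milliseconds.
        start = f"{m.group(1)},{m.group(2)}"
        end = f"{m.group(3)},{m.group(4)}"
        i += 1
        # Collect the cue's text lines until the blank line that ends the cue.
        text_lines = []
        while i < len(lines) and lines[i].strip():
            cleaned = TAG.sub("", lines[i]).strip()
            if cleaned:
                text_lines.append(cleaned)
            i += 1
        if text_lines:
            cues.append((start, end, "\n".join(text_lines)))
    # SRT blocks: sequence number, timing line, text; separated by blank lines.
    blocks = [f"{n}\n{s} --> {e}\n{t}" for n, (s, e, t) in enumerate(cues, 1)]
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

Usage (the file name is hypothetical): `print(vtt_to_srt(open("video.en.vtt", encoding="utf-8").read()))`.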
If you use the tools or dataset developed in this work, please cite our paper:
@inproceedings{cslol,
author = {Xu, Junjie H. and Nakano, Yu and Kong, Lingrong and Iizuka, Kojiro},
title = {CS-Lol: A Dataset of Viewer Comment with Scene in E-Sports Live-Streaming},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 2023 Conference on Human Information Interaction and Retrieval},
pages = {422–426},
location = {Austin, TX, USA},
series = {CHIIR '23}
}
According to Twitch's official Terms of Service:
8. User Content
Twitch allows you to distribute streaming live and pre-recorded audio-visual works;
to use services, such as chat, bulletin boards, forum postings, wiki contributions,
and voice interactive services; and to participate in other activities
in which you may create, post, transmit, perform, or store content,
messages, text, sound, images, applications, code, or other data
or materials on the Twitch Services (“User Content”).
Following the terms above, we conclude that using user comments (chat messages posted while watching a live stream) is license-free. Moreover, since constructing the dataset takes time and different construction processes may lead to different results, we distribute the raw dataset used in this work.
All content in this dataset relies on subtitles automatically generated by YouTube's built-in ASR (automatic speech recognition) system. However, YouTube itself does not seem to hold a license for these subtitles, since they are a transformation of other people's work, namely the commentators' (impromptu) commentary.
Because the license status is unclear, we instead provide a script to download these subtitles, along with a preprocessing script that transforms them into the format expected by our implementation of the retrieval models.
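As a hypothetical illustration of what such preprocessing can involve, the sketch below parses SRT subtitles into timed cues and looks up the cue active at a given moment, e.g. when a viewer comment was posted. The actual output format is defined by the repository's own preprocessing script, not by this sketch. The names `Cue`, `parse_srt`, and `cue_at` are hypothetical, and comment times are assumed to be offsets in seconds on the same timeline as the subtitles.

```python
import bisect
import re
from dataclasses import dataclass
from typing import Optional

# SRT timestamp, e.g. "00:01:02,345"
SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(ts: str) -> float:
    """Convert an SRT timestamp to seconds."""
    h, m, s, ms = map(int, SRT_TIME.fullmatch(ts.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into cues sorted by start time (hypothetical helper)."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Expect: sequence number, timing line, then one or more text lines.
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = (to_seconds(t) for t in lines[1].split("-->"))
        cues.append(Cue(start, end, " ".join(lines[2:])))
    return sorted(cues, key=lambda c: c.start)


def cue_at(cues: list[Cue], t: float) -> Optional[Cue]:
    """Return the cue covering time t (hypothetical helper), or None."""
    starts = [c.start for c in cues]
    # Index of the last cue starting at or before t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and cues[i].start <= t < cues[i].end:
        return cues[i]
    return None
```

`cue_at` uses binary search over start times, so each lookup is O(log n). It only checks the cue with the latest start at or before `t`, so overlapping cues, which are common in YouTube auto-captions, may need a different matching strategy.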
To obtain the descriptions, please follow the instructions in README.md.
If you have a request or question, feel free to open an issue here, email jhxu dotto acm.org, or DM me on Twitter.
Thank you for your attention!