CS-lol

CS-lol: a Dataset of Viewer Comment with Scene in E-sports Live-streaming

License

The CS-lol dataset is released under CC-BY-NC-SA-4.0. It is therefore freely available for academic purposes and individual research, but commercial use is restricted.

The source code YouTube-vtt-to-srt.py is inherited from the ytb-vtt-to-srt project, which follows the MPL v2.0 license.

Citation

If you use the tools or the dataset developed in this work, please cite our paper:


@inproceedings{cslol,
author = {Xu, Junjie H. and Nakano, Yu and Kong, Lingrong and Iizuka, Kojiro},
title = {CS-Lol: A Dataset of Viewer Comment with Scene in E-Sports Live-Streaming},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 2023 Conference on Human Information Interaction and Retrieval},
pages = {422–426},
location = {Austin, TX, USA},
series = {CHIIR '23}
}

About usage of CS-lol

Twitch (Comments)

According to the official terms published by Twitch:

8. User Content
Twitch allows you to distribute streaming live and pre-recorded audio-visual works;
to use services, such as chat, bulletin boards, forum postings, wiki contributions, 
and voice interactive services; and to participate in other activities 
in which you may create, post, transmit, perform, or store content,
messages, text, sound, images, applications, code, or other data 
or materials on the Twitch Services (“User Content”).

Following the aforementioned terms, we conclude that the use of user comments (chat posted while watching a live stream) is license-free. Moreover, because constructing the dataset takes time and different construction processes might yield different results, we distribute the raw dataset used in this work.

YouTube (Descriptions)

All descriptions used in this dataset come from subtitles that are automatically generated by the ASR system integrated in YouTube. However, YouTube itself does not seem to hold the license for those subtitles, because they are a transformation of other people's creative work, namely (impromptu) transcripts of what the commentators say.

Because we are not sure about the license, we instead provide a script to download those subtitles, as well as a preprocessing script that transforms them into the data format expected by our implementation of the retrieval models.

To get the descriptions, please follow README.md.
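As a rough illustration of the preprocessing step described above, the following minimal sketch turns SRT-formatted subtitles into JSON Lines records. It is not the repository's srt_to_jsonl.py: the function names (`parse_srt`, `srt_to_jsonl`), the record fields (`id`, `start`, `end`, `contents`), and the sample captions are all assumptions made for this example, so the actual scripts may use a different layout.

```python
import json
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(text):
    """Yield (start, end, caption) tuples from SRT-formatted text."""
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the caption text.
                caption = " ".join(lines[i + 1:])
                if caption:
                    yield match.group(1), match.group(2), caption
                break


def srt_to_jsonl(text, video_id):
    """Return one JSON line per cue (field names are hypothetical)."""
    records = []
    for n, (start, end, caption) in enumerate(parse_srt(text)):
        record = {
            "id": f"{video_id}_{n}",  # assumed identifier scheme
            "start": start,
            "end": end,
            "contents": caption,
        }
        records.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(records)


# Made-up sample subtitles, used only to exercise the functions.
sample = """1
00:00:01,000 --> 00:00:04,000
welcome back to the rift

2
00:00:04,500 --> 00:00:07,250
first blood goes
to the blue side
"""

if __name__ == "__main__":
    print(srt_to_jsonl(sample, "demo"))
```

Multi-line captions are joined into a single string so that each cue becomes exactly one record; whether the provided scripts do the same should be checked against README.md.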

Need help?

If you have a request or a question, feel free to open an issue here, send an email to jhxu dotto acm.org, or DM me on Twitter.

Thank you for your attention!
