Extracting Table of Contents (TOCs) for Articles #3
Comments
@mgns, I am interested in working on this project. Could you please guide me on how to get started? |
I added a warm-up task to this idea. There are two main approaches to solving this task:
As the first one is the more straightforward solution, you should familiarize yourself with the extraction framework. When writing your proposal, you will have to describe your suggested solution to the problem. |
Hello @mgns I'm also interested in working on this. When I am done with the warm up task, how should I let you know about my progress? |
Simply summarize your findings in a Google Doc and share it with me. |
@mgns could you give me your Gmail address? I want to share the result of the warm-up task. |
Hi @khikmatullaev , |
Just share it to: [email protected]
Thanks!
… On 14.03.2018 at 15:22, Akmal Khikmatullaev ***@***.***> wrote:
@mgns could you give me your Gmail address? I want to share the result of the warm-up task.
By the way, I could not find out how to join the DBpedia Slack chat. Could you give me instructions?
|
Just invite me: [email protected]
Thanks!
|
Hi @mgns, I read the project description and I am interested in working on it. I just wanted to clarify a few questions I had. Thanks! |
The project should first extract a TOC for each article in Wikipedia. The TOC should contain all headings and subheadings of the article, each with its label and its position in the order. |
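To make the "label and order" requirement concrete, here is a minimal sketch that reads headings straight from raw wikitext. It is not the extraction framework's actual approach. The names `TocEntry` and `extract_toc` are made up for illustration, and the sketch assumes headings use the plain `== Heading ==` syntax, so it ignores templates and HTML headings.

```python
import re
from typing import List, NamedTuple

# Matches wikitext headings such as "== History ==" or "=== Early history ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


class TocEntry(NamedTuple):
    """Hypothetical record for one TOC entry (not a DBpedia type)."""
    number: str  # hierarchical position, e.g. "2.1"
    level: int   # 1 for "==", 2 for "===", and so on
    label: str   # heading text as written in the article


def extract_toc(wikitext: str) -> List[TocEntry]:
    counters: List[int] = []  # one running counter per nesting level
    entries: List[TocEntry] = []
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match is None:
            continue
        level = len(match.group(1)) - 1
        # If an article skips a level, attach the heading one level deeper
        # than its nearest ancestor instead of leaving a gap.
        level = min(level, len(counters) + 1)
        del counters[level:]          # leaving deeper levels resets them
        if len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(c) for c in counters)
        entries.append(TocEntry(number, level, match.group(2)))
    return entries


if __name__ == "__main__":
    sample = (
        "Intro text\n"
        "== History ==\n"
        "=== Early history ===\n"
        "=== Modern era ===\n"
        "== Geography ==\n"
    )
    for entry in extract_toc(sample):
        print(entry.number, entry.label)
```

The hierarchical number keeps both the order and the nesting of each entry, which is the information a later RDF export would need.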
Description
Each Wikipedia article is structured by headings and subheadings. These structures indicate the relevance of certain aspects for the described entity. Extracting such data can help in categorizing the entities and the facts about them. For example, cities usually have sections on History, Geography and Demographics, while soccer clubs have sections on Honours, Players and Stadiums. There are pitfalls, however: these sections are not uniformly captioned, so an alignment between variations (ideally to DBpedia resources) would be helpful. The newly created dataset should follow Linked Data principles, e.g. a sufficiently expressive vocabulary should be used to describe TOCs (ideally as resources), the order of TOC entries, etc.
Optionally, it would be interesting to apply the dataset for a meaningful application, e.g. generating missing types.
Goals
Extract TOCs from article pages and produce an RDF dataset describing the article TOCs in a comprehensive way.
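As a rough illustration of what such an RDF dataset could look like, the sketch below serializes an ordered list of heading labels as N-Triples. Everything specific here is an assumption: the `http://example.org/toc#` vocabulary, the `hasTocEntry` and `position` properties, and the `__toc_N` IRI pattern are placeholders, not an existing DBpedia vocabulary. A real solution would pick or define a proper vocabulary in the proposal.

```python
from typing import Iterable, List

# Hypothetical namespace; the real vocabulary is still to be chosen.
EX = "http://example.org/toc#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def toc_to_ntriples(article_iri: str, labels: Iterable[str]) -> List[str]:
    """Emit three triples per TOC entry: link, label, and position."""
    lines: List[str] = []
    for position, label in enumerate(labels, start=1):
        # Placeholder IRI pattern for the TOC entry resource.
        entry_iri = f"{article_iri}__toc_{position}"
        lines.append(f"<{article_iri}> <{EX}hasTocEntry> <{entry_iri}> .")
        lines.append(f'<{entry_iri}> <{RDFS_LABEL}> "{escape_literal(label)}" .')
        lines.append(
            f'<{entry_iri}> <{EX}position> "{position}"^^<{XSD_INTEGER}> .'
        )
    return lines


if __name__ == "__main__":
    triples = toc_to_ntriples(
        "http://dbpedia.org/resource/Berlin",
        ["History", "Geography", "Demographics"],
    )
    print("\n".join(triples))
```

Modelling each entry as its own resource with an explicit position keeps the order queryable and leaves room to later align entry labels with DBpedia resources.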
Impact
A new dataset that can be used in various ways. Insights into aspects of DBpedia entities.
Warm up tasks
Mentors
Magnus
Keywords
extraction