Proposal automatically info retrieving model #29

Open
maartensteinfort opened this issue Mar 25, 2015 · 3 comments

Comments

@maartensteinfort
Collaborator

@korsvanloon and I have devised a model that automatically retrieves and ranks results. The results are Bloomberg articles, so we stay within the Bloomberg domain. I have already discussed it with @wrvangeest.

We should start with 10-30 articles. DBpedia data is out of scope for now (apart from entity names); the first priority is retrieving Bloomberg articles. If that works, we can also process DBpedia data.

Basically the model consists of the following elements from AlchemyAPI:

  • Database with articles data [articleData]
    • Article Text [articleText]
    • URL [url]
    • Author [authorName]
    • Title [title]
    • Date [datestamp]
    • Entities [entity, relevance, type, subType]
    • Keywords [keyword, relevance]
  • Annotation Request [annotationRequest]
    • Phrase [phrase]
    • Paragraph [paragraph]
    • URL [url]
    • Author [authorName]
    • Title [title]
    • Date [datestamp]
    • Entities [entity, relevance, type, subType]
    • Keywords [keyword, relevance]
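
A minimal sketch of how the two records listed above could be represented, assuming Python dataclasses. The field names come from the list; the class names, the field types, and the optional `subType` are assumptions made for illustration and are not taken from AlchemyAPI or from our database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    entity: str
    relevance: float
    type: str
    subType: Optional[str] = None  # assumption: not every entity has a subType


@dataclass
class Keyword:
    keyword: str
    relevance: float


@dataclass
class ArticleData:
    """One stored article (articleData)."""
    articleText: str
    url: str
    authorName: str
    title: str
    datestamp: str  # assumption: kept as a plain string
    entities: List[Entity] = field(default_factory=list)
    keywords: List[Keyword] = field(default_factory=list)


@dataclass
class AnnotationRequest:
    """The user's selection plus the analysis of its paragraph (annotationRequest)."""
    phrase: str
    paragraph: str
    url: str
    authorName: str
    title: str
    datestamp: str
    entities: List[Entity] = field(default_factory=list)
    keywords: List[Keyword] = field(default_factory=list)
```

Both records share the same entity and keyword shape, which is what makes the matching in the workflow possible.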

Workflow

  1. The user selects a phrase within a paragraph

  2. The paragraph is sent to AlchemyAPI, which returns the annotationRequest elements described above

  3. The keyword and entity elements from annotationRequest are matched with the elements from articleData

    • Keywords and entities from annotationRequest are compared with the keywords and entities from articleData.
    • For each article in articleData, the relevance values of its matching keywords and entities are summed; when a term matches more than once, the highest relevance is used.
    • Thus:
    for each article in articleData {
    for each term where articleData[entity, keyword] = annotationRequest[entity, keyword] {
    add the highest articleData[relevance] for that term to the article's sum
    }
    }
    
  4. The relevance of the keywords and entities that fall within the selected phrase is multiplied by a factor X (the user selected these, so they are more relevant)

  5. Rank articles from articleData by summed relevance
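
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a settled implementation: the function names, the dictionary layout (`url -> {term: relevance}`), the exact-string matching of terms, the default value of `x`, and the sample data are all hypothetical. It also assumes that factor X applies only to terms that occur inside the selected phrase.

```python
from typing import Dict, Iterable, List, Tuple


def merge_terms(entities: Dict[str, float], keywords: Dict[str, float]) -> Dict[str, float]:
    """Combine entity and keyword relevances; keep the highest value per term."""
    merged = dict(entities)
    for term, relevance in keywords.items():
        merged[term] = max(relevance, merged.get(term, 0.0))
    return merged


def rank_articles(
    request_terms: Iterable[str],
    phrase: str,
    articles: Dict[str, Dict[str, float]],
    x: float = 2.0,
) -> List[Tuple[str, float]]:
    """Return (url, score) pairs sorted by summed relevance, highest first."""
    phrase_lower = phrase.lower()
    scores = {}
    for url, term_relevance in articles.items():
        total = 0.0
        for term in request_terms:
            relevance = term_relevance.get(term)
            if relevance is None:
                continue  # the article does not mention this term
            # Terms inside the user's selection count x times as much (step 4).
            weight = x if term.lower() in phrase_lower else 1.0
            total += weight * relevance
        scores[url] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: relevance values are made up for illustration only.
articles = {
    "article-a": merge_terms({"Schaeuble": 0.9}, {"war reparations": 0.5}),
    "article-b": merge_terms({"Schaeuble": 0.4}, {}),
    "article-c": merge_terms({"Germany": 0.3}, {"Germany": 0.2}),
}
ranking = rank_articles(
    request_terms=["Schaeuble", "war reparations", "Germany"],
    phrase="war reparations from Germany",
    articles=articles,
)
```

Keeping the per-article terms in a plain dictionary makes the lookup cheap, and `merge_terms` covers the case where the same term is returned both as an entity and as a keyword.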

We now have a list of the articles that are most 'relevant' to the requested annotation. I hope this explanation is clear...

*Next step:* Retrieve the most 'relevant' paragraph or piece of text, so that the user does not have to read the whole article.

If necessary, we can fill the database manually with the AlchemyAPI data for 10-30 articles. But Wiebe said that Herman was working on (or already has) an automatic way to store the AlchemyAPI values mentioned above in our database. @hermanbanken, is that true?

@maartensteinfort
Collaborator Author

Example

Paragraph:

Tsipras has also stepped up calls for war reparations from Germany for the Nazi occupation during World War II and Greek Finance Minister Yanis Varoufakis has been locked in a war of words with his German counterpart Wolfgang Schaeuble. Last week, the Greek government officially complained about Schaeuble’s conduct, to which Schaeuble replied that the whole matter was “absurd.”

Entities:

[Image: screenshot 2015-03-25 at 19:40:10]

Keywords:

[Image: screenshot 2015-03-25 at 19:40:33]

These are then compared with the matching entities and keywords from all other articles. For each match the highest relevance is taken, and the values are summed to rank the articles.

@maartensteinfort
Collaborator Author

@rubenverboon @hermanbanken, I'd appreciate your feedback on this

@hermanbanken
Collaborator

Sorry for the slow response here. @maartensteinfort, I guess we already discussed the first question in person when we were together. You reminded us to respond, but I fail to see the remaining question.


2 participants