NaBIC 2012 presentation
Slides of our presentation "What's going on out there right now? A beehive based machine to give snapshots of the ongoing stories on the Web", presented at the NaBIC 2012 conference, Mexico City, Mexico.
Published on: Mar 3, 2016
Transcripts - NaBIC 2012 presentation
What's going on out there right now? A beehive based machine to give snapshots of the ongoing stories on the Web
Štefan Sabo and Pavol Návrat
email@example.com, firstname.lastname@example.org
General overview
• A method to extract story-related keywords from news articles is proposed.
• Multiple agents inspired by honey bees foraging for food are used.
• Connections between articles are explored one keyword at a time.
• The most promising keywords, those that provide links between articles, are propagated; uninteresting keywords are discarded.
Outline of presentation
• Motivation
• Method overview
• Results
• Summary
• Future work
Motivation
• News stories are often represented by terms that identify the story by providing an easily recognizable label for it.
• These keywords are useful for navigating the space of news stories.
• It is difficult to predict in advance which articles will develop into stories over time and which keywords will represent them.
• A dynamic system is needed to follow new articles and account for changes in the old ones.
• A corpus of all the articles is unavailable.
Method overview
• The most representative keywords are chosen by comparing the relevance of multiple articles to a given keyword.
• If two articles are both relevant to a keyword, a link is established between them.
• Keywords that provide links between the most articles are selected as the most interesting.
• Comparing every pair of articles with regard to every keyword would be impractical.
• To make the comparison feasible, it is performed by a swarm of agents inspired by honey bees.
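
The linking idea above can be sketched as the exhaustive baseline that the swarm is meant to avoid. This is only an illustration under stated assumptions: the slides do not define how relevance is measured, so the `relevance` function (keyword share of an article's words), the `threshold` value, the ranking by link count and the sample articles are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def relevance(article: str, keyword: str) -> float:
    """Hypothetical relevance: share of the article's words equal to the keyword."""
    words = article.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0


def linked_articles(articles, keyword, threshold=0.05):
    """Return index pairs of articles that are both relevant to the keyword."""
    relevant = [i for i, text in enumerate(articles)
                if relevance(text, keyword) >= threshold]
    return list(combinations(relevant, 2))


def rank_keywords(articles, keywords, threshold=0.05):
    """Exhaustive baseline: rank keywords by the number of links they create."""
    counts = Counter({k: len(linked_articles(articles, k, threshold))
                      for k in keywords})
    return counts.most_common()


# Made-up sample articles, for illustration only.
articles = [
    "syria rebels fight in aleppo",
    "aleppo shelling continues in syria",
    "apple and samsung meet in court",
    "syria talks stall",
]
ranking = rank_keywords(articles, ["syria", "aleppo", "apple"])
```

Every keyword is checked against every article here, which is exactly the cost that grows too quickly for a live news stream and motivates delegating the comparisons to agents.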
Method overview - agents
• Every agent carries a single keyword at a time and can independently perform one of 3 actions:
  o foraging – comparing articles
  o dancing – propagating its current keyword
  o observing – selecting a new keyword
• Based on the keyword quality, an agent may decide to propagate an interesting keyword through dancing or select a new keyword through observation.
• This mechanism focuses the swarm on the most interesting keywords for currently visited articles.
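
A minimal sketch of the three actions, assuming details the slides leave open: articles are modelled as sets of words, keyword quality is 1.0 when two randomly chosen articles both contain the keyword and 0.0 otherwise, and an observing agent copies a keyword advertised in the previous round. The names `Agent`, `forage`, `step` and `run_swarm` are hypothetical and not taken from the presentation.

```python
import random
from collections import Counter


class Agent:
    """One bee-like agent carrying a single keyword (illustrative sketch)."""

    def __init__(self, keyword, rng):
        self.keyword = keyword
        self.rng = rng

    def forage(self, articles):
        """Compare two random articles; quality is 1.0 if both contain the keyword."""
        a, b = self.rng.sample(articles, 2)
        return 1.0 if self.keyword in a and self.keyword in b else 0.0

    def step(self, articles, advertised, vocabulary):
        """Forage, then either dance (return True) or observe (return False)."""
        if self.rng.random() < self.forage(articles):
            return True  # dancing: keep and advertise the current keyword
        # Observing: adopt an advertised keyword, or a random one if none exists.
        self.keyword = self.rng.choice(advertised or vocabulary)
        return False


def run_swarm(articles, n_agents=20, rounds=50, seed=0):
    """Run the swarm and count how often each keyword was danced for."""
    rng = random.Random(seed)
    vocabulary = sorted(set().union(*articles))
    agents = [Agent(rng.choice(vocabulary), rng) for _ in range(n_agents)]
    advertised, dances = [], Counter()
    for _ in range(rounds):
        dancing = [a.keyword for a in agents
                   if a.step(articles, advertised, vocabulary)]
        dances.update(dancing)
        advertised = dancing  # visible to observers in the next round
    return dances
```

Calling `run_swarm` on a list of word sets returns a `Counter` of danced keywords. In this sketch a keyword that occurs in only one article can never be danced for, because a dance requires two articles that share it, so the swarm's attention drifts toward keywords that link articles.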
Results
• News articles from the Reuters web page were checked daily over a period of 9 days.
• 298 unique keywords were identified.
• On average, 287 articles were assigned a keyword every day.
• An increased prevalence of proper nouns among the top keywords can be noted.
Results – best keywords
keyword    n(k)     n(k)/N     keyword    n(k)     n(k)/N
Syria      177.30   6.87 %     court      49.90    1.93 %
Egypt      98.10    3.80 %     ECB        49.85    1.93 %
Apple      92.65    3.59 %     attack     49.41    1.91 %
Afghan     78.23    3.03 %     Colorado   41.79    1.62 %
Euro       75.50    2.92 %     trial      28.90    1.12 %
shooting   56.32    2.18 %     Libor      27.75    1.07 %
Samsung    55.71    2.16 %     murder     26.38    1.02 %
China      55.30    2.14 %     Aleppo     25.31    0.98 %
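
The slides do not define n(k) or N. Assuming the last column is simply n(k) divided by a common total N, that total can be recovered from any row as a quick consistency check. The helper name `implied_total` is hypothetical; the figures are the first four rows of the table.

```python
# (n(k), share in percent) for the first four rows of the table.
rows = {
    "Syria": (177.30, 6.87),
    "Egypt": (98.10, 3.80),
    "Apple": (92.65, 3.59),
    "Afghan": (78.23, 3.03),
}


def implied_total(n_k: float, share_percent: float) -> float:
    """Solve share = n(k) / N for N, assuming that is how the column was computed."""
    return n_k / (share_percent / 100.0)


estimates = {keyword: implied_total(*values) for keyword, values in rows.items()}
```

Each of these rows gives an N of about 2,580, which suggests the shares were computed against a single total.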
Results – development over time
[Figure: line chart tracking the keywords Colorado, China, shooting, Afghan, Egypt, Apple, Euro and Syria over the dates 4.8. to 12.8.; vertical axis from 0 to 120.]
Summary
• The proposed approach uses agents inspired by honey bees foraging for food to extract story-related keywords from a set of news articles.
• Articles are compared and their proximity is evaluated multiple times with regard to various keywords.
• To reduce the number of comparisons performed, agents use the mechanisms of propagation and observation to select the best keywords and discard the less desirable ones.
• The dynamic nature of the process enables agents to react to new articles as well as to changes in the old ones, without the need for an article corpus or machine learning.
Future work
• Multi-level hierarchical grouping of keywords based on their generality.
• Visualization of stories.