HistoGraph is an application for the graph-based exploration and crowd-based indexation of multimedia collections. It treats multimedia collections as networks. The underlying assumption is simple: if two people are mentioned together in a document, they may have something to do with each other. Whether such a relationship is interesting is in the eye of the beholder. Co-occurrence networks become huge and unwieldy very quickly, which forces us to filter them based on a second simple assumption: the more often entities co-occur, the more likely it is that they have a meaningful relationship. We combine these two assumptions with mathematical models (co-occurrence frequencies weighted by tf-idf specificity and Jaccard distances), which allow us to rank the list of co-occurrences.
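The ranking idea can be illustrated with a minimal Python sketch. It is not HistoGraph's actual implementation: the function name `rank_cooccurrences`, the sample documents, the smoothed idf formula and the way the three signals are multiplied into one score are all assumptions made for illustration. The sketch also uses Jaccard similarity (one minus the Jaccard distance), so that a higher value means a stronger overlap.

```python
import math
from collections import defaultdict
from itertools import combinations


def rank_cooccurrences(docs):
    """Rank entity pairs that appear together in at least one document.

    `docs` maps a document id to the set of entities mentioned in it.
    Returns (entity_a, entity_b, count, jaccard, score) tuples, best first.
    """
    # Invert the input: entity -> set of documents that mention it.
    mentions = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            mentions[entity].add(doc_id)

    n_docs = len(docs)

    def idf(entity):
        # Smoothed inverse document frequency: rarer entities weigh more.
        # (Assumed formula; the source does not specify one.)
        return math.log((1 + n_docs) / (1 + len(mentions[entity]))) + 1

    ranked = []
    for a, b in combinations(sorted(mentions), 2):
        shared = mentions[a] & mentions[b]
        if not shared:
            continue  # the pair never co-occurs
        jaccard = len(shared) / len(mentions[a] | mentions[b])
        # Hypothetical combination: frequency x specificity x overlap.
        score = len(shared) * idf(a) * idf(b) * jaccard
        ranked.append((a, b, len(shared), jaccard, score))

    ranked.sort(key=lambda row: row[4], reverse=True)
    return ranked


# Invented sample data: four documents and the entities detected in them.
docs = {
    "d1": {"Adenauer", "Schuman", "Paris"},
    "d2": {"Adenauer", "Schuman"},
    "d3": {"Paris", "Monnet"},
    "d4": {"Paris", "Adenauer"},
}

for a, b, count, jaccard, score in rank_cooccurrences(docs):
    print(f"{a} / {b}: count={count} jaccard={jaccard:.2f} score={score:.2f}")
```

In this toy data set the pair that co-occurs in two of its three documents ranks above pairs that share a frequently mentioned place, which is the filtering effect described above.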
HistoGraph combines tools such as YAGO-AIDA for the automatic detection and disambiguation of named entities (people, places, institutions and dates) with crowd-based annotations. Thanks to enrichment with DBpedia and VIAF links, HistoGraph can handle multilingual documents. By default, every automatically detected entity is pending validation by a human user.
HistoGraph is available as open source under the MIT licence. The application is designed to serve two purposes: to facilitate the non-hierarchical exploration of multimedia collections based on existing metadata and automatic entity detection, and to support the crowd-based indexation of such collections. HistoGraph can handle any digitized text and image documents.