The eIdentity Text Exploration Workbench

In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA).

We work on tools to explore text contents and metadata of newspaper articles as provided by news archives. Our tool components are being integrated into an “Exploration Workbench” for Digital Humanities researchers. Next to the conversion of different data formats and character encodings, a prominent feature of our design is its “Wizard” function for corpus building: researchers import raw data and define patterns to extract text contents and metadata. The Workbench also comprises different tools for data cleaning. These include filtering of off-topic articles, duplicates and near-duplicates, and corrupted and empty articles. The data comprise 860,000 newspaper articles from different media archives, provided in different data formats. We index the data with state-of-the-art systems to allow for large-scale information retrieval. We extract metadata on publishing dates, author names, newspaper sections, etc., and split articles into segments such as headlines, subtitles, paragraphs, etc. After cleaning the data and compiling a thematically homogeneous corpus, the sample can be used for quantitative analyses which are not affected by noise. Users can retrieve sets of articles on different topics, issues or otherwise defined research questions (“subcorpora”) and quantitatively investigate their media attention on the timeline (“Issue Cycles”).
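The abstract names the cleaning steps (removing empty articles, duplicates and near-duplicates) but not the method behind them. As an illustration only, the following minimal sketch assumes near-duplicates are detected by Jaccard similarity over word 3-grams. The function names, the shingle size and the 0.8 threshold are hypothetical choices for this example and are not taken from the paper.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Very short texts are represented by a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clean_corpus(articles, threshold=0.8):
    """Drop empty articles, exact duplicates and near-duplicates.

    `articles` is a list of strings; the first copy of each text is kept.
    The threshold is an illustrative value, not one reported by the authors.
    """
    kept, kept_shingles = [], []
    for text in articles:
        if not text.strip():
            continue  # empty article
        current = shingles(text)
        if any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue  # duplicate or near-duplicate of an article already kept
        kept.append(text)
        kept_shingles.append(current)
    return kept
```

This pairwise comparison is quadratic in the number of articles, so it only suits small samples. For a collection of the size described here, an approximate technique such as MinHash would be the more plausible choice, again as an assumption rather than a description of the Workbench.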
Anthology ID: L14-1294
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, …, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 691–697
Bibkey: kliche-etal-2014-eidentity
Cite (ACL): Fritz Kliche, André Blessing, Ulrich Heid, and Jonathan Sonntag. 2014. The eIdentity Text Exploration Workbench. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 691–697, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): The eIdentity Text Exploration Workbench (Kliche et al., LREC 2014)