What is Ellogon?

Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

During the last decade, a large number of software infrastructures aimed at facilitating R&D in the field of natural language processing have been presented. Some of these infrastructures, such as the LT-NSL/LT-XML tools or GATE, have become extremely popular, as they have been applied to a wide range of tasks by many institutions around the world.

Ellogon belongs to the category of referential, or annotation-based, platforms, in which linguistic information is stored separately from the textual data and refers back to the original text. Based on the TIPSTER data model, Ellogon provides infrastructure for:

  • Managing, storing and exchanging textual data as well as the associated linguistic information.
  • Creating, embedding and managing linguistic processing components.
  • Facilitating communication among different linguistic components by defining a suitable programming interface (API).
  • Visualising textual data and associated linguistic information.
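
The referential (stand-off) idea described above can be illustrated with a minimal sketch. This is not Ellogon's API: the class names (Span, Annotation, Document), the method names and the sample sentence are hypothetical, chosen only to show how linguistic information can live apart from the text while pointing back into it through character offsets, loosely following the TIPSTER notion of an annotation as a type, a set of spans and a set of attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A character range in the original text (hypothetical name)."""
    start: int  # inclusive offset
    end: int    # exclusive offset


@dataclass
class Annotation:
    """Linguistic information kept apart from the text it describes."""
    id: int
    type: str
    spans: List[Span]
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """Holds the untouched text plus the annotations that refer to it."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, ann_type: str, start: int, end: int, **attributes: str) -> Annotation:
        # Reject spans that do not fall inside the text.
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("span outside document text")
        ann = Annotation(len(self.annotations), ann_type, [Span(start, end)], dict(attributes))
        self.annotations.append(ann)
        return ann

    def text_of(self, ann: Annotation) -> str:
        # Resolve the annotation's spans back to the original characters.
        return " ".join(self.text[s.start:s.end] for s in ann.spans)

    def select(self, ann_type: str) -> List[Annotation]:
        # Return all annotations of one type, in insertion order.
        return [a for a in self.annotations if a.type == ann_type]


# Assumed sample data: the text is never modified, only referenced.
doc = Document("Ellogon runs on many platforms.")
doc.annotate("token", 0, 7, pos="NNP")
doc.annotate("token", 8, 12, pos="VBZ")
doc.annotate("entity", 0, 7, category="software")

for ann in doc.select("token"):
    print(ann.type, repr(doc.text_of(ann)), ann.attributes)
```

Because annotations only reference offsets, several components can add overlapping layers (tokens, entities and so on) over the same text without altering it, which is what makes exchanging information between components straightforward in this kind of design.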