Standoff annotations to inline annotations (standoff2inline)

Inline annotations are annotations stored within the annotated text itself, for example as XML tags:

The little <noun>cat</noun> drinks milk.

Standoff annotations are annotations stored separately from the text, usually as character or token positions. For example, in the sentence:

The little cat drinks milk.

the third word, spanning the 12th to the 14th character (counting from 1), is a noun, so the standoff annotation might look like this:

12,14,noun
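To make the relationship between the two forms concrete, here is a minimal sketch in plain Python that turns the standoff line above into the inline form shown earlier. It does not use this module: the function name `inline_one` is hypothetical, and it assumes the positions are 1-based and inclusive, as in the example.

```python
def inline_one(text, standoff_line):
    """Wrap one span of `text` in an XML-style tag.

    Assumption: `standoff_line` is "start,end,label", where start and end
    are 1-based, inclusive character positions.
    """
    start, end, label = standoff_line.split(",")
    begin = int(start) - 1  # 1-based position -> 0-based index
    stop = int(end)         # inclusive 1-based end == exclusive 0-based end
    return f"{text[:begin]}<{label}>{text[begin:stop]}</{label}>{text[stop:]}"


sentence = "The little cat drinks milk."
result = inline_one(sentence, "12,14,noun")
print(result)  # The little <noun>cat</noun> drinks milk.
```

A single span is the easy case; handling several overlapping or nested spans requires deciding the order in which tags are opened and closed at the same position.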

This Python module offers classes and functions to convert standoff annotations into inline annotations.

A quick start follows below on this page. A more complete user guide is available as a Jupyter notebook, which you can download with the code:

View Jupyter notebook | Download code | View GitHub repo

Getting Started

Download the module and copy it into your current directory, or into a directory listed in your PYTHONPATH variable, under the name standoff2inline.py.

Create a new Python script:

from standoff2inline import Standoff2Inline

string = "The little cat drinks milk."
inliner = Standoff2Inline()
# Each call takes an opening and a closing marker as (position, text) pairs.
inliner.add((0, "<sent>"), (26, "</sent>"))
inliner.add((0, "<gn>"), (13, "</gn>"))
inliner.add((11, "<noun>"), (13, "</noun>"))
inliner.add((22, "<noun>"), (25, "</noun>"))
inliner.add((0, "<det>"), (2, "</det>"))
# Print the annotated string.
print(inliner.apply(string))

When you execute it, you will get:

<sent><gn><det>The</det> little <noun>cat</noun></gn> drinks <noun>milk</noun>.</sent>
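Judging from the example and its output, the positions appear to be 0-based, with each closing tag placed after the character at the given position. The sketch below reproduces the same output in plain Python to illustrate how nested tags can be ordered. It is not the module's implementation: `apply_spans` and the `(start, end, tag)` tuple format are hypothetical, and the ordering rule (outermost span opens first, innermost closes first) is an assumption that happens to fit this example.

```python
from collections import defaultdict


def apply_spans(text, spans):
    """Insert XML-style tags around spans given as (start, end, tag).

    Assumption: offsets are 0-based and `end` is the index of the last
    character inside the span (inclusive).
    """
    opens = defaultdict(list)   # position -> [(end, tag), ...]
    closes = defaultdict(list)  # position -> [(start, tag), ...]
    for start, end, tag in spans:
        opens[start].append((end, tag))
        closes[end].append((start, tag))

    pieces = []
    for i, char in enumerate(text):
        # The outermost span (largest end) opens first.
        for _, tag in sorted(opens[i], key=lambda item: -item[0]):
            pieces.append(f"<{tag}>")
        pieces.append(char)
        # The innermost span (largest start) closes first.
        for _, tag in sorted(closes[i], key=lambda item: -item[0]):
            pieces.append(f"</{tag}>")
    return "".join(pieces)


spans = [
    (0, 26, "sent"),
    (0, 13, "gn"),
    (11, 13, "noun"),
    (22, 25, "noun"),
    (0, 2, "det"),
]
inlined = apply_spans("The little cat drinks milk.", spans)
print(inlined)
```

Spans that share both the same start and the same end are emitted in input order here; a real implementation would need an explicit rule for that case, and for spans that cross without nesting.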

Documentation

Full documentation in the form of a Jupyter notebook is available in the code archive:

View Jupyter notebook | Download code