standoff2inline
Inline annotations are annotations stored within the annotated text, like XML annotations:
The little <noun>cat</noun> drinks milk.
Standoff annotations are annotations stored separately from the text, usually as character or token positions. For example, in the sentence:
The little cat drinks milk.
the third word, which spans the 12th to 14th characters (counting from 1), is a noun, so the standoff annotation might look like this:
12,14,noun
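As a minimal sketch of how such a record can be turned back into an inline annotation, the snippet below parses the line 12,14,noun and wraps the matching span in a tag. It assumes that positions count from 1 and that the end position is inclusive, as in the example above. The function name standoff_to_inline is hypothetical and is not part of this module.

```python
def standoff_to_inline(text, record):
    """Wrap the span described by a 'start,end,label' record in an XML-like tag.

    Assumption: positions are 1-based and the end position is inclusive.
    """
    start, end, label = record.split(",")
    start, end = int(start), int(end)
    # Convert the 1-based inclusive span to Python slices.
    before = text[:start - 1]
    span = text[start - 1:end]
    after = text[end:]
    return f"{before}<{label}>{span}</{label}>{after}"


result = standoff_to_inline("The little cat drinks milk.", "12,14,noun")
print(result)  # The little <noun>cat</noun> drinks milk.
```

This only handles a single annotation. Once a tag has been inserted, every later position shifts, which is why several annotations need to be collected first and applied together.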
This Python module offers classes and functions to:
<span> tags, [...]
A quick start is given below on this page. A more complete user guide, in the form of a Jupyter notebook, can be downloaded with the code.
Download the module and copy it into your current directory, or into a directory listed in your PYTHONPATH variable, under the name standoff2inline.py.
Create a new Python script:
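from standoff2inline import Standoff2Inline

string = "The little cat drinks milk."
inliner = Standoff2Inline()
# Each call registers an opening and a closing tag as (position, tag) pairs.
# In this example, positions count from 0 and the closing position is inclusive.
inliner.add((0, "<sent>"), (26, "</sent>"))
inliner.add((0, "<gn>"), (13, "</gn>"))
inliner.add((11, "<noun>"), (13, "</noun>"))
inliner.add((22, "<noun>"), (25, "</noun>"))
inliner.add((0, "<det>"), (2, "</det>"))
# Print the annotated string so that the script shows the result.
print(inliner.apply(string))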
When you execute it, you will get:
<sent><gn><det>The</det> little <noun>cat</noun></gn> drinks <noun>milk</noun>.</sent>
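To make the nesting in this result easier to follow, here is a rough sketch, in plain Python, of one way to insert several possibly nested tag pairs into a string. It is not the module's implementation: the function apply_tags and its ordering rule (at a given position, longer spans open first and close last) are assumptions chosen to reproduce the output above, with positions counted from 0 and inclusive end positions as in the quick start.

```python
def apply_tags(text, annotations):
    """Insert (start, open_tag, end, close_tag) annotations into text.

    Assumptions: positions are 0-based and the end position is inclusive.
    """
    opens = {}   # position -> opening tags emitted before that character
    closes = {}  # position -> closing tags emitted after that character
    # Process the longest spans first so that they open before shorter
    # spans starting at the same position.
    by_length = sorted(annotations, key=lambda a: a[2] - a[0], reverse=True)
    for start, open_tag, end, close_tag in by_length:
        opens.setdefault(start, []).append(open_tag)
        # Prepend, so that shorter spans close before longer ones.
        closes.setdefault(end, []).insert(0, close_tag)

    pieces = []
    for i, char in enumerate(text):
        pieces.extend(opens.get(i, []))
        pieces.append(char)
        pieces.extend(closes.get(i, []))
    return "".join(pieces)


annotations = [
    (0, "<sent>", 26, "</sent>"),
    (0, "<gn>", 13, "</gn>"),
    (11, "<noun>", 13, "</noun>"),
    (22, "<noun>", 25, "</noun>"),
    (0, "<det>", 2, "</det>"),
]
result = apply_tags("The little cat drinks milk.", annotations)
print(result)
```

Collecting all tags by position before building the output avoids recomputing offsets after each insertion. The sketch does not try to handle overlapping spans that cannot be nested.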
Full documentation in the form of a Jupyter notebook is available in the code archive: