SACR (from the French "Script d'Annotation des Chaînes de Référence") is a tool optimized for coreference chain annotation.
It has been published in the following paper:
Oberle B. (2018). SACR: A Drag-and-Drop Based Tool for Coreference Annotation. Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC 2018). Miyazaki, Japan (poster)
SACR is a single web page: all operations are done in the browser. You can download the code and open the index.html file, or use it online (see links above).
The workflow is as follows:
(1) Mark the referring expressions.
(2) Build the coreference chains.
(3) Add feature annotations.
(4) Play and search.
Documentation can be found in the user_guide.pdf file. It is a work in progress: some English sections are still to be written, and it has not yet been proofread.
I have made some video tutorials in French, available on YouTube (see also the playlist).
Use the coreference database project scripts to convert your work into a relational database, in the form of a series of CSV (Comma Separated Values) files that you can open in a spreadsheet program like Microsoft Excel or LibreOffice Calc, or in a specialized statistical environment like R or Python's Pandas.
This works for a single text or a whole corpus (several texts separately annotated with SACR).
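As a minimal sketch of what working with such a CSV file can look like, the snippet below counts mentions per chain using only Python's standard library. The inline data stands in for a mentions table (described below), and the column names `id`, `chain_name` and `text` are assumptions made for illustration; the columns actually produced by the scripts may differ.

```python
import csv
import io
from collections import Counter

# Stand-in for a mentions table. The column names are hypothetical:
# check the header of your own CSV file before reusing this.
MENTIONS_CSV = """id,chain_name,text
1,Alice,Alice
2,Alice,she
3,Bob,Bob
4,Alice,her
"""


def chain_sizes(csv_file):
    """Count how many mentions belong to each chain."""
    reader = csv.DictReader(csv_file)
    return Counter(row["chain_name"] for row in reader)


sizes = chain_sizes(io.StringIO(MENTIONS_CSV))
for name, size in sizes.most_common():
    print(name, size)
```

To run this on a real file, replace the `io.StringIO(...)` argument with an open file handle, for example `open("mentions.csv", newline="", encoding="utf-8")` (the file name is also an assumption).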
The tables (CSV files) are:
tokens: all the tokens in the texts,
sentences: all the sentences in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
paragraphs: all the paragraphs in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
texts: all the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
chains: all the chains in the texts, with specific annotations (like the number of mentions, etc.),
mentions: all the mentions in the texts, with specific annotations (like the name of the chain, the size of the chain, etc.),
relations: all the relations in the texts, with specific annotations (like the distance between two mentions). There are several types of relations:
first: relations from the first mention to every other mention in the chain (A-B, A-C, A-D...),
consecutive: relations from a mention to the next mention in the chain (A-B, B-C, C-D...),
all: both first and consecutive relations.
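The three relation types can be illustrated with a short Python sketch. This is not the code of the coreference database scripts, only an illustration of the definitions above. The function names are hypothetical, and dropping the duplicate A-B pair in the "all" case is an assumption.

```python
def first_relations(chain):
    """Pair the first mention with every later mention: A-B, A-C, A-D..."""
    return [(chain[0], mention) for mention in chain[1:]]


def consecutive_relations(chain):
    """Pair each mention with the next one in the chain: A-B, B-C, C-D..."""
    return list(zip(chain, chain[1:]))


def all_relations(chain):
    """Combine both types; the shared pair A-B is kept once (an assumption)."""
    combined = []
    for pair in first_relations(chain) + consecutive_relations(chain):
        if pair not in combined:
            combined.append(pair)
    return combined


chain = ["A", "B", "C", "D"]
print(first_relations(chain))
print(consecutive_relations(chain))
print(all_relations(chain))
```

A chain of n mentions yields n-1 relations of each of the first two types; a singleton chain yields none.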
Note that there is a simpler beta version available online here.
Use the corefconversion project to convert to and from CoNLL and other formats.
You can convert between formats online (the converter is currently offline while the new website is being updated):
You can also download the following Perl scripts to perform the conversion on your own computer:
To export and import to TXM:
.sacr file from SACR
perl sacr2glozz<VERSION>.pl -K --model --link-name MENTION -p REF FILE.sacr CORPUS_NAME/export
CORPUS_NAME directory three files (
-m 2 -e SI, singletons will have "SI" as referent name