Some of my projects: Natural Language Processing (NLP), programming, linguistics, but also Ancient Greek, electronics, photography and cinema.
Analyze coreference in a corpus with a relational database containing tables for coreference data (mentions, chains, relations) and for textual structures (tokens, sentences, paragraphs, texts). It also includes linguistic annotations (part of speech, named entities, etc.).
Enriched version of the Democrat corpus for French.
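As a rough illustration of this kind of design, here is a minimal sketch using Python's built-in sqlite3 module. The schema is hypothetical: the table and column names are assumptions, and only a subset of the tables listed above (texts, sentences, tokens, chains, mentions) is modeled. It shows how storing each mention as a range of token ids lets a single join rebuild the surface text of every mention in a chain.

```python
import sqlite3

# Hypothetical, simplified schema: names are illustrative and only a subset
# of the tables described above is modeled.
SCHEMA = """
CREATE TABLE texts     (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE sentences (id INTEGER PRIMARY KEY,
                        text_id INTEGER REFERENCES texts(id));
CREATE TABLE tokens    (id INTEGER PRIMARY KEY,
                        sentence_id INTEGER REFERENCES sentences(id),
                        form TEXT, pos TEXT);
CREATE TABLE chains    (id INTEGER PRIMARY KEY,
                        text_id INTEGER REFERENCES texts(id));
CREATE TABLE mentions  (id INTEGER PRIMARY KEY,
                        chain_id INTEGER REFERENCES chains(id),
                        start_token INTEGER REFERENCES tokens(id),
                        end_token INTEGER REFERENCES tokens(id));
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one tiny two-sentence text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO texts VALUES (1, 'demo')")
    conn.executemany("INSERT INTO sentences VALUES (?, 1)", [(1,), (2,)])
    # "The cat sleeps . It purrs ."
    tokens = [(1, 1, "The", "DET"), (2, 1, "cat", "NOUN"), (3, 1, "sleeps", "VERB"),
              (4, 1, ".", "PUNCT"), (5, 2, "It", "PRON"), (6, 2, "purrs", "VERB"),
              (7, 2, ".", "PUNCT")]
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?)", tokens)
    conn.execute("INSERT INTO chains VALUES (1, 1)")
    # Two mentions of the same entity: "The cat" (tokens 1-2) and "It" (token 5).
    conn.executemany("INSERT INTO mentions VALUES (?, 1, ?, ?)",
                     [(1, 1, 2), (2, 5, 5)])
    return conn

def mention_strings(conn: sqlite3.Connection, chain_id: int) -> list[str]:
    """Rebuild the surface string of each mention in a chain from its tokens."""
    rows = conn.execute(
        """
        SELECT m.id, t.form
        FROM mentions AS m
        JOIN tokens AS t ON t.id BETWEEN m.start_token AND m.end_token
        WHERE m.chain_id = ?
        ORDER BY m.id, t.id
        """,
        (chain_id,),
    ).fetchall()
    grouped: dict[int, list[str]] = {}
    for mention_id, form in rows:
        grouped.setdefault(mention_id, []).append(form)
    return [" ".join(forms) for forms in grouped.values()]

conn = build_demo_db()
print(mention_strings(conn, 1))  # ['The cat', 'It']
```

Ordering by token id in SQL and joining in Python (rather than relying on an aggregate such as GROUP_CONCAT) keeps the token order of each mention deterministic.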
Learn Ancient Greek grammar with my 150 reference sheets (419 pages): morphology (declension, conjugation), phonetics, syntax, use of tenses and moods...
A corpus-linguistic study of coreference chains in IMRaD research articles: discussing the concepts of referring expression and coreference, building the corpus (web scraping), designing annotation guidelines, annotating the texts, and analyzing the annotations.
This is one of my two master's theses (in “French linguistics”).
Search for patterns in lists of objects, such as tokens. For example, you can look for a determiner followed by a noun with the lemma “cat”. This works for objects in any field, not just for linguistic objects!
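The core idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pattern syntax: here a pattern is assumed to be a list of dicts of required field values, and the hypothetical `find_matches` function scans the list for runs of consecutive objects that satisfy each element in turn. Because fields are read either from dict keys or from object attributes, the same function works on any kind of object.

```python
from dataclasses import dataclass
from typing import Any

def _get(obj: Any, key: str) -> Any:
    """Read a field from a dict key or, failing that, from an object attribute."""
    if isinstance(obj, dict):
        return obj.get(key)
    return getattr(obj, key, None)

def element_matches(obj: Any, constraints: dict) -> bool:
    """True if every required field of the pattern element has the given value."""
    return all(_get(obj, key) == value for key, value in constraints.items())

def find_matches(items: list, pattern: list) -> list:
    """Return (start, end) spans, end exclusive, where consecutive items match."""
    n, m = len(items), len(pattern)
    return [
        (i, i + m)
        for i in range(n - m + 1)
        if all(element_matches(items[i + j], pattern[j]) for j in range(m))
    ]

@dataclass
class Token:
    form: str
    pos: str
    lemma: str

tokens = [Token("I", "PRON", "I"), Token("saw", "VERB", "see"),
          Token("the", "DET", "the"), Token("cats", "NOUN", "cat")]
# A determiner followed by a noun whose lemma is "cat".
pattern = [{"pos": "DET"}, {"pos": "NOUN", "lemma": "cat"}]
print(find_matches(tokens, pattern))  # [(2, 4)]
```

The same call works on non-linguistic data, e.g. `find_matches([{"kind": "a"}, {"kind": "b"}], [{"kind": "b"}])` returns `[(1, 2)]`. A real implementation would likely add wildcards, optional elements or repetition, which this sketch leaves out.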
Automatically detect coreference with a system based on hand-crafted linguistic rules. I also developed a dictionary of named entities and proper nouns containing data useful for coreference resolution, and a dictionary of hypernyms.
This is one of my two master's theses (in “language technology”).
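To give an idea of what hand-crafted rules can look like, here is a toy sketch with two illustrative rules: identical proper nouns corefer, and a pronoun links to the nearest preceding non-pronoun mention that agrees with it in gender and number. These rules, the `Mention` fields and the `resolve` function are assumptions for illustration only, not the rules or data structures of the actual system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    kind: str    # assumed values: "proper", "nominal" or "pronoun"
    gender: str  # "m", "f" or "?" (unknown)
    number: str  # "sg" or "pl"

def agree(a: Mention, b: Mention) -> bool:
    """Gender and number agreement; an unknown gender agrees with anything."""
    gender_ok = a.gender == b.gender or "?" in (a.gender, b.gender)
    return gender_ok and a.number == b.number

def resolve(mentions: list) -> list:
    """Return the index of each mention's antecedent, or None if no rule fires."""
    antecedents: list[Optional[int]] = []
    for i, m in enumerate(mentions):
        antecedent = None
        for j in range(i - 1, -1, -1):  # scan candidates from nearest to farthest
            c = mentions[j]
            # Rule 1 (toy): identical proper nouns corefer.
            if m.kind == "proper" and c.kind == "proper" and m.text == c.text:
                antecedent = j
                break
            # Rule 2 (toy): a pronoun links to the nearest agreeing non-pronoun.
            if m.kind == "pronoun" and c.kind != "pronoun" and agree(m, c):
                antecedent = j
                break
        antecedents.append(antecedent)
    return antecedents

mentions = [Mention("Marie", "proper", "f", "sg"), Mention("Paul", "proper", "m", "sg"),
            Mention("she", "pronoun", "f", "sg"), Mention("Marie", "proper", "f", "sg")]
print(resolve(mentions))  # [None, None, 0, 0]
```

Here "she" skips "Paul" because of the gender mismatch and links to "Marie". A dictionary of named entities, like the one mentioned above, is the kind of resource that would fill in fields such as `gender` for proper nouns.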
Convert standoff annotations (stored separately from their text and indexed by character or token positions) into inline annotations embedded in the text, such as XML. For example, in the sentence “The cat is drinking milk.”, the standoff annotation states that the 3rd and 4th words form a verb, so the inline annotation would be:
The cat <verb>is drinking</verb> milk.
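A minimal sketch of this conversion in Python, assuming a very simple regex tokenizer (words and single punctuation marks) and annotations given as `(label, first_token, last_token)` with 1-based, inclusive positions to mirror “the 3rd and 4th words”. The function names are hypothetical, and overlapping or nested spans, which a real converter has to handle, are assumed away here. Token positions are first mapped to character offsets, then the tags are inserted into the original string, so spacing and punctuation are preserved.

```python
import re
from xml.sax.saxutils import escape

def tokenize(text: str) -> list:
    """Toy tokenizer: (start, end) character offsets of words and punctuation marks."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def standoff_to_inline(text: str, annotations: list) -> str:
    """Insert XML-like tags for (label, first_token, last_token) annotations.

    Token positions are 1-based and inclusive; spans must not overlap.
    """
    spans = tokenize(text)
    # Map token positions to character offsets, sorted by start offset.
    char_spans = sorted(
        (spans[first - 1][0], spans[last - 1][1], label)
        for label, first, last in annotations
    )
    out, cursor = [], 0
    for start, end, label in char_spans:
        out.append(escape(text[cursor:start]))  # untagged text before the span
        out.append(f"<{label}>{escape(text[start:end])}</{label}>")
        cursor = end
    out.append(escape(text[cursor:]))  # remaining untagged text
    return "".join(out)

print(standoff_to_inline("The cat is drinking milk.", [("verb", 3, 4)]))
# The cat <verb>is drinking</verb> milk.
```

Working on character offsets rather than re-joining tokens with spaces avoids producing “milk .” instead of “milk.”, and escaping the untagged text keeps the output well-formed XML when the source contains characters such as `&` or `<`.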