Some of my projects: Natural Language Processing (NLP), programming and linguistics, but also Ancient Greek, electronics, photography and cinema.
Want some data to fill out your mock site or test your database setup? Find easily accessible data and facts randomly extracted from Wikipedia.
The data ranges from simple lists of names or emails to tables and more complex data structures with loops and grouping, as well as complete HTML-tagged texts and pictures.
Available at randomitems.io.
Analyse coreference in a corpus with a relational database containing tables for coreference data (mentions, chains, relations) as well as for textual structures (tokens, sentences, paragraphs, texts). It also includes linguistic annotations (part of speech, named entities, etc.).
Enriched version of the Democrat corpus for French.
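To give an idea of how coreference tables and textual tables can be queried together, here is a minimal sketch using Python's built-in `sqlite3` module. The schema below (table and column names, and the toy French sentence) is a hypothetical simplification for illustration only, not the actual schema of the enriched Democrat corpus.

```python
import sqlite3

# Hypothetical, simplified schema: the real database's tables and columns may differ.
SCHEMA = """
CREATE TABLE texts     (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE sentences (id INTEGER PRIMARY KEY, text_id INTEGER REFERENCES texts(id));
CREATE TABLE tokens    (id INTEGER PRIMARY KEY, sentence_id INTEGER REFERENCES sentences(id),
                        form TEXT, lemma TEXT, pos TEXT);
CREATE TABLE chains    (id INTEGER PRIMARY KEY, text_id INTEGER REFERENCES texts(id));
CREATE TABLE mentions  (id INTEGER PRIMARY KEY, chain_id INTEGER REFERENCES chains(id),
                        start_token INTEGER REFERENCES tokens(id),
                        end_token INTEGER REFERENCES tokens(id), surface TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Load a toy two-sentence text: 'Le chat dort. Il rêve.'"""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO texts VALUES (1, 'demo')")
    conn.executemany("INSERT INTO sentences VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "Le", "le", "DET"), (2, 1, "chat", "chat", "NOUN"),
            (3, 1, "dort", "dormir", "VERB"), (4, 1, ".", ".", "PUNCT"),
            (5, 2, "Il", "il", "PRON"), (6, 2, "rêve", "rêver", "VERB"),
            (7, 2, ".", ".", "PUNCT"),
        ],
    )
    # One coreference chain with two mentions: "Le chat" and "Il".
    conn.execute("INSERT INTO chains VALUES (1, 1)")
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 2, "Le chat"), (2, 1, 5, 5, "Il")],
    )
    return conn


def chain_sizes(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Number of mentions in each chain."""
    return conn.execute(
        "SELECT chain_id, COUNT(*) FROM mentions GROUP BY chain_id ORDER BY chain_id"
    ).fetchall()


def first_token_pos(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Part of speech of the first token of each mention (joins mentions with tokens)."""
    return conn.execute(
        "SELECT m.surface, t.pos FROM mentions m "
        "JOIN tokens t ON t.id = m.start_token ORDER BY m.id"
    ).fetchall()


conn = build_demo_db()
print(chain_sizes(conn))      # [(1, 2)]
print(first_token_pos(conn))  # [('Le chat', 'DET'), ('Il', 'PRON')]
```

Keeping mentions and tokens in separate tables means that any linguistic annotation stored on tokens (part of speech, lemma, named entity) can be combined with coreference data through a plain SQL join.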
Learn Ancient Greek grammar with my 150 reference sheets (419 pages): morphology (declension, conjugation), phonetics, syntax, use of tenses and moods...
A website to search coreference data in French (Democrat and Ancor corpora) and compute statistics online.
Annotate, load and analyse your own data.
Will be online soon!
A corpus-linguistic study of coreference chains in IMRaD research articles: discussing the concepts of referring expression and coreference, building the corpus (web scraping), designing annotation guidelines, annotating the texts and analyzing the annotations.
This is one of my two master's theses (in “French linguistics”).
Search for patterns in lists of objects, such as tokens. For example, you can look for a determiner followed by a noun with the lemma cat. This works for objects with any fields, not just for linguistic objects!
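The idea can be illustrated with a minimal sketch in Python. The pattern format here (a list of dictionaries, each mapping a field name to the value it must have) and the `Token` class are hypothetical choices for this example, not the query syntax of the actual tool. Because fields are read with `getattr`, the same function works on any object, linguistic or not.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class Token:
    form: str
    lemma: str
    pos: str


# Hypothetical pattern format: one constraint per position, each constraint
# mapping a field name to the exact value required for that field.
Pattern = Sequence[dict[str, Any]]


def matches(obj: Any, constraint: dict[str, Any]) -> bool:
    """True if every constrained field of obj has the required value."""
    return all(getattr(obj, field, None) == value for field, value in constraint.items())


def find_pattern(objects: Sequence[Any], pattern: Pattern) -> list[tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of every contiguous match."""
    n = len(pattern)
    hits = []
    for start in range(len(objects) - n + 1):
        if all(matches(objects[start + i], pattern[i]) for i in range(n)):
            hits.append((start, start + n))
    return hits


tokens = [
    Token("The", "the", "DET"), Token("cat", "cat", "NOUN"),
    Token("saw", "see", "VERB"), Token("the", "the", "DET"),
    Token("cats", "cat", "NOUN"),
]
# A determiner followed by a noun with the lemma "cat".
pattern = [{"pos": "DET"}, {"pos": "NOUN", "lemma": "cat"}]
print(find_pattern(tokens, pattern))  # [(0, 2), (3, 5)]
```

This sketch only supports exact equality on fixed-length sequences; a fuller pattern language would probably add regular expressions on values, optional or repeated elements, and negation.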
Automatically detect coreference with a system built with hand-crafted linguistic rules. I also developed a dictionary of named entities and proper nouns with data useful for coreference resolution, and a dictionary of hypernyms.
This is one of my two master's theses (in “language technology”).
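To show what “hand-crafted rules” can mean in practice, here is a minimal sketch of a rule-based resolver in Python. The two rules (same head lemma, and a hypernym lookup in the spirit of the hypernym dictionary mentioned above), the toy `HYPERNYMS` dictionary and the closest-antecedent strategy are all hypothetical illustrations, not the actual rules of my system.

```python
from dataclasses import dataclass

# Hypothetical toy hypernym dictionary: word -> set of its hypernyms.
HYPERNYMS: dict[str, set[str]] = {
    "cat": {"feline", "animal"},
    "dog": {"canine", "animal"},
}


@dataclass
class Mention:
    text: str
    head: str  # lemma of the head noun


def same_head(antecedent: Mention, anaphor: Mention) -> bool:
    """Rule 1: identical head lemmas (e.g. 'a cat' ... 'the cat')."""
    return antecedent.head == anaphor.head


def hypernym_of(antecedent: Mention, anaphor: Mention) -> bool:
    """Rule 2: the anaphor's head is a hypernym of the antecedent's head
    (e.g. 'the cat' ... 'the animal')."""
    return anaphor.head in HYPERNYMS.get(antecedent.head, set())


RULES = [same_head, hypernym_of]


def resolve(mentions: list[Mention]) -> list[int]:
    """Assign a chain id to each mention: link it to the closest preceding
    mention that satisfies any rule, otherwise start a new chain."""
    chain_of: list[int] = []
    next_chain = 0
    for i, mention in enumerate(mentions):
        for j in range(i - 1, -1, -1):  # closest candidate antecedent first
            if any(rule(mentions[j], mention) for rule in RULES):
                chain_of.append(chain_of[j])
                break
        else:  # no antecedent found: new chain
            chain_of.append(next_chain)
            next_chain += 1
    return chain_of


mentions = [
    Mention("a cat", "cat"), Mention("a dog", "dog"),
    Mention("the cat", "cat"), Mention("the animal", "animal"),
]
print(resolve(mentions))  # [0, 1, 0, 0]
```

Note that “the animal” could refer to either the cat or the dog; the closest-first strategy picks the cat. Resolving such cases well is exactly where more linguistic rules and richer dictionaries come in.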
Convert standoff annotations, which are stored separately from their text and indexed by character or token positions, into inline annotations mixed with the text, such as XML. For example, in the sentence “The cat is drinking milk.”, the standoff annotation says that the 3rd and 4th words form a verb, so the inline annotation would be:
The cat <verb>is drinking</verb> milk.
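Here is a minimal sketch of the conversion in Python, assuming character offsets (start inclusive, end exclusive) and non-empty spans that are either disjoint or properly nested. For token positions like the “3rd and 4th words” above, the token indices would first need to be mapped to character offsets. The function name is hypothetical and this is not the actual implementation of the tool.

```python
from xml.sax.saxutils import escape


def standoff_to_inline(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Insert XML tags into text for each (start, end, label) character span.

    Assumes non-empty spans that are disjoint or properly nested (no partial overlaps).
    """
    opens: dict[int, list[str]] = {}
    closes: dict[int, list[str]] = {}
    # Sort by start, then longest first, so outer tags open before inner ones.
    for start, end, label in sorted(spans, key=lambda s: (s[0], -s[1])):
        opens.setdefault(start, []).append(f"<{label}>")
        # Later (inner) spans are inserted at the front so they close first.
        closes.setdefault(end, []).insert(0, f"</{label}>")
    out = []
    for i in range(len(text) + 1):
        out.extend(closes.get(i, []))  # close before opening at the same offset
        out.extend(opens.get(i, []))
        if i < len(text):
            out.append(escape(text[i]))  # escape &, < and > in the text itself
    return "".join(out)


# "is drinking" covers characters 8 to 19 (exclusive) in this sentence.
print(standoff_to_inline("The cat is drinking milk.", [(8, 19, "verb")]))
# The cat <verb>is drinking</verb> milk.
```

Closing tags before opening new ones at the same offset keeps adjacent spans well-formed (`</a><b>` rather than `<b></a>`), and sorting outer spans first handles nested annotations such as a noun inside a noun phrase.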