dr. caforbes

Clarissa Forbes, PhD
forbesc (at) alumni (dot) ubc.ca

View My GitHub Profile

An FST-driven annotation tool to analyze and process Gitxsanimx̱ words into their common interlinear annotation format, for Python.

This is a rule-based natural language processing tool that uses finite-state transducers (FSTs) to translate the word into its component morphemes and provide options for an interlinear gloss. Its strengths are in its very high accuracy rate (>95%), although it is limited to operating on words found in a central dictionary. This serves a secondary purpose of helping identify words in need of linguistic documentation. Implemented with Python and the foma CLI tool.

This project fed into collaborative machine-learning research initiatives at the University of British Columbia to support word annotation for endangered, low-resource languages.