Probabilistic tree transducers for grammatical error correction

Thesis (MSc)--Stellenbosch University, 2013.

Bibliographic Details
Main Author: Buys, Jan Moolman
Other Authors: Van der Merwe, A. B.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2013
_version_ 1867613893599166464
access_status_str Open Access
author Buys, Jan Moolman
author2 Van der Merwe, A. B.
author_browse Buys, Jan Moolman
Van der Merwe, A. B.
author_facet Van der Merwe, A. B.
Buys, Jan Moolman
author_sort Buys, Jan Moolman
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2013.
format Thesis
id oai:scholar.sun.ac.za:10019.1/85592
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:43:23.129Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2013
publishDateRange 2013
publishDateSort 2013
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/85592 Probabilistic tree transducers for grammatical error correction Buys, Jan Moolman Van der Merwe, A. B. Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Grammar correction -- Data processing Natural language processing Weighted tree transducer Text processing (Computer science) Dissertations -- Mathematical sciences Theses -- Mathematical sciences Dissertations -- Computer science Theses -- Computer science Computational linguistics Error-correcting codes (Information theory) English language -- Grammar Thesis (MSc)--Stellenbosch University, 2013. ENGLISH ABSTRACT: We investigate the application of weighted tree transducers to correcting grammatical errors in natural language. Weighted finite-state transducers (FSTs) have been used successfully in a wide range of natural language processing (NLP) tasks, even though the expressiveness of the linguistic transformations they perform is limited. Recently, there has been an increase in the use of weighted tree transducers and related formalisms that can express syntax-based natural language transformations in a probabilistic setting. The NLP task that we investigate is the automatic correction of grammar errors made by English language learners. In contrast to spelling correction, which can be performed with very high accuracy, the performance of grammar correction systems is still low for most error types. Commercial grammar correction systems mostly use rule-based methods. The most common approach in recent grammatical error correction research is to use statistical classifiers that make local decisions about the occurrence of specific error types. The approach that we investigate is related to a number of other approaches inspired by statistical machine translation (SMT) or based on language modelling. Corpora of language learner writing annotated with error corrections are used as training data.
Our baseline model is a noisy-channel FST model consisting of an n-gram language model and an FST error model, which performs word insertion, deletion and replacement operations. The tree transducer model we use to perform error correction is a weighted top-down tree-to-string transducer, formulated to perform transformations between parse trees of correct sentences and incorrect sentences. Using an algorithm developed for syntax-based SMT, transducer rules are extracted from training data in which the correct versions of the sentences have been parsed. Rule weights are also estimated from the training data. Hypothesis sentences generated by the tree transducer are reranked using an n-gram language model. We perform experiments to evaluate the performance of different configurations of the proposed models. In our implementation an existing tree transducer toolkit is used. To make decoding time feasible, sentences are split into clauses and heuristic pruning is performed during decoding. We consider different modelling choices in the construction of transducer rules. The evaluation of our models is based on precision and recall. Experiments are performed to correct various error types on two learner corpora. The results show that our system is competitive with existing approaches on several error types. AFRIKAANS ABSTRACT (translated): We investigate the application of weighted tree transducers to the automatic correction of grammatical errors in natural language. Weighted finite-state transducers are used successfully in a wide range of natural language processing tasks, although the expressive power of the linguistic transformations they perform is limited. Recently there has been an increase in the use of weighted tree transducers and related formalisms that represent syntactic transformations in natural language in a probabilistic framework.
The natural language processing application that we investigate is the automatic correction of language errors made by learners of English. While spell checking in English can be done with very high accuracy, the performance of grammar correction systems is still relatively weak for most error types. Commercial grammar correction systems predominantly use rule-based methods. The most common approach in recent research on grammatical error correction is to use statistical classifiers that make local decisions about the occurrence of specific error types. The approach that we investigate is related to a number of other approaches that are inspired by statistical machine translation or based on language modelling. Corpora of language learner writing annotated with error corrections are used as training data. Our baseline system is a noisy-channel finite-state transducer model consisting of an n-gram language model and an error model that performs insertion, deletion and replacement operations at the word level. The tree transducer model that we use for grammatical error correction is a weighted top-down tree-to-string transducer formulated to perform transformations between the syntax trees of correct sentences and incorrect sentences. An algorithm developed for syntax-based statistical machine translation is used to extract rules from the training data, in which the correct versions of the sentences have been parsed. Rule weights are also estimated from the training data. Hypothesis sentences generated by the tree transducer are reranked using an n-gram language model. We perform experiments to evaluate the effectiveness of different configurations of the proposed models. In our implementation an existing tree transducer software package is used. To reduce decoding time, sentences are split into phrases and the search space is pruned heuristically.
We consider various modelling choices in the construction of transducer rules. The evaluation of our models is based on precision and recall. Experiments are performed to correct various error types on two learner corpora. The results show that our model is competitive with existing approaches on several error types. 2013-09-26T15:23:05Z 2013-12-13T14:53:28Z 2013-09-26T15:23:05Z 2013-12-13T14:53:28Z 2013-12 Thesis http://hdl.handle.net/10019.1/85592 en_ZA Stellenbosch University 108 p. application/pdf Stellenbosch : Stellenbosch University
title Probabilistic tree transducers for grammatical error correction
topic Grammar correction -- Data processing
Natural language processing
Weighted tree transducer
Text processing (Computer science)
Dissertations -- Mathematical sciences
Theses -- Mathematical sciences
Dissertations -- Computer science
Theses -- Computer science
Computational linguistics
Error-correcting codes (Information theory)
English language -- Grammar
url http://hdl.handle.net/10019.1/85592
work_keys_str_mv AT buysjanmoolman probabilistictreetransducersforgrammaticalerrorcorrection