Full Text Available

Note: Clicking the button above will open the full text document at the original institutional repository in a new window.

Verified derivative-based regular expression matching

Thesis (MSc)--Stellenbosch University, 2025.

Saved in:
Bibliographic Details
Main Author: Perry, Keegan Kyle
Other Authors: Bester, Willem
Format: Thesis
Published: Stellenbosch : Stellenbosch University 2026
Subjects:
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1867614037858058240
access_status_str Open Access
author Perry, Keegan Kyle
author2 Bester, Willem
author_browse Bester, Willem
Perry, Keegan Kyle
author_facet Bester, Willem
Perry, Keegan Kyle
author_sort Perry, Keegan Kyle
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2025.
format Thesis
id oai:scholar.sun.ac.za:10019.1/134764
institution Stellenbosch University (South Africa)
last_indexed 2026-06-10T12:45:40.774Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2026
publishDateRange 2026
publishDateSort 2026
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/134764 Verified derivative-based regular expression matching Perry, Keegan Kyle Bester, Willem Van der Merwe, Brink Schulze, Walter Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Computer Science Division. Regular expressions (Computer science) Formal methods (Computer science) Sequential machine theory Thesis (MSc)--Stellenbosch University, 2025. Perry, K. K. 2025. Verified Derivative-Based Regular Expression Matching. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/c1a8427d-bcc5-43f3-b15e-f9c4a9b6d91d ENGLISH ABSTRACT: Regular expressions are often used in security-related contexts where the correctness of the underlying matcher is vitally important. Whereas the traditional problem of regular expression membership is well understood, and correct implementations are easy to verify, modern regular expression matchers have fundamentally changed the matching problem: Instead of just determining whether a string is in the language of a regular expression, modern matchers can also report the location of such a string in the input, as well as how subexpressions match substrings of the input. Historically, in the absence of employing formal approaches, support for new features has resulted in ad hoc solutions leading to informal standards, and conversely, interpretations of informal standards leading to ad hoc implementations. In this thesis, we take a formal approach to practical regular expression matching by verifying the correctness of several implementations with respect to the leftmost-longest and leftmost-greedy disambiguation strategies. When a regular expression is ambiguous, the reported match depends on which matching path is taken. To make the result unique, practical matchers adopt a disambiguation policy, typically either leftmost-longest or leftmost-greedy, to choose a single path. We focus on matching solutions that use Brzozowski derivatives, which have been employed in a number of recent formalisations of leftmost-longest matching. Building on this work, we provide our own verified implementations of leftmost-longest matching, and also introduce the formalisation of leftmost-greedy matching with derivatives, which, thus far, has not been considered in the literature. All formalisations we present are executable and verified in the Lean proof assistant. AFRIKAANSE OPSOMMING: Regulêre uitdrukkings word dikwels gebruik in sekuriteitsverwante kontekste waarin die korrektheid van die onderliggende soekenjin van kardinale belang is. Hoewel die tradisionele probleem van regulêre-uitdrukking lidmaatskap goed verstaan word en korrekte implementerings maklik is om te verifieer, het moderne regulêre-uitdrukking enjins die soekprobleem fundamenteel verander: In plaas daarvan om bloot vas te stel of ’n string in die taal van ’n regulêre uitdrukking is, rapporteer moderne enjins ook die posisie van so ’n string in die toevoer, asook hoe deeluitdrukkings teenoor substringe van die toevoer pas. By afwesigheid van die gebruik van formele metodes, het die ondersteuning van nuwe funksionaliteit histories gelei van ad hoc-oplossings tot informele standaarde en, omgekeerd, vertolkings van informele standaarde het gelei tot ad hoc-implementerings. In dié tesis neem ons ’n formele benadering tot praktiese regulêre-uitdrukking soekenjins aan deur die korrektheid van verskeie implementerings met betrekking tot die mees-linkse langste en mees-linkse gulsige ondubbelsinnigmakingstrategieë te verifieer. Wanneer ’n regulêre uitdrukking dubbelsinnig is, hang die gerapporteerde trefslag af van watter passingspad geneem word. Om die resultaat uniek te maak, neem praktiese enjins ’n ondubbelsinnnigmakingsbeleid aan, tipies óf ’n mees-linkse langste óf ’n mees-linkse gulsige, om ’n enkele pad te kies. Ons fokus op Brzozowski-afgeleides, wat in ’n aantal onlangse formaliserings van mees-linkse langste soektogte gebruik is. Gebou op dié werk voorsien ons ook ons eie geverifieerde implementerings van mees-linkse langste soektogte, en ons stel ook die formalisering van mees-linkse gulsige soektoegte bekend, wat tot dusver nog nie in die literatuur oorweeg is nie. Al die formaliserings wat ons aanbied, is uitvoerbaar en geverifieer met die Lean-bewysassistent. Masters 2026-01-07T08:42:14Z 2026-01-07T08:42:14Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134764 Stellenbosch University viii, 117 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle Regular expressions (Computer science)
Formal methods (Computer science)
Sequential machine theory
Perry, Keegan Kyle
Verified derivative-based regular expression matching
title Verified derivative-based regular expression matching
title_full Verified derivative-based regular expression matching
title_fullStr Verified derivative-based regular expression matching
title_full_unstemmed Verified derivative-based regular expression matching
title_short Verified derivative-based regular expression matching
title_sort verified derivative based regular expression matching
topic Regular expressions (Computer science)
Formal methods (Computer science)
Sequential machine theory
url https://scholar.sun.ac.za/handle/10019.1/134764
work_keys_str_mv AT perrykeegankyle verifiedderivativebasedregularexpressionmatching