Full Text Available

Note: Clicking the button above will open the full text document at the original institutional repository in a new window.

Finite state automaton construction through regular expression hashing

Dissertation (MEng)--University of Pretoria, 2009.

Saved in:
Bibliographic Details
Other Authors: Kourie, Derrick G.
Format: Thesis
Published: University of Pretoria 2013
Subjects:
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1867613628064071680
access_status_str Open Access
author2 Kourie, Derrick G.
author_browse Kourie, Derrick G.
author_facet Kourie, Derrick G.
collection Thesis
dc_rights_str_mv © 2009, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Dissertation (MEng)--University of Pretoria, 2009.
format Thesis
id oai:repository.up.ac.za:2263/27536
institution University of Pretoria (South Africa)
last_indexed 2026-06-10T12:39:09.918Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2013
publishDateRange 2013
publishDateSort 2013
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/27536 Finite state automaton construction through regular expression hashing Kourie, Derrick G. Watson, Bruce William spring.haas_meester@gmail.com Coetser, Rayner Johannes Lodewikus Regular expression hashing Finite state Super-automaton Approximate automaton UCTD Dissertation (MEng)--University of Pretoria, 2009. In this study, the regular expressions forming abstract states in Brzozowski’s algorithm are not remapped to sequential state transition table addresses as would be the case in the classical approach, but are hashed to integers. Two regular expressions that are hashed to the same hash code are assigned the same integer address in the state transition table, reducing the number of states in the automaton. This reduction does not necessarily lead to the construction of a minimal automaton: no restrictions are placed on the hash function hashing two regular expressions to the same code. Depending on the quality of the hash function, a super-automaton, previously referred to as an approximate automaton, or an exact automaton can be constructed. When two regular expressions are hashed to the same state, and they do not represent the same regular language, a super-automaton is constructed. A super-automaton accepts the regular language of the input regular expression, in addition to some extra strings. If the hash function is bad, many regular expressions that do not represent the same regular language will be hashed together, resulting in a smaller automaton that accepts extra strings. In the ideal case, two regular expressions will only be hashed together when they represent the same regular language. In this case, an exact minimal automaton will be constructed. It is shown that, using the hashing approach, an exact or super-automaton is always constructed. Another outcome of the hashing approach is that a non-deterministic automaton may be constructed. A new version of the hashing version of Brzozowski’s algorithm is put forward which constructs a deterministic automaton. A method is also put forward for measuring the difference between an exact and a super-automaton: this takes the form of the k-equivalence measure: the k-equivalence measure measures the number of characters up to which the strings of two regular expressions are equal. The better the hash function, the higher the value of k, up to the point where the hash function results in regular expressions being hashed together if and only if they have the same regular language. Using the k-equivalence measure, eight generated hash functions and one hand coded hash function are evaluated for a large number of short regular expressions, which are generated using G¨odel numbers. The k-equivalence concept is extended to the average k-equivalence value in order to evaluate the hash functions for longer regular expressions. The hand coded hash function is found to produce good results. Copyright Computer Science unrestricted 2013-09-07T11:44:37Z 2010-08-25 2013-09-07T11:44:37Z 2010-04-12 2009 2010-08-25 Dissertation Coetser, RJL 2009, Finite state automaton construction through regular expression hashing, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/27536 > E10/460/gm http://hdl.handle.net/2263/27536 http://upetd.up.ac.za/thesis/available/etd-08252010-133710/ © 2009, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle Regular expression hashing
Finite state
Super-automaton
Approximate automaton
UCTD
Finite state automaton construction through regular expression hashing
title Finite state automaton construction through regular expression hashing
title_full Finite state automaton construction through regular expression hashing
title_fullStr Finite state automaton construction through regular expression hashing
title_full_unstemmed Finite state automaton construction through regular expression hashing
title_short Finite state automaton construction through regular expression hashing
title_sort finite state automaton construction through regular expression hashing
topic Regular expression hashing
Finite state
Super-automaton
Approximate automaton
UCTD
url http://hdl.handle.net/2263/27536
http://upetd.up.ac.za/thesis/available/etd-08252010-133710/