Text detection in natural images using convolutional neural networks

Thesis (MSc)--Stellenbosch University, 2017

Bibliographic Details
Main Author: Grond, Marco Marten
Other Authors: Brink, Willie
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2017
Subjects: Text detection, Convolutional neural networks, Computer vision, Machine learning, UCTD
access_status_str Open Access
author Grond, Marco Marten
author2 Brink, Willie
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2017
format Thesis
id oai:scholar.sun.ac.za:10019.1/100999
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:51.414Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2017
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/100999
Text detection in natural images using convolutional neural networks
Grond, Marco Marten
Brink, Willie
Herbst, B. M.
Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics
Text detection
Convolutional neural networks
Computer vision
Machine learning
UCTD
Thesis (MSc)--Stellenbosch University, 2017
ENGLISH ABSTRACT: In this study we attempt to solve the problem of text detection in natural images, which requires us to identify the regions of a natural image that contain text. Possible applications include assistive technology, human-computer interaction and context extraction. Although humans find the task almost trivial, large variations in colour, font, size and orientation must be accounted for, and text shares many features and structures with other objects, which complicates any attempt to automate a solution. We train multiple convolutional neural networks in an attempt to solve this problem. We chose convolutional neural networks both because they have already shown promise for text recognition and to better understand how they operate. We take a sliding window approach, in which smaller regions of a full image are classified separately and the results are then combined to identify text regions in the full image. Because too few annotated natural training images are available, we create a supplementary synthetic dataset. Using the synthetic data as a starting point, we train networks of different structures and then fine-tune the same networks on smaller natural datasets. Networks first trained on the synthetic data outperform networks trained solely on the smaller natural datasets, regardless of structural complexity. This is likely because a network cannot identify the relevant features from a limited number of training examples.
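The sliding window step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the trained convolutional network is replaced by a hypothetical `toy_classifier` that scores a window by its contrast, and the window size, stride, threshold and the choice to combine overlapping windows by averaging their scores per pixel are all assumptions, since the abstract does not specify how the results are combined.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def sliding_windows(img: Image, size: int, stride: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, patch) for every size x size window placed at the given stride."""
    h, w = len(img), len(img[0])
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield top, left, [row[left:left + size] for row in img[top:top + size]]


def toy_classifier(patch: Image) -> float:
    """Hypothetical stand-in for a trained CNN: treat high-contrast windows as text."""
    values = [v for row in patch for v in row]
    return 1.0 if max(values) - min(values) >= 0.5 else 0.0


def text_score_map(img: Image, classify: Callable[[Image], float],
                   size: int = 4, stride: int = 2) -> List[List[float]]:
    """Combine overlapping window scores by averaging them at every pixel they cover."""
    h, w = len(img), len(img[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top, left, patch in sliding_windows(img, size, stride):
        p = classify(patch)  # estimated probability that the window contains text
        for y in range(top, top + size):
            for x in range(left, left + size):
                total[y][x] += p
                count[y][x] += 1
    # Pixels that no window covers (possible at the borders) get a score of 0.
    return [[t / c if c else 0.0 for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(total, count)]


def text_mask(scores: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Mark pixels whose combined score reaches the threshold as text."""
    return [[s >= threshold for s in row] for row in scores]


# Usage: an 8x8 image with a bright 4x2 "stroke" in its top-left corner.
image = [[1.0 if y < 4 and x < 2 else 0.0 for x in range(8)] for y in range(8)]
scores = text_score_map(image, toy_classifier)
mask = text_mask(scores)
```

Averaging is only one way to combine overlapping windows; taking the per-pixel maximum instead would tend to mark more pixels as text, trading precision for recall.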
Our experiments further show that a larger network structure is required for generalization, and that networks trained on smaller datasets are prone to overfitting. We apply our best-performing network to detecting text in full images by extracting and classifying regions of an image with a sliding window. We also implement image pyramids so that text across a wider range of sizes can be detected. We find, however, that image pyramids improve accuracy only slightly over a single image scale, likely because some scale variation was already present in the network’s training set. Ultimately, we find that convolutional neural networks show promise for the task of text detection in natural images. We also find that training a network on synthetic data and fine-tuning it on natural data improves the overall accuracy.
AFRIKAANS SUMMARY (translated): In this study we attempt to detect text in natural images. The problem requires identifying the areas of a natural image that contain text. Possible applications include assistive technology, human-computer interaction and context extraction. Although a person may find the task very easy, variations in colour, font, size and orientation must be taken into account. Text also shares certain features with other image structures, which further complicates automating a solution. We attempt to solve the problem by training several convolutional networks for the task. We chose this kind of neural network because it has already shown potential for text recognition, and also to develop a better understanding of how such networks work. We extract smaller windows from the image, classify each one separately, and then combine the classifications to identify areas of text in the full image. Owing to a shortage of annotated data, we create a supplementary dataset of synthetic images.
Using the synthetic images as a starting point, we train several networks with different structures and then fine-tune them with natural data. Networks first trained on synthetic data perform better than those trained only on natural data, regardless of network structure. This is possibly because a network cannot identify relevant features of text from little data. Our experiments further show that larger network structures are needed for better generalization, and that smaller datasets can result in overfitting. We use the best trained network to detect text in full images by extracting windows from an image and classifying them. Image pyramids are also used to allow the networks to identify a greater variation in text size. The use of image pyramids, however, has only a small impact on accuracy, probably because the networks had already been trained on text of various sizes. Over the course of this study we concluded that convolutional networks can be suitable for detecting text in natural images. We also found that training on synthetic data and fine-tuning on natural data can improve a network’s accuracy.
2017-02-21T06:58:58Z
2017-03-29T11:56:11Z
2017-02-21T06:58:58Z
2017-03-29T11:56:11Z
2017-03
Thesis
http://hdl.handle.net/10019.1/100999
en_ZA
Stellenbosch University
v, 77 pages ; colour illustrations
application/pdf
Stellenbosch : Stellenbosch University
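The image pyramid extension mentioned in the abstract can be sketched in the same spirit. This is an illustrative assumption, not the thesis's code: each pyramid level halves the image by 2x2 averaging, the same fixed-size window is scanned over every level, and detections are mapped back to original-image coordinates by multiplying by the level's scale. The hypothetical `toy_classifier` stands in for the trained network, and no merging of overlapping boxes (for example non-maximum suppression) is attempted.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]   # grayscale image: rows of pixel intensities in [0, 1]
Box = Tuple[int, int, int]  # (top, left, side) in original-image pixel coordinates


def downsample(img: Image) -> Image:
    """Halve both dimensions by averaging non-overlapping 2x2 blocks (odd edges dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def image_pyramid(img: Image, min_side: int) -> List[Tuple[int, Image]]:
    """Return (scale, level) pairs, halving until a side drops below min_side (>= 1)."""
    levels, scale = [], 1
    while len(img) >= min_side and len(img[0]) >= min_side:
        levels.append((scale, img))
        img, scale = downsample(img), scale * 2
    return levels


def toy_classifier(patch: Image) -> float:
    """Hypothetical stand-in for a trained CNN: treat high-contrast windows as text."""
    values = [v for row in patch for v in row]
    return 1.0 if max(values) - min(values) >= 0.5 else 0.0


def detect_multiscale(img: Image, classify: Callable[[Image], float], size: int = 4,
                      stride: int = 2, threshold: float = 0.5) -> List[Box]:
    """Run the same fixed-size sliding window over every pyramid level."""
    boxes: List[Box] = []
    for scale, level in image_pyramid(img, size):
        for top in range(0, len(level) - size + 1, stride):
            for left in range(0, len(level[0]) - size + 1, stride):
                patch = [row[left:left + size] for row in level[top:top + size]]
                if classify(patch) >= threshold:
                    # A window on a level downsampled by `scale` covers a region
                    # `scale` times larger in the original image.
                    boxes.append((top * scale, left * scale, size * scale))
    return boxes


# Usage: the left half of an 8x8 image is bright, so the contrast sits at the boundary.
image = [[1.0 if x < 4 else 0.0 for x in range(8)] for _ in range(8)]
boxes = detect_multiscale(image, toy_classifier)
```

Because the window size is fixed, coarser levels let the same window respond to larger structures: here the 4x4 windows at scale 1 find the boundary, while the single window at scale 2 yields a box `(0, 0, 8)` spanning the whole image.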
title Text detection in natural images using convolutional neural networks
topic Text detection
Convolutional neural networks
Computer vision
Machine learning
UCTD
url http://hdl.handle.net/10019.1/100999