A lossy, dictionary-based method for short message service (SMS) text compression

Short message service (SMS) message compression allows either more content to be fitted into a single message or fewer individual messages to be sent as part of a concatenated (or long) message. While essentially only dealing with plain text, many of the more popular compression methods do not bring...

Bibliographic Details
Main Author: Martin, Wickus
Other Authors: Marsden, Gary
Format: Thesis
Language: English
Published: Department of Computer Science 2014
Subjects: Information Technology
_version_ 1867614262646538240
access_status_str Open Access
author Martin, Wickus
author2 Marsden, Gary
author_browse Marsden, Gary
Martin, Wickus
author_facet Marsden, Gary
Martin, Wickus
author_sort Martin, Wickus
collection Thesis
description Short message service (SMS) message compression allows either more content to be fitted into a single message or fewer individual messages to be sent as part of a concatenated (or long) message. Although SMS deals essentially only with plain text, many of the more popular compression methods do not bring about a massive reduction in size for short messages. The Global System for Mobile communications (GSM) specification suggests that untrained Huffman encoding is the only required compression scheme for SMS messaging, yet support for SMS compression is still not widely available on current handsets. This research shows that Huffman encoding might actually increase the size of very short messages and only modestly reduce the size of longer messages. While Huffman encoding yields better results for larger text sizes, handset users do not usually write very large messages consisting of thousands of characters. Instead, an alternative compression method called lossy dictionary-based (LD-based) compression is proposed here. Under this method, the coder uses a dictionary tuned to the most frequently used English words and economically encodes white space. The encoding is lossy in that the original case is not preserved; instead, the resulting output is all lower case, a loss that might be acceptable to most users. The LD-based method has been shown to outperform Huffman encoding for the text sizes typically used when writing SMS messages, reducing the size of even very short messages and, for instance, cutting a long message down from five to two parts. Keywords: SMS, text compression, lossy compression, dictionary compression
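The general idea in the description (a word dictionary, cheap white space, and lower-casing) can be illustrated with a minimal Python sketch. This is not the thesis's coder: the word list WORDS, the one-byte code layout, the LITERAL_FLAG marker, and the names encode and decode are all assumptions made for illustration. In this toy format a dictionary word costs one byte, any other token is stored as a length byte plus its raw bytes, and a single space between tokens is implied rather than stored.

```python
# Toy lossy dictionary coder. All names and the byte layout are hypothetical;
# the thesis's real dictionary and bit-level format are not given in this record.
WORDS = ["the", "to", "you", "and", "a", "i", "in", "is",
         "it", "for", "me", "at", "see", "on", "are"]
WORD_TO_CODE = {word: index for index, word in enumerate(WORDS)}

LITERAL_FLAG = 0x80  # assumed: high bit set means "literal token follows"


def encode(message: str) -> bytes:
    """Lower-case the message (the lossy step) and encode it token by token."""
    out = bytearray()
    for token in message.lower().split():
        code = WORD_TO_CODE.get(token)
        if code is not None:
            # Dictionary hit: one byte, and the following space is implied.
            out.append(code)
        else:
            raw = token.encode("utf-8")
            if len(raw) > 0x7F:
                raise ValueError("token too long for this toy format")
            # Literal: flag bit plus length, then the token's bytes.
            out.append(LITERAL_FLAG | len(raw))
            out.extend(raw)
    return bytes(out)


def decode(data: bytes) -> str:
    """Rebuild the lower-cased text, re-inserting one space between tokens."""
    tokens = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte & LITERAL_FLAG:
            length = byte & 0x7F
            tokens.append(data[i:i + length].decode("utf-8"))
            i += length
        else:
            tokens.append(WORDS[byte])
    return " ".join(tokens)


if __name__ == "__main__":
    original = "See you at the Station"
    packed = encode(original)
    print(len(original), len(packed), decode(packed))
```

In this sketch the 22-character example message packs into 12 bytes and decodes to lower case. Runs of white space are also collapsed to single spaces, which is a further loss specific to this sketch and not something the description states about the LD-based method.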
format Thesis
id oai:open.uct.ac.za:11427/6415
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:49:15.240Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2014
publishDateRange 2014
publishDateSort 2014
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/6415 Martin, Wickus Marsden, Gary Information Technology 2014-08-13T19:31:24Z 2014-08-13T19:31:24Z 2009 Master Thesis Masters MSc http://hdl.handle.net/11427/6415 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle Information Technology
Martin, Wickus
A lossy, dictionary-based method for short message service (SMS) text compression
thesis_degree_str Master's
title A lossy, dictionary-based method for short message service (SMS) text compression
title_full A lossy, dictionary-based method for short message service (SMS) text compression
title_fullStr A lossy, dictionary-based method for short message service (SMS) text compression
title_full_unstemmed A lossy, dictionary-based method for short message service (SMS) text compression
title_short A lossy, dictionary-based method for short message service (SMS) text compression
title_sort lossy dictionary based method for short message service sms text compression
topic Information Technology
url http://hdl.handle.net/11427/6415
work_keys_str_mv AT martinwickus alossydictionarybasedmethodforshortmessageservicesmstextcompression
AT martinwickus lossydictionarybasedmethodforshortmessageservicesmstextcompression