Using NER and Doc2Vec to cluster South African criminal cases

Mini Dissertation (MSc (Computer Science))--University of Pretoria, 2021.

Bibliographic Details
Other Authors: Marivate, Vukosi
Format: Thesis
Language: English
Published: University of Pretoria 2024
Subjects:
access_status_str Open Access
author2 Marivate, Vukosi
collection Thesis
dc_rights_str_mv © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Mini Dissertation (MSc (Computer Science))--University of Pretoria, 2021.
format Thesis
id oai:repository.up.ac.za:2263/98152
institution University of Pretoria (South Africa)
language English
last_indexed 2026-06-10T12:36:30.275Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2024
publisher University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/98152 Using NER and Doc2Vec to cluster South African criminal cases Marivate, Vukosi u13140443@tuks.co.za Nchachi, Carel Kagiso UCTD Similar case matching Judicial system Natural language processing (NLP) Named entity recognizer (NER) Engineering, built environment and information technology theses SDG-09 Engineering, built environment and information technology theses SDG-16 Mini Dissertation (MSc (Computer Science))--University of Pretoria, 2021.
The judicial system is the central pillar of law and order across the world. It is responsible for maintaining order amongst citizens and for resolving the litigation that arises. Although this system has worked quite well, several challenges remain, such as racial bias in cases, a shortage of legal professionals and inconsistent rulings. These challenges need to be addressed in order to maintain law and order in society and to help strengthen the criminal justice system. Researchers have incorporated Natural Language Processing (NLP) techniques to help address some of these challenges, focusing primarily on three legal applications: Legal Judgment Prediction (LJP), Similar Case Matching (SCM) and Legal Question Answering (LQA) [28]. SCM focuses on identifying the relationships among cases using the available information. In other words, SCM is concerned with segmenting or grouping legal cases. This is especially useful for Common Law judicial systems, where judicial decisions are based on similar and representative cases from the past. South Africa uses this type of judicial system. Although good progress has been made in SCM applications, these models still face several challenges. These include using the entities found in a legal document to improve the matching of similar cases, and the interpretability of the models.
In this research we apply SCM to South African criminal cases by creating a model that matches similar criminal cases together. The model also addresses the two challenges currently faced in SCM applications. We found that using a Named Entity Recognizer (NER) with a Paragraph Vector-Distributed Memory (PV-DM) model produced better results than using a conventional PV-DM model or a TF-IDF model. The model overcomes the current SCM challenges because it uses the entities found in cases, extracted by the NER model, as its main variables. Since the entities help explain how the model matched similar cases, the model is also interpretable. Based on the accuracy (similarity score) of the model, we can use it as a tool to segment criminal cases in practice.
bs2026 Computer Science MSc (Computer Science) Unrestricted Faculty of Engineering, Built Environment and Information Technology SDG-09: Industry, innovation and infrastructure SDG-16: Peace, justice and strong institutions 2024-09-12T09:42:59Z 2024-09-12T09:42:59Z 2024 2021 Mini Dissertation * A2024 http://hdl.handle.net/2263/98152 en © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
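The idea in the abstract of matching cases on the entities they contain can be illustrated with a small sketch. This is not the dissertation's model: it swaps the trained NER model for a hypothetical toy gazetteer (`GAZETTEER`), and it uses the TF-IDF baseline with cosine similarity rather than PV-DM so that it runs with the standard library alone. The three sample cases, the entity labels and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy gazetteer standing in for a trained NER model (assumption).
GAZETTEER = {
    "murder": "CRIME",
    "robbery": "CRIME",
    "firearm": "WEAPON",
    "knife": "WEAPON",
    "pretoria": "LOCATION",
    "durban": "LOCATION",
}


def extract_entities(text):
    """Return 'LABEL:token' strings for every gazetteer hit in the text."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [f"{GAZETTEER[t]}:{t}" for t in tokens if t in GAZETTEER]


def tfidf_vectors(docs):
    """Turn lists of entity tokens into sparse TF-IDF dicts (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


# Invented example cases, not taken from the dissertation's data.
cases = [
    "The accused committed robbery in Pretoria using a firearm.",
    "A robbery with a firearm took place in Durban.",
    "The accused was convicted of murder with a knife in Durban.",
]

entity_docs = [extract_entities(c) for c in cases]
vectors = tfidf_vectors(entity_docs)

sim_01 = cosine(vectors[0], vectors[1])  # two robbery cases
sim_02 = cosine(vectors[0], vectors[2])  # robbery case vs. murder case

# The shared entities are what make a match explainable.
shared_01 = sorted(set(entity_docs[0]) & set(entity_docs[1]))
```

Here the two robbery cases score higher than the robbery and murder pair, and `shared_01` lists the entities responsible for the match, which is the sense in which an entity-based representation is interpretable. In a fuller pipeline the gazetteer would be replaced by a real NER model and the TF-IDF vectors by document embeddings.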
title Using NER and Doc2Vec to cluster South African criminal cases
topic UCTD
Similar case matching
Judicial system
Natural language processing (NLP)
Named entity recognizer (NER)
Engineering, built environment and information technology theses SDG-09
Engineering, built environment and information technology theses SDG-16
url http://hdl.handle.net/2263/98152