Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a par...
| Main Author: | Acton, Shane |
|---|---|
| Other Authors: | Buys, Jan |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Computer Science, 2024 |
| Subjects: | Computer Science |
| _version_ | 1867613561331646465 |
|---|---|
| access_status_str | Open Access |
| author | Acton, Shane |
| author2 | Buys, Jan |
| author_browse | Acton, Shane Buys, Jan |
| author_facet | Buys, Jan Acton, Shane |
| author_sort | Acton, Shane |
| collection | Thesis |
| description | Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention-based and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] of the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward neural networks, and show that all tested GNNs benefit from these components too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/39180 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:38:06.414Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2024 |
| publishDateRange | 2024 |
| publishDateSort | 2024 |
| publisher | Department of Computer Science |
| publisherStr | Department of Computer Science |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/39180 From GNNs to sparse transformers: graph-based architectures for multi-hop question answering Acton, Shane Buys, Jan Computer Science Multi-hop Question Answering (MHQA) is a challenging task in NLP which typically involves processing very long sequences of context information. Sparse Transformers [7] have surpassed Graph Neural Networks (GNNs) as the state-of-the-art architecture for MHQA. Noting that the Transformer [4] is a particular message-passing GNN, in this work we perform an architectural analysis and evaluation to investigate why the Transformer outperforms other GNNs on MHQA. In particular, we compare attention-based and non-attention-based GNNs, and compare the Transformer's Scaled Dot Product (SDP) attention to the Additive Attention [2] of the Graph Attention Network (GAT) [5]. We simplify existing GNN-based MHQA models and leverage this system to compare GNN architectures in a lower compute setting than token-level models. We evaluate all of our model variations on the challenging MHQA task Wikihop [6]. Our results support the superiority of the Transformer architecture as a GNN in MHQA. However, we find that problem-specific graph structuring rules can outperform the random connections used in Sparse Transformers. We demonstrate that the Transformer benefits greatly from its use of residual connections [3], Layer Normalisation [1], and element-wise feed-forward neural networks, and show that all tested GNNs benefit from these components too. We find that SDP attention can achieve higher task performance than Additive Attention. Finally, we also show that utilising edge type information alleviates performance losses introduced by sparsity. 2024-03-05T07:43:02Z 2024-03-05T07:43:02Z 2023 2024-03-05T07:41:33Z Thesis / Dissertation Masters MSc http://hdl.handle.net/11427/39180 eng application/pdf Department of Computer Science Faculty of Science |
| spellingShingle | Computer Science Acton, Shane From GNNs to sparse transformers: graph-based architectures for multi-hop question answering |
| thesis_degree_str | Master's |
| title | From GNNs to sparse transformers: graph-based architectures for multi-hop question answering |
| title_sort | from gnns to sparse transformers graph based architectures for multi hop question answering |
| topic | Computer Science |
| url | http://hdl.handle.net/11427/39180 |
| work_keys_str_mv | AT actonshane fromgnnstosparsetransformersgraphbasedarchitecturesformultihopquestionanswering |
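
The abstract contrasts the Transformer's Scaled Dot Product (SDP) attention with the Additive Attention used by GAT, both viewed as message passing over a sparse graph. The following is a minimal illustrative sketch of that contrast, not the thesis's implementation. It assumes identity projections (the learned query, key, value, and GAT weight matrices are omitted), a single attention head, and a hand-picked attention vector `a`; the function names, the toy graph, and all feature values are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def sdp_scores(query, keys):
    """Transformer-style scores: (q . k) / sqrt(d)."""
    d = len(query)
    return [dot(query, k) / math.sqrt(d) for k in keys]


def additive_scores(query, keys, a):
    """GAT-style scores: LeakyReLU(a . [q || k]), where || is concatenation."""
    return [leaky_relu(dot(a, query + k)) for k in keys]


def attend(node, neighbours, features, score_fn):
    """One message-passing step for a single node.

    Only the node's listed neighbours are attended to, which is what
    makes the attention pattern sparse.
    """
    keys = [features[j] for j in neighbours[node]]
    weights = softmax(score_fn(features[node], keys))
    dim = len(features[node])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


# Hypothetical toy graph: node 0 is connected to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbours = {0: [1, 2], 1: [0], 2: [0]}
a = [0.5, -0.5, 0.5, -0.5]  # assumed attention vector, not learned here

sdp_out = attend(0, neighbours, features, sdp_scores)
add_out = attend(0, neighbours, features, lambda q, ks: additive_scores(q, ks, a))
print("SDP attention output:     ", sdp_out)
print("Additive attention output:", add_out)
```

In both cases node 0 places more weight on neighbour 1, whose features match its own, but the two scoring functions produce different weightings from the same inputs. A fuller model would wrap this step with the residual connections, Layer Normalisation, and feed-forward layers that the abstract reports as beneficial.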