Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora

Bibliographic Details
Published in: Journal of the Brazilian Computer Society
Format: Online Article, RSS Article
Published: 2025
Subjects: Electrical and Electronic Engineering; Electrical & Electronics; Engineering & Technology
collection WordPress RSS
FRELIP Feed Integration
discipline_display Engineering & Technology
genre Journal Article
id rss_article:3419
institution FRELIP
subject_display Electrical and Electronic Engineering
Electrical & Electronics
Engineering & Technology
url https://journals-sol.sbc.org.br/index.php/jbcs/article/view/5788