№ 02 / SUMMARIES

#nlp

DAY 01 · June 19, 2026 · 1 SUMMARY
arXiv cs.AI · AI & LLMs

Toten: Ontological Tokenization for Technical Portuguese

Toten is a knowledge-based tokenization framework designed to accurately parse physical quantities and technical notation in Brazilian Portuguese, addressing common failures in standard NLP tokenizers.

DAY 02 · June 6, 2026 · 1 SUMMARY
MarkTechPost · Data Science & Visualization

Building a Semantic Search and Classifier for ResearchMath-14k

This tutorial demonstrates how to build a semantic search engine and status classifier for the ResearchMath-14k dataset using sentence embeddings, TF-IDF, and logistic regression.
