#nlp
Every summary, chronological. Filter by category, tag, or source from the rail.
Tag · #nlp
Toten: Ontological Tokenization for Technical Portuguese
Toten is a knowledge-based tokenization framework designed to accurately parse physical quantities and technical notation in Brazilian Portuguese, addressing common failures in standard NLP tokenizers.
arXiv cs.AI
Building a Semantic Search and Classifier for ResearchMath-14k
This tutorial demonstrates how to build a semantic search engine and status classifier for the ResearchMath-14k dataset using sentence embeddings, TF-IDF, and logistic regression.
MarkTechPost
Showing 2 of 2