Abstract

Chinese word segmentation (CWS) is essential for natural language processing because written Chinese lacks explicit word boundaries. Recent CWS evaluations include closed tasks (training data only) and open tasks (external resources allowed), but the impact of external resources remains unclear. This study quantifies the contributions of various external resources and explores methods for integrating them. The results show that independent dictionaries significantly improve performance, a finding that is generalizable to other languages. In addition, normalizing numbers, ASCII characters, and punctuation provides further gains. These findings offer practical guidance for optimizing open-task CWS systems.
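The abstract does not describe how the normalization is carried out. As a rough illustration only, one common preprocessing approach folds full-width characters to their ASCII equivalents and replaces runs of digits and Latin letters with class placeholders before segmentation. The function name `normalize` and the placeholders `<NUM>` and `<ENG>` below are hypothetical and are not taken from the study.

```python
import re
import unicodedata

# Hypothetical placeholders; the study's actual normalization scheme is not given.
_RUN = re.compile(r"(?P<num>[0-9]+(?:\.[0-9]+)?)|(?P<eng>[A-Za-z]+)")


def _placeholder(match: re.Match) -> str:
    # Pick the class label according to which alternative matched.
    return "<NUM>" if match.group("num") else "<ENG>"


def normalize(text: str) -> str:
    """Fold full-width forms to ASCII, then map digit and letter runs to class labels."""
    # NFKC maps full-width digits, letters, and punctuation to ASCII equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # A single pass avoids re-matching the letters inside an inserted placeholder.
    return _RUN.sub(_placeholder, folded)
```

A segmenter would then see one symbol per number or ASCII word instead of an open-ended set of character strings, which is one plausible reason such normalization could help.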
