TY  - THES
ID  - 591
T1  - A controlled study of the contribution of external resources on Chinese word segmentation
AU  - Chen, Yongshun
ED  - Roark, Brian
N1  - Advisor: Roark, Brian
AD  - Oregon Health and Science University
PB  - Oregon Health and Science University
PY  - 2011
Y1  - 2011
DA  - 2011
DO  - 10.6083/M4H41PFB
AB  - Chinese word segmentation (CWS) is essential for natural language processing, yet written Chinese lacks explicit word boundaries. Recent CWS evaluations include closed tasks (training data only) and open tasks (external resources allowed), but the impact of external resources remains unclear. This study quantifies contributions of various resources and explores integration methods. Results show independent dictionaries significantly improve performance, a finding generalizable to other languages. Additionally, normalization of numbers, ASCII characters, and punctuation provides further gains. These findings offer practical guidance for optimizing open-task CWS systems.
KW  - Artificial Intelligence
KW  - Machine Learning
KW  - Natural Language Processing
KW  - Language
KW  - Asian People
KW  - discriminative method
KW  - chinese word segmentation
KW  - Chinese language
L1  - https://digitalcollections.ohsu.edu/record/591/files/592_etd.pdf
UR  - https://digitalcollections.ohsu.edu/record/591/files/592_etd.pdf
ER  - 