Abstract
The input to real-time natural language processing applications, such as simultaneous speech-to-speech translation systems, often arrives as a continuous stream without clear boundaries. To process this stream in real time, the application must divide it into appropriate segments. In this thesis, we introduce "hedge parsing," a fast incremental parsing method that enables syntax-aware segmentation of input streams. Unlike full syntactic parsing, which requires the complete input, hedge parsing can operate on partial information, which makes it suitable for real-time scenarios. It also provides hierarchical structure rather than only bracketing information, which can improve downstream processing.