Abstract
WordsEye is a text-to-scene system that converts user descriptions into 3D scenes using VigNet, a unified knowledge base of lexical and real-world knowledge. VigNet maps textual objects and locations to 3D models, with location vignettes representing prototypical groupings of objects. This thesis explores the use of Amazon Mechanical Turk (AMT) to populate VigNet. We collected contextual object data and semantic information for location vignettes through three AMT data-collection strategies: gathering image descriptions, lists of functionally related objects, and visually important objects. Evaluation against manually built vignettes achieved up to 90.62% precision and 87.88% recall, demonstrating that AMT is an effective approach for enriching WordsEye’s knowledge base.