Computer Science

A Phrase-Based Approach to Text Simplification

Will Coster ('13); Dan Feblowitz ('11); Mentor: David Kauchak

Abstract: Text simplification is the process of altering a document to reduce reading complexity by incorporating more accessible vocabulary and sentence structure. In this work, we examine the problem of text simplification as a statistical machine translation (language translation) problem. We employ a phrase-based approach that, unlike most previous research in sentence simplification, is not syntactically motivated and allows for compressions beyond deletions, including insertions, reorderings and rewordings. We have developed a corpus of approximately 135K aligned sentence pairs by selecting similar sentences from equivalent English and Simple English Wikipedia articles. Using this corpus, we show that a phrase-based translation approach achieves a 1-3% increase in BLEU score over a baseline of no simplifications.
Funding provided by Pomona College SURP (WC), The Paul K. Richter and Evalyn E. Cook Richter Award (WC), The Fletcher Jones Foundation (DF)

Learning Syntax-Based Sentence Compression from The Simple English Wikipedia

Daniel Feblowitz ('11); Will Coster ('13); Mentors: David Kauchak, Kim Bruce

Abstract: Sentence compression is the task of automatically simplifying sentences in natural language. In this work, we introduce a rich, new set of simplification data generated by pairing sentences in Simple English Wikipedia with corresponding sentences in the traditional English Wikipedia. This data set is an order of magnitude larger than previously examined data sets. We use this new data to train a syntax-based, fully lexicalized sentence compressor, implemented with finite tree-to-tree transducers, that models lexical changes in addition to constituent deletions and reorderings. We examine the impact of data set size by comparing our system’s performance to previous models across multiple data sets.
Funding provided by The Fletcher Jones Foundation (DF), Pomona College SURP (WC), The Paul K. Richter and Evalyn E. Cook Richter Award (WC)
