Computer Science
A Phrase-Based Approach to Text Simplification
Will Coster ('13); Dan Feblowitz ('11); Mentor: David Kauchak
Abstract: Text simplification is the process of
altering a document to reduce reading complexity
by incorporating more accessible vocabulary and
sentence structure. In this work, we examine the
problem of text simplification as a statistical
machine translation (language translation)
problem. We employ a phrase-based approach
that, unlike most previous research in sentence
simplification, is not syntactically motivated and
allows for compressions beyond deletions,
including insertions, reorderings and rewordings.
We have developed a corpus of approximately
135K aligned sentence pairs by selecting similar
sentences from equivalent English and Simple
English Wikipedia articles. Using this corpus, we
show that a phrase-based translation approach
achieves a 1-3% increase in BLEU score over a
baseline of no simplifications.
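As an illustration of the evaluation described above, the following is a minimal sketch of a corpus-level BLEU computation with a single reference per sentence, used to compare a "no simplification" baseline (the unchanged input sentence) against a system output. The function names, the toy sentences, and the whitespace tokenization are assumptions made for this example; they are not taken from the authors' experiments, which may have used a different BLEU implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per candidate (illustrative sketch)."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clipped (modified) n-gram matches.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


# Hypothetical toy data: one normal sentence and its simple reference.
normal = "the cat is situated on the mat near the door".split()
simple = "the cat is sitting on the mat near the door".split()

baseline_score = corpus_bleu([normal], [simple])  # output = unchanged input
system_score = corpus_bleu([simple], [simple])    # a perfect simplification
```

Under this setup, a simplification system improves on the baseline when its BLEU score against the simple references exceeds the score of the unmodified input sentences.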
Funding provided by Pomona College SURP
(WC), The Paul K. Richter and Evalyn E. Cook
Richter Award (WC), The Fletcher Jones Foundation
(DF)
Learning Syntax-Based Sentence Compression from The Simple English Wikipedia
Daniel Feblowitz ('11); Will Coster ('13); Mentors: David Kauchak, Kim Bruce
Abstract: Sentence compression is the task of
automatically simplifying sentences in natural
language. In this work, we introduce a rich, new
set of simplification data generated from pairing
sentences in Simple English Wikipedia with
corresponding sentences in the traditional English
Wikipedia. This data set is an order of magnitude
larger than previous data sets examined. We use
this new data to train a syntax-based, fully
lexicalized sentence compressor, implemented with
finite tree-to-tree transducers, that models lexical
changes in addition to constituent deletions and
reorderings. We examine the impact of data set
size by comparing our system’s performance to
previous models across multiple data sets.
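To make the data-generation step above more concrete, the following is a minimal sketch of pairing sentences from a Simple English Wikipedia article with similar sentences from the corresponding English Wikipedia article. The word-overlap (Jaccard) similarity, the threshold of 0.5, and the function names are all assumptions chosen for illustration; the abstract does not specify the similarity measure or alignment procedure that was actually used.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def align_sentences(normal_sents, simple_sents, threshold=0.5):
    """Pair each simple sentence with its most similar normal sentence.

    A pair is kept only if the similarity reaches the (hypothetical) threshold.
    """
    pairs = []
    for simple in simple_sents:
        best = max(normal_sents, key=lambda s: jaccard(s, simple), default=None)
        if best is not None and jaccard(best, simple) >= threshold:
            pairs.append((best, simple))
    return pairs


# Hypothetical toy articles, already split into sentences.
normal_article = [
    "The cat is situated on the mat .",
    "Dogs are domesticated mammals .",
]
simple_article = [
    "The cat is on the mat .",
    "Birds can fly .",
]

aligned = align_sentences(normal_article, simple_article)
```

In this toy example, only the first simple sentence finds a sufficiently similar counterpart; sentences with no close match are left out of the aligned data.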
Funding provided by The Fletcher Jones Foundation
(DF), Pomona College SURP (WC), The
Paul K. Richter and Evalyn E. Cook Richter
Award (WC)