We propose a novel approach to unsupervised extractive summarization. Our approach builds a semantic graph for the documents to be summarized. The extracted summary is then formulated as optimizing submodular functions defined on the semantic graph, under a budget constraint. We explore several submodular functions that model the quality of a summary. For monotone submodular objective functions, we theoretically show that a modified greedy algorithm is guaranteed to solve the budgeted maximization problem near-optimally. For non-monotone submodular objectives, we prove that the modified greedy algorithm solves the optimization problem near-optimally with high probability. Experimental results show that our approach significantly outperforms state-of-the-art summarization methods and other graph-based methods. In particular, we show on DUC '04 multi-document summarization task, our approach is superior to the best-performing method from the DUC '04 evaluation in terms of ROUGE-1 scores.
Back to symposium main page