We present a random-walk-based approach to extracting paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes their corresponding phrases are aligned. We sample random walks to compute the average number of steps it takes to reach a ranking of paraphrases with better ones being "closer" to the phrase of interest. This approach allows "feature" nodes that represent domain knowledge to be easily incorporated into the graph, and incorporates techniques to prevent the graph from growing too large for efficiency. Current state-of-the-art approaches, by contrast, require the graph to be bipartite, are limited to finding paraphrases that are of length two away from a phrase, and do not generally permit easy incorporation of domain knowledge into the graph. Manual evaluation of generated output shows that this approach outperforms state-of-the-art.
Back to symposium main page