Clojure Data Structures and Algorithms Cookbook

By: Rafik Naccache

Summarizing texts by extracting the most representative sentences


In this recipe, we are going to use an extractive method to build a summary from a set of text documents. By extractive, we mean that rather than interpreting the source documents and rephrasing their content more concisely, we'll detect the most salient sentences in those documents and present them as the summary of the text.

The algorithm we are going to use is inspired by Google's PageRank and is known as LexRank. The idea is that if we represent every sentence of a document as a vector, we can build a graph that ties all of these sentences together. Every edge between a pair of sentences is weighted by the distance between the two. If any two sentences are close enough, we say that they are connected. Then the logic of PageRank applies. We define the degree of each sentence as the number of the others that are connected to...
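
The graph-building idea described above can be illustrated with a minimal sketch. It is not the recipe's own implementation. It assumes that each sentence is represented as a sparse word-count vector and that closeness is measured with cosine similarity against a fixed threshold. The text above confirms neither choice, and the names `tokenize`, `sentence->vector`, `cosine-similarity`, and `degrees` are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Lower-cases a sentence and extracts its words (assumed tokenization)."
  [sentence]
  (re-seq #"[a-z']+" (str/lower-case sentence)))

(defn sentence->vector
  "Represents a sentence as a sparse vector: a map of word -> count."
  [sentence]
  (frequencies (tokenize sentence)))

(defn dot
  "Dot product of two sparse vectors."
  [a b]
  (reduce-kv (fn [acc word n] (+ acc (* n (get b word 0)))) 0 a))

(defn norm
  "Euclidean length of a sparse vector."
  [a]
  (Math/sqrt (dot a a)))

(defn cosine-similarity
  "Cosine of the angle between two sparse vectors; 0.0 if either is empty."
  [a b]
  (let [d (* (norm a) (norm b))]
    (if (zero? d)
      0.0
      (/ (dot a b) d))))

(defn degrees
  "For each sentence, counts the other sentences whose similarity to it
   exceeds threshold, that is, the sentences it is connected to."
  [sentences threshold]
  (let [vs (mapv sentence->vector sentences)
        n  (count vs)]
    (vec (for [i (range n)]
           (count (filter (fn [j]
                            (and (not= i j)
                                 (> (cosine-similarity (vs i) (vs j))
                                    threshold)))
                          (range n)))))))

(degrees ["The cat sat on the mat."
          "The cat sat on a mat."
          "Quantum physics is hard."]
         0.5)
;; => [1 1 0]
```

In this sketch the first two sentences share most of their words, so each has degree 1, while the third is connected to nothing. The threshold of 0.5 is an arbitrary value chosen for the example.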