Villa Monte, Augusto

Document summarization using a scoring-based representation - 1 file (311.1 kB)

PDF file format. -- This document is intellectual output of the Facultad de Informática - UNLP (BIPA/Library Collection)

Data repositories currently hold large amounts of information in different formats, most of it text. This has raised interest in techniques that automatically identify the most relevant sentences of a document in order to generate a text summary. This article presents a technique for extracting the most representative sentences of a document according to a user-defined criterion. The system learns the criterion by applying an optimization technique to a training document in which the user has ranked the sentences by relevance. The proposed method has been applied to a five-chapter thesis with good results. The paper closes with conclusions and ideas for future work.
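
The abstract does not specify the sentence metrics or the optimizer, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes three hypothetical metrics (position, relative length, mean term frequency), a weighted sum as the scoring criterion, and plain random search as a stand-in for the unnamed optimization technique. All function names are illustrative.

```python
import random

def features(sentences):
    """Score each sentence on three hypothetical metrics (assumed, not from the paper)."""
    words = [s.lower().split() for s in sentences]
    freq = {}
    for ws in words:
        for w in ws:
            freq[w] = freq.get(w, 0) + 1
    n = len(sentences)
    max_len = max(len(ws) for ws in words) or 1
    max_freq = max(freq.values(), default=1)
    rows = []
    for i, ws in enumerate(words):
        position = 1.0 - i / n                     # earlier sentences score higher
        length = len(ws) / max_len                 # length relative to the longest sentence
        tf = sum(freq[w] for w in ws) / (max(len(ws), 1) * max_freq)  # mean term frequency
        rows.append((position, length, tf))
    return rows

def rank(rows, weights):
    """Return sentence indices ordered from highest to lowest weighted score."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in rows]
    return sorted(range(len(rows)), key=lambda i: -scores[i])

def loss(rows, weights, user_order):
    """Sum of absolute rank differences between predicted and user ranking."""
    pos = {s: r for r, s in enumerate(rank(rows, weights))}
    return sum(abs(pos[s] - r) for r, s in enumerate(user_order))

def learn_weights(rows, user_order, iters=2000, seed=0):
    """Random search over weights (a stand-in for the paper's optimizer)."""
    rng = random.Random(seed)
    best_w, best_l = None, float("inf")
    for _ in range(iters):
        w = [rng.random() for _ in rows[0]]
        l = loss(rows, w, user_order)
        if l < best_l:
            best_w, best_l = w, l
    return best_w, best_l

def summarize(sentences, weights, k):
    """Keep the k best-scoring sentences, in their original order."""
    top = sorted(rank(features(sentences), weights)[:k])
    return [sentences[i] for i in top]
```

In this sketch, `learn_weights` would be run once on the training document with the user's ranking (`user_order`, a list of sentence indices from most to least relevant), and the resulting weights would then be passed to `summarize` for new documents.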

Standard No.: DIF-M7759

Subjects--Topical Terms:
DATA MINING