A Hierarchical Extractive Text Summarization Approach
Abstract
The explosion of information has made it difficult for users to track information about
topics that are of interest to them. Automatic text summarization is an important tool in
this regard. In this research, we focus on generating various complementary extractive
summaries, each selecting the most relevant sentences in a given document from one or
more points of view.
Towards this end, we have developed a toolbox and a notation that allow the
construction of complicated summarizers as networks of tools or blocks. The basic blocks
are the Keyword Extractor (KE), Sentence Extractor (SE), Semantic Grader (SG), Keyword
Emphasizer (KM), and Sentence Orderer (SO), in addition to well-known preprocessing tools
such as a preprocessor, a stemmer, and stop-word removal.
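
As an illustration of how two such blocks might be chained, the following is a minimal sketch and not the toolbox itself. It assumes that the Keyword Extractor ranks words by raw frequency and that the Sentence Extractor scores each sentence by the number of keyword occurrences it contains; the abstract does not specify either rule. The function names, the stop-word list, and the sentence-splitting pattern are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "are", "on"}

def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def keyword_extractor(text, k):
    """KE block (assumed rule): the k most frequent non-stop words."""
    return [w for w, _ in Counter(tokenize(text)).most_common(k)]

def sentence_extractor(text, keywords, n):
    """SE block (assumed rule): the n sentences with the most keyword hits,
    returned in their original document order."""
    sentences = split_sentences(text)
    kw = set(keywords)
    scored = [(sum(1 for w in tokenize(s) if w in kw), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]

# A two-block network: the KE output feeds the SE block.
text = "Cats chase mice. Dogs chase cats. Birds sing."
summary = sentence_extractor(text, keyword_extractor(text, 2), 2)
```

Keeping each block as a plain function with text or keyword lists as inputs and outputs is one way to make the blocks composable into larger networks.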
Moreover, we introduced a notation and an approach that allow algebraic operations
on documents and keyword lists, such as addition and subtraction, whether at the level of
words, sentences, or documents. To evaluate the performance of these summarizers,
we have applied some of the measures considered standard in this field, such as the Average ROUGE
Summarization Metric (ARSM). At the same time, we have also advocated the use
of Effective Keyword Extraction (EKE) and Weighted Effective Keyword Extraction
(WEKE). We have also developed test procedures for our summarization methods to
ensure their effectiveness, as reported alongside each summarization method.
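
One possible reading of addition and subtraction on keyword lists and documents is sketched below. The semantics are assumptions made for illustration: keyword lists are treated as word-to-weight maps, addition sums the weights, subtraction drops any keyword whose weight falls to zero or below, and document subtraction removes matching sentences. The abstract does not define these operations, and the function names are hypothetical.

```python
from collections import Counter

def add_keywords(a, b):
    """Keyword-list addition (assumed): sum the weights of matching keywords."""
    return Counter(a) + Counter(b)

def subtract_keywords(a, b):
    """Keyword-list subtraction (assumed): subtract weights and keep only
    keywords whose remaining weight is positive."""
    return Counter(a) - Counter(b)

def subtract_sentences(doc, other):
    """Sentence-level document subtraction (assumed): drop from `doc` every
    sentence that also appears in `other`, preserving order."""
    removed = set(other)
    return [s for s in doc if s not in removed]

a = {"summary": 3, "keyword": 1}
b = {"keyword": 2, "graph": 1}
combined = add_keywords(a, b)         # summary: 3, keyword: 3, graph: 1
remaining = subtract_keywords(a, b)   # summary: 3
```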
The summarization methods proposed and developed focus primarily on dealing
with the ideas within the documents, with ideas being represented by keywords or
combinations thereof. For example, we have proposed methods that search for common
ideas within several documents, as in the Fuse Document First (FDF) and the Fuse
Keywords First (FKF) algorithms. The FDF algorithm fuses the documents first, while the
FKF algorithm fuses the keywords first. Moreover, we propose and develop methods that
allow secondary ideas inside the document to emerge separately from, and in addition to, the
primary ideas. Two such algorithms, the Subtract-Document (SD) algorithm and the Stratified
Keywords Portioning (SKP) algorithm, are introduced and compared.
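
The difference in ordering between FDF and FKF can be sketched as follows. This is only an illustration under stated assumptions: "fusing documents" is taken to mean concatenation, "fusing keywords" is taken to mean intersecting the per-document keyword sets, and keywords are ranked by raw frequency. None of these choices is confirmed by the abstract, and all function names are hypothetical.

```python
import re
from collections import Counter

def top_keywords(text, k):
    """Hypothetical keyword extractor: the k most frequent words,
    with ties broken alphabetically."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    return [w for w, _ in ranked[:k]]

def fuse_documents_first(docs, k):
    """FDF-style order (assumed): merge the documents, then extract keywords."""
    return top_keywords(" ".join(docs), k)

def fuse_keywords_first(docs, k):
    """FKF-style order (assumed): extract keywords per document, then keep
    only the keywords shared by every document."""
    per_doc = [set(top_keywords(d, k)) for d in docs]
    return sorted(set.intersection(*per_doc)) if per_doc else []

docs = ["solar power solar panels", "solar energy wind energy"]
fdf = fuse_documents_first(docs, 2)  # ['solar', 'energy']
fkf = fuse_keywords_first(docs, 2)   # ['solar']
```

In this toy input the two orders disagree: merging first lets a word that is frequent in only one document ("energy") into the result, whereas extracting first keeps only what every document emphasizes.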
Finally, we have created methods that summarize text documents based on
the interaction between the ideas inside these documents, as in the Comprehensive-View
Summarizer (CVS) and the Mutual View Summarizer (MVS).