Automatic discourse structure generation using rhetorical structure theory.

LeThanh, Huong (2004) Automatic discourse structure generation using rhetorical structure theory. PhD thesis, Middlesex University.


This thesis addresses a difficult problem in text processing: creating a system that automatically derives the rhetorical structure of text. Although rhetorical structure has proven useful in many fields of text processing, such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the biggest and yet least well-defined areas in linguistics. Researchers have not yet agreed on the best method for analysing the rhetorical structure of text.

This thesis investigates a method for generating the rhetorical structure of text. It proposes recognising rhetorical relations between spans by checking for the appearance of different cohesive devices. The factors considered include cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information. The discourse analyser is divided into two levels: sentence-level and text-level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence by its corresponding rhetorical structure to produce the rhetorical structure of the text. The rhetorical structure at the text level is derived by selecting rhetorical relations to connect adjacent and non-overlapping spans to form a discourse structure that covers the entire text. Constraints of textual organisation and textual adjacency are used in a beam search to reduce the search space when generating such rhetorical structures. In the experiments carried out in this research, the system achieved an F-score of 89.4% for discourse segmentation, 52.4% for the sentence-level discourse analyser and 38.1% for the final output of the system. These results show that the approach performs well in comparison with current research in discourse analysis.
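
The text-level step described above (connecting adjacent, non-overlapping spans under a beam search until one structure covers the whole text) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Span` class, the `beam_search` function, the cue table `CUES`, and the scoring rule (cue weight divided by the size of the left span) are all hypothetical, chosen only to make the search procedure concrete.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """A contiguous run of text units, optionally built from two child spans."""
    start: int                      # index of first unit (inclusive)
    end: int                        # index of last unit (inclusive)
    relation: Optional[str] = None  # relation joining the children, if any
    left: Optional["Span"] = None
    right: Optional["Span"] = None


Scorer = Callable[[Span, Span], Tuple[str, float]]


def beam_search(n_units: int, scorer: Scorer, beam_width: int = 3) -> Tuple[float, Span]:
    """Merge adjacent spans until a single tree covers all units.

    Each state is an ordered tuple of spans that partitions the text, so any
    merge of neighbours is adjacent and non-overlapping by construction.
    """
    if n_units < 1:
        raise ValueError("need at least one unit")
    beam = [(0.0, tuple(Span(i, i) for i in range(n_units)))]
    for _ in range(n_units - 1):
        candidates = {}
        for total, spans in beam:
            for i in range(len(spans) - 1):
                left, right = spans[i], spans[i + 1]
                relation, score = scorer(left, right)
                merged = Span(left.start, right.end, relation, left, right)
                state = spans[:i] + (merged,) + spans[i + 2:]
                # Keep only the best score seen for an identical state.
                if total + score > candidates.get(state, float("-inf")):
                    candidates[state] = total + score
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = [(score, state) for state, score in ranked[:beam_width]]
    best_total, (root,) = beam[0]
    return best_total, root


# Hypothetical toy data: cue word -> (relation name, weight).
CUES = {"so": ("Result", 2.0), "however": ("Contrast", 1.5)}
UNITS = [
    "It rained heavily.",
    "So the match was cancelled.",
    "However, the fans stayed.",
]


def cue_scorer(left: Span, right: Span) -> Tuple[str, float]:
    """Toy rule: read the cue at the start of the right span; the link is
    assumed to weaken as the left span grows."""
    first_word = UNITS[right.start].split()[0].strip(",.").lower()
    relation, weight = CUES.get(first_word, ("Elaboration", 0.5))
    return relation, weight / (left.end - left.start + 1)


total, root = beam_search(len(UNITS), cue_scorer, beam_width=2)
greedy_total, greedy_root = beam_search(len(UNITS), cue_scorer, beam_width=1)
```

With a beam width of 1 the search is greedy and commits to the locally best first merge; a wider beam keeps the alternative partial structure alive, which in this toy example leads to a higher-scoring tree.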

Item Type: Thesis (PhD)
Additional Information: A thesis submitted to Middlesex University in partial fulfilment of the requirements for the degree of Doctor of Philosophy.
Research Areas: B. > Theses
A. > School of Science and Technology > Computer and Communications Engineering
Item ID: 8002
Depositing User: Devika Mohan
Date Deposited: 14 Jul 2011 05:46
Last Modified: 14 Oct 2016 06:35
