Comparing discourse structures between purely linguistic and situated messages in an annotated corpus

Nicholas Asher, Julie Hunter, Kate Thompson

Abstract


This paper describes a corpus of situated multiparty chats developed for the STAC project (Strategic Conversation, ERC grant n. 269427) and annotated for discourse structure in the style of Segmented Discourse Representation Theory (SDRT; Asher & Lascarides, 2003). The STAC corpus is not only a rich source of data on strategic conversation but also, to our knowledge, the first corpus to provide discourse structures for multiparty dialogues situated within a virtual environment. The corpus was annotated in two stages: we initially annotated the chat moves only, but later decided to also annotate interactions between the chat moves and non-linguistic events from the virtual environment. This two-step procedure has allowed us to quantify various ways in which adding information from the non-linguistic context affects dialogue structure. In this paper, we look at how annotations based only on linguistic information were preserved once the non-linguistic context was factored in. We find that while the preservation of relation instances is relatively high when we move from one corpus to the other, there is little preservation of the higher-order structures that capture "the main point" of a dialogue and distinguish it from peripheral information.



www.dialogue-and-discourse.org · ISSN: 2152-9620 · Journal doi: 10.5087/dad