English Corpus Linguistics

Ewa Jonsson
Conversational Writing: A Multidimensional Study of Synchronous and Supersynchronous Computer-Mediated Communication

Volume 16
Thomas Kohnen · Joybrato Mukherjee (eds.)

The author analyses computer chat as a form of communication. While some forms of computer-mediated communication (CMC) deviate only marginally from traditional writing, computer chat is popularly considered to be written conversation and the most “oral” form of written CMC. This book systematically explores the varying degrees of conversationality (“orality”) in CMC, focusing in particular on a corpus of computer chat (synchronous and supersynchronous CMC) compiled by the author. The author employs Douglas Biber’s multidimensional methodology and situates the chats relative to a range of spoken and written genres on his dimensions of linguistic variation. The study fills a gap both in CMC linguistics, as regards a systematic variationist approach to computer chat genres, and in variationist linguistics, as regards a description of conversational writing.

Ewa Jonsson is a researcher in English linguistics at Mid-Sweden University.

www.peterlang.com
Bibliographic Information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at http://dnb.d-nb.de.

Library of Congress Cataloging-in-Publication Data
Names: Jonsson, Ewa, 1968- author.
Title: Conversational writing : a multidimensional study of synchronous and supersynchronous computer-mediated communication / Ewa Jonsson.
Description: Frankfurt am Main ; New York : Peter Lang, [2016] | Series: English Corpus Linguistics ; Volume 16 | Includes bibliographical references.
Identifiers: LCCN 2016000923 | ISBN 9783631671535 (Print) | ISBN 9783653065121 (E-Book)
Subjects: LCSH: Conversation analysis--Data processing. | Authorship--Computer network sources. | Digital communications. | Online social networks.
Classification: LCC P95.45 .J66 2016 | DDC 302.2/24--dc23
LC record available at http://lccn.loc.gov/2016000923

Published with financial support from Mid-Sweden University.

This book is an open access book and is available on www.oapen.org and www.peterlang.com. It is distributed under the terms of the Creative Commons Attribution Noncommercial, No Derivatives (CC-BY-NC-ND).

ISSN 1610-868X
ISBN 978-3-631-67153-5 (Print)
E-ISBN 978-3-653-06512-1 (E-Book)
DOI 10.3726/978-3-653-06512-1

© Peter Lang GmbH
Internationaler Verlag der Wissenschaften
Frankfurt am Main 2015
All rights reserved.
Peter Lang Edition is an Imprint of Peter Lang GmbH.

Peter Lang – Frankfurt am Main · Bern · Bruxelles · New York · Oxford · Warszawa · Wien

All parts of this publication are protected by copyright. Any utilisation outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution.
This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems.

This publication has been peer reviewed.

www.peterlang.com

Acknowledgments

Without generous support from the English section of the Department of Humanities at Mid-Sweden University, this book might not have seen the light of day. Thus, I wish to express my deep gratitude to my colleagues there for encouraging its publication, especially Terry Walker and Anders Olsson.

The book is based on my doctoral thesis, written and presented at Uppsala University. I would like to acknowledge my Uppsala University colleagues for their constructive advice and scholarly support during the writing process. Most importantly, I am grateful in this respect to my doctoral supervisors, Merja Kytö and Christer Geisler. Thank you all for your professional assistance and inspiration.

Preface

This book presents a linguistic investigation of two genres of computer-mediated communication (CMC), namely two modes of conversational writing: “Internet relay chat” (synchronous CMC) and “split-window ICQ chat” (supersynchronous CMC). The investigation employs Douglas Biber’s multifeature multidimensional methodology, taking into account the six dimensions of textual variation in English identified in his 1988 book Variation across speech and writing.

The book came about as an attempt to resolve my puzzlement, in the early 21st century, at some fellow university students’ frequent preference for written conversation (computer chat) over spoken conversation. I was a member of the board of the university’s computer society and one of the few in the society from outside the technological sphere. At board meetings, I noticed a reluctance among board members to sit down and discuss matters face-to-face. It seemed as if the members lacked practice and would rather meet and hold their discussions in chat room channels or in Unix Talk.
Occasionally, items on the agenda were left unfinished or postponed to discussions in the online environments, and several board members appeared to be more comfortable conversing in writing.

I became curious about the board members’ choice of modality – opting for writing instead of speech. Much like the interlocutors on social media today, they appeared to feel safer in the graphemic interface, while still being able to resolve the computer society’s issues efficiently thanks to the real-time communication. Conversation in writing seems to filter out a number of cues that users potentially find threatening in face-to-face communication. If I were a psychologist, I might have embarked on a study involving in-depth interviews with chat room users like the board members, but since I am a linguist, I decided to limit my scope to the language communicated in each respective medium.

The questions I address in this book concern what the most salient linguistic features of computer chat are, how synchronous writing resembles speech, and how written conversations differ from spoken ones. My study does not involve any of the individuals described above; instead, it involves chat room conversationalists in international public channels (for synchronous chat) and adolescents in an English-speaking country (for supersynchronous chat). The multidimensional methodology chosen for the investigation identifies, among other things, the most salient linguistic features of their computer chats (features conspicuous either by their high relative frequency or by their relative rarity), and the procedure of positioning the two genres represented by the chats on Biber’s (1988) dimensions enables a systematic lexico-grammatical description of the genres relative to other genres of writing and speech.
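The following Python sketch makes the standardization step concrete under stated assumptions. It normalizes a feature's frequency per 1,000 words and converts that frequency into a standard score relative to a reference mean and standard deviation. It then flags the feature as salient when it deviates by more than 2 standard deviations, whether because it is unusually frequent or unusually rare. Finally, it combines features into a dimension score in the manner of Biber (1988): the sum of the standard scores of positively loading features minus the sum for negatively loading ones. The feature names, reference statistics, counts and loadings are invented placeholders and are not figures from this study or from Biber's corpus. The helper names (`per_thousand`, `standard_score`, `salient_features`, `dimension_score`) are likewise hypothetical and introduced only for illustration.

```python
# Hypothetical sketch: standard scores and a Biber-style dimension score.
# All feature names, reference statistics, counts and loadings are invented
# placeholders, not figures from this study or from Biber (1988).


def per_thousand(count: int, total_words: int) -> float:
    """Normalize a raw feature count to a frequency per 1,000 words."""
    return count * 1000 / total_words


def standard_score(freq: float, ref_mean: float, ref_sd: float) -> float:
    """Distance of a frequency from the reference mean, in s.d. units."""
    return (freq - ref_mean) / ref_sd


def salient_features(z_scores: dict, threshold: float = 2.0) -> list:
    """Features that are conspicuously frequent (z > t) or rare (z < -t)."""
    return sorted(f for f, z in z_scores.items() if abs(z) > threshold)


def dimension_score(z_scores: dict, positive: list, negative: list) -> float:
    """Sum of z-scores of positively loading features minus negatively loading ones."""
    return sum(z_scores[f] for f in positive) - sum(z_scores[f] for f in negative)


# Hypothetical reference (mean, s.d.) per 1,000 words for each feature.
REFERENCE = {
    "contractions": (10.0, 10.0),
    "first_person_pronouns": (25.0, 20.0),
    "present_tense_verbs": (80.0, 20.0),
    "prepositions": (110.0, 25.0),
}

# Hypothetical raw counts from a single 2,000-word chat text.
counts = {
    "contractions": 120,
    "first_person_pronouns": 150,
    "present_tense_verbs": 200,
    "prepositions": 90,
}
total_words = 2000

scores = {
    feature: standard_score(per_thousand(n, total_words), *REFERENCE[feature])
    for feature, n in counts.items()
}

salient = salient_features(scores)

# Hypothetical loadings: three features load positively, prepositions negatively.
dim = dimension_score(
    scores,
    positive=["contractions", "first_person_pronouns", "present_tense_verbs"],
    negative=["prepositions"],
)

print(salient)  # → ['contractions', 'first_person_pronouns', 'prepositions']
print(round(dim, 2))  # → 11.1
```

Dividing by the reference standard deviation places features with very different base rates on a common scale. This is what allows them to be compared for salience and summed into a single dimension score.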
Although none of Biber’s (1988) dimensions constitutes a dichotomous distinction between writing and speech, they all differentiate among literate and oral genres in various respects. Among the genres studied by Biber are face-to-face and telephone conversations. By relating the CMC genres to the oral conversational genres on the dimensions, it is possible to assess the degree of orality in computer-mediated conversational writing, another undertaking of the study. The investigation presented here considers previous assumptions that synchronously mediated texts display more speech-like properties than asynchronously mediated texts, and discusses whether supersynchronously mediated conversational writing texts are more speech-like than synchronously mediated ones. The study further employs M. A. K. Halliday’s model of semiotics, among other things to explain differences in outcome between subtly divergent communicative settings, and argues for the inclusion of Halliday’s measure of lexical density in studies of linguistic variation involving conversational writing. Finally, two features not included in Biber’s (1988) methodology are here found to be particularly indicative of conversational writing texts: inserts, specified in Biber et al.’s (1999) Longman grammar of spoken and written English, and “emotives” (comprising emoticons and sentiment initialisms), a feature introduced in this study.

Why, then, is it important to study conversational writing genres from such an in-depth linguistic point of view? Firstly, linguistic research has found register/genre variation to be a fundamental aspect of human language.
Biber & Conrad (2009: 23) note that “all humans control a range of registers/genres” and that “[g]iven the ubiquity of register/genre variation, an understanding of how linguistic features are used in patterned ways across text varieties is of central importance for both the description of particular languages and the development of cross-linguistic theories of language use.” Biber & Conrad call register/genre variation a linguistic universal. In the light of this, a study of conversational writing genres is as natural, relevant and important as the study of other genres of language. Variationists aim to describe language adequately, to enable comparison across genres, to map out language users’ competence and, eventually, to facilitate cross-linguistic comparisons, for instance. A thorough description of conversational writing may in turn facilitate the development of computational tools for automatic genre classification, editing and translation, as well as the development of new software for digital communication. And last but not least, it may provide clues for psychologists’ and sociologists’ investigations of people’s motivations for opting for written, rather than spoken, conversations.

Table of Contents
Tables
Figures
Abbreviations
Chapter 1. Introduction
1.1 Speech vs. writing vs. conversational writing
1.2 Aim and scope of the study
1.3 Synchronicity of communication
1.4 Notes on terminology
1.5 Outline of the study
Chapter 2. Background
2.1 Introductory remarks
2.2 Survey of the literature on speech and writing
2.3 Biber’s (1988) dimensions of textual variation
2.4 Halliday’s and others’ essentially qualitative approaches
2.5 Survey of the literature on CMC
2.6 Description of the media for conversational writing
2.7 Chapter summary
Chapter 3. Material and method
3.1 Introductory remarks
3.2 Creating and annotating a corpus of Internet relay chat
3.3 Creating and annotating a corpus of split-window ICQ chat
3.4 The Santa Barbara Corpus subset
3.5 Standardization and dimension score computation
3.6 Average figures for writing and speech, respectively
3.7 Chapter summary
Chapter 4. Salient features in conversational writing
4.1 Introductory remarks
4.2 Distribution of modal auxiliary verbs and personal pronouns
4.3 Word length, type/token ratio and lexical density
4.4 The most salient features
4.5 Paralinguistic features and extra-linguistic content
4.6 Inserts and emotives
4.7 Chapter summary
Chapter 5. Conversational writing positioned on Biber’s (1988) dimensions
5.1 Introductory remarks
5.2 Dimension plots
5.2.1 Dimension 1: Informational versus Involved Production
5.2.2 Dimension 2: Narrative versus Non-Narrative Concerns
5.2.3 Dimension 3: Explicit/Elaborated versus Situation-Dependent Reference
5.2.4 Dimension 4: Overt Expression of Persuasion/Argumentation
5.2.5 Dimension 5: Abstract/Impersonal versus Non-Abstract/Non-Impersonal Information
5.2.6 Dimension 6: On-Line Informational Elaboration
5.3 Chapter summary
Chapter 6. Discussion
6.1 Introductory remarks
6.2 Hypotheses revisited quantitatively
6.3 From genres to text types
6.4 Research questions revisited
6.5 Chapter summary
Chapter 7. Conclusion
7.1 Summary of the study
7.2 Suggestions for further research
Appendices
Appendix I. Texts used in Biber’s (1988) study
Appendix II. Descriptive statistics for genres studied
Appendix III. Raw frequencies of linguistic features
Appendix IV. Examples of excluded material
Appendix V. Features with a |standard score| > 2.0
Appendix VI. Statistical tests of salient features
Appendix VII. Word lists for the corpora studied
Appendix VIII. Dimension score statistics for Biber’s (1988) genres
Appendix IX. Computation of cluster affiliations
Appendix X. Dimension scores for individual texts
List of References

Tables
Table 1.1: Principal synchronicity and direction of communication in various genres
Table 2.1: Linguistic features studied in Biber (1988)
Table 2.2: Summary of co-occurring features on each dimension
Table 2.3: Halliday’s three metafunctions in language and related concepts
Table 3.1: Size of corpora compiled/sampled and annotated for the present study
Table 3.2: Tags used in the annotation of the first twelve turns in Internet relay chat text 4a (UCOW)
Table 4.1: Frequencies of possibility, necessity and prediction modals per 1,000 words
Table 4.2: Frequencies of first, second and third person pronouns per 1,000 words
Table 4.3: Type/token ratio, with standard deviation
Table 4.4: Unweighted lexical density for five corpora
Table 4.5: Unweighted lexical density per clause and related measures
Table 4.6: Frequencies per 1,000 words for the most salient linguistic features
Table 4.7: Frequencies of inserts
Table 4.8: Frequencies of emotives
Table 4.9: Examples of turns with inserts in the three annotated corpora
Table 4.10: Individuals’ emotives usage in the split-window ICQ corpus, by gender
Table 5.1: Descriptive dimension statistics for the UCOW genres and the SBC subset
Table 5.2: Results from ANOVA among the new genres and from Biber’s (1988: 127) tests among his genres
Table 5.3: Results from t-tests among the new genres
Table 5.4: Summary of co-occurring features on each dimension
Table 5.5: Corrected dimension scores for the “ELC other” corpus of BBS conferencing presented in Collot (1991)
Table 6.1: Distance of the three CMC genres to oral conversations measured as standard deviation units on each dimension
Table 6.2: Distance of the conversational writing genres to oral conversations indicated as t-values on each dimension
Table 6.3: Results from t-tests among the conversational writing genres and the conversational spoken genres
Table 6.4: Summary of English text types

Figures
Figure 1.1: Examples of asynchronous, synchronous and supersynchronous modes of written CMC
Figure 1.2: Working relationship between modalities, media and genres/modes in the present study
Figure 2.1: Metafunctions in relation to register and genre in semiotics
Figure 2.2: Approximate emergence of modes for written CMC
Figure 2.3: Screenshot of Internet relay chat window (SCMC)
Figure 2.4: Screenshot of split-window ICQ chat (SSCMC)
Figure 4.1: Distribution of possibility, necessity and prediction modals per 1,000 words
Figure 4.2: Distribution of first, second and third person pronouns per 1,000 words
Figure 4.3: Proportions for first, second and third person pronouns of total personal pronoun use
Figure 4.4: Average word length in the five media
Figure 4.5: Type/token ratio, with standard deviation
Figure 4.6: Direct WH-questions
Figure 4.7: Analytic negation
Figure 4.8: Demonstrative pronouns
Figure 4.9: Indefinite pronouns
Figure 4.10: Present tense verbs
Figure 4.11: Predicative adjectives
Figure 4.12: Contractions
Figure 4.13: Prepositional phrases
Figure 4.14: Standard score distribution of the linguistic features that, in SCMC or SSCMC, deviate by more than 2 s.d. from Biber’s (1988) mean
Figure 4.15: Inserts
Figure 4.16: Emotives
Figure 4.17: Distribution of emotives in the conversational writing corpora
Figure 5.1a: Mean scores on Dimension 1 for all genres
Figure 5.1b: Spread of scores along Dimension 1 for all genres
Figure 5.2a: Mean scores on Dimension 2 for all genres
Figure 5.2b: Spread of scores along Dimension 2 for all genres
Figure 5.3a: Mean scores on Dimension 3 for all genres
Figure 5.3b: Spread of scores along Dimension 3 for all genres
Figure 5.4a: Mean scores on Dimension 4 for all genres
Figure 5.4b: Spread of scores along Dimension 4 for all genres
Figure 5.5a: Mean scores on Dimension 5 for all genres
Figure 5.5b: Spread of scores along Dimension 5 for all genres
Figure 5.6a: Mean scores on Dimension 6 for all genres
Figure 5.6b: Spread of scores along Dimension 6 for all genres
Figure 6.1: Matrix combining the degree of shared context and the synchronicity of communication in the genres studied
Figure 6.2: Relationships found between modalities, media and the genres investigated

Abbreviations
ACMC  Asynchronous computer-mediated communication
AIM  America Online instant messenger
ASCII  American standard code for information interchange
BBS  Bulletin board system(s)
CMC  Computer-mediated communication
EFL  English as a foreign language
ELC  Electronic Language Corpus
FAQ  Frequently asked questions
ICE  International Corpus of English
ICQ  “I seek you” (chat)
IM  Instant messaging
IRC  Internet relay chat
LLC  London-Lund Corpus (of Spoken English)
LOB  Lancaster-Oslo/Bergen Corpus
LSWE  Longman Spoken and Written English Corpus
MD  Multidimensional
MF/MD  Multifeature/multidimensional
MMORPG  Massively multiplayer online role-playing game
MOO  MUD object-oriented
MSN  Microsoft network instant messenger
MUD  Multi-user dungeon
OED  Oxford English dictionary
SBC  Santa Barbara Corpus (of Spoken American English)
SCMC  Synchronous computer-mediated communication
SMS  Short message service (message)
SSCMC  Supersynchronous computer-mediated communication
TTR  Type/token ratio
UCOW  Uppsala Conversational Writing Corpus
1PP  First person pronoun(s)
2PP  Second person pronoun(s)
3PP  Third person pronoun(s)