The empirical base of linguistics: Grammaticality judgments and linguistic methodology

Carson T. Schütze

Classics in Linguistics 2
Language Science Press

Classics in Linguistics
Chief Editors: Martin Haspelmath, Stefan Müller

In this series:
1. Lehmann, Christian. Thoughts on grammaticalization
2. Schütze, Carson T. The empirical base of linguistics: Grammaticality judgments and linguistic methodology
3. Bickerton, Derek. Roots of language

ISSN: 2366-374X

Carson T. Schütze. 2016. The empirical base of linguistics: Grammaticality judgments and linguistic methodology (Classics in Linguistics 2). Berlin: Language Science Press.

This title can be downloaded at: http://langsci-press.org/catalog/book/89
© 2016, Carson T. Schütze
Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

ISBN: 978-3-946234-02-9 (Digital)
978-3-946234-03-6 (Hardcover)
978-3-946234-04-3 (Softcover)
978-1-523743-32-2 (Softcover US)
DOI:10.17169/langsci.b89.100

Cover and concept of design: Ulrike Harbort
Typesetting: Felix Kopecky, Sebastian Nordhoff, Carson T. Schütze
Fonts: Linux Libertine, Arimo, DejaVu Sans Mono
Typesetting software: XeLaTeX

Language Science Press
Habelschwerdter Allee 45
14195 Berlin, Germany
langsci-press.org

Storage and cataloguing done by FU Berlin

Language Science Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

For my mother, Dorly Schütze, and the memory of my father, Ted Schütze

It is simultaneously the greatest virtue and failing of linguistic theory that sequence acceptability judgments are used as the basic data.
(Bever 1970b)

Contents

Preface (2016)
Preface (1996)
Acknowledgments (1996)

1 Introduction
    1.1 Goals
    1.2 Approach
    1.3 Motivation: Whither Linguistics?
    1.4 A Working Hypothesis
    1.5 Scope and Organization

2 Definitions and Historical Background
    2.1 Introduction
    2.2 A Short History of Grammaticality
    2.3 The Use of Judgment Data in Linguistic Theory
        2.3.1 Introduction
        2.3.2 The Dangers of Unsystematic Data Collection
        2.3.3 A Case Study in the Use of Subtle Judgments
        2.3.4 The Interpretation of the Annotations and Degrees of Badness
    2.4 Introspection, Intuition, and Judgment
    2.5 Conclusion

3 Judging Grammaticality
    3.1 Introduction
    3.2 Tasks that Access Grammaticality
    3.3 The Nature of Graded Judgments
        3.3.1 Is Grammaticality Dichotomous?
        3.3.2 Experiments on Chomsky’s Three Levels of Deviance
        3.3.3 Other Experiments
        3.3.4 Ratings, Rankings, and Consistency
    3.4 The Judgment Process
    3.5 The Interpretation of Judgments with Respect to Competence
    3.6 Conclusion

4 Subject-Related Factors in Grammaticality Judgments
    4.1 Introduction
    4.2 Individual Differences: Three Representative Studies
    4.3 Organismic Factors
        4.3.1 Field Dependence
        4.3.2 Handedness
        4.3.3 Other Organismic Factors
    4.4 Experiential Factors
        4.4.1 Linguistic Training
        4.4.2 Literacy and Education
        4.4.3 Other Experiential Factors
    4.5 Conclusion

5 Task-Related Factors in Grammaticality Judgments
    5.1 Introduction
    5.2 Procedural Factors
        5.2.1 Instructions
        5.2.2 Order of Presentation
        5.2.3 Repetition
        5.2.4 Mental State
        5.2.5 Judgment Strategy
        5.2.6 Modality and Register
        5.2.7 Speed of Judgment
    5.3 Stimulus Factors
        5.3.1 Context
        5.3.2 Meaning
        5.3.3 Parsability
        5.3.4 Frequency
        5.3.5 Lexical Content
        5.3.6 Morphology and Spelling
        5.3.7 Rhetorical Structure
    5.4 Conclusion

6 Theoretical and Methodological Implications
    6.1 Introduction
    6.2 Modeling Grammaticality Judgments
        6.2.1 Previous Work
        6.2.2 The Outlines of a Preliminary Model
        6.2.3 Applications of the Model
    6.3 Methodological Proposals
        6.3.1 Materials
        6.3.2 Procedure
        6.3.3 Analysis and Interpretation of Results
    6.4 Conclusion

7 Looking Back and Looking Ahead
    7.1 Introduction
    7.2 Directions for Further Research
    7.3 The Future in Linguistics

References

Indexes
    Name Index
    Subject Index

Preface (2016)

Since the original version of this book (University of Chicago Press, 1996) went out of print in the 2000s, I have continued to receive inquiries from people asking how they can obtain a copy. I am therefore thrilled that Language Science Press has offered to make the title available again, as part of their Classics in Linguistics series.
I would like to thank series editors Stefan Müller and Martin Haspelmath, as well as Sebastian Nordhoff and Felix Kopecky, for their help in making this happen.

The content of this new printing is identical to the first printing, with the following exceptions:

• I have altered the wording in a few places where I found it insufficiently clear or terminologically outdated;
• my uses of the term informant(s) have been replaced with consultant(s) or speaker(s), in keeping with current practice (of course, the former term still appears in some quoted passages);
• I have updated the reference information for a couple of works that had not been published at the time of the original printing, particularly Cowart (1997);
• the original index has been split into name and subject indexes, and both are now more comprehensive.

In terms of presentation, the following things have changed:

• the format of citations and references has been adapted to LangSci house style, as have other minor typographical choices;
• full given names have been added to references whenever available;
• since the text has been freshly typeset, the page numbers do not match those of the original printing; however, the (sub)section numbers are unchanged: I suggest using those if it is necessary to specify a location within a chapter. Example numbers are also unchanged.

Importantly, I have not attempted to update the content in light of subsequent relevant research, since this would undoubtedly have compelled me to try to write a whole new book. Of course, linguistics and psycholinguistics have changed a great deal in the 20 years since I completed the original manuscript; e.g., “theoretical” linguistics has notably become more “experimental.” Also, some of my own views on the issues have evolved over those two decades. There are passages in the book that I would have omitted or altered, if I had allowed myself to make any substantive revisions.
Instead, I have chosen to restrict all follow-up discussion to this preface. In what follows I try to point readers to works that should allow them to “get up to speed” on intervening developments.

For collections that are comprised mainly of papers on topics that are important in the book, see McNair et al. (1996), Penke & Rosenbach (2004), Kepser & Reis (2005), Borsley (2005), Featherston (2007) and replies in the same journal issue, Featherston & Sternefeld (2007), Featherston & Winkler (2009), and Winkler & Featherston (2009). My more recent views can be found in the following surveys: Schütze (2006; 2011) and Schütze & Sprouse (2013).

There have been (at least) four major developments involving the empirical base of linguistics that anyone interested in the topic should be aware of.

1. The adaptation of the magnitude estimation task from psychophysics to judgment collection (Bard, Robertson & Sorace 1996). This was touted as having numerous potential advantages over the traditional Likert scale task, most or all of which have been subsequently refuted (see Weskott & Fanselow 2011 and Sprouse, Schütze & Almeida 2013).

2. The use of World Wide Web searches to establish attestation, and infer acceptability, of certain sentence/construction types. I discuss the limitations of this approach in Schütze (2009).

3. The use of Amazon Mechanical Turk (AMT) and potentially other crowdsourcing platforms as sources of subjects for acceptability judgment and many other psycholinguistic experiments (so far, in only a handful of languages). For an empirical investigation of how AMT results compare with judgments collected in the lab (on a small range of constructions in English), see Sprouse (2011).

4. Detailed empirical challenges to – and defenses of – the proposal, advocated in Section 7.2 in the book, that Subjacency effects could be reduced to processing factors. See Yoshida et al. (2014) as well as the Stanford/Maryland debate (Hofmeister & Sag 2010; Hofmeister, Staum Casasanto & Sag 2012a,b; Sprouse, Wagers & Phillips 2012a,b; and many of the contributions in Sprouse & Hornstein 2014).

Finally, there is a statement by Chomsky, which I attribute in the book (p. 195) to a popular press source, about which I have often been questioned, wherein Chomsky calls it a truism that genetically based Universal Grammar (UG) is subject to some individual variation. For those who have asked whether Chomsky’s position can be confirmed in any academic publications, I offer the following quotes:

    Putting aside genetic variation (an interesting but marginal phenomenon in the case of language) and conceivable but unknown epigenetic effects, the principles of UG, whatever they are, are invariant. (Chomsky 2013: 35)

    It is hardly controversial that [the faculty of language] is a common human possession apart from pathology, to an approximation so close that we can ignore variation. (Chomsky 2008: 138)

I am aware of no empirical evidence that would indicate how much UG can vary across individuals.

Carson T. Schütze
December 2015

References

Bard, Ellen Gurman, Dan Robertson & Antonella Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72. 32–68.
Borsley, Robert D. (ed.). 2005. Data in theoretical linguistics. Special issue. Lingua 115(11).
Chomsky, Noam. 2008. On phases. In Robert Freidin, Carlos P. Otero & Maria Luisa Zubizarreta (eds.), Foundational issues in linguistic theory: Essays in honor of Jean-Roger Vergnaud, 133–166. Cambridge, MA: MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130. 33–49.
Cowart, Wayne. 1997. Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: SAGE Publications.
Featherston, Sam. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33(3). 269–318.
Featherston, Sam & Wolfgang Sternefeld (eds.). 2007. Roots: Linguistics in search of its evidential base. Berlin: Mouton de Gruyter.
Featherston, Sam & Susanne Winkler (eds.). 2009. The fruits of empirical linguistics. Volume 1: Process. Berlin: Mouton de Gruyter.
Hofmeister, Philip & Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86. 366–415.
Hofmeister, Philip, Laura Staum Casasanto & Ivan A. Sag. 2012a. How do individual cognitive differences relate to acceptability judgments? A reply to Sprouse, Wagers, and Phillips. Language 88. 390–400.
Hofmeister, Philip, Laura Staum Casasanto & Ivan A. Sag. 2012b. Misapplying working-memory tests: A reductio ad absurdum. Language 88. 408–409.
Kepser, Stephan & Marga Reis (eds.). 2005. Linguistic evidence: Empirical, theoretical, and computational perspectives. Berlin: Mouton de Gruyter.
McNair, Lisa, Kora Singer, Lise M. Dobrin & Michelle M. AuCoin (eds.). 1996. Papers from the parasession on theory and data in linguistics (CLS 32/2). Chicago: Chicago Linguistic Society.
Penke, Martina & Anette Rosenbach (eds.). 2004. What counts as evidence in linguistics? Special issue. Studies in Language 28(3).
Phillips, Colin. 2006. The real-time status of island phenomena. Language 82. 795–823.
Schütze, Carson T. 2006. Data and evidence. In Keith Brown (ed.), Encyclopedia of language and linguistics, 2nd edn., vol. 3, 356–363. Oxford: Elsevier.
Schütze, Carson T. 2009. Web searches should supplement judgements, not supplant them. Zeitschrift für Sprachwissenschaft 28. 151–156.
Schütze, Carson T. 2011. Linguistic evidence and grammatical theory. Wiley Interdisciplinary Reviews: Cognitive Science 2. 206–221.
Schütze, Carson T. & Jon Sprouse. 2013. Judgment data. In Robert J. Podesva & Devyani Sharma (eds.), Research methods in linguistics, 27–50. New York: Cambridge University Press.
Sprouse, Jon. 2011. A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods 43(1). 155–167.
Sprouse, Jon & Norbert Hornstein (eds.). 2014. Experimental syntax and island effects. Cambridge: Cambridge University Press.
Sprouse, Jon, Carson T. Schütze & Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134. 219–248.
Sprouse, Jon, Matt Wagers & Colin Phillips. 2012a. A test of the relation between working memory and syntactic island effects. Language 88. 82–123.
Sprouse, Jon, Matt Wagers & Colin Phillips. 2012b. Working-memory capacity and island effects: A reminder of the issues and the facts. Language 88. 401–407.
Weskott, Thomas & Gisbert Fanselow. 2011. On the informativity of different measures of linguistic acceptability. Language 87. 249–273.
Winkler, Susanne & Sam Featherston (eds.). 2009. The fruits of empirical linguistics. Volume 2: Product. Berlin: Mouton de Gruyter.
Yoshida, Masaya, Nina Kazanina, Leticia Pablos & Patrick Sturt. 2014. On the origin of islands. Language, Cognition and Neuroscience 29(7). 761–770.

Preface (1996)

The goal of this book is to demonstrate that the absence of methodology of grammaticality judgments in linguistics constitutes a serious obstacle to meaningful research, and to begin to propose suitable remedies for this problem. Throughout much of the history of linguistics, judgments of the grammaticality/acceptability of sentences (and other linguistic intuitions) have been the major source of evidence in constructing grammars. While this seems to have been an exceedingly fruitful approach, some skeptics have worried that theoretical linguists are in fact constructing grammars of intuition, which might not have much to do with the competence that underlies everyday production or comprehension of language.
Also, in the pseudoexperimental procedure of judgment elicitation there is typically no attempt to impose any of the standard experimental controls, and often the only subject is the theorist himself or herself. Should we linguists be worried? I think so. I survey the way grammaticality judgments are currently used in theoretical syntax, and argue that such uses, together with the problems of intuition and experimental design, demand a careful examination of judgments, not as pure sources of data, but as instances of metalinguistic performance.

Several important issues arise when this view of grammaticality judgments is pursued, including which tasks one should use to elicit them, what people are doing when they give them, and what they can really tell us about linguistic competence. On the assumption that grammaticality judgments result from interactions among primary language faculties of the mind and general cognitive processes, I try to understand the process by identifying and analyzing its component parts. I review the psycholinguistic research that has examined ways in which the judgment process can vary with differences among subjects, experimental manipulations, and spurious features of the stimulus. Parallels with other cognitive behaviors are pointed out. After drawing together the substantive and methodological findings into a schematic picture of what the overall process of giving linguistic intuitions might look like, I propose strategies for collecting these intuitions that avoid the pitfalls of previous work and take account of the conditions that have been shown to influence such judgments. I suggest that we can actually strengthen the case for linguistic universals by giving empirical arguments that much of the variability in judgments can be explained without appealing to differences in Universal Grammar.
Finally, I discuss how mainstream linguistic theory might be affected by the growing body of research in this area. I think we will increasingly feel not just a need but also a desire to tackle difficult data questions, particularly as theoretically sophisticated psycholinguistic research increases and we come to understand more about the ways in which linguistic competence is put to use in the mind.