EXPRESSION OF EMOTION IN MUSIC AND VOCAL COMMUNICATION
Topic Editors: Anjali Bhatara, Petri Laukka and Daniel J. Levitin
FRONTIERS IN PSYCHOLOGY

ABOUT FRONTIERS
Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

FRONTIERS JOURNAL SERIES
The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

DEDICATION TO QUALITY
Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world’s best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews. Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

COPYRIGHT STATEMENT
© Copyright 2007-2014 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA (“Frontiers”) or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers. The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers’ website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply. Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission. Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book. As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.
All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

WHAT ARE FRONTIERS RESEARCH TOPICS?
Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

ISSN 1664-8714
ISBN 978-2-88919-263-2
DOI 10.3389/978-2-88919-263-2

EXPRESSION OF EMOTION IN MUSIC AND VOCAL COMMUNICATION

Topic Editors:
Anjali Bhatara, Université Paris Descartes, France
Petri Laukka, Stockholm University, Sweden
Daniel J. Levitin, McGill University, Canada

Two of the most important social skills in humans are the ability to determine the moods of those around us, and the ability to use this information to guide our behavior. To accomplish this, we make use of numerous cues. Among the most important are vocal cues from both speech and non-speech sounds. Music is also a reliable method for communicating emotion. It is often present in social situations and can serve to unify a group’s mood for ceremonial purposes (funerals, weddings) or general social interactions. [Image: the owner of this image is Petri Laukka.]

Scientists and philosophers have speculated on the origins of music and language, and the possible common bases of emotional expression through music, speech and other vocalizations. They have found increasing evidence of commonalities among them. However, the domains in which researchers investigate these topics do not always overlap or share a common language, so communication between disciplines has been limited.

The aim of this Research Topic is to bring together research across multiple disciplines related to the production and perception of emotional cues in music, speech, and non-verbal vocalizations. This includes natural sounds produced by human and non-human primates as well as synthesized sounds. Research methodology includes survey, behavioral, and neuroimaging techniques investigating adults as well as developmental populations, including those with atypical development. Studies using laboratory tasks as well as studies in more naturalistic settings are included.

Table of Contents

Expression of Emotion in Music and Vocal Communication: Introduction to the Research Topic
  Anjali Bhatara, Petri Laukka and Daniel J. Levitin
Emotional Expression in Music: Contribution, Linearity, and Additivity of Primary Musical Cues
  Tuomas Eerola, Anders Friberg and Roberto Bresin
Music, Emotion, and Time Perception: The Influence of Subjective Emotional Valence and Arousal?
  Sylvie Droit-Volet, Danilo Ramos, Lino José L. O. Bueno and Emmanuel Bigand
Preattentive Processing of Emotional Musical Tones: A Multidimensional Scaling and ERP Study
  Thomas F. Münte, Katja Spreckelmeyer, Eckart Altenmüller and Hans Colonius
Changing the Tune: Listeners Like Music that Expresses a Contrasting Emotion
  E. Glenn Schellenberg, Kathleen A. Corrigall, Olivia Ladinig and David Huron
Effects of Voice on Emotional Arousal
  Psyche Loui, Justin P. Bachorik, Hui C. Li and Gottfried Schlaug
Predicting Musically Induced Emotions From Physiological Inputs: Linear and Neural Network Models
  Frank A. Russo, Naresh N. Vempala and Gillian M. Sandstrom
Play it Again, Sam: Brain Correlates of Emotional Music Recognition
  Eckart Altenmüller, Susann Siggel, Bahram Mohammadi, Amir Samii and Thomas F. Münte
Emotion Felt by the Listener and Expressed by the Music: Literature Review and Theoretical Perspectives
  Emery Schubert
Dynamic Musical Communication of Core Affect
  Nicole K. Flaig and Edward W. Large
The Same, Only Different: What Can Responses to Music in Autism Tell Us About the Nature of Musical Emotions?
  Rory Allen, Reubs Walsh and Nick Zangwill
Valence, Arousal, and Task Effects in Emotional Prosody Processing
  Silke Paulmann, Martin Bleichner and Sonja A. E. Kotz
Feeling Backwards? How Temporal Order in Speech Affects the Time Course of Vocal Emotion Recognition
  Simon Rigoulot, Eugen Wassiliwizky and Marc D. Pell
The Siren Song of Vocal Fundamental Frequency for Romantic Relationships
  Sarah Weusthoff, Brian R. Baucom and Kurt Hahlweg
Voice Quality in Affect Cueing: Does Loudness Matter?
  Irena Yanushevskaya, Christer Gobl and Ailbhe Ní Chasaide
Encoding Conditions Affect Recognition of Vocally Expressed Emotions Across Cultures
  Rebecca Jürgens, Matthis Drolet, Ralph Pirow, Elisabeth Scheiner and Julia Fischer
Perception of Emotionally Loaded Vocal Expressions and Its Connection to Responses to Music. A Cross-Cultural Investigation: Estonia, Finland, Sweden, Russia, and The USA
  Teija Waaramaa and Timo Leisiö
Cross-Cultural Differences in the Processing of Non-Verbal Affective Vocalizations by Japanese and Canadian Listeners
  Michihiko Koeda, Pascal Belin, Tomoko Hama, Tadashi Masuda, Masato Matsuura and Yoshiro Okubo
Cross-Cultural Decoding of Positive and Negative Non-Linguistic Emotion Vocalizations
  Petri Laukka, Hillary Anger Elfenbein, Nela Söder, Henrik Nordström, Jean Althoff, Wanda Chui, Frederick K. Iraki, Thomas Rockstuhl and Nutankumar S. Thingujam
The Role of Motivation and Cultural Dialects in the In-Group Advantage for Emotional Vocalizations
  Disa Sauter
What Does Music Express? Basic Emotions and Beyond
  Patrik N. Juslin
Repetition and Emotive Communication in Music Versus Speech
  Elizabeth Hellmuth Margulis
Emotional Communication in Speech and Music: The Role of Melodic and Rhythmic Contrasts
  Lena Quinto, William Forde Thompson and Felicity Louise Keating
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound Have in Common
  Felix Weninger, Florian Eyben, Björn W. Schuller, Marcello Mortillaro and Klaus R. Scherer
The “Musical Emotional Bursts”: A Validated Set of Musical Affect Bursts to Investigate Auditory Affective Processing
  Sébastien Paquette, Isabelle Peretz and Pascal Belin
A Vocal Basis for the Affective Character of Musical Mode in Melody
  Daniel Bowling
Animal Signals and Emotion in Music: Coordinating Affect Across Groups
  Gregory A. Bryant
Speech vs. Singing: Infants Choose Happier Sounds
  Marieve Corbeil, Sandra E. Trehub and Isabelle Peretz
Child Implant Users’ Imitation of Happy- and Sad-Sounding Speech
  David Jueyu Wang, Sandra E. Trehub, Anna Volkova and Pascal van Lieshout
Age-Related Differences in Affective Responses to and Memory for Emotions Conveyed by Music: A Cross-Sectional Study
  Sandrine Vieillard and Anne-Laure Gilet

EDITORIAL
published: 05 May 2014
doi: 10.3389/fpsyg.2014.00399

Expression of emotion in music and vocal communication: Introduction to the research topic

Anjali Bhatara 1,2*, Petri Laukka 3 and Daniel J. Levitin 4
1 Sorbonne Paris Cité, Université Paris Descartes, Paris, France
2 Laboratoire Psychologie de la Perception, CNRS, UMR 8242, Paris, France
3 Department of Psychology, Stockholm University, Stockholm, Sweden
4 Department of Psychology, McGill University, Montreal, QC, Canada
*Correspondence: bhatara@gmail.com
Edited and reviewed by: Luiz Pessoa, University of Maryland, USA
Keywords: music, speech, emotion, voice, cross-domain cognition

In social interactions, we must gauge the emotional state of others in order to behave appropriately. We rely heavily on auditory cues, specifically speech prosody, to do this. Music is also a complex auditory signal with the capacity to communicate emotion rapidly and effectively, and it often occurs in social situations or ceremonies as an emotional unifier.

Scientists and philosophers have speculated about the common cognitive origins of music and language. Perhaps their common origin lies in their efficacy for emotional expression. Unlike semantic or syntactic aspects of language (and music), many of their acoustic and emotional aspects are shared with sounds made by other species (Fitch, 2006); music and speech share a common acoustic code for expressing emotion (Juslin and Laukka, 2003). Until recently, however, scientists working in the two domains of music and speech rarely communicated, so research was restricted to one domain or the other. The purpose of this Research Topic was to bring these researchers together and encourage cross-talk.
Over 25 groups of researchers contributed their expertise, and the included papers give an overview of the diversity of current research, both in terms of research questions and methodology. Some articles focus on aspects in one of the two domains, whereas other articles directly compare, contrast, or combine music and vocal communication.

Empirical studies on music perception include work by Eerola et al. (2013), in which they systematically manipulated musical cues to determine their effects on perception of emotion, and Droit-Volet et al. (2013), who altered acoustic elements associated with emotion to examine the effect of these changes on time perception. Effects of context on music understanding were also investigated: Spreckelmeyer et al. (2013) examined preattentive processing of emotion, measuring ERPs during the processing of a sad tone within the context of happy tones and the reverse. Schellenberg et al. (2012) demonstrated a listener preference for music that expressed emotion contrasting with an established context, and Loui et al. (2013) examined the role of vocals on perceived arousal and valence in songs.

Turning to emotional responses to music, Russo et al. (2013) developed models aimed at predicting the emotion being experienced using information in the listeners’ physiological signals, and Altenmüller et al. (2014) used fMRI to investigate the neural basis of episodic memory for arousing film music. Following up on Gabrielsson’s (2002) distinction between emotion felt by a listener and emotion expressed by a piece of music, Schubert (2013) provided a review and suggestions for future research on the internal and external loci of musical emotion. There were also two theoretical papers on musical emotions: Flaig and Large (2014) speculated that music may induce affective response by speaking to the brain in its own language by way of neurodynamics, and Allen et al. (2013) presented a view of the general nature of musical emotions based on studies on autism.

In the speech domain, Paulmann et al. (2013) used EEG to investigate influences of arousal and valence on cortical responses to emotional prosody. Rigoulot et al. (2013) used a gating paradigm to demonstrate the importance of utterance-final syllables in emotion recognition. Two papers focused on the role of specific acoustic cues in vocal expression: Weusthoff et al. (2013) discussed the role of fundamental frequency in the success of romantic relationships, and Yanushevskaya et al. (2013) examined the role of loudness, both independently and in conjunction with voice quality.

Several researchers undertook cross-cultural studies of emotion perception in speech and non-verbal vocalizations. Jürgens et al. (2013) examined the perception of German emotional speech tokens across three cultures. Waaramaa and Leisiö (2013) examined the recognition of emotion in Finnish pseudo-sentences by listeners from five countries. There were also three cross-cultural investigations of non-verbal vocalizations: Koeda et al. (2013) examined perception of emotional vocalizations by Canadian and Japanese listeners, Laukka et al. (2013) examined Swedish listeners’ perception of vocalizations from four countries, and Sauter (2013) examined the role of motivation in the in-group advantage for emotion recognition by presenting listeners with vocalizations produced by in- or out-group members.

Discussing the similarity between music and speech emotion expression, Juslin (2013) forwarded the argument that this similarity lies at the “core” or basic emotion level, and that more complex emotions are more domain-specific. Several authors empirically tested the similarities and contrasts between music and vocal expression. Margulis (2013) posited that the relative preponderance of repetition in music compared to speech contributes to a fundamental difference between the two domains. Quinto et al. (2013) showed differences in the functions of pitch and rhythm between these domains. Weninger et al. (2013) synthesized information from databases including speech, music, and environmental sounds, and thereby took a step toward a holistic computational model of affect in sound.
To aid future cross-domain research, Paquette et al. (2013) presented a new validated set of stimuli—a musical equivalent to vocal affective bursts. Bowling (2013) reviewed the affective character of musical modes, based in the biology of human vocal emotion expression, and Bryant (2013) further argued that research on music and emotion might benefit from research on form and function in non-human animal signals.

Three papers examined developmental and lifespan changes. Corbeil et al. (2013) contrasted the perception of speaking and singing in infancy, and found that it is not the domain (music or speech) that matters but rather the level of (positive) emotion. Wang et al. (2013) examined early auditory deprivation, asking children with cochlear implants to imitate happy and sad utterances. Vieillard and Gilet (2013) found an increase in positive responding to music with aging.

In sum, the main contribution of this Research Topic, along with highlighting the variety of research being done already, is to show the places of contact between the domains of music and vocal expression that occur at the level of emotional communication. In addition, we hope it will encourage future dialog among researchers interested in emotion in fields as diverse as computer science, linguistics, musicology, neuroscience, psychology, speech and hearing sciences, and sociology, who can each contribute knowledge necessary for studying this complex topic.

REFERENCES
Allen, R., Walsh, R., and Zangwill, N. (2013). The same, only different: what can responses to music in autism tell us about the nature of musical emotions? Front. Psychol. 4:156. doi: 10.3389/fpsyg.2013.00156
Altenmüller, E., Siggel, S., Mohammadi, B., Samii, A., and Münte, T. (2014). Play it again Sam: brain correlates of emotional music recognition. Front. Psychol. 5:114. doi: 10.3389/fpsyg.2014.00114
Bowling, D. L. (2013). A vocal basis for the affective character of musical mode in melody. Front. Psychol. 4:464. doi: 10.3389/fpsyg.2013.00464
Bryant, G. A. (2013). Animal signals and emotion in music: coordinating affect across groups. Front. Psychol. 4:990. doi: 10.3389/fpsyg.2013.00990
Corbeil, M., Trehub, S. E., and Peretz, I. (2013). Speech vs. singing: infants choose happier sounds. Front. Psychol. 4:372. doi: 10.3389/fpsyg.2013.00372
Droit-Volet, S., Ramos, D., Bueno, J. L. O., and Bigand, E. (2013). Music, emotion, and time perception: the influence of subjective emotional valence and arousal? Front. Psychol. 4:417. doi: 10.3389/fpsyg.2013.00417
Eerola, T., Friberg, A., and Bresin, R. (2013). Emotional expression in music: contribution, linearity, and additivity of primary musical cues. Front. Psychol. 4:487. doi: 10.3389/fpsyg.2013.00487
Fitch, W. T. (2006). The biology and evolution of music: a comparative perspective. Cognition 100, 173–215. doi: 10.1016/j.cognition.2005.11.009
Flaig, N. K., and Large, E. W. (2014). Dynamic musical communication of core affect. Front. Psychol. 5:72. doi: 10.3389/fpsyg.2014.00072
Gabrielsson, A. (2002). Emotion perceived and emotion felt: same or different? Music. Sci. 5, 123–147. doi: 10.1177/10298649020050S105
Jürgens, R., Drolet, M., Pirow, R., Scheiner, E., and Fischer, J. (2013). Encoding conditions affect recognition of vocally expressed emotions across cultures. Front. Psychol. 4:111. doi: 10.3389/fpsyg.2013.00111
Juslin, P. N. (2013). What does music express? Basic emotions and beyond. Front. Psychol. 4:596. doi: 10.3389/fpsyg.2013.00596
Juslin, P. N., and Laukka, P. (2003). Communication of emotions in vocal expression and music performance: different channels, same code? Psychol. Bull. 129, 770–814. doi: 10.1037/0033-2909.129.5.770
Koeda, M., Belin, P., Hama, T., Masuda, T., Matsuura, M., and Okubo, Y. (2013). Cross-cultural differences in the processing of non-verbal affective vocalizations by Japanese and Canadian listeners. Front. Psychol. 4:105. doi: 10.3389/fpsyg.2013.00105
Laukka, P., Elfenbein, H. A., Söder, N., Nordström, H., Althoff, J., Chui, W., et al. (2013). Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations. Front. Psychol. 4:353. doi: 10.3389/fpsyg.2013.00353
Loui, P., Bachorik, J. P., Li, H. C., and Schlaug, G. (2013). Effects of voice on emotional arousal. Front. Psychol. 4:675. doi: 10.3389/fpsyg.2013.00675
Margulis, E. H. (2013). Repetition and emotive communication in music versus speech. Front. Psychol. 4:167. doi: 10.3389/fpsyg.2013.00167
Paquette, S., Peretz, I., and Belin, P. (2013). The “musical emotional bursts”: a validated set of musical affect bursts to investigate auditory affective processing. Front. Psychol. 4:509. doi: 10.3389/fpsyg.2013.00509
Paulmann, S., Bleichner, M., and Kotz, S. A. (2013). Valence, arousal, and task effects in emotional prosody processing. Front. Psychol. 4:345. doi: 10.3389/fpsyg.2013.00345
Quinto, L., Thompson, W. F., and Keating, F. L. (2013). Emotional communication in speech and music: the role of melodic and rhythmic contrasts. Front. Psychol. 4:184. doi: 10.3389/fpsyg.2013.00184
Rigoulot, S., Wassiliwizky, E., and Pell, M. D. (2013). Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition. Front. Psychol. 4:367. doi: 10.3389/fpsyg.2013.00367
Russo, F. A., Vempala, N. N., and Sandstrom, G. M. (2013). Predicting musically induced emotions from physiological inputs: linear and neural network models. Front. Psychol. 4:468. doi: 10.3389/fpsyg.2013.00468
Sauter, D. A. (2013). The role of motivation and cultural dialects in the in-group advantage for emotional vocalizations. Front. Psychol. 4:814. doi: 10.3389/fpsyg.2013.00814
Schellenberg, E. G., Corrigall, K. A., Ladinig, O., and Huron, D. (2012). Changing the tune: listeners like music that expresses a contrasting emotion. Front. Psychol. 3:574. doi: 10.3389/fpsyg.2012.00574
Schubert, E. (2013). Emotion felt by listener and expressed by music: a literature review and theoretical investigation. Front. Psychol. 4:837. doi: 10.3389/fpsyg.2013.00837
Spreckelmeyer, K. N., Altenmüller, E. O., Colonius, H., and Münte, T. F. (2013). Preattentive processing of emotional musical tones: a multidimensional scaling and ERP study. Front. Psychol. 4:656. doi: 10.3389/fpsyg.2013.00656
Vieillard, S., and Gilet, A.-L. (2013). Age-related differences in affective responses to and memory for emotions conveyed by music: a cross-sectional study. Front. Psychol. 4:711. doi: 10.3389/fpsyg.2013.00711
Waaramaa, T., and Leisiö, T. (2013). Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA. Front. Psychol. 4:344. doi: 10.3389/fpsyg.2013.00344
Wang, D. J., Trehub, S. E., Volkova, A., and van Lieshout, P. (2013). Child implant users’ imitation of happy- and sad-sounding speech. Front. Psychol. 4:351. doi: 10.3389/fpsyg.2013.00351
Weninger, F., Eyben, F., Schuller, B. W., Mortillaro, M., and Scherer, K. R. (2013). On the acoustics of emotion in audio: what speech, music, and sound have in common. Front. Psychol. 4:292. doi: 10.3389/fpsyg.2013.00292
Weusthoff, S., Baucom, B. R., and Hahlweg, K. (2013). The siren song of vocal fundamental frequency for romantic relationships. Front. Psychol. 4:439. doi: 10.3389/fpsyg.2013.00439
Yanushevskaya, I., Gobl, C., and Ní Chasaide, A. (2013). Voice quality in affect cueing: does loudness matter? Front. Psychol. 4:335. doi: 10.3389/fpsyg.2013.00335

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 26 March 2014; accepted: 15 April 2014; published online: 05 May 2014.
Citation: Bhatara A, Laukka P and Levitin DJ (2014) Expression of emotion in music and vocal communication: Introduction to the research topic. Front. Psychol. 5:399. doi: 10.3389/fpsyg.2014.00399
This article was submitted to Emotion Science, a section of the journal Frontiers in Psychology.
Copyright © 2014 Bhatara, Laukka and Levitin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

ORIGINAL RESEARCH ARTICLE
published: 30 July 2013
doi: 10.3389/fpsyg.2013.00487

Emotional expression in music: contribution, linearity, and additivity of primary musical cues

Tuomas Eerola 1*, Anders Friberg 2 and Roberto Bresin 2
1 Department of Music, University of Jyväskylä, Jyväskylä, Finland
2 Department of Speech, Music, and Hearing, KTH - Royal Institute of Technology, Stockholm, Sweden
Edited by: Anjali Bhatara, Université Paris Descartes, France
Reviewed by: Frank A. Russo, Ryerson University, Canada; Dan Bowling, University of Vienna, Austria
*Correspondence: Tuomas Eerola, Department of Music, University of Jyväskylä, Seminaarinkatu 35, FI-40014 Jyväskylä, Finland. e-mail: tuomas.eerola@jyu.fi

The aim of this study is to manipulate musical cues systematically to determine the aspects of music that contribute to emotional expression, whether these cues operate in an additive or interactive fashion, and whether the cue levels can be characterized as linear or non-linear. An optimized factorial design was used with six primary musical cues (mode, tempo, dynamics, articulation, timbre, and register) across four different music examples. Listeners rated 200 musical examples according to four perceived emotional characters (happy, sad, peaceful, and scary). The results exhibited robust effects for all cues, and the ranked importance of these was established by multiple regression. The most important cue was mode, followed by tempo, register, dynamics, articulation, and timbre, although the ranking varied across the emotions. The second main result suggested that most cue levels contributed to the emotions in a linear fashion, explaining 77–89% of the variance in ratings. Quadratic encoding of cues did lead to minor but significant increases in model fit (0–8%). Finally, the interactions between the cues were non-existent, suggesting that the cues operate mostly in an additive fashion, corroborating recent findings on emotional expression in music (Juslin and Lindström, 2010).

Keywords: emotion, music cues, factorial design, discrete emotion ratings
INTRODUCTION
One of the central reasons that music engages the listener so deeply is that it expresses emotion (Juslin and Laukka, 2004). Not only do composers and performers of music capitalize on the potent emotional effects of music, but so do the gaming and film industries, as well as the marketing and music therapy industries. The way music arouses listeners’ emotions has been studied from many different perspectives. One such method involves the use of self-report measures, where listeners note the emotions that they either recognize or actually experience while listening to the music (Zentner and Eerola, 2010). Another method involves the use of physiological and neurological indicators of the emotions aroused when listening to music (a recent overview of the field is given in Eerola and Vuoskoski, 2012). Although many extra-musical factors are involved in the induction of emotions (e.g., the context, associations, and individual factors, see Juslin and Västfjäll, 2008), the focus of this paper is on those properties inherent in the music itself which cause emotions to be perceived by the listener, and which are generally related to the mechanism of emotional contagion (Juslin and Västfjäll, 2008).

Scientific experiments since the 1930s have attempted to determine the impact of such individual musical cues in the communication of certain emotions to the listener (Hevner, 1936, 1937). A recent summary of this work can be found in Gabrielsson and Lindström’s (2010) study, which states that the most potent musical cues, also the most frequently studied, are mode, tempo, dynamics, articulation, timbre, and phrasing. For example, the distinction between happiness and sadness has received considerable attention—these emotions are known to be quite clearly distinguished through cues of tempo, pitch height, and mode: the expression of happiness is associated with faster tempi, a high pitch range, and a major rather than minor mode, and these cues are reversed in musical expressions of sadness (Hevner, 1935, 1936; Wedin, 1972; Crowder, 1985; Gerardi and Gerken, 1995; Peretz et al., 1998; Dalla Bella et al., 2001). Other combinations of musical cues have been implicated for different discrete emotions such as anger, fear, and peacefulness (e.g., Bresin and Friberg, 2000; Vieillard et al., 2008).

In real music, it is challenging to assess the exact contribution of individual cues to emotional expression because all cues are utterly intercorrelated. Here, the solution is to independently and systematically manipulate the cues by synthesizing variants of a given piece of music. Such a factorial design allows assessment of the causal role of each cue in expressing emotions in music. Previous studies on emotional expression in music using factorial designs have often focused on relatively few cues, as one has to manipulate each level of the factors separately, and the ensuing exhaustive combinations quickly amount to an unfeasible total number of trials needed to evaluate the design. Because of this complexity, the existing studies have usually evaluated two or three separate factors, typically using two or three discrete levels in each. For example, Dalla Bella et al. (2001) studied the contribution of tempo and mode to the happiness-sadness continuum. In a similar vein, Ilie and Thompson (2006) explored the contributions of intensity, tempo, and pitch height on three affect dimensions.

Interestingly, the early pioneers of music and emotion research did include a larger number of musical factors in their experiments.
For example, Rigg’s experiments (1937, 1940a,b, cited in Rigg, 1964) might have only used five musical phrases, but a total of seven cues were manipulated in each of these examples (tempo, mode, articulation, pitch level, loudness, rhythm patterns, and interval content). He asked listeners to choose between happy and sad emotion categories for each excerpt, as well as to further describe the excerpts using precise emotional expressions. His main findings nevertheless indicated that tempo and mode were the most important cues. Hevner’s classic studies (1935, 1937) manipulated six musical cues (mode, tempo, pitch level, rhythm quality, harmonic complexity, and melodic direction), and she observed that mode, tempo and rhythm were the determinant cues for emotions in her experiments.

More contemporary, complex manipulations of musical cues have been carried out by Scherer and Oshinsky (1977), Juslin (1997c), and Juslin and Lindström (2010). Scherer and Oshinsky manipulated seven cues in synthesized sequences (amplitude variation, pitch level, pitch contour, pitch variation, tempo, envelope, and filtration cut-off level, as well as tonality and rhythm in their follow-up experiments), but again mostly with only two levels. They were able to account for 53–86% of the listeners’ ratings of emotionally relevant semantic differential scales using linear regression. This suggests that a linear combination of the cues is able to account for most of the ratings, although some interactions did occur between the cues. Similar overall conclusions were drawn by Juslin (1997c), when he manipulated synthesized performances of “Nobody Knows The Trouble I’ve Seen” in terms of five musical cues (tempo—three levels, dynamics—three levels, articulation—two levels, timbre—three levels, and tone attacks—two levels). The listeners rated happiness, sadness, anger, fearfulness, and tenderness on Likert scales.
Finally, Juslin and Lindström (2010) carried out the most exhaustive study to date by manipulating a total of eight cues (pitch, mode, melodic progression, rhythm, tempo, sound level, articulation, and timbre), although seven of the cues were limited to two levels (for instance, tempo had 70 bpm and 175 bpm versions). This design yielded 384 stimuli that were rated by 10 listeners for happiness, anger, sadness, tenderness, and fear. The cue contributions were determined by regression analyses. In all, 77–92% of the listener ratings could be predicted with a linear combination of the cues. The interactions between the cues only provided a small (4–7%) increase in the predictive accuracy of the models, and hence Juslin and Lindström concluded that the “backbone of emotion perception in music is constituted by the main effects of the individual cues, rather than by their interactions” (p. 353).

A challenge to the causal approach (experimental manipulation rather than correlational exploration) is choosing appropriate values for the cue levels. To estimate whether the cue levels operate in a linear fashion, they should also be varied in such a manner. Another significant problem is determining a priori whether the ranges of each cue level are musically appropriate, in the context of all the other cues and musical examples used. Fortunately, a recent study on emotional cues in music (Bresin and Friberg, 2011) established plausible ranges for seven musical cues, and this could be used as a starting point for a systematic factorial study of the cues and emotions. In their study, a synthesis approach was taken, in which participants could simultaneously adjust all seven cues of emotional expression to produce a compelling rendition of five emotions (neutral, happy, scary, peaceful, and sad) on four music examples. The results identified the optimal values and ranges for the individual musical cues, which can be directly utilized to establish both a reasonable range for each cue and an appropriate number of levels, so that each of the emotions could be well represented in at least one position in the cue space for these same music examples.

AIMS AND RATIONALE
The general aim of the present study is to corroborate and test the hypotheses on the contribution of musical cues to the expression of emotions in music. The specific aims were: (1) to assess predictions from studies on musical cues regarding the causal relationships between primary cues and expressed emotions; (2) to assess whether the cue levels operate in a linear or non-linear manner; and (3) to test whether cues operate in an additive or interactive fashion. For such aims, a factorial manipulation of the musical cues is required, since the cues are completely intercorrelated in a correlational design. Unfortunately, a full factorial design is especially demanding for such an extensive number of factors and their levels, as it requires a substantial number of trials (the product of the numbers of levels of all the factors) and a priori knowledge of the settings for those factor levels. We already have the answers to the latter in the form of the previous study by Bresin and Friberg (2011). With regard to all the combinations required for such an extensive factorial design, we can reduce the full factorial design by using optimal design principles, in other words, by focusing on the factor main effects and low-order interactions while ignoring the high-order interactions that are confounded in the factor design matrix.

MATERIALS AND METHODS
A factorial listening experiment was designed in which six primary musical cues (register, mode, tempo, dynamics, articulation, and timbre) were varied on two to six scalar or nominal levels across four different music structures. First, we will go through the details of these musical cues, and then we will outline the optimal design which was used to create the music stimuli.

MANIPULATION OF THE CUES
The six primary musical cues were, with one exception (mode), the same cues that were used in the production study by Bresin and Friberg (2011). Each of these cues has been previously implicated as having a central impact on emotions expressed by music [summary in Gabrielsson and Lindström (2010), and past factorial studies, e.g., Scherer and Oshinsky, 1977; Juslin and Lindström, 2010], and each has a direct counterpart in speech expression (see Juslin and Laukka, 2003; except for mode, see Bowling et al., 2012). Five cues—register, tempo, dynamics, timbre and articulation (the scalar factors)—could be seen as having linear or scalar levels, whereas mode (a nominal factor) contains two categories (major and minor). Based on observations from the production study, we chose to represent register with six levels, tempo and dynamics with five levels, and articulation with four levels. This meant that certain cues were deemed to need a larger range in order to accommodate different emotional characteristics, while others required less subtle differences between the levels (articulation and timbre).
Finally, we decided to manipulate these factors across different music structures derived from a past study, to replicate the findings using four different music excerpts, which we treat as an additional seventh factor. Because we assume that physiological states have led to the configuration of cue codes, we derive predictions for each cue direction for each emotion based on the vocal expression of affect [from Juslin and Scherer (2005), summarized for our primary cues in Table 3]. For mode, which is not featured in speech studies, we draw on recent cross-cultural findings, which suggest a link between emotional expression in modal music and speech mediated by the relative size of melodic/prosodic intervals (Bowling et al., 2012). The comparisons of our results with those of past studies on musical expression of emotions rely on a summary by Gabrielsson and Lindström (2010) and individual factorial studies (e.g., Scherer and Oshinsky, 1977; Juslin and Lindström, 2010), which present a more or less comparable pattern of results to those obtained in studies on vocal expression of emotions (Juslin and Laukka, 2003).

OPTIMAL DESIGN OF THE EXPERIMENT
A full factorial design with these particular factors would have required 14,400 unique trials to completely exhaust all factor and level couplings (6 × 5 × 5 × 4 × 2 × 3 × 4). As such an experiment is impractically large by any standards, a form of reduction was required. Reduced designs called fractional factorial designs (FFD) and response surface methodologies (RSM), collectively called optimal designs, provide applicable solutions; however, widespread usage of these techniques within the behavioral sciences is still rare in spite of their recommendation (see McClelland, 1997; Collins et al., 2009). The main advantage of optimal designs over full factorial designs is that they allow the research resources to be concentrated on particular questions, thereby minimizing redundancy and maximizing the statistical power. This is primarily done by eliminating high-order factor interactions (see Myers and Well, 2003, p. 332)¹.

We constructed the factor design matrix so that the number of cases for each factor level was approximately equal for both main effects and first-order interactions. In this way, the design was compatible with traditional statistical analysis methods and also gave the listener a balanced array of factor combinations. In effect, this meant applying a D-optimal design algorithm to the full factorial matrix, to maximize the determinant of the information matrix (Box and Draper, 1987; Meyer and Nachtsheim, 1995). The maximum number of trials was set to 200, with the intention that each trial would use stimuli with a duration of 25 s, resulting in an estimated 80-min experiment. The factors are also orthogonal with respect to each other and, thus, are well-suited for statistical techniques such as regression. Details about the individual cues and their levels are given in the next section.

¹ Consider a full factorial design with 8 factors, each with 2 levels (2⁸), requiring 256 combinations to be tested. For factor effects, the degrees of freedom (initially 255) would be 8 for factor main effects, 28 for two-factor interaction effects, and the remaining 219 degrees of freedom (255 − 8 − 28 = 219) for the higher-order interaction effects. In this design, 86% (219/255) of the research resources would be utilized to assess the higher-order (3rd, 4th, etc.) interaction effects that are of no primary interest and difficult to interpret. The extent of this waste of effort is proportional to the number of factor levels in the design, and hence in our design the higher-order factor interactions cover 98.6% of the full factorial design matrix.
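To make the reduction concrete, the following is a minimal sketch of a greedy D-optimal row selection of 200 trials from the 14,400-row full factorial, assuming numeric main-effects coding of the scalar cues and dummy coding of mode and music structure. It illustrates the principle of maximizing the determinant of the information matrix; it is not the authors’ actual design algorithm, which also balanced first-order interactions and may have used a different exchange strategy.

```python
import itertools
import numpy as np

# Full factorial over the seven cues:
# 6 (register) x 5 (tempo) x 5 (dynamics) x 4 (articulation)
# x 2 (mode) x 3 (timbre) x 4 (music structure) = 14,400 combinations.
levels = [
    np.linspace(0, 1, 6),   # register (scalar)
    np.linspace(0, 1, 5),   # tempo (scalar)
    np.linspace(0, 1, 5),   # dynamics (scalar)
    np.linspace(0, 1, 4),   # articulation (scalar)
    np.array([0.0, 1.0]),   # mode dummy: minor/major (nominal)
    np.linspace(0, 1, 3),   # timbre (scalar)
    np.arange(4),           # music structure (nominal, dummy-coded below)
]

def model_row(combo):
    # Main-effects model row: intercept, five scalar cues, mode dummy,
    # and three dummy variables for the four music structures.
    reg, tem, dyn, art, mode, tim, struct = combo
    return np.array([1.0, reg, tem, dyn, art, mode, tim,
                     float(struct == 1), float(struct == 2), float(struct == 3)])

X = np.array([model_row(c) for c in itertools.product(*levels)])
assert len(X) == 14400

# Greedy D-optimal subset: repeatedly add the candidate row that most
# increases log det(X'X) of the selected rows. A small ridge keeps the
# information matrix invertible while the subset is still rank-deficient.
rng = np.random.default_rng(0)
chosen = []
for _ in range(200):
    info = X[chosen].T @ X[chosen] + 1e-9 * np.eye(X.shape[1])
    pool = rng.choice(len(X), size=500, replace=False)  # subsample for speed
    gains = [np.linalg.slogdet(info + np.outer(X[i], X[i]))[1] for i in pool]
    chosen.append(pool[int(np.argmax(gains))])

print(f"Selected {len(chosen)} of {len(X)} candidate trials")
```

Under this criterion the selected rows spread the levels of every factor approximately evenly, which is why the resulting factors end up close to orthogonal and suitable for regression, as noted above.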
DETAILS OF THE SEVEN CUES
Mode (two nominal levels). The mode of each music example was altered using a modal translation, so that an original piece in an Ionian major scale was altered to the Aeolian minor scale in the same key, and vice versa. Thus, the translation from major to minor did not preserve a major dominant chord. For example, the V-I major progression was translated to Vm-Im. This translation was chosen because it allowed a simple automatic translation and also enhanced the minor quality of the examples according to informal listening.

Tempo (five scalar levels). Tempo was represented by the average number of non-simultaneous onsets per second over all voices (called notes per second, NPS). NPS was chosen to indicate tempo because the measure was nearly constant over different music examples when the subjects were asked to perform the same emotional expression in the production study (Bresin and Friberg, 2011). The five different levels were 1.2, 2, 2.8, 4.4, and 6 NPS, corresponding approximately to the median values for the different emotions in the production study.

Dynamics (five scalar levels). The range of the dynamics was chosen to correspond to the typical range of an acoustic instrument, which is about 20 dB (Fletcher and Rossing, 1998). The step size corresponds roughly to the musical dynamics marks pp, p, mp/mf, f, ff: −10, −5, 0, +5, +10 dB, respectively. These values corresponded to the ones obtained in the production study. The dynamics values in dB controlled the sample synthesizer (see below). The resulting sound was not just a simple scaling of the sound level, since the timbre also changed according to the input control. This change corresponds to how the sound level and timbre change simultaneously with the played dynamics on the real counterpart of the respective acoustic instrument.

Articulation (four scalar levels). Articulation here is defined as the duration of a note relative to its interonset interval. Thus, a value of 1 corresponds to legato, and a value of ∼0.5 to staccato. The articulation was applied using three rules from the previously developed rule system for music performance (Bresin, 2001; Friberg et al., 2006). The Punctuation rule finds small melodic fragments and performs the articulation on the last note of each fragment, so that it is lengthened with a micropause after it (Friberg et al., 1998). The Repetition rule performs a repetition of the chosen note with a micropause in between. Finally, the Overall articulation rule simply applies the articulation to all the notes except very short ones. In addition, a limit on the maximum articulation was imposed to ensure that the duration of each note would not be too short. Using this combination of rules, the exact amount of articulation varied depending on the note. However, the four different levels roughly corresponded to the values 1, 0.75, 0.5, 0.25—thus, a range from legato to staccatissimo. The same combination of rules was used in the production study.
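As a small illustration of this definition of articulation (sounding duration divided by inter-onset interval), the sketch below applies the four nominal levels to a toy note list with a duration floor, mirroring the limit on maximum articulation mentioned above. The Note structure and the floor value are illustrative assumptions, not the Director Musices rule system itself.

```python
from dataclasses import dataclass

@dataclass
class Note:
    onset: float     # seconds
    ioi: float       # inter-onset interval to the next note, seconds
    duration: float  # sounding duration, seconds

def apply_articulation(notes, ratio, min_duration=0.05):
    """Set each note's sounding duration to `ratio` times its IOI
    (1.0 = legato, ~0.5 = staccato, 0.25 = staccatissimo), with a floor
    so that very short notes are not cut further."""
    return [Note(n.onset, n.ioi, max(min_duration, ratio * n.ioi))
            for n in notes]

melody = [Note(0.0, 0.5, 0.5), Note(0.5, 0.5, 0.5), Note(1.0, 1.0, 1.0)]
for level in (1.0, 0.75, 0.5, 0.25):  # the four articulation levels
    print(level, [round(n.duration, 3) for n in apply_articulation(melody, level)])
```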
Timbre (three scalar levels). Three different instrument timbres were used for the melody voice: flute, horn, and trumpet. The same timbres were also used in the production experiment and were initially chosen for their varied expressive character, namely brightness, which has been found to have a large impact on emotional ratings in a previous experiment (Eerola et al., 2012). The estimation of brightness was based on the amount of spectral energy below a cut-off of 1500 Hz, because this correlated strongly (r = −0.74, p < 0.001, N = 110) with listeners’ ratings when they were asked to judge the emotional valence of 110 isolated instrument sounds (Eerola et al., 2012). The flute has the lowest and the trumpet the highest brightness value.

Register (six scalar levels). The whole piece was transposed so that the average pitches of the melody were the following: F3, B3, F4, B4, F5, and B5, corresponding to the MIDI note numbers 53, 59, 65, 71, 77, and 83, respectively. These values were close to the actual settings for the different emotions in the production study.
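A minimal sketch of this kind of brightness measure follows, assuming the proportion of FFT energy below the 1500 Hz cut-off is the quantity of interest; the exact estimation procedure in Eerola et al. (2012) may differ, and the function name is illustrative.

```python
import numpy as np

def energy_below_cutoff(signal, sr, cutoff=1500.0):
    """Proportion of spectral energy below `cutoff` Hz. High values
    indicate low brightness (flute-like); low values indicate high
    brightness (trumpet-like)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return spectrum[freqs < cutoff].sum() / spectrum.sum()

# Toy check: a 440 Hz tone scores near 1, a 3 kHz tone near 0.
sr = 44100
t = np.arange(sr) / sr
print(energy_below_cutoff(np.sin(2 * np.pi * 440 * t), sr))   # ~1.0
print(energy_below_cutoff(np.sin(2 * np.pi * 3000 * t), sr))  # ~0.0
```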
Music structure (four nominal levels). Finally, the seventh cue, music structure, was added in order to extend the design across four different music examples chosen from the Montreal battery of composed emotion examples (Vieillard et al., 2008). Each example represented a different emotion and was selected according to how it had been validated by Vieillard et al. (2008). Therefore, the selected examples were from among the most unambiguous examples of sadness (T01.mid in the original stimulus set), happiness (G04.mid), peacefulness (A02.mid), and fear (P02.mid) from the study by Vieillard et al. Because the study consisted of four different musical examples, many compositional factors like melody, harmony, and rhythm varied simultaneously; these same four music examples were also used in the previous production study (Bresin and Friberg, 2011).

CREATION OF THE STIMULI
The stimulus examples were generated with an algorithm using the Director Musices software (Friberg et al., 2000). The resulting MIDI files were rendered into sound using the Vienna Symphonic Library with the Kontakt 2 sampler. This library contains high-quality, performed sounds for different instruments using different sound levels, registers, and playing techniques². All the accompaniment voices were played on a sampled piano (Steinway light) and the melody voices were played on samples of each solo instrument (horn, flute, and trumpet). The sound level of each instrument was measured for a range of different MIDI velocity values and an interpolation curve was defined, making it possible to specify the dynamics in decibels, which was then translated to the right velocity value in the MIDI file. The onset delays were adjusted aurally for each solo instrument in such a manner that simultaneous notes in the piano and in the solo instrument were perceptually occurring at the same time. The resulting audio was saved in non-compressed stereo files (16-bit wav) with the sampling rate at 44.1 kHz. Examples of the stimuli are available as Supplementary material (Audio files 1–4, which represent prototypical examples of each rated emotion).

² More technical information about the Vienna Symphonic Library is available from http://vsl.co.at/ and about Kontakt 2 from http://www.native-instruments.com/.
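A sketch of such a dB-to-velocity mapping using linear interpolation is shown below; the measured velocity/level pairs are hypothetical, standing in for the per-instrument measurements described above.

```python
import numpy as np

# Hypothetical calibration data: measured sound levels (dB, relative to
# the middle dynamics) for a few MIDI velocities of one instrument.
measured_velocity = np.array([20, 40, 60, 80, 100, 120])
measured_level_db = np.array([-14.0, -9.5, -5.0, -1.0, 3.0, 7.5])

def velocity_for_db(target_db):
    """Interpolate the MIDI velocity that produces `target_db`
    (clipped to the measured range by np.interp)."""
    return int(round(np.interp(target_db, measured_level_db, measured_velocity)))

for step in (-10, -5, 0, 5, 10):  # the five dynamics levels (pp ... ff)
    print(step, "dB ->", velocity_for_db(step))
```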
PROCEDURE
The subjects were seated either in a semi-anechoic room (Stockholm) or in a small laboratory room (Jyväskylä). Two loudspeakers (Audio-Pro 4–14 in Stockholm/Genelec 8030 in Jyväskylä) were placed slightly behind and on either side of the computer screen. The sound level at the listening position was calibrated to be at 72 dB (C). Several long notes of the horn were used as the calibration signal, performed at the middle scalar value of dynamics (0 dB—as detailed above).

The subjects were first asked to read the written instructions (in Swedish, English, or Finnish). Their task was to rate each example (n = 200) on each of the emotions provided (four concurrent ratings for each example). They were asked to focus on the emotional expression of the example (i.e., perceived emotions rather than felt emotional experiences), and the ratings were made on a seven-point Likert scale. The emotions were tender/peaceful, happy, sad, and angry/scary in Stockholm, and tender, peaceful, happy, sad, and angry in Jyväskylä. One reason behind the variation in terms between the laboratories was to compare the terms used in the original study by Vieillard et al. (2008) to terms frequently used by other studies adopting the basic emotion concepts for music (e.g., Bresin and Friberg, 2000; Juslin, 2000; Juslin and Lindström, 2010; Eerola and Vuoskoski, 2011). The second reason to vary the labels was to explore whether collapsing the ratings of similar emotions (e.g., tender and peaceful) would result in large differences when compared to the uncollapsed versions of the same emotions. A free response box was also provided for the participants to use in cases where none of the given emotion labels could be satisfactorily used to describe the stimulus. However, we will not carry out a systematic analysis of these textual responses here, as they were relatively rare (the median number of excerpts commented on was 2 out of 200, the mean 3.4, SD = 4.7) and the participants that did comment did not comment on the same examples, which further hinders such an analysis.

The stimuli were presented in a different random order for each participant. The scale’s position had no influence on response patterns. The experiment itself was run using the program Skatta³ in Stockholm and a patch in MAX/MSP in Jyväskylä. For each example, there was a play button and four different sliders for the corresponding emotion labels. The subject was free to repeat the examples as many times as he/she wished. The whole session took between 1 and 2 h to complete. The subjects were also encouraged to take frequent pauses, and refreshments were available.

³ http://sourceforge.net/projects/skatta/

PARTICIPANTS
In all, 46 participants took part in the experiment, 20 in Stockholm and 26 in Jyväskylä. Because the ratings collected in these two laboratories were nearly identical (detailed later), we will not document all the data gathered in each of the laboratories separately. The mean age of all participants was 30.2 years (SD = 8.7); 20 of the participants were female and 25 were male; one participant did not indicate his/her gender. Most of the participants had an extensive musical background: between them, they reported having music as a hobby for an average of 16.1 years (SD = 10.5) and studying music at a professional level for an average of 7.0 years (SD = 6.3). Their musical taste was a mixture of many styles, and the participants represented various ethnicities (some of whom were not native speakers of Swedish or Finnish). All participants were compensated for their efforts (≈9 €).

RESULTS
The description of the analysis will proceed according to the following plan. First, the consistencies of the ratings across and between the emotions will be reported. Next, the main hypotheses will be investigated using a series of regression analyses. The first regression analysis will address the contribution of cues to the emotions, the second will address the linearity of the cue levels, and the third will seek to quantify the degree of interactions between the cues in the data and compare the results with those obtained using models that are additive. All of the analyses will be carried out separately for each of the four emotions.

INTER-RATER CONSISTENCY
There was no missing data, and no univariate (in terms of the z-scores) or bivariate outliers were identified (using squared Mahalanobis distances with p < 0.05 according to Wilks’ method, 1963). The inter-rater consistency among the participants was high at both laboratories (the Cronbach α scores were between 0.92 and 0.96 in Stockholm, and 0.94 and 0.97 in Jyväskylä). Because of substantial inter-participant agreement for each emotion, and because individual differences were not of interest, the analyses that follow treat the stimulus (N = 200) as the experimental unit, with the dependent variable being the mean rating averaged across all participants. The Pearson correlations between the mean ratings from the two laboratories were also high for the identical emotion labels (r[198] = 0.94 and 0.89 for happy and sad, both p < 0.0001). For the emotion labels that were varied between the laboratories, significant correlations between the variants also existed; tender/peaceful (Stockholm) and peaceful (Jyväskylä) correlated highly (r = 0.81, p < 0.0001, N = 200), as did tender/peaceful (Stockholm) and tender (Jyväskylä), r = 0.89. In addition, angry/scary (Stockholm) and angry (Jyväskylä) exhibited a similar, highly linear trend (r = 0.96, p < 0.0001, N = 200). Due to these high correspondences between the data obtained from the two laboratories, tender/peaceful (Stockholm) was pooled with tender and peaceful (Jyväskylä) into peaceful, and angry/scary (Stockholm) was pooled with angry (Jyväskylä) into scary.
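For reference, Cronbach’s α for a stimulus-by-rater matrix can be computed as below. This is a generic sketch of the consistency index reported above, with raters treated as “items” and stimuli as cases; it is not the authors’ analysis script, and the toy data are made up.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_stimuli x n_raters) matrix."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                              # number of raters
    item_vars = ratings.var(axis=0, ddof=1).sum()     # sum of per-rater variances
    total_var = ratings.sum(axis=1).var(ddof=1)       # variance of summed scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy data: 200 stimuli rated by 20 raters who share a common signal.
rng = np.random.default_rng(1)
signal = rng.uniform(1, 7, size=(200, 1))
ratings = signal + rng.normal(0, 0.8, size=(200, 20))
print(round(cronbach_alpha(ratings), 2))  # high alpha, as in the study
```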
CORRELATIONS BETWEEN PERCEIVED EMOTIONS
Next, we explored intercorrelations between the emotion ratings by looking specifically at correlations between the four consistently rated emotions (happy, sad, peaceful, and scary). These displayed a typical pattern, wherein happy correlated negatively with sad (r = −0.79, p < 0.001, N = 200), happy correlated positively with peaceful, albeit weakly (r = 0.21, p < 0.01), and happy correlated negatively with scary (r = −0.56, p < 0.001). Sad was weakly correlated with peaceful (r = 0.16, p < 0.05), while sad showed no correlation with scary (r = 0.04, p = 0.55). Finally, peaceful and scary exhibited a significant opposite trend, as would perhaps be expected (r = −0.72, p < 0.001). Similar patterns have also been observed in a study by Eerola and Vuoskoski (2011).

Next, we investigated the emotion scales with the examples that were judged highest on each emotion, to see the overall discrimination of the scales (see Figure 1; these examples are also given as Audio files 1–4). Each of these prototype examples is clearly separated from the other emotions, yet the overall pattern reveals how particular emotions are related to other emotions. For instance, the happy and sad prototypes also receive modest ratings on peaceful, and the peaceful prototype scores similar ratings on sadness. However, these overlaps do not imply explicit confusions between the emotions, as evidenced by the 95% confidence intervals. This suggests that all four scales are measuring distinct aspects of emotions in this material. The exact cue levels for each prototype, shown in the top panels, display four distinct cue patterns. Interestingly, the optimal profiles use not only extreme cue levels (e.g., low tempo, dynamics, and articulation and high register for peaceful) but also intermediate levels (e.g., middle register and dynamics for the sad and happy prototypes). However, a structured analysis of the cue contributions is carried out in the next sections.

FIGURE 1 | Means and 95% confidence intervals of four emotion ratings for the four prototype examples that received the highest mean on each emotion.
Scary is predominantly communicated by the structure of the music (a nominal cue), in that a combination of low register, minor mode, and high dynamics contributes to these ratings. The most effective way of expressing happiness is major mode, fast tempo, high register, and staccato articulation, within this particular set of examples. For sadness, the pattern of beta coefficients is almost the reverse of this, except that a darker timbre and a decrease in dynamics also contribute to the ratings. These patterns are intuitively clear and consistent with previous studies (Juslin, 1997c; Juslin and Lindström, 2003, 2010).

FIGURE 2 | Means and 95% confidence intervals of four emotion ratings across all musical cues and levels.

Table 1 | Summary of regression models for each emotion with linear predictors (mode and music structure are encoded in a non-linear fashion).

                  Scary            Happy            Sad              Peaceful         Median
                  R²adj = 0.85     R²adj = 0.89     R²adj = 0.89     R²adj = 0.77     sr²
                  β        sr²     β        sr²     β        sr²     β        sr²
Mode              −0.74*** 0.08    1.77***  0.48    −1.6***  0.54    0.43***  0.05    0.29
Tempo             0.07**   0.01    0.25***  0.12    −0.32*** 0.21    −0.27*** 0.15    0.14
Music struct. 3   1.56***  0.33    −0.53*** 0.02    −0.49*** 0.04    −0.99*** 0.12    0.08
Register          −0.23*** 0.15    0.18***  0.09    −0.05*   0.01    0.15***  0.06    0.08
Dynamics          0.20***  0.08    −0.01    0.00    −0.05**  0.01    −0.28*** 0.14    0.04
Articulation      −0.03    0.00    0.14***  0.02    −0.18*** 0.04    −0.10**  0.01    0.02
Timbre            0.15***  0.02    0.01     0.00    −0.14*** 0.01    −0.45*** 0.13    0.01
Music struct. 2   −0.18*   0.00    0.42***  0.03    0.42***  0.02    0.02     0.00    0.01
Music struct. 1   −0.11    0.00    0.14     0.00    −0.11    0.00    0.10     0.00    0.00

df = 9,190; *p < 0.05, **p < 0.01, ***p < 0.001. β, standardized beta; R²adj, adjusted R², corrected for the number of independent variables.

The first thing we see is that the relative contributions of the cues vary markedly for each emotion, just as in previous studies (Scherer and Oshinsky, 1977; Juslin, 1997c, 2000; Juslin and Lindström, 2010). For example, mode is extremely important for the happy and sad emotions (sr² = 0.48 and 0.54), whereas it has a relatively low impact on scary and peaceful (sr² = 0.08 and 0.05). Similar asymmetries are apparent in other cues as well. For instance, dynamics contributes significantly to the scary and peaceful emotions (sr² = 0.08 and 0.14) but has little impact on happy and sad (sr² = 0.00 and 0.01). This latter observation is somewhat puzzling, as dynamics has previously often been coupled with changes in valence (Ilie and Thompson, 2006) and with happy or sad emotions (Adachi and Trehub, 1998; Juslin and Laukka, 2003).
However, when direct comparisons are made with other factorial studies of emotional expression (Scherer and Oshinsky, 1977; Juslin, 1997c; Juslin and Lindström, 2010), it becomes clear that dynamics has also played a relatively weak role in the sad and happy emotions in those studies. If we look at the cues that contributed the most to the ratings of sadness, namely mode and tempo, we can simply infer that the ratings were primarily driven by these two factors.

The overall results of the experiment show that the musical manipulations of all cues lead to a consistent variation in emotional evaluations, and that the importance of the musical cues bears a semblance to the synthetic manipulations of musical cues made in previous studies. We will summarize these connections in more detail later. Instead of drawing premature conclusions on the importance of particular musical cues and the exceptions to the theory, we should wait until the specific properties of the cue levels have been taken into account. These issues will therefore be addressed in depth in the next section.

LINEARITY VERSUS NON-LINEARITY OF CUE LEVELS
We used hierarchical regression analysis to estimate three qualities of the cue levels (namely linear, quadratic, and cubic) as well as the overall contribution of the cues themselves, because this is the appropriate analysis technique for an optimal design with a partial factorial interaction structure (e.g., Myers and Well, 2003, pp. 615-621; Rosenthal and Rosnow, 2008, p. 476). The cue levels were represented using (a) linear, (b) quadratic, and (c) cubic codings, using the mean ratings over subjects as the dependent variable (200 observations for each emotion). Each emotion was analyzed separately. This was applied to all five scalar cues. For completeness, the nominal cues (mode and music structure) were also included in the analysis and were coded using dummy variables. Mode used one dummy variable, where 0 indicated a minor and 1 a major key, while music structure used three dummy variables in order to accommodate the non-linear nature of the cue levels. None of the cues were collinear (variance inflation factors < 2 for all cues), as they were the by-product of the optimal factorial design. Table 1 displays the prediction rates, the standardized beta coefficients, as well as the squared semipartial correlations for each cue and emotion.

Step 1 of the hierarchical regression is equal to the results reported in Table 1. Based on Figure 2 and previous studies, we might suspect that linear coding does not do full justice to certain cues, such as register or timbre. To explore this, we added quadratic encodings of the five scalar cues (register, tempo, dynamics, articulation, and timbre) to this regression model at Step 2. As quadratic encoding alone would reflect both linear and quadratic effects, the original linear version of the variable in question was kept in the analysis to partial out linear effects (Myers and Well, 2003, pp. 598-599). Adding the quadratic variables at Step 2 resulted in increased fit for the scary [+3%, F(185, 5) = 10.0, p < 0.001], sad [+0.05%, F(185, 5) = 2.4, p < 0.05], and peaceful [+8%, F(185, 5) = 23.5, p < 0.001] emotions, but no increase for the ratings of the happy emotion (see Table 2). For the ratings of the scary emotion, quadratic versions of register, dynamics, and timbre were responsible for the increased fit of the model, which suggests that these particular cues contribute to the emotions in a non-linear fashion.

Table 2 | Hierarchical regression comparing linear, quadratic, and cubic predictors.

                     df      Scary            Happy            Sad              Peaceful
                             R²adj  F         R²adj  F         R²adj  F         R²adj  F
Step 1. Linear       9,190   0.85             0.89             0.89             0.77
Step 2. Quadratic    5,185   0.88   10.0***   0.89   0.68      0.89   2.4*      0.85   23.9***
  Register²                  ***              –                –                ***
  Tempo²                     –                –                **               ***
  Dynamics²                  ***              –                –                ***
  Articulation²              –                –                –                **
  Timbre²                    ***              –                –                ***
Step 3. Cubic        5,180   0.88   0.75      0.89   0.97      0.89   1.3       0.86   1.8
  Register³                  –                –                –                –
  Tempo³                     –                –                –                –
  Dynamics³                  –                –                –                –
  Articulation³              –                –                –                –
  Timbre³                    –                –                –                –

df refers to the number of predictors in the model; F denotes the comparison of the complete regression models at Steps 1, 2, and 3, and the asterisks on individual cues denote their individual significance (t): ***p < 0.001, **p < 0.01, *p < 0.05.
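A minimal sketch of this three-step hierarchy, continuing the hypothetical DataFrame layout used above: centered quadratic and then cubic terms of the five scalar cues are stacked on top of the linear model, and each step is compared with an incremental F-test. Centering before squaring is one reasonable choice for keeping the polynomial terms interpretable, not necessarily the authors' exact procedure.

```python
import statsmodels.api as sm

SCALAR = ['tempo', 'register', 'dynamics', 'articulation', 'timbre']
NOMINAL = ['mode', 'struct1', 'struct2', 'struct3']

def step_models(df):
    """Fit the Step 1 (linear), Step 2 (+quadratic), Step 3 (+cubic) models."""
    y = df['rating']
    X = df[SCALAR + NOMINAL].copy()
    m1 = sm.OLS(y, sm.add_constant(X)).fit()          # Step 1: linear only
    for c in SCALAR:                                  # Step 2: add quadratic terms
        X[c + '2'] = (df[c] - df[c].mean()) ** 2
    m2 = sm.OLS(y, sm.add_constant(X)).fit()
    for c in SCALAR:                                  # Step 3: add cubic terms
        X[c + '3'] = (df[c] - df[c].mean()) ** 3
    m3 = sm.OLS(y, sm.add_constant(X)).fit()
    return m1, m2, m3

# Usage, given a hypothetical per-emotion DataFrame `df`:
#   m1, m2, m3 = step_models(df)
#   f21, p21, _ = m2.compare_f_test(m1)   # gain from quadratic terms (Step 2 vs. 1)
#   f32, p32, _ = m3.compare_f_test(m2)   # gain from cubic terms (Step 3 vs. 2)
```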
A similar observation was made in the ratings of peacefulness: quadratic variants of timbre, register, tempo, articulation, and dynamics provided a statistically significant improvement to the model at Step 2 (+8.0%, see Table 2). Ratings of the sad emotion also received a marginal, albeit statistically significant, improvement at Step 2 due to the contribution of the quadratic encoding of tempo. The overall contribution of these enhancements will be presented at the end of this section. At Step 3, cubic versions of the five scalar cues (register, tempo, dynamics, articulation, and timbre) were added to the regression model, but these did not lead to any significant improvements beyond Step 2 in any emotion (see Table 2). For all of these cues and emotions, cubic variants of the cue levels did not yield a better fit to the data than the quadratic versions.

It is also noteworthy that the quadratic versions of the cues were included as additional cues; they did not replace the linear versions of the cues. This suggests that some of the cue levels violated the linearity of the factor levels. Small but significant quadratic effects could thus be observed in the data, mainly for the cues of timbre, dynamics, and register, and these were specifically concerned with the emotions scary and peaceful. In the context of all of the cues and emotions, the overall contribution of these non-linear variants was modest at best (0-8% of added prediction rate) but nevertheless revealed that linearity cannot always be supported. Whether this observation relates to the chosen cue levels or to the actual nature of the cues remains open at present. The overarching conclusion is that the cue levels were successfully chosen and represented linear steps based on the production experiment (Bresin and Friberg, 2011). These selected levels predominantly communicated changes in emotional characteristics to the listeners in a linear fashion.

ADDITIVITY VS. INTERACTIVITY OF THE CUES
Previous findings on the additivity or interactivity of musical cues are inconsistent. According to Juslin (1997c), Juslin and Lindström (2010), and Scherer and Oshinsky (1977), cue interactions are of minor importance (though not inconsequential), whereas others have stressed the importance of cue interactions (Hevner, 1936; Rigg, 1964; Schellenberg et al., 2000; Juslin and Lindström, 2003; Lindström, 2003, 2006; Webster and Weir, 2005). To evaluate the degree of cue interactions in the present data, a final set of regression analyses was carried out. In these analyses, each two-way interaction was tested separately for each emotion (21 tests per emotion) using the mean ratings (N = 200). This analysis failed to uncover any interactions between the cues in any emotion after correcting for multiple testing (all 84 comparisons resulted in non-significant interactions, p > 0.315, df = 196). It must be noted that some of the interactions that would be significant without corrections for multiple testing (register and mode, and mode and tempo, in happiness; mode and tempo in sadness) are classic interacting cues of musical expression (Scherer and Oshinsky, 1977; Dalla Bella et al., 2001; Webster and Weir, 2005) and could be subjected to a more thorough multi-level modeling with individual (non-averaged) data.
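The interaction screen just described can be sketched as follows, again under the same hypothetical data layout: each two-way product term is added to the additive model one pair at a time (21 pairs for seven cue variables), and the resulting p-values are corrected for multiple testing. Holm's method is used here purely as an example of such a correction; the paper does not specify which procedure was applied.

```python
from itertools import combinations
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

def interaction_screen(df, cues):
    """Test every two-way product term on top of the additive model."""
    y = df['rating']
    base = sm.add_constant(df[cues])
    pairs, pvals = [], []
    for a, b in combinations(cues, 2):      # C(7, 2) = 21 pairs per emotion
        X = base.copy()
        term = a + ':' + b
        X[term] = df[a] * df[b]             # one interaction at a time
        fit = sm.OLS(y, X).fit()
        pairs.append((a, b))
        pvals.append(fit.pvalues[term])
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='holm')
    # Return only the pairs that survive the multiple-testing correction
    return [(pair, p) for pair, p, sig in zip(pairs, p_adj, reject) if sig]
```

On data behaving like those reported above, such a screen would return an empty list: no pair survives correction, which is the additivity result the text describes.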
In conclusion, the results of the analysis of additivity vs. interactivity were found to be consistent with the observations made by Scherer and Oshinsky (1977), Juslin (1997c), and Juslin and Lindström (2010) that the cue interactions are comparatively small or non-existent, and that additivity is a parsimonious way to explain the emotional effects of these musical cues.

DISCUSSION
The present study has continued and extended the tradition of manipulating important musical cues in a systematic fashion to evaluate, in detail, what aspects of music contribute to emotional expression. The main results brought out the ranked importance of the cues by regression analyses (cf. Table 1). The nominal cue, mode, was ranked as being of the highest importance, with the other cues ranked afterwards in order of importance as follows: tempo, register, dynamics, articulation, and timbre, although the ranking varied across the four emotions and music structures. Seventy-nine percent of the cue directions for each emotion were in line with physiological state theory (Scherer, 1986) and, simultaneously, in accordance with previous results from studies on cue directions in music (e.g., Hevner, 1936; Juslin, 1997c; Gabrielsson and Lindström, 2001; Juslin and Lindström, 2010). The second main result suggested that most cue levels contributed to the emotions in a linear fashion, explaining 77-89% of the variance in the emotion ratings. Quadratic encodings of three cues (timbre, register, and dynamics) did lead to minor yet significant increases in the models (0-8%). Finally, no significant interactions between the cues were found, suggesting that the cues operate in an additive fashion.

A plausible theoretical account of how these particular cue combinations communicate emotional expressions connects the cues to underlying physiological states. This idea, first proposed by Spencer in 1857, builds on the observation that different emotions cause physiological changes that alter vocal expression (e.g., increased adrenalin production in a frightened state tightens the vocal cords, producing a high-pitched voice). This physiological state explanation (Scherer, 1986) is typically invoked to explain emotions expressed in speech, since it accounts for the cross-cultural communication of emotions (Scherer et al., 2001) and assumes that these state-cue combinations have been adapted to common communicational use, even without the necessary underlying physiological states (e.g., Bachorowski et al., 2001). This theoretical framework has an impact on musically communicated emotions as well, because many of the cues (speech rate, mean F0, voice quality) that contribute to vocally expressed emotions have been observed to operate in an analogous fashion in music (e.g., Juslin and Laukka, 2003; Bowling et al., 2012). This theory enables direct predictions of the cue properties (importance and cue directions) that convey particular emotions. We have compiled the predictions from expressive vocal cues (Juslin and Scherer, 2005) and expressed emotions in music in Table 3. When we look at the summary of the cue directions from the present study, also included in Table 3, out of 24 predictions of cue directions based on vocal expression, 19 operated in the manner predicted by the physiological state theory (Scherer, 1986), three went against the predictions, and two were inconclusive (see Tables 1, 3).
Table 3 | Theoretical predictions and results for each cue and emotion (Pred./Res. for scary, happy, sad, and peaceful). [The predicted and observed direction symbols for register (6L), tempo (5L), dynamics (5L), articulation (4L), timbre (3L), and mode (2C) are not recoverable from this copy; the recoverable entries follow.]

Music struct. (4C), Pred./Res. — Scary: F>H>S>P / F>S>H>P; Happy: H>P>S>F / H>P>S>F; Sad: S>P>H>F / S>P>H>F; Peaceful: P>S>H>F / P>S>H>F.

L, linear; C, categorical; Pred., predictions; Res., results; –, not statistically significant. Letters in music structure refer to predictions based on the Montreal Battery (F, fearful; H, happy; S, sad; P, peaceful). Predictions are based on Juslin and Scherer (2005), except mode, which is based on Gabrielsson and Lindström (2010).

Two aberrations from the theory were related to register, which is known to have varying predictions in vocal expression with respect to the type of anger (hot vs. cold anger; see Scherer, 2003). The third conflict with the theory concerns tempo. Previous studies of the musical expression of emotions have suggested tempo as the most important cue (Gundlach, 1935; Hevner, 1937; Rigg, 1964; Scherer and Oshinsky, 1977; Juslin and Lindström, 2010), whereas here mode takes the lead. We speculate that the nominal nature of mode led to higher effect sizes than the linearly spaced levels of tempo, but this obviously warrants further research.

We interpret these results as strengthening the notion that the musical cues may have been adopted from vocal expression (see Bowling et al., 2012, for a similar argument). We also acknowledge the past empirical findings on the expressive properties of music [e.g., as summarized in Gabrielsson and Lindström (2010)], but since these largely overlap with the cues in vocal expression (Juslin and Laukka, 2003), we rely on vocal expression for the theoretical framework and use past empirical studies of music as supporting evidence. It must be noted that expressive speech has also been used as a source of cues that are normally deemed solely musical, such as mode (Curtis and Bharucha, 2010; Bowling et al., 2012).
A further challenge related to the reliable communication of emotions via cue combinations is that the same cue levels may have different contributions to different emotions (e.g., the physiological state of heightened arousal causes a high speech rate or musical tempo, which is the same cue for both fearfulness and happiness; or, in the reverse situation, a low F0 conveys boredom, sadness, and peacefulness). An elegant theoretical solution is provided by Brunswik's lens model (adapted to vocal emotions by Scherer in 1978), which details the process of communication from (a) the affective state expressed, through (b) acoustic cues and (c) the perceptual judgments of the cues, to (d) the integration of the cues. The lens model postulates that cues operate in a probabilistic fashion to stabilize the noise inherent in the communication (individual differences, contextual effects, environmental noise; the same cues may contribute to more than one emotion). Specifically, Brunswik coined the term vicarious functioning (1956, pp. 17-20) to describe how individual cues may be substituted by other cues in order to tolerate the noise in the communication. This probabilistic functionalism helps to form stable relationships between the emotion and the interpretation. For emotions expressed by music, Juslin has employed the lens model as a framework to clarify the way expressed emotions are communicated from performer to listener (Juslin, 1997a,b,c, 2000).

The cue substitution property of the lens model presumes that there are no significantly large interactions between the cues, because the substitution principle typically assumes an additive function for the cues (Stewart, 2001). Therefore, our third research question asked whether the cues in music contribute to emotions in an additive or an interactive fashion. Significant interactions would hamper the substitution possibilities of the lens model. Empirical evidence on this question for expressed emotions in music is divided: some studies have found significant interactions between the cues when the contributions of 3-5 musical cues have been studied (Hevner, 1935; Rigg, 1964; Schellenberg et al., 2000; Gabrielsson and Lindström, 2001, p. 243; Lindström, 2003, 2006; Webster and Weir, 2005), while other studies have failed to find substantial interactions in similar designs with a large number of cues (Scherer and Oshinsky, 1977; Juslin and Lindström, 2010). In the vocal expression of emotions, the importance of the interactions between the cues has typically been downplayed (Ladd et al., 1985).

Our second research question probed whether the cues contribute to emotions in a linear fashion. Previous studies have predominantly explored cues with two levels, e.g., high-low (Scherer and Oshinsky, 1977; Juslin and Lindström, 2010), which does not permit drawing inferences about the exact manner (linear or non-linear) in which cue values contribute to given emotions (Stewart, 2001). Based on the physiological state explanation, we predicted a high degree of linearity within the levels of the cues, because the indicators of the underlying physiological states (corrugator muscle activity, skin-conductance level, startle response magnitude, heart rate) are characterized by linear changes with respect to emotions and their intensities (e.g., Mauss and Robinson, 2009). The results confirmed both the linearity and the additivity of the cue contributions, although non-linear effects were significant for some cues.

Most cue levels represented in scalar steps did indeed contribute to the emotion ratings in a linear fashion. The exceptions concerned mainly timbre, for which we had only three levels. These levels were determined using the single timbral characteristic of brightness, but the three instrument sounds used also possessed differences in other timbral characteristics. Nevertheless, the observed relationship between emotions and timbre was consistent with previous studies. However, one particular observation proved the hypotheses drawn from past research wrong.
Dynamics turned out to be of low importance for both the sad and happy emotions, although it has previously been implicated as important for emotions in a number of studies using both emotion categories (Scherer and Oshinsky, 1977; Juslin, 1997c; Juslin and Madison, 1999; Juslin and Lindström, 2010) and emotion dimensions (Ilie and Thompson, 2006). It is unlikely that our results are due to insufficient differences in dynamics (±5 and ±10 dB), because ratings for the emotions peaceful and scary were nevertheless both heavily influenced by these changes. However, they might be related to the specific emotions, as this musical cue has previously been noted to be a source of discrepancy between speech and music (Juslin and Laukka, 2003). Our results are further vindicated by the fact that the emotions happy and sad have not exhibited large differences in dynamics in previous production studies (Juslin, 1997b, 2000).

Finally, the assumption inherent in the lens model that cues operate in an additive fashion was validated. The interactions failed to reach statistical significance, consistent with comments made by previous surveys of emotional cues (Gabrielsson and Lindström, 2001, p. 243; Juslin and Laukka, 2004) and a number of studies (e.g., Juslin, 1997c; Juslin and Lindström, 2003). It should therefore be realistic to construct expressive models of emotions in music with linear, additive musical cues, and this greatly decreases the complexity of any such model. Whether this holds true for musical cues other than those studied here remains to be verified.
This also provides support for the mainly additive model that is used for combining different performance cues in the Director Musices rule system, for example for the rendering of different emotional expressions (Bresin and Friberg, 2000).

The strength of the current approach lies in the fact that the cues and their levels can be consistently compared, since the study design capitalized on a previous production study of emotional expression in music (Bresin and Friberg, 2011) and the analyses were kept comparable to past studies of the expressive cues of music (Scherer and Oshinsky, 1977; Juslin and Lindström, 2010). The present study allowed us to establish plausible ranges for the cue levels in each of the manipulations. The drawback of our scheme was that the optimal sampling did not contain all the possible cue combinations. This means that the prototype examples (Figure 1) could still be improved in terms of their emotional expression, but at least the factorial design was exhaustive enough to assess the main hypotheses about the cue levels and their interactions in general. Also, our decision to use alternate sets of emotions (tender vs. peaceful) in the two laboratories was a design weakness that failed to achieve the intended extension of the emotions covered.

In the context of musical expression, the ranking of the importance of the musical cues for emotions seems to coalesce across the studies (e.g., Hevner, 1936; Juslin, 1997c; Gabrielsson and Lindström, 2001; Juslin and Lindström, 2010), although the small number of studies, and of cues studied within them, prevents one from drawing extensive conclusions yet. We acknowledge that the choice of musical cues used for this study has, a priori, certainly excluded others from this ranking. Certain important musical cues, such as harmony, melodic contour, or dissonance, could be of equal relevance for attributing emotions to music; these were included within the music structure of our design without any systematic manipulation. We also recognize that the variable contribution of the cues is a built-in feature of the Brunswikian lens model, according to which communication may be accurate using multiple cues although the relative contribution of the cues will depend on the context.

As per Hevner's cautionary remarks about the results of any music and emotion study (1936), any emotional evaluations are dependent on the context established by the musical materials in question. The present work differs in three material ways from previous studies that also used extensive cue manipulations (Scherer and Oshinsky, 1977; Juslin, 1997c; Juslin and Lindström, 2010). Both Scherer and Oshinsky (1977) and Juslin (1997c) used just one synthetic, artificial melody as the basis of the manipulations, with 2-3 large differences between the cue levels. Juslin and Lindström (2010) also had four simple melodic progressions, all based on the same triadic, scalar, and rhythmic elements. The present experiment was built around four polyphonic, composed, and validated musical examples that were initially chosen to represent four emotion categories in a maximally clear way. Additionally, the selection of the cue ranges was grounded in past empirical work and combined both performance-related and compositional aspects of music.

The results of the present study offer links to the findings of expressive speech research, because the hypotheses about cue direction taken from expressive speech were largely supported (Scherer, 1986; Murray and Arnott, 1993; Juslin and Laukka, 2003; Scherer, 2003). In the future, it would be important to combine the factorial manipulation approach with special populations, such as children, people from different cultures, or patients with particular neural pathologies, and to use measurement techniques other than self-report to further isolate the musical cues in terms of the underlying mechanisms. These combinations would allow us to determine specifically what aspects of affect perception are mostly the products of learning, as well as to gain a better idea of the underlying processes involved.

ACKNOWLEDGMENTS
The work was funded by the European Union (BrainTuning FP6-2004-NEST-PATH-028570) and the Academy of Finland (Finnish Center of Excellence in Interdisciplinary Music Research). We thank Alex Reed for proof-reading and Tuukka Tervo for collecting the data at the University of Jyväskylä.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Emotion_Science/10.3389/fpsyg.2013.00487/abstract

REFERENCES
Adachi, M., and Trehub, S. E. (1998). Children's expression of emotion in song. Psychol. Music 26, 133-153. doi: 10.1177/0305735698262003
Bachorowski, J.-A., Smoski, M. J., and Owren, M. J. (2001). The acoustic features of human laughter. J. Acoust. Soc. Am. 110, 1581-1597. doi: 10.1121/1.1391244
Bowling, D. L., Sundararajan, J., Han, S., and Purves, D. (2012). Expression of emotion in Eastern and Western music mirrors vocalization. PLoS ONE 7:e31942. doi: 10.1371/journal.pone.0031942
Box, G. E. P., and Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. New York, NY: Wiley.
Bresin, R. (2001). "Articulation rules for automatic music performance," in Proceedings of the International Computer Music Conference - ICMC 2001, eds A. Schloss, R. Dannenberg, and P. Driessen (San Francisco, CA: ICMA), 294-297.
Bresin, R., and Friberg, A. (2000). Emotional coloring of computer-controlled music performances. Comp. Music J. 24, 44-63. doi: 10.1162/014892600559515
Bresin, R., and Friberg, A. (2011). Emotion rendering in music: range and characteristic values of seven musical variables. Cortex 47, 1068-1081. doi: 10.1016/j.cortex.2011.05.009
Brunswik, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley, CA: University of California Press.
Collins, L., Dziak, J., and Li, R. (2009). Design of experiments with multiple independent variables: a resource management perspective on complete and reduced factorial designs. Psychol. Methods 14, 202-224. doi: 10.1037/a0015826
Crowder, R. G. (1985). Perception of the major/minor distinction II: experimental investigations. Psychomusicology 5, 3-24. doi: 10.1037/h0094203
Curtis, M. E., and Bharucha, J. J. (2010). The minor third communicates sadness in speech, mirroring its use in music. Emotion 10, 335-348. doi: 10.1037/a0017928
Dalla Bella, S., Peretz, I., Rousseau, L., and Gosselin, N. (2001). A developmental study of the affective value of tempo and mode in music. Cognition 80, B1-B10.
Eerola, T., and Vuoskoski, J. K. (2011). A comparison of the discrete and dimensional models of emotion in music. Psychol. Music 39, 18-49. doi: 10.1177/0305735610362821
Eerola, T., and Vuoskoski, J. K. (2012). A review of music and emotion studies: approaches, emotion models and stimuli. Music Percept. 30, 307-340. doi: 10.1525/mp.2012.30.3.307
Eerola, T., Ferrer, R., and Alluri, V. (2012). Timbre and affect dimensions: evidence from affect and similarity ratings and acoustic correlates of isolated instrument sounds. Music Percept. 30, 49-70. doi: 10.1525/mp.2012.30.1.49
Fletcher, N. H., and Rossing, T. D. (1998). The Physics of Musical Instruments, 2nd Edn. New York, NY: Springer.
Friberg, A., Bresin, R., Frydén, L., and Sundberg, J. (1998). Musical punctuation on the microlevel: automatic identification and performance of small melodic units. J. New Music Res. 27, 271-292. doi: 10.1080/09298219808570749
Friberg, A., Bresin, R., and Sundberg, J. (2006). Overview of the KTH rule system for musical performance. Adv. Cogn. Psychol. 2, 145-161. doi: 10.2478/v10053-008-0052-x
Friberg, A., Colombo, V., Frydén, L., and Sundberg, J. (2000). Generating musical performances with Director Musices. Comp. Music J. 24, 23-29. doi: 10.1162/014892600559407
Gabrielsson, A., and Lindström, E. (2001). "The influence of musical structure on emotional expression," in Music and Emotion: Theory and Research, eds P. N. Juslin and J. A. Sloboda (Oxford: Oxford University Press), 235-239.
Gabrielsson, A., and Lindström, E. (2010). "The role of structure in the musical expression of emotions," in Handbook of Music and Emotion: Theory, Research, and Applications, eds P. N. Juslin and J. A. Sloboda (Oxford: Oxford University Press), 367-400.
Gerardi, G. M., and Gerken, L. (1995). The development of affective responses to modality and melodic contour. Music Percept. 12, 279-290. doi: 10.2307/40286184
Gundlach, R. H. (1935). Factors determining the characterization of musical phrases. Am. J. Psychol. 47, 624-643. doi: 10.2307/1416007
Hevner, K. (1935). The affective character of the major and minor modes in music. Am. J. Psychol. 47, 103-118. doi: 10.2307/1416710
Hevner, K. (1936). Experimental studies of the elements of expression in music. Am. J. Psychol. 48, 248-268. doi: 10.2307/1415746
Hevner, K. (1937). The affective value of pitch and tempo in music. Am. J. Psychol. 49, 621-630. doi: 10.2307/1416385
Ilie, G., and Thompson, W. F. (2006). A comparison of acoustic cues in music and speech for three dimensions of affect. Music Percept. 23, 319-329. doi: 10.1525/mp.2006.23.4.319
Juslin, P. N. (1997a). Can results from studies of perceived expression in musical performances be generalized across response formats? Psychomusicology 16, 77-101.
Juslin, P. N. (1997b). Emotional communication in music performance: a functionalist perspective and some data. Music Percept. 14, 383-418.
Juslin, P. N. (1997c). Perceived emotional expression in synthesized performances of a short melody: capturing the listener's judgement policy. Musicae Sci. 1, 225-256.
Juslin, P. N. (2000). Cue utilization in communication of emotion in music performance: relating performance to perception. J. Exp. Psychol. Hum. Percept. Perform. 26, 1797-1813.
Juslin, P. N., and Laukka, P. (2003). Communication of emotions in vocal expression and music performance: different channels, same code? Psychol. Bull. 129, 770-814.
Juslin, P. N., and Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. J. New Music Res. 33, 217-238. doi: 10.1080/0929821042000317813
Juslin, P. N., and Lindström, E. (2003). "Musical expression of emotions: modeling composed and performed features," in Paper Presented at the Fifth Conference of the European Society for the Cognitive Sciences of Music (Hanover: ESCOM).
Juslin, P. N., and Lindström, E. (2010). Musical expression of emotions: modeling listeners' judgements of composed and performed features. Music Anal. 29, 334-364. doi: 10.1111/j.1468-2249.2011.00323.x
Juslin, P. N., and Madison, G. (1999). The role of timing patterns in recognition of emotional expression from musical performance. Music Percept. 17, 197-221. doi: 10.2307/40285891
Juslin, P. N., and Scherer, K. R. (2005). "Vocal expression of affect," in The New Handbook of Methods in Nonverbal Behavior Research, eds J. A. Harrigan, R. Rosenthal, and K. R. Scherer (Oxford: Oxford University Press), 65-135.
Juslin, P. N., and Västfjäll, D. (2008). Emotional responses to music: the need to consider underlying mechanisms. Behav. Brain Sci. 31, 559-575.
Juslin, P. N., Friberg, A., and Bresin, R. (2001). Toward a computational model of expression in music performance: the GERM model. Musicae Sci. 6, 63-122.
Ladd, D. R., Silverman, K. E., Tolkmitt, F., Bergmann, G., and Scherer, K. R. (1985). Evidence for the independent function of intonation contour type, voice quality, and F0 range in signaling speaker affect. J. Acoust. Soc. Am. 78, 435-444. doi: 10.1121/1.392466
Lindström, E. (2003). The contribution of immanent and performed accents to emotional expression in short tone sequences. J. New Music Res. 32, 269-280. doi: 10.1076/jnmr.32.3.269.16865
Lindström, E. (2006). Impact of melodic organization on perceived structure and emotional expression in music. Musicae Sci. 10, 85-117. doi: 10.1177/102986490601000105
Mauss, I. B., and Robinson, M. D. (2009). Measures of emotion: a review. Cogn. Emot. 23, 209-237. doi: 10.1080/02699930802204677
McClelland, G. (1997). Optimal design in psychological research. Psychol. Methods 2, 3-19. doi: 10.1037/1082-989X.2.1.3
Meyer, R. K., and Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37, 60-69. doi: 10.1080/00401706.1995.10485889
Murray, I. R., and Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am. 93, 1097-1108. doi: 10.1121/1.405558
Myers, J. L., and Well, A. D. (2003). Research Design and Statistical Analysis. New York, NY: Lawrence Erlbaum.
Peretz, I., Gagnon, L., and Bouchard, B. (1998). Music and emotion: perceptual determinants, immediacy, and isolation after brain damage. Cognition 68, 111-141. doi: 10.1016/S0010-0277(98)00043-2
Rigg, M. G. (1937). An experiment to determine how accurately college students can interpret intended meanings of musical compositions. J. Exp. Psychol. 21, 223-229. doi: 10.1037/h0056146
Rigg, M. G. (1940a). Effect of register and tonality upon musical mood. J. Musicol. 2, 49-61.
Rigg, M. G. (1940b). Speed as a determiner of musical mood. J. Exp. Psychol. 27, 566-571.
Rigg, M. G. (1964). The mood effects of music: a comparison of data from earlier investigations. J. Psychol. 58, 427-438. doi: 10.1080/00223980.1964.9916765
Rosenthal, R., and Rosnow, P. (2008). Essentials of Behavioral Research: Methods and Data Analysis. New York, NY: McGraw-Hill.
Schellenberg, E. G., Krysciak, A. M., and Campbell, R. J. (2000). Perceiving emotion in melody: interactive effects of pitch and rhythm. Music Percept. 18, 155-171. doi: 10.2307/40285907
Scherer, K. R. (1978). Personality inference from voice quality: the loud voice of extroversion. Eur. J. Soc. Psychol. 8, 467-487. doi: 10.1002/ejsp.2420080405
Scherer, K. R. (1986). Vocal affect expression: a review and a model for future research. Psychol. Bull. 99, 143-165. doi: 10.1037/0033-2909.99.2.143
Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Commun. 40, 227-256. doi: 10.1016/S0167-6393(02)00084-5
Scherer, K. R., Banse, R., and Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. J. Cross Cult. Psychol. 32, 76-92. doi: 10.1177/0022022101032001009
Scherer, K. R., and Oshinsky, J. S. (1977). Cue utilization in emotion attribution from auditory stimuli. Motiv. Emot. 1, 331-346. doi: 10.1007/BF00992539
Spencer, H. (1857). The origin and function of music. Fraser's Magazine 56, 396-408.
Stewart, T. R. (2001). "The lens model equation," in The Essential Brunswik: Beginnings, Explications, Applications, eds K. R. Hammond and T. Stewart (Oxford: Oxford University Press), 357-362.
Vieillard, S., Peretz, I., Gosselin, N., Khalfa, S., Gagnon, L., and Bouchard, B. (2008). Happy, sad, scary and peaceful musical excerpts for research on emotions. Cogn. Emot. 22, 720-752. doi: 10.1080/02699930701503567
Webster, G., and Weir, C. (2005). Emotional responses to music: interactive effects of mode, texture, and tempo. Motiv. Emot. 29, 19-39. doi: 10.1007/s11031-005-4414-0
Wedin, L. (1972). A multidimensional study of perceptual-emotional qualities in music. Scand. J. Psychol. 13, 241-257. doi: 10.1111/j.1467-9450.1972.tb00072.x
Wilks, S. S. (1963). Multivariate statistical outliers. Sankhya Ind. J. Stat. A 25, 407-426.
Zentner, M. R., and Eerola, T. (2010). "Self-report measures and models," in Handbook of Music and Emotion, eds P. N. Juslin and J. A. Sloboda (Boston, MA: Oxford University Press), 187-221.
Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 31 March 2013; accepted: 11 July 2013; published online: 30 July 2013.
Citation: Eerola T, Friberg A and Bresin R (2013) Emotional expression in music: contribution, linearity, and additivity of primary musical cues. Front. Psychol. 4:487. doi: 10.3389/fpsyg.2013.00487
This article was submitted to Emotion Science, a specialty of Frontiers in Psychology.
Copyright © 2013 Eerola, Friberg and Bresin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

ORIGINAL RESEARCH ARTICLE
published: 17 July 2013
doi: 10.3389/fpsyg.2013.00417

Music, emotion, and time perception: the influence of subjective emotional valence and arousal?

Sylvie Droit-Volet 1*, Danilo Ramos 2, José L. O. Bueno 3 and Emmanuel Bigand 4*
1 Laboratoire de Psychologie Sociale et Cognitive, University Blaise Pascal, CNRS, Clermont-Ferrand, France
2 Departamento de Música, Federal University of Paraná, Paraná, Brazil
3 Faculdade de Filosofia, Ciências e Letras, University of São Paulo, São Paulo, Brazil
4 Laboratoire d'étude de l'apprentissage et du développement, University of Burgundy, CNRS, Dijon, France

Edited by: Anjali Bhatara, Université Paris Descartes, France
Reviewed by: Marion Noulhiane, UMR663 Paris Descartes University, France; Steven R. Livingstone, Ryerson University, Canada
*Correspondence: Sylvie Droit-Volet, Laboratoire de Psychologie Sociale et Cognitive (CNRS, UMR 6024), Université Blaise Pascal, 34 avenue Carnot, 73000 Clermont-Ferrand, France. e-mail: sylvie.droit-volet@univ-bpclermont.fr; Emmanuel Bigand, Pôle
AAFE-Esplanade Erasme, Université de Bourgogne, 34 avenue Carnot, BP 26513, 21065 Dijon Cedex, France. e-mail: emmanuel.bigand@u-bourgogne.fr

The present study used a temporal bisection task with short (<2 s) and long (>2 s) stimulus durations to investigate the effect on time estimation of several musical parameters associated with emotional changes in affective valence and arousal. In order to manipulate the positive and negative valence of music, Experiments 1 and 2 contrasted the effect of musical structure with pieces played normally and backwards, which were judged to be pleasant and unpleasant, respectively. This effect of valence was combined with a subjective arousal effect by changing the tempo of the musical pieces (fast vs. slow) (Experiment 1) or their instrumentation (orchestral vs. piano pieces). The musical pieces were indeed judged more arousing with a fast than with a slow tempo, and with an orchestral than with a piano timbre. In Experiment 3, affective valence was also tested by contrasting the effect of tonal (pleasant) vs. atonal (unpleasant) versions of the same musical pieces. The results showed that the effect of tempo in music, associated with a subjective arousal effect, was the major factor that produced time distortions, with time being judged longer for fast than for slow tempi. When the tempo was held constant, no significant effect of timbre on the time judgment was found, although the orchestral music was judged to be more arousing than the piano music. Nevertheless, emotional valence did modulate the tempo effect on time perception, the pleasant music being judged shorter than the unpleasant music.

Keywords: time perception, music, emotion, valence, arousal

Music is a powerful emotional stimulus that changes our relationship with time. Time does indeed seem to fly when listening to pleasant music. Music is therefore used in waiting rooms to reduce the subjective duration of time spent waiting, or in supermarkets to encourage people to stay for longer and buy more. A number of studies have indeed shown that a period of waiting is judged shorter when there is accompanying music than when there is none (e.g., Stratton, 1992; North and Hargreaves, 1999; Roper and Manela, 2000; Guegen and Jacob, 2002), and this subjective shortening of time appears to be greater when the subjects enjoy the accompanying music (Yalch and Spangenberg, 1990; Lopez and Malhotra, 1991; Kellaris and Kent, 1994; Cameron et al., 2003). These findings raise the question: what are the musical parameters that produce emotions and change our time judgments?

Music is a complex structure of sounds whose different parameters can affect the perception of time. Much of the published literature considers that the major cause of subjective time distortions in response to music lies in the temporal regularities of musical events. According to Jones and Boltz (1989), the effect of music on time estimation is due to the perceptual expectancies that listeners develop when they hear a piece of music. The way musical accents are patterned through time leads listeners to anticipate the timing and nature of incoming events. They thus judge time to be shorter when these events occur earlier in the piece than expected, and longer when they occur later. This finding highlights the influence exerted by musical structures (pitch and rhythmic structure) on attention during the estimation of musical time (see also Tillmann et al., 2007; Firmino and Bueno, 2008; Firmino et al., 2009).

However, without rejecting the important role of musical structure, other researchers mention the critical role of the emotional qualities of music per se. Indeed, music is remarkable in its ability to induce emotions in listeners (Juslin and Sloboda, 2001). Many studies conducted over the last decade have demonstrated the consistency of emotional responses to music (e.g., Peretz et al., 1998; Bigand et al., 2005). However, the musical structure of a piece of music may also induce emotions in listeners, with the result that musical structure and emotional qualities cannot be easily dissociated. Quite surprisingly, only a small number of studies in the fields of music cognition and time perception have investigated the influence of musical structure and emotional qualities. The present study therefore focuses on the potential influence of the emotional qualities of musical pieces on time judgment.

As far as the emotional qualities of musical pieces are concerned, the musical mode has been found to have robust effects on perceived emotion, with pieces perceived as sounding happy
when played in a major key and sad when played in a minor key (e.g., Crowder, 1984; Peretz et al., 1998; Fritz et al., 2009). Influences of mode on time estimation have been reported in studies using stimulus durations of several minutes (Kellaris and Kent, 1992; Bisson et al., 2009). For instance, Bisson et al. (2009) showed that the duration of a joyful musical piece (taken from Bach's Brandenburg Concertos) was overestimated compared to that of a sad piece (Barber's Adagio for Strings). However, given that the two emotions were instantiated by only two entirely different pieces, it is difficult to be sure that this difference in time estimation was not caused by other structural parameters (rhythm, meter, tempo) that are not necessarily directly related to emotion. Indeed, a piece of music in a major key that is judged happy is often associated with a fast tempo, whereas pieces written in a minor key tend to be played at a slow tempo. In such cases, the critical factor may thus be the musical rhythm rather than the mode per se. Moreover, two recent studies conducted using shorter stimulus durations and various temporal paradigms failed to find any significant effect of major vs. minor mode on time estimation. Using a retrospective time estimation paradigm, in which the participants were informed that they had to estimate time only after the presentation of the event, Bueno and Ramos (2007) did not observe any differences in time estimation between a musical piece (64.3 s) played in major and in minor mode. Similarly, using a prospective time estimation paradigm (i.e., a temporal bisection task), in which the subjects were instructed beforehand that they would have to estimate time, Droit-Volet et al. (2010a) did not report a significant effect of mode on time judgments when
the musical excerpts were matched on all parameters except for mode. Consequently, these authors concluded that the emotional valence of music may have little influence on time perception, at least when all other parameters, such as pitch structure, are held constant.

Finally, we can assume that it is the structure of musical pieces, which is indirectly responsible for inducing emotions, that affects the perception of time, rather than the emotional valence per se. Using simple sequences of clicks, numerous studies on timing have shown that faster rhythms lead to longer time estimates than slower rhythms (e.g., Treisman et al., 1990, 1992; Penton-Voak et al., 1996; Droit-Volet and Wearden, 2002; Ortega and López, 2008). To explain these results, the various authors argue that the sequence of clicks increases the level of arousal, which makes the internal clock run faster. According to the internal clock models (Treisman, 1963; Gibbon, 1977; Gibbon et al., 1984), the raw material for the representation of time consists of pulses that are emitted by a pacemaker-like system and accumulated in a counter during the presentation of the stimulus duration. Consequently, when the internal clock speeds up under the influence of clicks, more pulses are accumulated for a given duration, and time is judged longer. It therefore seems reasonable to consider that the critical factor in time distortions with music is the musical tempo, which also seems to affect emotional arousal. As explained in Droit-Volet and Meck (2007), an increase in the arousal level with emotional stimuli is associated with a speeding up of the internal clock, with the result that time is judged longer. According to psychophysiological studies that have used standardized emotional material (e.g., Greenwald et al., 1989; Lang et al., 1999), the arousal dimension of emotional stimuli corresponds to a subjective state ranging from calm-relaxed to excited-stimulated. An increase in arousal level is indeed associated with physiological activation of the autonomic nervous system (Juslin and Västfjäll, 2008). In addition, it has been demonstrated that physiological measures of arousal (heart rate or skin conductance) are correlated with self-assessment of arousal on the Self-Assessment Manikin Scale (SAM; Lang, 1980; Lang et al., 1999). Therefore, one aim of the present study was to examine the effect of different musical pieces on time estimation by comparing the effects of different tempi. Tempo, however, is thought to play a role in the subjective emotional arousal assessed by the SAM scale (Lang, 1980), and not in affective valence.

In music, the concept of emotional valence may be understood in two different ways (Bigand et al., 2005). First, valence may be thought of in terms of an opposition between "sad" and "happy" music, that is to say, between negative and positive emotions (see also Juslin and Västfjäll, 2008). One effective way of implementing this opposition is to contrast music in major and minor keys. However, neither Bueno and Ramos (2007) nor Droit-Volet et al. (2010a) found any effect of mode on the perception of time. Second, valence may be viewed in terms of "pleasant" and "unpleasant" music. In this perspective, music qualified as "sad" could easily be experienced as very pleasant (Droit-Volet et al., 2010a). In a study run by Blood et al. (1999), extremely pleasant music was found to stimulate the reward circuit of the brain. Consequently, sad music can also bring about this rewarding effect. It is therefore possible that the valence of musical stimuli contributes differently to time estimation depending on whether the implemented contrast is between negative/positive emotions or pleasant/unpleasant emotions. In the present study, we manipulated this aspect of musical valence (pleasant vs. unpleasant) by inverting the amplitude envelope of the musical pieces. More precisely, the structure of the musical stimuli was changed by playing the sound wave either normally or backward. We expected this backward version to render the music unpleasant for two reasons: it destroys the musical relationships between tones, and it modifies the amplitude envelope of each musical tone.
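A toy simulation can make the pacemaker-accumulator account invoked above concrete. The sketch below is illustrative only (the base rate and arousal gain are invented parameters, not values from the timing literature): arousal multiplies the pacemaker rate, so the same physical duration accumulates more pulses and is therefore judged longer.

```python
import numpy as np

rng = np.random.default_rng(0)

def accumulated_pulses(duration_s: float, base_rate_hz: float = 50.0,
                       arousal_gain: float = 1.0) -> int:
    """Pulses emitted by a Poisson pacemaker during a stimulus of duration_s."""
    return rng.poisson(base_rate_hz * arousal_gain * duration_s)

# The same 1.1-s stimulus yields more pulses under high arousal (fast tempo)
# than under low arousal (slow tempo), i.e., it is represented as longer.
calm = np.mean([accumulated_pulses(1.1, arousal_gain=1.0) for _ in range(1000)])
aroused = np.mean([accumulated_pulses(1.1, arousal_gain=1.2) for _ in range(1000)])
print(calm, aroused)   # roughly 55 vs. 66 pulses on average
```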
In sum, in a first experiment, the participants performed a temporal bisection task composed of a training and a testing phase (Allan and Gibbon, 1991; Wearden, 1991; Droit-Volet and Wearden, 2001). In the training phase, the participants were initially trained to respond "short" or "long" to a short and a long standard duration presented in the form of a white noise. In the testing phase, they were then presented with different comparison stimulus durations, equal to the short or the long standard duration, or of intermediate value. Their task was to judge whether each comparison duration was more similar to the short or to the long standard duration. However, in the testing phase, the comparison stimulus durations were not a white noise but musical pieces whose tempo (fast vs. slow) and valence (normal vs. backward) were both manipulated. Our main hypothesis was that the psychometric function in bisection (the proportion of long responses plotted against comparison durations) would be shifted toward the left for the musical pieces with a fast tempo compared to that for the musical pieces with a slow tempo, with the participants responding "long" more often for the former. Using emotional scales similar to those employed in the SAM scale developed by Lang et al. (1999), we also verified whether tempo was associated with subjective emotional arousal, and the normal vs. backward opposition with subjective emotional valence.

EXPERIMENT 1
METHOD
Participants
Forty undergraduate students (27 women and 13 men, mean age = 19.2, SD = 1.02) at Burgundy University, France, participated in this experiment.

Material
The participants sat in a quiet laboratory room in front of a PC computer that controlled the experimental events and recorded the responses via E-prime. The participants' responses consisted in pressing the "D" or the "K" key of the computer keyboard. The participants listened to the stimuli through headphones connected to the computer. The stimuli to be timed consisted of musical sequences. Each excerpt was recorded using Cubase 4 music software (Steinberg). A set of 5 different musical piano pieces was used as the stimuli to be timed. The same 5 musical pieces, with identical musical parameters, were subjected to two types of manipulation: one for the tempo and the other for the valence. As far as tempo is concerned, we changed the tempo from slow (72 beats per min) to fast (184 beats per min). To manipulate the valence, we changed the structure of the stimuli by playing the sound wave either normally or backward.
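For illustration, a backward version of this kind amounts to reversing the sample order of the waveform, which preserves the overall spectral content while inverting each tone's amplitude envelope. The sketch below uses the soundfile package and a hypothetical file name; it is a plausible reconstruction of the manipulation, not the authors' actual processing chain (which used Cubase).

```python
import soundfile as sf

samples, sample_rate = sf.read('piece01.wav')   # hypothetical excerpt file
backward = samples[::-1]                        # reverse along the time axis
sf.write('piece01_backward.wav', backward, sample_rate)
```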
Manipulating both the tempo (slow vs. fast) and the valence (original vs. backward) for the 5 musical pieces resulted in the generation of 20 musical sequences for use in this experiment.

Procedure
The participants performed a temporal bisection task composed of two phases: a training and a test phase. In the training phase, the participants were presented with a short (S) and a long (L) standard duration presented in the form of a white noise. There were 16 trials, 8 for each standard duration, presented in a random order. In this phase, the participants were trained to respond "short" for S and "long" for L by pressing the corresponding key. The button press order was counterbalanced across subjects. Only participants who obtained at least 70% correct responses were included in the testing phase. In the testing phase, the participants were presented with 7 comparison durations in the form of the musical pieces described above: one each equal to S and L, and the 5 intermediate comparison durations. For each musical piece, the participants had to respond whether its comparison duration was more similar to S or to L. The test phase consisted of 280 trials presented in 2 blocks of 140 trials each: 10 trials for the musical stimuli (2 × 5 different musical pieces) with two types of tempo (slow vs. fast) and two types of valence (normal vs. backward) for each of the 7 comparison durations. The trials were presented randomly within each block. In addition, the participants were divided into two groups as a function of the duration range used: 0.5/1.7 or 2.0/6.8 s. For the shorter duration range, S was 0.5 s and L was 1.7 s; the comparison durations were 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, and 1.7 s. For the longer duration range, S and L were 2.0 and 6.8 s, and the comparison durations were 2.0, 2.8, 3.6, 4.4, 5.2, 6.0, and 6.8 s. In each condition, the participants were instructed not to count the time (for the methods used to prevent counting, see Rattat and Droit-Volet, 2012).

After the bisection task, the participants were asked to evaluate the emotional qualities of the musical stimuli. More precisely, they heard each musical stimulus and rated its affective valence from "unpleasant" to "pleasant" and its arousal dimension from "calm" to "exciting" on a 9-point scale (range 1-9) similar to that used in the SAM by Lang et al. (1999). The two emotional scales were presented in random order. The presentation duration of each musical stimulus was the mid-point between the two standard durations employed in the bisection task; in the 0.5/1.7 and the 2.0/6.8 s duration conditions, the participants thus gave their emotional judgments for stimuli of 1.1 and 4.4 s, respectively.

RESULTS AND DISCUSSION
EMOTIONAL EVALUATION OF MUSICAL STIMULI
Table 1 displays the emotional ratings for the music, presented for 1.1 and 4.4 s, as a function of the affective and arousal dimensions for each version of the pieces tested, presented forward (original version) or backward and at a slow or a fast tempo.

Table 1 | Means and standard deviations of ratings of arousal and pleasantness (9-point scale) for musical excerpts presented in their original and backward versions with a fast and a slow tempo, for 1.1- and 4.4-s durations.

                    Arousal                     Pleasantness
                    1.1 s          4.4 s        1.1 s          4.4 s
Music               M     SD       M     SD     M     SD       M     SD
Original fast       7.31  1.83     8.22  0.81   7.60  1.24     7.98  1.05
Original slow       3.26  1.41     3.28  1.33   6.30  1.87     6.90  1.43
Backward fast       6.62  1.22     6.27  1.17   3.52  1.74     3.43  1.83
Backward slow       4.29  1.84     3.18  1.55   2.80  1.67     2.27  1.19

An ANOVA was run on each of the pleasantness and arousal ratings, with duration, backward version, and tempo as within-subject factors. There was a significant main effect of both version, F(1, 40) = 168.16, p < 0.05, η² = 0.81, and tempo, F(1, 40) = 60.99, p < 0.05, η² = 0.60, on pleasantness. The main effect of duration, F(1, 40) = 0.10, p > 0.05, was not significant, indicating that the presentation duration of the music (short or long) did not affect pleasantness. There was no significant interaction involving these different factors (all p > 0.05). In line with our hypothesis, our results thus showed that the normal version of the music was clearly judged to be more pleasant (7.20) than the backward version (3.01). The fast tempo was also judged more pleasant than the slow tempo (5.63 vs. 4.57), although these ratings tended more toward the median value of the 9-point scale.

As far as the arousal ratings are concerned, the ANOVA showed a significant main effect of tempo, F(1, 40) = 234.50, p < 0.05, η² = 0.85, demonstrating that the music played at a fast tempo was judged more arousing than the music played at a slow tempo (7.11 vs. 3.5). There was, however, a significant interaction between the tempo and the backward version, F(1, 40) = 41.88, p < 0.05, η² = 0.51. Tempo did not significantly interact with any other factor (all p > 0.05). This significant interaction indicated that, at the fast tempo, the participants judged the music to be more arousing in its normal than in its backward version [7.77 vs. 6.44, F(1, 41) = 18.22, p < 0.05, η² = 0.31]¹. In contrast, at the slow tempo, there was no difference between the normal and the backward version [3.27 vs. 3.73, F(1, 41) = 1.83, p > 0.05]. In addition, the ANOVA found a significant interaction between the backward version and the duration, F(1, 40) = 4.31, p < 0.05, η² = 0.10.

¹Bonferroni corrections were applied for all comparisons.
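An ANOVA of this within-subject kind can be reproduced in outline with statsmodels, assuming a hypothetical long-format table with one (aggregated) rating per participant and condition; the column names and the use of AnovaRM, which requires a fully balanced within-subject design, are illustrative choices rather than the authors' software.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def rating_anova(ratings: pd.DataFrame, depvar: str = 'pleasantness'):
    """Within-subject ANOVA over version x tempo x duration.
    `ratings` is assumed to hold columns: participant, version
    (original/backward), tempo (fast/slow), duration, and the rating."""
    return AnovaRM(ratings, depvar=depvar, subject='participant',
                   within=['version', 'tempo', 'duration']).fit()

# print(rating_anova(ratings))  # F and p for each main effect and interaction
```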
RESULTS AND DISCUSSION

EMOTIONAL EVALUATION OF MUSICAL STIMULI
Table 1 displays the emotional ratings for the music, presented for 1.1 and 4.4 s, as a function of the affective and arousal dimensions of each version of the pieces tested, when these were presented forward (original version) or backward and at a slow or fast tempo.

Table 1 | Mean and standard deviation of ratings of arousal and pleasantness (9-point scale) for musical excerpts presented in their original and backward versions with a fast and a slow tempo for a 1.1- and a 4.4-s duration.

Music           Arousal 1.1 s    Arousal 4.4 s    Pleasantness 1.1 s    Pleasantness 4.4 s
                M      SD        M      SD        M      SD            M      SD
Original fast   7.31   1.83      8.22   0.81      7.60   1.24          7.98   1.05
Original slow   3.26   1.41      3.28   1.33      6.30   1.87          6.90   1.43
Backward fast   6.62   1.22      6.27   1.17      3.52   1.74          3.43   1.83
Backward slow   4.29   1.84      3.18   1.55      2.80   1.67          2.27   1.19

An ANOVA was run on each of the pleasantness and arousal ratings, with duration, backward version, and tempo as within-subject factors. There was a significant main effect of both version, F(1, 40) = 168.16, p < 0.05, η2 = 0.81, and tempo, F(1, 40) = 60.99, p < 0.05, η2 = 0.60, on pleasantness. The main effect of duration, F(1, 40) = 0.10, p > 0.05, was not significant, indicating that the presentation duration of the music (short or long) did not affect pleasantness. There was no significant interaction involving these different factors (all p > 0.05). In line with our hypothesis, our results thus showed that the normal version of the music was clearly judged to be more pleasant (7.20) than the backward version (3.01). The fast tempo was also judged more pleasant than the slow tempo (5.63 vs. 4.57), although these ratings tended more toward the median value of the 9-point scale.

As far as the arousal ratings are concerned, the ANOVA showed a significant main effect of tempo, F(1, 40) = 234.50, p < 0.05, η2 = 0.85, demonstrating that the music played at a fast tempo was judged more arousing than the music played at a slow tempo (7.11 vs. 3.5). There was, however, a significant interaction between the tempo and the backward version, F(1, 40) = 41.88, p < 0.05, η2 = 0.51. Tempo did not significantly interact with any other factor (all p > 0.05). This significant interaction indicated that, at the fast tempo, the participants judged the music to be more arousing in its normal than in its backward version (7.77 vs. 6.44, F(1, 41) = 18.22, p < 0.05, η2 = 0.31; Bonferroni corrections were applied for all comparisons). In contrast, at the slow tempo, there was no difference between the normal and the backward version (3.27 vs. 3.73, F(1, 41) = 1.83, p > 0.05). In addition, the ANOVA found a significant interaction between the backward version and the duration, F(1, 40) = 4.31, p < 0.05, η2 = 0.10. The original music was judged more arousing than the backward music when the presentation duration was long (5.75 vs. 4.74, F(1, 20) = 12.93, p < 0.05, η2 = 0.39), while both forms were judged to be similarly arousing when the duration was shorter (3.27 vs. 3.73, F(1, 20) = 0.11, p > 0.05). However, the arousal ratings did not exceed 5.75 on the 9-point scale. No other significant effect was found. In summary, in line with our hypotheses, the results suggested that the type of presentation (original vs. backward) was the main factor affecting the assessment of the valence of the musical pieces, and the tempo the main factor affecting the level of arousal induced by the music, although with the fast tempo, subjective arousal increased more with the normal than with the backward version of the musical pieces.

TEMPORAL BISECTION
Figure 1 presents the proportion of long responses [p(long)] plotted against the comparison durations for the different types of musical pieces, which were judged to be high- or low-arousing as a function of their tempo (fast vs. slow, respectively) and pleasant or unpleasant as a function of their version (original vs. backward).

FIGURE 1 | Proportion of long responses plotted against stimulus duration for the original and the backward music with a slow and a fast tempo in the 0.5-1.7 and the 2.0-6.8 s duration conditions.

An examination of Figure 1 reveals that the major factor that produced time distortions was the tempo. Indeed, the musical stimuli were systematically judged longer with a fast than with a slow tempo. To examine the bisection performance in more detail, we calculated two indexes: the point of subjective equality, also called the bisection point (BP), and the Weber Ratio (WR) (Table 2). The former is the stimulus duration (t) that gives rise to p(long) = 0.50. The WR is an index of time sensitivity: it is the Difference Limen, (t[p(long) = 0.75] − t[p(long) = 0.25])/2, divided by the BP. The lower the WR value, the higher the sensitivity to time. The regression method originally used by Church and Deluty (1977) and subsequently employed by other authors (e.g., Wearden and Ferrara, 1996; Droit-Volet and Wearden, 2002) was used to calculate these two temporal indexes.
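For readers unfamiliar with the regression method, the following sketch shows how the BP and WR can be derived from the psychometric function: a straight line is fitted to p(long) against comparison duration and inverted at p = 0.25, 0.50, and 0.75. The response proportions below are invented for illustration and are not data from this study:

    # Sketch of the regression method for deriving the bisection point (BP)
    # and Weber Ratio (WR) from the proportion of "long" responses.
    import numpy as np

    durations = np.array([0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7])          # s (short range)
    p_long    = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.97])   # illustrative values

    slope, intercept = np.polyfit(durations, p_long, 1)  # linear regression

    def duration_at(p):
        """Duration t at which the fitted line gives p(long) = p."""
        return (p - intercept) / slope

    bp = duration_at(0.50)                              # point of subjective equality
    dl = (duration_at(0.75) - duration_at(0.25)) / 2.0  # Difference Limen
    wr = dl / bp                                        # Weber Ratio: lower = more sensitive
    print(f"BP = {bp:.2f} s, DL = {dl:.2f} s, WR = {wr:.2f}")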
Table 2 | Means and standard deviations of the Bisection Points and Weber Ratios for musical excerpts presented in their original and backward versions with a fast and a slow tempo in the 0.5/1.7 and the 2.0/6.8-s duration conditions.

Music            Bisection point        Weber ratio
                 M       SD             M       SD
0.5/1.7 s
Original fast    0.87    0.18           0.18    0.14
Original slow    1.31    0.17           0.19    0.25
Backward fast    0.99    0.23           0.18    0.13
Backward slow    1.24    0.22           0.17    0.20
2.0/6.8 s
Original fast    3.88    0.72           0.16    0.16
Original slow    4.92    0.71           0.19    0.15
Backward fast    3.96    0.67           0.14    0.07
Backward slow    4.30    0.75           0.16    0.13

An ANCOVA was conducted on the BP with 2 within-subject factors (tempo, backward version) and 1 between-subjects factor (duration), with the arousal and the valence scores for each type of musical piece as covariates. This ANCOVA showed a main effect of duration, F(1, 25) = 362.72, p < 0.05, η2 = 0.94, indicating that the BP was higher for the long than for the short anchor durations. No other factor significantly interacted with duration. More interestingly, there was a significant main effect of tempo, F(1, 25) = 8.37, p < 0.05, η2 = 0.25. This main effect demonstrates that the BP was lower for the fast than for the slow tempo and therefore indicates that the music was judged longer when played at a faster tempo. The main effect of backward version was not significant, F(1, 25) = 0.72, p > 0.05, and the backward version did not interact with any covariates (all ps > 0.05). There was nevertheless a significant tempo × backward interaction, F(1, 33) = 5.63, p < 0.05, η2 = 0.18.
This revealed that the music with a fast tempo was judged longer than that with a slow tempo for both the original version, F(1, 35) = 60.01, p < 0.05, η2 = 0.63, and the backward version, F(1, 39) = 10.34, p < 0.05, η2 = 0.21. However, the difference in the lengthening effect between the fast and the slow tempo appeared to be larger for the original than for the backward version, F(1, 34) = 13.59, p < 0.05, η2 = 0.29. In line with the results obtained for the assessment of the arousal and valence levels of the musical pieces, there was a significant interaction between the tempo and the arousal measures for the fast backward music, F(2, 25) = 4.39, p < 0.05, η2 = 0.15, demonstrating that the tempo effect on the BP increased with the arousal scores: the higher the arousal scores, the longer the musical pieces were judged to be. There was also a significant interaction between the tempo and the valence measures, both for the fast and the slow backward versions of the musical pieces, revealing that the difference in the lengthening effect between the slow and the fast tempo tended to decrease for the backward version as the pleasantness of the music increased. No other main effect or interaction involving the covariates was found.

The overall ANOVA run on the WR with tempo, backward version, and duration as factors did not reveal any significant effect (all p > 0.05). Therefore, the perception of the music distorted time without altering the fundamental ability to discriminate different durations.

Experiment 1 thus showed a main effect of tempo on time judgment, revealing that the musical pieces with a fast tempo were judged longer than those with a slower tempo. There was nevertheless an interactive effect of the version (normal vs. backward) and the tempo of the musical stimuli on time judgment. This interaction indicated that the backward version of the music, which affected the valence (pleasantness) of the musical pieces, modulated rather than reversed the effect of tempo on the timing of music. Indeed, whatever the stimulus duration range (shorter or longer than 2 s), the musical pieces were always judged longer at the fast than at the slow tempo. However, the magnitude of this lengthening effect due to tempo was larger for the original than for the backward version of the musical pieces. In other words, the original or backward version affecting the valence of the musical pieces increased or decreased the difference in time judgment between the fast and the slow tempo, without eliminating or reversing the tempo effect.

Our Experiment 1 therefore demonstrates that musical tempo was the major factor affecting time judgments: a musical piece with a fast tempo was systematically judged longer than a musical piece with a slower tempo. Our study with musical pieces thus replicated studies using simple click trains, which have showed that a faster click rate produces longer time estimates (e.g., Treisman et al., 1990, 1992). In addition, our results on the emotional evaluation of the musical stimuli revealed that the fast pieces of music were systematically judged to be more arousing than the slower pieces. There was also a significant interaction between the tempo and the subjective arousal measures which indicated that the lengthening effect obtained with the fast tempo was, when compared to the slow tempo, related to the increase in the subjective arousal level of the musical pieces. Consequently, the increase in subjective arousal level associated with the fast tempo would be the source of the temporal lengthening effect observed in our study. Such a conclusion would be consistent with the results of numerous studies showing that high-arousing emotional stimuli (facial expressions, images, movies) produce a temporal lengthening effect whereas low-arousing emotional stimuli do not (e.g., Droit-Volet and Gil, 2009; Droit-Volet et al., 2010b, 2011; Gil and Droit-Volet, 2011; Tipples, 2008, 2011). However, the issue remains of whether the effect of tempo associated with arousal is due to tempo per se or to the arousing qualities of the music. We therefore decided to run a second experiment similar to Experiment 1 but with a parameter other than tempo that is also thought to increase the subjective arousal level assessed by the SAM scale (Lang et al., 1999). More precisely, we manipulated the timbre of the musical pieces by playing them in a piano and an orchestral form. Previous studies have manipulated the timbre of musical sounds and demonstrated that the more complex the timbre, the greater the arousal (e.g., Behrens and Green, 1993; Balkwill and Thompson, 1999). Accordingly, the piano versions were expected to induce lower arousal than the orchestral versions of the same musical pieces. Our hypothesis was that, if arousal level per se is the cause of the temporal lengthening, we should observe a temporal lengthening effect for the orchestrated variants similar to that produced by variations in tempo.
EXPERIMENT 2

METHOD

Participants
The sample consisted of forty new undergraduate students (24 women and 16 men, mean age = 21.3, SD = 1.54).

Material and procedure
The material was similar to that used in Experiment 1 with the exception of the musical stimuli to be timed. To manipulate the arousal induced by the musical stimuli, we changed their instrumentation. In the piano version, only the piano timbre was used. In the orchestral version, additional tracks performed by double bass, woodwind, brass, and percussion were included. Increasing the number of virtual performers rendered the music livelier and thus more dynamic. The valence was manipulated in the same way as in Experiment 1, by playing the sound file either normally or backwards. The 5 musical pieces were consequently played either by piano only or with orchestral instrumentation, and were run either normally or backwards.

The procedure was also identical to that used in Experiment 1, with a white noise being used for the standard durations presented in the training phase and the musical pieces for the comparison durations presented in the test phase. The test phase consisted of 280 trials presented in 2 blocks of 140 trials each: 10 musical stimulus trials (2 × 5 different musical pieces) for two types of instrumentation (piano vs. orchestral) and two types of valence (normal vs. backward) for each of the 7 comparison durations. As in Experiment 1, after the bisection task, the participants were again asked to evaluate the emotional qualities of the musical stimuli presented for 1.1 and 4.4 s (the mid-points between S and L) on an affective valence scale ranging from "unpleasant" to "pleasant" and an arousal scale from "calm" to "exciting" (Lang et al., 1999).

RESULTS AND DISCUSSION

EMOTIONAL EVALUATION OF MUSICAL STIMULI
Table 3 shows the emotional ratings of the orchestral pieces and the corresponding piano versions, presented either forward (normal) or backward. The results of the ANOVA on the pleasantness ratings showed a significant main effect of backward version, F(1, 36) = 315.07, p < 0.05, η2 = 0.90, confirming that the normal music was judged pleasant (7.43) and its backward version unpleasant (2.91). In addition, there was a significant backward × duration interaction, F(1, 36) = 15.25, p < 0.05, η2 = 0.30. This interaction revealed that the difference in affective assessment between a normal piece and its backward version was greater when the presentation duration of the music was long (4.4 s) than when it was short (1.1 s) (5.50 vs. 3.52, F(1, 36) = 15.25, p < 0.05, η2 = 0.30). The ANOVA also showed that the main effect of orchestration on the pleasantness ratings did not reach significance, F(1, 33) = 3.28, p > 0.05. This suggests that instrumentation per se was not sufficient to modify the pleasant nature of the music. However, there was a significant backward × orchestration × duration interaction, F(1, 36) = 5.62, p < 0.05, η2 = 0.14. For both the short and the long presentation durations, the backward version of the music was systematically judged to be less pleasant whatever its instrumentation (piano or orchestra) (all p < 0.05). The only difference in the pleasantness ratings between the piano and the orchestral music was found for the long presentation duration, with the backward version being judged more unpleasant with the orchestral than with the piano sound (2.14 vs. 2.79, F(1, 18) = 4.86, p < 0.05, η2 = 0.21).

In accordance with our hypothesis, the ANOVA on the arousal ratings showed that the orchestral music was judged more arousing than the piano music, 6.95 vs. 4.66, F(1, 37) = 139.49, p < 0.05, η2 = 0.79. In addition, the backward version of the music had no significant effect on subjective arousal, F(1, 37) = 0.02, p > 0.05. There was no other significant effect. To summarize, by varying the version and the instrumentation, we achieved an all but perfect orthogonal manipulation of the valence and the arousing qualities of the musical stimuli. Manipulating the orchestration did indeed selectively affect the arousing qualities of the music, while not producing any change in valence.
Table 3 | Mean ratings and standard deviations of arousal and pleasantness (9-point scale) of musical excerpts in the original × backward and orchestral × piano conditions for a 1.1- and a 4.4-s duration.

Music                Arousal 1.1 s    Arousal 4.4 s    Pleasantness 1.1 s    Pleasantness 4.4 s
                     M      SD        M      SD        M      SD            M      SD
Original orchestral  6.78   1.28      6.98   1.45      6.56   1.36          7.86   1.32
Original piano       4.29   1.26      5.05   1.84      7.21   1.11          8.07   1.25
Backward orchestral  6.98   1.17      7.04   1.29      3.42   1.60          2.14   1.31
Backward piano       4.45   1.91      4.83   1.80      3.31   1.31          2.79   1.40

TEMPORAL BISECTION
Figure 2 presents the psychophysical functions when the orchestral and piano pieces were played forward and backward in the short and the longer duration ranges.

FIGURE 2 | Proportion of long responses plotted against stimulus duration for the original and the backward versions of orchestral and piano music in the 0.5-1.7 and the 2.0-6.8 s duration conditions.

In contrast to Experiment 1, in which tempo was the major factor modifying time judgment, Figure 2 suggests that the orchestration, although it was also associated with a higher subjective level of arousal, did not affect time judgment. This is confirmed by the results of the ANCOVA performed on the BP (Table 4) with the same factor design as that used in Experiment 1.

Table 4 | Means and standard deviations of the Bisection Points and Weber Ratios for the original × backward and orchestral × piano music in the 0.5/1.7 and the 2.0/6.8 s duration conditions.

Music                Bisection point        Weber ratio
                     M       SD             M       SD
0.5/1.7 s
Original orchestral  1.12    0.15           0.15    0.10
Original piano       1.19    0.18           0.18    0.13
Backward orchestral  1.03    0.09           0.12    0.07
Backward piano       1.13    0.17           0.12    0.08
2.0/6.8 s
Original orchestral  4.54    0.58           0.15    0.10
Original piano       4.62    0.55           0.19    0.18
Backward orchestral  4.34    0.48           0.14    0.09
Backward piano       4.38    0.58           0.15    0.09

As in Experiment 1, the ANCOVA run on the BP revealed a significant main effect of duration, F(1, 27) = 595.32, p < 0.05, η2 = 0.96, with no significant interaction involving this factor. Consequently, the BP was higher in the long than in the short duration range. However, and more interestingly, there was neither a main effect of orchestration, F(1, 27) = 1.72, p > 0.05, nor a main effect of backward version, F(1, 27) = 0.18, p > 0.05. Furthermore, the arousal measures entered into the ANCOVA as covariates were not significant (all p > 0.05). The only significant effect was the interaction between the backward version and the valence measures for the original version of the orchestral music, F(1, 27) = 6.42, p < 0.05, η2 = 0.19. This revealed that the BP increased with the positive valence of the music: in other words, the more pleasant the music was judged to be, the shorter the estimate of its duration.

The ANCOVA on the WR failed to reveal any significant effect, except for a significant interaction between the backward version, the orchestration, and the arousal measures for the original version of the orchestral music, F(1, 27) = 4.67, p < 0.05, η2 = 0.15. This interaction was due solely to the WR value for the original piano music, which increased significantly with the subjective valence level [r(39) = 0.36, p < 0.05]. In other words, sensitivity to time decreased as the pleasure expressed by the participants when they heard the piano music increased.
To summarize, although the orchestral music was rated as being more arousing than the piano music, our results did not reveal any difference in time perception induced by the musical timbre. Therefore, as we discuss below, both the variations in orchestration and in tempo modified the subjective level of arousal, but only the tempo significantly modified the judgment of time. Finally, when the different orchestral pieces were used, only the backward version of the music, which modified the affective valence of the music, affected time judgments, with the duration of the musical pieces being judged shorter when their positive valence (pleasantness) increased.

In sum, the backward version of the musical pieces (original vs. backward) used in our studies to change the emotional valence of the music appeared to produce a shortening effect which, in the case of Experiment 1, modulated the tempo effect on time judgment. However, playing music backwards significantly alters the structure of the music, such that its emotional effect on the perception of time (i.e., temporal shortening) is perhaps specific to this manipulation of the musical pieces. Therefore, to further examine the effect of valence on the temporal judgment of music, we decided to run a third experiment involving the manipulation of another musical parameter considered to modify the emotional valence of music. In a recent study using a temporal bisection task similar to that used in Experiments 1 and 2, Droit-Volet et al. (2010a) tested the emotional valence of musical pieces by presenting the same pieces in two variants: a major key for positive valence and a minor key for negative valence. However, as we explained in our Introduction, they did not report any significant effect of mode on the perception of time for different duration ranges. In Experiments 1 and 2, we manipulated the valence of the music by inverting the amplitude envelope of the musical pieces (forward vs. backward version).
Another approach consists in contrasting tonal and atonal music. Using a retrospective temporal judgment paradigm, Kellaris and Kent (1992) took a pop song played in the major or the minor mode and lasting 2.5 min, and made it atonal by changing the pitch of the appropriate tones. The participants judged the piece played in the major mode (associated with happiness) as lasting longer (3.45 min) than that played in the minor mode (3.07 min) or in an atonal variant (2.95 min). The authors therefore concluded that the strongest valence effects were found when the major and atonal versions of the same music were contrasted. Consequently, in Experiment 3, we used a temporal bisection task to examine the differences in time perception caused by tonal and atonal pieces of music.

EXPERIMENT 3

METHOD

Participants
Forty new undergraduate students (22 women and 18 men, mean age = 24.2, SD = 2.03) participated in this experiment.

Material and Procedure
The same 5 musical pieces as in Experiment 1 were used, but now in their tonal and atonal versions. The tonal and atonal versions of each piece had identical musical parameters, such as rhythm, meter, and melodic contour. All the stimuli (tonal and atonal) were played at a fast tempo of 108 beats per min. They differed only in the fact that the atonal version contained pitches that did not belong to a unique key, thus creating dissonant intervals.

The procedure was again identical to that employed in the previous experiments, with a white noise being used to indicate the standard durations presented in the training phase and the pieces of music being used for the comparison durations presented in the test phase. However, in the test phase, only two types of music were used (atonal vs. tonal). The test phase thus consisted of 140 trials subdivided into 2 blocks of 70 trials each: 10 trials (5 musical pieces × 2) in their tonal and atonal versions for each of the 7 stimulus durations. After the bisection task, the participants were again asked to evaluate the emotional qualities of the stimuli on both an affective valence and an arousal scale.

RESULTS AND DISCUSSION

EMOTIONAL EVALUATION OF MUSICAL STIMULI
Table 5 displays the average emotional ratings provided by the participants. Not surprisingly, tonal music was considered more pleasant than atonal music irrespective of stimulus duration.

Table 5 | Mean and standard deviation of ratings of arousal and pleasantness (on a 9-point scale) for musical excerpts in the tonal and atonal conditions for a 1.1- and a 4.4-s duration.

Music    Arousal 1.1 s    Arousal 4.4 s    Pleasantness 1.1 s    Pleasantness 4.4 s
         M      SD        M      SD        M      SD            M      SD
Tonal    4.92   1.62      6.40   1.71      7.20   1.11          8.10   0.67
Atonal   5.84   1.39      5.38   1.72      3.01   1.42          3.02   1.98

The analysis of variance (ANOVA) run on the pleasantness ratings showed a significant main effect of tonality, F(1, 28) = 156.57, p < 0.05, η2 = 0.85, and no significant effect of duration, F(1, 28) = 1.55, p > 0.05, or duration × tonality interaction, F(1, 28) = 1.44, p > 0.05. By contrast, the ANOVA on the arousal ratings did not reveal any significant effect: tonality, F(1, 29) = 0.01, duration, F(1, 28) = 3.34, tonality × duration, F(1, 29) = 3.24, all p > 0.05. This finding suggests that the change in pitch structure primarily affected the valence of the pieces, with atonal music being judged more unpleasant than tonal music.

TEMPORAL BISECTION
Figure 3 shows the psychophysical functions for the two types of music. This figure suggests that, in line with the results found in Experiment 1, there was a tonality effect for the long duration range (2.0/6.8 s), with the tonal (pleasant) music being perceived as lasting for less time than the atonal (unpleasant) music. However, no clear-cut effect of this type was observed for the very short duration range (0.5/1.7 s).

FIGURE 3 | Proportion of long responses plotted against stimulus duration for the tonal and atonal music in the 0.5-1.7 and the 2.0-6.8 s duration conditions.

Table 6 presents the BP and WR calculated using the regression method, as in Experiment 1. An ANCOVA was performed on the BP and the WR with duration as a between-subjects factor, music as a within-subjects factor, and the arousal and valence scores as covariates. The ANCOVA on the BP showed a significant main effect of duration, F(1, 24) = 585.96, p < 0.05, η2 = 0.96, as in the previous experiments.
Table 6 | Means and standard deviations of the Bisection Points and Weber Ratios for tonal and atonal music in the 0.5/1.7 and the 2.0/6.8 s duration conditions.

Music     Bisection point        Weber ratio
          M       SD             M       SD
0.5/1.7 s
Tonal     1.25    0.199          0.09    0.03
Atonal    1.31    0.192          0.09    0.04
2.0/6.8 s
Tonal     4.73    0.796          0.13    0.07
Atonal    4.25    0.615          0.11    0.09

There was also a significant main effect of tonality, F(1, 24) = 4.84, p < 0.05, η2 = 0.17, as well as a significant tonality × valence interaction, F(1, 24) = 5.38, p < 0.05, η2 = 0.18. The BP was thus significantly higher for the tonal music than for the atonal music, indicating that the duration of the tonal music was judged shorter than that of the atonal music. In addition, this shortening effect increased with emotional valence, i.e., as the assessment of the music as pleasant increased.

The ANCOVA on the WR also found a main effect of emotional valence for the tonal music, F(1, 21) = 4.85, p < 0.05, η2 = 0.19, indicating that sensitivity to time decreased as the positive valence of the music increased. The ANCOVA did not show any other significant effect (tonality, F(1, 24) = 0.03, tonality × duration, F(1, 24) = 0.10, duration, F(1, 39) = 0.004, all p > 0.05). This lack of a significant effect involving duration on the WR in Experiment 3, as well as in Experiments 1 and 2, confirmed that Weber's law holds for the temporal judgment of music as well as for that of other stimuli (Wearden and Lejeune, 2008). In conclusion, the manipulation of the physical properties of the musical pieces produced time distortions without impairing the fundamental ability to discriminate different durations.

In sum, the results of Experiment 3 revealed that the stimulus durations were judged shorter with the tonal than with the atonal music. As the tonality affected the emotional valence, with the tonal music being judged more pleasant than the atonal music, our results demonstrated that hearing pleasant music produced a temporal shortening effect compared to unpleasant music. Consequently, modulating the emotional valence of music by changing its tonality or by inverting its amplitude envelope (backward version) produced a similar temporal shortening effect for different duration ranges.

GENERAL DISCUSSION
Numerous studies have addressed the influence of emotion on the perception of time (for reviews, see Droit-Volet and Meck, 2007; Droit-Volet, 2013; Droit-Volet et al., 2013). However, most of these have used emotional visual stimuli (i.e., emotional facial expressions, pictures from the IAPS). Only two experiments, conducted by Noulhiane et al. (2007) and Mella et al. (2011), have been undertaken with sounds from the International Affective Digitized Sounds (IADS; Bradley and Lang, 1999). The results of these 2 experiments showed that the emotional sounds were judged longer than the neutral sounds, and more so in the case of the negative compared to the positive sounds. These results were explained within the theoretical framework of the internal clock models (Treisman, 1963; Gibbon, 1977; Gibbon et al., 1984) in terms of arousal effects which speed up the internal clock rate.
According to the internal clock models, when the speed of the internal clock increases, more temporal units (pulses) are accumulated and time is judged longer. As in most studies of time and emotion, Noulhiane et al. (2007) therefore concluded that "physiological activation is the predominant aspect of the influence of emotions on time perception, as all emotional stimuli regardless of their self-assessed valence are perceived as being longer than neutral ones" (p. 702).
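This pacemaker-accumulator logic can be made concrete with a toy computation; the pulse rates below are arbitrary illustrative values, not parameters estimated from any data:

    # Toy pacemaker-accumulator illustration of the internal clock account:
    # a higher pulse rate (arousal) accumulates more pulses for the same
    # physical duration, so that duration is judged longer.
    def accumulated_pulses(duration_s: float, pulse_rate_hz: float) -> int:
        return int(duration_s * pulse_rate_hz)

    baseline = accumulated_pulses(1.1, pulse_rate_hz=100.0)   # neutral clock speed
    aroused  = accumulated_pulses(1.1, pulse_rate_hz=115.0)   # clock sped up by arousal

    # The same 1.1-s stimulus yields more pulses under arousal (126 vs. 110),
    # i.e., it is judged longer, mimicking the fast-tempo lengthening effect.
    print(baseline, aroused)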
However, emotional sounds differ from other (visual) emotional stimuli because they are dynamic stimuli involving different parameters that evolve through time. Without specific experimental manipulations of these different parameters, it is thus difficult to identify the real sources of the temporal distortions produced in response to these sounds. For instance, musical pieces played in a major key at a fast tempo are judged happier than those played in a minor key at a slow tempo (e.g., Peretz et al., 1998; Fritz et al., 2009). More specifically, in the case of the perception of time, the tempo in itself must affect the speed of the internal clock independently of emotional effects. Many different studies have shown that a simple sequence of periodic stimuli (clicks, flickers) increases temporal estimates (for a review, see Wearden et al., 2009). Wearden et al. (2009) concluded that the click train effect on the perception of time, due to a speeding up of the internal clock, is one of the most robust effects to be observed in time psychology. However, the use of music provides an elegant way of manipulating two dimensions while keeping a number of other parameters constant. The present study addressed this issue by manipulating, in Experiments 1 and 2, two different dimensions of arousal (tempo and timbre) as well as a parameter associated with emotional valence (backward vs. forward music).

Our results revealed that variations in tempo are indeed associated with different subjective levels of arousal, with music played at a faster tempo being judged as more arousing than music played at a slow tempo. In the same way, orchestration was found to affect arousal level, with orchestral music being judged to be more arousing than piano music when the tempo of these two types of music was held constant. Nevertheless, in our temporal bisection studies we found that, although these two musical parameters affected the subjective level of arousal, only the tempo significantly modified the perception of time. Indeed, in Experiment 1, the psychophysical functions were systematically shifted toward the left, with the BP being lower for the fast than for the slow music, thus indicating that the fast music was judged as lasting longer than the slow music. By contrast, in Experiment 2, no significant effect of timbre on the perception of time was observed, although the orchestral music was judged to be more arousing than the piano music. In conclusion, as far as music is concerned, tempo is one of the major factors associated with the emotional arousal that leads to distortions in temporal judgments. In other words, the physical properties of music play a fundamental role in the time distortions associated with emotion.

In addition, Noulhiane et al. (2007) have suggested that, compared to physiological activation, the valence of emotional sounds has only a small influence on the perception of time. This idea finds support in the fact that a temporal lengthening effect, related to the physiological activation resulting from an accelerated tempo, was systematically observed in our study whatever the emotional valence of the musical pieces and irrespective of their duration (shorter or longer than 2 s). However, the results of our study also revealed an effect of emotional valence on judgments of the duration of musical pieces, even when the stimulus durations were particularly short. Indeed, regardless of the type of musical property that changed the emotional valence (the backward version, the tonality), our studies demonstrated that listening to music with a positive valence led to shorter time estimates. This finding is entirely consistent with the results of previous studies in which participants were asked to evaluate the duration of a long period of music (e.g., Yalch and Spangenberg, 1990; Kellaris and Kent, 1994). Finally, emotional valence rated in terms of pleasure (unpleasant vs. pleasant) seems to be a more sensitive index of emotional effects on time judgments than emotional valence rated in terms of mode (sad vs. happy music) (Bueno and Ramos, 2007; Droit-Volet et al., 2010a,b). As argued by Droit-Volet et al. (2010a), sad music can also be judged as pleasant.

The question that must now be asked is: why did the emotional valence of the music produce a shortening effect on time judgments, whereas arousal produced a contrasting lengthening effect? As explained above, the lengthening effect obtained with arousal/tempo is probably due to an automatic speeding up of the internal clock. In contrast, the effect of valence (unpleasant vs. pleasant) might call on controlled attentional processes which are linked to the awareness of the pleasure experienced when listening to pleasant music. According to attentional models of timing, the temporal and the non-temporal processors compete for the same pool of attentional resources (Thomas and Weaver, 1975; Zakay, 1989; Zakay and Block, 1996, 1998). Temporal units (pulses) that underpin the representation of time would be lost when attentional resources are distracted away from the processing of time, thus resulting in a shortening effect. This assumption, made by the attention-based models of timing, has been widely validated by the results of numerous studies using the dual-task paradigm (e.g., Fortin and Breton, 1995; Casini and Macar, 1997; Gautier and Droit-Volet, 2002; Coull et al., 2004). The results of our study, which showed that hearing musical pieces of positive valence shortened the perceived passage of time, are thus consistent with this attentional assumption. Consequently, hearing pleasant music seems to divert attention away from time processing: in other words, time flies when subjects listen to pleasant music. In addition, our results in Experiment 1 revealed that this attention-related shortening effect was greater in the case of low-arousing music with a slow tempo. However, further experiments must be run to gain a better understanding of the interaction between the two emotional dimensions of music (valence and arousal) in the timing of music.
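The attentional assumption can be given the same toy treatment as the clock-speed account: if only a proportion of pulses passes an attentional gate, the judged duration shrinks accordingly. The parameter values below are purely illustrative assumptions:

    # Toy attentional-gating illustration: when attention is diverted away
    # from timing (e.g., by pleasant music), a fraction of pulses is lost
    # and the duration is judged shorter.
    def judged_pulses(duration_s: float, pulse_rate_hz: float, attention_to_time: float) -> int:
        # attention_to_time in [0, 1]: proportion of pulses reaching the accumulator
        return int(duration_s * pulse_rate_hz * attention_to_time)

    full_attention = judged_pulses(4.4, 100.0, attention_to_time=1.0)
    pleasant_music = judged_pulses(4.4, 100.0, attention_to_time=0.9)  # attention diverted
    print(full_attention, pleasant_music)  # 440 vs. 396: "time flies" with pleasant music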
In conclusion, the originality of our study lies in the fact that it reveals that the arousal- and valence-related properties of a musical stimulus have an interactive effect on time perception. However, our study also showed that the critical factor responsible for producing time distortions was the tempo of the music. In consequence, the emotional effect of music on the perception of time is intrinsically linked to the temporal dynamics of music, i.e., its musical tempo. It is therefore particularly important to continue our investigation of music in order to better understand the way emotions affect time perception, because emotional music has dynamic temporal properties which are not present in visual emotional stimuli.

ACKNOWLEDGMENTS
This study was supported by a CAPES-COFECUB Program (Brazil-France) grant to José L. O. Bueno, Emmanuel Bigand, and Sylvie Droit-Volet, and by a grant from the ANR (French National Research Agency; ANR 11 EMOCO01201) to Sylvie Droit-Volet.

REFERENCES
Allan, L. G., and Gibbon, J. (1991). Human bisection at the geometric mean. Learn. Motiv. 22, 39–58. doi: 10.1016/0023-9690(91)90016-2
Balkwill, L. L., and Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: psychophysical and cultural cues. Music Percept. 17, 43–64. doi: 10.2307/40285811
Behrens, G. A., and Green, S. G. (1993). The ability to identify emotional content of solo improvisations performed vocally and on three different instruments. Psychol. Music 21, 20–33. doi: 10.1177/030573569302100102
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., and Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: the effect of musical expertise and of the duration of the excerpts. Cogn. Emot. 19, 1113–1139. doi: 10.1080/02699930500204250
Bisson, N., Tobin, S., and Grondin, S. (2009). Remembering the duration of joyful and sad musical excerpts. Neuroquantology 7, 46–57.
Blood, A. J., Zatorre, R. J., Bermudez, P., and Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci. 2, 382–387. doi: 10.1038/7299
Bradley, M. M., and Lang, P. J. (1999). International Affective Digitized Sounds (IADS): Stimuli, Instruction Manual and Affective Ratings (Tech. Rep. No. B-2). Gainesville, FL: University of Florida, Center for Research in Psychophysiology.
Bueno, J. L. O., and Ramos, D. (2007). Musical mode and estimation of time. Percept. Mot. Skills 105, 1087–1092.
Cameron, M. A., Baker, J., Peterson, M., and Braunsberger, K. (2003). The effect of music, wait-length evaluation, and mood on a low-cost wait experience. J. Bus. Res. 56, 421–430. doi: 10.1016/S0148-2963(01)00244-2
Casini, L., and Macar, F. (1997). Effects of attention manipulation on judgments of duration and of intensity in the visual modality. Mem. Cogn. 25, 812–818. doi: 10.3758/BF03211325
Church, R. M., and Deluty, M. Z. (1977). Bisection of temporal intervals. J. Exp. Psychol. Anim. Behav. Process. 3, 216–228.
Coull, J. T., Vidal, F., Nazarian, B., and Macar, F. (2004). Functional anatomy of the attentional modulation of time estimation. Science 303, 1506–1508. doi: 10.1126/science.1091573
Crowder, R. G. (1984). Perception of the major/minor distinction: I. Historical and theoretical foundations. Psychomusicology 4, 3–12. doi: 10.1037/h0094207
Droit-Volet, S. (2013). Time perception, emotions and mood disorders. J. Physiol. (Paris). doi: 10.1016/j.jphysparis.2013.03.005. [Epub ahead of print].
Droit-Volet, S., Bigand, E., Ramos, D., and Bueno, J. L. O. (2010a). Time flies with music whatever its modality. Acta Psychol. 135, 226–236. doi: 10.1016/j.actpsy.2010.07.003
Droit-Volet, S., Mermillod, M., Cocenas-Silva, R., and Gil, S. (2010b). The effect of expectancy of a threatening event on time perception in human adults. Emotion 10, 908–914. doi: 10.1037/a0020258
Droit-Volet, S., Fayolle, S. L., and Gil, S. (2011). Emotional state and time perception: mood elicited by films. Front. Integr. Neurosci. 5:33. doi: 10.3389/fnint.2011.00033
Droit-Volet, S., Fayolle, S., Lamotte, M., and Gil, S. (2013). Time, emotion and the embodiment of timing. Timing Time Percept. 1–30. [Epub ahead of print].
Droit-Volet, S., and Gil, S. (2009). The time-emotion paradox. Philos. Trans. R. Soc. B Biol. Sci. 364, 1943–1953.
Droit-Volet, S., and Meck, W. H. (2007). How emotions colour our time perception. Trends Cogn. Sci. 11, 504–513. doi: 10.1016/j.tics.2007.09.008
Droit-Volet, S., and Wearden, J. H. (2001). Temporal bisection in children. J. Exp. Child Psychol. 80, 142–159. doi: 10.1006/jecp.2001.2631
Droit-Volet, S., and Wearden, J. H. (2002). Speeding up an internal clock in children? Effects of visual flicker on subjective duration. Q. J. Exp. Psychol. 55B, 193–211. doi: 10.1080/02724990143000252
Firmino, E. A., and Bueno, J. L. O. (2008). Tonal modulation and subjective time. J. New Music Res. 37, 275–297. doi: 10.1080/09298210802711652
Firmino, E. A., Bueno, J. L. O., and Bigand, E. (2009). Travelling through pitch space speeds up musical time. Music Percept. 26, 205–209. doi: 10.1525/mp.2009.26.3.205
Fortin, C., and Breton, R. (1995). Temporal interval production and processing in working memory. Percept. Psychophys. 57, 203–215. doi: 10.3758/BF03206507
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., et al. (2009). Universal recognition of three basic emotions in music. Curr. Biol. 19, 573–576. doi: 10.1016/j.cub.2009.02.058
Gautier, T., and Droit-Volet, S. (2002). Attention and time estimation in 5- and 8-year-old children: a dual-task procedure. Behav. Process. 58, 56–66. doi: 10.1016/S0376-6357(02)00002-5
Gibbon, J. (1977). Scalar expectancy theory and Weber's law in animal timing. Psychol. Rev. 84, 279–325. doi: 10.1037/0033-295X.84.3.279
Gibbon, J., Church, R. M., and Meck, W. H. (1984). Scalar timing in memory. Ann. N.Y. Acad. Sci. 423, 52–77. doi: 10.1111/j.1749-6632.1984.tb23417.x
Gil, S., and Droit-Volet, S. (2011). Time flies in the presence of angry faces, depending on the temporal task used! Acta Psychol. 136, 354–362. doi: 10.1016/j.actpsy.2010.12.010
Greenwald, M. K., Cook, E. W., and Lang, P. J. (1989). Affective judgment and psychophysiological response: dimensional covariation in the evaluation of pictorial stimuli. J. Psychophysiol. 3, 51–64.
Guéguen, N., and Jacob, C. (2002). The influence of music on temporal perceptions in an on-hold waiting situation. Psychol. Music 30, 210–214. doi: 10.1177/0305735602302007
Ivry, R. B., and Schlerf, J. R. (2008). Dedicated and intrinsic models of time perception. Trends Cogn. Sci. 12, 273–280. doi: 10.1016/j.tics.2008.04.002
Jones, M. R., and Boltz, M. (1989). Dynamic attending and responses to time. Psychol. Rev. 96, 459–491. doi: 10.1037/0033-295X.96.3.459
Juslin, P. N., and Sloboda, J. A. (eds.). (2001). Music and Emotion: Theory and Research. Oxford: Oxford University Press.
Juslin, P. N., and Västfjäll, D. (2008). Emotional responses to music: the need to consider underlying mechanisms. Behav. Brain Sci. 31, 559–621.
Kellaris, J. J., and Kent, R. (1992). The influence of music on consumers' temporal perceptions: does time fly when you're having fun? J. Consum. Psychol. 1, 365–376. doi: 10.1016/S1057-7408(08)80060-5
Kellaris, J., and Kent, R. (1994). An exploratory investigation of responses elicited by music varying in tempo, tonality and texture. J. Consum. Psychol. 2, 381–401. doi: 10.1016/S1057-7408(08)80068-X
Lang, P. J. (1980). "Behavioral treatment and biobehavioral assessment: computer applications," in Technology in Mental Health Care Delivery Systems, eds J. Sidowski, J. Johnson, and T. Williams (Norwood, NJ: Ablex), 119–137.
Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (1999). International Affective Picture System (IAPS): Affective Ratings of Pictures and Instruction Manual. Gainesville, FL: University of Florida, Center for Research in Psychophysiology.
Lewis, P. A., and Miall, R. C. (2009). The precision of temporal judgment: milliseconds, many minutes, and beyond. Philos. Trans. R. Soc. B Biol. Sci. 364, 1897–1905. doi: 10.1098/rstb.2009.0020
Lopez, L., and Malhotra, R. (1991). Estimation of time intervals with most preferred and least preferred music. Psychol. Stud. 36, 203–209.
Maricq, A. V., Roberts, S., and Church, R. M. (1981). Methamphetamine and time estimation. J. Exp. Psychol. Anim. Behav. Process. 7, 18–30. doi: 10.1037/0097-7403.7.1.18
Mella, N., Conty, L., and Pouthas, V. (2011). The role of physiological arousal in time perception: psychophysiological evidence from an emotion regulation paradigm. Brain Cogn. 75, 182–187. doi: 10.1016/j.bandc.2010.11.012
North, A. C., and Hargreaves, D. J. (1999). Can music move people? The effect of musical complexity and silence on waiting time. Environ. Behav. 31, 136–149. doi: 10.1177/00139169921972038
Noulhiane, M., Mella, N., Samson, S., Ragot, R., and Pouthas, V. (2007). How emotional auditory stimuli modulate time perception. Emotion 7, 697–704. doi: 10.1037/1528-3542.7.4.697
Ortega, L., and López, F. (2008). Effects of visual flicker on subjective time in a temporal bisection task. Behav. Process. 78, 380–386. doi: 10.1016/j.beproc.2008.02.004
Penton-Voak, I. S., Edwards, H., Percival, A., and Wearden, J. H. (1996). Speeding up an internal clock in humans? Effects of click trains on subjective duration. J. Exp. Psychol. Anim. Behav. Process. 22, 307–320. doi: 10.1037/0097-7403.22.3.307
Peretz, I., Gagnon, L., and Bouchard, B. (1998). Music and emotion: perceptual determinants, immediacy, and isolation after brain damage. Cognition 68, 111–141. doi: 10.1016/S0010-0277(98)00043-2
Pöppel, E. (2009). Pre-semantically defined temporal windows for cognitive processing. Philos. Trans. R. Soc. B Biol. Sci. 364, 1887–1896. doi: 10.1098/rstb.2009.0015
Ramos, D., Bueno, J. L. O., and Bigand, E. (2013). Manipulating Greek musical modes and tempo affects perceived musical emotion in musicians and nonmusicians. Braz. J. Med. Biol. Res. [Epub ahead of print].
Rattat, A. C., and Droit-Volet, S. (2012). What is the best and easiest method of preventing counting in different temporal tasks? Behav. Res. Methods 44, 67–80. doi: 10.3758/s13428-011-0135-3
Roper, J. M., and Manela, J. (2000). Psychiatric patients' perception of waiting time in the psychiatric emergency service. J. Psychosoc. Nurs. Ment. Health Serv. 38, 19–27.
Stratton, V. (1992). Influence of music and socializing on perceived stress while waiting. Percept. Mot. Skills 75, 334. doi: 10.2466/pms.1992.75.1.334
Thomas, E. A. C., and Weaver, W. B. (1975). Cognitive processing and time perception. Percept. Psychophys. 17, 363–367. doi: 10.3758/BF03199347
Tillmann, B., Peretz, I., Bigand, E., and Gosselin, N. (2007). Harmonic priming in an amusic patient: the power of implicit tasks. Cogn. Neuropsychol. 24, 603–622. doi: 10.1080/02643290701609527
Tipples, J. (2008). Negative emotionality influences the effects of emotion on time. Emotion 8, 127–131. doi: 10.1037/1528-3542.8.1.127
Tipples, J. (2011). When time stands still: fear-specific modulation of temporal bias due to threat. Emotion 11, 74–80. doi: 10.1037/a0022015
Treisman, M. (1963). Temporal discrimination and the indifference interval: implications for a model of the "internal clock". Psychol. Monogr. 77, 1–13. doi: 10.1037/h0093864
Treisman, M., Faulkner, A., Naish, P., and Brogan, D. (1990). The internal clock: evidence for a temporal oscillator underlying time perception with some estimates of its characteristic frequency. Perception 19, 705–743. doi: 10.1068/p190705
Treisman, M., Faulkner, A., and Naish, P. (1992). On the relation between time perception and the timing of motor action: evidence for a temporal oscillator controlling the timing of movement. Q. J. Exp. Psychol. 45A, 235–263.
Wearden, J. H. (1991). Human performance on an analogue of an interval bisection task. Q. J. Exp. Psychol. 43B, 59–81.
Wearden, J. H., and Ferrara, A. (1996). Stimulus range effects in temporal bisection by humans. Q. J. Exp. Psychol. 49B, 24–44.
Wearden, J. H., and Lejeune, H. (2008). Scalar properties in human timing: conformity and violations. Q. J. Exp. Psychol. 61, 569–587. doi: 10.1080/17470210701282576
Wearden, J. H., Smith-Spark, J. H., Cousins, R., Edelstyn, N. M. J., Cody, F. W. J., and O'Boyle, D. J. (2009). Effect of click trains on duration estimated by people with Parkinson's disease. Q. J. Exp. Psychol. (Hove) 62, 33–40. doi: 10.1080/17470210802229047
Yalch, R. A., and Spangenberg, E. (1990). Effects of store music on shopping behavior. J. Consum. Mark. 7, 55–63. doi: 10.1108/EUM0000000002577
Zakay, D. (1989). "Subjective time and attentional resource allocation: an integrated model of time estimation," in Time and Human Cognition, eds I. Levin and D. Zakay (Amsterdam: North-Holland), 365–397. doi: 10.1016/S0166-4115(08)61047-X
Zakay, D., and Block, R. A. (1996). "The role of attention in time estimation processes," in Time, Internal Clocks and Movement, eds M. A. Pastor and J. Artieda (Amsterdam: Elsevier), 143–164. doi: 10.1016/S0166-4115(96)80057-4
Zakay, D., and Block, R. A. (1998). "New perspectives on prospective time estimation," in Time and the Dynamic Control of Behavior, eds V. DeKeyser, G. d'Ydewalle, and A. Vandierendonck (Göttingen: Hogrefe and Huber), 129–141.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 25 April 2013; accepted: 19 June 2013; published online: 17 July 2013.
Citation: Droit-Volet S, Ramos D, Bueno JLO and Bigand E (2013) Music, emotion, and time perception: the influence of subjective emotional valence and arousal? Front. Psychol. 4:417. doi: 10.3389/fpsyg.2013.00417
This article was submitted to Frontiers in Emotion Science, a specialty of Frontiers in Psychology.
Copyright © 2013 Droit-Volet, Ramos, Bueno and Bigand. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
ORIGINAL RESEARCH ARTICLE
published: 23 September 2013
doi: 10.3389/fpsyg.2013.00656

Preattentive processing of emotional musical tones: a multidimensional scaling and ERP study

Katja N. Spreckelmeyer 1, Eckart Altenmüller 2, Hans Colonius 3 and Thomas F. Münte 4*
1 Department of Psychology, Stanford University, Stanford, CA, USA
2 Institute of Music Physiology and Musicians' Medicine, University of Music, Drama, and Media, Hannover, Germany
3 Department of Psychology, University of Oldenburg, Oldenburg, Germany
4 Department of Neurology, University of Lübeck, Lübeck, Germany

Edited by: Anjali Bhatara, Université Paris Descartes, France
Reviewed by: Clayton R. Critcher, University of California, Berkeley, USA; Lars Kuchinke, Ruhr Universität Bochum, Germany
*Correspondence: Thomas F. Münte, Department of Neurology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany. e-mail: thomas.muente@neuro.uni-luebeck.de

Musical emotion can be conveyed by subtle variations in timbre. Here, we investigated whether the brain is capable of discriminating tones differing in emotional expression by recording event-related potentials (ERPs) in an oddball paradigm under preattentive listening conditions. First, using multidimensional Fechnerian scaling, pairs of violin tones played with a happy or sad intonation were rated same or different by a group of non-musicians. Three happy and three sad tones were selected for the ERP experiment. The Fechnerian distances between tones within an emotion were in the same range as the distances between tones of different emotions. In two conditions, either 3 happy and 1 sad or 3 sad and 1 happy tone were presented in pseudo-random order. A mismatch negativity for the emotional deviant was observed, indicating that, in spite of considerable perceptual differences between the three equiprobable tones of the standard emotion, a template was formed based on timbral cues against which the emotional deviant was compared. Based on Juslin's assumption of redundant code usage, we propose that tones were grouped together because they were identified as belonging to one emotional category on the basis of different emotion-specific cues. These results indicate that the brain forms an emotional memory trace at a preattentive level and thus extend previous investigations in which emotional deviance was confounded with physical dissimilarity. Differences between sad and happy tones were observed which might be due to the fact that the happy emotion is mostly communicated by suprasegmental features.

Keywords: preattentive processing, musical emotion, timbre, event-related potential, mismatch negativity, multidimensional scaling
INTRODUCTION
Music, as well as language, can be used to transport emotional information and, from an evolutionary perspective, it does not come as a surprise that the way emotion is encoded in music is similar to the encoding of emotion in human or animal vocalizations. Interestingly, the emotional and semantic processing of speech has been shown to be supported by different brain systems by the method of double dissociation (e.g., Heilman et al., 1975). While six patients with right temporoparietal lesions and left unilateral neglect were demonstrated to have a deficit in the comprehension of affective speech, six patients with left temporoparietal lesions exhibited fluent aphasia, i.e., problems with the content of speech, but no problems with affective processing. Likewise, in music processing, the Montreal group around Isabelle Peretz has described a patient who is selectively impaired in the deciphering of emotions from music while being unimpaired in the processing of other aspects of music (Peretz et al., 2001).

Researchers have tried to identify segmental and suprasegmental features that are used to encode emotional information in human speech, animal vocalizations, and music. With regard to animals, similar acoustic features are used by different species to communicate emotions (Owings and Morton, 1998). In humans, perceived emotion appears to be mainly driven by the mean level and the range of the fundamental frequency (F0) (Williams and Stevens, 1972; Scherer, 1988; Sloboda, 1990; Pihan et al., 2000), with low F0 being related to sadness and, conversely, high mean F0 level being related to happiness. In music, Hevner (1935, 1936, 1937) in her classical studies found that tempo and mode had the largest effects on listeners' judgments, followed by pitch level, harmony, and rhythm. According to Juslin (2001), musical features encoding sadness include slow mean tempo, legato articulation, small articulation variability, low sound level, dull timbre, large timing variations, soft duration contrasts, slow tone attacks, flat micro-intonation, slow vibrato, and final ritardando, whereas happiness is encoded by fast mean tempo, small tempo variability, staccato articulation, large articulation variability, fairly high sound level, little sound level variability, bright timbre, fast tone attacks, small timing variations, sharp duration contrasts, and rising micro-intonation.

While suprasegmental features are thought to be, at least in part, the result of a lifelong sociocultural conventionalization and therefore maybe less hardwired (Sloboda, 1990), a considerable part of the emotional information is transmitted by segmental features concerning individual tones. For example, a single violin tone might be recognized as sad or happy with rather high accuracy. Indeed, string and wind instruments, which afford a high degree of control over the intonation, can be used to mimic the segmental features also used by singers to convey emotional information.

Segmental emotional information can be encoded into a single tone by varying its timbre, which might be defined as reflecting the different quality of sounds aside from variations in pitch, loudness, and duration.
In addition to the different distributions of the amplitudes of the harmonic components of a complex tone in a steady state (Helmholtz, 1885/1954), dynamic variations of the sound such as attack time and spectral flux (Grey, 1977; Grey and Moorer, 1977) are also important, particularly with regard to onset characteristics. Multidimensional scaling procedures applied to tones differing in timbre because they were produced by different musical instruments showed that this aspect of a tone is determined by variations along three dimensions termed attack time, spectral centroid, and spectral flux (McAdams et al., 1995). Likewise, in a recent study using multidimensional scaling (MDS) procedures to investigate the emotional information transmitted by variations in timbre, Eerola et al. (2012) found that affect dimensions could be explained in terms of three kinds of acoustic features: spectral (the ratio of high-frequency to low-frequency energy), temporal (attack slope), and spectro-temporal (spectral flux).
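These three kinds of features can be approximated directly from a digitized tone. The sketch below assumes a mono float signal x sampled at rate sr; the 1.5 kHz split frequency, the 100-ms onset window, and the frame size are illustrative choices of ours, not the feature-extraction pipeline of Eerola et al. (2012):

    # Rough approximations of the spectral, temporal, and spectro-temporal
    # acoustic features named above, using NumPy only.
    import numpy as np

    def spectral_ratio(x: np.ndarray, sr: int, split_hz: float = 1500.0) -> float:
        """Ratio of high- to low-frequency energy (spectral feature)."""
        spectrum = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
        return spectrum[freqs >= split_hz].sum() / spectrum[freqs < split_hz].sum()

    def attack_slope(x: np.ndarray, sr: int, attack_ms: float = 100.0) -> float:
        """Rise of the amplitude envelope over the onset (temporal feature)."""
        n = int(sr * attack_ms / 1000.0)
        envelope = np.abs(x[:n])
        return (envelope.max() - envelope[0]) / (attack_ms / 1000.0)

    def spectral_flux(x: np.ndarray, frame: int = 1024) -> float:
        """Mean frame-to-frame spectral change (spectro-temporal feature)."""
        frames = [np.abs(np.fft.rfft(x[i:i + frame]))
                  for i in range(0, len(x) - frame, frame)]
        return float(np.mean([np.linalg.norm(b - a) for a, b in zip(frames, frames[1:])]))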
Therefore, to answer this question and extend our previous findings (Goydke et al., 2004), we conducted the present study. As pointed out before, segmental features encoding emotion seem to be varied. Thus, what makes the study of acoustical emotion difficult is that the set of features encoding the same emotion does not seem to be very well defined and that there is a great variance of feature combinations found within individual emotion categories. We modified the design of our previous MMN study to see whether affective expressions are pre-attentively categorized even when their acoustical structure differs. In other words, several (n = 3, probability of occurrence for each tone 25%) instances of sad (or happy) tones were defined as standards to which an equally probable deviant stimulus (25%) of the other emotion had to be compared preattentively. To the extent that the MMN reflects deviance in the sense of "being rare," an MMN under these circumstances would indicate that the standards have been grouped to define a single "emotional" entity.

To test whether the brain automatically builds up categories of basic emotions across tones of different (psycho-)acoustical structure, it was necessary to create two sets of tones, where tones within one set could clearly be categorized as happy and sad, respectively, but differed with respect to their acoustical structure. To this end, we first performed extensive studies to define the stimulus set for the MMN study using MDS methods. Two types of criteria were set for tones to be used as standards in the MMN study: first, each tone needed to be consistently categorized as happy or sad and, second, tones within one set as well as across sets needed to be perceived as different. The first point was addressed by performing affect ratings on a set of violin tones which differed only in emotional expression but not in pitch or instrumental timbre. To tackle the second point, pairwise same-different comparisons were collected for all tones and fed into a Fechnerian scaling procedure to assess the perceived similarity among the tones. We will first describe the scaling experiment and will then turn to the MMN experiment.

For the latter, we had a straightforward expectation: if the brain categorizes tones preattentively on the basis of an automatic emotional grouping, we should observe an MMN for emotional deviant stimuli regardless of the fact that these emotional deviants were as probable as each of the three different standard stimuli.
SCALING EXPERIMENT
Multidimensional Fechnerian scaling (Dzhafarov and Colonius, 1999, 2001) is a tool for studying the perceptual relationship among stimuli. The general aim of MDS is to arrange a set of stimuli in a low-dimensional (typically Euclidean) space such that the distances among the stimuli represent their subjective (dis)similarity as perceived by a group of judges. Judges generally perform their ratings in pairwise comparisons between all stimuli in question. Based on the dissimilarity data, an MDS procedure finds the best-fitting spatial constellation by use of a function minimization algorithm that evaluates different configurations with the goal of maximizing the goodness-of-fit (Kruskal, 1964a,b). Though the dimensions found to span the scaling space can often be interpreted as psychologically meaningful attributes that underlie the judgment, no a priori assumptions have to be made about the nature of the dimensions. Thus, with MDS perceptual similarity can be studied without the need to introduce predefined feature concepts (as labels for the dimensions) which might bias people's judgments.

Fechnerian scaling is a development of classical MDS which is more suitable for psychophysical data. Dzhafarov and Colonius (2006) have pointed out that certain requirements for data to be used with classical MDS are usually violated in empirical data, namely the property of symmetry and the property of constant self-dissimilarity. The property of symmetry assumes that discrimination probability is independent of presentation order and, thus, that the probability to judge a stimulus x as different from a stimulus y is the same no matter whether x or y is presented first [p(x; y) = p(y; x)]. It has been known since Fechner (1860) that this is not true. The property of constant self-dissimilarity expects that any given stimulus is never perceived as different from itself, thus, that the probability to judge stimulus x as different from itself is 0 [p(x; x) = p(y; y)]. However, it has been shown repeatedly that this is not the case in psychophysical data (e.g., Rothkopf, 1957). The only requirement made by Fechnerian scaling is that of regular minimality, requiring that the probability to judge a stimulus as different from itself be lower than any other discrimination probability.

In the present experiment, Fechnerian scaling is used to establish subjective distances for a set of tones where tones differ with respect to their emotional expression.
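Regular minimality is straightforward to verify on a matrix of discrimination probabilities such as the one reported in Table 2 below. A minimal sketch, assuming rows and columns are ordered so that index i refers to the same tone in both:

    # Check regular minimality: each self-dissimilarity p[i, i] must be
    # lower than every other probability in its row and its column.
    import numpy as np

    def is_regular_minimal(p):
        p = np.asarray(p, dtype=float)
        for i in range(p.shape[0]):
            others_row = np.delete(p[i, :], i)   # p(i, j), j != i
            others_col = np.delete(p[:, i], i)   # p(j, i), j != i
            if p[i, i] >= min(others_row.min(), others_col.min()):
                return False
        return True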
MATERIALS AND METHODS
STIMULUS MATERIAL
To generate the stimulus material, 9 female violinists (all students of the Hanover University of Music and Drama) were asked to play brief melodic phrases all ending on c-sharp. Melodies were to be played several times with happy, neutral, or sad expressions. Before each musician started with a new expression, she was shown a sequence of pictures from the IAPS (Lang et al., 2008) which depicted happy, neutral, or sad scenes, to give her an idea of what was meant by happy, neutral, and sad. All violinists were recorded on the same day in the same room using the same recording technique: stereo (two Neumann TLM 127 microphones), 44.1 kHz sampling rate, 24 bit; the distance from the instrument to the microphones was always 50 cm. Each musician filled out a form describing the changes in technique that she had applied to achieve the different expressions. From 200 melodic phrases the last tone (always c-sharp) was extracted using Adobe Audition. Only those tones were selected which were between 1450 and 1700 ms in length and had a pitch between 550 and 570 Hz. Tones from two violinists had to be discarded altogether because they were consistently below the target pitch level. The resulting pre-selection comprised 35 tones by 7 different violinists. To soften the tone onset, a smooth fade-in envelope was applied from 0 to 100 ms post-tone onset.

The pre-selection was rated on a 5-point scale from very sad (1) to very happy (5) by 9 student subjects (mean age = 25.9 years, 5 males) naive to the purpose of the study and different from the participants taking part in the final experiment. Each tone was rated twice by each participant to test the raters' consistency. Tones were not amplitude-normalized, because it was found that differences in affective expression could not be differentiated properly in a normalized version. Based on the affect ratings and their consistency, 10 tones were selected for the final stimulus set (Table 1).

Table 1 | Features of the stimulus material.

Tone        Duration (ms)   Frequency (Hz), (SD)   Mean level [dB(A)]
tone01      1676            559.69 (2.41)          64.5
tone02      1526            558.99 (2.04)          66.2
tone03      1658            559.98 (4.45)          72.2
tone04      1628            554.39 (3.55)          71.6
tone05      1506            555.86 (1.13)          68.8
tone06      1534            561.86 (4.35)          68.5
tone07      1660            563.00 (4.58)          66.6
tone08      1630            561.31 (3.61)          67.8
tone09      1570            556.96 (1.25)          72.4
tone10      1608            557.64 (0.35)          68.8
Mean (SD)   1599 (61.5)     559.3 (2.75)           68.74 (2.66)
DESIGN OF THE SAME-DIFFERENT FORCED-CHOICE EXPERIMENT
Participants were 10 students (mean age = 25.4 years, 5 females) with no musical expertise who took part in two separate sessions. In session 1 they performed a same-different forced-choice task on the violin tones to provide data for MDS. In session 2 (approximately 1 week later) they were asked to rate the emotional expression of the tones on a five-point scale.

For the forced-choice task, participants were tested individually while sitting in a comfortable chair 120 cm away from a 20-inch computer screen. All auditory stimuli were presented via closed headphones (Beyerdynamic DT 770 M) with a level ranging from 64 to 73 dB. Presentation software (Neurobehavioral Systems) was used to present trials and to record responses. All 10 tones were combined with each other, including themselves, resulting in 10 × 10 = 100 pairs; all 100 pairs were presented ten times, each time in a different randomized order (resulting in 1000 trials altogether). The stimulus onset asynchrony (SOA) between the two tones of a pair was 3500 ms. Participants had to strike one of two keys to respond same or different (forced choice). To make sure participants judged the psychoacoustical similarity of the tones without bias, they were kept uninformed about the purpose of the experiment. Trial duration was about 6000 ms. The next trial was automatically started when one of the two buttons was pressed. Participants performed a short training to familiarize them with the procedure and were allowed to pause after each block of 25 trials. There were 40 blocks altogether. Participants could end the pause by pressing a button on the keyboard. The duration of the whole experiment was about 2 hours. Participants were verbally instructed to decide whether the two tones comprising a pair were same or different. For the data analysis, responses were recorded as 0 (same) and 1 (different). Mean values (discrimination probabilities) per pair of tones were calculated over all participants and all responses. The minimum number of responses per pair was 90. The resulting discrimination probabilities were transformed into Fechnerian distances using FSDOS (Fechnerian Scaling of Discrete Object Sets by Dzhafarov and Colonius, see http://www.psych.purdue.edu/~ehtibar/).
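As a sketch of this aggregation step, assuming the raw data are available as records of (first tone, second tone, response) with tone indices 0-9, pooled over participants and repetitions, the 10 × 10 input matrix for FSDOS can be built as follows:

    # Aggregate 0 ("same") / 1 ("different") responses into the matrix
    # of discrimination probabilities used as input for Fechnerian
    # scaling. The record format is an assumption for illustration; it
    # presumes every ordered pair occurs at least once (here: >= 90
    # responses per pair).
    import numpy as np

    def discrimination_matrix(trials, n_tones=10):
        counts = np.zeros((n_tones, n_tones))
        diffs = np.zeros((n_tones, n_tones))
        for first, second, response in trials:
            counts[first, second] += 1
            diffs[first, second] += response      # 1 = judged "different"
        return diffs / counts                     # proportion "different"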
AFFECT RATING
In session 2, each participant from the scaling experiment performed an affect rating of each individual violin tone. All stimuli were presented twice, with the order being randomized for each participant. Participants were asked to rate each tone on a 5-point scale ranging from very sad (1) to very happy (5) by pressing one of the keys from F1 to F5 on the keyboard. Emblematic faces illustrated the sad and the happy end of the scale.

VALENCE AND AROUSAL RATING
Stimulus material was also rated according to valence and arousal by two additional groups of participants. All stimuli were presented twice, but the order was randomized for each participant. To give participants an idea of what was meant by the terms valence and arousal, they performed a short test trial on pictures taken from the IAPS. Group A (valence; 5 women, 5 men, mean age = 27.6) was asked to rate all 10 tones on a 5-point scale ranging from very negative (1) to very positive (5). Group B (5 women, 5 men, mean age = 24.4) was asked to rate the 10 tones from very relaxed (German "sehr entspannt") (1) to highly aroused (German "sehr erregt") (5).

RESULTS
SAME-DIFFERENT FORCED-CHOICE EXPERIMENT
Discrimination probabilities for each pair of tones based on participants' same-different judgments are shown in Table 2. Fechnerian distances for each pair of tones calculated from the discrimination probabilities are shown in Table 3. The given values reflect the relative distances between pairs of tones as perceived by the mean participant. For example, tone04 (abbreviated t.04 in the row) is perceived as about 1.5 times more distant from tone05 than from tone07.

Table 2 | Discrimination probabilities for the 10 tones.

       tone01  tone02  tone03  tone04  tone05  tone06  tone07  tone08  tone09  tone10
t.01   0.06    0.12    1       0.89    0.74    0.81    0.86    0.94    0.88    0.89
t.02   0.16    0.08    0.98    0.91    0.69    0.72    0.85    0.89    0.88    0.93
t.03   0.99    0.97    0.04    0.93    0.97    0.93    0.85    0.88    0.98    0.95
t.04   0.9     0.93    0.96    0.08    0.82    0.42    0.51    0.64    0.6     0.96
t.05   0.7     0.77    1       0.84    0.08    0.79    0.85    0.91    0.78    0.74
t.06   0.89    0.8     0.94    0.62    0.93    0.07    0.3     0.35    0.74    0.79
t.07   0.92    0.91    0.97    0.69    0.86    0.41    0.09    0.2     0.89    0.93
t.08   0.9     0.91    0.94    0.75    0.9     0.31    0.16    0.1     0.86    0.83
t.09   0.88    0.95    0.96    0.66    0.82    0.77    0.8     0.76    0.08    0.26
t.10   0.91    0.94    1       0.91    0.65    0.77    0.89    0.82    0.34    0.06
Given are probabilities with which the mean perceiver judged the row tones to be different from the column tones.

Table 3 | Fechnerian distances.

       tone01  tone02  tone03  tone04  tone05  tone06  tone07  tone08  tone09  tone10
t.01   0.000   0.140   1.890   1.650   1.290   1.510   1.630   1.670   1.620   1.680
t.02   0.140   0.000   1.830   1.680   1.290   1.370   1.590   1.620   1.660   1.730
t.03   1.890   1.830   0.000   1.770   1.850   1.760   1.690   1.680   1.820   1.850
t.04   1.650   1.680   1.770   0.000   1.500   0.890   1.030   1.190   1.100   1.550
t.05   1.290   1.290   1.850   1.500   0.000   1.570   1.540   1.630   1.440   1.250
t.06   1.510   1.370   1.760   0.890   1.570   0.000   0.550   0.490   1.360   1.430
t.07   1.630   1.590   1.690   1.030   1.540   0.550   0.000   0.170   1.520   1.660
t.08   1.670   1.620   1.680   1.190   1.630   0.490   0.170   0.000   1.440   1.490
t.09   1.620   1.660   1.820   1.100   1.440   1.360   1.520   1.440   0.000   0.460
t.10   1.680   1.730   1.850   1.550   1.250   1.430   1.660   1.490   0.460   0.000
Distances were calculated by FSDOS (the larger the value, the more distant the tones).

AFFECT, AROUSAL, AND VALENCE RATING
Results of the affect, arousal, and valence ratings are shown in Table 4, collapsed over the first and second presentation, which did not differ significantly. Please note that the affect rating was performed by the same group of participants that also took part in the same-different forced-choice experiment, whereas the arousal and valence ratings were performed by two different groups of subjects. Though stemming from different groups of participants, there was a high correlation between the affect and the arousal ratings [r = 0.937, p < 0.001]. In contrast, the correlation between valence and affect ratings was rather low [r = 0.651, p = 0.042]. This is surprising, for it was expected that valence and affect are closely related. It has to be noted, though, that during testing it became apparent that participants used different concepts for the valence dimension. While some understood positive-negative in the sense of pleasant-unpleasant, others linked positive and negative to the happy and sad ends of the dimension. This problem is paralleled by a heterogeneous use of the valence term in the literature (see Russell and Barrett, 1999, for a discussion) and might serve as an explanation for the incongruous pattern. In the current experiment, the valence ratings will therefore be interpreted with caution.

Table 4 | Results of the affect, arousal, and valence ratings.

Tone     Affect        Arousal       Valence       Label
tone01   1.90 (0.61)   1.75 (0.42)   2.80 (1.40)   sad01
tone02   1.95 (0.61)   1.90 (0.66)   3.20 (0.98)   sad02
tone03   4.40 (0.94)   4.55 (0.44)   3.55 (0.90)
tone04   2.90 (0.39)   3.15 (1.00)   3.35 (0.67)
tone05   2.20 (0.71)   1.80 (0.54)   2.70 (0.63)   sad03
tone06   2.70 (0.59)   3.00 (0.62)   3.25 (0.49)
tone07   3.45 (0.98)   2.95 (0.55)   2.95 (0.44)   hap01
tone08   3.60 (0.77)   3.20 (0.71)   3.30 (0.63)   hap02
tone09   3.35 (0.71)   3.40 (0.81)   3.25 (1.03)   hap03
tone10   2.55 (0.55)   2.80 (0.63)   2.70 (1.01)
Each scale ranged from 1 to 5; the last column gives the label of the tone for the MMN study.
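The degrees of freedom implied by the reported p-values suggest these correlations were computed across the 10 tone means; assuming so, the computation can be sketched directly from Table 4 (values from the rounded table means need not match the reported coefficients exactly):

    # Correlating the mean ratings of Table 4 across the 10 tones.
    from scipy.stats import pearsonr

    affect  = [1.90, 1.95, 4.40, 2.90, 2.20, 2.70, 3.45, 3.60, 3.35, 2.55]
    arousal = [1.75, 1.90, 4.55, 3.15, 1.80, 3.00, 2.95, 3.20, 3.40, 2.80]
    valence = [2.80, 3.20, 3.55, 3.35, 2.70, 3.25, 2.95, 3.30, 3.25, 2.70]

    print(pearsonr(affect, arousal))   # high affect-arousal correlation
    print(pearsonr(affect, valence))   # markedly lower for valence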
SELECTION OF STIMULI FOR THE MMN EXPERIMENT
Three sad tones [tone01 (sad01), tone02 (sad02), tone05 (sad03)] and 3 happy tones [tone07 (hap01), tone08 (hap02), tone09 (hap03)] were chosen from the data set based on their affect ratings. The happy tones had mean affect ratings of 3.45, 3.60, and 3.35; the sad tones were rated 1.90, 1.95, and 2.20, respectively. Affect ratings of happy and sad tones were significantly different [F(9, 90) = 12.9, p < 0.001], and the scaling procedures demonstrated that tones were perceived as different even when belonging to the same emotion category. Fechnerian distances between happy and sad tones fell between 1.44 and 1.67. Distances were 0.17, 1.52, and 1.44 among happy tones and 0.14 and 1.29 among sad tones.

EVENT-RELATED POTENTIAL EXPERIMENT
METHODS
Participants
Of a total of 19 participants, three had to be excluded because of technical error (two) or too many blink artifacts in the ERP data (one). The remaining 16 participants (8 women) were aged between 21 and 29 years (mean = 24.9). None was a professional musician.

Design
Stimuli were the 6 different single violin tones chosen on the basis of the scaling experiment. Two conditions were set up in a modified oddball design. In condition A, 3 sad tones were presented in random order (standards) with 1 happy tone (deviant) randomly interspersed. In condition B, 3 happy tones were presented as standards with 1 sad tone randomly interspersed as the deviant tone. As deviants, the tones with the lowest and highest affect ratings were chosen. The probability of occurrence was 25% for each of the three standard tones and the deviant tone, resulting in an overall probability of 75% for the standard stimuli and 25% for the affective deviant. In both conditions each tone was presented 340 times, resulting in a total of 1360 tones per condition. A randomization algorithm guaranteed that identical tones were never presented back-to-back. Both conditions were divided into two blocks of 680 tones. The order of blocks was ABAB or BABA. All four blocks were presented in one session with one pause between blocks 2 and 3. The total duration of the experiment was about 90 min.
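The randomization algorithm itself is not spelled out in the text; the following sketch generates a sequence with the stated properties (four equiprobable tones, 340 presentations each, no identical tones back-to-back) via a simple shuffle-and-repair strategy, which is an illustrative choice rather than the authors' procedure:

    # Generate one condition's tone sequence: three standards plus one
    # affective deviant, 340 presentations each (1360 tones), with no
    # identical tones back-to-back. Tone labels follow Table 4.
    import random

    def oddball_sequence(tones=("sad01", "sad02", "sad03", "hap02"),
                         n_each=340):
        seq = list(tones) * n_each
        random.shuffle(seq)
        i = 1
        while i < len(seq):
            if seq[i] == seq[i - 1]:
                # Swap in a later tone that repairs position i without
                # creating a new repetition at its own position.
                for j in range(i + 1, len(seq)):
                    fits_here = seq[j] != seq[i - 1]
                    fits_there = (seq[i] != seq[j - 1] and
                                  (j + 1 == len(seq) or seq[i] != seq[j + 1]))
                    if fits_here and fits_there:
                        seq[i], seq[j] = seq[j], seq[i]
                        break
                else:
                    random.shuffle(seq)   # rare dead end: start over
                    i = 0
            i += 1
        return seq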
Tones were presented via insert earphones with Earlink ear-tips (Aearo Comp.). The stimulus onset asynchrony between two tones was 2000 ms. The mean sound pressure level of the presentation of all tones was 70 dB. To realize a non-attentive listening paradigm, participants were instructed to pay attention to cartoons (Tom and Jerry, The Classical Collection 1) presented silently on a computer screen in front of them. To control how well participants had attended the film, a difficult post-test was performed after the experiment requiring participants to recognize selected scenes. On average, 85% of the scenes were classified correctly, indicating that the participants had indeed attended the film.

ERP recording
The electroencephalogram (EEG) was recorded from 32 tin electrodes mounted in an elastic cap according to the 10–20 system. Electrode impedance was kept below 5 kΩ. The EEG was amplified (bandpass 0.1–40 Hz) and digitized continuously at 250 Hz. Electrodes were referenced on-line to the left mastoid. Subsequently, off-line re-referencing to an electrode placed on the nose-tip was performed. Electrodes placed at the outer canthus of each eye were used to monitor horizontal eye movements. Vertical eye movements and blinks were monitored by electrodes above and below the right eye.

Averages were obtained for 1024 ms epochs including a 100 ms pre-stimulus baseline period. Trials contaminated by eye movements, amplifier blocking, or other artifacts within the critical time window were rejected prior to averaging. For this, different artifact rejection thresholds were defined for the eye and EEG channels.

Separate averages were calculated for each tone in both conditions. ERPs were quantified by mean amplitude measures using the mean voltage of the 100 ms period preceding the onset of the stimulus as a reference. Time windows and electrode sites are specified at the appropriate places of the results section. Effects were tested for significance in separate ANOVAs, with stimulus type (standard or deviant) and electrode site as factors. The Huynh-Feldt epsilon correction (Huynh and Feldt, 1980) was used to correct for violations of the sphericity assumption. Reported are the original degrees of freedom and the corrected p-values. The significance level was set to p < 0.05.
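A numerical sketch of this quantification pipeline (epoching, baseline correction, mean-amplitude measurement), assuming the continuous EEG is available as a channels × samples array at 250 Hz together with the stimulus onset samples:

    # Epoch, baseline-correct, and average the EEG, then measure mean
    # amplitudes in an analysis window. Array layout and window
    # arithmetic are assumptions matching the parameters given above.
    import numpy as np

    FS = 250                                     # sampling rate (Hz)
    PRE = int(0.100 * FS)                        # 100 ms baseline
    POST = int(0.924 * FS)                       # 924 ms -> 1024 ms epochs

    def average_erp(eeg, onsets):
        epochs = np.stack([eeg[:, s - PRE:s + POST] for s in onsets])
        baseline = epochs[:, :, :PRE].mean(axis=2, keepdims=True)
        return (epochs - baseline).mean(axis=0)  # channels x time

    def mean_amplitude(erp, t0_ms, t1_ms):
        a = PRE + int(t0_ms * FS / 1000)
        b = PRE + int(t1_ms * FS / 1000)
        return erp[:, a:b].mean(axis=1)          # one value per channel

    # e.g., the late window tested below:
    # late = mean_amplitude(erp_deviant - erp_standard, 380, 600)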
RESULTS
The grand average waveforms to the standard and deviant tones (Figure 1) are characterized by an N1-P2 complex as typically found in auditory stimulation (Näätänen et al., 1988), followed by a long-duration negative component with a frontal maximum and a peak around 400–500 ms. The current design allows two different ways to assess emotional deviants. Firstly, deviants and standards collected in the same experimental blocks can be compared (i.e., happy standard vs. sad deviant or sad standard vs. happy deviant). These stimulus classes are emotionally as well as physically different. Secondly, the ERP to the deviant can be compared with the same tone when it was presented as standard in the other condition, such that the compared stimuli are physically identical but differ in their functional significance as standard and deviant (i.e., sad standard vs. sad deviant and happy standard vs. happy deviant, see Table 5). Time windows for the statistical analysis were set as follows: 100–200 ms (N1), 200–300 ms (P2), and 380–600 ms. Electrode sites included in the analysis were F3, F4, FC5, FC6, C3, C4, Fz, FCz, and Cz.

FIGURE 1 | Grand average ERPs for condition A (top) and B (bottom); the respective standard ERP (bold line) is depicted with the ERP to the emotionally deviating tone when it was presented as deviant (dotted line) or as standard in the other condition (dashed line). Highlighted time windows mark significant differences in both standard-deviant comparisons.

In condition A, emotional (happy) deviants elicited a more negative waveform in a late latency range (from 380 ms), regardless of the comparison (Figure 1, top; Table 5). Thus, the mismatch response cannot be explained by the fact that physically different tones elicited the different ERP waveforms. To illustrate the scalp distribution of this effect, the difference happy deviant minus sad standards was computed, and the mean amplitude of the difference waveform in the time window 500–600 ms was used to create spline-interpolated isovoltage maps. The topographical distribution was typical for an MMN response. In particular, we observed a polarity inversion at temporobasal (mastoid) electrode sites (Figure 2). In condition B (Figure 1, bottom; Table 5), sad deviants, too, elicited a more negative waveform than the happy standards, though in an earlier latency range (P2, 200–300 ms). However, no difference was found when the ERPs to the sad tone were compared across conditions, suggesting that this effect was triggered by the structural difference of happy and sad tones rather than their functional significance as standard and deviant.

To summarize the results: presenting a happy tone in a series of sad tones resulted in a late negativity that was larger in amplitude than the ERP to the same happy tone functioning as standard in the opposite condition. In contrast, no difference that could be related to its functional significance was found for the sad tone presented in a train of differing happy tones.

Table 5 | Comparison of standard vs. deviant stimuli.

Comparison         Standard          Deviant   100–200 ms   200–300 ms   380–600 ms
Condition A        Sad standards     HAP02     0.93         2.40         7.32*
Condition B        Happy standards   SAD01     0.06         10.94**      0.00
Across conditions  HAP02 as std.     HAP02     0.27         0.55         9.20**
Across conditions  SAD01 as std.     SAD01     3.04         0.00         0.01
Given are the F-values (df = 1,15). **p < 0.01; *p < 0.05.

DISCUSSION
The affective deviant in condition A evoked a clear mismatch reaction. Though the latency was rather long, its topographic distribution, including the typical inversion of polarity over temporal regions (see Figure 2) in our nose-tip-referenced data, suggests that it belongs to the MMN family. Indeed, it is a known fact that MMN latency increases with discrimination difficulty. In this regard, we would like to point to the predecessor study (Goydke et al., 2004), in which we obtained a rather long latency of the MMN response for emotional deviants, even though the latency was still shorter than in the present study. No doubt, discrimination was particularly difficult in the present experiment, because the difference in timbre was reduced to subtle changes in the expression of same-pitch and same-instrument tones.

FIGURE 2 | Spline-interpolated isovoltage maps depicting the mean amplitude of the "happy deviant minus sad standard" difference wave from condition A. A typical frontal maximum was observed. The polarity inversion at temporobasal electrodes suggests that this response belongs to the MMN family.
The mismatch reaction observed for condition A suggests that a happy tone was pre-attentively categorized as different from a group of different sad tones. An MMN reflects change detection in a previously established context (Näätänen, 1992). Thus, for it to occur, a context needs to be set up first. Consequently, the important question in the present experiment is not what is so particular about the happy tone. The question is what has led to grouping the standard (sad) tones into one mutual category, so that the single happy tone was perceived as standing out. For the happy tone to be categorized as deviant, it was required that the sad tones, though different in structure, were perceived as belonging to the same context, i.e., category. The question thus arises: what has led to the grouping of the sad tones? Three possibilities seem plausible:

• perceptual similarity
• emotional similarity or
• emotion-specific perceptual similarity

PERCEPTUAL SIMILARITY
From the result of the scaling experiment it can be derived that tones within the sad category were perceived as quite as different from each other on a perceptual basis (e.g., sad01 and sad03: Fechnerian distance = 1.29) as was the happy deviant from the sad standards (e.g., sad03 vs. happy deviant: Fechnerian distance = 1.44). Relative distances are visualized in Figure 3. The arrangement of tones in a three-dimensional space results from feeding Fechnerian distance values into an MDS procedure (Alscal in SPSS), which finds the optimal constellation of stimuli in an n-dimensional space based on dissimilarity data. Three dimensions were found to explain 99% of the variance. Note that the orientation of the dimensions is arbitrary. Though the positions of SAD01 and SAD02 are relatively close, both are rather distant from SAD03. Grouping thus cannot be explained by perceptual similarity alone.

FIGURE 3 | Arrangement of tones in a three-dimensional space based on the multidimensional scaling procedure. Note that the orientation of the dimensions is arbitrary.
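The authors used the Alscal procedure in SPSS; a rough modern stand-in, assuming the Fechnerian distances of Table 3 as input, is the MDS implementation in scikit-learn (coordinates are arbitrary up to rotation and reflection, as the figure caption notes):

    # Embed the tones in three dimensions from a precomputed matrix of
    # Fechnerian distances (e.g., Table 3). sklearn's MDS is a stand-in
    # for the Alscal procedure used by the authors, not the same code.
    import numpy as np
    from sklearn.manifold import MDS

    def embed_3d(distances):
        mds = MDS(n_components=3, dissimilarity="precomputed",
                  random_state=0)
        return mds.fit_transform(np.asarray(distances))   # 10 x 3 coords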
EMOTIONAL SIMILARITY
Affect ratings (1.90, 1.95, and 2.20) indicate that the tones were perceived as equally sad in expression. There is thus some support for the hypothesis that the tones were grouped together based on their emotional category. However, if it was the emotional expression that led to the automatic categorization, why did it not work in condition B? No index was found for a mismatch reaction in response to a sad tone randomly interspersed in a train of different happy tones. Arguing along the same line as before, this (non-)finding implies that either no mutual standard memory trace was built for the happy tones or that this memory trace was considerably weaker for these tones. Since the affect ratings of the happy tones were just as homogeneous (3.35, 3.45, and 3.60) as those of the sad tones, the question arises whether the affect ratings gave a good enough representation of the emotion as it was decoded by the listeners. Against the background that decoding accuracy of acoustical emotion expressions has repeatedly been reported to be better for sadness than for happiness (Johnstone and Scherer, 2000; Elfenbein and Ambady, 2002; Juslin and Laukka, 2003), it might be necessary to take a second look at the stimulus material. Banse and Scherer (1996) found that if participants had the option to choose among many different emotional labels to rate an example of vocal expression, happiness was often confused with other emotions. In the present experiment, participants had given their rating on bipolar dimensions ranging from happy to sad. It cannot be ruled out that the response format biased the outcome. It is, for example, possible that in some cases participants chose to rate a tone as happy because it was found to be definitely not sad, even if it was not perceived as being really happy either. In an attempt to examine the perceived similarity of the tones with respect to the expressed emotion without pre-selected response categories, a similarity rating on emotional expression was performed post-hoc. For that purpose, the same students who had participated in the first scaling experiment were asked to perform another same-different judgment on the same stimulus material, though this time with regard to the emotion expressed in the tone. The results are depicted in Table 6 and show that the sad tones (t.01, t.02, t.05) were perceived as considerably more similar to each other with respect to the emotion expressed than the happy tones (t.07, t.08, t.09). In fact, sad tones were judged half as dissimilar from each other as the happy tones (0.503 vs. 1.02). Figure 4 shows the relation of same and different responses given for happy and sad tone pairs, respectively. Sad tones were considerably more often considered to belong to the same emotional category than happy tones (80% vs. 57% "same" responses). It can be assumed that in the MMN experiment, too, the sad tones (in condition A) were perceived as belonging to one emotional category while the happy tones (in condition B) were not. The difficulty of attributing the happy tones to the same "standard" category can serve as an explanation why the sad tone did not evoke an MMN: it was not registered as deviant against a happy context, because no such context existed. Nevertheless, the hypothesis that the MMN reflects deviance detection based on emotional categorization can at least be maintained for condition A.

Table 6 | Fechnerian distances as calculated from same-different judgments of emotional expression for the 10 tones.

       tone01  tone02  tone03  tone04  tone05  tone06  tone07  tone08  tone09  tone10
t.01   0.000   0.012   1.763   1.003   0.491   0.943   1.103   1.003   1.072   0.983
t.02   0.012   0.000   1.751   0.991   0.503   0.931   1.091   0.991   1.072   0.971
t.03   1.763   1.751   0.000   1.390   1.700   1.040   0.880   0.990   1.420   1.560
t.04   1.003   0.991   1.390   0.000   0.820   0.580   0.630   0.620   0.600   0.750
t.05   0.491   0.503   1.700   0.820   0.000   1.020   1.170   1.080   0.730   0.650
t.06   0.943   0.931   1.040   0.580   1.020   0.000   0.160   0.060   0.860   0.850
t.07   1.103   1.091   0.880   0.630   1.170   0.160   0.000   0.110   1.020   1.010
t.08   1.003   0.991   0.990   0.620   1.080   0.060   0.110   0.000   0.920   0.910
t.09   1.072   1.072   1.420   0.600   0.730   0.860   1.020   0.920   0.000   0.150
t.10   0.983   0.971   1.560   0.750   0.650   0.850   1.010   0.910   0.150   0.000
Given are perceived distances of row tones and column tones with respect to their emotional expression; sad tones were t.01, t.02, and t.05; happy tones were t.07, t.08, and t.09.

FIGURE 4 | Same and different responses for tone pairs in the categories sad (left) and happy (right), respectively.

EMOTION-SPECIFIC PERCEPTUAL SIMILARITY
It was presupposed that emotion recognition in acoustical stimuli is based on certain acoustical cues coding the emotion intended to be expressed by the sender. To test whether the sad tones in the present experiment were similar with regard to prototypical cues for sadness, an acoustical analysis was performed on the stimulus set. Tones were analyzed on the parameters found to be relevant in the expression of emotion in single tones (Juslin, 2001). Using PRAAT (Boersma, 2001) and dBSonic, tones were assessed for the following features: high-frequency energy, attack, mean pitch, pitch contour, vibrato amplitude, vibrato rate, and sound level. For each feature, the range of values was divided into three categories (low, medium, high) and each tone was classified accordingly (Table 7). The acoustical analysis revealed that some though not all parameters were manipulated the way it would have been expected based on previous findings.
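The original analysis was run in PRAAT and dBSonic; as an illustration of how a few of these cues can be extracted programmatically, here is a sketch using the parselmouth Python interface to Praat. The 1500 Hz band edge and the linear-fit contour measure are our assumptions, not parameters reported in the paper.

    # Extract a few of the cues listed above from a recorded tone.
    import numpy as np
    import parselmouth

    def sadness_cues(wav_path, split_hz=1500.0):
        snd = parselmouth.Sound(wav_path)
        nyquist = snd.sampling_frequency / 2

        # High- vs. low-frequency energy (crude timbre measure).
        spectrum = snd.to_spectrum()
        hf = spectrum.get_band_energy(split_hz, nyquist)
        lf = spectrum.get_band_energy(0.0, split_hz)

        # Mean pitch and pitch contour (slope of a linear fit to F0).
        f0 = snd.to_pitch().selected_array["frequency"]
        f0 = f0[f0 > 0]                       # drop unvoiced frames
        contour_slope = np.polyfit(np.arange(f0.size), f0, 1)[0]

        # Overall sound level.
        level_db = snd.to_intensity().values.mean()

        return hf / lf, f0.mean(), contour_slope, level_db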
Table 7 | Results of the acoustical analysis of the sad tones.

                                SAD01    SAD02    SAD03
Timbre (high frequency energy)  Low      Low      Low
Attack                          Medium   Medium   Medium
Mean pitch                      Low      Medium   Medium
Pitch contour                   Normal   Down     Down
Vibrato amplitude               Medium   Medium   Low
Vibrato rate                    Slow     Medium   Slow
Sound level                     Low      Medium   Medium
Tested were parameters expected to be relevant cues to express emotion in single tones. Categorization as low, medium, and high was based on comparison with the "happy" tones.

However, Table 7 indicates that the cues were not used homogeneously. For example, mean pitch level was not a reliable cue. Moreover, vibrato was manipulated in individual ways by the musicians. Timbre, however, was well in line with expectations. All sad tones were characterized by little energy in the high-frequency spectrum. In contrast, more energy in high frequencies was found in the spectrum of the deviant happy tone. Based on the findings by Tervaniemi et al. (1994), it appears that a difference in spectral structure alone can trigger the MMN. That would mean that the sad tones were grouped together as standards based on their mutual feature of attenuated higher partials. It has to be noted, though, that the high-frequency energy parameter is a very coarse means of describing timbre. Especially in natural tones [compared to synthesized tones as used by Tervaniemi et al. (1994)], the spectrum comprises a large number of frequencies with different relative intensities. As a consequence, the tones still have very individual spectra (and consequently sounds), even if they all display a relatively low high-frequency energy level. This fact is also reflected in the low perceptual similarity ratings. Moreover, if the spectral structure really was the major grouping principle, it should also have applied to the happy tones in condition B. Here, all happy tones were characterized by a high amount of energy in high frequencies, while the sad deviant was not. Nevertheless, no MMN was triggered. To conclude, though the possibility cannot be completely ruled out, it is not very likely that the grouping of the sad tones was based solely on similarities of timbre structure. Instead, the heterogeneity of parameters in Table 7 provides support for Juslin's idea of redundant code usage in emotion communication (Juslin, 1997b, 2001). Obviously, expressive cues were combined differently in different sad tones. Thus, though the sad tones did not display homogeneous patterns of emotion-specific cues, each tone was characterized by at least two prototypical cues for sadness expression. Based on the model assumption of redundant code usage, it seems likely that the tones were grouped together because they were identified as belonging to one emotional category based on emotion-specific cues.
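As a worked example of the redundant-code argument, one can tally for each sad tone how many of its Table 7 values match a prototypical "sadness" setting. Which settings count as prototypical is our reading of the cue list summarized above, not a classification given in the paper.

    # Tally prototypical sadness cues per tone from Table 7.
    table7 = {
        "SAD01": {"timbre": "low", "pitch": "low",    "contour": "normal",
                  "vib_amp": "medium", "vib_rate": "slow",   "level": "low"},
        "SAD02": {"timbre": "low", "pitch": "medium", "contour": "down",
                  "vib_amp": "medium", "vib_rate": "medium", "level": "medium"},
        "SAD03": {"timbre": "low", "pitch": "medium", "contour": "down",
                  "vib_amp": "low",    "vib_rate": "slow",   "level": "medium"},
    }
    # Assumed prototypical settings for sadness (see text).
    prototypical = {"timbre": "low", "pitch": "low", "contour": "down",
                    "vib_amp": "low", "vib_rate": "slow", "level": "low"}

    for tone, cues in table7.items():
        hits = [k for k, v in prototypical.items() if cues[k] == v]
        print(tone, len(hits), hits)   # every tone matches at least two cues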
What implication does this consideration have for the question of grouping principles in the MMN experiment? From what is known about the principles of the MMN, the results imply that the representation of the standard in memory included invariances across several different physical features. The invariances, however, needed to be in line with a certain template of how sadness is acoustically encoded. Several researchers have suggested the existence of such hard-wired templates for the rapid processing of emotional signals (Lazarus, 1991; LeDoux, 1991; Ekman, 1999; Scherer, 2001). It is assumed that, to allow for quick adaptational behavior, stimulus evaluation happens fast and automatically. Incoming stimuli are expected to run through a matching process in which comparison with a number of schemes or templates takes place. Templates can be innate and/or formed by social learning (Ekman, 1999). The present study, while blind with respect to the origin of the template, provides some information as to how such a matching process might be performed on a pre-attentive level. Given the long latency of the MMN in the present experiment, it can be assumed that basic sensory processing has already taken place before the mismatch reaction occurs. Therefore, the MMN in the current experiment appears to reflect the mismatch between the pattern of acoustic cues identified as emotionally significant and the template for sad stimuli activated by the preceding standard tones. Our data are thus in line with considerations that the MMN does not only occur in response to basic acoustical feature processing. Several authors have suggested that the MMN can also reflect "holistic" (Gomes et al., 1997; Sussman et al., 1998) or "gestalt-like" (Lattner et al., 2005) perception. They assume that the representation of the "standard" in the auditory memory system is not merely built up based on the just-presented standard stimuli, but that it can be influenced by prototypical representations stored in other areas of the brain (Phillips et al., 2000). Evidence from a speech-specific phoneme processing task suggested that the MMN response does not only rely on matching processes in the transient memory store but that long-term representations for prototypical stimuli are accessed already at a pre-attentive level. For phonemes, Näätänen and Winkler (1999) assumed the existence of long-term memory traces serving as recognition patterns or templates in speech perception. They further posited that these can be activated by sounds "nearly matching with the phoneme-specific invariant codes" (p. 14). In another contribution, Näätänen et al. (2005) point out that the "mechanisms of generation of these more cognitive kinds of MMNs of course involve other, obviously higher-order, neural populations than those activated by a mere frequency change" (p. 27).

In the model of Schirmer and Kotz (2006), emotional-prosodic processing is conceptualized as a hierarchical process. Stage 1 comprises initial sensory processing of the auditory information before emotionally significant cues are integrated (stage 2) and cognitive evaluation processes (stage 3) take place. The MMN in response to emotional auditory stimuli might reflect the stage of integrating emotionally significant cues (Schirmer et al., 2005). The present data are compatible with the model, albeit in the area of nonverbal auditory emotion processing.
underlying emotion recognition in the auditory domain. It What implication does this consideration have for the ques- has to be pointed out though that the present results can tion of grouping principles in the MMN-experiment? From what only give a first glimpse on the mechanisms underlying pro- is known about the principles of the MMN, the results imply cessing of emotionally expressive tones. More studies with a that the representation of the standard in memory included larger set of tones characterized by different cues are needed invariances across several different physical features. The invari- to systematically examine the nature of the stimulus evaluation ances, however, needed to be in line with a certain template on process. www.frontiersin.org September 2013 | Volume 4 | Article 656 | 39