Methodology of Educational Measurement and Assessment

Randy E. Bennett • Matthias von Davier
Editors

Advancing Human Assessment
The Methodological, Psychological and Policy Contributions of ETS

Series editors
Bernard Veldkamp, Research Center for Examinations and Certification (RCEC), University of Twente, Enschede, The Netherlands
Matthias von Davier, National Board of Medical Examiners (NBME), Philadelphia, USA1

1 This work was conducted while M. von Davier was employed with Educational Testing Service.

This book series collates key contributions to a fast-developing field of education research. It is an international forum for theoretical and empirical studies exploring new and existing methods of collecting, analyzing, and reporting data from educational measurements and assessments. Covering a high-profile topic from multiple viewpoints, it aims to foster a broader understanding of fresh developments as innovative software tools and new concepts such as competency models and skills diagnosis continue to gain traction in educational institutions around the world. Methodology of Educational Measurement and Assessment offers readers reliable critical evaluations, reviews, and comparisons of existing methodologies alongside authoritative analysis and commentary on new and emerging approaches. It will showcase empirical research on applications, examine issues such as reliability, validity, and comparability, and help keep readers up to speed on developments in statistical modeling approaches. The fully peer-reviewed publications in the series cover measurement and assessment at all levels of education and feature work by academics and education professionals from around the world. Providing an authoritative central clearinghouse for research in a core sector in education, the series forms a major contribution to the international literature.
More information about this series at http://www.springer.com/series/13206

Randy E. Bennett • Matthias von Davier
Editors

Advancing Human Assessment
The Methodological, Psychological and Policy Contributions of ETS

ISSN 2367-170X     ISSN 2367-1718 (electronic)
Methodology of Educational Measurement and Assessment
ISBN 978-3-319-58687-8     ISBN 978-3-319-58689-2 (eBook)
DOI 10.1007/978-3-319-58689-2

Library of Congress Control Number: 2017949698

© Educational Testing Service 2017. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

This work is subject to copyright. All commercial rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc.
in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Editors
Randy E. Bennett, Educational Testing Service (ETS), Princeton, NJ, USA
Matthias von Davier, National Board of Medical Examiners (NBME), Philadelphia, PA, USA

CBAL, E-RATER, ETS, GRADUATE RECORD EXAMINATIONS, GRE, PRAXIS, PRAXIS I, PRAXIS II, PRAXIS III, PRAXIS SERIES, SPEECHRATER, SUCCESSNAVIGATOR, TOEFL, TOEFL IBT, TOEFL JUNIOR, TOEIC, TSE, TWE, and WORKFORCE are registered trademarks of Educational Testing Service (ETS). C-RATER, M-RATER, PDQ PROFILE, and TPO are trademarks of ETS. ADVANCED PLACEMENT PROGRAM, AP, CLEP, and SAT are registered trademarks of the College Board. PAA is a trademark of the College Board. PSAT/NMSQT is a registered trademark of the College Board and the National Merit Scholarship Corporation. All other trademarks are the property of their respective owners.

Foreword

Since its founding in 1947, Educational Testing Service (ETS), the world's largest private nonprofit testing organization, has conducted a significant and wide-ranging research program.
The purpose of Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS is to review and synthesize in a single volume the extensive advances made in the fields of educational and psychological measurement, scientific psychology, and education policy and evaluation by researchers in the organization.

The individual chapters provide comprehensive reviews of work ETS researchers conducted to improve the science and practice of human assessment. Topics range from test fairness and validity to psychometric methodologies, statistics, and program evaluation. There are also reviews of ETS research in education policy, including the national and international assessment programs that contribute to policy formation. Finally, there are extensive treatments of research in cognitive, developmental, and personality psychology.

Many of the developments presented in these chapters have become de facto standards in human assessment, for example, item response theory (IRT), linking and equating, differential item functioning (DIF), and confirmatory factor analysis, as well as the design of large-scale group-score assessments and the associated analysis methodologies used in the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), the Progress in International Reading Literacy Study (PIRLS), and the Trends in International Mathematics and Science Study (TIMSS).

The breadth and the depth of coverage the chapters provide are due to the fact that long-standing experts in the field, many of whom contributed to the developments described in the chapters, serve as lead chapter authors. These experts contribute insights that build upon decades of experience in research and in the use of best practices in educational measurement, evaluation, scientific psychology, and education policy. The volume's editors, Randy E.
Bennett and Matthias von Davier, are themselves distinguished ETS researchers.

Randy E. Bennett is the Norman O. Frederiksen chair in assessment innovation in the Research and Development Division at ETS. Since the 1980s, he has conducted research on integrating advances in the cognitive and learning sciences, measurement, and technology to create new forms of assessment intended to have positive impact on teaching and learning. For his work, he was given the ETS Senior Scientist Award in 1996, the ETS Career Achievement Award in 2005, and the Distinguished Alumni Award from Teachers College, Columbia University in 2016. He is the author of many publications including "Technology and Testing" (with Fritz Drasgow and Ric Luecht) in Educational Measurement (4th ed.). From 2007 to 2016, he led a long-term research and development activity at ETS called the Cognitively Based Assessment of, for, and as Learning (CBAL®) initiative, which created theory-based assessments designed to model good teaching and learning practice.

Matthias von Davier is a distinguished research scientist at the National Board of Medical Examiners in Philadelphia, PA. Until January 2017, he was a senior research director at ETS. He managed a group of researchers concerned with methodological questions arising in large-scale international comparative studies in education. He joined ETS in 2000 and received the ETS Research Scientist Award in 2006. He has served as the editor in chief of the British Journal of Mathematical and Statistical Psychology since 2013 and is one of the founding editors of the SpringerOpen journal Large-Scale Assessments in Education, which is sponsored by the International Association for the Evaluation of Educational Achievement (IEA) and ETS through the IEA-ETS Research Institute (IERI).
His work at ETS involved the development of psychometric methodologies used in analyzing cognitive skills data and background data from large-scale educational surveys, such as the Organisation for Economic Co-operation and Development's PIAAC and PISA, as well as IEA's TIMSS and PIRLS. His work at ETS also included the development of extensions and estimation methods for multidimensional models for item response data and the improvement of models and estimation methods for the analysis of data from large-scale educational survey assessments.

ETS is proud of the contributions its staff members have made to improving the science and practice of human assessment, and we are pleased to make syntheses of this work available in this volume.

Ida Lawrence
Research and Development Division
Educational Testing Service, Princeton, NJ, USA

Preface

An edited volume on the history of the scientific contributions to educational research, psychology, and psychometrics made by the staff members of a nonprofit organization, Educational Testing Service (ETS), in Princeton, NJ, raises two questions: Why this self-inspection, and who might benefit from it? The answer can be found in current developments in these fields. Many of the advances that have occurred in psychometrics can be traced to the almost 70 years of work that transpired at ETS, and the same is true for select areas in statistics, the analysis of education policy, and psychology.

When looking at other publications, be they conference proceedings, textbooks, or comprehensive collections like the Handbook of Item Response Theory (van der Linden 2016), the Handbook of Test Development (Lane et al. 2015), or Educational Measurement (4th ed.; Brennan 2006), one finds that many of the chapters were contributed by current and former ETS staff members, interns, or visiting scholars.

We believe that this volume can do more than summarize past achievements or collect and systematize contributions.
A volume that compiles the scientific and policy work done at ETS in the years since 1947 also shows the importance, and the long-term effects, of a unique organizational form—the nonprofit measurement organization—and how that form can contribute substantially to advancing scientific knowledge, the way students and adults are assessed, and how learning outcomes and education policy more generally are evaluated.

Given the volume's purpose, we expect that it will be most attractive to those concerned with teaching, advancing, and practicing the diverse fields covered herein. It thus should make for an invaluable reference for those interested in the genesis of important lines of study that began or were significantly advanced at ETS.

Contributors to this volume are current and former ETS researchers who were asked to review and synthesize work around the many important themes that the organization explored over its history, including how these contributions have affected ETS and the field beyond. Each author brings his or her own perspective and writing style to the challenge of weaving together what in many cases constitutes a prodigious body of work. Some of the resulting accounts are thematically organized, while other authors chose more chronologically oriented approaches. In some chapters, individuals loom large simply because they had such significant impact on the topic at hand, while in other accounts, there were so many contributors that substantive themes offered a more sensible organizational structure. As a result, the chapters follow no single template, but each tells its own story in its own way.

The book begins with the reprint of a 2005 report by Randy E. Bennett that reviews the history of ETS, the special role that scientific research has played, and what that history might mean for the future.
With that opening chapter as context, Part I centers on ETS's contributions to the analytic tools employed in educational measurement. Chapters 2 and 3 by Tim Moses cover the basic statistical tools used by psychometricians and analysts to assess the quality of test items and test scores. These chapters focus on how ETS researchers invented new tools (e.g., confirmatory factor analysis), refined other tools (e.g., reliability indices), and developed versions of basic statistical quantities to make them more useful for assessing the quality of test and item scores for low- and high-stakes tests (e.g., differential item functioning procedures).

Chapter 4 by Neil J. Dorans and Gautam Puhan summarizes the vast literature developed by ETS researchers on one of the most fundamental procedures used to assure comparability of test scores across multiple test forms. This chapter on score linking is written with an eye toward the main purpose of this procedure, namely, to ensure fairness.

James E. Carlson and Matthias von Davier describe in Chap. 5 another staple of educational and psychological measurement that had significant roots at ETS. Their chapter on item response theory (IRT) traverses decades of work, focusing on the many developments and extensions of the theory, rather than summarizing the even more numerous applications of it.

In Chap. 6, Henry Braun describes important contributions to research on statistics at ETS. To name a few, ETS research contributed significantly to the methodology and discourse with regard to missing data imputation procedures, statistical prediction, and various aspects of causal inference.

The closing chapter in this part, Chap. 7 by Neil J. Dorans, covers additional topics relevant to ensuring fair assessments. The issues dealt with here go beyond those addressed in Chap. 4 and include procedures to assess differential item functioning (DIF); the chapter gives an overview of the different approaches used at ETS.
The chapters in Part II center on ETS's contributions to education policy and program evaluation. Two chapters cover contributions that built upon the developments described in Chaps. 2, 3, 4, 5, 6, and 7. Chapter 8 by Albert E. Beaton and John L. Barone describes work on large-scale, group-score assessments of school-based student populations. The chapter deals mainly with the National Assessment of Educational Progress (NAEP), describing the methodological approach and related developments. The following chapter by Irwin Kirsch, Mary Louise Lennon, Kentaro Yamamoto, and Matthias von Davier focuses on methods and procedures developed over 30 years at ETS to ensure relevance, comparability, and interpretability in large-scale, international assessments of adult literacy. It describes how methods developed for NAEP were extended to focus on new target populations and domains, as well as to link assessments over time, across countries and language versions, and between modes of delivery.

Chapter 10 discusses longitudinal studies and related methodological issues. This contribution by Donald A. Rock reviews the series of important longitudinal investigations undertaken by ETS, showing the need to carefully consider assumptions made in such studies and in the vertical linking of tests associated with them. The approaches taken offer solutions to methodological challenges associated with the measurement of growth and the interpretation of vertical scales.

Besides the book's opening chapter, the only other contribution not written for the current volume is Chap. 11, by Samuel Ball. The chapter was originally published by ETS in 1979. It describes the pinnacle of work at ETS on large program evaluation studies, including the classic investigation of the effects of Sesame Street. We include it because Ball directed much of that work and his account offers a unique, firsthand perspective.

Part II concludes with Chap. 12, a review by Richard J.
Coley, Margaret E. Goertz, and Gita Z. Wilder of the extensive set of projects, primary and secondary analyses, and syntheses produced on education policy. Those endeavors have ranged widely, from school finance analyses to help build more equitable funding approaches to uncovering the complex of factors that contribute to achievement gaps.

Part III concerns ETS's contributions to research in psychology and consists of three chapters. ETS's work in personality, social, cognitive, and developmental psychology was extensive. Much of it, particularly in the early years, centered on theory development, as well as on the invention and improvement of assessment methodology. The first two chapters, Chap. 13 by Lawrence J. Stricker and Chap. 14 by Nathan Kogan, respectively, cover the very broad range of investigations conducted in cognitive, personality, and social psychology. Chap. 15 by Nathan Kogan, Lawrence J. Stricker, Michael Lewis, and Jeanne Brooks-Gunn documents the large and extremely productive research program around the social, cognitive, and psychological development of infants and children.

The final part concerns contributions to validity. Chapter 16 by Michael Kane and Brent Bridgeman reviews ETS staff members' seminal work on validity theory and practice, most notably that of Samuel Messick. Donald E. Powers' Chap. 17 delves into the historically contentious area of special test preparation, an activity that can threaten or enhance the validity of test scores. The part closes with Isaac I. Bejar's Chap. 18, a wide-ranging examination of constructed-response formats, with special attention to their validity implications. We end the book with Chap. 19, a synthesis of the material covered.

This book would not have been possible without the vision of Henry Braun, who suggested the need for it; Ida Lawrence, who supported it; Lawrence J.
Stricker, whose contributions of time, effort, advice, and thought were invaluable; and the authors, who gave of their time to document the accomplishments of their colleagues, past and present. Also invaluable was the help of Kim Fryer, whose editorial and managerial skills saw the project to a successful completion.

We hope that this collection will provide a review worthy of the developments that took place over the past seven decades at ETS.

Princeton, NJ, USA
September 2017

Randy E. Bennett
Matthias von Davier

References

Brennan, R. L. (Ed.). (2006). Educational measurement (4th ed.). Westport: Praeger.
Lane, S., Raymond, M. R., & Haladyna, T. M. (Eds.). (2015). Handbook of test development (2nd ed.). New York: Routledge.
van der Linden, W. J. (Ed.). (2016). Handbook of item response theory: Models, statistical tools, and applications (Vols. 1–3). Boca Raton: Chapman & Hall/CRC.

Contents

1 What Does It Mean to Be a Nonprofit Educational Measurement Organization in the Twenty-First Century? .... 1
   Randy E. Bennett

Part I ETS Contributions to Developing Analytic Tools for Educational Measurement

2 A Review of Developments and Applications in Item Analysis .... 19
   Tim Moses

3 Psychometric Contributions: Focus on Test Scores .... 47
   Tim Moses

4 Contributions to Score Linking Theory and Practice .... 79
   Neil J. Dorans and Gautam Puhan

5 Item Response Theory .... 133
   James E. Carlson and Matthias von Davier

6 Research on Statistics .... 179
   Henry Braun

7 Contributions to the Quantitative Assessment of Item, Test, and Score Fairness .... 201
   Neil J.
   Dorans

Part II ETS Contributions to Education Policy and Evaluation

8 Large-Scale Group-Score Assessment .... 233
   Albert E. Beaton and John L. Barone

9 Large-Scale Assessments of Adult Literacy .... 285
   Irwin Kirsch, Mary Louise Lennon, Kentaro Yamamoto, and Matthias von Davier

10 Modeling Change in Large-Scale Longitudinal Studies of Educational Growth: Four Decades of Contributions to the Assessment of Educational Growth .... 311
   Donald A. Rock

11 Evaluating Educational Programs .... 341
   Samuel Ball

12 Contributions to Education Policy Research .... 363
   Richard J. Coley, Margaret E. Goertz, and Gita Z. Wilder

Part III ETS Contributions to Research in Scientific Psychology

13 Research on Cognitive, Personality, and Social Psychology: I .... 391
   Lawrence J. Stricker

14 Research on Cognitive, Personality, and Social Psychology: II .... 413
   Nathan Kogan

15 Research on Developmental Psychology .... 453
   Nathan Kogan, Lawrence J. Stricker, Michael Lewis, and Jeanne Brooks-Gunn

Part IV ETS Contributions to Validity

16 Research on Validity Theory and Practice at ETS .... 489
   Michael Kane and Brent Bridgeman

17 Understanding the Impact of Special Preparation for Admissions Tests .... 553
   Donald E. Powers

18 A Historical Survey of Research Regarding Constructed-Response Formats .... 565
   Isaac I. Bejar

19 Advancing Human Assessment: A Synthesis Over Seven Decades .... 635
   Randy E.
   Bennett and Matthias von Davier

Author Index .... 689
Subject Index .... 703

About the Editors

Randy E. Bennett is the Norman O. Frederiksen chair in assessment innovation in the Research and Development Division at Educational Testing Service in Princeton, NJ. Bennett's work has focused on integrating advances in cognitive science, technology, and educational measurement to create approaches to assessment that have positive impact on teaching and learning. From 1999 through 2005, he directed the NAEP Technology-Based Assessment Project, which included the first administration of computer-based performance assessments with nationally representative samples of school students and the first use of "clickstream," or logfile, data in such samples to measure the processes used in problem-solving. From 2007 to 2016, he directed an integrated research initiative titled Cognitively Based Assessment of, for, and as Learning (CBAL®), which focused on creating theory-based summative and formative assessment intended to model good teaching and learning practice. Randy Bennett is the president of the International Association for Educational Assessment (IAEA) (2016–), an organization primarily constituted of governmental and nongovernmental nonprofit measurement organizations throughout the world, and president of the National Council on Measurement in Education (NCME) (2017–2018), whose members are individuals employed primarily in universities, testing organizations, state education departments, and school districts. He is a fellow of the American Educational Research Association.

Matthias von Davier is a distinguished research scientist at the National Board of Medical Examiners (NBME) in Philadelphia, PA.
Until 2016, he was a senior research director in the Research and Development Division at Educational Testing Service (ETS) and codirector of the Center for Global Assessment at ETS, leading psychometric research and operations of the center. He earned his Ph.D. at the University of Kiel, Germany, in 1996, specializing in psychometrics. In the Center for Advanced Assessment at NBME, he works on psychometric methodologies for analyzing data from technology-based high-stakes assessments. He is one of the editors of the Springer journal Large-Scale Assessments in Education, which is jointly published by the International Association for the Evaluation of Educational Achievement (IEA) and ETS. He is also editor in chief of the British Journal of Mathematical and Statistical Psychology (BJMSP) and coeditor of the Springer book series Methodology of Educational Measurement and Assessment. Dr. von Davier received the 2006 ETS Research Scientist Award and the 2012 NCME Bradley Hanson Award for contributions to educational measurement. His areas of expertise include topics such as item response theory, latent class analysis, diagnostic classification models, and, more broadly, classification and mixture distribution models, computational statistics, person-fit statistics, item-fit statistics, model checking, hierarchical extension of models for categorical data analysis, and analytical methodologies used in large-scale educational surveys.

© Educational Testing Service 2017
R.E. Bennett, M. von Davier (eds.), Advancing Human Assessment, Methodology of Educational Measurement and Assessment, DOI 10.1007/978-3-319-58689-2_1

Chapter 1
What Does It Mean to Be a Nonprofit Educational Measurement Organization in the Twenty-First Century?

Randy E. Bennett

The philosopher George Santayana (1905) said, "Those who cannot remember the past are condemned to repeat it" (p. 284).
This quote is often called "Santayana's Warning" because it is taken to mean that an understanding of history helps avoid having to relive previous mistakes. But the quote can also be read to suggest that, in order to make reasoned decisions about the future, we need to be always cognizant of where we have come from. This claim is especially true for a nonprofit organization because its continued existence is usually rooted in its founding purposes.

This chapter uses Educational Testing Service (ETS), the largest of the nonprofit educational measurement organizations, to illustrate that claim. The chapter is divided into four sections. First, the tax code governing the establishment and operation of educational nonprofits is reviewed. Second, the history around the founding of ETS is described. Third, the implications of ETS's past for its future are discussed. Finally, the main points of the paper are summarized.

This chapter was originally published in 2005 by Educational Testing Service.

R.E. Bennett (*)
Educational Testing Service, Princeton, NJ, USA
e-mail: rbennett@ets.org

1.1 What Is an Educational Nonprofit?

The term nonprofit refers to how an organization is incorporated under state law. To be federally tax exempt, an educational nonprofit must become a 501(c)3 corporation.1,2 What is 501(c)3? It is a very important section in the Internal Revenue Code. The section is important because of what it does, and does not, allow educational nonprofits to do, as well as because of how the section came about.

Section 501(c)3 exempts certain types of organizations from federal income tax.3 To qualify, an organization must meet certain discrete "tests." The tests are the organizational test, operational test, inurement test, lobbying restriction, electioneering prohibition, public benefit test, and public policy test (Harris 2004). Each of these tests is briefly reviewed in turn.
Under the Internal Revenue Code, to be exempt, an organization must be set up exclusively for one or more of the following purposes: charitable, religious, educational, scientific, literary, testing for public safety, fostering amateur national or international sports competition, or the prevention of cruelty to children or animals (Internal Revenue Service [IRS] 2003b). An entity meets this organizational test if its articles of incorporation limit its function to one or more exempt purposes (e.g., educational) and do not expressly allow the organization to engage, other than insubstantially, in activities that are not consistent with those purposes.

ETS's exempt purpose is educational, and its organizing documents specify the activities it can pursue in keeping with that purpose. Paraphrasing the 2005 revision of the organization's charter and bylaws (ETS 2005), those activities are to:

• Conduct educational testing services,
• Counsel test users on measurement,
• Serve as a clearinghouse about research in testing,
• Determine the need for, encourage, and carry on research in major areas of assessment,
• Promote understanding of scientific educational measurement and the maintenance of the highest standards in testing,
• Provide teachers, parents, and students (including adults) with products and services to improve learning and decisions about opportunities,
• Enhance educational opportunities for minority and educationally disadvantaged students, and
• Engage in other advisory services and activities in testing and measurement from time to time.

1 For purposes of this chapter, nonprofit and 501(c)3 corporation are used to mean the same thing, even though they are legally different.
2 The terms nonprofit and not-for-profit are not legally distinct, at least not in the Internal Revenue Code.
3 The IRS has 27 types of organizations that are tax exempt under 501(c), only one of which covers those institutions exempt under 501(c)3 (Internal Revenue Service [IRS] 2003a).

To meet the second—or operational—test, the organization must be run exclusively for one or more of the exempt purposes designated in its articles. The test is met if the organization's stated purpose and activities conform. Although Section 501(c)3 indicates that the organization must be operated "exclusively" for exempt purposes, the term exclusively has been interpreted by the IRS to mean "primarily" or "substantially." Thus, Section 501(c)3 does allow exempt organizations to engage in activities unrelated to their exempt purposes (IRS 2000). But those activities must not become "substantial," and tax must be paid on this unrelated business income. Note that the operational test makes clear that engaging in unrelated activities to support the exempt purpose is, in itself, a nonexempt purpose if it is done any more than insubstantially (IRS n.d.-a).4 To prevent such unrelated activities from becoming so substantial that they threaten tax-exempt status—as well as to allow outside investment and limit liability—an exempt organization may create for-profit subsidiaries.5

The inurement test is often cited as the fundamental difference between for-profit and nonprofit corporations. This test says that no part of the organization's net earnings may benefit any private individual. For example, there may be no stockholders and no distribution of net earnings, as in a dividend.

The lobbying restriction and electioneering prohibition mean, respectively, that no significant part of an organization's activities may consist of "carrying on propaganda or otherwise attempting to influence legislation..." and that an exempt organization may not participate or intervene in any political campaign for or against any candidate for public office (IRS n.d.-b).
Unlike the lobbying restriction, the electioneering prohibition is absolute.

To meet the public benefit test, the organization must operate for the advantage of public, rather than private, interests.7 Private interests can be benefited, but only incidentally. Further, the principal beneficiaries of the organization's activities must be sufficiently numerous and well defined, so that the community is, in some way, served.

Finally, there is the public policy test, which essentially says that an otherwise qualifying organization's "...purpose must not be so at odds with the common community conscience as to undermine any public benefit that might otherwise be conferred" (Bob Jones University v. United States 1983). The quintessential example is Bob Jones University, which lost its tax-exempt status as a result of racially discriminatory practices that the IRS believed, and the Supreme Court affirmed, violated fundamental public policy.8

Organizations set up for educational purposes under 501(c)3 have several additional requirements (IRS 2003b). First, the "positions" they take must be educational.

4 There does not appear to be a statutory or regulatory definition of substantial. However, experts in nonprofit tax law often advise limiting gross unrelated business income to under 20% of gross revenue (e.g., FAQ's—501(c)(3) Status, n.d.).
5 The Chauncey Group International would be one example from ETS's history.
6 The dollar limits associated with the lobbying restriction are defined by a relatively complex formula. See Restrictions on Nonprofit Activities (n.d.). Also see IRS (n.d.-c).
7 This test differs from the inurement test in that the inurement test applies only to insiders—persons having a private interest in the organization's activities—whereas the "private interests" cited in the public benefit test apply more generally.

1 What Does It Mean to Be a Nonprofit Educational Measurement Organization...
According to the IRS, "Advocacy of a particular position ... may be educational if there is a sufficiently full and fair exposition of pertinent facts to permit an individual or the public to form an independent opinion or conclusion" (IRS 2003b, p. 25). Also, the method used by an organization to develop and present its views is a factor in determining if the organization is "educational." What constitutes an "educational" method? The IRS says that the method is not educational when:

1. The presentation of viewpoints unsupported by facts is a significant part of the organization's communications.
2. The facts that purport to support the viewpoints are distorted.
3. The organization's presentations express conclusions more on the basis of emotion than objective evaluation. (IRS 2003b, p. 25)

That, then, is what 501(c)3 is about. But why did Congress decide to grant tax exemptions to certain organizations in the first place, thereby forgoing huge amounts of future revenue?

The statutory roots of 501(c)3 are commonly traced to the Tariff Act of 1894, which imposed a corporate income tax and exempted entities organized and conducted solely for charitable, religious, or educational purposes from having to pay it (Scrivner 2001). The congressional intent behind the exemption was to give preferential treatment because such organizations provided a benefit to society. Congress reaffirmed this view in the Revenue Act of 1938 when it said that tax exemption was based on the theory that the loss of revenue is compensated by relieving the government of a function it would otherwise have to perform (presumably because the for-profit sector would not, or should not be allowed to, perform it) and because of the benefits to the general welfare that the function would serve (Bob Jones University v. United States 1983).
The Revenue Act of 1950 added unrelated business income tax rules, which were intended to eliminate unfair competition by taxing the unrelated activities of exempt organizations in the same way as competing for-profit corporations were taxed (Scrivner 2001). The Internal Revenue Code of 1954 was a restructuring to the current numbering, which resulted in the section known today as "Section 501(c)3." Finally, the 1959 Regulations for the 1950 Act and the 1954 Code defined charity to more closely approach the English common-law definition (Scrivner 2001). That is, not only the relief of poverty, but also the advancement of education, religion, and other purposes beneficial to the community.

8 The IRS revoked the tax-exempt status of Bob Jones University in 1975 even though the school had not violated any provision of 501(c)(3). The IRS revoked its status because the university had, on the basis of religious belief, at first refused admission to Black students, and then admitted only Black students married within their own race. It then admitted Black students generally but enforced strict rules, including expulsion, against interracial dating. The university sued when its exempt status was revoked. The Supreme Court upheld the IRS decision by an 8–1 vote.
9 Why shouldn't the for-profit sector supply some services? Because the need for profit may come into direct conflict with the intended public benefit behind the service. Some services require a disinterested party. See, for example, the inurement test, the lobbying restriction, and the electioneering prohibition, which are intended to distance the service provider from self-interest that could otherwise affect the provision of the service. Occupational and professional licensing and certification, which is often handled by private nonprofit associations, would be an example.
So, legally, many 501(c)3 organizations like ETS are, in fact, "public charities."10

To summarize, in the words of the majority opinion rendered by the U.S. Supreme Court in Bob Jones University v. United States (1983), "In enacting ... 501(c)3, Congress sought to provide tax benefits to charitable organizations to encourage the development of private institutions that serve a useful public purpose or supplement or take the place of public institutions of the same kind." Thus, Section 501(c)3 has its roots in the idea that the government might not be able to provide all the services the public needs, that the for-profit sector might not fill the gap, and that those organizations that do voluntarily address such social needs should be compensated through tax exemption.

How did ETS come to be a 501(c)3? The reasons for that lie fundamentally in how ETS came about. That story begins at the end of the nineteenth century, just prior to the establishment of the College Entrance Examination Board.

1.2 Where Did ETS Come From?

Prior to the founding of the College Entrance Examination Board (CEEB), admission to college and university in the United States was a disorganized, if not chaotic, process (Fuess 1950). The Ivy League institutions each administered their own tests, which varied widely in subjects assessed, quality, and administration date. Wilson Farrand, principal of Newark Academy, summarized the disarray in entrance requirements as follows (cited in Fuess 1950, p. 17):

Princeton requires Latin of candidates for one co