Excerpt from A Brief History of Student Learning Assessment: How We Got Where We Are and a Proposal for Where to Go Next

Richard J. Shavelson, with a foreword by Carol Geary Schneider and Lee S. Shulman


Foreword

Many of us in higher education today are thinking hard about assessment. But often our cogitations tend toward what can only be called wishful thinking.

A wish that has no doubt washed over all of us at one time or another is that if we simply ignore assessment, or hold it off long enough, the issues (like the misguided new administrator) will finally give up and go away. But in our wiser moments, we know that this is not an answer. Indeed, as is clear from Richard Shavelson's lively tour of the twists and turns of assessment over the past century, the names may change, and the technology has evolved, but assessment has stayed with us with great persistence. "Today's demand for a culture of evidence of student learning appears to be new," Shavelson tells us, but it turns out "to be very old," and there's no wishing it away. Moreover, we should not be eager to wish it away.

Nor is there a magic bullet. One of the most dangerous and persistent myths in American education is that the challenges of assessing student learning will be met if only the right instrument can be found--the test with psychometric properties so outstanding that we can base high-stakes decisions on the results of performance on that measure alone.

This wish is not only self-indulgent but self-defeating. Ironically, no test can possess such properties because, to achieve validity, test designers have to narrow the focus of any particular instrument to a sobering degree. Thus, the better the arguments we can make regarding the validity of any given measure--whether of knowledge, skills, or some other virtue--the less appropriate that measure is as the basis for consequential decisions about a student's overall learning gains, much less as the sole determinant of an institution's educational quality.

Thinking of assessment as primarily a technical challenge--though certainly it is that, as Shavelson's analysis also makes clear--is another form of wishful thinking. The far-reaching questions raised through assessment cannot be solved through technical ingenuity alone.

What's needed, of course, is educational thinking, and happily there has been a good deal of that in the past two decades of assessment activity. With the wave of state mandates for assessment in the mid- and late 1980s, and new accreditation requirements in the 1990s, campuses began to organize themselves to respond. Many did so grudgingly, and there were plenty of missteps, misunderstandings, and dead ends. But there were also wonderful examples of what can happen when educators take up the challenge to figure out and clearly articulate what they want their students to know and be able to do: the core task of assessment. Many of these efforts were funded by the U.S. Department of Education's Fund for the Improvement of Postsecondary Education, and the results, in turn, provided models and momentum for additional campuses that came together--hundreds of them--at the assessment forum led by the American Association for Higher Education until 2005, when that organization closed its doors.

Toward the end of his essay, Shavelson makes a crucial point that campuses committed to assessment know well: that assessment all by itself is an insufficient condition for powerful learning and improvement. Of course, more and better evidence of student learning is important, but knowing what to make of that evidence, and how to act on it, means getting down to core questions about the character of the educational experience and the goals of liberal learning. These are not questions that higher education can dare leave to the testing companies or to external agencies, no matter how well intentioned and enlightened.

This view of assessment has become central to the work of both the Carnegie Foundation for the Advancement of Teaching and the Association of American Colleges and Universities (AAC&U). Shavelson traces Carnegie's work over the first part of the twentieth century as a story about standards and standardization. But the needs are different today, and Carnegie's more recent work places much greater emphasis on the role of faculty in exploring what our students do--and don't--learn. The foundation's extensive work on the scholarship of teaching and learning, for instance, has helped fuel a movement in which "regular" faculty, across the full spectrum of disciplines and institutional types, are treating their classrooms and programs as laboratories for studying student learning in order to improve it. Seen through the lens of classroom inquiry, assessment is a feature of the pedagogical imperative in which faculty see themselves as responsible for the learning of their students and for deepening our collective sense of the conditions in which important forms of learning can occur.

Through its Liberal Education and America's Promise (LEAP) initiative, AAC&U is working with its member campuses to develop assessments that strengthen students' learning and assess their best work rather than just the attainment of a few narrowly defined foundational skills and/or basic knowledge. As AAC&U's board of directors put it in their official statement on assessment (2004, 3), colleges and universities should hold themselves "accountable for assessing [their] students' best work, not generic skills and not introductory levels of learning."

In its recently released LEAP report, College Learning for the New Global Century, AAC&U recommends tying assessment efforts much more closely to the curriculum and to faculty priorities for student learning across the curriculum. The report affirms, as well, that any national assessment measure, however well developed, is only part of the solution to the problem of underachievement. As the report notes, "standardized tests that stand outside the regular curriculum are, at best, a weak prompt to needed improvement in teaching, learning, and curriculum. Tests can, perhaps, signal a problem, but the test scores themselves do not necessarily point to where or why the problem exists or offer particulars as to solutions" (2007, 40).

A fuller strategy, the LEAP report proposes, would prepare students to produce a substantial body of work--capstone projects and/or portfolios--that require their best efforts. The resulting accomplishments should be assessed for evidence of students' competence on liberal education outcomes such as analytical reasoning and integrative learning, as well as their achievement in their chosen fields. Standardized assessments can then fill out the emerging picture, providing the ability to benchmark accomplishment against peer institutions, at least on some aspects of student learning.

As the LEAP report notes, "however the assessments are constructed . . . the framework for accountability should be students' ability to apply their learning to complex problems. Standards for students' expected level of achievement also will vary by field, but they should all include specific attention to the quality of the students' knowledge, their mastery of key skills, their attentiveness to issues of ethical and social responsibility, and their facility in integrating different parts of their learning" (2007, 41–42).

Richard Shavelson offers an important historical context to consider as institutions across the country continue to develop new methods of assessment in response to renewed calls for greater accountability and, more importantly, the urgent need to raise levels of student achievement. He helps us better understand the "state of the art" in standardized testing today, and what we should ask from testing agencies in the future. Above all, he helps us understand why psychometricians themselves are so opposed to any efforts at institutional ranking or comparisons based on standardized tests.

We are grateful to Richard Shavelson for taking the time to put current debates in a larger historical and educational frame. Everyone who is thinking today about assessment and public accountability will benefit greatly from the insights this study provides.

Carol Geary Schneider, President, Association of American Colleges and Universities

Lee S. Shulman, President, the Carnegie Foundation for the Advancement of Teaching

Introduction

Over the past thirty-five years, state and federal policy makers, as well as the general public, have increasingly been pressuring higher education to account for student learning and to create a culture of evidence. While virtually all states already use proxies (e.g., graduation rates) to report on student performance, they are now being pressured to measure learning directly. U.S. Secretary of Education Margaret Spellings's Commission on the Future of Higher Education, for example, has called for standardized tests of students' critical thinking, problem solving, and communication skills.

While the current demand to establish a culture of evidence appears to be new, it has a long lineage. The future development of this culture may very well depend on how well we appreciate the past. Cultures of evidence will not automatically lead to educational improvement if what counts as evidence does not count as education. Narrow definitions and narrow tests of what counts as learning outcomes in college may very well distort the culture of evidence we seek to establish. As we shall see from the past, and as we know from current studies (Immerwahr 2000; AAC&U 2007), there is more to be learned and assessed in higher education than the broad abilities singled out by the Spellings Commission for measurement by standardized tests. These additional outcomes include learning to know, understand, and reason in an academic discipline. They also include personal, civic, moral, social, and intercultural knowledge and actions--outcomes the Educational Testing Service has described as "soft." Some experts say that such "soft" outcomes cannot be measured adequately because "the present state of the art in assessing these skills is not adequate for supporting the institution of a nationwide set of standardized measures" (Dwyer, Millett, and Payne 2006, 20). But this position is unsatisfactory. These outcomes--which, following the lead of the Association of American Colleges and Universities, I will call personal and social responsibility (PSR) skills--are every bit as demanding as the academic skills that are often exclusively labeled "cognitive," and they are too important not to be measured. If we do not measure PSR skills, they will drop from sight as accountability pressures force campuses to focus on a more restricted subset of learning outputs that can be more easily and less expensively measured.

The outcomes framework depicted in figure 1 (omitted in this online excerpt) demonstrates the importance of extending the range of outcomes we assess beyond broad abilities. Such outcomes could range from the development of specific factual, procedural, and conceptual knowledge and reasoning in a discipline (such as history), to the development of the skills on which the Spellings Commission focused (critical thinking, problem solving, and communication), to the development of reasoning applicable to a very wide variety of situations, to the development of intelligence. Moreover, "cognitive" outcomes include PSR skills insofar as reasoning and thinking are involved in personal relations, moral challenges, and civic engagement. The PSR skills are not so soft; they involve cognition and more, as do academic skills. Finally, the arrows in figure 1 remind us that general abilities influence the acquisition of knowledge in concrete learning environments, that direct experiences are the stuff on which reasoning and abstract abilities are developed, and that cognitive performance on academic and PSR skills is influenced by the interaction of individuals' accumulated experience in multiple environments with their inheritance.

Furthermore, the standardized tests that the Spellings Commission and others have in mind for outcomes assessment are not interchangeable. There are different ways to measure student learning; some standardized tests focus only on a narrow slice of achievement, while others focus on broader abilities developed over an extended course of study. Especially for higher education, the different assumptions about what ought to be measured that are embedded in every assessment instrument need to be clarified and carefully considered before specific tests are chosen to assess students' cumulative gains from college study.

The multiple-choice technology developed almost a hundred years ago, for example, is inherently limited when it comes to measuring the full array of student learning outcomes depicted in figure 1. Multiple-choice measures have a long history, as we shall see. They are the basis of the standardized tests that are often used today, including the Collegiate Assessment of Academic Proficiency (CAAP), the Measure of Academic Proficiency and Progress (MAPP), and the College Basic Academic Subjects Examination (CBASE). (The MAPP was recommended by the Spellings Commission as a way of assessing student learning in college.) But these measures are limited in their ability to get at some of the more complex forms of reasoning and problem solving that are commonly viewed as distinctive strengths of American higher education.

If the learning outcomes of higher education are narrowly measured because cost, capacity, and convenience dictate reductive choices, then we run the risk of narrowing the mission and diversity of the American system of higher education, as well as the subject matter taught. What we need to do instead is to learn from the rich history of student learning assessment and take responsible steps to develop and measure the learning outcomes our nation values so highly.