When is performance assessment most appropriate?
The assessment itself can be an individual or group project, a portfolio (potentially with one or more pieces foregrounded), or an open-ended response exercise. The process of creating the work is then graded against a set of pre-agreed criteria or a checklist, shared with the student in advance. Standardized testing is becoming increasingly outdated in K-12 contexts, according to a report published jointly by the Massachusetts Consortium for Innovative Education Assessment and the Center for Collaborative Education.
This kind of traditional testing exacerbates socioeconomic differences while failing to properly assess the skills students need before entering higher education. If this shift in assessment is truly underway for the freshmen of the future, performance assessment is worth considering sooner rather than later.
Performance assessment looks at higher-order thinking skills and problem-solving abilities. Skills such as time management and clear communication are also tested, which ultimately leads to a deeper and more meaningful learning process. High-stakes standardized testing evaluates whether students know enough about a subject; performance assessments, by contrast, measure whether students can apply that knowledge appropriately in various contexts. If interim goals are created and applied correctly, performance assessments also allow students to monitor their own progress. This kind of metacognition, particularly in a test environment, is enormously beneficial to higher-level student learning.
Instructors who use performance assessments need to build into the curriculum both the standards students are expected to meet and the steps students must take in applying their knowledge.
Performance assessments go hand in hand with modern teaching strategies like active learning and critical thinking. The educator sets a task for which there is more than one route to completion, or a complex problem that leaves considerable leeway for interpretation.
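To make the idea of grading against pre-agreed criteria concrete, here is a minimal sketch in Python of a rubric-style scorer. The criterion names, weights, and point scales are hypothetical, not drawn from any particular assessment system.

```python
# A minimal sketch of rubric-based scoring against pre-agreed criteria.
# The criteria, weights, and point scales here are hypothetical examples.

RUBRIC = {
    "problem_framing":     {"weight": 0.25, "max_points": 4},
    "evidence_and_method": {"weight": 0.35, "max_points": 4},
    "communication":       {"weight": 0.25, "max_points": 4},
    "time_management":     {"weight": 0.15, "max_points": 4},
}

def score_submission(ratings: dict) -> float:
    """Combine per-criterion ratings into a weighted percentage score."""
    total = 0.0
    for criterion, spec in RUBRIC.items():
        earned = ratings[criterion] / spec["max_points"]  # 0.0 .. 1.0
        total += spec["weight"] * earned
    return round(100 * total, 1)

# Example: a student rated on the shared, pre-agreed rubric.
print(score_submission({
    "problem_framing": 3,
    "evidence_and_method": 4,
    "communication": 3,
    "time_management": 2,
}))  # -> 80.0
```

Because the rubric is fixed and shared in advance, the same ratings always produce the same score, which is what makes the grading defensible to students and parents.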
States have attempted to deal with the variability in students' English proficiency by developing policies to exempt students with limited English proficiency from statewide tests. But the criteria vary among the states. In most cases, the time the student has spent in the United States is the determining factor; in others, the time the student has spent in an English-as-a-second-language program has governed such decisions.
However, some have argued that time is not the critical factor and instead have recommended that students demonstrate language proficiency before states and districts determine whether they will participate in assessments. A few states use such determinations, formally or informally (Council of Chief State School Officers). In addition to exempting English-language learners from tests, most states permit some form of accommodation for such students. The most common accommodations are in presentation, such as repeating directions, having a familiar person administer the test, and reading directions orally; in timing, such as extending the length of the testing period and permitting breaks; and in setting, such as administering tests in small groups or in separate rooms.
A few states also permit modifications in response format, such as permitting students to respond in their native language. In addition to the modifications, 11 states also have in place alternate assessments for English-language learners. Most commonly these alternatives take the form of foreign-language versions of the test. The second-language versions are not simple translations, however.
Translations would not capture idioms or other features unique to a language or culture. Second-language assessments are controversial. Since the purpose of the test is to measure students' knowledge and skills in content areas, many states have provided alternate assessments in subjects other than English; to test English ability, states have continued to rely on English-language assessments.
The voluntary national test proposed by President Clinton would follow a similar policy; some districts that had agreed to participate pulled out after they realized that the fourth grade reading test would be administered only in English.
As with accommodations for students with disabilities, the research on the effects of test accommodations for English-language learners is inconclusive. It is not always clear, for example, that different versions of tests in different languages are in fact measuring the same things (National Research Council).
Moreover, attempts to modify the language of tests—for example, simplifying directions—have not always made English-language tests easier to understand (Abedi). One recent study of the effects of accommodations in a large-scale testing program, the state assessment in Rhode Island, concluded that the effects of the accommodations are uncertain and that they may not work as intended (Shepard et al.).
Are valid and reliable measures used to evaluate the level of students' proficiency in English? Are clear guidelines in place for accommodations that permit English-language learners to participate in assessments administered for accountability? Is there evidence that the assessment, even with accommodations, cannot measure the knowledge or skill of particular students or groups of students before alternate assessments are administered?
Are assessments provided in languages other than English when the number of students who would take such assessments is sufficiently large to warrant their use? Assessments for English-language learners should follow the same criteria used for assessments generally, which were described above.
In addition, such assessments should meet further criteria based on the unique problems associated with testing English-language learners. The committee recommends that, in developing an assessment system for English-language learners, states and districts adhere to the following criteria: The assessments should provide a means of including all students; students should be exempted only when assessments, even with accommodations, do not yield valid and reliable information about their knowledge and skills.
The following examples show the practices of a district and a state that have clear policies for including English-language learners in assessments. Both use measures of English-language proficiency to determine whether students can take part in the regular assessment or use a native-language test or an accommodation. Both disaggregate test results to show the performance of English-language learners. The tests are used for both student and school accountability. For students, the 10th grade tests in reading, mathematics, and writing are designed as exit-level tests, which students must pass in order to graduate.
To determine which version of the test students take, language-proficiency assessment committees at each school, consisting of a site administrator, a bilingual educator, an English-as-a-second-language educator, and a parent of a child currently enrolled, make judgments according to six criteria. On the basis of these criteria, the committee determines whether a student is tested on the English-language TAAS, tested on the Spanish-language TAAS, or is exempted and provided an alternate assessment.
The results for students who take the Spanish TAAS or who are exempted are not included in the totals used for accountability purposes; however, the Spanish-language results are reported for each school. In Philadelphia, the district administers the Stanford Achievement Test, 9th Form (SAT-9) as part of an accountability system; the results are used, along with attendance rates, to determine whether schools are making adequate progress in bringing students toward district standards.
The district also administers the Spanish-language version of the SAT-9, known as Aprenda, in reading and mathematics. To determine how students should be tested, the district measures the students' English-language proficiency. The district has used the Language Assessment Scales (LAS), a standard measure that gauges proficiency on a four-point scale; more recently, district educators have developed their own descriptors of language proficiency.
The district is currently conducting research to compare the locally developed descriptors with the LAS. Students at the lowest level of proficiency—those who are not literate in their native language—are generally exempted from the SAT-9, as are recently arrived immigrants at level 2 (beginner).
Those in the middle levels of proficiency, level 2 (beginner) and level 3 (intermediate), who are instructed in bilingual programs are administered Aprenda in reading and mathematics, and a translated SAT-9 open-ended test in science.
Those in levels 2 and 3 who are not in bilingual programs take the SAT-9 with accommodations. Those at level 4 (advanced) take the SAT-9 with appropriate accommodations. Accommodations include extra time; multiple shortened test periods; simplification of directions; reading aloud of questions (for mathematics and science); translation of words and phrases on the spot (for mathematics and science); decoding of words upon request (not for reading); use of gestures and nonverbal expressions to clarify directions and prompts; student use of graphic organizers and artwork; testing in a separate room or small-group setting; use of a study carrel; and use of a word-match glossary.
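The district's routing rules read like a decision table, and a short sketch can make them concrete. This is an illustration only: the function name and level codes paraphrase the policy described above, not an actual district system.

```python
# Illustrative sketch of the test-assignment rules described above.
# Level codes follow the text: 1 = not literate in the native language,
# 2 = beginner, 3 = intermediate, 4 = advanced. This paraphrases the
# policy; it is not an actual district system.

def assign_assessment(proficiency_level: int, in_bilingual_program: bool,
                      recently_arrived: bool) -> str:
    if proficiency_level == 1 or (proficiency_level == 2 and recently_arrived):
        return "exempt from SAT-9"
    if proficiency_level in (2, 3) and in_bilingual_program:
        return "Aprenda (reading, mathematics) + translated SAT-9 science"
    if proficiency_level in (2, 3):
        return "SAT-9 with accommodations"
    return "SAT-9 with appropriate accommodations"  # level 4

print(assign_assessment(3, in_bilingual_program=True, recently_arrived=False))
# -> Aprenda (reading, mathematics) + translated SAT-9 science
```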
All students who take part in the assessment are included in school accountability reports. Those who are not tested receive a score of zero. For schools eligible for Title I schoolwide status (those with high proportions of low-income students), the district is pilot-testing a performance assessment in reading and mathematics. The performance assessment may become part of the district's accountability system. Students at all levels of English proficiency participate in the performance assessment, with accommodations (National Research Council).
In many ways, reporting the results of tests is one of the most significant aspects of testing and assessment. Test construction, item development, and scoring are means of gathering information.
It is the information, and the inferences drawn from it, that make a difference in the lives of students, parents, teachers, and administrators. The traditional method of reporting test results is in reference to norms; that is, by comparing student performance to the performance of a national sample of students, called a norm group, who took the same test.
Norm-referenced test scores help provide a context for the results by showing parents, teachers, and the public whether student performance is better or worse than that of others. This type of reporting may be useful for making selection decisions. Norm-referenced reporting is less useful for providing information about what students know or are able to do. To cite a commonly used analogy, norm-referenced scores tell you who is farther up the mountain; they do not tell you how far anyone has climbed.
For that type of information, criterion-referenced, or standards-referenced, reports are needed. These types of reports compare student performance against a fixed standard rather than against the performance of other students. However, the type of report a test is intended to produce influences how it is designed. Tests designed to produce comparative scores generally omit items that nearly all students can answer, or that nearly all students cannot answer, since such items do not yield comparisons. Yet such items may be necessary for a standards-referenced report if they measure student performance against standards.
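The mountain analogy can be made concrete in a few lines of code: a percentile rank locates a student relative to a norm group, while a cut score locates the same student relative to a fixed standard. This is an illustrative sketch only; the norm-group scores and the cut score are hypothetical.

```python
from bisect import bisect_left

# Hypothetical norm-group scores (sorted) and a hypothetical cut score.
NORM_GROUP = sorted([12, 18, 23, 27, 31, 34, 38, 41, 45, 52])
CUT_SCORE = 35  # the fixed standard for "proficient"

def percentile_rank(score: float) -> float:
    """Norm-referenced: percent of the norm group scoring below `score`."""
    return 100 * bisect_left(NORM_GROUP, score) / len(NORM_GROUP)

def meets_standard(score: float) -> bool:
    """Standards-referenced: comparison against a fixed criterion."""
    return score >= CUT_SCORE

student = 33
print(percentile_rank(student))  # who is farther up the mountain: 50.0
print(meets_standard(student))   # how far they actually climbed: False
```

The same score yields two very different reports: a middling percentile says nothing about whether the standard was met, and vice versa.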
Some of the ways test results are reported confound the distinction between norm-referenced and standards-referenced reporting. Because policy makers and the public are interested in both types of information—comparative performance and performance against standards—several states combine standards-based reports with norm-referenced reports; similarly, states participate in the National Assessment of Educational Progress to provide comparative information as well.
The law also requires states to set at least three levels of achievement: proficient, advanced, and partially proficient. However, the law leaves open the possibility that states can provide norm-referenced information as well. Reporting results from tests according to standards depends first on decision rules about classifying students and schools.
Creating those decision rules is a judgmental process, in which experts and lay people make decisions about what students at various levels of achievement ought to know and be able to do (Hambleton). One group's judgments may differ from another's.
As a result, reports that indicate that a proportion of students are below the proficient level—not meeting standards—may not reflect the true state of student achievement. Another process may suggest that more students have in fact met standards (Mills and Jaeger). The experience of the National Assessment Governing Board (NAGB) in setting achievement levels for the National Assessment of Educational Progress illustrates the challenges in making valid and reliable judgments about the levels of student performance.
Students who performed at the basic level could perform tasks intended to demonstrate proficient achievement, for example. Moreover, researchers have found that the overall levels appear to have been set too high, compared with student performance on other measures.
One issue surrounding the use of achievement levels relates to the precision of the estimates of the proportions of students performing at each level. The risk of misclassification is particularly high when states and districts use more than one cut score, or more than two levels of achievement, as NAEP does (Ragosa). However, other efforts have shown that it is possible to classify students' performance with a relatively high degree of accuracy and consistency (Young and Yoon). In any case, such classifications always contain some degree of statistical uncertainty; reports on performance should include data on the level of confidence with which the classification is made.
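The level of confidence in such a classification can be estimated. The sketch below, with hypothetical numbers, uses a test's standard error of measurement and a normal approximation to estimate the probability that a student's true score falls on the other side of a cut score. This is one common approach, not the specific method used by any program discussed here.

```python
from statistics import NormalDist

def misclassification_risk(observed: float, cut: float, sem: float) -> float:
    """Probability that the true score lies on the other side of `cut`,
    assuming measurement error is roughly normal with standard
    deviation `sem` (the standard error of measurement)."""
    z = abs(observed - cut) / sem
    return NormalDist().cdf(-z)

# Hypothetical: observed score 212, "proficient" cut score 210, SEM 6.
print(round(misclassification_risk(212, 210, 6), 2))  # -> 0.37
```

A score barely above the cut carries a substantial chance of misclassification, which is exactly why reports should state the confidence attached to each level.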
Another problem with standards-based reporting stems from the fact that tests generally contain relatively few items that measure performance against particular standards or groups of standards. While the test overall may be aligned with the standards, it may include only one or two items that measure performance on, say, the ability to identify the different types of triangles. Because student performance can vary widely from item to item, particularly with performance items, it would be inappropriate to report student results on each standard (Shavelson et al.).
As a result, reports that may be able to indicate whether students have attained standards can seldom indicate which standards students have attained. This limits their instructional utility, since the reports can seldom tell teachers which topic or skill a student needs to work on.
The challenges of reporting standards-based information are exacerbated with the use of multiple indicators. In some cases, the results for a student on two different measures could be quite different.
For example, a student may perform well on a reading comprehension test but perform poorly on a writing assessment. This is understandable, since the two tests measure different skills; however, the apparent contradiction could appear confusing to the public (National Research Council). In an effort to help avoid such confusion and provide an overall measure of performance, many states have combined their multiple measures into a single index.
Such indices enable states and districts to serve one purpose of test reporting: to classify schools in order to make judgments about their overall performance. However, the complex formulas states and districts use to calculate such indices make it difficult to achieve a second important purpose of reporting: to send cues about instructional improvement.
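A small sketch, with hypothetical components and weights, shows why reporting only the composite hides instructional cues: two very different component profiles can produce the same index value, so reports should carry the components alongside the total.

```python
# Hypothetical accountability index: a weighted combination of measures.
WEIGHTS = {"reading": 0.4, "mathematics": 0.4, "attendance": 0.2}

def school_index(components: dict) -> dict:
    """Return both the composite index and the components behind it,
    so a report can show how the number was computed."""
    index = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    return {"index": round(index, 1), "components": components}

# Two schools with the same index but very different profiles.
print(school_index({"reading": 80, "mathematics": 40, "attendance": 90}))
print(school_index({"reading": 60, "mathematics": 60, "attendance": 90}))
# Both -> index 66.0; the components tell different instructional stories.
```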
Teachers and principals may have difficulty using the index to relate scores to performance or to classroom practices. Is there a way to determine whether the proficient level of achievement represents a reasonable estimate of what students in a good program can attain, over time, with effort? Do reports indicate the confidence interval or probability of misclassification?
Are multiple indicators used for reporting progress toward standards? When these indicators are combined into a single index, are the components of the index and the method used to compute it reported as well? Relation to Standards: Assessment results provide the most useful information when they report student performance against standards. To the extent possible, reports indicating performance against particular standards or clusters of standards provide instructionally useful information.
Reports that show in an understandable way how students performed in relation to standards are useful. Reports that combine information from various sources into a single index should include the more detailed information that makes up the index as well. The reports should state clearly the limits of the information available and indicate the inferences that are appropriate.
One district's sample report shows a range of information on student performance—including test scores, course taking, and graduation rates—along with contextual information about the qualifications of teachers and the students' backgrounds. The test the district uses yields norm-referenced reports rather than standards-referenced reports.
In addition, the report does not indicate the degree of statistical uncertainty of the test scores. Beyond reporting overall data on student performance, states and districts also disaggregate the data to show the performance of particular groups of students. The Title I statute requires states and districts to report the performance of students by race, gender, economic status, and other factors.
This requirement was intended to ensure that states and districts do not neglect disadvantaged students. Disaggregating data helps provide a more accurate picture of performance and makes it possible to use assessment data to improve performance.
For example, one state examined two districts that had vastly different overall rates of performance. But when state officials broke out the data by race and poverty, they found that poor black students performed roughly equally in both districts. This finding suggested that the higher-performing district's overall scores reflected its success with the majority of students, not all students. This kind of information can be quite powerful.
Rather than resting on its laurels, the high-performing district can look for ways to adjust its instructional program for poor black students, a strategy that might not be apparent if the district looked only at overall results.
In addition, states and districts can use disaggregated results to see the effects of their policies and practices on various groups. It may be, for example, that implementing a new form of assessment without changing the conditions of instruction in all schools could widen the gap in performance between white and black students.
By looking at results for different groups of students, districts and states can monitor the unintended effects of their policies and make needed changes. The idea of disaggregation stems in part from a substantial body of literature aimed at determining the effects of schooling on student performance (Raudenbush and Willms). These studies, which examined the variation in school performance after taking into account the background of the students in the schools, found that some schools do a better job than others in educating children, and the researchers have examined the characteristics of successful schools.
However, as Willms points out, despite these findings, states and school districts continue to report misleading information about school performance by publishing overall average test scores, without taking into account the range of performance within a school.
Overall averages can be misleading because the variation in performance within schools is much greater than the variation among schools (Willms). That is, to take a hypothetical example, the difference between the performance of white students and black students within School A may be much greater than the difference between the average performance of School A and that of School B. Simply reporting the schools' overall performance, without showing the differences within the schools, could lead to erroneous conclusions about the quality of instruction in each school.
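A numerical sketch, with entirely hypothetical scores, illustrates the point: two schools can post nearly identical averages while the spread within each school dwarfs the gap between them.

```python
from statistics import mean, pvariance

# Hypothetical test scores for two schools with nearly identical averages.
school_a = [35, 48, 55, 62, 75, 85]
school_b = [38, 50, 54, 60, 73, 91]

within_a, within_b = pvariance(school_a), pvariance(school_b)
between = pvariance([mean(school_a), mean(school_b)])

print(round(mean(school_a), 1), round(mean(school_b), 1))  # 60.0 61.0
print(round((within_a + within_b) / 2, 1))  # within-school variance: ~282.7
print(round(between, 2))                    # between-school variance: 0.25
```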
And if districts took action based on those erroneous conclusions, the remedies might be inappropriate and perhaps harmful. Breaking down assessment results into results for smaller groups also increases the statistical uncertainty associated with the results and affects the inferences drawn from them.
This is particularly true with small groups of students. For example, consider a school in which 30 students are black. A report that disaggregates test scores by race would indicate the performance of those 30 black students. Although this result would accurately portray the performance of these particular students, it would be inappropriate to say the results show how well the school educates black students.
Another group of black students could perform quite differently (Jaeger and Tucker). In addition, states and districts need to be careful if groups are so small that individual students can be identified. A school with just two American Indian students in 4th grade risks violating the students' privacy if it reports an average test score for American Indian students.
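The statistical uncertainty for such a subgroup can be made explicit. The sketch below, with hypothetical scores for a group of 30 students, computes an approximate 95 percent margin of error for the group mean using the normal approximation; the width of the interval is the reason the result says little about how the school would serve other students.

```python
from math import sqrt
from statistics import mean, stdev

def margin_of_error_95(scores: list) -> float:
    """Approximate 95% margin of error for a group mean,
    using the normal approximation (1.96 * standard error)."""
    return 1.96 * stdev(scores) / sqrt(len(scores))

# Hypothetical scores for a subgroup of 30 students.
subgroup = [52, 61, 47, 70, 58, 66, 49, 73, 55, 60,
            44, 68, 57, 63, 51, 75, 59, 48, 65, 54,
            62, 46, 71, 56, 69, 50, 64, 53, 67, 45]
m, moe = mean(subgroup), margin_of_error_95(subgroup)
print(f"mean {m:.1f}, 95% interval roughly {m - moe:.1f} to {m + moe:.1f}")
```

The smaller the group, the larger the margin of error, so reports on small subgroups should carry this interval alongside the point estimate.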
Disaggregated results can also pose challenges if results are compared from year to year. If a state tests 4th grade students each year, its assessment reports will indicate the proportion of 4th graders at the proficient level in one year compared with the proportion of 4th graders at that level the next year.
But the students are not the same each year, and breaking down results by race, gender, and other categories increases the sampling error. Reports that show performance declining from one year to the next may reflect differences in the student population more than differences in instructional practice. Do schools collect and report data on performance of all groups within each school, particularly economically disadvantaged students and English-language learners?
Are there methods for determining the margin of error associated with disaggregated data? Breaking out test results by race, gender, income, and other categories enhances the quality of the data and provides a more complete picture of achievement in a school or district.
In order to enhance the quality of inferences about achievement drawn from the data, states and districts need to reveal the extent of error and demonstrate how that error affects the results. When groups of students are so small that there is a risk of violating their privacy, the results for these groups should not be reported.
The following example describes the practice in a state that disaggregates test data for each school and uses the disaggregated data to hold schools accountable for performance. Under the Texas accountability system, the state rates districts each year in four categories—exemplary, recognized, academically acceptable, and academically unacceptable—and rates schools as exemplary, recognized, acceptable, and low-performing.
The ratings are based on student performance on the state test, the Texas Assessment of Academic Skills, the dropout rate, and the attendance rate. Schools that might have met the requirements for a high rating because of high average performance but fell short because of relatively low performance by students from a particular group have focused their efforts on improving the lagging group's performance—a response that might not have taken place if they had not disaggregated the results.
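The logic of rating on the lowest-performing group, rather than on the average alone, can be sketched in a few lines. The thresholds, group names, and rule below are hypothetical illustrations, not the actual Texas formula.

```python
# Illustrative sketch of a rating rule in which every student group,
# not just the school average, must clear the bar. Thresholds and
# group names are hypothetical, not the actual Texas formula.

THRESHOLDS = {"exemplary": 90, "recognized": 80, "acceptable": 50}

def rate_school(pass_rates_by_group: dict) -> str:
    """Rate on the lowest-performing group's passing rate, so a high
    overall average cannot hide a lagging group."""
    weakest = min(pass_rates_by_group.values())
    for rating, floor in THRESHOLDS.items():
        if weakest >= floor:
            return rating
    return "low-performing"

# High average, but one group lags: the school is rated on that group.
print(rate_school({"all": 88, "economically_disadvantaged": 61,
                   "hispanic": 84, "white": 93}))  # -> acceptable
```

Under a rule like this, the fastest way for a school to raise its rating is to raise the lagging group's performance, which is exactly the response the disaggregated reporting is meant to produce.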
State education departments and school districts face an important challenge in implementing a new law that requires disadvantaged students to be held to the same standards as other students.
The new requirements come from provisions of the reauthorization of Title I, the largest federal effort in precollegiate education, which provides aid to "level the field" for disadvantaged students. Testing, Teaching, and Learning is written to help states and school districts comply with the new law, offering guidance for designing and implementing assessment and accountability systems. This book examines standards-based education reform and reviews the research on student assessment, focusing on the needs of disadvantaged students covered by Title I.
With examples of states and districts that have track records in new systems, the committee develops a practical "decision framework" for education officials.
The book explores how best to design assessment and accountability systems that support high levels of student learning and to work toward continuous improvement.
Testing, Teaching, and Learning will be an important tool for all involved in educating disadvantaged students—state and local administrators and classroom teachers.

Advocating For Performance Assessment
Whenever you gather with other professionals, parents, your supervisor, principal, or members of your school board, use the opportunity to promote the use of some kind of performance assessment in early childhood classrooms.
Copy and distribute the "Benefits of Performance Assessment" list (see below). Also, remember that whether you use an available system of performance or authentic assessment, or develop a tool of your own, the actual examples of children's work that you have on file, as well as ongoing observations of their individual growth and development, are the strongest possible advocates for performance assessment. Use them to show the depth and breadth of information they contain about each child.
No other approach has so much to say about what the child brings to the learning situation, and what the learning situation brings to the child!

Components Of Performance Assessment
A comprehensive performance assessment system should contain some variation of the following components:
- Developmental Checklists: Checklists covering domains such as language and literacy, mathematical thinking, and physical development are designed to reflect developmentally appropriate practices.
Benefits Of Performance Assessment
A system of developmental checklists, portfolios of children's work, and summary reports, when used together, can help you to:
- Recognize that children can express what they know and can do in many different ways.
- Evaluate progress as well as performance.
- Evaluate the "whole child."
- Establish a framework for observing children that is consistent with the principles of child development.
- Contribute to meaningful curriculum planning and the design of developmentally appropriate educational interventions.