INTERLINK Curriculum Guide
8. Benchmarks
Benchmarks are standards for gauging proficiency. If students are not grouped by proficiency, it is more difficult for them to interact effectively and for teachers to conduct classes, and there is more likelihood of redundancy and gaps in course design. Benchmarks seek to ensure that students entering a particular level meet a specified level of proficiency and skill development. Teachers determine whether Benchmarks have been reached by daily observation and occasional testing, and monitor student progress throughout the term to help students meet the Benchmarks. Students who fall short need to keep working until they are capable of performing at the appropriate level. Our goal is not to cull students who can't meet the Benchmarks, but to help everyone over whatever hurdles are preventing the Benchmarks from being met.
Every Benchmark uses the formula "demonstrate ability to . . .". Determination of whether a student meets a Benchmark is not based on a single test or sample but rather on holistic assessment that takes place throughout the course of the term. If a particular writing sample falls short of Benchmark standards, but the whole range of writing done throughout the term shows the required competence, the Benchmark is met. Conversely, if a particular writing sample is good enough to meet standards but is not reflected generally in a student's writing, the Benchmark is not met. Benchmark evaluation, then, is not static but diachronic, and also holistic.
Holistic assessment is at the core of Benchmark evaluation. When we evaluate proficiency, we look at a whole range of considerations. For example, in evaluating writing ability, we look at a student's vocabulary, grammatical sophistication and accuracy, fluency, ability to organize ideas, spelling and punctuation skills, and even handwriting. When we evaluate holistically, we look at all of these elements blended together, plus many others (some of which we may not even be consciously aware of), to get an overall sense of a student's ability to express her/himself coherently and effectively in the medium of writing. In holistic evaluation, a weakness in one area may be compensated for by strength in another area. It may be helpful to imagine a series of vertical tubes filled to varying degrees with fluid, each tube representing an aspect or criterion for measuring a specific skill area. In holistic evaluation we look at the average level of the various tubes. So a student with a strong vocabulary and weak spelling may meet a Benchmark along with a student with average vocabulary and spelling.
Holistic evaluation is not easy. It requires of the teacher both linguistic knowledge and considerable experience with non-native speakers. It requires communication and cooperation among faculty to ensure consistency in evaluation. It may take time and effort to adjust to a system that seems subjective and lacking in definitive numerical scores. We may long for the kind of quantification that allows us to say a student has made 17.6% improvement in vocabulary in a 9-week term. We may miss the certainty of calculating a grade of 82.3% by averaging test scores. Quantitative assessment of discrete skills provides wonderful exactitude and accountability and has only one drawback: it is likely to be completely meaningless as a measure of the one thing we are interested in, namely, language proficiency. The seeming subjectivity of holistic evaluation should be weighed against the greater magnitude of subjectivity represented by the particular selection of items on discrete grammar, vocabulary or spelling tests and their questionable relevance to overall proficiency.
Some things are easy to test and quantify, but they may not be the things that need to be tested and quantified to gauge proficiency. Einstein is said to have kept a placard on his wall stating: "There are some things that count that can't be counted. And some things that can be counted that don't count." That aphorism is relevant in a discussion of proficiency assessment because, conventionally, it is what can be easily counted rather than what really counts that is used as the measure of proficiency. Rubrics assigning points for specific areas and elements or subtracting points for specific infractions attempt to turn assessment into a mechanical and supposedly objective process divorced from human judgment, but what they typically do is count what doesn't count while ignoring what does count. As much as we may wish to feed raw data into a machine and have it belch out precise, incontrovertible numbers, we must recognize that human judgment is necessary for defining the quality of writing (or speaking or comprehension, for that matter). We must rely on holistic assessment if we wish to focus on what counts rather than on what can be counted, and if we attempt to measure anything more significant than how well students can regurgitate what has been "taught" to them in class.
Individual teachers' criteria for assessment are bound to differ from one another, and successful holistic assessment at a center relies on the development of communal notions of competence. The airing and sharing of different viewpoints necessary for building a solid basis for assessment supports the formation of a cohesive faculty community and affords an exceptional opportunity for professional development and growth. Holistic assessment is not so much a skill to be mastered as an evolving, organic process fostered by ongoing reflection and communication.
Class levels are not absolute, distinct entities but rather convenient groupings of students with approximately similar levels of proficiency. Students within the same level may vary considerably with respect to a particular skill but still belong in the same class because of equivalencies in overall competence. It is possible for a level 2 student to have better pronunciation or be more grammatically accurate than a level 3 student without having the same degree of overall competence, and that is why holistic evaluation is so important. Neither are there hard and fast lines that demarcate one level from another. The gradation may be likened to that of the color spectrum with one color gradually morphing into another. As the spectrum graphic below demonstrates, distinguishing between contiguous colors may not be easy, but there is, nevertheless, a clear distinction, at least in Western cultures, between red, orange, yellow, green and blue. While a particular hue might be arguably red or orange, it is much less likely for it to be confused with a green or blue. Likewise, the breakdown of language proficiency into five levels allows for a fair distinction of overall proficiency despite some fuzziness at the edges.
Benchmarks work like color samples. As the teacher considers a student's overall abilities in each skill area, s/he measures them against the spectrum of Benchmarks to determine the best fit. Measurement is accomplished by comparing student samples with the Benchmarks and determining how close the match is. The verbal descriptions for the Benchmarks do not constitute the Benchmarks themselves, nor are the samples meant as ultimate prototypes. The descriptions and samples together serve as constructs to help teachers develop a sense of the proficiency expected at each level and measure how well students match up to expectations. There is a great danger in relying too much on verbal descriptions or specific samples to assess proficiency. It is the gestalt built upon descriptions, samples and accumulated experience that produces a reliable sense of Benchmark achievement. The Merriam-Webster dictionary defines gestalt as "a structure, configuration, or pattern of physical, biological, or psychological phenomena so integrated as to constitute a functional unit with properties not derivable by summation of its parts." Because overall proficiency is not equivalent to the "summation of its parts", discrete skills (such as vocabulary, grammar, pronunciation, etc.) cannot effectively form the basis of proficiency measurement, and holistic measurement is the only instrument capable of performing the task. So while measurement of discrete skills, which can be quantified and graphed, seems more objective and less messy than holistic evaluation, it really does not yield reliable information about overall proficiency.
The Benchmarks are intended to measure each of the skill areas (listening, speaking, reading and writing) and not a student's performance of a particular task. For the Benchmarks to be effective, there must be consistency from term to term and from teacher to teacher. To help achieve that consistency, the Benchmarks are stated as simply as possible and are associated with samples and short discussions intended to clarify what it is students should be capable of doing.
To view the Benchmarks, click the links below.