The Right Stuff: What Many Teacher Evaluations Are Missing

Teaching is complex.

Educators manage fluid learning environments, reconciling student needs with external expectations by making split-second adjustments, often right in the middle of lessons. The job becomes even more complex when teachers are caught between competing education initiatives.

Consider two efforts that have affected teachers in recent years: state-mandated teacher evaluation and support systems, and statewide adoption of new, more rigorous and focused student learning standards based on the Common Core.

Our research team at AIR, with the support of the Bill & Melinda Gates Foundation, recently analyzed 45 teacher evaluation rubrics, or documents outlining the standards for evaluation, from across the country. These documents form the foundational framework of teacher evaluation systems. Our sample included rubrics from seven states, five charter organizations, and four nonprofits offering support directly to teachers. Some were state models while others were developed locally.

We scored these rubrics against an AIR instrument designed to measure their alignment with teaching practices associated with Common Core-aligned standards. The goal was to see whether the documents used to guide teachers’ year-end evaluations lined up with the messages teachers get about instruction through standards, curriculum materials, and professional development.

Our principal finding is something many classroom teachers probably already know: The implicit and explicit messages that teachers receive about teaching—through professional development, instructional materials, or the standards themselves—are rarely reinforced by the tools used to evaluate these same teachers. Too many instruments are written for a generic classroom: in principle they can work for all lessons and all grade levels, but in practice they work well for none.

In light of these findings, policymakers should work with educators to update their teacher evaluation rubrics, improving them in three ways:

  • Adding more instructional examples
  • Placing greater emphasis on underrepresented instructional practices
  • Adding more subject-specific content

More instruction. Rubrics need more instructional content, including more instructional indicators and more examples of quality classroom instruction from all levels and subjects. Too many rubrics contain a disproportionate share of non-instructional indicators focused on teacher knowledge, collegiality, or parent contact. The instruments need more indicators focused on in-class, academic instruction.

Of the 45 rubrics our team scored, 10 contained almost no instructional content. We also collected an additional 17 that contained too little content to be scored reliably on that dimension.

One state’s model contains only one indicator (out of 15) that explicitly focuses on instruction. By contrast, the charter organization Alliance College-Ready’s rubric contains four standards and 17 indicators, with an entire section devoted exclusively to instruction and three-fourths of the indicators instructionally focused.

Greater emphasis on instructional shifts. Key instructional moves such as emphasizing academic vocabulary, employing technology in the classroom, engaging students in rigorous peer-to-peer discourse, or requiring students to substantiate answers all appear infrequently in evaluation rubrics but are critical to ensuring students meet higher and deeper standards.

Denver’s LEAP rubric includes these vital, but infrequently mentioned, instructional moves. Ten of 12 indicators are marked as corresponding to Common Core-related instructional shifts. The tool contains ample detail, providing a continuum of teacher behaviors, student behaviors, and specific instructional moves as examples. 

Subject-specific content. Finally, more teacher evaluation rubrics should either be subject-specific or contain broad, generalizable instructional principles with more subject-specific sub-indicators. Only eight of the 45 rubrics we examined were subject-specific. These were notable for their depth of detail in several instructional practices that were less common, or less well defined, in other instruments. Practices such as student-to-student discourse, challenging teacher questioning, use of evidence, and integrating skills within an academic subject were generally more likely to be present and well defined in nonprofit organizations’ subject-specific rubrics.

New York-based Student Achievement Partners, for example, publishes four instruments, using separate tools for English Language Arts and mathematics and dividing each subject area into upper and lower grade levels. The state model rubric from Colorado highlights universally desirable teaching practices for all teachers but still uses sub-indicators specific to particular subject areas or grade levels: it calls on all teachers to develop deep content knowledge in students, for example, but differentiates between the desired teacher and student actions in English Language Arts and mathematics classrooms.

Evaluation should be a source of professional growth. Educational leaders can help teachers grow by evaluating their existing rubrics (using tools like those created by AIR’s GTL Center) and then giving educators access to high-quality instruments that offer state-of-the-art guidance on instruction. Otherwise, an already complex profession becomes mired in reconciling competing priorities rather than supporting teachers’ growth in creating rich and rigorous lessons for their students.

Matthew Welch is an AIR researcher based in Massachusetts, studying state and local educational systems.