Mention the phrase ‘graded lesson observations’ in any staffroom in the country and what would be the response?
In many staffrooms they are derided as an ugly feature of a particular strain of virulent OFSTED-itis. Only three or four years ago graded lesson observations were the norm in pretty much every school in the nation. Since then, with repeated clarifications from OFSTED, the practice has been on the wane. Yet many staffrooms still speak of being subjected to this discredited and discouraging practice.
So why are we hanging onto this zombie of supposed-school-improvement?
I would be intrigued if we could pin some exact statistics onto the ongoing use of graded lesson observations. Maybe they have been given a new name, rebranded as ‘lesson grading over time’, ‘learning tours’, or some such linguistic sophistry. My anecdotal understanding is that the practice is alive and well in too many schools to mention – though proper statistics would prove useful.
Perhaps a reminder of some statistics about lesson observations would prove helpful here…
Professor Rob Coe made what was for me a defining speech on the unreliability of graded lesson observation data at ResearchEd, back in 2013 (the detail is written up in this superb blog – here). The cold facts: if a lesson were judged outstanding, the probability that a second observer would attribute a completely different grade would be between 51% and 78%; if a lesson were judged inadequate, the probability of a second observer attributing a different grading would rise to 90%. Professor Coe cited Bill and Melinda Gates’ seminal MET Study – costing a whopping $50 million. Its findings: you are pretty much better off tossing a coin than banking on the accuracy of a single school leader grading a lesson observation!
Research by Ho and Kane (2013) on ‘The Reliability of Classroom Observations by School Personnel’ showed that observers regress to the mean and avoid giving the top and bottom gradings (on a scale of four grades, just like OFSTED has, or had); that we rate teachers in our own school more favourably than others; and that we develop positive impressions of teachers which influence all future ratings (surely the converse is true too). Finally, having more than one observer proved a must if any validity were to be attributed to a judgment.
Many of the findings regarding unreliability relate to our emotional reasoning – or, more accurately, our inability to separate our emotional biases from our professional judgments. We simply cannot separate the person from the practice. We judge physically confident people as more competent. To make matters more dubious, we overestimate the effects of the teacher and underestimate the effects of the context. For example, if students were calm and quiet, we will likely attribute that to the control of the teacher, rather than to early-morning sleepiness on the part of the class – see the research on Attribution Error and the Quest for Teacher Quality.
Much of this evidence has already been widely shared, and I am flogging a horse that has long since been taken to the knacker’s yard. And yet we still have a number of schools grading lessons; we have a number of schools rebranding the practice, but with little real difference from what has come before.
Only recently, I spoke to an excellent school leader who had devised their own lesson grading system that they thought foolproof. They may be right – a brilliant outlier that defies the mass of internationally available evidence – but they probably aren’t. Thinking we are exempt from the biases and failings evident in the research is commonplace. When it comes to the development, and even the well-being, of teachers, we should remain highly circumspect.
In his recent blog, entitled ‘The Semmelweis Reflex: Why does Education Ignore Important Research?’, Carl Hendrick relates the definition of the Semmelweis reflex. That is to say: “the reflex-like tendency to reject new evidence or new knowledge because it contradicts established norms, beliefs or paradigms.” This dismissive response to evidence that doesn’t suit our agenda is alive and kicking sand in the face of teachers everywhere.
Why is lesson observation grading still so attractive? Well, it provides a lazy method of managerial compliance. Too many school leaders, fearful of the toe-capped boot of OFSTED, feel they cannot risk dropping lesson gradings, but some will also secretly harbour the thought that constant OFSTED-fear is no bad thing in leveraging control in their school.
You have to wonder: why do so many OFSTED whispers linger on in schools when school leaders hear every update? Is it little more than Stockholm Syndrome – or do some schools secretly want to utilise the jack-boot of high-stakes accountability for their own ends? Teachers should ask questions of this zombie practice, and they should demand better.
Continuing with lesson observation grading may be explained away as just one small element of a teacher performance development judgment, but it is a sad indictment, given that formative feedback on lesson observations could garner greater collegiate trust and genuinely develop the quality of teaching in our schools. I don’t imagine this trust will be established overnight – teachers will distrust lesson observations for a while yet – but we can make a start now by killing off the zombie that is graded lesson observations. We can surely do better.