Like most teachers I was always respectful of educational research–until I became a member of the National Reading Panel and started to read about how most studies are done. I found out that many reports of successful teaching practices overemphasize very small positive results, that often researchers rely on flimsy evidence, and that many studies have never been replicated. In today’s post I will explain why educational research, in general, is less trustworthy than medical research.
When the National Reading Panel began its work in 1998, one of its goals was to bring reading research up to the “Gold Standard” of medical research. Unfortunately, reaching this goal was–and still is–impossible because reading researchers, unlike medical researchers, cannot separate treatment effects from effects caused by other factors, such as classroom distractions, student hunger or illness, home culture, work habits, and personal feelings.
Medical researchers have the advantage of being able to remove or minimize outside factors by assembling very large groups of potential subjects (often lab rats) and then forming randomly selected treatment and control groups. Just as important, they are able to use a “double blind” procedure in which neither the subjects nor the people applying the treatments know which doses are the experimental ones and which are placebos.
Rather than trying to explain how all the factors present in educational research may skew results, I have chosen to focus on the one I think is the most troublesome: the “Hawthorne Effect.” This term derives from a series of studies done at the Hawthorne Works plant of Western Electric near Chicago from 1924 to 1932. There, researchers manipulated various physical conditions, such as high or low lighting and the number of rest breaks for workers, to see which ones most increased their output. What they found was that almost any change–good or bad–increased production, and what they concluded was that knowing you were a part of a scientific experiment created feelings of importance and belonging that were a more powerful determinant of productivity than any change in conditions.
It is hard to believe that the outcomes of much reading research are not similarly affected. In most studies both the teachers and their students know when they are part of a research plan and whether they are in the treatment group or the control group. In many studies both groups are also in the same school. There is no way to “double-blind” such a study when the teachers are the people applying the treatment, only one group of students has new materials, and the type of instruction given to those students is different from what is going on in the rest of the school.
Under such conditions, the feelings of the experimental group of teachers and students are pretty certain to be positive. After all, being chosen to implement a scientific experiment is an honor. It implies that the researchers and the school’s administrators think you are intelligent, competent, and trustworthy enough to do it well. Add to this the special training often given teachers in an experimental study and the frequent classroom visits of researchers monitoring implementation.
In contrast, neither the control group teachers nor their students get any extra attention, new materials, or special training. They may also feel dishonored by not being the chosen ones. Their feelings could be called a “Negative Hawthorne Effect.”
Just as production increased in the Hawthorne studies, we can expect that the results will be better for the experimental group in the type of reading study just described, whether or not the new program or strategy really is any better than the old. This explains why so many studies report positive results for the treatment applied, and it should make us cautious about accepting those conclusions.
But, hold on! There is a way to do education research that limits the influence of the Hawthorne Effect: have two or more treatment groups, with each one trying out a different program and receiving the benefits of being part of an experiment. Ideally, there is also a control group using the old program. Fortunately, a few research studies do follow this two-treatment format, but not enough of them. It is more expensive and more complicated to carry out, and most researchers—being human—believe before they start their research that one type of program is superior to all others and want to prove what they already believe when they design their studies.
The lesson for us as consumers of education research is to consider not only results but also how studies were carried out, especially how likely it was that the “Hawthorne Effect” came into play. If you have the time and access to the full reports of reading studies, read them or their abstracts to find out how they were conducted. But if you don’t—and what classroom teacher does—let me try to be your surrogate. From time to time, I will identify those studies that most strongly influence government mandates, published programs, or current trends in school practice and examine them for the flaws that make their results questionable. In this blog I will let you know what I find.
I agree with Joanne’s characterization of much of the educational research. Let me add some thoughts. I reviewed 8 years’ worth of reading research, starting with the National Reading Panel, and found the research studies anemic. They tended to focus on one small feature of reading such as phonemic segmentation. The studies used at-risk students as their subjects, and they generally compared only whether the treatment group did better on a post-test than a control group that received no treatment. So you teach Group A to do X, you don’t teach Group B to do it, and at the end Group A does better. Hardly earthshaking. I never saw evidence that the treatments raised the students’ overall reading performance near grade level. However, these studies that were limited in scope and showed very paltry results were adopted to apply to the whole student population. As for the “medical model,” medical experiments are informed by knowledge of how drugs and bodies interact. I never saw that kind of background information in the reading research articles. And medical studies don’t claim that positive results have implications for everyone. They describe how the treatment might help some individuals in some situations, and they come with a statement of limitations. There was nothing “scientific” in the so-called scientifically based reading research.
Gary, Maybe we can talk about some of those studies and you or I could write about them for this blog.
Excellent post. I’m reminded of business school. In our introduction the prof talked about management science. Then he said: “There is no such thing. Science requires the ability to hypothesize, then test in controlled circumstances. It is impossible to do that testing in a business because the factors out of control of the researcher far outnumber those that can be controlled.”
Educational research is the same. And it needs to be held to far higher standards – which would certainly limit the amount of work that could be published with a headline of “Research Shows…”
And that would be a good thing.
I’ll also suggest a book I’m reading which challenges the fundamental statistical assumption used in most of this research (and management research). Its conclusion is that we need to drastically raise the bar before something is published. “Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science”
Doug, maybe we can work together to debunk a study widely heralded as “Proof” that a particular method is better than everything else.