An On–Going Discussion with Reid Lyon

By Michael F. Shaughnessy Senior Columnist EducationNews.org
Published 06/3/2008

Michael F. Shaughnessy Senior Columnist EducationNews.org

Dr. Shaughnessy is currently Professor in Educational Studies and is a Consulting Editor for Gifted Education International and Educational Psychology Review. In addition, he writes for www.EdNews.org and the International Journal of Theory and Research in Education. He has taught gifted students and students with mental retardation and learning disabilities. He is on the Governor's Traumatic Brain Injury Advisory Council and the Gifted Education Advisory Board in New Mexico. He is also a school psychologist and conducts in-services and workshops on various topics.



Michael F. Shaughnessy
Senior Columnist, EducationNews.org
Eastern New Mexico University

QUESTION:

Reid, since I recently interviewed you about Reading First, the "comments" section has been a veritable "hotbed of activity," with a number of questions, comments and concerns.

Since not all of our readers read each and every comment, could you briefly summarize what you see as the TOP issues that the comments section has brought up?

ANSWER

One top issue, which comes up not only in the comments to the interview but elsewhere, is that some folks may not have a clear understanding of what the Reading First Impact Study (RFIS) is evaluating. This can be seen in comments, both in EdNews and elsewhere, that the program, including its assessment, instructional, and professional development components, was not effective. But the RFIS did not evaluate the effect of any particular program(s), assessments, or professional development strategies, alone or in combination, on reading comprehension. The RFIS was designed to measure the extent to which a specific funding stream, in the form of Reading First (RF) money, impacted reading comprehension.

The impact of Reading First funding was addressed by comparing eligible RF schools that received Reading First money with eligible schools that did not receive RF money. The RFIS IS NOT an experiment to test the efficacy of the intervention package defined by RF (e.g., instructional programs, assessment and professional development strategies, etc.). It's an impact evaluation of a treatment (THE GIVING OF MONEY) in the setting of an effectiveness trial. In an effectiveness study, the "control" is not controlled, nor is the treatment. The study team was not able to prescribe any behaviors on the part of the comparison schools other than compliance with testing of students and observation of instruction. For this type of question (funding versus no funding), the regression-discontinuity design the evaluators used was entirely appropriate. But it is possible, if not probable, that the funding of Reading First eligible schools caused changes in non-Reading First schools (the comparison group) that were not anticipated. For example, we know from state Reading First evaluation reports that some eligible RF schools not receiving funding implemented professional development and instruction programs similar to those of the funded schools. They may have received, and many did receive, additional state/district funding to do so (more on this later). So the assumption that the eligible non-funded RF schools would continue doing what they were always doing is not valid in many cases.

Again, it is critical to understand that the RFIS did not examine the specific effects of programs, materials, or the impact of professional development, etc., on reading outcomes. Answers to these questions would have been more informative in an impact study that was designed to look at variance in treatment effects. The RFIS was supposed to do this among many other analyses, but it did not. It is possible that some data on program-specific effectiveness with better comparisons will be produced in the final report, but the current design and scope of the study make this doubtful.

As has been seen, neither the education press reporting on the study nor several commentators in the reading community had a clear understanding of what the study is evaluating. Toppo from USA Today led with "Study: Bush's Reading First program ineffective," without explaining that the study only examined the impact of a funding stream and not the specific programs being purchased by the funding stream. He goes on to write, "Advocates of Reading First, an integral part of the 2002 No Child Left Behind law, have long maintained that its emphasis on phonics, scripted instruction by teachers and regular, detailed analyses of children's skills would raise reading achievement, especially among the low-income kids it targets. But the new study by the U.S. Education Department's Institute of Education Sciences (IES) shows that children in schools receiving Reading First funding had virtually no better reading skills than those in schools that didn't get the funding". Unfortunately, the RFIS, as designed, is not capable of examining whether "scripted" phonics instruction had a differential impact on reading comprehension. Somehow, he forgets this critical feature while at the same time reverting to his obsession with phonics as synonymous with Reading First (Greg, please read the darn legislation). Sam Dillon, reporting in The New York Times, leads with "An Initiative on Reading Is Rated Ineffective" without explaining that the RFIS examined the impact of funding rather than the impact of instructional programs, assessments, professional development programs and the like. Why is this a problem? Because he goes on to associate the null findings reported in the Interim Report with statements from Higgins, Kennedy, and Miller that allude to publishers and programs. But specific programs, no matter who published them, were not evaluated for effectiveness.

So why did the RFIS not evaluate the impact of what was transpiring in schools and classrooms that received Reading First funding (other than the amount of time spent in instruction by reading component)? I can only guess at this point. First, it does not appear that IES or the contractors actually examined the legislative language that required the evaluation of the Reading First program. Had they done so, this is what they would have seen:

The evaluation shall (meaning must) conduct:

(1) An analysis of the relationship between each of the essential components of reading instruction and overall reading proficiency.

(2) An analysis of whether assessment tools used by State educational agencies and local educational agencies measure the essential components of reading.

(3) An analysis of how State reading standards correlate with the essential components of reading instruction.

(4) An analysis of whether the receipt of a targeted assistance grant under section 1204 results in an increase in the number of children who read proficiently.

(5) A measurement of the extent to which specific instructional materials improve reading proficiency.

(6) A measurement of the extent to which specific screening, diagnostic, and classroom-based instructional reading assessments assist teachers in identifying specific reading deficiencies.

(7) A measurement of the extent to which professional development programs implemented by State educational agencies using funds received under this subpart improve reading instruction.

(8) A measurement of how well students preparing to enter the teaching profession are prepared to teach the essential components of reading instruction.

(9) An analysis of changes in students' interest in reading and time spent reading outside of school.

(10) Any other analysis or measurement pertinent to this subpart that is determined to be appropriate by the Secretary.

Second, given that the recruitment of contractors and their planning of the evaluation were delayed for unknown reasons, there was probably not enough time to carry out the tasks required in the evaluation (above). Apparently, then, the narrow questions addressing the impact of Reading First funding, while an important part of the evaluation, were addressed in isolation. Note that the delay in starting the evaluation was a concern expressed early on by staff from the House Education and the Workforce Committee, a concern expressed in documents sent to the Secretary of Education (Paige) and in face-to-face meetings with IES and the contractors.

Third, in discussing the RFIS with people working on the evaluation, some were under the impression that the current study was the best that could be done given the resources at hand. One advisor stated that he was literally shocked to learn (1) that the congressional intent was to address tasks 1 through 10 above, and (2) that the Department had been allocated $150 million ($25 million per year) to address the evaluation tasks in detail.

Apparently neither the advisors to the study nor the contractors were provided the legislative language articulating the scope of the required evaluation, nor were they apprised of the resources available, which, by the way, were sufficient to carry out the mother of all evaluations. I am not sure, but I believe approximately $30 million was expended for the current RFIS. It seems that resources may have been thought to be an issue, given the sampling strategies employed and the absence of analyses addressing the majority of tasks specified in the legislation.

Not only did the education press not report these issues, they were remarkably silent on what appears to be a significant problem with the study, at least as reported in the Interim Report. While there are several confounds that limit interpretation of the data presented in the RFIS Interim Report, a hefty one is a lack of control over what is taking place in eligible Reading First schools that were funded and eligible Reading First schools that were not funded. A major problem is that the funded schools and the non-funded schools were doing the same thing in many cases. Tim Shanahan, an advisor to the RFIS and one deeply familiar with not only the current study but previous implementation studies, has explained this clearly in a Q and A with Eduflack. Rather than summarize, it is important to look at the details. With Tim's and Eduflack's permission, here is the interview:

EDUFLACK: What does the IES study really say? How strong are the findings?

SHANAHAN: THE IMPLEMENTATION STUDIES INDICATE THAT THE DIFFERENCES BETWEEN RF AND NON-RF SCHOOLS WERE PRETTY MODEST (ABOUT 50 MINUTES PER WEEK DIFFERENCE IN AMOUNT OF INSTRUCTION), MEANING THAT RF KIDS PROBABLY RECEIVED FEWER THAN 30 HOURS OF ADDITIONAL READING INSTRUCTION EACH YEAR DUE TO THE INTERVENTION. CLEARLY A MODEST INTERVENTION, ESPECIALLY GIVEN THE SIMILARITIES IN CURRICULUM, INSTRUCTIONAL MATERIALS, PROFESSIONAL DEVELOPMENT, AND ASSESSMENTS.

Q: How valid are the findings, knowing there may be contamination across groups (that both the RF and non-RF groups may have been doing the same things in the classroom)?

A: MOST SCHOOLS EMPLOY SOME KIND OF COMMERCIAL CORE PROGRAM. WHEN READING FIRST EMPHASIZED THE ADOPTION OF PROGRAMS WITH CERTAIN DESIGNS ALL MAJOR PUBLISHERS CHANGED THEIR DESIGNS TO MATCH THE REQUIREMENTS.

READING FIRST SCHOOLS ALL BOUGHT NEW PROGRAMS IN YEAR 1; ALMOST ALL OTHER TITLE I SCHOOLS ADOPT NEW CORE PROGRAMS EVERY FOUR OR FIVE YEARS. THAT MEANS IN YEAR 1, 100% OF THE RF SCHOOLS GOT A NEW PROGRAM, AND 25% OF THE OTHER SCHOOLS DID. IN YEAR 2, THAT NUMBER WENT TO 50%, IN YEAR THREE 75%. ALL RF SCHOOLS HIRED COACHES IN YEAR 1, SO DID MORE THAN 80% OF THE OTHER SCHOOLS. ETC.

THIS ISN'T A CASE OF SPOT CONTAMINATION, IT WAS INTENTIONAL AND PERVASIVE (IN FACT, IT WAS PART OF THE RF LAW ITSELF—20% OF THE STATE MONEY, THAT MEANS $1 BILLION TOTAL WAS DEVOTED TO GETTING NON-READING FIRST SCHOOLS TO ADOPT THESE REFORMS).

Q: Given that contamination, are there contamination rates that can be tolerated in the design? For example, let's say 15 percent of the RF and comparison groups received identical programs/PD. Is this level of contamination tolerable? What if there is a 30 percent overlap – is this level tolerable? Are there ways to estimate the degree to which percent contamination will indicate a need to increase sample size?

A: THE PERCENTAGES OF OVERLAP WERE 75-100% DEPENDING ON THE VARIABLE. THE ONLY ONE WHERE WE HAVE ANY KIND OF IDEA ABOUT WHAT IS TOLERABLE IS WITH TIME.
FROM PAST RESEARCH, ONE SUSPECTS THAT 100 HOURS OF ADDITIONAL INSTRUCTION WOULD HAVE A HIGH LIKELIHOOD OF GENERATING A LEARNING DIFFERENCE, A 50-60 HOUR DIFFERENCE WOULD STILL HAVE A REASONABLE CHANCE OF RESULTING IN A DIFFERENCE. AT 25-30 HOURS A SMALL DIFFERENCE IN LEARNING MIGHT BE OBTAINED, BUT IT IS MUCH LESS LIKELY (ESPECIALLY IF THE CURRICULA WERE THE SAME).
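
The time arithmetic in this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the 50-minute figure as a weekly difference, assumes a school year of about 36 weeks, and uses hypothetical function names. The band cut-offs simply restate the rough ranges mentioned above (100 hours, 50-60 hours, 25-30 hours); how the gaps between those ranges are assigned is also an assumption.

```python
# Illustrative sketch only. The 36-week school year and both function
# names are assumptions, not figures or methods taken from the RFIS.

def extra_hours_per_year(minutes_per_week: float, weeks_per_year: int = 36) -> float:
    """Convert a weekly difference in instructional minutes to hours per year."""
    return minutes_per_week * weeks_per_year / 60


def likelihood_band(hours: float) -> str:
    """Map additional hours to the rough ranges described in the interview.

    Values falling between the stated ranges are assigned to the lower
    band; that choice is an assumption made for this sketch.
    """
    if hours >= 100:
        return "high likelihood of a learning difference"
    if hours >= 50:
        return "reasonable chance of a difference"
    if hours >= 25:
        return "small difference possible, but much less likely"
    return "below the ranges discussed"


if __name__ == "__main__":
    hours = extra_hours_per_year(50)
    print(hours, "->", likelihood_band(hours))
```

Under these assumptions, a 50-minute weekly difference lands in the lowest band discussed, which is the point being made about how modest the intervention was.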

Q: Did the evaluation design include procedures/strategies to avoid contamination between RF and the comparison group?

A: IT [THE IES STUDY] NOT ONLY DID NOT TRY TO AVOID CONTAMINATION, IT COULDN'T POSSIBLY DO IT SINCE THE SOURCES OF THE CONTAMINATION WERE SO PERVASIVE. FIRST, THE FEDERAL POLICY EXPLICITLY CALLED FOR SUCH CONTAMINATION TO BE PUSHED. SECOND, STATES AND LOCAL DISTRICTS MADE THEIR OWN CHOICES (AND THEY FELT ENTICED OR PRESSURED TO MATCH RF).

FOR EXAMPLE, SYRACUSE, NY RECEIVED READING FIRST MONEY FOR SOME SCHOOLS, BUT MANDATED THAT ALL OF ITS SCHOOLS ADOPT THE SAME POLICIES AND PROGRAMS. THERE SHOULD HAVE BEEN NO DIFFERENCES BETWEEN RF AND NON-RF SCHOOLS IN SYRACUSE, THE ONLY DIFFERENCE WOULD BE IN FUNDING STREAM—HOW THE CHANGES WERE PAID FOR, AS THE NON-RF SCHOOLS ATTENDED THE SAME MEETINGS AND TRAININGS, ADOPTED THE SAME BOOKS AND ASSESSMENTS, RECEIVED THE SAME COACHING, PUT IN PLACE THE SAME POLICIES, ETC.

Q: Did the evaluation design describe practices in the comparison groups?

A: YES, THE IMPLEMENTATION STUDIES SHOW THE SIMILARITIES IN PRACTICES AND HOW, OVER TIME, THE PRACTICES THAT WERE SIMILAR AT THE BEGINNING BECAME INCREASINGLY SIMILAR EACH YEAR. THAT WILL BE CLEARER IN THE NEXT STUDY OUT.

Q: Did the evaluation design account in any way for contamination, crossover, compensatory rivalry, etc.?

A: NO. THE FEDERAL LAW CALLED FOR THE EVALUATION OF READING FIRST IN TERMS OF THE EFFECTIVENESS OF THE INSTRUCTIONAL MODEL, BUT DID NOT CALL FOR A STUDY OF THE IMPACT OF READING FIRST UPON THE ENTIRE EDUCATIONAL SYSTEM.

EVEN THOUGH I HAD PERSONALLY MADE A BIG DEAL OUT OF THE PROBLEM FROM THE VERY FIRST STUDY DESIGN MEETING, THE METHODOLOGISTS THOUGHT THEY COULD HANDLE MY PROBLEM SIMPLY BY ACCOUNTING FOR THE RF ROLLOUT EACH YEAR. THEIR ASSUMPTION WAS THAT RF WOULD IMPLEMENT SOME CHANGES IN YEAR 1, OTHERS IN YEAR 2, AND STILL OTHERS IN YEAR 3 AND THAT THIS PATTERN OF IMPLEMENTATION WOULD ALLOW THEM TO EXAMINE A CONTINUING LAG BETWEEN THE RF AND NON-RF SCHOOLS.

I DIDN'T UNDERSTAND THAT THEY WERE THINKING THAT AND THEY NEVER ASKED DIRECTLY ABOUT THAT. LAST YEAR, I FIGURED OUT WHAT THEY WERE THINKING AND I HAD TO EXPLAIN SEVERAL TIMES THAT RF PUT ALL OF ITS REFORMS IN PLACE DURING YEAR 1, WITH NOTHING NEW IN YEARS 2 AND 3, SO IT WOULD BE IMPOSSIBLE TO TEST THE EFFECTS OF DIFFERENT PARTS OF THE IMPLEMENTATION, ETC. USING THEIR APPROACH. I MIGHT HAVE BEEN ABLE TO GET THIS FIXED IF I HAD UNDERSTOOD THAT THEY WERE ASSUMING THAT KIND OF DESIGN (OR IF THEY HAD ASKED ME ABOUT THAT SPECIFICALLY).

Q: Can we assume that the RF group is just like the comparison group except for exposure to RF funding? Is the counterfactual valid?

A: READ THE IMPLEMENTATION PART OF THE REPORT (AND THERE IS ANOTHER STUDY COMING LATER THAT WILL MAKE THIS CLEARER) AND YOU'LL SEE THE DEGREE OF SIMILARITY IN THE KEY FACTORS BETWEEN THE TWO SETS OF SCHOOLS. I RAISED THIS AS A THEORETICAL PROBLEM ORIGINALLY, BUT THE IMPLEMENTATION STUDY CLEARLY SHOWS THAT CONTAMINATION WAS A BIG PROBLEM (IT CANNOT TELL US WHETHER THE CONTAMINATION CAME FROM THE $1 BILLION FEDERAL EXPENDITURE ON THIS, BECAUSE THE STATES AND LOCAL DISTRICTS OFTEN SIMPLY ADOPTED THE SAME IDEAS).

AS ONE ILLINOIS DISTRICT TOLD ME, "IF THIS IS THE RIGHT STUFF TO DO, THEN WE ARE GOING TO DO IT WITH EVERYONE."

All this makes you want to scratch your head. Somehow, an opportunity to design and conduct one of the most comprehensive evaluations of an educational program was squandered despite having the resources to carry it out. It boggles the mind.

It is interesting that the public was not presented more accurate information about what the RFIS was designed to do and what it did not do. It examined the impact of funding on an educational outcome, not the impact of what the funding paid for on reading outcomes. It is also interesting that the risk and the effects of contamination across Reading First and comparison schools were not more fully addressed in the Interim Report and in the press. Shanahan's analysis indicates clearly that the evaluators knew that this was a problem. The fact that the evaluation that was carried out was not the one that Congress intended is, for lack of a better word, strange.
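
The contamination concern can be illustrated with a deliberately simple dilution model, sketched here in Python. Everything in it is an assumption made for illustration: the function name, the idea that the measured gap scales with the difference in the share of schools using the RF-style practices, and the 0.40 "true effect" (a made-up number, not a study result). The overlap values echo the 15 and 30 percent figures raised in the Q and A and the 75-100 percent range given in response.

```python
# Illustrative sketch only: a linear dilution model of contamination.
# The model, the function name, and the 0.40 effect size are assumptions
# for illustration; none of them come from the RFIS.

def observed_impact(true_effect: float,
                    funded_uptake: float,
                    comparison_uptake: float) -> float:
    """Measured gap between funded and comparison schools.

    Assumes the gap is the true effect of the practices multiplied by the
    difference in the share of schools actually using those practices.
    """
    return true_effect * (funded_uptake - comparison_uptake)


if __name__ == "__main__":
    hypothetical_effect = 0.40  # assumed effect of the practices themselves
    for overlap in (0.0, 0.15, 0.30, 0.75, 1.0):
        gap = observed_impact(hypothetical_effect, 1.0, overlap)
        print(f"comparison-school uptake {overlap:.0%}: measured gap {gap:.2f}")
```

In this toy model, full overlap drives the measured gap to zero no matter how well the practices work, which is why a funded-versus-unfunded comparison cannot by itself speak to the effectiveness of what the funding paid for.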

Published June 4, 2008

Interview with Reid Lyon: Reading First is the largest concerted reading intervention program in the history of the civilized world


Comments

Comment #1 (Posted by Dick Schutz)

Lyon is right that the IES thumbed its nose at the legislative mandate to evaluate RF. What he misses in all the falderal about "contamination" is that neither the RF nor the comparison group came close to getting kids to "grade level" at any grade 1-3. The design used was weak, as was the "comprehension test." The unaccountables are at the top of the EdChain. That's where "reform" should be focused.

Comment #2 (Posted by Jill Kerper Mora)

This discussion is evidence that the Reading First (RF) foundational “scientific research” paradigm has come back to haunt its architects, primarily and including Reid Lyon. RF was based on a medical model of research using “experimental” research that its designers claimed was the only empirically sound way for determining effective strategies for teaching reading. The National Reading Panel created a set of criteria for either accepting or rejecting studies for inclusion in their findings based on their application of experimental or quasi-experimental research where there was an experimental group that received a certain “treatment” or “intervention” and a control group that did not. Now Reid Lyon complains that this model does not tell us much about the effects of the Reading First funding as an “intervention” in K-3 students’ test scores in reading comprehension (SAT 10). I have great sympathy for the evaluation team who were asked to do the impossible and had great difficulty sorting out how to do it. Using RF’s favorite model, the medical model, let’s consider an analogy: The federal government tells a contracted evaluator to determine whether funds going to a clinic in a low income neighborhood are showing an increase in health and wellness in the community. The feds have mandated certain health and wellness information (while restricting information from other sources), certain medical procedures and allowable drug treatments, and a specific set of “components” of services that must be offered at the clinic. These mandates are not based on an assessment of the health conditions, needs, lifestyles, etc. in the community, but rather on a set of “scientific” drug trials that show that each separate drug works to treat certain ailments and therefore the clinic must prescribe all of them as an intervention for undiagnosed health conditions. They demand “program fidelity” in an attempt to ensure that the health services are delivered according to plan. 
Then they fund an extensive evaluation study comparing this community with one whose clinic didn’t get the funding, to determine 1) the overall effect of the funding on community health; 2) the proportion of impact of each separate treatment or intervention on the overall health of the patients; and 3) factors over which the clinic has no control, such as the level of knowledge and skills in each “component” of the program that the health service workers acquired in their professional development programs before entering the profession. What do you think the medical and health care professional communities would say to the federal government about this research model? Could the feds identify the impact of the taxpayers’ expenditure on community health? Reading First is caught in a trap of its own making. We must examine whether reading achievement should have been viewed through the lens of a medical model in the first place, in part because of what the ubiquitous and rather repulsive use of medical terminology (treatments, interventions, contamination) implies, and we must insist that literacy education policy be framed around a coherent theoretical framework about how students learn to read and write that does not imply that low reading achievement is an illness that can only be cured by the Spin Doctors of Science.

Comment #3 (Posted by Reid Lyon)

It might be helpful for Jill Kerper Mora to take a look at the recent IES study of specific reading interventions implemented with Title I students (they did this one pretty well), the work of the Campbell Collaboration, and the work of the Coalition for Evidence-Based Policy. The latter has recently updated its interactions with Congress and the scope of the implementation of Randomized studies testing the effectiveness of many programs and initiatives. I have copied some of the info below: The Coalition for Evidence-Based Policy The Coalition for Evidence-Based Policy is a non-profit, non-partisan organization, sponsored by the Council for Excellence in Government, with the mission to promote government policymaking based on rigorous evidence of program effectiveness. In the field of medicine, public policies based on scientifically-rigorous evidence have produced extraordinary advances in health over the past 50 years. By contrast, in most areas of social policy -- such as education, poverty reduction, crime and justice, and substance abuse prevention -- government programs often are implemented with little regard to evidence, costing billions of dollars yet failing to address critical needs of our society. However, rigorous studies have identified a few highly-effective social interventions, suggesting that a concerted government strategy to build the number of these proven interventions, and spur their widespread use, could bring rapid progress to social policy similar to that which transformed medicine. Since the Coalition's founding in 2001, our work with top Congressional and Executive Branch policymakers has resulted in important evidence-based reforms. 
As illustrative examples, we have helped advance: Concrete advances in Congressional funding and support for scientifically-rigorous evaluations in education, crime prevention, and other areas of social policy; Key reforms in the Office of Management and Budget's (OMB) process for assessing the performance of federal programs government-wide, including new OMB guidance on What Constitutes Strong Evidence of Program Effectiveness; and A new "priority" in a number of the Education Department's competitive grant programs for applicants that build a rigorous evaluation into their proposed project. An independent evaluation of our work, conducted for the William T. Grant Foundation, found that the Coalition has been "instrumental in transforming a theoretical advocacy of evidence-based policy among certain [federal] agencies into an operational reality." The Coalition’s bipartisan Board of Advisors is comprised of distinguished former government officials, scholars, and other individuals from a broad range of policy areas. The Coalition's Executive Director is Jon Baron (tel. 202-530-3279). New and Noteworthy: NEW On April 8, 2008, the Coalition will conduct a new workshop -- "How to Read Research Findings to Distinguish Evidence-Based Programs from Everything Else: Tools for Public Officials and Other Stakeholders to Become Independent Experts." NEW Our work with Congress and OMB helped create a new $10 million evidence-based home visitation program in the FY 08 Appropriations Act (Public Law 110-161). This HHS program will provide seed money to scale up research-proven models such as the Nurse Family Partnership. 
Based on our input, the final Congressional language directs HHS to "ensure that States use the funds to support models that have been shown in well-designed randomized controlled trials, to produce sizeable, sustained effects on important child outcomes such as abuse and neglect." In May 2007, the Congressionally-established Academic Competitiveness Council, to which the Coalition has been a main advisor on evaluation, issued an important report calling for evidence-based reforms in federal math and science education programs. The Council, led by Secretary of Education Margaret Spellings and comprised of top officials from 13 federal agencies, issued a report that includes, as a main element, the Coalition's Hierarchy of Study Designs (see full report, with the Hierarchy on p. 14). NEW The Second Chance Act -- which per our input contains a 2% set-aside for rigorous evaluations of strategies to facilitate prisoner re-entry into the community -- has moved closer to enactment. The House passed its version of the Act in November, containing a provision that we helped develop to set aside 2% of program funds for rigorous -- preferably randomized -- evaluations (H.R. 1593). In August, the Senate Judiciary Committee passed its version of the Act, containing the same provision (S. 1060).

Comment #4 (Posted by Reid Lyon)

David Francis, Chair of Psychology at the University of Houston, was asked by IES to discuss the IES RFIS study with respect to strengths and weaknesses. He did so at the IES conference today in Washington. David is one of the most highly respected methodologists in the country, among other notable scientific capabilities, and presents an objective, coherent, and balanced review of the RFIS - warts and all. His comments are instructive for any who choose to read them objectively, and his recommendations for both interpreting the data and extending the study in needed ways make great sense. It is also noteworthy that he accomplishes his review of the RFIS without invoking any ad hominem attacks on individuals - a practice that has been and still continues to be a defining characteristic of discourse within many educational disciplines, most notably reading. Dr. Francis's analysis of the RFIS, presented at the IES meeting, is below: Reading First Impact Study: What have we learned and where do we go from here? David J. Francis Texas Institute for Measurement, Evaluation, and Statistics University of Houston Presented at IES Research Conference June 11, 2008 Washington, DC Thank you Tracy, Beth, and James! It’s an honor to have the opportunity to read and comment on this important study of this most important piece of federal legislation. Without a doubt, NCLB has brought intense focus on our nation’s educational system and especially on the education of our most at-risk students: children growing up in poverty, children with disabilities, children who speak languages other than English. The future of our nation is only as bright as the educational outcomes for these students at-risk. Within that context, Reading First is the most massive, focused and sustained educational intervention of any type, ever undertaken, anywhere in the world. 
Whether the results are seen as disappointing, expected, surprising, or heartbreaking depends on many factors, most having little or nothing to do with the study itself, but with one’s philosophy about how students learn to read and one’s cynicism about federal efforts to improve what has long been viewed as a local responsibility. My own sense is that something quite significant has been accomplished here, if we can just keep our eyes on the prize – improving educational outcomes for children and using science to inform the process. At the very least, we have succeeded in imposing a rigorous impact evaluation onto a piece of federal legislation, an event that is all too rare in the US, where policy is made and evaluations of it just seem to document what was done along with guesses about what outcomes have been obtained. Having achieved the important milestone of imposing a rigorous impact evaluation into discussions about Reading First, it is now incumbent on us to determine what the results do and do not mean, to determine what other data to bring to bear on the question, and to strive to determine what steps to take to achieve the goal of having all children – that is, ALL CHILDREN – read on grade level by the end of grade 3. In discussing the RFIS, I would like to begin by talking about what the Reading First Impact Study is not. Some critics of RF (or of SBRR) would use the results of the RFIS to discard the initiative, or as an indictment of the treatment, or of the managers of the program. But none of these conclusions is a logical necessity given the study and its findings. The findings of the RFIS will undoubtedly be cast, incorrectly, by critics of the National Reading Panel Report and the NRC report, Preventing Reading Difficulties in Young Children, as a study of the efficacy of Phonics based instruction. Some will even go so far as to say that the results of the RFIS were predictable and a consequence of overstated results in the report of the NRP. 
I would argue that these individuals do not know the difference between efficacy and effectiveness. Reading First was rolled out over a period of two to three years to roughly 5,000 schools in 50 states. In some states, RF was implemented in over 300 schools; in Texas the number was over 700. RF was not designed as a test of the efficacy of explicit instruction, or of the NRP’s “Big Five”, or even the RF operationalization of this approach to reading instruction. No, RF is not about the efficacy of SBRR; it is about the effectiveness of RF as a large-scale intervention to improve the reading achievement of students in high-poverty, low-performing schools. And the RFIS should be thought of as an effectiveness study. To see that efficacy and effectiveness are not the same, it is instructive to look to medicine. In his book Better, Atul Gawande informs us that each year some 2 million Americans become infected while they are patients in a hospital and that some 90,000 of these individuals will die from that infection. The number one reason that patients without infections become patients with infections while in the hospital? Doctors and nurses fail to wash their hands as often as they should. Controlling infections is as simple as getting doctors and nurses to wash their hands correctly with antibacterial soap between each patient contact. The treatment is highly efficacious. Washing one’s hands properly with an antibacterial soap stops the spread of germs. But the treatment is also ineffective. It is ineffective because doctors and nurses cannot comply with the treatment 100% of the time. Gawande cites an example from his own hospital where an all-out effort to improve on hand washing got the hospital’s hand washing compliance up to 70%, but infection rates did not change. Seventy percent was simply not good enough. I would submit that washing hands is easier than teaching children to read. 
It’s instructive to keep this distinction between efficacy and effectiveness in mind as we consider the results of the RFIS further. There has been considerable commentary in the media and on the internet regarding the RFIS, including criticism that the study was fundamentally flawed, and disparaging comments about the use of a RD design instead of an RCT. These criticisms seem misplaced. The RD is perfectly appropriate in this context. I would argue (as has the study team) that the RD is preferable in this context given the desire to intervene in all schools serving the most at-risk and disadvantaged students. In fact, the potential problems that we will discuss about the RFIS would have been no less likely had the study team been able to conduct an RCT. So, while I would disagree that the study was fundamentally flawed – meaning that the study could not hope to provide an unbiased estimate of treatment impacts of RF – I would agree with those individuals who have said that we must exercise caution in drawing inferences from the findings from the RFIS, and in particular we must keep in mind this distinction between efficacy and effectiveness. In talking about the RFIS, we first must be careful to distinguish between two treatments. The first of these treatments is RF Funding (i.e., the money itself). The money is basically uninteresting from a scientific point of view, but not from a policy point of view. It is relatively easy to follow the money and to make sure that non-RF schools did not receive RF funds. But the money is not the only treatment, and, in fact, is not really the treatment of interest to researchers and educators (and probably not to Congress, either). The second, and more interesting treatment, is the RF suite of instructional materials, interventions, assessments, professional development and technical assistance that states and districts were supposed to deploy through the expenditure of RF Funds. 
The treatment in RFIS is RF Funding, but the language of the RFIS and the interpretation of the results are couched in terms of the RF suite of instruction. There are at least two potential pitfalls here; that is, at least two stumbling blocks to valid inferences about the RF intervention from the RFIS data. First, we must consider the extent to which schools that did not receive RF funds also implemented RF, or at least substantial components of RF. RF did not appear in a vacuum, but rather was a cornerstone piece of legislation within NCLB, which significantly increased the attention in all schools on reading achievement. Thus, all schools and districts were under pressure to improve achievement and to ensure that all children were learning. There is widespread anecdotal evidence that many non-RF funded schools were purchasing the same materials and interventions that were in use in RF schools in those districts, and that districts used RF funds, as they were allowed to do, to provide RF professional development for teachers in all schools. I know that such bleed-over took place in at least one district in the RFIS that happens to be located in a state where I am involved in RF. The critics might nonetheless say that the fact that districts purchased RF materials for non-RF schools with non-RF funds indicates that in the absence of RF funds, districts would have done the same thing for those schools that received RF funds. That is, the critics would say that districts would have purchased the RF intervention for RF schools with non-RF funds if RF funds had not been available, and thus, the measure of impact remains unbiased as a measure of what would have happened to schools in the absence of RF.
However, this inference requires that we assume that districts would have made these same choices for the expenditure of their own funds in the absence of the significant technical assistance and professional development that had been developed with RF funds at a national and state level. In addition, this inference requires that we assume that in the absence of RF funds, districts would have been in a position to financially support both the RF and non-RF schools in this endeavor, and that assumption is not at all clear. In essence, to the extent that the flow of RF funds to RF schools enabled the district to make different choices and expend additional district resources to place RF-like instruction in non-RF schools, the impact estimates from the RFIS will be too small; that is, treatment effects will be underestimated. The extent to which this issue affected the results of the RFIS will become clearer over time as the final report is released and the results of the implementation and impact studies are combined. For now, we simply must keep in mind that our inferences about treatment impacts require that we assume that the flow of RF dollars to RF schools neither directly nor indirectly affected the instructional practices in the non-RF schools. Otherwise, we no longer have a valid counterfactual represented by the results in the non-RF schools. I have already indicated that I thought the use of the RD design was a wise choice on the part of the study team. But I do have two significant concerns about how the RD was operationalized. As I understand it from the report, LEAs were identified that were RF eligible, had received RF funding from their SEA, and had used a quantitative approach to selecting schools to fund. Then, within these LEAs, schools that were RF eligible were selected based on their proximity to the cut-score used within that LEA.
This design strategy works to make the schools most comparable and, in essence, asks the question “Did the district improve the performance of the schools to which it chose to award RF funds relative to those schools to which it chose not to award RF funds?” This design certainly provides one answer to the question of impact of RF, but it is not the only RD design that could have been employed, and not the only way to conceptualize the question of RF impact. An alternative to the school-level RD design conceived of by the study team is to use the district as the unit of assignment. Specifically, RF funds were awarded to states to improve the performance of their most at-risk students in their neediest districts. That is, in every state, a quantitative index was used to determine which districts in the state were eligible for RF and which were not. Could not the RFIS have been conceptualized in such a way as to compare changes in performance between eligible and non-eligible districts? Moreover, isn’t it likely that a study conducted at the district level would have been less prone to the types of contamination that were previously alluded to? It seems that we cannot rule out, without looking at the data, that districts receiving RF funds could be making gains on districts that were not eligible to receive RF funds even though RF-funded schools within the eligible districts are not making gains on non-RF schools within the same districts. Has this possible scenario been discussed, and are there plans to examine this possibility using state achievement data for schools in eligible and ineligible districts? If the lack of differences between RF and non-RF schools in the RFIS is due to bleed-over of the intervention, but the intervention is nonetheless effective for teaching students how to read, then we would expect that eligible districts will improve their performance relative to ineligible districts, and this possibility should be examined.
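
The bleed-over argument can be made concrete with simple arithmetic. The sketch below assumes, purely for illustration, that RF-style instruction has the same effect wherever it is used and that groups differ only in the share of schools using it. The function name, the effect size, and every uptake share are hypothetical and do not come from the RFIS.

```python
def observed_impact(true_effect: float,
                    treated_uptake: float,
                    comparison_uptake: float) -> float:
    """Expected measured difference between funded and comparison groups.

    Toy assumption: the instruction has one constant effect, and only the
    share of schools actually using it differs between the two groups.
    """
    for share in (treated_uptake, comparison_uptake):
        if not 0.0 <= share <= 1.0:
            raise ValueError("uptake shares must be between 0 and 1")
    return true_effect * (treated_uptake - comparison_uptake)


# All numbers below are hypothetical, chosen only for illustration.
TRUE_EFFECT = 0.30  # effect of RF-style instruction, in standard deviation units

# School-level contrast inside an eligible district with no bleed-over.
clean = observed_impact(TRUE_EFFECT, treated_uptake=1.0, comparison_uptake=0.0)

# Same contrast when 60% of the non-RF schools adopt RF-like instruction.
contaminated = observed_impact(TRUE_EFFECT, treated_uptake=1.0, comparison_uptake=0.6)

# District-level contrast: eligible districts (80% of all schools using the
# instruction) against ineligible districts (10%).
district = observed_impact(TRUE_EFFECT, treated_uptake=0.8, comparison_uptake=0.1)

print(f"no bleed-over:       {clean:.2f}")
print(f"with bleed-over:     {contaminated:.2f}")
print(f"district-level view: {district:.2f}")
```

With these made-up shares, the school-level contrast shrinks from 0.30 to 0.12 once bleed-over is allowed, while the district-level contrast (0.21) retains more of the effect. That is the pattern the district-level comparison proposed above would be looking for.
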
My major criticism of the RFIS as designed is that the study was not powered as an effectiveness trial. Recognizing that RF was being implemented across such a broad array of contexts with likely substantial heterogeneity in implementation, and substantial design variation across districts and states, it is very unfortunate that the study was not designed to accurately estimate variability in treatment effects and the effects of key moderators of treatment impacts. It is wholly unsatisfying to have tests of significance on variability in treatment impacts be declared “non-significant” when p-values are .06 knowing that the study was underpowered for testing these effects. When we reach the stage of widespread implementation, such that the primary question is one of effectiveness, the most critical questions to be asked concern the factors that affect treatment impacts. The treatments have been shown to be efficacious in more tightly controlled experimental studies; the most important question now is “What are the factors that moderate the effectiveness of the treatments?” The reason this question becomes so important is that if we are not getting the impacts that we want, we need to know what to do differently. Understanding the factors that moderate treatment effectiveness can point the way to changing program implementation so as to bring about the desired results. Failing to power the study to identify and understand these moderators limits our ability to make informed choices about how best to modify the program and its implementation. Some factors that might merit consideration, in addition to those already examined, include the number of RF schools that an LEA is trying to serve, the grade-level configuration of the school, the number of schools served by an individual reading coach, and the degree to which assessments are used to inform instruction. 
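
A rough calculation shows why a study sized for an average impact can be underpowered for moderators. The sketch below uses a normal approximation and treats the sample as independent units split into equal groups. It ignores the clustering of students within schools, which a real RFIS power analysis would have to account for, and the sample size and effect size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(effect: float, se: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for a given effect and standard error."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


def se_main_effect(sd: float, n_total: int) -> float:
    """SE of a treated-minus-comparison difference with two groups of n_total / 2."""
    return sd * sqrt(4 / n_total)


def se_interaction(sd: float, n_total: int) -> float:
    """SE of a difference of two such differences with four cells of n_total / 4."""
    return sd * sqrt(16 / n_total)


# Hypothetical design: 400 independent units, outcome SD of 1, effect of 0.3 SD.
N_TOTAL, SD, EFFECT = 400, 1.0, 0.3

main_power = power_two_sided(EFFECT, se_main_effect(SD, N_TOTAL))
moderator_power = power_two_sided(EFFECT, se_interaction(SD, N_TOTAL))

print(f"power for the average impact: {main_power:.2f}")
print(f"power for a moderator effect: {moderator_power:.2f}")
```

Under these assumptions the moderator contrast has twice the standard error of the average impact, so a design with roughly 85% power for the main effect has only about one chance in three of detecting a moderator effect of the same size. A p-value of .06 from such a test says little about whether treatment impacts truly vary.
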
Finally, it is important that the RFIS team take a developmental perspective when thinking about the way in which reading unfolds for individual children. That is to say that the effects of instruction in reading are cumulative, and skills not learned early must be learned remedially in order to provide an adequate foundation for the skills to come later. Program coherence across instructional grades can pay real dividends to students, but the current analyses do not take advantage of these cumulative effects in examining treatment impacts. By examining treatment impacts at each grade or across grades and treating all students within a grade as the same, any benefits of spending multiple years under the same instructional regime are being muted. In our evaluation of Direct Instruction (Carlson & Francis, 2003), we found that program effects increased each year that students remained in the same school. For example, effects for second graders who had been in the school three years exceeded those for second graders who had been in the school only two years when results were compared to students who had been in the control schools for a comparable length of time. Because turnover happens at the student level each year in a school, and because program impacts may be smaller for students with fewer years in the school, it is important to examine this potential student-level moderator in order to develop a complete picture of how the intervention works within a school. Note that it is not necessary to have longitudinal data to examine this effect; one needs only to have information concerning the first year of a student’s continuous enrollment in the school. In closing, let me just return to a point made at the beginning, namely how important it is that we have succeeded in conducting a rigorous evaluation of the impact of RF.
While there is still much to do and much to learn from the information that has been collected, the RFIS has raised the bar for future evaluations of federal policy. Hopefully, in interpreting the results of the study, we will all keep in mind what the study is and is not, and we will use the science to improve the practice and will continue to improve and develop ever more rigorous evaluations of federal policies in order to ensure that our policies are achieving the results for which they were drafted. It would be tragic for our country if the reward for subjecting policy to rigorous evaluation was to throw out the policy rather than to improve it, when many policies have continued in perpetuity in the absence of any rigorous information about their effectiveness.
