Measuring the Measurers

By PETER ELIAS, MD

I precipitated a recent online discussion about healthcare’s obsession with measurement (“quality metrics” is the current buzz phrase) when I quoted two aphorisms that highlight some problems with metrics and targets:

Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

Campbell’s Law: “The more a metric is used, the more likely it is to corrupt the process it is intended to monitor.”

One comment rubbed me the wrong way because it implied that measurement reduces harm:

“In my experience, what unites liberal and conservative doctors alike is their conviction that they are being measured unfairly….My view is that those who are upset about the burden on physicians of unnecessary measurement need to show that they understand the death and injury toll of lack of measurement caused before we trust them with the pruning knife.”

Measurement does not reduce harm or improve quality.

I love data (the result of measurement). As a clinician I used measurement all day, every day. I couldn’t have practiced medicine without it. But measuring a patient’s blood pressure does absolutely nothing to change that patient’s blood pressure. What improves blood pressure is intervention. If the right thing is measured correctly, and if the measurement contributes to well-tailored interventions, then improvement occurs. Measurements must be part of a process by which one can improve the quality of one’s interventions.

The operative phrase here is “…must be part of a process…”. Measurement does not change blood pressure. (Actually, it often does change blood pressure, but usually for the worse, albeit temporarily.) Measurement does not improve safety or quality. Measurement is the collection of data, nothing more. My complaints about metrics are fourfold:

  • The common but mistaken claim that things will improve because you measure them.
  • The common practice of measuring what is easy to measure rather than what will facilitate understanding and change.
  • The seductive ease with which people stop at the measurement step, avoiding the hard work of using good data to design and modify interventions.
  • The failure of the overarching system that used measurements to drive change to then measure the impact of its interventions.

Perhaps a non-medical thought experiment will illustrate my perspective.

Imagine the Woodworkers Guild is unhappy that some craftsmen do shoddy work, that this harms customers, and that it undermines the trust and respect due those woodworkers who do the best work. They decide to do something, and start by studying a large and varied sample of woodworkers. They collect and analyze a comprehensive data set and find that, among all the metrics they collect, the quality and sharpness of saws is the best predictor of high-quality work. This makes sense: the best craftsmen will purchase good tools and keep them in good condition. They decide they should leverage this information to improve the overall quality of woodworkers. They devise a program where woodworkers can have their saws inspected and graded. If they have high-quality saws in good condition, they get a logo to use on their products, on their website, and in their storefront window. This allows them to charge more. In addition, the results are publicly available, so potential customers can look for ‘saw-of-approval’ woodworkers. Which of the following do you think is more likely?

  • Poor-quality woodworkers will go back to school or take extra training in order to learn new skills, and in the process will find they need to buy and maintain better saws.
  • Poor-quality woodworkers will buy an expensive saw set, and keep it stored safely to show the inspectors, in order to win a ‘saw-of-approval’ accreditation.

Exactly. The craftsmen who have high-quality saws have them because it is part of how they do high-quality work. Acquiring a set of high-quality saws does not turn a poorly trained, lazy, or hurried woodworker into a skilled craftsman. (If that worked, I’d rush out and buy an expensive guitar and go on tour.) Measuring and incentivizing good saws will not improve woodworker quality. That would require using measurements that identify faulty woodworking processes, followed by interventions to improve those processes.

So, now let’s talk about medicine. I’ll start with an early and influential example of quality improvement: the ventilator management of ARDS (acute respiratory distress syndrome) by Drs. Alan Morris and Brent James.

Unhappy with the outcomes of ARDS patients on ventilators in the ICU and aware of huge variations in the clinical approach, they identified some tentative ‘best practices’ for ventilator management of ARDS. They did not use these as targets or incentivize best practices. Instead, they made a recommended standardized protocol available for all clinicians in the ICU. The clinicians were told they were free to use, modify, or ignore the order set, but they were required to document what they did that was not standard, and why. They collected data on clinician behavior, patient characteristics, and outcomes. And then:

  • They identified poor outcomes in patients treated with the standard order set and used that information to repeatedly modify the order set.
  • They identified good outcomes in patients NOT treated with the standard order set, and used that information to repeatedly modify the order set.

Outcomes improved: survival rose from 10% to 40%. Measurement did not do this. Using measurements to design and continuously modify interventions did this.

An example of success on a smaller scale, from our practice many years ago: we measured how well we were doing on some pediatric standards of care. We found a low rate of lead screening results by 24 months of age. We (the clinicians) provided each clinician with their data and reported quarterly on what percent of the patients had lead results in the chart. There was no change. We then collected more data, looked at it more carefully, and were able to identify the failures in our system. We instituted a set of changes and watched our results climb steadily. Again, measurement didn’t cause the improvement. Our USE of the measurement (a deeper understanding of the process and the ability to find and fix the problems) made the difference.

Here is a failure that I think represents a common phenomenon. Our institution required screening for fall risk in patients over 65 and penalized clinicians who did not meet the screening target. A form was created so that screening data could be measured. Nearly 100% of clinicians reached the target. However, the clinicians mostly delegated the screening to the nurse rooming the patient, the institution refused to set up a treatment arm or referral process to PT/OT for further evaluation, and no attempt was made to collect data about a decrease in fall-related harms or to look at how the process could be improved. Measurement happened and was documented, but it went no further than that. But our institution met the CMS standard. A similar thing happened with PHQ-9 screening for depression, where the incentivized target was met but no treatment arm was put in place and no attempt was made to see if the intervention was working or how it could be improved.

If we are going to require or incentivize measurement, we should also require or incentivize that it be done productively and not for its own sake.

Peter Elias is a retired primary care physician living in Maine. He is an active member of the Society for Participatory Medicine and a contributing blogger with the Deductible. You can follow him on Twitter at @Pheski.
