Evaluating evaluations

March 21, 2013

Practitioners are slammed with information overload. To help cut through the avalanche, a set of questions can be useful for determining whether or not a study is worth investing your time and attention.

Most practitioners are inundated daily with new information describing products or treatment choices. The inflow is familiar to every professional and includes hard-copy journals, e-journals, newsletters, and advertising brochures. Among the possible lines of evidence supporting safety and effectiveness, the presentation of clinical results is always a big plus. However, not every study report should be treated with the same degree of reverence.

After reading about 20 x 10^4 articles during my career as a clinical scientist, I want to share my criteria for assessing studies. The following list of questions may help readers cope with the daily flood of information.

Category 1: Study Design

Was this a randomized, controlled, double-masked, prospective trial (RCT)?

You can earn 4 credit hours at a prestigious university learning why this is the gold standard among study designs. There are many benefits associated with this approach, but implementation is not universally practical.

The masked RCT design helps reduce the potential for bias that occurs when patients, doctors, and staff learn their assigned treatment conditions prior to or during the evaluation. For example, it lowers the probability that the worst patients were inadvertently assigned to one treatment group. A prospective, controlled design, addressing a well-defined, simple question, reduces the potential that a confounding variable will affect the study results and impact one treatment group and not the other. Overall, this design leads to a stronger cause-and-effect determination.
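For readers curious about the mechanics, random assignment can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular trial: the patient IDs, the block size of 4, and the coded arm labels A and B are all hypothetical. Permuted blocks keep the two arms the same size, and the coded labels stand in for masking, since the key linking A and B to the actual treatments would be held by someone outside the clinic.

```python
import random


def randomize(patient_ids, block_size=4, seed=2013):
    """Permuted-block randomization into two coded arms, "A" and "B".

    Each block contains equal numbers of A and B in random order, so the
    arms stay balanced and nobody can choose who gets which treatment.
    block_size and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    allocation = {}
    for start in range(0, len(patient_ids), block_size):
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)
        for pid, arm in zip(patient_ids[start:start + block_size], block):
            allocation[pid] = arm
    return allocation


# Hypothetical patient IDs P001 through P012
patients = [f"P{i:03d}" for i in range(1, 13)]
allocation = randomize(patients)
for pid, arm in allocation.items():
    print(pid, arm)
```

Because assignment depends only on the shuffle, the sickest patients are no more likely to land in one arm than the other.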

However, not every study can be conducted as a masked RCT. Studies involving medical devices are particularly challenging. For example, the surgical implantation of an IOL with a unique design creates problems when trying to mask the surgeon. The same is true with unique contact lenses. In the lens care arena, we might want all subjects to use disinfecting solutions dispensed from identical bottles. However, the transfer of the solutions to new bottles can significantly alter the product’s chemistry due to interactions with the plastics.

What was the control treatment?

First, I ask whether the treatment represents a relevant clinical choice. A head-to-head comparison between two competing treatment approaches can answer key medical and practice management questions. The study might determine that the treatments were basically equivalent or that there were large differences in safety or effectiveness. However, clever study designers seeking the magic p<0.05 level for statistical significance may choose to aim low, using an older treatment approach as the control in order to ensure that the new treatment wins. While this strategy helps ensure FDA approval and market entry, the old treatment may not represent current standards of practice. In addition, establishing clinical equivalence (i.e., not substantially different) is often much easier than establishing superiority.

Even when a contemporary control treatment is planned, the specific treatment selected can impact study outcomes. If the control (e.g., an established, marketed, effective pharmaceutical) is expected to perform equivalently to the test treatment, the expectations of both the patients and the doctors rise. All measures of effectiveness tend to improve when the participants realize that 50% of the patients were assigned a treatment condition recognized as effective. If an ineffective, untested treatment is used as a control (e.g., a placebo), expectations are often driven downward.

 

Were the right subjects recruited?

Here’s the dilemma. The perfect study is populated with patients who show up for all visits and complete all case report form questions with few errors. Perfect subjects have few non-essential pre-existing conditions that impact treatment efficacy and have a low probability for developing unrelated adverse events. Perfect patients are never extremely old, never too young, and never take any OTC or prescription products except those being investigated. Recruiting perfect study subjects allows for analyses to be completed quickly and makes interpretation easy. Unfortunately, once a drug or device is approved, practitioners cannot control the cases that walk into their waiting rooms. Compared to the study population, real patients often include older and younger subjects and pregnant women, individuals who represent a much broader cross-section of the population. Enrolling more “Main Street” patients often drives up the number of patients targeted for enrollment and increases study costs. However, there is strong support for broader, more realistic inclusion/exclusion criteria that better predict product performance in the marketplace.

Was the number of enrolled patients sufficient?

Numbers matter. Anecdotal reports concerning unique cases are extraordinarily valuable for the progression of science and medicine. This new, initial information is a catalyst for further explorations. However, these seeds should not be mistaken for mature trees. The smaller the study, the smaller the generalization. What is more believable: a 10% advantage in the treatment group in a study involving 10 patients or a 10% advantage in a trial involving 1,000 patients? Common sense and a good statistical analysis lead to the same conclusion.
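To put rough numbers on that question, here is a small Python sketch. The success rates (60% with the test treatment, 50% with the control) and the equal split between arms are assumptions chosen only to produce a 10% advantage, and the normal-approximation (Wald) interval used here is crude for very small samples.

```python
import math


def diff_ci(p_test, p_control, n_per_arm, z=1.96):
    """Approximate 95% confidence interval for a difference in proportions.

    Uses the normal (Wald) approximation, which is only a rough guide
    when the arms are very small.
    """
    diff = p_test - p_control
    se = math.sqrt(p_test * (1 - p_test) / n_per_arm
                   + p_control * (1 - p_control) / n_per_arm)
    return diff - z * se, diff + z * se


# Assumed rates: 60% success with the test treatment, 50% with the control
for total in (10, 1000):
    low, high = diff_ci(0.60, 0.50, total // 2)
    print(f"{total:>5} patients: 10% advantage, 95% CI {low:+.0%} to {high:+.0%}")
```

Under these assumptions, the interval for the 10-patient study runs from roughly a 51% disadvantage to a 71% advantage, so it tells us almost nothing. The interval for the 1,000-patient trial runs from roughly 4% to 16% and excludes zero.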

Was the number of sites sufficient?

Having more study sites matters. Once again, the issue is the generalization of study results. Will the solid results observed among patients in Brooklyn apply to patients in Iowa City? Having a wide geographic diversity has numerous benefits. It helps address confounding variables such as seasonal effects, ethnic variability, humidity, and more. A prominent example is the success of extended-wear lenses worn at sea level (Norfolk, VA) as opposed to higher altitudes (Denver, CO).

Was the study duration sufficient?

Sometimes the answer to an important medical question can be obtained very quickly. For example, was a contact lens comfortable upon application? A quick study can address some simple questions. However, the development of adverse reactions and primary packaging failures are two examples of issues where time matters. The weak antimicrobial profile of a lens care product might be identified only after several months of daily use.

 

Category 2: Interpretation

Were inferential statistics performed?

We all hate statistics and suspect that we are being manipulated by the clever geeks. But these calculations provide useful tools that help guide interpretation. A statistically significant p value (p<0.05) means that, if the two treatments truly did not differ in the primary endpoint, a difference at least as large as the one observed would be expected less than 5% of the time. Declaring a difference that does not really exist is a Type I error, and the 0.05 threshold limits that risk to 5%. In other words, the results reported might have been a fluke, but the chances of this are small. Statistics should not be the only criterion for measuring success, but they provide strong support.
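A short simulation can make the 5% figure concrete. The Python sketch below is an illustration under stated assumptions, not a recommended analysis: both groups are drawn from the same normal distribution (so any "significant" difference is a fluke by construction), the standard deviation is treated as known, and the group size, number of simulated trials, and random seed are arbitrary.

```python
import math
import random


def two_sided_p(mean_a, mean_b, sigma, n):
    """Two-sided p value from a z-test on two group means (known sigma)."""
    z = (mean_a - mean_b) / (sigma * math.sqrt(2 / n))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(trials=2000, n=50, alpha=0.05, seed=1):
    """Share of simulated no-difference trials that still reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both arms come from the same distribution: there is no real effect
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sided_p(sum(a) / n, sum(b) / n, 1.0, n) < alpha:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"Trials falsely declared significant: {rate:.1%}")
```

The printed rate should land close to 5%, which is the Type I error rate the p<0.05 threshold accepts.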

Were the results clinically meaningful?

Is the hunt worth the chase? Given enough enrolled patients, even a small difference in the mean values can be shown to be statistically different. For example, in a study comparing two topical anti-infectives, conjunctivitis was resolved after 1.5 days using Treatment #1 compared to 1.7 days with Treatment #2. While these results might be statistically significant, are the differences meaningful? In managing your practice, is it wise to switch your patients to a new product if the clinical benefit is small, and you have the choice of using a marketed product with an established safety and effectiveness profile?
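The conjunctivitis example can be run through a quick calculation. This Python sketch assumes a standard deviation of 1.0 day in resolution time and uses a simple z-test; both are assumptions for illustration, since the example above gives only the two means.

```python
import math


def p_value(mean_1, mean_2, sd, n_per_arm):
    """Two-sided p value from a z-test on two means (sd assumed known)."""
    z = (mean_1 - mean_2) / (sd * math.sqrt(2 / n_per_arm))
    return math.erfc(abs(z) / math.sqrt(2))


# Means from the example: 1.5 vs. 1.7 days; the 1.0-day sd is hypothetical
for n in (20, 200, 2000):
    p = p_value(1.5, 1.7, 1.0, n)
    print(f"{n:>5} per arm: difference 0.2 days, p = {p:.3g}")
```

Under these assumptions, the same 0.2-day difference is far from significant with 20 patients per arm, slips just under 0.05 with 200, and becomes overwhelming with 2,000. The clinical benefit never changes; only the sample size does.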

Were the right questions asked?

You don’t know what you don’t know. Sometimes study designers and investigators guess wrong. They prepare questionnaires that ask the wrong questions or fail to ask the right ones. Patient interviews and spontaneous comments from study coordinators frequently provide key information. The casual conversation between a clinical research associate and study coordinator has uncovered many unexpected issues in the consumer products area, requiring re-designs.

Do the results make sense?

My SAT coach always told me to ask whether my mathematically computed answer made sense before I filled in the bubble. The same holds true here. Sometimes the results of a single study are just plain wrong. Resist changing anything in your practice until it makes common sense, especially in cases where safety is at risk.

Can the results be replicated?

Finally, we come to the most important factor: patience. Assessing whether a new treatment or product is a winner may take time. This is especially true regarding safety. We have all experienced situations in which individual adverse events are initially dismissed as rare anomalies. A noticeable trend might be detectable only after months or years of experience. Tracking results and objectively evaluating your experience over time is very important.

 

FACTORS YOU SHOULD NOT CONSIDER

This guide to interpreting clinical studies would not be complete without commenting on the factors that should not be considered.

Who sponsored the study?

It may be hard to believe, but drug and medical device companies do not have institutional policies that require researchers to lie and cheat. My experience is that companies are populated with honest people who want to develop safe and effective products. Besides, conspiracies always fall apart, and there is too much to lose. Studies destined for FDA review are often designed to the highest standards, and study sites are intensely controlled to avoid fraud. In contrast, self-funded studies at academic sites and NIH-sponsored studies are often only loosely monitored.

Where was the study published?

Getting published in a nationally recognized journal is difficult and very time consuming. The time from initial submission to publication might be measured in years when you add review time, rejections, and re-writes. There is bias by editors and publishers, and this impacts acceptance rates. A study published in a contemporary electronic journal or published by a manufacturer should be judged based on the same criteria described above.

What was the study location, school, or institution?

Results of a good study, meeting all design and interpretation criteria, should be considered regardless of the source (author or site). The best ideas and the best studies sometimes come from unknown little places. Studies conducted at nationally recognized institutions should be considered on their merits.

 

FYI

During his long career at Alcon Laboratories, Dr. Stein led clinical teams responsible for the development of many lens care products. He has published more than 30 articles and is currently an independent writer and consultant. Reach him at SummerCreekC@gmail.com.

10 key questions to ask when evaluating clinical studies

STUDY DESIGN:

  • Was this a randomized, controlled, double-masked, prospective trial (RCT)?

  • What was the control treatment?

  • Were the right subjects recruited?

  • Was the number of enrolled patients sufficient?

  • Was the number of sites sufficient?

INTERPRETING THE RESULTS:

  • Were inferential statistics performed?

  • Were the results clinically meaningful?

  • Were the right questions asked?

  • Do the results make sense?

  • Can the results be replicated?