Well thanks again for the invitation. I’m excited to be able to talk about PROs in oncology today. This is an area where the use of patient-reported outcomes is really growing and changing. So Ari and I are here to talk about some of the neat things that are going on, some of the challenges that we’re facing, and share some examples.
Just quickly, an overview of the session. I’ll talk about the evolution of FDA’s use of PROs in oncology, and about using legacy oncology instruments across multiple stakeholders. And Ari will speak about EMA acceptance of health-related quality of life claims. And then he’ll focus on some special topics related to PROs in oncology. And then, as Bill mentioned, we’ve set aside some time at the end for audience participation, questions and answers, sharing your experiences as well.
There’s been a real evolution of PRO data use in oncology at FDA over the last several years. And some of this stems from the work that’s been going on in patient-focused drug development. More broadly, there are of course, as Nikunj mentioned this morning, PDUFA V and VI and 21st Century Cures, which have really helped to guide and motivate FDA reviewers and industry to be more patient focused. And I think there have been some key folks in oncology and hematology at FDA who have really taken this charge to heart. And we’ve seen some major changes. This idea of patient-focused drug development of course is now legislatively mandated in some regards, but there’s been a real push for it from the leadership at FDA, so Scott Gottlieb, as has been mentioned already today, has been a real advocate. And so is the center director, Janet Woodcock. This is just a statement from Commissioner Gottlieb. But I think what’s important here is that there is a real focus on patient-focused drug development and including the patient voice, but there is also a continued focus on good methods. And so while FDA is showing flexibility and being collaborative in helping move patient-reported outcomes and other aspects of patient-focused drug development forward, there’s still a need for rigor, and patient-reported outcomes are still thought of as something that needs to have that rigor and be approached with a real scientific viewpoint. And then the center director has made similar statements; basically she thinks that patients are truly the experts in their condition, and so involving patients is critical to drug development.
Apologies if you’ve seen this slide. I like to share it because it tells us that we’ve come a long way. In the 1938 Federal Register Notice, there was a statement that said “information in drug labels should appear only in such medical terms as are not likely to be understood by the ordinary individual.” So clearly we’ve come a long way. But I think we can do so much more. And oncology, hematology, and the Oncology Center of Excellence now at FDA are doing a lot more, I think. Using patient-reported outcomes in oncology settings comes with its own particular set of challenges and I think may be more complicated than in many other therapeutic areas. So there’s been a long history of not using a lot of patient-reported outcomes in oncology. Plus, there are harder endpoints, overall survival or progression-free survival, things that are considered to be more objective, maybe easier to measure. And they’ve done a pretty good job of getting drugs to market over the years. But now there’s this real focus on understanding how the patient is actually feeling while they’re taking these products, and then over time.
One of the challenges right now is that, while FDA reviewers in the Office of Hematology and Oncology Products are interested in PRO data, they’re dealing with trials now that were often started or planned four, five, six, seven years ago in some cases, where not as much thought was given to the selection and implementation of patient-reported outcomes. So they’re receiving these data now that weren’t really intended for FDA or for regulatory purposes. But they still have an interest in seeing it. And so now the challenge is to figure out what we can do with these data from the trials that are already underway or that are wrapping up now, but then also being more proactive to try to encourage better PRO data collection in oncology settings going forward.
One of the things that FDA did in 2016 was to release a paper on core concepts. These are Core Concepts to consider measuring in oncology trials. And this included condition-related symptoms, treatment-related symptoms, and then some sort of measure of physical function or daily functional impacts. This isn’t necessarily for labelling, but these are considered important types of concepts that are generally applicable across many oncology settings though condition-related symptoms sometimes present some specific challenges. But the idea was to share this paper to try to encourage good measurement of these things. It doesn’t mean that other important things shouldn’t be measured, but at a minimum try to encourage these things to be measured from the patient perspective in trials.
And then another thing that OHOP is doing right now is piloting information requests. I’ve seen a number of these come out in the NDA setting but I’ve also started seeing them come out earlier in the IND meetings where—and I’ll talk through this in more detail in a couple of slides—where FDA is asking for some specific information about the patient-reported outcome data that they know companies have.
Here is the core concepts paper. Again, it suggests three specific distinct concepts to consider measuring in oncology trials. The first is disease-related symptoms. When applicable (and that caveat holds for all of these), in the case where patients have disease-related symptoms that may improve with treatment or where you may delay deterioration, these may be appropriate for labelling statements in the FDA product labelling. Of course, that is if they’re measured well with the instruments and the endpoints are appropriate, and all the normal caveats. For treatment-related symptoms, these are really important in oncology, and right now there is no easy pathway for labelling claims based on treatment-related symptoms or toxicities. But this is really important information to patients. There is a new tool, the PRO-CTCAE, that FDA has suggested for use (I’ll share more about that in a few minutes), and there is an industry working group that’s looking at ways to best implement the PRO-CTCAE to measure treatment-related symptoms in oncology trials, and NIH and FDA meet with that working group periodically to try to collaborate and figure out how best to measure these treatment-related symptoms, how to analyze the data, how to present and communicate these findings. Potentially there will be in the future descriptive information in labelling based on some of the work that’s going on right now. And then physical function or some measure of daily function. And this also has the potential for labelling language. We haven’t seen that yet, but again, we’re dealing with instruments and implementation from the past where these things weren’t pre-specified, they weren’t measured well and implemented well. And so as we go forward, we may see more statements in labelling come out based on disease-related symptoms and physical function.
And then, all of this information, plus other things that are measured using patient-reported outcomes, FDA generally will look at now to try to better understand what the totality of the evidence is saying, included as part of their medical product review, but not necessarily in labelling.
OHOP OCE is piloting these information requests. These were a little unsettling for sponsors when they received these initially. These were cases where companies had submitted a product for review, it was at the NDA review stage, and these information requests came out where FDA was aware that patient-reported outcome data were collected in the trial but not much had been done with it. And so they put together this information request, asking for more information about the PRO data, so asking for all of the pre-specified PRO analyses that the company had done, whether they were alpha-controlled endpoints or not. Interestingly, they were also asking for the sponsor to summarize their own clinical interpretation of what they think the most important PRO results are that could help inform the assessment of risks and benefits. They also were asking for things like information about the incidence of different healthcare utilization, they wanted to understand missing data and who was completing the instruments and when, and looking at that in terms of expected PRO assessments. They wanted to know, among those who we would expect to have completed a PRO assessment at each time period, how many people actually did. Missing data is such a big problem in oncology trials that FDA is really interested in looking at that and understanding if it’s informative missing data or if it’s more random missingness.
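As a rough illustration of the completion question described above (among patients expected to complete a PRO assessment at each time point, how many actually did), here is a minimal Python sketch. The `Assessment` record, its field names, and the `completion_rates` helper are hypothetical and are not taken from any FDA request template.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    patient_id: str
    visit: str
    expected: bool   # patient was still on study and due for this assessment
    completed: bool  # a PRO form was actually received


def completion_rates(records):
    """Return {visit: completed / expected}, counting only expected assessments."""
    expected = {}
    completed = {}
    for r in records:
        if not r.expected:
            continue  # patients no longer on study are not in the denominator
        expected[r.visit] = expected.get(r.visit, 0) + 1
        if r.completed:
            completed[r.visit] = completed.get(r.visit, 0) + 1
    return {visit: completed.get(visit, 0) / n for visit, n in expected.items()}
```

The point of the sketch is the denominator: the rate is taken over patients expected at that visit, not over everyone enrolled.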
And then looking at all PRO assessments at the total score level, domain score, and now at the item score level, just looking at descriptives, change from baseline, and looking at cumulative distribution function curves as well. So you can see there are a number of analyses here that they’re starting to ask for pretty consistently across programs, related to PRO data.
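For readers less familiar with these descriptives, the following Python sketch shows change from baseline and an empirical cumulative distribution function computed from it. The function names and the toy scores in the usage note are illustrative assumptions, not data from any trial.

```python
def change_from_baseline(baseline, follow_up):
    """Per-patient change: follow-up score minus baseline score."""
    return [f - b for b, f in zip(baseline, follow_up)]


def empirical_cdf(values):
    """Return sorted (x, proportion of values <= x) points for plotting."""
    n = len(values)
    points = []
    for x in sorted(set(values)):
        proportion = sum(1 for v in values if v <= x) / n
        points.append((x, proportion))
    return points
```

For example, baseline scores `[10, 20, 30, 40]` and follow-up scores `[5, 20, 25, 50]` give changes `[-5, 0, -5, 10]`. Plotting one such curve per treatment arm lets a reader compare the proportion of patients at or below any given change threshold.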
So the use of PRO assessments in oncology clinical trials, we touched on this a little bit when we talked about the different buckets of concepts that can be assessed or that the FDA is suggesting may be assessed. There are patient-reported outcomes for efficacy evaluation, so disease-related symptoms, potentially physical function. There are patient-reported outcome assessments for treatment-related symptoms or tolerability evaluation. And then there are patient-reported assessments for kind of a broader patient experience.
For efficacy, these again have the potential for FDA labelling, so this would include symptoms and proximal impacts like physical function. These may be appropriate for labelling claims, so actual claims with p-values where you have some statement you’re making about benefit. Or these may end up in labelling as more descriptive statements without p-values and so for conservative companies that may have implications on promotion. But it’s something that is put into labelling, these descriptive statements, to help inform clinicians and patients about how patients are feeling during the trial.
And then, the other broader health-related quality of life domains beyond physical functioning are still important to help understand efficacy, but as probably everyone in here knows, they’re often not included in product labelling but they may be useful in the medical product review.
And then, while this isn’t efficacy I included it on this slide, comparative tolerability. This is very rare for labelling, but the evidentiary requirements for obtaining a comparative tolerability claim really are—the rigor needs to be as strong as if you are evaluating efficacy, so the same principles that you would use to set up an endpoint for efficacy you would need to use for comparative tolerability as well.
Here is an example of efficacy data in product labelling. This is for the Jakafi product. And you can see this is information about the number of patients with 50% or greater reduction in total symptoms score by Week 24. This was the first secondary endpoint, so it was in the multiplicity plan, it was successful, and this was really important to the approval of the product, and so it achieved the labelling language here. The bottom right is evidence around the individual symptoms that make up the total symptom score. You can see here, this is more descriptive, there are no p-values here because these individual items were not established endpoints, alpha-controlled endpoints. But the agency thought it would be useful to put this descriptive information in here to be able to let clinicians and patients know that there wasn’t a single symptom that was driving the total score, that across all of the symptoms everything was moving in the same direction.
And so the implications here are, because this is descriptive information around the symptoms, it does help inform readers of the product labelling, but the company cannot make individual claims where they go out and say, night sweats were improved, for example, because those weren’t alpha-controlled endpoints.
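A responder endpoint of this kind (patients with a 50% or greater reduction in total symptom score) can be sketched in Python as below. This is only an illustration: the function names are hypothetical, and counting a missing Week 24 score as a non-response is an assumption made for the sketch, not a description of how the trial in the label handled missing data.

```python
def is_responder(baseline, week24, threshold=0.5):
    """True if the score fell by at least `threshold` as a fraction of baseline."""
    if week24 is None:
        return False  # assumption: missing follow-up counted as non-response
    if baseline <= 0:
        return False  # no symptoms at baseline, so no reduction can be computed
    return (baseline - week24) / baseline >= threshold


def responder_rate(pairs):
    """Proportion of (baseline, week24) pairs meeting the responder definition."""
    return sum(is_responder(b, w) for b, w in pairs) / len(pairs)
```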
There have been exploratory endpoints in labelling, these are just a couple of examples. These are more statements around different patient-reported outcome measures that were not pre-specified in the endpoint hierarchy, but there was something striking about the results that the agency thought should be communicated, at least descriptively, to clinicians and patients. The first bullet here, this is not FDA’s preferred PRO strategy going forward. Unfortunately, one of the implications of them having put in some descriptive labelling around these things is that it has become a strategy for some companies where they have chosen not to pre-specify endpoints, hoping they can just get the descriptive labelling and not have to go through the rigorous process of setting up important endpoints. FDA did this because they were stuck with data that have been collected and were planned years ago. As we move forward, I think their strong preference is that we really give the attention to these patient-reported outcome endpoints that we should be giving to them so we can have true rigorously tested endpoints that are described in labelling based on patient-reported outcome assessments.
Assessing safety and tolerability, this is the focus on treatment-related symptoms. Again, in the future we may see some descriptive information here. This tool, the PRO-CTCAE, is not set up to be statistically tested, there’s no total score, this is at the item level. You select the items that are appropriate for your patient population, your clinical trial. And again there are a lot of good minds that have been working on how to analyze these data and how to present this in a way that is useful to clinicians and patients. So I think in the future we may see some of this in labelling, just again descriptively.
And I think there’s been a real push by the agency for sponsors to consider the PRO-CTCAE. Rick Pazdur, who is the head of the Oncology Center of Excellence, has said that he sees the PRO-CTCAE as kind of the low-hanging fruit for patient-reported outcomes in oncology. And I appreciate that because I think his statement motivated people to use it. But I might argue about whether it’s easy to implement and analyze; I’m not sure I would call it low-hanging fruit. I think it’s important, it’s critically important, but there’s still a lot of things we need to figure out about how to implement this and how to analyze and report the data. But that’s coming. Lots of sponsors are using this now, and so again I think we’ll see it in labelling in the near future.
There is no standardized approach for how to select concepts or items from this PRO-CTCAE item library. It includes a number of different possible treatment-related symptoms, and then has different questions for some of the different symptoms. So it may ask about frequency, severity, or interference, and it has a recall period of seven days. In picking the right items, again there’s no standardized way to do this, but it’s very important to select the items in a way that doesn’t appear like you’re cherry picking to try to make your product look better and a competitor look poor. So with a new product, if the comparator arm is chemo, then only selecting chemo treatment symptoms will make the chemo look bad, but if we’re not measuring the potential bad things in the new product, then that would be considered cherry picking. We’re not understanding what’s really happening with the product. So the idea, when picking the items, is to select symptom items that you think would be applicable across all of the arms of a study.
This is not always easy to do because sometimes we don’t really know. If it’s a new product, if it’s still under study, it hasn’t been used in a lot of patients, we don’t really know what it may be—what the treatment-related symptoms may be. And so we can try to look at other products in the same class, we could do some exit interviews in early studies to understand what patients were experiencing. There are a number of different ways that we can try to figure out what the new treatments are causing for patients. And then, the NCI put together what they called a cross cutting of core symptoms and so it’s kind of a starting place when you don’t have a lot of evidence on what your product may do, then using this cross cutting of core symptoms might be a good starting place.
And then, patient-reported information can be used for this broader patient experience. And this is an example of a new subsection that was added for Rituxan Hycela in Section 14, the clinical studies section. And it’s this patient experience section that describes patients’ preference for subcutaneous administration of the product versus intravenous. So this is something really new, we haven’t seen a lot of this in previous oncology labelling. So this kind of patient experience data that came out last year in this product labelling was interesting.
There is also a new section in the medical product reviews for describing patient experience data. So when FDA reviews a product to make an approval determination, if the product is approved, then a medical product review is put on their website and publicly made available. And it shows all the things that they reviewed in the submission package. This is new, thanks to 21st Century Cures, where any patient experience data that was considered during the medical product review is documented here. This is the table that documents the type of evidence, and then it tells you the section of the medical product review where you can find FDA’s comments or their review of the patient experience data.
Another interesting development in oncology PRO instruments is that there are a lot of legacy instruments that have been around for years that have never been suitable for labelling, at least not with FDA. So there are different ways now that we can think about leveraging these legacy instruments.
Before I talk about that specifically, I just want to share this slide that sort of highlights the differences across different stakeholders for what’s important. Of course everything on this list from the proximal disease defining concepts out to the more distal general health-related quality of life concepts are all important to patients. But in terms of FDA labelling, they focus more on the disease-defining concepts, some of the proximal disease impact concepts. But EMA and payers are often more interested in some of the concepts further to the right. And so it really is sort of a puzzle, putting together your PRO strategy so that you’re measuring all of these different things for your different stakeholders, but not asking the patients to complete 500 items to get all these different concepts.
Here is an example of a domain in a legacy PRO instrument that has a score that has not traditionally been suitable for FDA labelling. The challenge here is that the set of items in this domain that produces a total score includes both treatment-related symptoms and condition-related symptoms. And so if you see change on the total score, it’s not interpretable whether it’s the condition-related symptoms that are getting better, or whether there are fewer treatment-related symptoms. And for labelling, FDA has to be very careful about making sure that whatever is put in labelling is not potentially false or misleading, and so they have very clear recommendations that scores should be made up of very distinct concepts: treatment-related symptoms separate from condition-related symptoms, and physical function impact separate from emotional impact. And so this instrument has been used a lot in previous studies, but nothing was ever suitable for labelling because the score wasn’t appropriate for regulatory purposes. For other purposes it’s perfectly fine. Just for FDA regulatory purposes it wasn’t suitable.
So what’s being done now to try to leverage these PRO instruments for US product labelling, because these legacy instrument scores are often appropriate for EMA or payers. So we don’t want to reinvent the wheel and have to use separate instruments for all of these different stakeholders. But what’s happening now is, people are starting to identify a subset of items—for example, the most important and relevant symptoms—that are already in that existing instrument, and then either looking individually at those items as endpoints or creating a new symptoms score from the selected items. So the full instrument is still administered, but then the way you set up endpoints based on the subset of items is different, so that you can pre-specify endpoints that could be suitable for FDA labelling.
In some cases, when a subset of items is identified, they’re combined into a new symptom domain score. If you do that, then you still need to do qualitative and quantitative work to make sure that there’s content validity of that new domain score and that you have good psychometric evidence for that new score. And so there is some work involved in doing that.
And historically there’s also been a bit of a challenge with coordinating with instrument owners who weren’t so keen on new scores being developed using their instruments. But in the last two, three, four years we’ve started seeing that there’s a lot more openness to that and I think we’re going to hear about item libraries later. A number of these legacy instrument owners have started creating their own item libraries where you can select specific items out of the library. I think that’s pretty cool. And it definitely can help to streamline the use of these instruments across multiple stakeholders and it may give us some labelling and promotion opportunities that didn’t exist before.
Here is an example from the FACT-Lung. This is hypothetical and I’m not saying this is necessarily the right thing to do, you would still need to test this with qualitative and quantitative work. But it shows you that from the existing additional concerns domain and from the physical wellbeing domain, there are items across those domains that you may be able to pull out that are symptoms of lung cancer. And so you could create a new domain score using these selected items, again, if the evidence suggests that’s appropriate.
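To make the idea of a new score from selected items concrete, here is a small Python sketch. The item identifiers in the usage note are placeholders rather than actual FACT-Lung item codes, and the prorating rule (score only if at least half of the selected items are answered) is an assumption for illustration, not the instrument owner’s scoring algorithm. Any real subset score would still need the qualitative and quantitative support mentioned above.

```python
def subset_score(responses, selected_items, min_answered_fraction=0.5):
    """Prorated sum of the selected items, or None if too few were answered.

    `responses` maps item id -> numeric response, with None for a skipped item.
    """
    answered = [
        responses[item]
        for item in selected_items
        if responses.get(item) is not None
    ]
    if len(answered) < min_answered_fraction * len(selected_items):
        return None  # too much item-level missing data to score
    # Prorate: mean of answered items scaled to the full number of items.
    return sum(answered) / len(answered) * len(selected_items)
```

For instance, with hypothetical items `cough`, `breath`, `chest`, and `appetite` selected, and `chest` skipped, responses of 3, 2, and 1 on the remaining items give a prorated score of 8.0. The full instrument is still administered; only the scoring of the selected subset changes.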
And then, just to kind of wrap up, we’ve been thinking just now about using instruments a little differently. But then using those assessments and turning them into endpoints is also a bit challenging in oncology. So we have the option when we select the subset of items to produce a new domain score or to use the items as separate endpoints. And there are pros and cons to this, so using a new domain score is only one endpoint, and so it has fewer implications for your testing hierarchy. But it’s not always the case that those symptoms psychometrically combine well into a new domain score, and so you may need to test them individually. But then you’re potentially left with five or six separate endpoints, which gets tricky if you’re thinking about how you’re going to alpha-control for six different endpoints. Again, for other stakeholders, administering the full legacy instrument is still useful, so you’re still looking at those original domain scores and total scores. This new way of looking at individual items doesn’t replace the need to look at the original scores as well for your other stakeholders.
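To show why six separate endpoints are harder to alpha-control than one domain score, here is a Python sketch of the Holm step-down procedure. Holm is used only as one familiar example of a multiplicity adjustment; the talk does not say which method a given program would use, and the p-values in the usage note are made up.

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm step-down: return a list of booleans, True where the null is rejected."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected
```

With six illustrative item-level p-values `[0.001, 0.04, 0.03, 0.008, 0.20, 0.012]`, only the first, fourth, and sixth are rejected at an overall alpha of 0.05, whereas a single domain-score endpoint with p = 0.04 would pass on its own.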
And then, some particular challenges with endpoints in oncology. I’ve talked about creating these new scores for condition-related symptoms. But often, patients have limited symptom experience at baseline in these studies. And so often, in other indications, when we use patient-reported outcomes we’re looking for symptom improvement. That doesn’t generally apply in oncology. And so we have to be creative on how we’re looking at these symptoms over time, and so it might be that we need to consider enriching the sample for patients who do have higher baseline symptom burden, so we can look at improvements, so that we have enough symptom severity in the population to expect we’ll see improvement. We may need to consider evaluating symptoms only in the subset of the trial population who at baseline had higher symptom burden. Or we might need to think about looking at delaying deterioration of symptoms or delaying the time until the symptoms appear.
And so these are just a few things that are going on in the oncology world right now. And then, Ari is going to address another set of challenges and opportunities in oncology.
Wonderful. Thank you so much. Thanks Ashley. Ashley has gone through the regulatory side of things, and I’m just going to moan and groan about all sorts of other things.
What I’m going to talk about is some of the unique characteristics around oncology studies. And I’ll also talk to you about some work that we just completed looking at how the FDA and how the EMA assess evidence generated from oncology studies. We’ll talk about assessment schedules, and this is a bit of a farce. And we’ll also talk about confusing value messages that come from analysis of oncology clinical trials.
Here are the regulatory characteristics of cancer studies. Most of them are first in class. A lot of them are orphan. They go through fast track, priority, and accelerated reviews. So the speed of developing oncology trials is not the same as the speed of developing drugs for arthritis or psoriasis or something like that. It’s very fast. And these are the sort of characteristics that explain why teams, if they don’t have experience already, don’t have enough time to think through the strategy for patient-reported outcomes, which, like Ashley said earlier, sometimes end up as exploratory endpoints.
And there are some design characteristics in oncology studies. Most of them are single-arm studies, compared to non-oncology studies. A lot of them are orphan studies, they are not controlled. There are few randomized controlled studies, and with fewer patients, fewer than 200 patients in about a third of them or thereabouts. So they tend to be small studies, open-label studies, and single-arm studies.
When you combine the regulatory characteristics and the design characteristics, it’s not quite PRO-friendly. And that explains some of the challenges that Ashley went through.
And further challenges of integrating PROs in oncology studies. We just talked about single-arm or open-label studies. There is also a high attrition rate. A lot of cancer studies fail very early, so there is less incentive to invest in PRO development in early times. So it’s always too early, until it’s too late. Rapid development time: for example, of 43 new oncology drugs approved by the FDA between 2012 and 2016, 42 used at least one of the expedited approval programs, like fast track or one of those programs. So a lot of them go through very fast development programs.
And, like Ashley just said, it is difficult to differentiate the impact of treatment and the impact of the disease. Fatigue for example, is it to do with the cancer or is it to do with the treatment, it’s very difficult to distinguish them. And within the organizations there is a perception that PROs are soft endpoints, whereas survival and progression-free survival and survival-related endpoints tend to be hard endpoints. So there is less of an emphasis, because the whole team or the people who are involved in it are used to dealing with very hard endpoints, easy to measure.
Challenges of integrating PROs in oncology clinical studies: symptoms can vary by severity levels. For example, early stage CML studies may not present any symptoms at all. And there are some cancers where different patients can have different symptoms. So pancreatic cancer, pancreas is a very small organ in your body but the symptoms experienced by the patients can differ so much depending where the tumor tends to be. Head and neck cancer, brain cancer, same thing. So not all patients have the same symptoms. It’s not like psoriasis or dermatitis, where everyone has itchiness and redness and thickness and so on and so forth.
Now we switch to safety in a changing therapeutic context, and we’ve touched on what Ashley talked about in terms of PRO-CTCAE. So the old medicines or chemotherapies, they were cytotoxic. Intermittent intravenous administration, shorter duration of treatment, relatively homogeneous adverse events profile. Typically, fatigue, hair loss, taste change, reasonably the same kind of profile in the old medications. But the new ones are very different. They can be continuous oral administration, so it’s not like you have to go to the clinic to have your IV and then three weeks later you come back. There are medications now where you get oral medication daily, this is becoming more and more common. More prolonged duration of treatment, and adverse events can differ depending on the mechanism of action. So it’s not the same, it’s not like chemotherapy where everyone has the same kind of adverse events.
And also the new therapeutic medications, they might be safe in the sense that they have low toxicity, but there is a cumulative toxicity. So instead of things happening straightaway, two days, three days after a chemotherapy, things can happen like three, four, five weeks later. And this is one of the reasons both the FDA and the EMA ask for post-progression data or long-term data, which never used to be the case. Or if they did, we didn’t bother to give them and they didn’t mind. But now it’s kind of becoming more and more common.
Now I’m going to look at some of the oncology drug approvals between 2012 and 2016, but this is still looking at the EMA. What you see are the sort of labels and submissions that went through that had PROs included. Strangely enough, the EQ-5D had one of the most labels. The EORTC QLQ-C30 with disease-specific modules: out of 45 studies, 13 of them had that, and [unclear 0:38:07] labels. Anyway I’m not going to go through them one at a time, but you can see the sort of numbers that are coming up from the EMA. Not the case with the FDA. FDA, 0. Like Ashley said, the EORTC and the FACT measures don’t run there.
Looking at the concepts that had labels. Any symptoms: about 76%, 16 out of 45 I think. Physical functioning: 4%. Emotional functioning: 1%. Blah blah blah. And then there is something about health-related quality of life or quality of life: 13%.
Anyone know what quality of life means? That’s a question, it’s not a statement. Does anyone know what quality of life means? What I mean is, if I hand out a 4x6 to every one of you and ask you to write what quality of life is, I’m sure we all have our own views of what quality of life is. Now, quality of life for a newly diagnosed cancer patient would be very different from someone who is on third line, someone who is terminally ill, someone who is on adjunct treatment. So it can vary a great deal, and this is another reason why the FDA don’t go down that route, like Ashley said, but the EMA has quite frequently (as you can see, 61% of the time) given some sort of quality of life related labelling. It even gave a label based on the EQ-5D visual analog scale twice. So it’s a bit of a farce.
When we looked at the critical regulatory comments from the FDA and the EMA, it’s quite interesting. The FDA looks at the study design and the measure, so that’s a 9 and a 4. And they say, look, we don’t like the design because it’s open label design or single-arm study. And we don’t like the measure so they stop even going any further and that’s the end. The EMA don’t think too much about it, otherwise there is no complaint about study design or the PRO measure. Instead, their complaint was mostly on no statistical difference or excessive missing data. So you can see how the evidentiary standards are different between the FDA and the EMA.
The impact of missing data, Ashley talked a little bit about that, I don’t want to go too much into it, it’s a little statistical and so on and so forth. But I heard a talk during ISOQOL I think, or ISPOR, I’m not so sure: should the mechanics of missingness be proactively controlled, so we can minimize patient burden? In other words, can we control the missing values, can the study design itself control the missing values to minimize the patient burden? For example, if there are four visits, one-two-three-four, baseline, two, three and the last visit, can we say some patients can give us information at one, two, and four, and some patients can give information at one, three, and four? So in other words, proactively controlling the missing values. Patients would be randomly selected, not by treatment but just by patient.
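The planned-missingness idea described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the two visit patterns are the ones mentioned in the talk, while the function name `assign_visit_patterns`, the patient IDs, and the fixed seed are hypothetical and not taken from any real study.

```python
import random

# Visit patterns from the talk: everyone completes baseline (visit 1) and the
# last visit (visit 4); the two middle visits are split between the patterns.
PATTERNS = ((1, 2, 4), (1, 3, 4))


def assign_visit_patterns(patient_ids, seed=0):
    """Randomly allocate each patient to one visit pattern.

    Allocation ignores treatment arm (hypothetical helper, for illustration).
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    # Alternate patterns over the shuffled list so the two groups stay balanced.
    return {pid: PATTERNS[i % len(PATTERNS)] for i, pid in enumerate(ids)}


# Hypothetical patient IDs.
schedule = assign_visit_patterns(["P01", "P02", "P03", "P04", "P05", "P06"])
for pid in sorted(schedule):
    print(pid, schedule[pid])
```

Each patient completes three assessments instead of four, and because the allocation is random rather than driven by how the patient is doing, the skipped visit is missing by design.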
The other thing I often thought about was, why not collect data before the patients come to the visit, the night before or the morning of. Because during the visit, there are so many things going on at the time. How many of you have been part of a clinical trial? Two, three people have been part of a clinical trial, so I’m sure you know what I’m talking about. Stuff happens. At least I encourage you, like one of my previous employers did, to spend a day at a clinical visit in the waiting room, watching what’s happening to the patients as they go through the process. It’s not how we imagine, where they are given a space with a clipboard and forms to fill out and they are happily completing the forms. That’s not the case. My daughter was part of a clinical trial and she was asked to complete the pain questionnaire with five or six questions while a biopsy was performed. So you can imagine what sort of stress patients can go through. That wasn’t the only thing, there was a whole bunch of other things.
Just to get away from that sort of atmosphere, why not ask the patients to complete the data the day before or the morning of, instead of at the time. I never quite understood that. There are a couple of sponsors who are doing that and things are going well. It’s something to think about. And as an ePRO company, and ePRO vendors, they should be able to advise the sponsors to think of these alternatives rather than just go with what the sponsor is asking for. You should be part of the solution, not part of the problem.
So things happen, we take risks. There are daft things that we do. By the way, Ashley wasn’t very happy about me talking about daft things. She said it’s a very professional meeting and you don’t have to talk about daft things, do you. But anyway I think it can raise the message very quickly.
Assessment schedule is not appropriate for the impact of the treatment. This is one of the daft things that we do in cancer studies. These are typical cycles of clinical trials. And things happen soon after the cycle starts in a typical clinical trial. So that’s the asterisks at the bottom, cycle 1, cycle 2, cycle 3, things happen at the beginning. What does the industry do? We ask for information to look back seven days at the beginning of the next cycle. So we typically ask at the beginning of cycle 2, what happened during cycle 1, and then we ask the patient to look at the last seven days where nothing has happened. And we wonder why there is no treatment effect. And not only that, the treatment 2 doesn’t always start straight after their treatment 1. Things happen, patients have adverse events, often there is a delay of a few days or sometimes even a few weeks between cycles. So you could be measuring nothing and wondering why we are not having data that should be pertinent to the patient.
So assessment schedule is not appropriate for the impact of treatment. That’s daft thing number one.
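The mismatch between the recall window and the timing of symptoms can be made concrete with a small calculation. The numbers below are assumptions chosen for illustration only (a 21-day cycle, symptoms on the first seven days after dosing, a seven-day recall period); the function name `symptom_days_recalled` is hypothetical.

```python
def symptom_days_recalled(assessment_day, recall_days=7, symptom_days=7):
    """Count the symptomatic days that fall inside the recall window.

    Days are numbered from 1, the dosing day of the cycle. The questionnaire
    is completed on assessment_day and looks back over the previous
    recall_days days. Symptoms are assumed to occur on days 1..symptom_days.
    """
    window = set(range(assessment_day - recall_days, assessment_day))
    symptomatic = set(range(1, symptom_days + 1))
    return len(window & symptomatic)


# Assessment at the start of cycle 2 of an assumed 21-day cycle (day 22).
print(symptom_days_recalled(22))  # 0
# Same assessment when cycle 2 is delayed by a week (day 29).
print(symptom_days_recalled(29))  # 0
# Assessment one week after dosing (day 8).
print(symptom_days_recalled(8))   # 7
```

Under these assumptions the start-of-next-cycle assessment captures none of the symptomatic days, with or without a delay, whereas an assessment timed to the expected symptom period captures all of them.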
Daft thing number two. We often see differences in death, dropout, toxicity between treatment arms. All sorts of things happening. But the message from the PRO publication manuscript will say the finding that the difference in grade 3 or grade 4 adverse events between the study arms did not translate into a clinically meaningful difference in the reported health-related quality of life. In other words, all these things are happening, nasty stuff is happening, but there is no difference in health-related quality of life. And that doesn’t make sense. And this is one of the main reasons why the PRO findings are not published together with the clinical findings, because it’s just confusing and nobody can believe that. And when they do actually publish it, it comes much later when it has no consequences whatsoever. So this is conflicting value messages. But no difference in health-related quality of life can be due to a suboptimal measurement tool, a suboptimal assessment schedule like we talked about just now, or suboptimal analysis. If you just look at baseline, last value, last value carried forward, and don’t take care of the missing values, then that can all lead to no difference in health-related quality of life. Which often leads to delayed, separate publication or no publication whatsoever.
I understand Lancet now will not publish clinical results without the PRO results included in the final manuscript. They insist on it.
Anyway, that’s daft thing number two.
Daft thing number three. Use of multiple instruments. Assessment of the same concept using multiple instruments at the same time, that increases the study burden and the patient burden, and it hurts data quality and analysis. In a typical lung cancer study, you could have the QLQ-C30, the LC13, the LCSS, and EQ-5D. And they all measure pain in one form or the other. Three of them will measure breathlessness and cough. So you’re asking the same questions again and again, at the same visit, and that makes no sense. Again, it’s one of the things that the ePRO vendors can point out to say, hey guys, you’re measuring the same things again and again. It’s something that you can think about.
So the FDA is beginning to get sensitive about this and they are saying—here is a response from the agency—measurement redundancies should be avoided to the extent possible to improve data quality. So they are beginning to catch on to this.
So in conclusion, integrating PROs in oncology clinical trials can be challenging, very true. Assessment of PROs in oncology studies is suboptimal. That is also very true. We also saw that the FDA and the EMA use different evidentiary standards to assess PROs from oncology clinical trials. And the industry excels in doing some pretty daft things. But we are not in Kansas anymore, because like Ashley said, things have changed, there are a lot of new initiatives being put together, so we are not in Kansas anymore.
In conclusion, patient-focused drug development initiatives encourage inclusion of patients’ voice in drug development. It is now the law. And Ashley also showed one of the labels at least that included the patient-experience labelling. So there is increased emphasis on patient experience. Guidance from OHOP provides flexibility. We’ve seen that there’s very clear guidance, the manuscript that Ashley referred to talks about disease-specific symptoms, treatment-related symptoms, and physical functioning, how they can provide data to assess the risks and the benefits. Not necessarily labelling; risks and benefits in a positive manner. And there are also examples from labelling based on exploratory endpoints, but like Ashley said, that might be just a passing thing.
So we had the old, now we have the new. It’s time for the industry to take a fresh look. And thank you so much.
Bill, I’m going to come and sit there, so you can come and sit there too.
[Q&A Section 00:49:50]
I’d like to thank you, Ari. I love the daft things; it was great to hear about those. Anybody got any questions? I think we heard a lot of information there in terms of the kind of things that are going on in oncology, the kind of labelling claims that are possible on the EMA side, and some of the challenges that we get in regulatory trials for oncology.
AUDIENCE MEMBER 1
Thanks very much, great presentations. Ashley, my question is for you. The CTCAE that you refer to, if you’re choosing particular items, is it possible that you could put your thumb on the scale and choose items that will be beneficial to their outcome? If you’re choosing items, is it possible that one could choose items that would be biased in any way? What’s the thinking around choosing different items?
The goal of choosing items is to choose those items that you think will represent the most likely to occur or the most severe, most bothersome treatment-related symptoms that they can expect from whatever product they’re on, whichever arm they’re on in the study. So my point about not cherry picking was to not just select those items, say that would be appropriate for the chemo arm and not choosing items that you think would be relevant to the new product that’s under study. Then you’re missing all of those things that could be occurring, you wouldn’t have the opportunity to learn what’s going on in that arm of the study in terms of treatment-related symptoms. I’m not sure I understand your question about the beneficial—
AUDIENCE MEMBER 1
[inaudible 51:51-53] —overthinking or being able to cherry pick and ignore certain things.
Ah yes, so if you just record all of the bad stuff that your comparator arm does, then you’re making your product look great, you just haven’t measured the right thing necessarily to show that it was— yes. So the FDA frowns on that.
Let me just add something to that. One of the things that we are asked to do is to look at previous studies or previous information that you might have to choose the PRO-CTCAE items. It’s very difficult to do in Phase II. There’s not a lot of information. And one of the things that I keep running into is that most of the compounds are not organically developed within a company but brought from somewhere else, and often data are not there, I mean they didn’t bring the data with it. And if there is data from animal studies, that doesn’t help a great deal because animals don’t display the same symptoms as humans. For example, rats don’t vomit. So you can only look at the behavior and think what symptoms patients may feel. So the theory is right that you must choose the items in an unbiased way, that’s very true. But actually doing that is not easy because of the lack of information there is to do it.
AUDIENCE MEMBER 2
A question for Ashley, more of a clarifying I guess. What’s the difference between pre-specification versus alpha control? Is the FDA requiring everything has to be alpha controlled as well as pre-specification?
Pre-specifying is different from alpha-controlled endpoints. Pre-specifying just means that you set your endpoint up, describe in your SAP how you’re going to test the endpoint, and you put it into a hierarchy somewhere, so it could be a primary endpoint, it could be a secondary endpoint, or it could be exploratory. Alpha control, or controlling for multiplicity, is another step. And so you’re doing that to make sure that when you have a significant finding that you can believe it, that it’s not a spurious finding. I’m not a statistician so I won’t go into all the ways you can control for alpha, but you would set up some testing hierarchy or use some different methods to make sure that you can believe the results that you find. And so typically FDA considers, when you say secondary endpoint at FDA, that implies that it’s alpha controlled and therefore could be suitable for labelling. Lots of times sponsors will say key secondary endpoints, by which they mean alpha controlled, and then other secondary endpoints that are not alpha controlled. To FDA those other secondary endpoints that are not alpha controlled are considered exploratory and not suitable for labelling. And so when putting together your endpoint hierarchy it’s important to pre-specify so you’ll know how you’ll analyze the endpoints, but then also to consider where you’re going to control for the error.
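One common way to set up the kind of testing hierarchy mentioned above is a fixed-sequence procedure. The sketch below is a simplified illustration, not a description of what FDA requires or of any particular sponsor's analysis plan; the endpoint names and p-values are hypothetical.

```python
def fixed_sequence_test(ordered_endpoints, alpha=0.05):
    """Test endpoints in their pre-specified order, each at the full alpha.

    Testing stops at the first endpoint that fails; every endpoint after it
    is reported as not formally tested, whatever its p-value.
    """
    results = {}
    gate_open = True
    for name, p_value in ordered_endpoints:
        if gate_open and p_value < alpha:
            results[name] = "significant"
        else:
            results[name] = "not significant" if gate_open else "not tested"
            gate_open = False
    return results


# Hypothetical hierarchy and p-values, listed in pre-specified order.
hierarchy = [
    ("primary: overall survival", 0.010),
    ("key secondary: pain score", 0.030),
    ("key secondary: physical functioning", 0.200),
    ("key secondary: fatigue", 0.010),
]
results = fixed_sequence_test(hierarchy)
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

In this example the fatigue endpoint has a small p-value but sits below a failed endpoint in the hierarchy, so it is not formally tested. That is the sense in which pre-specifying an endpoint and controlling alpha for it are separate steps.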
AUDIENCE MEMBER 2
Thank you. I just ask you because a lot of times people confuse the two, and people think you need to have alpha control with everything. Like you don’t.
Question for Ari. Challenging your assessment schedule, one of the dafts you have. Let’s say you’re designing a study for three or four years maybe, hopefully depending on the number of events you have. And you’re asking patients—I think what you present, I think I acknowledge all the comments and I think some of the limitations. How would you ask patients to report their symptoms for that long trial? I know usually you anchor to a cycle or things like that. Considering patient burden, when you’re asking patients to fill that out every week, or maybe every day or however it is, I don’t know, what are your thoughts on that when you have a long study in oncology? They are not 12-week trials—
Bill asked me to come and talk about the problems. I wasn’t asked to give answers.
It’s a good question, and I do not have an answer. One of the things that I think about, and I haven’t sort of gone through it, is, at least, if it’s a three-year study, the chances of you having at least 50% of the patients left in the study after one or two years is very low anyway. So you are dealing with rare events at that point. So at least for the first year or depending on the medication, whether it’s the first three weeks or... early in the studies, do it well. Later on, it’s not that you don’t do it well; you do it infrequently, or as per convenience. And this is why most studies don’t have data collection progression because it’s terribly inconvenient and very few patients, blah blah blah. But the EMA wants to take a look at it. The FDA said [unclear 0:57:39] said I don’t want to know. But the EMA is hell-bent on looking into it because they are looking at very late, possible cumulative adverse events. So anyway that’s my thinking. Do it well, do it early, frequently, but later on pick and choose your battles, kind of thing.
AUDIENCE MEMBER 2
Ari, I was hoping for a solution.
Anybody else, any comments?
AUDIENCE MEMBER 3
You made a very good point during the discussion. I forget which of the two of you. You both made a lot of good points. But the good point that I was talking about was, you spoke about the eCOA provider can help to choose. Ari, I think you spoke to multiplicity of measures for the same thing, for pain in three different scales. Typically, though, we become involved way down the pike, the sponsor has already settled the trial elements, and then the ePRO provider gets into it. And it’s just a challenge in that regard because the protocol is set. How do people become more educated and balance it out?
And the second question then, which kind of goes to the PRO-CTCAE, is, is there any thought to defining or predefining a subset of measures that would be used consistently across a broader set of protocols, which then would take out that whole aspect which is our challenge, which is protocol review after we’ve begun work, and then it’s like, oh we have to change, now the measures are not going to be accepted and we have to modify it again. Is there any thought to a consistency for comparisons across trials?
For the PRO-CTCAE there is the NCI’s core set of items that can serve as a kind of basis and could be consistently collected across studies. I wouldn’t just stop there though, because there are particular treatment-related symptoms that can occur in some of the newer products that are just specific to those products, and if we just stop with the core set we may not be measuring all of the right things. So I do think consistently we could measure the core set, but making sure that we’re all still including those other elements that we find are important to particular products.
As for your first question, I mean I sympathize. You guys are brought in late. When I was in the industry, one of the things that I expected the vendors or consultants to do is to provide added value. By that I mean give me something for which I’m not paying you. That’s what it means, added value. And you could say, in this study, you are making these mistakes, or you could do this better. And very often, the sort of things that I’ve pointed out, no one has thought about that, because usually it’s copied from one protocol to the other or someone gave some standard inputs and so on. But if you say, look, you are measuring pain four times at the same assessment, maybe the next study you should think about it differently or when we come to the Phase 3 study you should think about it differently. Not just oncology studies, there are other studies where the same sort of things happen. And then you add in value, and probably next time around, when you are brought in, probably things will be a bit tidier, or you’ll be asked to come in and say, okay you said all these things, is there anything else that you want to say, as more of a PRO consultant rather than an ePRO consultant. Which is not good news for me because that’s [unclear 01:01:39] But that’s the added value. Otherwise you are just a utility, that’s all.
Ari, it is really interesting, isn’t it, where you see protocols where you’ve got three or four different similar instruments being used at the same time is exactly what you’re describing and the overlap is quite large. And so aside from using them at different time points, which would be an obvious possible solution, do you think the sponsors need to use so many instruments? Or, why are they putting these multiple similar quality of life scales all together like this? Is there good rationale for it?
Yes, so take the LC-13 and the LCSS. These are lung cancer questionnaires. One is called LC-13 and the other is LCSS. Out of the 13 questions in the LC-13, eight of them are the same between LCSS and LC-13. So they are so scared of peeling out items from the instruments, though that’s kind of allowed now and people are beginning to get a little soft about this. Just to be on the safe side, you put both in. Then you have to put an EQ-5D anyway, otherwise the Europeans will go nuts. And then you need to put the QLQ-C30 in because that’s what everyone else does and so on and so forth. So hopefully, in years to come, that will wash out because now sponsors are beginning to pick only the items that are important and ask only once. And if not, it’s the sort of thing that you should be able to ask as well, like I said earlier. The other thing is, you don’t have to measure everything all the time at every visit. Physical functions don’t have to be assessed every three weeks. Maybe every six months. Sometimes once a year. So some things don’t change much. But you’re just so scared of patient dropout, patient might die, that’s the other end of the argument. But that’s why we go through it at every visit. There’s no perfect answer for this but there are ways of getting around some of these things at least, to ease the study burden.
Great. Thank you. And you’re obviously focused on patient-reported outcomes as part of this, but there are other COAs that we could consider. I read a very interesting article that was then reported in the New York Times I think it was about the use of wearables with patients suffering from various cancers, and in particular the use of activity and sleep monitors. So a wrist-worn accelerometer that a patient could go home with, and the physician perhaps during an intensive cycle of chemotherapy they could get a picture that perhaps a patient was unable to report themselves or didn’t feel well enough or by the time they get to the end of their cycle and they’ve got their clinic visit they’re actually feeling better, so then their ability to kind of reflect exactly how it was during treatment, so that week or week and a half after chemo, they’re able to get that picture because they can see the activity and the sleep patterns of the patient because actually there’s days here where they’re not getting out of bed, or they’re absolutely unable to do anything, and they can see a gradual improvement over time. Do you think, are you seeing in terms of the customers you’re working with, a greater interest in using those sorts of measures to supplement what we do with the patient-reported outcomes?
I’ve seen a greater interest. I haven’t seen people actually operationalizing that. So there’s interest. I think right now there is concern about how will the data be used, analyzed. FDA has made statements that they see a lot of potential for wearables but they’re not yet quite suited for labelling and we’re not sure how payers will look at it and we don’t yet know what the data will tell us, how do you interpret if a patient isn’t getting out of bed one day, is it because they couldn’t or because they chose to, it wasn’t a physical problem, it was some other—we just don’t know exactly yet what the data means for wearables. And so I think that’s still leading to a bit of hesitancy. But I think that to overcome that, the only way we can start to understand those things is if the wearables are used in trials, just as exploratory measures so that we can start accumulating the data, figuring out what to do with it, how to interpret it. And I do see potential there, both in the example you gave in the clinic for the physician to understand what’s going on, but then also eventually in clinical trials to better understand—and I don’t think wearables should replace patient-reported outcomes, I actually think in some cases wearables are not very patient centered. They don’t tell us everything we need to know. So I see them as supplementary to patient-reported outcomes, but we won’t know how to use them well until people start using them. But there’s interest, just not a lot of activity, at least that I’ve seen.
Anyone here from Pfizer? Oh, they didn’t let anyone out today. There is Pfizer, there is BMS, there are a couple of other big companies, they have wearable departments now. There are heads and functions. Suddenly these guys are thinking they are ahead of PROs. So there is the Pfizer PRO group, and suddenly it is now in conflict with this wearable group. But anyway, having said that, and Novartis as well. And having said that, so what’s happening is that wherever there is a study, these groups are brought in as well. And they cause havoc because they want to have wearables and want to collect electronic data capture in every study. So last week I was involved in a study where I think it was urticaria, so patients have hives and breakouts and all sorts of things. And the treatment worked so well that patient-reported outcomes correlated very well with some of the distal concepts, like sleep and so on. So hives went away, patients slept well, high correlation, everyone was happy, treatment works and so on. Now they’re going to a Phase IIIB study and the digital guys want to measure sleep. Now the patients are saying, I’m sleeping okay, I’m sleeping well. Then the question is, what is this wearable, how is that going to add value over and above what the patient is saying. So I was brought in and I said, look, it will be like having a circus clown in a fish market, it’s nice to look at but it adds no value. So there is this going on. Now, if you take something like sleep apnea, for example, where there is something physiological going on throughout the night and you might just die, and the patient is not aware, then it brings value. So there it is well worth having some sort of device that monitors your sleep better and brings a value or something, whatever. But not for all studies. There is a place for it, but there is too much emphasis and too much chatter at the moment.
Hopefully it will sort of die down and we will come to some sort of equilibrium where it makes sense to have those things. Thanks.
Ari, Ashley, thank you very much indeed, that was excellent. Thank you.
[END AT 01:09:46]