Best Practices for Usability Testing to Improve the eCOA Patient Experience Particularly in BYOD Studies is, if nothing else, an extremely long title. My research colleagues and I have been pretty excited recently about the ways that our particular perspectives and inclinations are coinciding on this topic. So I’m coming from an interest in human-centered design, UI/UX, accessibility guidelines, things like that. And then I have a number of colleagues I work with on this who specialize in the research itself. So if you ask me a really technical research question I may defer, just as a foreword.
The patient experience of eCOA questionnaires. We’ve all been patients. We heard from TJ yesterday, who is a really excellent patient advocate. And because of that, I know that we can all imagine the experience of participating in a clinical trial. And we can imagine using a poorly translated PRO. And we can imagine fumbling around with some sort of device. But in the context of these trials, using our imagination and our empathy to assume that we can understand the patient experience is a risk and sometimes results in missing important changes that we need to make to the different platforms. So, just a key idea that I’m going to be running with here is that our experience is different from the patient experience.
I’m going to tell you a joke, which is kind of a risk but I’m going to do it anyway. So, a software engineer builds the software, right: you have an interface and they build this thing, and it’s wonderful and they do their due diligence and they make it this really excellent thing that we can all use. And then a QA engineer comes in and it’s their job to try to break it. So they click all the buttons, they input things into all the possible input spaces, and do everything they can to break it, and then of course it’s deployed to actual users, where everything fails. So it’s also good to have explanations before you tell a joke.
So a QA engineer walks into a bar. He orders a beer. He orders 0 beers. He orders 9999 beers. He orders a lizard. He orders -17 beers. An actual user walks in the bar, the first actual user, and asks where the restrooms are, and the bar immediately lights on fire and burns to the ground.
So the point is, of course, that we as experts in this field, many of us with certain levels of education and access to technologies, where we can discuss the benefits of different operating systems (like, is my smartwatch keeping me fit, and it is not), are going to have a fundamentally different perspective from the patient’s, because we are coming to some of these topics as a QA specialist, right. So we’re using all of our knowledge to test it as well as we can, when in fact the patients will do things differently than us, many times because they’re different from us in a lot of ways. So this is thinking primarily, I should say, in the context of a global clinical trial. So I know Rebecca just spoke about a trial done in the United States, I believe, using people in the United States. Yes, I think I just ran through that. So our experience is not the patient experience. We’re professionals, physicians, with high levels of education, and we work in these global workplaces. Some of the patients are like that as well. So I’m going to stop doing all these caveats but please know that I know that some of our patients are also all of these things.
What we’ve been trying to do lately in our usability testing (which again is the type of testing that confirms that an interface is functional for a patient: they can get around, they can answer the questions, and they know how it works) is bring several accessibility principles of interface design to the fore. So I have images of these; these words are kind of clunky, but I think we can all really understand them. We have perceptibility, can you perceive it. So this of course goes beyond just the content itself, but can you find it on the screen, all sorts of things like that. Operability, can you operate it. And understandability has to do with the quality of the text, the quality of the translation, but moves beyond that as well, which I’ll touch on a little bit.
So the idea is, in the case of PROs on ePRO electronic devices, you must perceive in order to be able to respond, you have to be able to respond in order to provide data, and you have to actually correctly understand in order to provide data that’s worth collecting. So any of these things can derail the quality of data collected.
So perceptibility: text, images, video, screen contrast, font sizes. And then again, specific to ePRO eCOA, this can be things that are falling below the fold, as they say in print, things outside of the viewport of the devices, right. So if you have a nice Likert scale and the last option is below the cut of the device, subjects might miss it. It’s even possible if you have a scroll bar, they might miss it. It depends, and it’s worth looking into. So perceptibility, I think we get that one.
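The below-the-fold problem can be screened for before any patient sees the questionnaire. What follows is a minimal sketch, not anyone's actual tooling: the `ResponseOption` class, the pixel measurements, the option labels, and the viewport heights are all hypothetical values chosen for illustration. It simply flags any response option whose bottom edge falls outside the visible area.

```python
from dataclasses import dataclass


@dataclass
class ResponseOption:
    label: str
    top_px: int     # distance from the top of the page to the top of the option
    height_px: int  # rendered height of the option


def options_below_fold(options, viewport_height_px):
    """Return labels of options that are not fully visible without scrolling."""
    hidden = []
    for option in options:
        bottom_px = option.top_px + option.height_px
        if bottom_px > viewport_height_px:
            hidden.append(option.label)
    return hidden


# Hypothetical five-point scale: each option is 60 px tall, starting 200 px down the page.
scale = [
    ResponseOption(label, 200 + index * 60, 60)
    for index, label in enumerate(
        ["Not at all", "A little bit", "Somewhat", "Quite a bit", "Very much"]
    )
]

print(options_below_fold(scale, viewport_height_px=480))  # ['Very much']
print(options_below_fold(scale, viewport_height_px=800))  # []
```

A check like this only says that an option is off screen. Whether a patient notices the scroll bar and finds it anyway is still something to test with people.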
Okay, operability: the user’s ability to operate an interface. Now this is, you know, we all know you need to have buttons that you can push, they need to actually function, they need to go where they’re supposed to go. But this actually extends quite a bit into the realm of patient concerns when we’re talking about potentially elderly patients, hand temperature can really impact operability. There’s also dexterity issues. I was just chatting with people yesterday about using a small mobile platform with stage 4 Parkinson’s patients, and the challenges that that may bring. So of course, someone with dexterity issues may be able to rest their hand on a keyboard and type or use a mouse, but may have trouble selecting very small buttons on a device. And then also, navigation is a part of operability, which also came up during the last discussion. So if the patient can’t navigate properly, they can’t respond properly. And all these things of course can combine to lead to frustration, fatigue, and poor responsiveness.
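One operability check that can be automated is button size. Here is a rough sketch under stated assumptions: the `Button` class and the example screen are hypothetical, and the 44 CSS pixel default is borrowed from the enhanced target-size criterion in the W3C accessibility guidelines rather than from anything in this study. A population with dexterity issues may well need a larger threshold, which is why it is a parameter.

```python
from dataclasses import dataclass

# Assumption: 44 CSS px, the figure used by the WCAG enhanced target-size criterion.
MIN_TARGET_PX = 44


@dataclass
class Button:
    name: str
    width_px: float
    height_px: float


def undersized_buttons(buttons, min_px=MIN_TARGET_PX):
    """Return names of buttons narrower or shorter than the minimum target size."""
    return [
        button.name
        for button in buttons
        if button.width_px < min_px or button.height_px < min_px
    ]


# Hypothetical questionnaire screen.
screen = [
    Button("Back", 88, 48),
    Button("Next", 88, 48),
    Button("Skip", 40, 20),
]

print(undersized_buttons(screen))             # ['Skip']
print(undersized_buttons(screen, min_px=60))  # ['Back', 'Next', 'Skip']
```

A size threshold cannot tell you about cold hands or tremor, so it complements on-device testing rather than replacing it.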
Oh, I forgot about this BYOD callout. I’m excited about BYOD, I love interface design. So we have these occasional just callouts where then I can say, oh this is particularly important when considering a BYOD study. And this is thinking of BYOD in the broadest terms, so we’re thinking about an interface that maybe—just to be really general—that is web based and will respond to all sorts of device sizes. So if I look at it on a desktop it’s going to be kind of a horizontal format, and then if I look at it on a mobile device, it’s going to transform. The text is going to move, images might move. So I’m speaking of BYOD in those terms. I know there’s a lot of restrictions in practice that people use.
So operability, so BYOD. Assistive touch technologies, then: in this Parkinson’s discussion I was having with people, we were just talking about how there are assistive touch technologies available. And you may want to use them, because if someone is shifting from a desktop environment to a mobile environment, they may not have the dexterity to pull it off.
And understandability. This one is really seemingly simple but can become a little muddy when you dig into it. So clarity and readability, the predictability of the design, right. We’ve all been on terrible websites where there is no correspondence between say color themes, fonts, there’s too much going on, it’s too noisy, and so you have trouble navigating, whereas the best designed tech is clear, it’s predictable, it’s consistent. Also, the ability to review and revise answers falls under understandability. If a patient can’t go back and change things, maybe they’ve thought a little more about it, they want to make a revision, that can cause distress and an understandability issue overall.
So how do these principles interact with our patient demographics? All of these things are true of any interface design, but our concern is the global clinical trial. There are a lot of issues here. We’ve actually just worked on and published a poster that was at ISPOR Tokyo just a few months ago, if you’re interested in a little more information on this. But there are language differences, right, and not just that some words in one language are longer than in others, but right-to-left languages of course, different alphabets. I can say a lot about language differences and I will kind of trim it down.
Access to technology, so this is really about things that we find intuitive, right, as the experts, as the designers, that other people might not find intuitive because they don’t have access to the same technology. We found when we were looking into this, of course wealth has a lot to do with this, but there’s also gender differences. There are a lot of areas of the world in which men are likely to have more access to mobile devices than women. If that’s the case and you’re studying an illness that tends to impact women, you might want to take that into consideration if you’re looking at a BYOD study. Sneak peek: all of these are solvable problems, so we’ll be okay. Differences in technology, of course there’s device and user interface norms by country and different touchscreen conventions.
So I’m going to dive in a little bit to language differences, which is one of my favorite things. Right-to-left languages, talking about, like, Arabic, Hebrew, Urdu. One of the challenges with eCOA, which I’m sure many of you know well, is that not only do you need to switch the language direction and justification, you also need to switch button functionality for these users to use it. So your back and next buttons need to swap. And that’s fine, it’s possible, but it does require pre-planning and it does require thoughtfulness. There’s also Asian characters and line breaks. As I was talking about with responsive design, if you have just like a totally responsive kind of setup, the viewport that you’re looking at, where the text is displayed, will adjust depending on the device you’re using to view it. In English, any space between words is an acceptable place to break a line. Word wrap is something we just kind of take for granted. The same is true with French or Spanish or Italian. But in Japanese, for instance, that’s not the case. You can’t break a line just anywhere. So as the viewport changes, you might be introducing, accidentally, new meaning to the text. That is certainly not something we want to do in PRO development. And then font corruption. My favorite example of this is Malayalam, the Indian language that I think we see the most font corruption with. And unfortunately, when we see this font corruption (you all know what font corruption looks like: you’re reading a text and then you have just like a giant weird box in the middle of it), with Malayalam the font tends to corrupt to what looks like another Malayalam character. So it’s very difficult to catch in QA without linguist involvement.
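The right-to-left point can be made concrete with a small sketch. Everything here is an assumption for illustration: the `RTL_LANGUAGES` set covers only the three languages named above and is not exhaustive, and the function names and the shape of the returned layout are hypothetical. The idea is that text direction, justification, and button order all flip together from one decision, which is the pre-planning part.

```python
# Assumption: a small, non-exhaustive set of right-to-left language codes.
RTL_LANGUAGES = {"ar", "he", "ur"}


def is_rtl(locale):
    """Return True if the locale's language subtag is in the right-to-left set."""
    language = locale.replace("_", "-").split("-")[0].lower()
    return language in RTL_LANGUAGES


def navigation_layout(locale):
    """Describe text direction, justification, and on-screen button order."""
    rtl = is_rtl(locale)
    return {
        "text_direction": "rtl" if rtl else "ltr",
        "text_align": "right" if rtl else "left",
        # Buttons are listed in the order they appear on screen, left to right.
        "buttons_left_to_right": ["next", "back"] if rtl else ["back", "next"],
    }


print(navigation_layout("en-US")["buttons_left_to_right"])  # ['back', 'next']
print(navigation_layout("ar-EG")["buttons_left_to_right"])  # ['next', 'back']
```

Japanese line breaking and Malayalam font corruption are not covered by a check like this. Those still need a linguist looking at the rendered screens.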
Oh yeah, BYOD, I need to pause for the BYOD part. So, can we test these things prior to fielding? Of course. That is what I’m here to tell you about. And thankfully we can. We recommend—so again this is for a global trial that uses a diverse mix of patients—do on-device testing and it is accessibility focused. So if your trial involves certain physical issues that a patient may be facing, you will want to represent those in this testing for the very reasons I mentioned. If your hands are too cold to use the device, that’ll cause a problem. If you have an elderly population with a lot of vision impairment, you’re going to want to test it with them to make sure that they can use it. And again, on-device testing can become particularly important with BYOD, especially if you’re dealing with populations that may not have the technological access. Or you may need to do some training before the actual trial starts. Make sure that these things that maybe you thought were clear are actually clear to your patient population.
Okay, how patients are different, a review. They perceive things differently due to different languages, potential vision impairment, they use things differently based on their tech familiarity. And they understand things differently.
Not a big surprise, the solution is usability testing. We have been looking at usability testing a lot lately. And it has garnered a reputation for being long and expensive and not providing really useful information. So we’ve been working on that. Some usability testing methodologies rely on a cognitive debriefing interview method. So again, thank you to Rebecca for really going through the differences in these procedures. So the cognitive debriefing interview method relies on a phrase-by-phrase review of a translation, in this case I’ll say a translation, with a trained interviewer and the patient. It’s very time consuming, and it’s critically important to making sure that patients are understanding these texts.
Now, when we’ve seen this applied to usability testing, so we’re using the same really time-intensive detailed procedure where we’re reviewing literally every aspect, we find there are some issues. So cognitive debriefing takes a long time, it’s expensive. And we found that, when you use that methodology and apply it to usability testing where you’re actually asking people to look at the device, can they use the device, the problem is, we do receive a ton of feedback, but it’s not feedback we can implement. Because by the time we’re doing usability testing, a lot of these texts are psychometrically validated, and the feedback that we’re receiving are things we cannot change. We all know this type of feedback, right, like preferential changes, people don’t like one word, they prefer a different word. These are also really time-consuming procedures, so it expends the patient’s mental energy, and we want the patient to tell us about the device. Like, we’re asking, is this something you can use, are you going to miss things when you’re using the device.
Here is an analysis of usability testing using that cognitive debriefing methodology where you’re just combing through all of the text and using this really long procedure. We saw that 64% of the feedback we received was on content validity that we could not implement. So it is not a good use of anyone’s time. Only 35% was on the usability of the questionnaire, and only 1% was on migration to the new format.
By the way, if this sounds familiar—should have opened with this—my colleague Rebecca presented this little section in detail last year at the eCOA Forum.
So the old method does collect some user experience data. There were some comments on changes to the instructional text, like a change from circle to select, things that Rebecca was just discussing. Also, some minor comments about the overall eCOA layout, but again, these were really minor comments.
So there’s a better way, thank goodness, that’s what I want to tell you about. The three principles of patient experience can’t be tested without the patients. So that is going back to those accessibility guidelines; these are actually taken from the W3C accessibility guidelines. So, perceptibility, operability, and understandability, you can’t test those without the patients, because they’ll understand differently than you will. So we’re targeting the patient experience of these principles with our new method.
So the shift that we recommend is a shift away from a focus, again, on the text. It’s hard to get your mind out of the world of the text because we’re talking about the PROs, we’re worrying about this text functioning both on paper and on eCOA. But what we really need to test is not the text any longer but the experience of using that text on device. So, advancing the usability method, there is a better way.
The way that we are doing this now is to, of course, still test to make sure they understand the text, that’s obviously critical for everyone. So we still review instructions. Any text changes are adapted for the new format, and response options. And that’s just, again, speaking of response options, I’m talking about can you read the response options, do you know what they mean, at this point. But what we did is we shifted our focus to the key elements of the user interface, and we integrated a lot of questions regarding user interface and really pinpointing those elements of the text that may have changed—so button size and placement, text size and presentation, questionnaire navigation, operability. So we really focused our energy and kind of fine-tuned our questions to these components. It’s a semi-structured interview still, which was kind of addressed, so we have a trained interviewer doing these with patients in the target countries, asking prompts, and then receiving their information. But there’s a lot of room to speak and talk and discuss in these interviews.
We believe that you should keep a semi-structured interview format but refine the focus. So we’re framing these questions in the context, again, not of the CD methodology, which comes from a survey research background and which of course is critically important in this industry, but shifting it over to kind of the user experience, human-centered design focus. So we look at navigation, information architecture, button size, placement and responsiveness, text layout, and readability.
Results of the improved method were striking, as we had hoped. If you ask different questions, you get different answers, right. I mean that’s kind of obvious, but it’s wonderful to see these kinds of results. So when we shifted our questioning, shifted how we were conducting these reviews, we saw that 73% of our feedback was regarding the usability of the questionnaire, that’s exactly what we want to hear. Is this working? Are you struggling with it? Are your hands too cold? Twenty-four percent of the feedback was about the migration to the new format, again exactly what we want to hear. Does it make sense to say “select” in France, or do you use a different word? The words that we use for interface interactions aren’t always that apparent, and they don’t perfectly cross over in other cultures. And then we have this little sliver, so 3%, in the new method that is regarding content validity. Still always good to know, if there’s an error in a psychometrically validated text we’d want to hear about it. But these are stylistic suggestions.
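To put the two sets of figures side by side, here is a small sketch that uses only the percentages reported in this talk (64%, 35%, and 1% under the cognitive debriefing approach; 3%, 73%, and 24% under the targeted approach). The variable and function names are hypothetical. Note that these are shares of the feedback received, so the arithmetic shows how the mix shifted and says nothing by itself about the total volume of comments.

```python
# Shares of feedback, in percent, as reported for each method.
cognitive_debriefing = {"content validity": 64, "usability": 35, "migration": 1}
targeted_usability = {"content validity": 3, "usability": 73, "migration": 24}


def percentage_point_change(before, after):
    """Return the change in share for each category, in percentage points."""
    return {category: after[category] - before[category] for category in before}


change = percentage_point_change(cognitive_debriefing, targeted_usability)
for category, delta in change.items():
    print(f"{category}: {delta:+d} percentage points")
# content validity: -61 percentage points
# usability: +38 percentage points
# migration: +23 percentage points
```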
So, trends. One thing that we’re really excited about with this new type of testing is it’s also shorter. It’s really targeted, so that the interviews were 30 minutes shorter on average when we conducted things this way, and yet we gathered significantly more of the information that we were looking for, using this methodology. Also, I’m going to go back really quick because I want to be clear, it’s not just that we took out the content validity questions and then the other sections are blown up. We actually received just an enormous increase of this kind of information, which I’ll talk about a little bit more, just a couple of quick stories since I have time.
And then also an increased focus on accessibility. One thing that we want to do and that we found, it is difficult for people to talk about their accessibility issues sometimes. We all want to be able to use technology well and to seem competent. And even in the context of an interview like this, if you’re focusing on the source text, not everyone will volunteer information to you about how they are struggling to use a particular product, unless you ask or kind of really—make space is a trendy thing to say, but make the space and the time to hear these things and for them to be comfortable and to know that this kind of feedback is precisely what you’re looking for. So, targeted probes on these topics caused or helped or whatever you want to say patients to volunteer information. And we heard a lot more about the struggles that they were experiencing with the devices.
Quick story. There was one woman (I did not conduct these tests, by the way; if it sounds like it’s secondhand information it is, because I’m not one of the researchers, so forgive any generalizations). But one of the women going through the usability testing apparently needed bifocals but didn’t have them for whatever reason, maybe couldn’t afford them. So throughout the entire process she had two pairs of glasses that she held the entire time. And she would switch back and forth during the process. Unfortunately, the text size on this device for this PRO was just such that, within a single page, to read all the content she had to switch her glasses. This is not every patient. This is a kind of outlying example, but this was something they were able to solve, right. So when we know about these things that come directly from the patient, it’s like the person walking into the bar asking where the bathroom is; it’s this kind of out-of-left-field feedback that you wouldn’t have gotten any other way.
Another example is, there was a gentleman tested who was a parent, and he was reviewing a PRO that required proxy input, so like a parent or caregiver. But because of the instructions, how they were separated, kind of divorced from the questions (as Rebecca very wonderfully recommended, they should ideally be included along with the questions), he forgot that it was a caregiver questionnaire. And then when he got to the actual questions, the tone and register kind of switched throughout the questionnaire for a child or an adult, and he was deeply offended and disrupted by the tone that he felt was asking him, as an adult man, in a way that was very childish. And these can of course vary strongly with cultural differences, I’m sure we can all imagine, I’m not going to belabor the point. But anyway, it’s worth testing.
So all of these examples resulted in actual changes, which is the goal. We want to test, receive feedback, and then implement changes that can resolve the issue moving forward. So here are some examples of changes that we’ve seen from this testing. Response layout: again, it’s so easy for a response option to be below the fold and to just be overlooked. And that’s a really scary possibility, in my mind, because it’s hard to catch and will disrupt your data collection. Also, we all know how to look for that skinny little scroll bar to signify that you have to scroll down. It’s not necessarily true for everybody. Also, again, BYOD topic, patients have to scroll to see extra text, but not on all devices. So maybe you’re conducting testing internally, everyone’s a professional with lots of technological experience, they have no trouble seeing it on their laptops. But if you ask a patient to look at it on their phone, you can see other issues. Next and back buttons cut off, again, on some screens. So when you have screen size variation it’s important to consider these issues. So in that case there is an easy solution: they moved the buttons to the top of the screen, so they were no longer cut off, they were clear to everyone, and then the text the patient needed to see was also apparent, it was very clear that that text was cut off and they needed to scroll to get to it. And another one, a skip button’s label was too small to read. They were able to increase the button size. So the wonderful thing about these issues is they are very solvable. Well, most of them. Response option field with reduced—I think I just kind of talked through that.
And again, BYOD. If you’re looking at device size variation, and you are probably designing your screens so that they don’t just scale precisely (that can look a little weird; I think we’ve all seen those websites that are just teeny tiny because they’re in a vertical format), you should consider these things. It is our recommendation, and I have seen and do believe, that if you test with patients you will receive such thrilling and surprising feedback that you would not have gotten another way. I should say, not patients necessarily, people representative of your patient population.
So that’s it. So yeah, our new usability testing and the approach that we recommend is to gather patient experience data rather than patient text comprehension data. So it’s just a very clear difference: we’re still testing the understanding, but we’re also testing usability and perceptibility and all of these things in light of any restrictions a patient might have.
I think that’s it. Oh jeez. Well I said these things. You guys get it, okay. Thank you, that’s everything.
[Q&A Section 28:50]
Any questions for Alisa?
AUDIENCE MEMBER 1
Quick question. I’ve been a little bit out of the loop with regard to BYOD and why or why not. How are people responding to the why or why not around BYOD nowadays?
Well, my understanding (so this is going to be a limited perspective, there are a lot of people in this room who know a lot about BYOD) is that patients, you know, all of us in this room I assume are carrying a smartphone, right, so we already have these devices. We know how to use them, it’s intuitive. So if you can get these things on someone’s own platform, the idea is that they’re going to love it, I mean it’s just going to be easier, you won’t have any usability problems. And I think that’s true, I think it’s a wonderful idea. I do think the challenge is that not everybody has a device; you know, when you get into the global context it’s more complex. In the United States, I don’t personally know anyone under the age of 70 who is not really familiar with touchscreen technology, but that’s not globally true. I think that’s the idea of why BYOD is wonderful, right? Does anyone else want to add to that? And if it’s, like, web based, you can make changes or do data collection in a really efficient way.
Any other questions, comments?
AUDIENCE MEMBER 2
That was great. In terms of the usability piece, if we were to design a study to be generalizable across all the different types of patients that we want, what properties would that sample need to have? If we wanted to make sure we included groups and cohorts of patients with different characteristics, what do you think those characteristics would be?
I’m going to speak in general terms again because I’m not one of the research scientists so I don’t want to bungle this. Definitely elderly patients, vision impairment, again cold hands—I have very cold hands and sometimes I even struggle with my touchscreen. So elderly, dexterity issues. And low education, people who are not wealthy. I think that’s going to be critically important. So if you’re not interfacing with all these technologies on a regular basis as a highly educated professional in a first-world country, things aren’t going to be intuitive.
Anyone else? Okay, thank you again to Alisa.
[END AT 31:42]