Expanding the Criteria for Medical Student Success

Nov 20, 2023

It isn't easy to assess non-cognitive skills in admissions and through the medical school experience in order to train caring and capable physicians. But many medical schools have moved in the direction of holistic review, which complements knowledge assessments like the MCAT and GPA with situational judgment test (SJT) assessments of skills like compassion, caring, and teamwork. Paloma Cariello, MD, MPH, Associate Dean of Health Equity, Diversity, and Inclusion, joins Boyd Richards, PhD, Director of Educational Research and Scholarship, to talk about their work at Spencer Fox Eccles School of Medicine at the University of Utah using SJT methodology to measure non-cognitive traits and give precise, finely tuned feedback to students.

Episode Transcript

Interviewer: Assessing non-cognitive skills in admissions and through the medical school experience in order to train caring and capable physicians. That's something that a lot of institutions want to do, and you're going to learn more about that today on conversations between colleagues with thescoperadio.com at AAMC Learn Serve Lead. We're exploring the innovative ideas shaping the landscape of academic medicine.

In this episode, we get the opportunity to listen to a conversation between Dr. Paloma Cariello, the Associate Dean of Health Equity, Diversity, and Inclusion, and Dr. Boyd Richards, Director of Educational Research and Scholarship, both from the Spencer Fox Eccles School of Medicine at University of Utah.

Dr. Cariello, where would you like to start this conversation?

Dr. Cariello: Thank you very much. Dr. Richards, I would like to start by asking why is it important to assess non-cognitive skills?

Dr. Richards: So medical school admissions officers have been quite capable over the years of assessing an applicant's knowledge using things like MCAT and GPA, but there's been a recognition over the years that that's not all it takes to be a caring and compassionate physician. And so they've begun to move in a direction that we call Holistic Review, where we want to complement these knowledge assessments with other assessments of these non-cognitive skills, like compassion, caring, teamwork.

So today we're going to talk about a particular kind of assessment using a technique called situational judgment tests, which allows us to assess these kinds of skills, I believe, not only at admissions but also later through medical training to ensure that students are continuing to acquire and strengthen their non-cognitive skills.

Dr. Cariello: I understand that you have been interested in using this assessment of situational judgment testing to measure student development for non-cognitive skills, such as cultural humility. Can you tell us a little bit more about why this is of interest to you and why we should use it?

Dr. Richards: So, when I came to the University of Utah nearly seven years ago, I had the opportunity to begin to work with the admissions team. Ben Chan was very innovative at the time and began to use situational judgment tests along with other measures of non-cognitive skills in an attempt to move into a much more holistic approach to admissions. And he gave me the opportunity to work with him on developing a situational judgment test that we used in our admissions process for a number of years.

At the same time, the Association of American Medical Colleges had approached the medical school to participate in a validation of a situational judgment test that they were developing, a test that they hoped would become a complement to the MCAT, which is very focused on knowledge, and could add to the holistic assessment repertoire that the university or other medical schools had to use.

And so we were part of a validity study of what the AAMC SJT has become, now known as the PREview exam. And so by virtue of my working on the SJT at the university, and by virtue of working on the validity study of the PREview exam, I became very interested in the potential of situational judgment tests as a methodology to measure these non-cognitive skills.

Dr. Cariello: Oh, interesting. So, what is the goal that we are trying to accomplish using the situational judgment test, and what are we doing to meet that goal?

Dr. Richards: So again, I think we're pretty good at measuring knowledge. How much does the student know? How capable is the student of acquiring additional knowledge over time through the curriculum in the medical school?

But we've just not done a very good job measuring these other attributes. I say compassion, teamwork, service orientation, cultural humility, things that are just so important to a caring, capable physician. And so I think SJT is a methodology that really might work.

And SJT, by the way, presents a variety of situations that are supposed to be very realistic and then presents a number of options for the examinee to rate in terms of appropriateness.

If this type of situation were to occur while you're in a particular care setting, it might be an outpatient clinic, a hospital clinic, and then you have to resolve a conflict, a challenge, and here are some options for doing that, how appropriate is each of these options?

And there's evidence coming from the IO psychology literature that a situational judgment test where students are making these kinds of ratings is a measure capturing the level of knowledge . . . I use the term pro-social knowledge, a non-cognitive skill that they have to be able to navigate these difficult situations.

Dr. Cariello: Can you share a bit about the challenges that you're facing while you're trying to implement SJTs?

Dr. Richards: Developing any assessment is terrifically difficult, especially to reach a very high bar. I use the term validity, where in order for an assessment to be used to make these high-stakes decisions, whether you ought to be admitted to medical school or not, or whether you're making progress in medical school in developing these traits, there's a pretty high standard that has to be met.

And so in developing the situational judgment tests that we have, we have to identify situations that are realistic and write those in a way that are short, simple, but yet not obvious. Then we have to write the possible alternative responses to that situation and have those be plausible, not obviously right or wrong. And it's not trivial to come up with these kinds of responses.

Then we need to pilot them and collect data using real subjects to see how they're going to respond so that we then can determine if the situations and the response alternatives are working as we expect to differentiate individuals that we think might have more of the trait that we're looking for, like cultural humility, or less of that trait.

And so it takes time and effort to develop these tests. My guess is if we were to ask the Association of American Medical Colleges what it cost for them to develop their PREview exam, it would be in the order of millions of dollars. Of course, we don't have that kind of money at a local medical school, so we need to find ways that we can develop these kinds of assessments without that level of funding.

Dr. Cariello: And how do you know that those results are indeed valid? What's involved in establishing validity?

Dr. Richards: Validity has two parts in my mind. One part I would say would be reliability. And here we're getting a little bit technical and we don't need to get too deep, but does the test have some degree of consistency? If I were to offer the test again to the same group of people, would we get a similar score for that population?

Do the items seem to be measuring the traits in a similar sort of way? And if a test is reliable, you have confidence that the score you're getting is a defensible score.

Validity then is the next step to that, and it is saying that the score you're getting is actually measuring the trait that you want to see, and that a higher score would indicate an individual has more of that trait.

The problem is what's the gold standard? If we have an assessment like an SJT and we want to establish its validity, but we don't know that the applicant already has the quality, what is the measure that we can use as the gold standard?

So in our efforts to develop SJTs at the medical school, we've tried to do a correlation with the admissions SJT, for example, that we had already developed. And so if you have one SJT and another SJT, we think we're measuring similar traits. Do those scores correlate? Is there a relationship between the two outcomes? That's one form of validity.

Ultimately, we might have another measure, a gold standard like faculty's perceptions, and we might have them rate students that they've worked with over the course of a period of time. And if they were to identify the same set of students that did well on our SJT, or identify the students who didn't do so well on the SJT, we'd say, "Yeah, this test seems to have validity. There's that relationship between the two measures."

Dr. Cariello: So it seems like the test can be used in various spaces, different people. I actually recently took an SJT that you developed about cultural humility, which I understand you're able to include as part of the curriculum for first-year students during their first few weeks of medical school. So, it seems like we do have some success in integrating SJT into the curriculum. Is that fair to say?

Dr. Richards: Yes and no. The first non-admissions SJT that a team and I were working on for a number of years was a test that we wanted to offer to students as they were advancing their curriculum just in anticipation of their entering into what we were calling the clinical clerkships.

As you know, in many medical schools, the first couple of years are very focused on classroom-based learning, and there's very little clinical contact. And so we wanted to offer this SJT at the time when they were beginning to enter the clinical environment and working with patients directly.

And we thought we had a pretty good test, but we just didn't have a lot of uptake by curriculum leadership. The curriculum was already really full. They felt satisfied in their ability to understand where student strengths and weaknesses may lie in some of these non-cognitive areas, and so they didn't see that the need existed for that test. So I think that test has potential, but it never was implemented within the curriculum.

We said, "Well, let's try another strategy. Let's use an assessment that's going to be more focused on a single attribute." And we picked cultural humility partly because at the University of Utah with grants from HRSA, we've been able to focus a lot on helping prepare students to work with tribal, rural, and underserved populations, both in the Salt Lake Valley and throughout the state of Utah. And we thought a cultural humility test would be a useful assessment to ensure that these students were ready to work with these types of individuals.

And so we developed a cultural humility test over a course of years, the one that you mentioned that you took. And in our new program that just has started this past year, it was understood that that would be a good fit.

Having this assessment in the curriculum would be an opportunity for the students to get feedback about how they were doing in readying themselves for the challenges that they were going to be placed in early in their program of starting to work with underserved patients in student-led clinics in the Salt Lake Valley.

And so we were able to administer the test to all 125 students. Because the test has not been validated, we didn't choose to report the results back to the students. What we did instead was we showed aggregate results for selected situations and identified the pattern of responses for the different choices of how to respond to the situation.

It was interesting to see the amount of variance that students had. And so we were able to identify specific response options where there was disagreement and have a conversation. Why are students seeing the situation differently and coming up with different responses? How might that indicate different levels of humility or not?

So we could make the case that this was an attribute that mattered and maybe they weren't where they needed to be ultimately in having that attribute fully ready to be used as they're working with patients.

Dr. Cariello: That is fascinating. I'm personally very excited to see the work moving forward and have supported Dr. Farhat, who is doing a lot of health equity work in the student-led clinics in our school, and I would like to see that grow and become disseminated across our curriculum. How do you feel about the successes that you have had thus far?

Dr. Richards: As I look back over the five years that I've been working with situational judgment tests, I've recognized the need to take a long-term view. And while I'm a bit disappointed that there hasn't been more uptake earlier, particularly in what I call the readiness for clerkships SJT, I do think the experience we had with the cultural humility test is good evidence that this has real merit and opportunity.

And with your support and other faculty at the medical school, I'm confident that we can demonstrate the value of using this methodology to measure these kinds of traits, and to give feedback to students that is very precise and very tuned to their strengths and weaknesses so that they can further grow and develop these kinds of non-cognitive traits.

I certainly don't want to take anything away from the importance of medical knowledge, but we do a great job, as you know, both at admissions with MCAT, but with our USMLE Step 1 and 2 tests, the licensure test, a lot of the tests within the curriculum are very knowledge-centric.

I just would be delighted if five years from now, and it may take that long, that we demonstrate the importance of these non-cognitive traits because we can assess them reliably and validly with various methods, including the situational judgment test.

So, I'm feeling good about the success today, and I'm optimistic about the future.

Dr. Cariello: That is wonderful. It's great to see progress on how we can measure effectiveness in social functional dimensions so we can continue to work on it and see palpable improvement of our skills so that we can have effective interpersonal interactions.
