- [Anna] Well, good afternoon, everyone. I'm really delighted to be here, and honored to have the opportunity to present this research to you in person. I'm joined remotely by my colleague, Professor Rob Jago from Royal Holloway, University of London, who I hope is with us. I think he is.

- [Prof. Jago] I am here, Anna. Thank you.

- [Anna] Great. So Rob is going to be talking in a little bit about some of the qualitative findings from our work together.

The first thing to say is that I'm not a technical person. So those of you who are computer scientists in the room are going to be disappointed, because I'm going to talk only very superficially about the technical side of this work. I have a clinical background. I spent many months in the company of people with really impressive technical skills, some of whom are listed on this slide. I'm sorry you can't see their names clearly, because they have been absolutely critical to the success of this project. I also want to thank Maryann in particular from NCSBN, and the Center for Regulatory Excellence, for taking a leap of faith back in 2018, when AI was hardly in the public discourse, and certainly not in relation to regulation, and for funding this project.

What this study was seeking to do was explore the uses of a particular type of artificial intelligence in nurse regulation across three jurisdictions: here in the U.S., the UK, and Australia. It was the first study of its kind in the world, as far as we're aware. We were very fortunate to work with the Texas Board of Nursing who, again, were instrumental in making this happen. I again want to acknowledge the contribution of Kathy Thomas and Mark Majek, who I know have both retired now, as well as Dusty Johnson, Skylar Caddell, and Tony Diggs. And I think Elise McDermott is here from the Texas Board of Nursing.
So will you please convey my heartfelt thanks to those people, because they've been fantastic collaborators on this project.

When you think about artificial intelligence, you probably think about a whole range of different technologies. Just yesterday, we heard about the robocall, the one using Joe Biden's voice ahead of the New Hampshire elections: effectively, somebody trying to spread false information using the president's voice. That's the kind of thing that grabs the headlines. That's the kind of technology that people are fearful of, and certainly don't want to see anywhere near their work environments. Contrast that with the incredible successes of these kinds of technologies in, for example, improving the speed and accuracy of cancer diagnosis. If you talk to some of the radiologists who are at the cutting edge of using AI in their clinical environments, they are incredibly excited about it. So we've got a whole spectrum of interpretations, and constructs, and views about this technology in between those two extremes.

And I suppose one of the big challenges we face, whether in research, clinical, or indeed commercial settings, is whether we can understand AI as a force for good, whether it should be regulated, how it should be regulated, and where it will have the most impact. These are the kinds of questions that are front of mind when people think about these technologies. But one thing is absolutely certain: this technology will not go back in its box. It's here to stay.

So in the context of regulation, our challenge was to explore how a particular type of artificial intelligence could be used.
Because Robert and I had been working in the area of nurse discipline, looking at the way complaints about nurses are handled, we were particularly excited about the prospect that these tools could help with the speed, accuracy, and consistency of decision making in nurse discipline. What we know from research around the world is that a very, very small number of nurses are a high risk to patients, and that around 70-plus percent of cases or complaints made to regulators around the world require no regulatory action. So Robert and I were particularly interested in exploring how these tools could be used with that particular cohort, where there was no harm to patients and low risk in terms of regulatory assessment.

The other really important part of the thinking behind this work was that we were made aware of just how scarce a resource disciplinary staff in regulation are. There's often a high turnover of regulatory staff who work in disciplinary teams, because it's very high stress for relatively little reward. And the number of complaints is increasing everywhere. So you've got this rise in the number of complaints, you've got a high turnover of regulatory staff, and you've got a scarce resource that you want to look after. Showing whether these tools could assist with that was essentially what we were aiming to do.

So our research question was, can this be done? Our aim was to focus first on whether or not an AI tool could actually calculate the risk level at the early stage of a complaint. So when the summary data comes in, when the complaint comes in, can a tool assess the data and come up with a risk classification in the way that we as humans do when we do this work? Secondly, we wanted to test whether the tool could actually link cases to regulatory rules and standards.
And thirdly, could it find previous similar cases, so that the case manager deciding whether to take a particular case forward could look at previous cases, see what the outcomes of those decisions had been, and add that to their human judgment on the case in front of them?

In terms of our methodology, and as I said, I'm not going to go into great technical detail on this, we were fortunate to be able to access 3,000 cases from Texas, and we had about the same proportion, 1,200 or so, from the UK and from Australia. We used that data to build the tool using a Python web development framework. What this tool actually did was try a number of different AI classifiers. We used five different AI classifiers. For those of you in the audience who are from that background, we used gradient boosting, adaptive boosting, a CNN, and an ensemble model, which was a combination of three of the five. We fed in the complaint text, so all the information that we had about that complaint, including the source of the complaint, the risk level, and the harm to patients, if there was any. The classifiers would then come up with a risk rating, or risk classification, for each case. That was the first task, if you like.

Then the second. We know from research and debate about AI that data quality is a big issue. What you put into your machine learning tool, your training tool, determines the quality of information that you're going to get out of it. So we were keen to check, in particular, for human bias on race, gender, and age. At this early stage, we didn't have enough data on race and age, so we focused just on testing whether the tool could effectively de-bias for gender. We used three different gender de-biasing techniques, removing gender, neutralizing gender, and swapping gender, so that we could make sure the risk level that was calculated wasn't biased on gender.
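[Editor's note: to make the classification step a little more concrete, here is a minimal sketch of how complaint text might be fed to several classifiers and combined into an ensemble, in the spirit of what Anna describes. It uses scikit-learn's gradient boosting and AdaBoost with a simple soft-voting ensemble; the study's CNN, exact feature set, and ensemble composition are not reproduced, and the file name, column names, and labels are illustrative assumptions, not the project's actual code.]

```python
# Illustrative sketch only: risk classification of complaint text with an
# ensemble of classifiers, loosely following the approach described in the talk.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

cases = pd.read_csv("complaints.csv")   # hypothetical de-identified case summaries
X = cases["complaint_text"]             # assumed column: free-text complaint summary
y = cases["risk_level"]                 # assumed column: e.g. "low", "medium", "high"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Three classifiers combined by soft voting; predict_proba supplies the kind of
# confidence score shown alongside each risk rating in the dashboard.
ensemble = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20000, ngram_range=(1, 2))),
    ("vote", VotingClassifier(
        estimators=[
            ("gb", GradientBoostingClassifier()),
            ("ada", AdaBoostClassifier()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    )),
])

ensemble.fit(X_train, y_train)
print(classification_report(y_test, ensemble.predict(X_test)))

# Confidence scores for a single new complaint, as a case manager might see them.
proba = ensemble.predict_proba(["Medication error reported, no patient harm."])[0]
print(dict(zip(ensemble.named_steps["vote"].classes_, proba.round(3))))
```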
The final part was the qualitative testing, which Robert is going to talk about shortly. We were very keen, all the way through this project, to involve regulatory staff, the people on the ground who were doing this work, testing out what we were developing, involving them in the design, and getting their feedback.

What I'm going to do now is show you some screenshots of the prototype that we've developed. For obvious reasons, I'm not using Texas Board of Nursing data, which wouldn't be appropriate. This is showing accessible data from a U.S. financial regulator, in fact. As you can see, the first page is a secure login. What comes up when you go into the login page is, first of all, the summary. So you see there in the center the cases. I'm so sorry this is so small, but I know you're going to get these slides afterwards, so you'll be able to look at it in a bit more detail. If I use the pointer, this is just the headline from each case, and you can see the case numbers going down. What you have going to the left of the slide is, first, the probability score, the confidence score. That tells you something about how much confidence you can have that the risk level is actually accurate. This one says 98%. And then the final column, which is blank at the moment, gives you the space to enter the human judgment.

So right from the start, in the design of the dashboard, what you have is a tool that leaves the final decision to the human. It provides information, but it doesn't make a decision. That's left to the case manager. What you see in the bottom right corner is just the graphic for high, medium, and low risk. Again, that will be a calculation on each case.
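[Editor's note: as a rough illustration of how a dashboard like this might expose the model's output while leaving the decision to the human, here is a minimal sketch using Flask. The talk only says a Python web development framework was used, so Flask, the route, and every field name here are assumptions rather than the project's actual code.]

```python
# Minimal sketch of a dashboard endpoint that returns the model's risk rating and
# confidence but deliberately leaves the human judgment blank. All names are assumed.
from flask import Flask, jsonify

app = Flask(__name__)

# In a real prototype these would come from the case store and the trained ensemble.
CASES = [
    {"case_id": "C-001", "headline": "Medication error, no harm reported"},
    {"case_id": "C-002", "headline": "Practising outside scope of licence"},
]

def classify(case):
    """Placeholder for the ensemble's prediction; returns (risk_level, confidence)."""
    return "low", 0.98

@app.route("/cases")
def list_cases():
    rows = []
    for case in CASES:
        risk, confidence = classify(case)
        rows.append({
            **case,
            "model_risk": risk,        # the tool's suggested classification
            "confidence": confidence,  # e.g. 0.98, shown as 98% on the dashboard
            "human_judgment": None,    # left blank: the case manager decides
        })
    return jsonify({"cases": rows})

if __name__ == "__main__":
    app.run(debug=True)
```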
This next slide is perhaps a little bit clearer for those of you at the front of the room. On my right, you see the actual text, the free text, if you like. What you see highlighted are the key words that the tool has used to calculate the risk, which gives you a sense of which key words were of particular importance in arriving at that risk score. And then above, as you see, you've got your risk score, your probability, your confidence score, at the top of the page. That's the risk calculation that the case manager can then use in forming their own judgment about the case.

What we also developed, very much in collaboration with regulatory staff who said, "These are things that we would find useful," was, first of all, a tool that could surface the relevant section of the regulatory rules. As we learned from working with Texas, there are pages and pages and pages of rules. So the tool effectively extracts the elements, the section of the rulebook, relevant to that particular case. That's the second task, if you like, beyond the risk calculation. And the third is to allow the case manager to compare the case they're looking at with any previous cases where a similar pattern of noncompliance has been found. So it's all about triangulation of data, and not in any sense about replacing human judgment.

We've been very encouraged by the first phase of testing, the reliability testing, where, as I said, we had these five different AI classifiers. We compared them to a baseline and found that, in terms of their reliability, the scores were looking promising. Not as high as we need them to be, but this is only on a sample of 1,241 cases, the first cases that we took through the phase one testing.
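[Editor's note: the similar-case lookup Anna describes can be illustrated with a small semantic-similarity sketch. The published work mentions semantic similarity measurement; the TF-IDF cosine approach below is just one plausible way to do that, and the past cases and outcomes are invented for illustration, not study data.]

```python
# Illustrative sketch: retrieve previous cases most similar to a new complaint
# using TF-IDF cosine similarity. All case texts and outcomes are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_cases = [
    {"case_id": "P-101", "text": "Nurse administered wrong dose, self-reported, no harm.", "outcome": "no action"},
    {"case_id": "P-102", "text": "Repeated failure to document controlled drug administration.", "outcome": "conditions on practice"},
    {"case_id": "P-103", "text": "Worked while impaired, patient harm alleged.", "outcome": "suspension"},
]

new_complaint = "Medication error with incorrect dose, reported immediately, patient unharmed."

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([c["text"] for c in past_cases] + [new_complaint])

# The last row is the new complaint; compare it against all past cases.
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for case, score in sorted(zip(past_cases, scores), key=lambda pair: -pair[1]):
    print(f"{case['case_id']}  similarity={score:.2f}  outcome={case['outcome']}")
```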
And now I'm going to hand over to Robert, I hope, to talk to you about the qualitative findings.

- [Prof. Jago] Okay. Thank you very much, Anna. Good afternoon, everyone, and thanks to the conference organizers for allowing me to appear remotely.

When we look at the general ethical concerns in the AI space, we found ourselves drawn to the seminal work of Michael Sandel. You'll see on the slide that Sandel identified three key ethical concerns. The first of these is concern about privacy and surveillance. The second is around bias and discrimination. And the final ethical concern raised by Sandel relates to the role of human judgment, and the worry about replacing human judgment.

Now, in our work, certainly in the area of health regulation, we were also drawn specifically to the work of Gabrielle Wolf. This has been critical, because that work explores the ethical implications in the context of Australian health practitioner regulation. Here the first concern is equality before the law. Then you have transparency and accountability. The next concern is consistency and predictability. Then Wolf explores, again, the right to privacy, which of course had been considered by Sandel. And finally, there's reference to the use of AI and how it could potentially undermine the right to work.

Could you move to the next slide, please, Anna?

So in our research, as well as developing and testing the prototype, we also, as Anna suggested, conducted focus groups with colleagues from our three sites. We explored with them what they saw as the benefits and burdens of using AI, and you'll see on the slide the three main themes that were raised.

The first of these is negotiating trust and trustworthiness. Our participants focused much attention on trust, mistrust, and trustworthiness in relation to the inclusion of an AI tool in decision making related to complaints in nurse regulation. Our focus group participants were very aware of the consequences of error in fitness to practice processes.
As they indicated, flawed decision making could result in registrants either continuing to harm patients or, of course, being incorrectly judged to lack fitness to practice. Either way, the consequences are serious, and they remind regulators of their responsibilities to ensure that decision-making processes are trustworthy. Within this theme, our participants talked about prioritizing honesty and transparency. They also said it was important to think about the language being used, and how impactful it could be in dealing with issues around AI.

The second of our themes was affirming fairness and nondiscrimination. There was a real concern within this theme that any outputs from AI tools could incorporate bias and result in unfair and discriminatory decision making. There was, therefore, a focus on trying to minimize bias. There was a need to avoid fabrication, and a need to understand which values or elements are prioritized in any decision-making tool or algorithm. Related to this was the final sub-theme, ensuring accountability. Our participants explored the objective versus the subjective nature of decision making in this area, and the importance and challenges of clarity when it comes to accountability. It was recognized that there would probably be some technical developments to assist this process, but that regulators should always remain alert to the potential for discrimination.

The final of our themes was about managing burdens and benefits. There was a strong awareness in our focus groups that regulatory decision making is complex, and that it needs to take into account context, uncertainty, and ambiguity. The sub-themes here spoke to the shades of gray, the need for consistency, but obviously the need to negotiate complexity as well. There were concerns about ensuring that nothing fell through the cracks.
And there was a certain caution, and optimism, regarding what any AI decision support tool could actually deliver in the context of professional regulation. We thought it critical to mention that participants felt there was a real need for humility as to our expectations of AI. Finally, there were some discussions around effectiveness and burden reduction, and the potential benefit of any tool vis-a-vis the emotional content of fitness to practice complaints. Thank you, and I now hand back to Anna for some concluding thoughts.

- [Anna] Thanks, Rob. So, lots in there, and we really have packed in rather a lot. We've published three papers, two in the JNR and one in the "Journal of Computational Linguistics," which even I don't understand, but it's there. So if you're interested, please do read in more depth.

I think what I want to leave you with, I suppose, is just that sense of how we get from data to policy. How do we do that in this AI-rich environment that we're increasingly living with? I think we can't, as regulators, be left behind. We have to embrace this, but we have to do it in a way that reflects our values and our commitment to fairness, to openness, to transparency, and to robust evidence. This study is a very first baby step towards that. And it demonstrates that you can use these multiple techniques, text classification, semantic similarity measurement, and natural language inference, to provide an outcome that regulatory staff told us was of value to them.
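[Editor's note: for readers curious about the natural language inference piece, one common way to link a complaint to candidate rule sections is zero-shot classification built on an NLI model. The sketch below uses the Hugging Face transformers pipeline; the rule descriptions are invented examples, and this is not the study's actual implementation.]

```python
# Illustrative sketch: an NLI-based zero-shot classifier suggesting which
# regulatory rule areas a complaint may relate to. Rule texts are invented.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # defaults to an NLI model

complaint = (
    "The nurse failed to document administration of a controlled substance "
    "on two occasions during the night shift."
)

candidate_rules = [
    "accurate and complete documentation of care",
    "safe administration of medications",
    "maintaining professional boundaries",
]

result = classifier(complaint, candidate_labels=candidate_rules, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")
```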
A lot of them came into the work with great skepticism about whether this was going to have any benefit for them and their work. In fact, some of them were very honest about the fact that they were fearful it was going to have an impact on them and their livelihoods, their jobs. And over the course of the two years of working with them, I think they could see, and they told us that they could see, the benefits as well as the potential pitfalls.

We've deliberately designed this tool so that it can be used by anybody, by any state board who's interested in replicating this study and wants to continue with us on this journey. We have no copyright on the work, and we're really keen for NCSBN to take it forward. I know from being here today that you have a fantastic team of data scientists who have the capacity, and I hope the interest and the willingness, to take this work forward. It certainly isn't going to be me; it's going to be the data scientists who do this.

But just my very final thought is that transformation comes from people, not from tools. So we need to engage everybody in this debate, because there is a lot of fear and anxiety about the intrusion of these sorts of tools into regulation, as in all areas of our lives. I hope that, along with Robert, I've given you just a glimpse into the potential this tool has to improve regulatory decision making and to bring that consistency. And crucially, to shift that precious human resource to the high-risk cases, where we know there's going to be a full investigation, which is hugely time consuming. Those high-risk cases, where there are patient safety implications, are where we should be investing our human resource. And then we can embrace the new technology to make the screening out of low-risk cases faster, better, and more consistent, without compromising that essential ingredient, which is human judgment.

Thanks very much. Robert and I would be very happy to take any questions, comments, objections, illuminations from you now.

- [Jose] Good afternoon. Sorry, that was a little too loud. Jose Castillo, Florida Board of Nursing. I love the [inaudible] comment.
It's so enlightening that we are seeing AI being used in a very good light in the regulatory world. As we all know... well, I'm also an educator, not that you know that. But as we all know in education, and I should start there, AI is either being ostracized or embraced completely, so there's no middle ground. But I guess my question is, from the regulatory realm, one of the aims is to calculate the risk level using anonymized data. I'm not sure, can you expand on that, like how much of the data? Because we know as regulators there's a plethora of data when it comes to an investigation. So which ones would be highlighted by AI, or has that been looked at? How does that come into play with the overall evaluation, so that it will evaluate the risk?

- [Anna] That's a very good question.

- [Jose] Thank you.

- [Anna] And quite possibly, my slide was a little misleading. The aim in the initial stage, the very first pilot stage, was to see whether we could actually calculate risk using anonymized data. That very first stage was about just seeing whether, through the case summaries, we could come up with a risk score comparable to the human judgment, without knowing anything more about the case. As we went through the project, however, the Texas Board of Nursing staff clearly knew exactly which cases they were giving us. They were the ones who were familiar with the profiles, and the outcomes, and the context for these complaints. So as we moved through the project, we weren't using anonymized data, but we were de-identifying the cases so as to protect people's identities. So I probably slightly misled you on that. The first stage was anonymized data from the financial regulator; that was a database our data scientists could gain access to, and it's freely available.
But the next stage was actually a de-identification of cases, and testing to see whether that risk classification worked. So it was... Yeah, I hope that answers your question.

Okay. Well, thank you so much for your attention. Thank you.