Kyoung Hyup Nam and Da Young Kim contributed equally to this study as cofirst authors.
The purpose of our study was to develop a spoken dialogue system (SDS) for a pain questionnaire in patients with spinal disease. We evaluated user satisfaction and validated the performance accuracy of the SDS in medical staff and patients.
The SDS was developed to investigate pain and related psychological issues in patients with spinal diseases based on the pain questionnaire protocol. The system recognized patients’ various answers, summarized the important information, and documented it. User satisfaction and performance accuracy were evaluated in 30 potential users of the SDS, including doctors, nurses, and patients, and were statistically analyzed.
The overall satisfaction score of the 30 participants was 5.5 ± 1.4 out of 7 points. Satisfaction scores were 5.3 ± 0.8 for doctors, 6.0 ± 0.6 for nurses, and 5.3 ± 0.5 for patients. In terms of performance accuracy, the number of repetitions of the same question was 13, 16, and 33 (13.5%, 16.8%, and 34.7%) for doctors, nurses, and patients, respectively. The number of errors in the summarized comment generated by the SDS was 5, 0, and 11 (5.2%, 0.0%, and 11.6%), respectively. The number of summarization omissions was 7, 5, and 7 (7.3%, 5.3%, and 7.4%), respectively.
This is the first study in which voice-based conversational artificial intelligence (AI) was developed for a spinal pain questionnaire and validated by medical staff and patients. The conversational AI showed favorable results in terms of user satisfaction and performance accuracy. Conversational AI can be useful for the diagnosis and remote monitoring of various patients as well as for pain questionnaires in the future.
With the advent of the Fourth Industrial Revolution, efforts to apply artificial intelligence (AI) and machine learning in the medical field are actively underway [
Unlike a written, text-based chatbot, a computer system that can communicate by voice is called a spoken dialogue system (SDS) [
In assessing patients with spinal disease, the doctor-patient dialogue about pain is the first step in diagnosis, and a pain questionnaire is the most important tool during follow-up after treatment or spine surgery. The purpose of our study was to develop an SDS for a pain questionnaire for patients with spinal diseases. We aimed to evaluate user satisfaction and validate the performance accuracy of the system in medical staff and patients. This study is a preliminary study for the development of an interactive medical robot; based on its results, a follow-up study on a robot-based interactive questionnaire is planned.
First, a pain questionnaire protocol for an SDS was developed by dividing it into preoperative and postoperative pain questionnaires to assess the outcomes of patients undergoing spine surgery. The pain questionnaire consisted of questions that reflect the actual conversation between the medical staff and the patient. The items were created based on questions that medical staff usually ask during inpatient rounds. The protocol included questions about the location, type, influencing factors, intensity, time of onset, and duration of pain. In addition, questions about the patient’s psychological state, such as questions regarding mood, anxiety, and sleep quality, were included as indirect indicators of pain. Postoperative question items were replaced with questions about pain at the surgical site, and a question about whether the patient’s preoperative pain had improved was added. Questions about psychological status were the same as the preoperative questions. Each question was structured in a closed format so that the pain questionnaire system could easily process the patients’ responses. The developed pain questionnaire protocol is shown in
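A protocol of this kind can be represented as structured data that the dialogue system walks through in order. The sketch below is a hypothetical illustration: the question texts follow the protocol described above, but the field names (`category`, `item`, `question`, `answer_type`) and the helper function are our own assumptions, not the system’s actual data model.

```python
# Hypothetical sketch: the pain questionnaire protocol as structured data.
# Field names and the answer_type labels are illustrative assumptions.
PREOPERATIVE_PROTOCOL = [
    {"category": "pain", "item": "location",
     "question": "Where is the most painful area right now?",
     "answer_type": "body_part"},
    {"category": "pain", "item": "intensity",
     "question": "Please rate how severe the pain is on a scale of VAS 0-10.",
     "answer_type": "number"},
    {"category": "psychologic", "item": "mood",
     "question": "How are you feeling right now? Please tell me between good, average, and bad.",
     "answer_type": "choice"},
]

def next_question(protocol, index):
    """Return the question text at the given step, or None when finished."""
    if index < len(protocol):
        return protocol[index]["question"]
    return None
```

Because every question is closed-format, each step can declare the kind of answer it expects, which simplifies downstream recognition.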
To build a database of patients’ various expressions for natural language understanding (NLU), real doctor-patient dialogue sets were collected. The study was approved by the Institutional Review Board (IRB No. 1905-023-079), and informed consent was obtained from all patients. A total of 1,314 dialogue sets were collected from 100 hospitalized patients who underwent spinal surgery between September 2019 and August 2021. One dialogue set was defined as one question and one answer. The age range was 22–82 years (mean, 62.6 years), and 47 patients were male. There were 48 cases of spinal stenosis, 13 of herniated intervertebral disc, 13 of spinal infection, 11 of spinal tumor, 8 of spinal deformity, 4 of spine trauma, and 3 of myelopathy.
Three doctors asked inpatients questions naturally, following the pain questionnaire protocol during rounds, and the conversations were recorded using a voice recorder. The preoperative pain questionnaire was used the day before surgery, and the postoperative pain questionnaire was used between 3 and 7 days after surgery. The recordings were transcribed to text and stored in a database for NLU. Additionally, virtual conversations by the researchers were collected, and a total of 2,000 dialogue sets were used for the database.
The SDS was structured as shown in
After analyzing the dialogue datasets obtained from the patients, patients’ intents expressing the character of pain and psychological state were classified into 95 intents in IBM Watson Assistant. A total of 1,229 expression examples were registered as user examples under these intents, and a total of 770 examples for timing, duration, and influencing factors were registered as named entities.
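The intent mechanism described above can be illustrated with a toy stand-in: each intent carries user example phrases, and a new utterance is assigned to the intent whose examples it most resembles. The sketch below uses simple bag-of-words overlap rather than Watson Assistant’s actual classifier, and the intent names and example phrases are hypothetical.

```python
# Toy intent classifier, a simplified stand-in for the NLU service.
# Intent names and example phrases are hypothetical illustrations.
INTENTS = {
    "pain_numbness": ["it feels numb", "my leg is tingling", "numb and tingly"],
    "pain_aching":   ["it is aching", "a dull ache", "it aches all day"],
    "no_pain":       ["it does not hurt", "no pain now", "the pain is gone"],
}

def classify(utterance):
    """Return the intent whose examples share the most words with the
    utterance, or None when nothing overlaps at all."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for intent, examples in INTENTS.items():
        score = max(len(words & set(ex.split())) for ex in examples)
        if score > best_score:
            best, best_score = intent, score
    return best
```

In the real system, registering many varied user examples per intent (1,229 in total here) is what lets the classifier absorb the diversity of patients’ everyday phrasing.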
User satisfaction and performance accuracy of the developed pain questionnaire SDS were evaluated. Validation of the SDS was performed for 3 user groups: doctors, nurses, and patients, with 10 volunteer participants in each group. The study was approved by the Institutional Review Board of Pusan National University Hospital (IRB No. 2012-010-097), and informed consent was obtained from all participants. The participants were pretrained to engage in routine conversations rather than simple short-answer exchanges. They were provided with basic information about the purpose of the study and the SDS, and we helped them adapt to conversing with the SDS. The mean ages of the doctors, nurses, and patients were 35.3 years (range, 25–47 years), 31.2 years (range, 21–58 years), and 64.0 years (range, 48–82 years), respectively. The male-to-female ratios were 9:1, 10:0, and 5:5, respectively. Validation of the SDS was performed with the participant sitting on the bed of an inpatient ward. The SDS ran on a laptop computer placed on a bed table. When the start button was pressed, a conversation was initiated automatically: the SDS first asked a question about pain, recognized the answer, and followed up with further questions. After the last question and answer, the SDS uttered the summarized result to the user and ended the program. Immediately after the test, the participants completed a user satisfaction questionnaire about the SDS. The questionnaire consisted of 10 items, including the accuracy of the SDS’s voice, the degree of similarity to human conversation, and overall satisfaction, rated on a 7-point Likert scale [
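The ask-recognize-repeat-summarize flow described above can be sketched as a small loop: each question is repeated when the answer is not recognized (the repetition behavior measured later in the Results), and recognized answers are collected for the closing summary. The `recognize` and `get_answer` callables below are hypothetical placeholders; in the real system these steps were speech-to-text plus NLU and text-to-speech, respectively.

```python
# Minimal sketch of the dialogue loop, under the assumptions stated above.
def run_questionnaire(questions, recognize, get_answer, max_repeats=2):
    """Ask each question, repeating it on recognition failure up to
    max_repeats times, and collect the recognized answers."""
    results = {}
    for q in questions:
        for _attempt in range(max_repeats + 1):
            answer = get_answer(q)      # real system: TTS prompt + STT capture
            parsed = recognize(answer)  # real system: NLU intent/entity filling
            if parsed is not None:
                results[q] = parsed
                break
        else:
            results[q] = None           # still unrecognized after all repeats
    return results
```

A repeat limit of this kind is one assumed design choice; the paper reports repetition counts but not the system’s exact retry policy.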
To verify the performance accuracy of the SDS, recognition errors in the patients’ answers, summary errors and their causes, and omissions in the summarized comment were analyzed. User satisfaction and accuracy between the participant groups were statistically analyzed using 1-way analysis of variance and
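For reference, the 1-way analysis of variance used for the group comparisons reduces to computing an F statistic from between-group and within-group variation. The pure-Python sketch below shows that computation; the sample data in the test are made-up illustrations, not the study’s actual per-participant scores, and the real analysis was presumably run in standard statistical software.

```python
# One-way ANOVA F statistic, computed from first principles.
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared deviations
    # of group means from the grand mean.
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The resulting F statistic is then compared against the F distribution with (k − 1, N − k) degrees of freedom to obtain the p-values reported in the tables.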
The results of the user satisfaction survey are shown in
The SDS asked 96, 95, and 95 question items to the doctor, nurse, and patient groups, respectively. The number of repeated questions asked by the SDS because it did not recognize a participant’s answer was 13, 16, and 33 (13.5%, 16.8%, and 34.7%) for doctors, nurses, and patients, respectively. The difference in the number of repeated questions was not statistically significant among the 3 groups (p = 0.063); however, the SDS tended to fail to recognize the answers of the patient group, increasing question repetition. After the pain questionnaire was completed, the number of errors in the summarized comment was 5, 0, and 11 (5.2%, 0.0%, and 11.6%) for doctors, nurses, and patients, respectively; notably, there were no summary errors for nurses. This difference between the groups was statistically significant (p = 0.001). The number of summarization omissions was 7, 5, and 7 (7.3%, 5.3%, and 7.4%), respectively, with no statistical difference between the groups (p = 0.857) (
Conversational AI is increasingly being used in medical healthcare field [
The term “voice-based conversational AI” is used interchangeably with “chatbot” or “voice assistant”; however, the more specialized term is “spoken dialogue system.” An SDS can be defined as a dialogue software system that can communicate with people by voice [
This SDS was developed to be mounted on a medical assistant robot that provides medical services to inpatients, especially those undergoing spinal surgery, since pre- and postoperative pain assessments in these patients are the most important items for diagnosis and treatment follow-up. Therefore, the conversation flow of the SDS followed the pre- and postoperative pain assessments for inpatients with spinal diseases. Although the SDS was developed with a focus on inpatients, the general content of the conversation means it could also be used for first outpatient visits or remote monitoring.
In the user satisfaction evaluation of the SDS, there was no statistical difference among the 3 groups, but the satisfaction of nurses was slightly higher than that of doctors and patients. In the nurse group, there were no summary errors; hence, the overall accuracy was high, and the expectation for the use of the SDS was presumably reflected in this group with a high actual workload. On the other hand, doctors seem to have shown relatively low satisfaction because the accuracy of the SDS did not meet their expectations, as they require a high level of information accuracy. As for the patients, their mean age was relatively high; hence, unfamiliarity with digital systems may have contributed to the low score. In particular, on item Q1, patients showed significantly lower satisfaction than the medical staff, suggesting that their understanding of the SDS questions may have been low. Therefore, the question content and delivery should be upgraded to be easier for elderly patients to understand. In the performance evaluation of the SDS, recognition errors were significantly more frequent in the patient group. This high error rate may be because patients’ answers were long, specific, and varied, as in everyday speech, producing many unstructured utterances for the recognizer. In addition, patients’ voices tended to be lower in volume and less clear, making recognition errors more likely. On the other hand, because of their prior training for natural conversation, doctors and nurses tended to intentionally give clear and simple answers that the SDS could recognize. There were also cases in which the user could not predict when the SDS would finish its utterance and answered before the end of the question. Therefore, usability should be improved by adding system feedback so that patients can predict the end point of the SDS utterance.
Finally, when users gave an answer containing multiple pieces of information, the SDS recognized only one of them. For example, when asked about the location of the pain, users complained of pain in several locations, including the back, buttocks, and legs; however, the SDS recognized only one of the 3 pain sites. This is because the SDS fills the slot by selecting only one keyword from the user’s answer. Therefore, the SDS should be upgraded to recognize these types of answers.
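The suggested upgrade, filling a slot with every matching value rather than only the first, can be sketched as follows. The keyword list is a hypothetical simplification of the system’s named-entity dictionary.

```python
# Sketch of multi-value slot filling: collect *all* known pain sites
# mentioned in an answer instead of stopping at the first keyword.
# The keyword list below is an illustrative assumption.
PAIN_SITES = ["back", "buttock", "leg", "neck", "arm", "shoulder"]

def extract_pain_sites(answer):
    """Return every known pain site mentioned in the answer, without
    duplicates, in dictionary order."""
    text = answer.lower()
    found = []
    for site in PAIN_SITES:
        if site in text and site not in found:
            found.append(site)
    return found
```

With this change, an answer such as “my back, buttocks, and legs hurt” would fill the location slot with all three sites, matching the example above.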
To improve the overall accuracy of the SDS, the current voice recognition technology must be significantly improved. Despite the rapid development of voice recognition, its recognition rate is still 80% or less, which is not adequate for medical information that requires high accuracy [
To date, no commercialized conversational AI for collecting medical information through voice conversations with patients has been developed. Conversational AI for the collection of medical information could reduce the time and effort required of medical staff by automating the questionnaire during the first outpatient visit. In addition, it is expected to be applied in telemedicine and remote patient monitoring, which are receiving increasing interest due to the recent coronavirus disease 2019 pandemic. In particular, for older patients, the collection of patient outcome reports using text-based chatbots or apps is limited by presbyopia and difficulty in using smart devices; remote monitoring using conversational AI would therefore be more useful in elderly patients. If clinical decision support AI and conversational AI are combined in the future, the result could be applied to software in medical devices for diagnosis, treatment, and prevention beyond collecting medical information [
The SDS can be used for remote pain monitoring of spinal patients through automation of pain questionnaires and can shorten doctor consultation time through automation of initial consultations. The collection of pain information can thus be automated during follow-up before and after surgery, helping to track the patient’s prognosis. By performing additional pain questionnaires frequently, beyond the pain evaluation during rounds by medical staff, pain can be evaluated more often while reducing the medical staff’s workload.
A limitation of this study is the small number of test subjects; thus, there may be bias in the evaluation of user satisfaction and performance accuracy. Nevertheless, our study reports the first development of conversational AI for a spinal pain questionnaire. Our study can also provide an important starting point and reference for future related research as our findings validate the accuracy and satisfaction of real patients and medical staff. In the future, we hope to improve the SDS and evaluate user satisfaction and performance accuracy in a large sample of patients.
This study is the first report in which voice-based conversational AI was developed for a spinal pain questionnaire and validated by medical staff and patients. The conversational AI showed favorable results in terms of user satisfaction and performance accuracy. If a large number of dialogue sets between patients and medical staff are collected and voice recognition technology is improved, conversational AI is expected to be usable for the diagnosis and remote monitoring of various patients, as well as for pain questionnaires, in the near future.
A supplementary video clip can be found via
The authors have nothing to disclose.
This study was supported by the Technology Innovation Program (No. 20000515) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
Conceptualization: KHN, DYK, JIL, SSY, BKC, MGK, IHH; Data curation: KHN, DYK, DH Kim, JIL, MJK, JYP, JHH, MGK, IHH; Formal analysis: KHN, DYK, DH Kim, JH Lee, JIL, MJK, SSY, BKC, MGK, IHH; Funding acquisition: MGK, IHH; Methodology: KHN, DYK, JH Lee, MJK, JYP, JHH, SSY, BKC, MGK, IHH; Project administration: MGK, IHH; Visualization: SS Yun, BK Choi, IHH; Writing - original draft: KHN, DYK, IHH. Writing - review & editing: MGK, IHH.
Architecture of the spoken dialogue system (SDS).
Flow chart of the spoken dialogue system. UID, unique identification number; DB, database; STT, speech-to-text; TTS, text-to-speech.
Electronic medical record linkage with the spoken dialogue system.
Questionnaire of the spoken dialogue system
Category | Description | Preoperative situation | Postoperative situation |
---|---|---|---|
Pain | Location | Where is the most painful area right now? If there are multiple areas, please tell them briefly in the order of the most pain. | Where do you feel most uncomfortable after surgery? If there are multiple parts, please tell them briefly in the order of discomfort. |
 | Type | How does the pain feel? Please express it like numbness or aching. | Has the pain that was very painful before the operation improved? |
 | Influence factor | 1. What time of the day do you have the most pain? 2. What posture hurts the most? | 1. Is there any pain at the surgical site? 2. What posture hurts the most? |
 | Intensity | Please rate how severe the pain is on a scale of VAS 0–10. | Please rate how severe the pain is on a scale of VAS 0–10. |
 | Time and duration | 1. Since when have you had pain? 2. When did the pain get worse? | Does the pain at the surgical site last all day? |
Psychologic state | Mood | How are you feeling right now? Please tell me between good, average, and bad. | How are you feeling right now? Please tell me between good, average, and bad. |
 | Anxiety | Are you currently worried or anxious? | Are you currently worried or anxious? |
 | Sleep quality | 1. How many hours did you sleep? 2. Did you sleep well without waking up? | 1. How many hours did you sleep? 2. Did you sleep well without waking up? |
VAS, visual analogue scale.
Survey results of the spoken dialogue system
Question items | Doctor | Nurse | Patient | p-value |
---|---|---|---|---|
Q1. I could understand SDS’s words well. | 6.6 ± 0.699 | 6.8 ± 0.422 | 5.7 ± 1.059 | 0.008 |
Q2. The volume, speed, and sound quality of the SDS were adequate. | 6.6 ± 0.699 | 6.7 ± 0.675 | 6.0 ± 1.155 | 0.171 |
Q3. SDS asked the proper questions. | 5.4 ± 1.506 | 6.0 ± 0.816 | 5.2 ± 1.549 | 0.390 |
Q4. SDS gave an appropriate response. | 4.9 ± 1.370 | 5.3 ± 1.418 | 5.4 ± 1.075 | 0.664 |
Q5. In conversation with SDS, I was able to fully express what I wanted to say. | 4.6 ± 1.647 | 5.5 ± 1.179 | 4.5 ± 1.269 | 0.222 |
Q6. SDS seems to understand well what I'm saying. | 4.7 ± 1.337 | 5.6 ± 1.265 | 4.7 ± 1.252 | 0.214 |
Q7. Conversations with SDS were not much different from conversations with people. | 4.3 ± 1.337 | 5.4 ± 1.075 | 5.1 ± 1.370 | 0.153 |
Q8. I think positively about assisting my medical care through the SDS conversation. | 5.5 ± 1.179 | 6.1 ± 0.568 | 5.0 ± 1.491 | 0.118 |
Q9. There was no objection to the conversation with SDS. | 5.3 ± 1.494 | 6.5 ± 0.527 | 6.1 ± 0.994 | 0.057 |
Q10. The conversation with SDS was overall satisfactory. | 5.3 ± 1.160 | 6.2 ± 0.919 | 5.4 ± 1.265 | 0.165 |
Mean | 5.3 ± 0.777 | 6.0 ± 0.547 | 5.3 ± 0.522 | 0.136 |
Values are presented as mean ± standard deviation.
SDS, spoken dialogue system.
One-way analysis of variance,
Summary of errors in the spoken dialogue system
Participant | Total no. of question items | Total no. of questions | Recognition error | Summary error | Omission error |
---|---|---|---|---|---|
Doctor | 96 | 109 | 13 (13.5) | 5 (5.2) | 7 (7.3) |
Nurse | 95 | 111 | 16 (16.8) | 0 (0.0) | 5 (5.3) |
Patient | 95 | 128 | 33 (34.7) | 11 (11.6) | 7 (7.4) |
p-value | | | 0.063 | 0.001 | 0.857 |
Values are presented as number (%).
One-way analysis of variance with post hoc Tukey honestly significant difference analysis was performed to identify the group responsible for the difference.
Example of an actual conversation between the spoken dialogue system and a patient
Question of spoken dialogue system | Answer of patient |
---|---|
Hello, it’s interview time. I'll start the interview now. What is most uncomfortable symptom right now? If there are multiple parts, please tell them briefly in the order of the most pain. | Yes, My butt hurts. It hurts on both sides of the buttocks on the legs |
How does the pain feel? Please express it like numbness or aching. | I get numb and numb. I have tingling when I walk. |
How long has the pain been getting worse? | No, what I’m doing here is that it hurt before the operation, but after the operation, it seems like yesterday. |
Please answer with a period such as 1 day, 1 week, 1 month and 1 year. | |
I didn’t hear well. When did the pain get worse? | I can’t remember how many years it was before the surgery, and it’s been less than a week since I came in after the operation, but it’s been 5 days since the operation. |
Please answer with a period such as 1 day, 1 week, 1 month and 1 year. | |
Please rate how severe the pain is on a scale of 0-10. | It seems like the pain is something like 9 or 100 points or something. |
It hurts when I walk. Buttock. I can’t get out of bed. | |
What time of day do you have severe pain? | Mainly, I lay down and went to work, and when the bed goes up, I have to move my ass with it with that. The bed comes down and I can’t do it properly with that. I need help. |
Please answer the time such as in the morning or before bed. | |
I didn’t hear well. When is the most painful time of the day? | Usually, when I lie on the bed and then get up and try to get down from the bed to urinate, this buttock doesn’t move properly. That’s why I have to keep helping you so I can come down. I was sick. it’s time. What, when will you come down? |
Please answer the time such as in the morning or before bed. | |
Please tell me how you are feeling right now? | It’s uncomfortable to sit right now. |
Please answer your feeling between good, average and bad. | |
Are you currently worried or anxious? | Yes |
This is question about sleep quality. Did you sleep well without waking up? | Yes, I sleep well. |
How many hours did you sleep last night? | Well, I didn’t sleep for 3 days after surgery and I slept well last night. |
I didn’t hear well. how much time did you sleep yesterday? | How many hours. Well, 9 hours and 7 hours. |
Yes, I see. Then I will inform you about the results of the questionnaire. Your pain area is your leg, and you have a tingling sensation, getting worse when you wake up. The intensity of the pain is 10, and the pain has gotten worse since about a week ago. | |
And you say you feel bad and have anxiety. You said that you slept 9 hours and the quality of your sleep was good. |
The questions and answers were conducted in Korean and translated into English.