Nai Jingxue, Liu Shuo, Xu Rui, Yan Chenxia, Cheng Kai
Objective To preliminarily develop a medication education agent based on multimodal interaction and to address the barriers to understanding information that tuberculosis patients face in medication education. Methods An agent workflow was built on the Dify 1.4.1 platform using a large language model (LLM) and retrieval-augmented generation. A total of 50 tuberculosis patients discharged from Beijing Chest Hospital, Capital Medical University between March and September 2025 were selected. Their structured information was collected and input into the agent to evaluate its intrinsic performance: structured information extraction capability (precision, recall, and micro-averaged F1-value), text generation quality [bilingual evaluation understudy (BLEU)-4 and the recall-oriented understudy for gisting evaluation (ROUGE) series of metrics], stability (interaction success rate), efficiency (response time and average material generation time), compliance (compliance rate), and user experience (satisfaction score). A total of 38 medical professionals (physicians and pharmacists) who worked in the hospital from February to March 2025 were selected as survey respondents. Using the Wenjuanxing platform, they rated the accuracy, comprehensiveness, readability, humanistic care, and personalization of medication education materials from 3 sources: the hospital's current standardized medication guidance template (material 1), materials generated directly by a general-purpose LLM (material 2), and materials generated by the agent developed in this study (material 3). The application effectiveness of the agent was assessed from the survey results. Results In the evaluation using the structured information of the 50 tuberculosis patients, the agent achieved a precision of 95%, a recall of 92%, and a micro-averaged F1-value of 0.93.
The agent scored (18.61±4.06), (38.60±5.93), (22.40±5.13), and (29.42±6.81) points on BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L, respectively. The interaction success rate was 96% (48/50), the average response time was (3.1±0.6) s, and the material generation time was (27.4±1.5) s; the compliance rate was 100% (50/50), and the patient satisfaction score for the agent-generated text was (84.5±5.5) points. In the survey of 38 medical professionals, material 3 scored higher than materials 1 and 2 in readability, humanistic care, and personalization, and the differences were statistically significant (all P<0.0167). Material 3 also scored higher than material 1 in comprehensiveness (P=0.003). Of the 38 medical professionals, 71.1% (27/38) were satisfied with material 3. Conclusions A multimodal interactive agent for medication education in tuberculosis patients was successfully developed. The agent showed good feasibility and reliability across multiple performance indicators and had certain advantages in readability, humanistic care, and personalization. By providing multimodal, layered outputs, the agent offers a novel paradigm for medication education in tuberculosis patients.
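
As an illustration of the extraction metrics reported above, the following is a minimal sketch of how micro-averaged precision, recall, and F1 could be computed over extracted fields. It is not the authors' evaluation code. The representation of each patient record as a set of (field, value) pairs, the function names `micro_prf` and `f1_from`, and the toy records are all assumptions made for the example.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1.

    gold, pred: lists of sets of (field, value) pairs, one set per patient
    (a hypothetical representation, not taken from the study).
    Counts are pooled over all patients before the ratios are computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # extracted and correct
        fp += len(p - g)   # extracted but not in the reference
        fn += len(g - p)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy data for illustration only; these are not study records.
gold = [{("drug", "isoniazid"), ("dose", "300 mg")}, {("drug", "rifampicin")}]
pred = [{("drug", "isoniazid"), ("dose", "600 mg")}, {("drug", "rifampicin")}]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Under the same definition, `f1_from(0.95, 0.92)` rounds to 0.93, which agrees with the precision, recall, and F1-value reported in the Results.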