Prompting change: The promise of an AI chatbot for alcohol counseling in young adults
Young adults drink more heavily than any other age group but are the least likely to seek treatment. Large language model (LLM) chatbots, such as ChatGPT, may make knowledge and support more accessible. This study pilot-tested an LLM-based chatbot trained to deliver motivational interviewing for alcohol-related behavior change among young adults, evaluating its safety, acceptability, and fidelity to motivational interviewing principles.
Young adults (ages 18-25) engage in the highest levels of alcohol use of any age group but have the lowest levels of treatment engagement. Barriers like low perceived need, stigma, time constraints, cost, reluctance to fully abstain, and limited access to providers often prevent individuals from seeking traditional, in-person care. Low-threshold and familiar digital tools may help bridge this gap by offering more easily accessible, private, and flexible options for young adults to reflect on and self-manage their alcohol use.
While LLMs like GPT-4 have been shown to provide accurate responses to alcohol- and other drug-related prompts and have shown early promise in mental health applications, their potential for delivering direct alcohol counseling remains largely unexplored. This pilot study aimed to fill that gap by developing a secure LLM-based chatbot trained to deliver motivational interviewing for alcohol-related behavior change in young adults and evaluating whether the chatbot could do so safely, acceptably, and in alignment with motivational interviewing standards.
HOW WAS THIS STUDY CONDUCTED?
This was a two-phase, single-arm pilot study testing a GPT-4-powered chatbot with specialized training in motivational interviewing for alcohol-related behavior change. Participants (N = 45) were ages 18-25 (mean age = 23.6 years), reported consuming at least 10 standard drinks per week, and engaged with the chatbot during a single session. After initial testing in Phase I (n = 8), the researchers refined the chatbot based on feedback and tested the updated version in Phase II (n = 37). The study sample was predominantly White (66.7%), male (71.1%), and college-educated (82.2%), with an average AUDIT score of 7.6, indicating moderate alcohol use severity.
The researchers built their motivational interviewing chatbot on OpenAI’s GPT-4 and refined it through prompt engineering and domain-specific fine-tuning to better align its responses with motivational interviewing principles. The chatbot was hosted securely on university servers using HIPAA-compliant infrastructure to ensure participant privacy and data protection.
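The study does not publish its prompts or code. For readers curious about the mechanics, the sketch below shows what a minimal motivational interviewing chat loop might look like using OpenAI’s Python client; the system prompt wording, model settings, and terminal interface are illustrative assumptions, not the study’s actual configuration.

```python
# Minimal sketch of an MI-style chatbot loop using OpenAI's Python client.
# The system prompt below is an illustrative assumption, not the study's
# actual (unpublished) prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a counselor using motivational interviewing to discuss alcohol "
    "use with a young adult. Ask open-ended questions, reflect what the "
    "person says, affirm their strengths, and summarize periodically. Never "
    "lecture, diagnose, or give unsolicited advice. If the person mentions "
    "self-harm or a medical emergency, advise them to seek immediate "
    "professional help."
)

def chat():
    """Run a single-session conversation in the terminal."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    while True:
        user_text = input("You: ")
        if not user_text.strip():
            break
        messages.append({"role": "user", "content": user_text})
        response = client.chat.completions.create(
            model="gpt-4",    # the study used a GPT-4 model
            messages=messages,
            temperature=0.7,  # illustrative choice
        )
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print(f"Counselor: {reply}")

if __name__ == "__main__":
    chat()
```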
To train the chatbot, the researchers used a publicly available dataset of counseling dialogues coded for motivational interviewing fidelity, extracting 134 alcohol-related sessions to identify high-quality client-therapist exchanges. These exchanges were used to refine the chatbot’s responses through iterative prompt development and qualitative review. The researchers then generated 20 synthetic conversations using GPT-4, seeded with anonymized data from a prior clinical trial of young adults with heavy episodic drinking, to test and validate the chatbot’s consistency before deploying it to participants.
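The dataset and extraction pipeline are likewise not released with the article; the following hypothetical sketch illustrates the general shape of such a data preparation step, assuming an utterance-level CSV with columns like topic, session_id, and mi_quality (all invented for illustration).

```python
# Hypothetical sketch of extracting alcohol-related, high-quality MI sessions
# from an utterance-level coded dialogue dataset. The file name and columns
# (topic, session_id, mi_quality) are invented; the study's actual dataset
# and code are not public.
import pandas as pd

utterances = pd.read_csv("mi_coded_dialogues.csv")

# Keep utterances from sessions about alcohol that were coded as high-quality MI.
alcohol = utterances[utterances["topic"].str.contains("alcohol", case=False, na=False)]
good_ids = alcohol.loc[alcohol["mi_quality"] == "high", "session_id"].unique()
sessions = alcohol[alcohol["session_id"].isin(good_ids)]

print(f"{sessions['session_id'].nunique()} alcohol-related sessions retained")

# High-quality client-therapist exchanges from these sessions could then serve
# as few-shot examples in the prompt or as fine-tuning data.
```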
Participants were recruited via an online crowdsourcing platform. After providing informed consent, they completed a baseline survey (demographics, motivation to change, and alcohol use severity), interacted with the chatbot, and completed a post-interaction survey assessing usability (System Usability Scale; e.g., “I thought the system was easy to use”) and motivational interviewing fidelity (Client Evaluation of Motivational Interviewing; e.g., “The counselor seemed to understand how I see things;” “The counselor helped me see how my behavior fits with what I want in life”). Participants were told they could interact with the chatbot as much or as little as they wanted. They were also invited to share open-ended feedback on their experience communicating with the AI chatbot about their alcohol use.
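For context on how the usability scores reported below are computed: the System Usability Scale yields a 0-100 score from ten 5-point items, where odd-numbered (positively worded) items contribute their rating minus 1, even-numbered (negatively worded) items contribute 5 minus their rating, and the summed contributions are multiplied by 2.5. A small worked example:

```python
def sus_score(ratings):
    """Compute a 0-100 System Usability Scale score.

    `ratings` is a list of ten item responses on a 1-5 scale, in item order.
    Odd-numbered items are positively worded (rating - 1); even-numbered
    items are negatively worded (5 - rating). The summed contributions are
    multiplied by 2.5 to scale the total to 0-100.
    """
    assert len(ratings) == 10
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd-numbered)
        for i, r in enumerate(ratings)
    ]
    return sum(contributions) * 2.5

# Example: a fairly positive respondent.
print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 4, 1]))  # 85.0
```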
Primary outcomes included usability and motivational interviewing fidelity. The authors also reviewed transcripts for safety concerns and intervention fidelity and coded each session for change talk versus sustain talk – motivational interviewing terms for client speech that favors changing one’s drinking versus keeping it the same, respectively. Outcomes were compared between Phases I and II to assess whether the prompt refinements improved chatbot performance.
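The percent change talk figures reported below follow from a simple ratio once utterances are coded: change-talk utterances divided by all change-plus-sustain-talk utterances. A toy illustration (the labels are invented):

```python
# Percent change talk = change-talk utterances / (change + sustain talk).
# The utterance labels here are invented; the study coded real transcripts.
labels = ["change", "sustain", "change", "change", "neutral", "sustain", "change"]

change = labels.count("change")
sustain = labels.count("sustain")
pct_change_talk = 100 * change / (change + sustain)
print(f"{pct_change_talk:.1f}% change talk")  # 66.7% in this toy example
```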
WHAT DID THIS STUDY FIND?
The chatbot was safe and highly acceptable to young adults
No inappropriate or unsafe responses were identified in transcript reviews. Usability scores were high, with participants rating the chatbot 85.4 out of 100 in Phase I and 80.9 in Phase II on the System Usability Scale – both above the research team’s benchmark of 68, which corresponds to the average SUS score across published studies. Qualitative feedback highlighted that the chatbot was helpful for reflecting on drinking habits, setting realistic goals, and exploring strategies for positive change. Participants also described the chatbot as convenient and easy to use.
(Image Source: Suffoletto, 2025)
Engagement was moderate, with mixed reviews on conversational quality
On average, participants engaged with the chatbot for 6 minutes in Phase I and 11 minutes in Phase II. The number of participant responses per session dropped slightly, from 15 in Phase I to 13 in Phase II. Feedback on the chatbot’s conversational style was mixed – some individuals found it natural and emotionally supportive, while others felt its responses were formulaic or lacked personalization.
The chatbot demonstrated high motivational interviewing fidelity and elicited change talk
Motivational interviewing fidelity improved after model refinements between Phases I and II. Relational subscale scores on the Client Evaluation of Motivational Interviewing increased significantly, from 67.2 in Phase I to 82.6 in Phase II. Technical subscale scores also improved, from 69.6 to 81.3, and both Phase II scores exceeded the benchmark of 80 set by the research team. The chatbot also elicited a high proportion of within-session change talk, rising from 65.2% in Phase I to 75.8% in Phase II. Although the technical subscale and change talk improvements were not statistically significant, statistical significance is difficult to reach in small pilot samples like these, and the size of these improvements suggests they are practically meaningful.
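To see why improvements of this size can miss statistical significance with n = 8 in Phase I and n = 37 in Phase II, consider the statistical power of a two-sample comparison at those sample sizes: even a conventionally “large” effect would be detected only about half the time. A quick sketch using statsmodels, with the effect size as an illustrative assumption rather than an estimate from the study’s data:

```python
# Statistical power for comparing Phase I (n = 8) with Phase II (n = 37).
# The effect size is an illustrative assumption, not the study's estimate.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(
    effect_size=0.8,   # a "large" effect by Cohen's conventions
    nobs1=8,           # Phase I sample size
    ratio=37 / 8,      # Phase II sample size relative to Phase I
    alpha=0.05,
    alternative="two-sided",
)
print(f"Power: {power:.2f}")  # roughly 0.5 - a coin flip to detect even a large effect
```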
WHAT ARE THE IMPLICATIONS OF THE STUDY FINDINGS?
This study found that a large language model (LLM)-based chatbot, specifically trained in motivational interviewing for alcohol-related behavior change in young adults, was safe and highly acceptable and demonstrated strong fidelity to motivational interviewing principles. Participants valued the chatbot’s convenience, ease of use, and usefulness in promoting self-reflection, goal setting, and strategy development for behavior change. Some described their chatbot interaction as natural and supportive; others, however, reported that the chatbot’s responses felt formulaic and impersonal, which may have contributed to relatively limited engagement. Still, with careful development and thorough validation before deployment, LLM-based chatbots may be capable of delivering motivational interviewing-based alcohol counseling with fidelity – potentially offering a scalable and accessible means of encouraging positive alcohol-related behavior change.
In this study, participants interacted with the chatbot for only 6-11 minutes on average across the two phases – raising questions about whether such tools can sustain user engagement long enough to produce meaningful behavior change. Future studies should use randomized controlled designs to evaluate whether this and similar LLM-based chatbots trained in alcohol intervention are as effective as – or more effective than – simple alcohol use tracking, existing digital tools, or in-person interventions. Enhancing the personalization and engagement capabilities of AI chatbots may also be critical for sustaining user interaction and supporting long-term behavior change. Additional research should explore how LLM-based chatbots can complement traditional care – for example, by offering support between therapy sessions or providing access when clinicians are unavailable, such as evenings and weekends. If shown to be effective, LLM-based chatbots could offer a scalable solution for reaching young adults – a group that has historically been less likely than other age groups to engage with conventional alcohol treatment.
This was a single-arm pilot study without a comparison group or follow-up data, so it is not possible to draw causal conclusions or assess whether interaction with the chatbot led to actual reductions in alcohol use.
Participants were recruited from Prolific, an online crowdsourcing platform, and were predominantly White, male, and college-educated. This limits the generalizability of findings, as the sample may not reflect the broader young adult population, particularly those with less digital access or more diverse sociodemographic backgrounds.
BOTTOM LINE
This large language model (LLM)-based chatbot – specifically trained in motivational interviewing for alcohol-related behavior change – was safe and acceptable to young adults and demonstrated strong adherence to motivational interviewing principles. Participants found the chatbot easy to use and helpful for reflecting on their drinking, setting goals, and exploring strategies for change. However, some participants noted that the chatbot’s responses felt impersonal or formulaic, underscoring the need for improved personalization and conversational depth. These findings suggest that LLM-based chatbots may offer a scalable, low-barrier digital tool to support alcohol-related behavior change in young adults, though further research is needed to determine whether engagement with such tools leads to meaningful reductions in alcohol use over time.
For individuals and families seeking recovery: This study found that a large language model (LLM)-based chatbot for alcohol counseling was safe, highly acceptable to young adults, and aligned with motivational interviewing standards. Such chatbots may offer a nonjudgmental space for individuals to reflect on their drinking habits, which can be helpful for those contemplating change. While not a replacement for therapy, LLM-based chatbots may serve as a useful entry point for alcohol-related behavior change. However, there is not yet evidence that these tools lead to meaningful reductions in alcohol use, and it is important to ensure that any chatbot being used has been properly validated and is hosted on a secure platform to protect privacy.
For treatment professionals and treatment systems: Findings from this study suggest that LLM-based chatbots, when trained in motivational interviewing for alcohol-related behavior change, may offer a safe and acceptable way to support young adults in reflecting on their drinking habits. These tools could be particularly helpful for individuals who prefer digital formats or who face barriers to traditional behavioral health services, including limited access, cost, or stigma. As AI tools become more widely available, it’s important for treatment professionals to recognize that young people may already be turning to them for support – and to ensure that the tools being used are evidence-based, secure, and thoughtfully integrated into care systems when appropriate.
For scientists: While this pilot study demonstrates that an LLM-based chatbot with specialized training in motivational interviewing safely adhered to motivational interviewing standards and was acceptable to young adults, its effectiveness in producing changes in alcohol-related outcomes remains unknown. Future randomized controlled trials are needed to evaluate the efficacy of this and similar LLM-based chatbots, including comparisons to existing digital and in-person interventions. Research is also needed to explore for whom these tools are most effective, how they can be deployed within treatment systems as adjunct supports that may offer added value, and how they can be refined to be more engaging and personalized for young adults.
For policy makers: Investing in innovative digital solutions – such as LLM-based chatbots – may help expand access to early intervention for alcohol use among young adults. Yet, as commercial interest in AI-driven health technologies accelerates, so does the risk of deploying untested or potentially harmful applications without sufficient evidence. Supporting rigorously designed research to develop, validate, and responsibly integrate these technologies into real-world settings is essential to ensure that safe and empirically supported solutions are prioritized.