Maximizing Training Impact with AI-Powered Audio and Video Tools

With so many different AI options available, and more coming online every day, how do you know where to start? The key to maximizing training impact with AI-powered audio and video tools is to first understand your particular strengths and weaknesses. Then you can identify the right solution for your needs and weigh the pros and cons. After that, it’s just a matter of building your familiarity and prompting competency. Read on to learn how instructional designers like you are putting AI tools to work solving common learning and development audio and video challenges.

Three Different People With Three Different AI Needs

Nancy is an Instructional Designer (ID) for a state health organization. She’s been tasked with adding audio to eLearning modules she has already created, but the team doesn’t want to use professional narrators because of cost constraints and the need for frequent updates. Nancy is uncomfortable narrating in her own voice, though: it shakes and cracks, and she doesn’t like the way she sounds.

Thomas creates training and social media content for a mid-size business. His boss, the male business owner, is the “face” of the company, but Thomas would like to include female voices in his work as well, to appeal to a wider audience, including people with accessibility needs.

Angelica is an instructor building a new online class on a strict budget. She wants to make her PowerPoint presentations more engaging, and maybe branch out into making videos, but she’s not sure how or where to start.

Three different people, three different issues, one (fairly) simple solution: Artificial Intelligence, or AI.

Maximizing Training Impact Through AI Audio Usage

Let’s start with Nancy. She knows a bit about audio editing, so she first tried to clean up her own voice using tools like Adobe Podcast, which is part of Adobe Creative Cloud. Podcast is very useful for cleaning up audio and removing echoes or room noise, but it didn’t address Nancy’s fundamental concern: her own voice. Her next thought was to use an AI voice for her narration. These are easy to find online, and there are even free options. The problem was that, in general, AI voices sound like AI voices: a little too robotic and monotone to pass for real people. Nancy wasn’t comfortable with that option, either.

Enter ElevenLabs, a fairly low-cost, web-based tool. ElevenLabs offered a different, and for Nancy a better, choice: speech-to-speech. Speech-to-speech takes your voice, with its intonation, speed, pauses, and more, and converts it into an AI voice. Nancy could choose from a variety of alternate voices, male and female, of different ages and accents, and even in multiple languages. This means your AI character sounds like you, not like a robotic, monotone computerized person. To quote Nancy, “Catherine (my alter-ego) is a far smoother talker than me.” It’s not perfect, and it often takes some time, energy, and retakes to get exactly what she wants, but for now it’s a great solution to her problems.

Thomas has chosen a similar path. Without access to a female narrator, or a way to pay for one, he turned to ElevenLabs to create Matilda, his alter ego. Now Thomas simply voices what he wants Matilda to say and how he wants her to say it, and voilà: his training and social media content features both male and female voices.

Maximizing Training Impact Through AI Video Usage

But what about video? AI can do that too, of course.

Here’s how it works, in layman’s terms. AI learns from thousands (or even millions) of videos and images. It looks at a video frame by frame (like a flipbook of images) and analyzes the objects, people, and actions in each frame. It then uses patterns and previously learned knowledge to identify things like faces, cars, or specific movements. For example, to teach an AI to recognize a cat, it is shown lots of cat photos and videos so it can learn what makes something look like a cat. Once the AI understands a video, it can perform tasks like identifying scenes, highlighting key moments, or even predicting what might happen next. Some video AIs can also generate new content by combining what they’ve learned, such as creating a realistic animation or editing a scene automatically (e.g., changing backgrounds or adding effects).

And the speed at which the technology is changing is truly mind-boggling. Here’s a popular example of what AI creates when given the prompt “Will Smith eating spaghetti.”

This was AI video in 2023.

This 2024 video used the same basic “Will Smith eating spaghetti” prompt, but added a little extra enhancement in the background.

Now here’s the 2025 version.

It gets more remarkable. On December 9, 2024, OpenAI introduced Sora, a new video generation tool designed to take text, image, and video inputs and generate a new video as output. Check out some of the incredible effects Sora can create essentially out of nothing. (Sora is available to anyone with a ChatGPT Plus plan for $20 a month.)

Just a week later, on December 16, 2024, Google unveiled Veo 2. As of mid-January 2025, you need to put your name on a waiting list to access this tool, but it will be free and available via Google Labs, Google’s AI tools community. Veo 2 already appears to be a big step up from Sora, with the capacity to generate video in 4K and what’s being hailed as a better ability to accurately capture movement and physics.

Here’s an example comparing videos from Sora and Veo 2 of someone cutting a tomato.

Ouch! That difference in movement really shows, doesn’t it? Who knows what’s next?

There are also much lower-tech uses for AI video, though. Let’s return to Angelica’s desire for more engaging PowerPoint presentations. When she started asking around, a fellow instructor told her about a tool called Descript. For as little as $12 a month, Descript can be used to make simple videos for education, tutorials, marketing, or just about anything else. It makes it easy to edit both audio and video, add stock images or video to your project, and even generate captions to help meet ADA accessibility requirements. In Angelica’s case, she wrote a script in Descript, recorded herself speaking directly into the program, and easily cleaned up her audio. She then added her PPT slides, some images, and transitions to make it more visually interesting, and exported the whole thing as a video her students can view in her online course shell. Angelica also created transcripts and summaries of her video, allowing her learners to search for keywords or review material they missed. Her school’s Learning Management System even allows her to embed quiz questions in the video to ensure students are watching and paying attention.

Descript also includes an AI assistant to lend a virtual hand with your projects, and you can use the program to create and edit podcasts, blog posts, and more.

AI-Powered Audio and Video Tool Use Cases

As with any tool, however, there are pros and cons to keep in mind when you choose to use AI. Let’s explore a few of them.

Pros and Cons of Using AI Audio

Pro: AI tools can generate audio content in various languages or accents, helping to localize training material or facilitate language practice for global teams.

Con: A lack of cultural sensitivity in AI-generated voices can lead to misinterpretation. AI translations can vary in quality and miss regional differences. Additionally, overly synthesized voices can feel monotonous or robotic, reducing engagement. (Using speech-to-speech instead of text-to-speech can help address this last concern.)

Pros and Cons of Using AI Video

Pro: AI can create interactive, customized training videos and audio content tailored to different departments, skill levels, and learning paces. This personalization can be especially impactful for large organizations with diverse learning needs.

Con: The AI-generated content might lack the nuanced, human touch that’s often valuable for conveying complex or sensitive topics, potentially making it feel impersonal.

AI Isn’t Going Away

Like it or not, Artificial Intelligence is here to stay. AI audio and video tools can let you create more while spending less. The key is to use them correctly and ethically.

As with any AI tool, prompts are key: what you get out depends on what you put in.

Be mindful of copyright issues, as well as the job concerns of the creatives who work with you. Technology still needs storytellers and storytelling.

Take a balanced approach. AI-driven audio and video tools can provide tremendous value, but they must be thoughtfully integrated. Tailor your use of these tools to the strengths and limitations of each learning environment. And consider the future. As AI technology evolves, businesses should continue to:

assess and refine their training strategies to keep pace with advancements,
avoid damaging assumptions that undermine business results, and
ensure that employees benefit from the best of both eLearning and face-to-face training experiences.

I’ll leave you with one more technology tidbit. I fed this blog post into a Google product called NotebookLM and asked it to create a podcast that covers some of the information you’ve read and expands on it. (If the speakers sound familiar, they’re the same ones you heard in your Spotify Wrapped at the end of last year!)

Enjoy.

Written by Robin Herriff and published in 2025.
