Cloudinary is a cloud-based platform that automatically delivers optimized images and videos, enhancing the user experience for businesses. The Cloudinary Customer Education team’s job is to help customers understand how to use Cloudinary’s image- and video-management software. Video is their primary tool.
Without the right tools, it will take you forever to get your videos the way you need them to be. I’d tell anybody new to video for customer education, or learning and development: without Descript, you’re going to find yourself stuck in an editing abyss.
The Customer Education team at Cloudinary makes videos that teach customers how to use its software. But the team kept getting bogged down in a few parts of the editing process.
First, neither human nor AI transcription services could accurately transcribe Cloudinary’s videos, which were loaded with technical terms, proprietary lingo, and acronyms (and frequently featured non-native English speakers). The transcripts they published had to be accurate, but correcting them took hours.
Second, Cloudinary’s in-house experts often recorded their training sessions live, then handed them off to the Customer Education team to edit for publication. The team wanted to cut some, but not all, of the experts’ filler words from the video, and to remove every one of them from the transcript. Each of those steps added hours to their production time.
Finally, the team sent high-quality microphones to their experts, but they couldn’t do much about it if an expert didn’t set up their mic correctly, or recorded in a room full of reverb.
All of this combined to slow the video-production process to a crawl, which blocked the team from doing what they, and Cloudinary, wanted them to be doing: making more video for their customers.
Industry-leading transcription that gets better the more you use it
Without Descript
An editor spent hours poring over every transcript, often correcting the same words, over and over
With Descript
Accurate transcripts that get better every time, thanks to AI that learns as it goes
The Cloudinary team uses Descript’s transcription glossary feature to teach the AI those especially tricky terms, so when it hears “JSON,” Descript transcribes it that way, not as “Jason.” Descript’s AI transcription also learns and gets better the more you use it. After a few months, the Cloudinary team noticed they no longer had to correct one of their most common words: “Cloudinary.”
When corrections are needed, Cloudinary’s editor can invite multiple collaborators to any video project, so experts can quickly fix inaccuracies using Descript’s transcript-correction tool. All the changes happen in the cloud, so editors can work simultaneously.
Filler words—all or some removed in a few clicks
Without Descript
Removing some filler words from edited livestreams was painfully slow; removing them all from transcripts was excruciating
With Descript
Removing “ums,” “uhs” and other filler words takes a few clicks
Many of Cloudinary’s instructional videos are recorded as livestreams. The team wanted to leave many filler words in, so the edited version would be similar to the live version. But removing them from the transcript during editing was a tedious, time-consuming process.
Descript’s automatic filler-word detection and removal enabled Cloudinary’s editors to remove every unwanted filler word from the transcript in a few clicks. The same goes for any excessive filler words in the video. Descript also adds room tone automatically to smooth the cuts, and offers the option to quickly restore filler words where removal feels unnatural.
We run Studio Sound over everything. It's absolutely a must-use feature in our videos.
Studio-quality sound, wherever and however the experts record
Without Descript
Scratchy, hollow sound on many videos
With Descript
Warm, clear sound on every video
You can send quality microphones to your subject matter experts, but you can’t be sure they’ll set them up right, or that the room they record in won’t be full of reverb, or that their neighbor’s leaf blower won’t start up midway through recording. That was a problem—both a potential distraction for viewers and an unprofessional look for the videos.
Descript’s Studio Sound solved it almost instantly. It uses AI voice regeneration to strip out background noise, reverb, and other unwanted sound, then reconstructs the voice audio so it sounds like it was recorded in a studio. Studio Sound works so well that Sam Brace no longer uses an external mic to record the Cloudinary podcast; he gets better sound by using his laptop mic, then applying Studio Sound in post-production.
Since introducing Descript into its video-editing workflow, the Cloudinary team has drastically reduced the time it takes to produce video—from 13 hours for a single episode of its video podcast to 4 hours. That’s enabled them to go from one podcast episode every few months to two episodes a month with the same team, at the same cost. Same goes for the other videos the team makes.
Without Descript
13 hours to produce a video podcast
1 video produced every few months
With Descript
4 hours to produce a video podcast
2 videos per month at the same cost
Our free plan shows you what Descript can do, no credit card required. When you need more horsepower, paid plans start at $12 per month.