By Mike Newman
Last week, I attended an incredibly informative webinar hosted by Jim Lundy, CEO & Founder of Aragon Research. Titled “The Future of Communications & Collaboration,” the webinar drilled deeply into the current state of AI technologies; their strengths and shortcomings in unified communications, intelligent video conferencing, and call center automation; and what an enterprise can reasonably expect from AI in the future. Rather than speaking in broad generalizations, Jim applied a precise and discriminating eye to vertical-specific use cases and aptly presented the dangers inherent in both falling for the over-promise and, alternatively, waiting too long to implement AI for important use cases today.
Borrowing from Jim’s analysis and applying it to webcast production as a correlate of video communications, it’s clear that automation and AI are well on their way to enabling autonomous start-to-finish webcast production, distribution, editing, and archival. If the bar being set is “whether it can be done with little to no human intervention,” assume we are already there. The challenge for organizations evaluating the adoption of AI, then, is to determine for which video applications this bar is sufficient.
One area of video production that has already seen a tremendous and positive impact from AI is transcription and translation. Not long ago, the transcription options available were less than appealing: i) hire a very expensive live person; ii) take your chances with an automated solution delivering 70% accuracy or lower; or iii) wait to transcribe until you have an archive. With the emergence of highly customizable dictionaries and better closed captioning technologies, accuracy rates for automated solutions steadily improved. With AI and machine learning, that trend has continued, and automated solutions now deliver, efficiently and affordably, accuracy levels once reserved for the most expensive option.
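To make the “customizable dictionary” idea concrete, here is a minimal, purely illustrative Python sketch, not any vendor’s actual API: it post-corrects raw automated output against a hypothetical domain dictionary, one simple way such dictionaries lift accuracy for industry-specific terms.

```python
import re

# Hypothetical custom dictionary: common misrecognitions -> preferred domain terms.
CUSTOM_DICTIONARY = {
    "air gone research": "Aragon Research",
    "web cast": "webcast",
    "close captioning": "closed captioning",
}

def apply_custom_dictionary(transcript: str, dictionary: dict) -> str:
    """Replace known misrecognitions with the preferred domain terms."""
    corrected = transcript
    for wrong, right in dictionary.items():
        corrected = re.sub(re.escape(wrong), right, corrected, flags=re.IGNORECASE)
    return corrected

raw = "welcome to the air gone research web cast on close captioning"
print(apply_custom_dictionary(raw, CUSTOM_DICTIONARY))
# -> "welcome to the Aragon Research webcast on closed captioning"
```

Production systems bias the recognizer itself rather than patching its output, but the principle is the same: domain knowledge, encoded up front, closes much of the gap to human transcription.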
Within the world of large, live webcasts, enterprises are generally committed to high quality and, not surprisingly, averse to risk. Consequently, it was not uncommon to see major investment in human transcription because, while expensive, it consistently offered the highest quality with the least risk. Now, however, with advances in AI capabilities, enterprises have an array of options designed to fit key use cases. For example, it is not uncommon to see “Text to Search” powered by traditional closed captioning technologies, meetings and small events powered by slightly more expensive and processor-intensive AI, and large events supported by professional stenographers at a comparatively high cost that is justified by the quality and the mitigation of risk. These options enable organizations to avoid one-size-fits-all solutions and to tailor their investments carefully to the use case.
Expanding on this example, we are seeing new AI solutions automate the process of video switching. Building on traditional sound/voice recognition technologies, AI can automatically highlight a speaker (or, unfortunately, someone coughing) and can now identify speakers through visual recognition as well. These are “neat” capabilities and work fantastically for meetings and small events where a professional touch is neither needed nor justified. But in large live webcasts, it is common to have each element carefully choreographed, non-speaking presenters carefully muted, and non-presenters highlighted at times to show their engagement and nods of agreement or disagreement. In these environments, switching video inputs is an important part of the production and storytelling rather than a stimulus/response reaction to noise.
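For illustration only, here is a toy Python sketch of that stimulus/response switching; the thresholds and loudness samples are all hypothetical. It cuts to whichever input is loudest, and, as the simulated frames show, a loud cough can steal the shot, which is precisely why large productions keep a human director at the switcher.

```python
HOLD_FRAMES = 30      # require ~1 second at 30 fps between cuts (simple debounce)
LEVEL_MARGIN = 1.5    # a new input must be clearly louder before we cut to it

def pick_active_input(levels, current, frames_held):
    """Cut to the loudest input only if it beats the current one by a margin
    and the hold time has elapsed. This is pure stimulus/response: the
    algorithm cannot tell a presenter from a cough."""
    loudest = max(range(len(levels)), key=lambda i: levels[i])
    if (loudest != current and frames_held >= HOLD_FRAMES
            and levels[loudest] > levels[current] * LEVEL_MARGIN):
        return loudest, 0
    return current, frames_held + 1

# Hypothetical per-frame loudness for three camera/microphone pairs.
frames = [
    [0.02, 0.40, 0.03],  # presenter on input 1 is speaking
    [0.02, 0.42, 0.05],
    [0.90, 0.10, 0.03],  # a loud cough near input 0's microphone
]
current, held = 1, HOLD_FRAMES
for levels in frames:
    current, held = pick_active_input(levels, current, held)
    print(f"levels={levels} -> live input {current}")
# The final frame cuts to input 0: the cough "wins" the switch.
```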
As organizations evaluate AI technologies as part of their communications strategies, it is important, as Jim Lundy advised, to proceed with caution and a critical eye. AI is an extremely powerful technology that can add immense value in discrete areas of the communications ecosystem. My guess is that, in time, we will see the artists engaging AI for the more menial production tasks, like transcription, while continuing to provide the human touch that comes from understanding the presenter(s), the subject matter, and the audience.