By Agam Shah
Copyright Computerworld
AI is very text-based right now. Are you running into issues with rich media and multimodal content? “Many top models have multimodal capability. They handle not just text, but images, audio, and video. The technology works much as it does for text: it converts pixels into tokens with positional information. It’s mind-blowing when you see AI look at an image or video and truly understand it, not just label it.
“We have media and entertainment customers with storyboards where AI can create stories from comic strips. It looks at pictures and creates narratives. This is huge for unstructured data going forward: doing AI across modalities.”
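The pixels-to-tokens idea described in the answer above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular model is implemented: real systems typically also pass each patch through learned embeddings before the model sees it. The function name `image_to_patch_tokens` and the tiny 4x4 grayscale image are hypothetical, chosen only for the example.

```python
from typing import List, Tuple

Image = List[List[int]]                      # grayscale pixels, row-major
Token = Tuple[Tuple[int, int], List[int]]    # ((grid_row, grid_col), flattened pixels)


def image_to_patch_tokens(image: Image, patch: int) -> List[Token]:
    """Split an image into patch x patch tiles, each tagged with its grid position."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")

    tokens: List[Token] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the tile's pixels into one sequence element.
            pixels = [
                image[top + r][left + c]
                for r in range(patch)
                for c in range(patch)
            ]
            # The position plays the role that word order plays for text.
            tokens.append(((top // patch, left // patch), pixels))
    return tokens


# Hypothetical 4x4 image with pixel values 0..15, split into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, patch=2)
for position, pixels in tokens:
    print(position, pixels)
```

Each resulting pair is one "token": a small block of pixels plus the position that tells the model where in the picture that block came from.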
Could you talk about some upcoming features? “We focus on making more sophisticated agents that can do complex tasks and workflows. Enterprises will automate tasks that were too time-consuming or expensive for people to do.