June 23 news: yesterday JD.com announced the open-source release of JoyAI-VL-Interaction, a real-time video vision-language interaction model. According to JD.com, it is the world's first all-input model and system, and it has been supported by vLLM-Omni since day 0.

JoyAI-VL-Interaction supports voice input and output, a visual interface, long-term memory, a backend model interface, and a vLLM deployment scheme. According to JD.com, developers can swap out the ASR, TTS, backend models, external tools, and operation modules to build real-time AI assistants for scenarios such as security surveillance, elderly and child care, live talk shows, e-commerce shopping guidance, operation guidance, AI glasses, and accessibility aids.
According to official data, in a blind evaluation of 58 real-world cases, JoyAI-VL-Interaction achieved an overall success rate of 77.6%, compared with 87.9% for the Gemini video call assistant.
GitHub: github.com/jd-opensource/JoyAI-VL-Interaction
Hugging Face: huggingface.co/jdopensource/JoyAI-VL-Interaction-Preview