Market Research Report
Product Code: 1694625
Research Report on the Application of AI in Automotive Cockpits, 2025
This report surveys and analyzes China's automotive industry and presents information on the application of AI in automotive cockpits by Chinese and international vendors.
Cockpit AI Application Research: From "Usable" to "User-Friendly," from "Deep Interaction" to "Self-Evolution"
From the early 2000s, when voice recognition and facial monitoring functions were first integrated into vehicles, to the rise of the "large model integration" trend in 2023, and on to 2025, when automakers began widely adopting the reasoning model DeepSeek-R1, the application of AI in cockpits has evolved through three key phases:
Pre-large model era: Cockpits transitioned from mechanical to electronic and then to intelligent systems, integrating small AI models for scenarios like facial and voice recognition.
Post-large model era: AI applications expanded in scope and quantity, with significant improvements in effectiveness, though accuracy and adaptability remained inconsistent.
Multimodal large language models (LLMs) and reasoning models: Cockpits advanced from basic intelligence to a stage of "deep interaction and self-evolution."
Cockpit AI Development Trend 1: Deep Interaction
Deep interaction is reflected in "linkage interaction", "multi-modal interaction", "personalized interaction", "active interaction" and "precise interaction".
Taking "precise interaction" as an example, the inference large model not only improves the accuracy of voice interaction, especially the accuracy of continuous recognition, but also through dynamic understanding of context, combined with sensor fusion processing data, relying on multi-task learning architecture to synchronously process navigation, music and other composite requests, and the response speed is increased by 40% compared with traditional solutions. It is expected that in 2025, after the large-scale loading of inference models (such as DeepSeek-R1), end-side inference capabilities can make the automatic speech recognition process faster and further improve the accuracy.
Taking "multi-modal interaction" as an example, using the multi-source data processing capabilities of large models, a cross-modal collaborative intelligent interaction system can be built. Through the deep integration of 3D cameras and microphone arrays, the system can simultaneously analyze gesture commands, voice semantics and environmental characteristics, and complete multi-modal intent understanding in a short time, which is 60% faster than traditional solutions. Based on the cross-modal alignment model, gesture control and voice commands can be coordinated to further reduce the misoperation rate in complex driving scenarios. It is expected that in 2025-2026, multi-modal data fusion processing capabilities will become standard in the new generation of cockpits. Typical scenarios include:
Gesture control: Drivers can conveniently control windows, the sunroof, volume, navigation, and other functions with simple gestures such as waving or pointing, without diverting attention from driving.
Facial recognition and personalization: The system automatically identifies the driver through facial recognition and adjusts seats, rearview mirrors, climate control, music, and other settings to personal preferences, delivering a personalized "get in and enjoy" experience.
Eye tracking and attention monitoring: Through eye tracking, the system monitors the driver's gaze direction and attention state, detects risky behaviors such as fatigued or inattentive driving in time, and issues early warnings to improve driving safety.
Emotion recognition and emotional interaction: The AI system can even infer the driver's emotional state, judging anxiety, fatigue, or excitement from facial expressions and tone of voice, and adjust ambient lighting, music, climate control, and more accordingly, providing more considerate emotional services.
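The fusion sketch referenced above: a minimal illustration of late fusion between a gesture channel and a voice channel, where cross-modal agreement lowers the bar for acting and disagreement raises it. The confidence scores, fusion rule, and thresholds are all assumptions for illustration.

```python
# Illustrative late-fusion sketch, assuming per-modality intents and
# confidence scores are produced upstream (e.g. by a 3D-camera model
# and an ASR model). Thresholds are arbitrary, for illustration only.
from dataclasses import dataclass


@dataclass
class ModalityResult:
    intent: str        # e.g. "open_sunroof"
    confidence: float  # 0.0 .. 1.0


def fuse(gesture: ModalityResult, voice: ModalityResult,
         threshold: float = 0.6) -> str | None:
    """Return an intent only when the evidence is strong enough.

    If both channels agree, their combined confidence is used; a
    single channel acting alone needs a much higher bar. This mirrors
    the idea that cross-modal agreement lowers misoperation risk."""
    if gesture.intent == voice.intent:
        combined = 1 - (1 - gesture.confidence) * (1 - voice.confidence)
        return gesture.intent if combined >= threshold else None
    best = max(gesture, voice, key=lambda r: r.confidence)
    return best.intent if best.confidence >= 0.9 else None


print(fuse(ModalityResult("open_sunroof", 0.55),
           ModalityResult("open_sunroof", 0.50)))  # -> open_sunroof
print(fuse(ModalityResult("open_sunroof", 0.70),
           ModalityResult("volume_up", 0.65)))     # -> None
```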
Cockpit AI Development Trend 2: Self-Evolution
In 2025, the cockpit agent will become the medium through which users interact with the cockpit; one of its salient features is "self-evolution", reflected in "long-term memory", "feedback learning", and "active cognition".
"Long-term memory", "feedback learning", and "active cognition" are gradual processes. AI constructs user portraits through voice communication, facial recognition, behavior analysis and other data to achieve "thousands of people and thousands of faces" services. This function uses reinforcement learning and reasoning related technology implementation, and the system relies on data closed-loop continuous learning of user behavior. Under the reinforcement learning mechanism, each user feedback becomes the key basis for optimizing the recommendation results.
With the continuous accumulation of data, the large model can more quickly discover the law of user interest point transfer, and can anticipate user requests in advance. It is expected that in the next two years, with the help of more advanced reinforcement learning algorithms and efficient reasoning architecture, the system will increase the mining speed of users' new areas of interest by 50%, and the accuracy of recommended results will be further improved. Such as:
BMW's cockpit system remembers the driver's seat preferences and frequently visited locations, and automatically dims the ambient lighting on rainy days to ease anxiety;
Mercedes-Benz's voice assistant can recommend restaurants based on the user's schedule and reserve charging stations in advance.
BMW Intelligent Voice Assistant 2.0, built on Amazon's large language model (LLM), combines the roles of personal assistant, vehicle expert, and traveling companion, generating customized suggestions by analyzing the driver's daily routes, music preferences, and even seat adjustment habits. For example, if the system notices that the driver stops at a coffee shop every Monday morning, it will proactively prompt in a similar situation: "Are you going to a nearby Starbucks?" The system can also adjust recommendations for weather or traffic, such as suggesting indoor parking on rainy days; when the user says "Hello BMW, take me home" or "Hello BMW, help me find a restaurant", the assistant quickly plans a route or recommends a restaurant.
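The toy sketch referenced above: one plausible way "every piece of user feedback optimizes the next recommendation" could be realized is a simple epsilon-greedy bandit over recommendation options. This is an illustrative assumption, not any automaker's actual system.

```python
# Toy sketch of feedback-driven preference learning for in-cabin
# recommendations, using an epsilon-greedy bandit. Option names and
# the update rule are assumptions for illustration.
import random


class PreferenceLearner:
    def __init__(self, options: list[str], epsilon: float = 0.1):
        self.epsilon = epsilon
        self.value = {o: 0.0 for o in options}  # estimated preference
        self.count = {o: 0 for o in options}

    def recommend(self) -> str:
        # Mostly exploit the learned profile, occasionally explore so
        # new interests can still be discovered over time.
        if random.random() < self.epsilon:
            return random.choice(list(self.value))
        return max(self.value, key=self.value.get)

    def feedback(self, option: str, accepted: bool) -> None:
        # Incremental mean update: each accept/reject nudges the
        # long-term profile, i.e. the agent's "long-term memory".
        self.count[option] += 1
        reward = 1.0 if accepted else 0.0
        self.value[option] += (reward - self.value[option]) / self.count[option]


learner = PreferenceLearner(["coffee_stop", "charging_stop", "news_briefing"])
for _ in range(20):
    choice = learner.recommend()
    learner.feedback(choice, accepted=(choice == "coffee_stop"))
print(max(learner.value, key=learner.value.get))  # converges to coffee_stop
```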
Cockpit AI Development Trend 3: Symbiosis of Large and Small Models
Large models have now been deployed in vehicles for nearly two years, yet they have not "completely replaced" small models. With their lightweight, low-power characteristics, small models perform well in on-device tasks with tight real-time requirements and relatively small data volumes. For example, in intelligent voice interaction, a small model can quickly parse commands such as "turn on the air conditioner" or "next song" and respond instantly. Similarly, in gesture recognition, a small model achieves low-latency operation through local computing, avoiding the lag of cloud transmission. This efficiency makes small models key to the interaction experience.
In practical applications, the two complement each other: the large model handles complex background computation (such as route planning), while the small model focuses on fast foreground responses (such as voice control), together building an efficient, intelligent cockpit ecosystem; a routing sketch follows below. In particular, inspired by DeepSeek's distillation techniques, on-device small models distilled from high-performance large models are expected to reach volume production at some scale after 2025.
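A minimal sketch of this division of labor: a router that answers short, latency-critical commands with a local small model and falls back to a cloud large model for open-ended requests. The keyword table and both model stubs are assumptions for illustration, not a production design.

```python
# Hedged sketch of large/small model symbiosis: route quick commands
# to an on-device small model; defer complex requests to a cloud LLM.
LOCAL_COMMANDS = {
    "turn on the air conditioner": "hvac.on",
    "next song": "media.next",
}


def small_model(utterance: str) -> str | None:
    # Stands in for a distilled on-device model: tiny vocabulary,
    # millisecond-level response, no network round trip.
    return LOCAL_COMMANDS.get(utterance.strip().lower())


def large_model(utterance: str) -> str:
    # Stands in for a cloud LLM call (e.g. route planning or
    # open-ended dialogue); slower but far more capable.
    return f"cloud_plan({utterance!r})"


def route(utterance: str) -> str:
    action = small_model(utterance)
    if action is not None:
        return action              # fast path, fully on-device
    return large_model(utterance)  # slow path, complex reasoning


print(route("next song"))                      # -> media.next
print(route("find a quiet cafe on my route"))  # -> cloud_plan(...)
```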
Taking NIO as an example: it drives its AI applications on a dual track of large and small models, focusing on large models while not neglecting small-model applications.
Relevant Definitions