
Apple: LLMs Accurately Recognize Activities from Captioned Audio and Motion Data
LLMs can accurately recognize daily activities by fusing captioned audio and motion data—boosting performance without raw audio or specialized multimodal training.

LLMs can accurately recognize daily activities by fusing captioned audio and motion data—boosting performance without raw audio or specialized multimodal training.