BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements

Girish Narayanswamy, Yujia (Nancy) Liu, Yuzhe Yang, Jack (Chengqian) Ma, Xin Liu, Daniel McDuff, Shwetak Patel
BigSmall efficiently multitasks physiological signals by leveraging spatiotemporal scales. It comprises a Big branch that models high-fidelity spatial features and a Small branch that models temporal dynamics.


The understanding of human visual perception has historically inspired the design of computer vision architectures. For example, perception occurs at different scales both spatially and temporally, suggesting that the extraction of salient visual information may be made more effective by attending to specific features at varying scales. Visual changes in the body due to physiological processes also occur at varying scales and with modality-specific characteristic properties. Inspired by this, we present BigSmall, an efficient architecture for physiological and behavioral measurement. We present the first joint camera-based facial action, cardiac, and pulmonary measurement model. We propose a multi-branch network with wrapping temporal shift modules that yields efficiency gains while keeping accuracy on par with task-optimized methods. We observe that fusing low-level features leads to suboptimal performance, but that fusing high-level features enables efficiency gains with negligible losses in accuracy. We experimentally validate that BigSmall significantly reduces computational cost while achieving comparable results on multiple physiological measurement tasks simultaneously with a unified model.
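
To make the idea of a wrapping temporal shift more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a temporal shift moves one group of channels one frame forward in time and another group one frame backward, and that "wrapping" means frames shifted past either end of the clip re-enter at the opposite end instead of being replaced with zeros. The function name `wrapping_temporal_shift`, the `fold_div` parameter, and the nested-list layout (frames, then channels) are hypothetical and chosen only for illustration; a real model would apply the same indexing to tensors.

```python
from typing import List, Sequence, TypeVar

Channel = TypeVar("Channel")


def wrapping_temporal_shift(
    frames: Sequence[Sequence[Channel]], fold_div: int = 3
) -> List[List[Channel]]:
    """Circularly shift channel groups along the time axis.

    frames[t][c] is channel c of frame t (illustrative layout, an assumption).
    The first C // fold_div channels take their values from the previous
    frame, the next C // fold_div channels from the next frame, and the
    remaining channels are left unchanged. Time indices wrap around.
    """
    if not frames:
        return []

    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div  # size of each shifted channel group

    shifted: List[List[Channel]] = []
    for t in range(num_frames):
        prev_frame = frames[(t - 1) % num_frames]  # wraps: frame 0 reads the last frame
        next_frame = frames[(t + 1) % num_frames]  # wraps: last frame reads frame 0
        shifted.append(
            list(prev_frame[:fold])              # group shifted forward in time
            + list(next_frame[fold:2 * fold])    # group shifted backward in time
            + list(frames[t][2 * fold:])         # untouched channels
        )
    return shifted


# Toy clip with 3 frames and 3 channels; each channel is a single number here.
clip = [[3 * t + c for c in range(3)] for t in range(3)]
shifted_clip = wrapping_temporal_shift(clip, fold_div=3)
```

Because the shift only reindexes existing features, it adds no learnable parameters, and the wrap-around keeps every frame populated with real feature values even for very short clips.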