GAIN: On the Generalization of Instructional Action Understanding
Junlong Li
Guangyi Chen
Yansong Tang
Jinan Bao
Kun Zhang
Jie Zhou
Jiwen Lu
[Paper]
[GitHub]
This is the project page for GAIN.

Abstract

Despite the great success of deep learning and large-scale data in instructional action understanding, deploying trained models in unseen environments remains a great challenge, since it requires models to generalize from in-distribution training data to out-of-distribution (OOD) data. In this paper, we introduce a benchmark, named GAIN, to analyze the GeneralizAbility of INstructional action understanding models. In GAIN, we reassemble the steps of existing instructional video training datasets to construct OOD tasks, and then collect the corresponding videos. We evaluate the generalizability of models trained on in-distribution datasets by their performance on OOD videos and observe a significant performance drop. We further propose a simple yet effective approach that cuts off the excessive contextual dependency of action steps through causal inference, offering a potential direction for improving OOD generalizability. In the experiments, we show that this simple approach can improve several baselines on both instructional action segmentation and detection tasks. We expect that the GAIN dataset will promote future in-depth research on the generalization of instructional video understanding.


Download

[Baidu Pan] (extraction code: ui8y)
[Google Drive]



Code

(a) The causal inference illustration for instructional action understanding. (b) Approximation with Monte Carlo method.

 [GitHub]
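Panel (b) in the caption above refers to approximating a causal intervention with Monte Carlo sampling. The following is a minimal sketch of that general idea on a toy problem. It is not the GAIN model or the released code. It assumes a binary step cue x, a binary context variable z that co-occurs with x in training, and a hand-written conditional P(y | x, z). All function names and numbers are hypothetical. Conditioning on x lets the correlated context inflate the prediction. The backdoor adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z) avoids this by drawing z from its prior, which cuts the dependency. The Monte Carlo version replaces that sum with an average over sampled contexts.

```python
import random


# Hypothetical toy conditional P(y=1 | x, z): x is a binary step cue and
# z is a binary context variable. The numbers are illustrative only.
def p_y_given_x_z(x: int, z: int) -> float:
    return 0.2 + 0.5 * x + 0.25 * z


def observational(x: int, p_z1_given_x: dict[int, float]) -> float:
    """P(y=1 | x) = sum_z P(y=1 | x, z) P(z | x): the context follows the step."""
    p1 = p_z1_given_x[x]
    return (1 - p1) * p_y_given_x_z(x, 0) + p1 * p_y_given_x_z(x, 1)


def interventional_exact(x: int, p_z1: float) -> float:
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z): the context uses its prior."""
    return (1 - p_z1) * p_y_given_x_z(x, 0) + p_z1 * p_y_given_x_z(x, 1)


def interventional_mc(x: int, p_z1: float,
                      num_samples: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(y=1 | do(x)): sample z ~ P(z), average P(y=1 | x, z)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = 1 if rng.random() < p_z1 else 0  # z is sampled independently of x
        total += p_y_given_x_z(x, z)
    return total / num_samples


if __name__ == "__main__":
    # Assumed training bias: the context co-occurs with the step 90% of the time.
    # With P(x=1) = 0.5, the marginal P(z=1) = 0.5 * 0.1 + 0.5 * 0.9 = 0.5.
    biased = {0: 0.1, 1: 0.9}
    print(f"P(y=1 | x=1)     = {observational(1, biased):.3f}")
    print(f"P(y=1 | do(x=1)) = {interventional_exact(1, 0.5):.3f}")
    print(f"MC estimate      = {interventional_mc(1, 0.5):.3f}")
```

With the assumed 90% co-occurrence, conditioning gives P(y=1 | x=1) = 0.925, while the intervention gives 0.825. The Monte Carlo estimate lands close to the exact value. In a real model, z would be a learned context representation whose sum cannot be enumerated, and that is where sampling becomes useful.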


Paper and Supplementary Material

Li, Chen, Tang, Bao, Zhang, Zhou, Lu.
GAIN: On the Generalization of Instructional Action Understanding.
In ICLR, 2023.
(hosted on OpenReview)


[Bibtex]


Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 62125603, and in part by a grant from the Beijing Academy of Artificial Intelligence (BAAI).
This page template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project.