We propose a novel framework for video understanding, called Temporally Contextualized CLIP (TC-CLIP), which leverages essential temporal information through global interactions in a spatio-temporal ...