Publication venue for:
- CvT: Introducing Convolutions to Vision Transformers (2021)
- DocFormer: End-to-End Transformer for Document Understanding (2021)
- How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild (2021)
- On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors (2021)