Paper Note: Swin Transformer
A new ViT whose representation is computed with shifted windows. ...
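A minimal sketch of the shifted-window step, assuming a (B, H, W, C) feature map; `window_size` and `shift` are illustrative values, not the paper's exact configuration. The cyclic shift makes the next round of window attention straddle the previous windows' boundaries.

```python
import torch

def shifted_window_partition(x, window_size=4, shift=2):
    """Cyclically shift a feature map, then split it into
    non-overlapping windows (the shifted-window step).
    x: (B, H, W, C). Values here are illustrative."""
    B, H, W, C = x.shape
    # Cyclic shift so windows straddle previous window boundaries.
    x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    # Partition into (window_size x window_size) windows.
    x = x.view(B, H // window_size, window_size,
               W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(
        -1, window_size * window_size, C)
    return windows  # (num_windows * B, tokens_per_window, C)

x = torch.randn(1, 8, 8, 96)              # toy feature map
print(shifted_window_partition(x).shape)  # torch.Size([4, 16, 96])
```

Self-attention is then computed within each window, so cost grows linearly with image size instead of quadratically as in global attention.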
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision. ...
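The core of MAE is masking a high proportion of patch tokens and encoding only the visible ones. A minimal sketch of that random-masking step, with illustrative shapes and names:

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """MAE-style random masking: keep a random subset of patch
    tokens; only these are fed to the encoder. Illustrative sketch."""
    B, N, D = tokens.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)            # one random score per token
    ids_shuffle = noise.argsort(dim=1)  # random permutation of tokens
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(
        tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep            # encoder sees only `visible`

tokens = torch.randn(2, 196, 768)   # e.g. 14x14 patch tokens
visible, _ = random_masking(tokens)
print(visible.shape)                # torch.Size([2, 49, 768])
```

A lightweight decoder then reconstructs the masked patches' pixels, which is what makes pre-training cheap at high mask ratios.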
ViT applies a standard Transformer directly to images by treating a sequence of image patches as tokens ...
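A sketch of the patch-embedding step that turns an image into a token sequence; using a strided Conv2d as the per-patch linear projection is the common implementation trick, and the sizes below are illustrative.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into fixed-size patches and linearly project
    each patch to a token, as in ViT. Sizes are illustrative."""
    def __init__(self, img_size=224, patch_size=16, dim=768):
        super().__init__()
        # A strided conv equals flatten-then-linear applied per patch.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size,
                              stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.proj(x)                     # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```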
The Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. ...
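The operation each head computes is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$; multi-head attention runs it in parallel over several heads. A minimal sketch:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    the core operation multi-head self-attention is built from."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    return scores.softmax(dim=-1) @ v

# Multi-head self-attention applies this per head in parallel:
# shapes here are (batch, heads, seq_len, head_dim).
q = k = v = torch.randn(2, 8, 10, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # (2, 8, 10, 64)
```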
BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks. ...
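In practice, "one additional output layer" for classification means a single linear layer on the [CLS] token's final hidden state. A sketch under that assumption; `encoder` stands in for any module returning (B, seq_len, hidden) states, and the names are illustrative rather than a specific library's API.

```python
import torch
import torch.nn as nn

class BertClassifier(nn.Module):
    """Fine-tune a pre-trained encoder by adding one linear layer
    on the [CLS] token's final hidden state. Illustrative sketch."""
    def __init__(self, encoder, hidden=768, num_labels=2):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden, num_labels)  # the one new layer

    def forward(self, input_ids):
        hidden_states = self.encoder(input_ids)  # (B, L, hidden)
        cls = hidden_states[:, 0]                # [CLS] is token 0
        return self.classifier(cls)              # (B, num_labels)

# Usage with a dummy stand-in for the pre-trained encoder:
dummy_encoder = lambda ids: torch.randn(ids.size(0), ids.size(1), 768)
model = BertClassifier(dummy_encoder)
print(model(torch.zeros(2, 16, dtype=torch.long)).shape)  # (2, 2)
```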