Paper Note: Swin Transformer

A new ViT whose representation is computed with Shifted windows! ...

August 10, 2023 · 3 min · 438 words · Me

Paper Note: Masked Autoencoders (MAE) (very short)

Masked autoencoders (MAE) are scalable self-supervised learners for computer vision. ...

July 10, 2023 · 2 min · 353 words · Me

Paper Note: ViT

ViT applies a standard Transformer directly to images ...

July 6, 2023 · 1 min · 183 words · Me

Paper Note: Attention is All You Need

The Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. ...

February 1, 2023 · 4 min · 699 words · Me

Paper Note: BERT

BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks. ...

December 2, 2022 · 4 min · 645 words · Me