PR-433
Gandelsman, Yossi, et al. "Test-time training with masked autoencoders." Advances in Neural Information
Processing Systems 35 (2022): 29374-29385.
Sunghoon Joo, VUNO Inc.
2023. 4. 16.
1. Research Background
Reference
Sun, Yu, et al. "Test-time training with self-supervision for generalization under distribution
shifts." International conference on machine learning. PMLR, 2020.
•https://yueatsprograms.github.io/ttt/home.html
Problem Settings
Generalization under distribution shifts
•Generalization is intrinsically hard without access to training data from the test distribution
•The common practice is to avoid distribution shifts altogether by using a wider training
distribution that hopefully contains the test distribution – with more training data or data
augmentation.
[Figure: example images corrupted with salt-and-pepper noise and uniform noise. Source: Geirhos, Robert, et al. "Generalisation in humans and deep neural networks." Advances in Neural Information Processing Systems 31 (2018).]
Hard to know the test distribution!
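As a rough illustration of the two corruption types named on this slide, the following Python sketch applies salt-and-pepper noise and additive uniform noise to a tiny grayscale image stored as nested lists with values in [0, 1]. The function names, the `amount` and `width` parameters, and the clipping step are assumptions made for this example; they are not the exact protocol of Geirhos et al.

```python
import random

def salt_and_pepper(image, amount, rng):
    """Replace each pixel, with probability `amount`, by 0.0 (pepper) or 1.0 (salt)."""
    noisy = [row[:] for row in image]          # copy so the clean image is untouched
    for r in range(len(noisy)):
        for c in range(len(noisy[r])):
            if rng.random() < amount:
                noisy[r][c] = rng.choice([0.0, 1.0])
    return noisy

def uniform_noise(image, width, rng):
    """Add noise drawn from U(-width, width) to every pixel, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.uniform(-width, width))) for p in row]
            for row in image]

rng = random.Random(0)                          # fixed seed for reproducibility
clean = [[0.5] * 4 for _ in range(4)]           # hypothetical 4x4 mid-gray image
sp = salt_and_pepper(clean, amount=0.25, rng=rng)
un = uniform_noise(clean, width=0.1, rng=rng)
```

A model trained only on `clean`-style images, or on one of these noise types, has no guarantee of handling the other, which is the point of the slide.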
Test time training (Sun et al., ICML, 2020)
•The self-supervised pretext task employed by TTT is rotation prediction
This task is limited in generality, because it can often be too easy or too hard.
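To make the rotation-prediction pretext task concrete, here is a minimal Python sketch that builds the four rotated views of an image together with their rotation labels. The helper names and the nested-list image format are assumptions for illustration; the original TTT code operates on image tensors.

```python
def rotate90(image):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotation_pretext_samples(image):
    """Return (rotated_image, label) pairs for 0, 90, 180 and 270 degrees."""
    samples = []
    current = [row[:] for row in image]
    for label in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples

image = [[1, 2],
         [3, 4]]
samples = rotation_pretext_samples(image)

# A rotation-symmetric input makes the task ill-posed: all four views coincide.
symmetric = [[7, 7],
             [7, 7]]
views = [img for img, _ in rotation_pretext_samples(symmetric)]
ambiguous = all(v == views[0] for v in views)
```

The `symmetric` case is one way the task becomes "too hard": the label cannot be recovered from the input at all.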
Autoencoders for representation learning
The most successful work is masked autoencoders (MAE)
•He, Kaiming, et al. "Masked autoencoders are scalable vision learners." CVPR. 2022.
•PR-355
Proposed method simply substitutes MAE for the self-supervised part of TTT
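The self-supervised objective that MAE contributes can be sketched as a reconstruction error computed only on the patches hidden from the encoder. The Python below is a minimal sketch under assumptions: patches are plain lists of pixel values, the function names are hypothetical, and the 75% mask ratio is the value quoted later in this deck.

```python
import random

def random_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def masked_mse(pred, target, mask):
    """Mean squared error averaged over the pixels of masked patches only.

    Assumes at least one patch is masked.
    """
    total, count = 0.0, 0
    for p_patch, t_patch, hidden in zip(pred, target, mask):
        if hidden:
            total += sum((p - t) ** 2 for p, t in zip(p_patch, t_patch))
            count += len(t_patch)
    return total / count

rng = random.Random(0)
target = [[float(i)] * 3 for i in range(8)]   # 8 hypothetical patches, 3 pixels each
pred = [[0.0] * 3 for _ in range(8)]          # a trivial all-zero reconstruction
mask = random_mask(len(target), mask_ratio=0.75, rng=rng)
loss = masked_mse(pred, target, mask)
```

Because the loss needs no labels, it can be evaluated on a single unlabeled test image, which is what makes it usable as the self-supervised part of TTT.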
2. Methods
Design choices - Architecture
•Y-shaped (original TTT paper)
•Y-shaped (TTT-MAE)
•f: MAE encoder (ViT)
•g: MAE decoder (ViT)
•h: main task head (e.g. object recognition), a ViT-Base (for ViT probing)
Training-time training: 1. training encoder and decoder
•MAE encoder f, decoder g: ViT-Large, pre-trained for 800 epochs on ImageNet-1k
•ViT probing: train only h, with f frozen. Here, h is a ViT-Base.
Training-time training: 2. training main task head
•f: MAE encoder, ViT-Large, pre-trained for ImageNet-1k reconstruction
•h: main task head
•l_m: cross entropy loss for classification
•f_0: encoder produced by MAE pre-training
•Training set with n samples (x_i, y_i)
•Augmentation: image cropping and horizontal flips
•No other augmentations (random changes in brightness, contrast, color and sharpness)
•800 epochs
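Read together, the symbols on this slide (the head h, the frozen pre-trained encoder f_0, the main task loss l_m, and the n training pairs (x_i, y_i)) suggest the following objective for training the main task head. This is a reconstruction from the slide's notation, and the name h_0 for the resulting head is an assumption, so treat it as a sketch and not as the paper's exact statement.

```latex
h_0 = \arg\min_{h} \frac{1}{n} \sum_{i=1}^{n} l_m\big(h \circ f_0(x_i),\, y_i\big)
```

Only h is optimized here; f_0 stays fixed, which matches the ViT probing setup described on the previous slide.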
Test-time training
•Test input x arrives.
•Starting from f_0 and g_0, minimize the self-supervised reconstruction loss l_s (pixel-wise mean squared error) on x to obtain f_x.
•random mask (75%)
•SGD for 20 steps, using a momentum of 0.9, weight decay of 0.2, batch size of 128, and fixed learning rate of 5e-3.
•Make a prediction on x as h ∘ f_x(x).
•Reset the weights to f_0 and g_0 for the next test input x.
•By test-time training on the test inputs independently, we do not
assume that they come from the same distribution.
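The per-input procedure on this slide can be sketched as a small loop: copy the initial weights, take a few gradient steps on the masked reconstruction loss for the single test input, predict, and discard the adapted copy. The Python below is a toy sketch under loud assumptions: the "encoder" and "decoder" are single scalars (`w_f`, `w_g`), the head is a fixed threshold, one fresh random mask is drawn per step instead of a batch of 128, and plain gradient descent replaces SGD with momentum. None of these names come from the paper's code.

```python
import copy
import random

def reconstruction_grads(params, x, visible, hidden):
    """Gradients of the masked MSE for a toy model: g(f(x)) = w_g * w_f * mean(visible pixels)."""
    m = sum(x[i] for i in visible) / len(visible)
    recon = params["w_g"] * params["w_f"] * m
    err = sum(recon - x[j] for j in hidden) / len(hidden)   # mean residual on hidden pixels
    return {"w_f": 2 * err * params["w_g"] * m,
            "w_g": 2 * err * params["w_f"] * m}

def head(feature):
    """Toy main task head h: a fixed threshold classifier."""
    return 1 if feature > 0.5 else 0

def ttt_predict(x, params0, steps=20, lr=5e-3, mask_ratio=0.75, rng=None):
    """Adapt a copy of (f_0, g_0) on the single input x, predict, then discard the copy."""
    rng = rng or random.Random(0)
    params = copy.deepcopy(params0)                 # start from f_0 and g_0
    num_hidden = int(len(x) * mask_ratio)
    for _ in range(steps):
        hidden = rng.sample(range(len(x)), num_hidden)       # fresh random mask
        visible = [i for i in range(len(x)) if i not in hidden]
        grads = reconstruction_grads(params, x, visible, hidden)
        for name in params:
            params[name] -= lr * grads[name]        # plain gradient step (no momentum)
    feature = params["w_f"] * sum(x) / len(x)       # f_x applied to the full input
    return head(feature)                            # h o f_x (x); params0 is never modified

params0 = {"w_f": 1.0, "w_g": 1.0}
test_inputs = [[0.9, 0.8, 1.0, 0.7, 0.9, 0.8, 1.0, 0.9],
               [0.1, 0.2, 0.0, 0.1, 0.3, 0.2, 0.1, 0.0]]
predictions = [ttt_predict(x, params0) for x in test_inputs]
```

Working on a deep copy is what implements the "reset": each test input starts from the same `params0`, so no information leaks from one input to the next.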
Optimizer for TTT
Figure 2: We experiment with two optimizers for TTT. MAE [19] uses AdamW for pre-training. But our results (left) show that
AdamW for TTT requires early stopping, which is unrealistic for generalization to unknown distributions without a validation
set. We instead use SGD, which keeps improving performance even after 20 steps (right).
•Original TTT simply takes the same optimizer setting as during the last epoch of training-time training of the self-supervised task.
•This choice does not carry over to MAE, because the learning rate schedule of MAE reaches zero by the end of pre-training.
•During TTT, too many iterations with AdamW can hurt performance.
•More iterations with SGD consistently improve performance on all distribution shifts.
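For reference, one SGD step with momentum and weight decay can be sketched as below, using the hyperparameters quoted for TTT in this deck (learning rate 5e-3, momentum 0.9, weight decay 0.2). The update follows one common convention, the one used by PyTorch's `torch.optim.SGD` without dampening or Nesterov momentum; whether the authors' code uses exactly this convention is an assumption, and the quadratic toy objective is purely illustrative.

```python
def sgd_momentum_step(weights, grads, velocity, lr=5e-3, momentum=0.9, weight_decay=0.2):
    """One SGD step with momentum and L2 weight decay, updating the lists in place."""
    for i in range(len(weights)):
        g = grads[i] + weight_decay * weights[i]   # weight decay folded into the gradient
        velocity[i] = momentum * velocity[i] + g   # momentum buffer
        weights[i] -= lr * velocity[i]

# Toy problem: minimize 0.5 * (w - 3)^2, whose gradient is (w - 3).
weights, velocity = [0.0], [0.0]
history = []
for _ in range(20):                                # 20 steps, as in the TTT setting
    grads = [weights[0] - 3.0]
    sgd_momentum_step(weights, grads, velocity)
    history.append(weights[0])
```

With a fixed, small learning rate the iterate moves steadily in one direction over the 20 steps, which is loosely analogous to the "keeps improving" behaviour reported for SGD in Figure 2.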
3. Experimental Results
Calibration on out of distribution data
•ImageNet-C: 15 types of corruption applied to ImageNet images, each at 5 levels of severity
• D. Hendrycks and T. Dietterich. "Benchmarking neural network robustness to common corruptions and perturbations." ICLR, 2019.
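The evaluation grid implied by "15 corruptions at 5 severities" can be enumerated as follows. The corruption identifiers follow the folder names commonly used in the ImageNet-C release, which is an assumption about naming only; the grouping comments and the helper `mean_accuracy_per_corruption` are hypothetical and no real results are included.

```python
from itertools import product

CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",                   # noise
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",          # blur
    "snow", "frost", "fog", "brightness",                              # weather
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",   # digital
]
SEVERITIES = [1, 2, 3, 4, 5]

def mean_accuracy_per_corruption(results):
    """Average accuracy over the 5 severity levels for each corruption type.

    `results` maps (corruption, severity) pairs to an accuracy value.
    """
    return {c: sum(results[(c, s)] for s in SEVERITIES) / len(SEVERITIES)
            for c in CORRUPTIONS}

# Every (corruption, severity) pair is a separate test distribution.
eval_grid = list(product(CORRUPTIONS, SEVERITIES))
```

This gives 75 distinct test distributions, each of which TTT handles without knowing in advance which one a test image comes from.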
Main results on ImageNet-C
Relative to their respective baselines, TTT-MAE yields larger performance gains than TTT-Rot on all corruptions.
• Joint Train: ResNet with 16 layers, after joint training for rotation prediction and object recognition (baseline for TTT-Rot)
• TTT-Rot: original paper (rotation task, ResNet-18)
• Baseline: pre-trained MAE encoder with ViT probing (no TTT)
• TTT-MAE (red) on top of the baseline significantly improves performance.
TTT-MAE in rotation invariant classes
• Rotation invariant class: images are usually taken from top-down views
TTT-MAE is agnostic to rotation invariance and still helps on these classes.
Design choices - Training setup
1. Fine-tuning: train h ∘ f end-to-end. This works poorly with TTT.
2. ViT probing: train only h, with f frozen. Here, h is a ViT-Base.
3. Joint training: train both h ∘ f and g ∘ f, by summing their losses together. This is used by TTT with rotation prediction. But with MAE, it performs worse on the ImageNet validation set.
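The three training setups differ mainly in which of the modules f (encoder), g (decoder) and h (main task head) receive gradient updates after MAE pre-training, and which losses are summed. The table below restates that as a small Python sketch; the dictionary layout, the setup keys, and the loss names `"main"` and `"self-supervised"` are hypothetical labels chosen for this example.

```python
ALL_MODULES = {"f", "g", "h"}

SETUPS = {
    # name: (modules updated in this stage, losses that are summed)
    "fine-tuning":    ({"f", "h"},      ["main"]),
    "vit-probing":    ({"h"},           ["main"]),
    "joint-training": ({"f", "g", "h"}, ["main", "self-supervised"]),
}

def updated_modules(setup):
    """Modules whose weights change during this training setup."""
    return SETUPS[setup][0]

def untouched_modules(setup):
    """Modules left at their pre-trained weights (frozen or simply unused)."""
    return ALL_MODULES - SETUPS[setup][0]

probing_untouched = untouched_modules("vit-probing")
```

Under ViT probing both f and g keep their MAE pre-trained weights, so the encoder that TTT later adapts is still the one the decoder was trained against.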
Accuracy comparison of three designs (ViT probing, fine-tuning, joint training)
•The first three rows are only for training-time training, after which a fixed model is applied during testing.
•Joint training does not achieve satisfactory performance on most corruptions
•Fine-tuning: initially performs better than ViT probing, but it is not amenable to TTT
•TTT-MAE: TTT-MAE after ViT probing, which performs the best across all corruption types
Performance on other ImageNet variants
ImageNet-R
• ImageNet-R is a benchmark dataset for evaluating the robustness of image classification.
• The dataset contains renditions of ImageNet classes, such as art, cartoons, paintings, sketches, and toys, and is not made of corrupted copies of the original ImageNet images.
ImageNet-A
• Baseline: pre-trained MAE encoder ViT probing (no TTT)
• ImageNet-A is a dataset designed to test the robustness of computer vision
models against real-world, unmodified images.
• The dataset includes visually similar images to those in ImageNet but with
added challenges such as occlusion, low resolution, and unusual viewpoints.
4. Conclusion
• Main contribution
• Proposes a new method, TTT-MAE, for addressing the problem of domain shift in visual recognition tasks.
• TTT can be viewed alternatively as one-sample unsupervised domain adaptation (UDA)
• Limitations & future works
• Slower at test time than the baseline applying a fixed model (inference speed has not been the focus of this paper). It might be improved through better hyper-parameters, optimizers, training techniques and architectural designs.
• Studying the generalization of spatial autoencoding to other main tasks and test distributions beyond
object recognition and the benchmarks used in this study.
• Exploring test-time training on video streams in human-like environments, where self-supervised
learning can take advantage of past frames
Thank you.
/ 24

More Related Content

What's hot

Moving object detection
Moving object detectionMoving object detection
Moving object detection
Manav Mittal
 
State of transformers in Computer Vision
State of transformers in Computer VisionState of transformers in Computer Vision
State of transformers in Computer Vision
Deep Kayal
 
Anatomy of YOLO - v1
Anatomy of YOLO - v1Anatomy of YOLO - v1
Anatomy of YOLO - v1
Jihoon Song
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
Jinwon Lee
 
A Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep EmbeddingA Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep Embedding
Cenk Bircanoğlu
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
Jinwon Lee
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
Seiya Ito
 
Active Object Localization with Deep Reinforcement Learning
Active Object Localization with Deep Reinforcement LearningActive Object Localization with Deep Reinforcement Learning
Active Object Localization with Deep Reinforcement Learning
Universitat Politècnica de Catalunya
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
Antonio Rueda-Toicen
 
論文紹介:Multimodal Learning with Transformers: A Survey
論文紹介:Multimodal Learning with Transformers: A Survey論文紹介:Multimodal Learning with Transformers: A Survey
論文紹介:Multimodal Learning with Transformers: A Survey
Toru Tamaki
 
Recent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionRecent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-Resolution
Hiroto Honda
 
Deep Learning and Texture Mapping
Deep Learning and Texture MappingDeep Learning and Texture Mapping
Deep Learning and Texture Mapping
Efe Kaptan
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
DADAJONJURAKUZIEV
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
Manohar Mukku
 
Prediction of Exchange Rate Using Deep Neural Network
Prediction of Exchange Rate Using Deep Neural Network  Prediction of Exchange Rate Using Deep Neural Network
Prediction of Exchange Rate Using Deep Neural Network
Tomoki Hayashi
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
 論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D... 論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
Toru Tamaki
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GAN
S.Shayan Daneshvar
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
Sangmin Woo
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
Databricks
 

What's hot (20)

Moving object detection
Moving object detectionMoving object detection
Moving object detection
 
State of transformers in Computer Vision
State of transformers in Computer VisionState of transformers in Computer Vision
State of transformers in Computer Vision
 
Anatomy of YOLO - v1
Anatomy of YOLO - v1Anatomy of YOLO - v1
Anatomy of YOLO - v1
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
A Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep EmbeddingA Comparison of Loss Function on Deep Embedding
A Comparison of Loss Function on Deep Embedding
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
 
Active Object Localization with Deep Reinforcement Learning
Active Object Localization with Deep Reinforcement LearningActive Object Localization with Deep Reinforcement Learning
Active Object Localization with Deep Reinforcement Learning
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
 
論文紹介:Multimodal Learning with Transformers: A Survey
論文紹介:Multimodal Learning with Transformers: A Survey論文紹介:Multimodal Learning with Transformers: A Survey
論文紹介:Multimodal Learning with Transformers: A Survey
 
Recent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionRecent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-Resolution
 
Deep Learning and Texture Mapping
Deep Learning and Texture MappingDeep Learning and Texture Mapping
Deep Learning and Texture Mapping
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Prediction of Exchange Rate Using Deep Neural Network
Prediction of Exchange Rate Using Deep Neural Network  Prediction of Exchange Rate Using Deep Neural Network
Prediction of Exchange Rate Using Deep Neural Network
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
 論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D... 論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
論文紹介:DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object D...
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GAN
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 

Similar to PR-433: Test-time Training with Masked Autoencoders

Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
Dongmin Choi
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
Jinwon Lee
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
Edge AI and Vision Alliance
 
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGESA DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
PNandaSai
 
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
Edge AI and Vision Alliance
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
Nimrita Koul
 
nnUNet
nnUNetnnUNet
Traffic Sign Recognition System
Traffic Sign Recognition SystemTraffic Sign Recognition System
Traffic Sign Recognition System
IRJET Journal
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
multimediaeval
 
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
Edge AI and Vision Alliance
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
Jinwon Lee
 
Remote Sensing Sattelite image Digital Image Analysis.pptx
Remote Sensing Sattelite image Digital Image Analysis.pptxRemote Sensing Sattelite image Digital Image Analysis.pptx
Remote Sensing Sattelite image Digital Image Analysis.pptx
habtamuawulachew1
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
Mehrnaz Faraz
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
milad abbasi
 
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
Universitat Politècnica de Catalunya
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
Edge AI and Vision Alliance
 
What Makes Training Multi-modal Classification Networks Hard? ppt
What Makes Training Multi-modal Classification Networks Hard? pptWhat Makes Training Multi-modal Classification Networks Hard? ppt
What Makes Training Multi-modal Classification Networks Hard? ppt
taeseon ryu
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
SaadMemon23
 
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
IRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCAIRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCA
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
IRJET Journal
 
PR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But FasterPR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But Faster
Sunghoon Joo
 

Similar to PR-433: Test-time Training with Masked Autoencoders (20)

Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
 
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGESA DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
A DEEP LEARNING APPROACH FOR SEMANTIC SEGMENTATION IN BRAIN TUMOR IMAGES
 
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
“Imaging Systems for Applied Reinforcement Learning Control,” a Presentation ...
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
nnUNet
nnUNetnnUNet
nnUNet
 
Traffic Sign Recognition System
Traffic Sign Recognition SystemTraffic Sign Recognition System
Traffic Sign Recognition System
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
 
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
“Fundamentals of Training AI Models for Computer Vision Applications,” a Pres...
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
Remote Sensing Sattelite image Digital Image Analysis.pptx
Remote Sensing Sattelite image Digital Image Analysis.pptxRemote Sensing Sattelite image Digital Image Analysis.pptx
Remote Sensing Sattelite image Digital Image Analysis.pptx
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
Skin Lesion Detection from Dermoscopic Images using Convolutional Neural Netw...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
What Makes Training Multi-modal Classification Networks Hard? ppt
What Makes Training Multi-modal Classification Networks Hard? pptWhat Makes Training Multi-modal Classification Networks Hard? ppt
What Makes Training Multi-modal Classification Networks Hard? ppt
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
IRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCAIRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCA
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
 
PR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But FasterPR-445: Token Merging: Your ViT But Faster
PR-445: Token Merging: Your ViT But Faster
 

More from Sunghoon Joo

PR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdfPR422_hyper-deep ensembles.pdf
PR422_hyper-deep ensembles.pdf
Sunghoon Joo
 
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
PR-411: Model soups: averaging weights of multiple fine-tuned models improves...
Sunghoon Joo
 
PR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionPR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed Recognition
Sunghoon Joo
 
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
PR-383: Solving ImageNet: a Unified Scheme for Training any Backbone to Top R...
Sunghoon Joo
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
Sunghoon Joo
 
PR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental LearningPR-351: Adaptive Aggregation Networks for Class-Incremental Learning
PR-351: Adaptive Aggregation Networks for Class-Incremental Learning
Sunghoon Joo
 
PR-339: Maintaining discrimination and fairness in class incremental learning
PR-339: Maintaining discrimination and fairness in class incremental learningPR-339: Maintaining discrimination and fairness in class incremental learning
PR-339: Maintaining discrimination and fairness in class incremental learning
Sunghoon Joo
 
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran...
Sunghoon Joo
 
PR-313 Training BatchNorm and Only BatchNorm: On the Expressive Power of Rand...
PR-313 Training BatchNorm and Only BatchNorm: On the Expressive Power of Rand...PR-313 Training BatchNorm and Only BatchNorm: On the Expressive Power of Rand...
PR-313 Training BatchNorm and Only BatchNorm: On the Expressive Power of Rand...
Sunghoon Joo
 
PR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document rerankingPR-298 PARADE: Passage representation aggregation for document reranking
PR-298 PARADE: Passage representation aggregation for document reranking
Sunghoon Joo
 
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
PR-285 Leveraging Semantic and Lexical Matching to Improve the Recall of Docu...
Sunghoon Joo
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
Sunghoon Joo
 
PR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseasesPR-246: A deep learning system for differential diagnosis of skin diseases
PR-246: A deep learning system for differential diagnosis of skin diseases
Sunghoon Joo
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
Sunghoon Joo
 
PR-218: MFAS: Multimodal Fusion Architecture Search
PR-218: MFAS: Multimodal Fusion Architecture SearchPR-218: MFAS: Multimodal Fusion Architecture Search
PR-218: MFAS: Multimodal Fusion Architecture Search
Sunghoon Joo
 
PR-203: Class-Balanced Loss Based on Effective Number of Samples
PR-203: Class-Balanced Loss Based on Effective Number of SamplesPR-203: Class-Balanced Loss Based on Effective Number of Samples
PR-203: Class-Balanced Loss Based on Effective Number of Samples
Sunghoon Joo
 
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
Sunghoon Joo
 
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
PR173 : Automatic Chemical Design Using a Data-Driven Continuous Representati...
Sunghoon Joo
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
Sunghoon Joo
 

PR-433: Test-time Training with Masked Autoencoders

  • 1. PR-433: Gandelsman, Yossi, et al. "Test-time training with masked autoencoders." Advances in Neural Information Processing Systems 35 (2022): 29374-29385. Sunghoon Joo (주성훈), VUNO Inc. 2023. 4. 16.
  • 3. Research Background. Reference: Sun, Yu, et al. "Test-time training with self-supervision for generalization under distribution shifts." International Conference on Machine Learning. PMLR, 2020. https://yueatsprograms.github.io/ttt/home.html
  • 4. https://yueatsprograms.github.io/ttt/home.html
  • 5. Problem settings: generalization under distribution shifts. Generalization is intrinsically hard without access to training data from the test distribution. The common practice is to avoid distribution shifts altogether by using a wider training distribution that hopefully contains the test distribution, with more training data or data augmentation. It is hard to know the test distribution! [Figure: example corruptions, salt-and-pepper noise and uniform noise. Source: Geirhos, Robert, et al. "Generalisation in humans and deep neural networks." Advances in Neural Information Processing Systems 31 (2018).]
  • 6. Test-time training (Sun et al., ICML, 2020).
  • 7. Test-time training (Sun et al., ICML, 2020). The self-supervised pretext task employed by TTT is rotation prediction. This task is limited in generality, because it can often be too easy or too hard. https://yueatsprograms.github.io/ttt/home.html
  • 8. Autoencoders for representation learning. The most successful work is masked autoencoders (MAE): He, Kaiming, et al. "Masked autoencoders are scalable vision learners." CVPR. 2022 (see PR-355). The proposed method simply substitutes MAE for the self-supervised part of TTT.
  • 10. Methods. Design choices, architecture: Y-shaped (original TTT paper). https://yueatsprograms.github.io/ttt/home.html
  • 11. Design choices, architecture: Y-shaped (TTT-MAE). f: MAE encoder (ViT). g: MAE decoder (ViT). h: main task head (e.g. object recognition), a ViT-Base for ViT probing.
  • 12. Training-time training, step 1: training the encoder f and the decoder g. MAE encoder and decoder: ViT-Large, pre-trained for 800 epochs on ImageNet-1k. ViT probing: train only h, with f frozen. Here, h is a ViT-Base.
  • 13. Training-time training, step 2: training the main task head h. f: MAE encoder (ViT-Large), pre-trained for ImageNet-1k reconstruction. l_m: cross-entropy loss for classification. f_0: the encoder produced by MAE pre-training. The training set has n samples (x_i, y_i). Augmentation: image cropping and horizontal flips; no other augmentations (random changes in brightness, contrast, color and sharpness). 800 epochs.
  • 14. Test-time training. A test input x arrives. Starting from f_0 and g_0, minimize the self-supervised reconstruction loss l_s (pixel-wise mean squared error) on x with a random mask (75%), using SGD for 20 steps with a momentum of 0.9, weight decay of 0.2, batch size of 128, and a fixed learning rate of 5e-3; this yields the adapted encoder f_x. Make a prediction on x as h ∘ f_x(x). Reset the weights to f_0 and g_0 for the next test input. By test-time training on the test inputs independently, we do not assume that they come from the same distribution.
  • 15. Optimizer for TTT. Figure 2: We experiment with two optimizers for TTT. MAE [19] uses AdamW for pre-training. But our results (left) show that AdamW for TTT requires early stopping, which is unrealistic for generalization to unknown distributions without a validation set. We instead use SGD, which keeps improving performance even after 20 steps (right). The original TTT simply takes the same optimizer setting as during the last epoch of training-time training of the self-supervised task. However, the learning rate schedule of MAE reaches zero by the end of pre-training. During test-time training, excessive iterations with AdamW can negatively impact performance, whereas more iterations with SGD consistently improve performance on all distribution shifts.
  • 17. Experimental Results. Calibration on out-of-distribution data: ImageNet-C applies 15 types of corruption to ImageNet images, each at 5 levels of severity. D. Hendrycks and T. Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. ICLR, 2018.
  • 18. Main results on ImageNet-C. TTT-MAE has higher performance gains in all corruptions than TTT-Rot, on top of their respective baselines. Joint Train: ResNet-16-layers, after joint training for rotation prediction and object recognition (baseline for TTT-Rot). TTT-Rot: original paper (rotation task, ResNet-18). Baseline: pre-trained MAE encoder with ViT probing (no TTT). TTT-MAE (red), on top of this baseline, significantly improves performance.
  • 19. TTT-MAE in rotation-invariant classes. Rotation-invariant classes: images are usually taken from top-down views. TTT-MAE is agnostic to rotation invariance and still helps on these classes.
  • 20. Design choices, training setup for object classification. (1) Fine-tuning: train h ∘ f end-to-end. This works poorly with TTT. (2) ViT probing: train only h, with f frozen. Here, h is a ViT-Base. (3) Joint training: train both h ∘ f and g ∘ f, by summing their losses together. This is used by TTT with rotation prediction, but with MAE it performs worse on the ImageNet validation set.
  • 21. Accuracy comparison of three designs (ViT probing, fine-tuning, joint training). The first three rows are only for training-time training, after which a fixed model is applied during testing. Joint training does not achieve satisfactory performance on most corruptions. Fine-tuning initially performs better than ViT probing, but it is not amenable to TTT. TTT-MAE: TTT-MAE after ViT probing, which performs the best across all corruption types.
  • 22. Performance on other ImageNet variants. Baseline: pre-trained MAE encoder with ViT probing (no TTT). ImageNet-R is a benchmark dataset for evaluating robustness of image classification. The dataset contains renditions of ImageNet classes, such as art, cartoons, graffiti and sculptures, rather than natural photographs. ImageNet-A is a dataset designed to test the robustness of computer vision models against real-world, unmodified images. The dataset includes images visually similar to those in ImageNet but with added challenges such as occlusion, low resolution, and unusual viewpoints.
  • 24. Conclusions. Main contribution: the proposal of a new method, TTT-MAE, for addressing the problem of domain shift in visual recognition tasks. TTT can be viewed alternatively as one-sample unsupervised domain adaptation (UDA). Limitations and future work: (1) TTT-MAE is slower at test time than the baseline applying a fixed model (inference speed has not been the focus of this paper); it might be improved through better hyper-parameters, optimizers, training techniques and architectural designs. (2) Studying the generalization of spatial autoencoding to other main tasks and test distributions beyond object recognition and the benchmarks used in this study. (3) Exploring test-time training on video streams in human-like environments, where self-supervised learning can take advantage of past frames. Thank you.
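
The per-input loop described on slide 14 (adapt on one test input, predict, then reset) can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the authors' implementation: the ViT encoder f, decoder g and head h are replaced by hypothetical scalar stand-ins (`weights["f"]`, `weights["g"]` and a threshold in `predict`), an "image" is a flat list of 16 patch values, and each batch is assumed to hold differently masked views of the same test input. Only the hyperparameters (75% masking, 20 SGD steps, momentum 0.9, weight decay 0.2, batch size 128, learning rate 5e-3) come from the slides; all function names are made up for this sketch.

```python
import random

# Hyperparameters quoted on the slides; everything else is a toy stand-in.
MASK_RATIO, STEPS, BATCH_SIZE = 0.75, 20, 128
LR, MOMENTUM, WEIGHT_DECAY = 5e-3, 0.9, 0.2


def random_mask(num_patches, rng):
    """Split patch indices into (visible, masked), masking MASK_RATIO of them."""
    idx = list(range(num_patches))
    rng.shuffle(idx)
    n_masked = int(MASK_RATIO * num_patches)
    return idx[n_masked:], idx[:n_masked]


def encode(w_f, patches, visible):
    """Toy encoder f: one scalar feature computed from the visible patches."""
    return w_f * sum(patches[i] for i in visible) / len(visible)


def loss_and_grads(weights, patches, visible, masked):
    """Mean squared error on masked patches for the toy decoder g(z) = w_g * z."""
    w_f, w_g = weights["f"], weights["g"]
    mean_visible = sum(patches[i] for i in visible) / len(visible)
    errors = [w_g * w_f * mean_visible - patches[j] for j in masked]
    loss = sum(e * e for e in errors) / len(errors)
    common = sum(2.0 * e for e in errors) / len(errors) * mean_visible
    return loss, {"f": common * w_g, "g": common * w_f}


def adapt_to_input(initial, patches, rng):
    """Test-time training on ONE input; returns adapted copies (f_x, g_x)."""
    weights = dict(initial)  # work on a copy, so f_0 and g_0 are never modified
    velocity = {"f": 0.0, "g": 0.0}
    for _step in range(STEPS):
        batch_grads = {"f": 0.0, "g": 0.0}
        for _view in range(BATCH_SIZE):  # fresh random mask per batch element
            visible, masked = random_mask(len(patches), rng)
            _, grads = loss_and_grads(weights, patches, visible, masked)
            for k in grads:
                batch_grads[k] += grads[k] / BATCH_SIZE
        for k in weights:  # SGD with momentum and weight decay
            g = batch_grads[k] + WEIGHT_DECAY * weights[k]
            velocity[k] = MOMENTUM * velocity[k] + g
            weights[k] -= LR * velocity[k]
    return weights


def predict(weights, patches, threshold=0.5):
    """Toy head h: threshold the encoder feature of the full, unmasked input."""
    z = encode(weights["f"], patches, range(len(patches)))
    return int(z > threshold)


rng = random.Random(0)
initial = {"f": 1.0, "g": 0.0}  # hypothetical f_0, g_0
labels = []
for x in ([1.0] * 16, [0.2] * 16):  # two independent test inputs
    adapted = adapt_to_input(initial, x, rng)  # f_x, g_x for this input only
    labels.append(predict(adapted, x))         # h applied to f_x(x)
    # The next iteration starts again from `initial`, i.e. the weights are reset.
```

Because `adapt_to_input` copies the initial weights and never writes back to them, every test input is handled independently, which mirrors the slide's point that the inputs are not assumed to come from one distribution.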