The document provides an introduction to diffusion models. It notes that diffusion models have achieved state-of-the-art performance in image generation, density estimation, and image editing. Specifically, it covers the Denoising Diffusion Probabilistic Model (DDPM), which reparametrizes the reverse distributions of diffusion models as noise prediction to make training simple and effective. It also discusses the Denoising Diffusion Implicit Model (DDIM), which generalizes DDPM to a non-Markovian diffusion process whose (near-)deterministic sampler significantly reduces the number of sampling steps needed compared to DDPM. In summary, diffusion models have emerged as a highly effective approach for generative modeling tasks.
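For reference, DDPM's forward (noising) process and its closed form can be sketched in the standard notation of the DDPM paper, with noise schedule β_t and ᾱ_t the cumulative product:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right),
\quad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s).
```

The closed form is what lets the model be trained by sampling an arbitrary timestep t directly rather than simulating the whole chain.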
A presentation about the development of the ideas from the autoencoder to the Stable Diffusion text-to-image model.
Models covered: autoencoder, VAE, VQ-VAE, VQ-GAN, latent diffusion, and stable diffusion.
Generative Adversarial Networks (GANs) are a type of deep learning model used for unsupervised machine learning tasks like image generation. GANs work by having two neural networks, a generator and discriminator, compete against each other. The generator creates synthetic images and the discriminator tries to distinguish real images from fake ones. This allows the generator to improve over time at creating more realistic images that can fool the discriminator. The document discusses the intuition behind GANs, provides a PyTorch implementation example, and describes variants like DCGAN, LSGAN, and semi-supervised GANs.
This document summarizes key concepts in diffusion models and their applications in generative AI systems. It discusses early diffusion models from Sohl-Dickstein and later improvements from DDPM. It also covers recent large diffusion models like GLIDE and DALL-E 2 that can generate images from text prompts. The document provides technical details on diffusion processes, loss functions, and model architectures.
Basics of GAN neural networks
A GAN is an advanced neural-network technique for generating new data. The generated data is synthesized from patterns learned from existing raw data.
This document discusses domain transfer and domain adaptation in deep learning. It begins with introductions to domain transfer, which learns a mapping between domains, and domain adaptation, which learns a mapping between domains with labels. It then covers several approaches for domain transfer, including neural style transfer, instance normalization, and GAN-based methods. It also discusses general approaches for domain adaptation such as source/target feature matching and target data augmentation.
Generative Adversarial Networks (GANs) use two neural networks, a generator and discriminator, that compete against each other. The generator learns to generate fake images that look real, while the discriminator learns to tell real images apart from fakes. This document discusses various GAN architectures and applications, including conditional GANs, image-to-image translation, style transfer, semantic image editing, and data augmentation using GAN-generated images. It also covers evaluation metrics for GANs and societal impacts such as bias and deepfakes.
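To make the generator/discriminator competition concrete, here is a minimal sketch of the standard (non-saturating) GAN losses on scalar discriminator outputs; the function names are illustrative, not taken from any of the discussed implementations:

```python
import math

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(x) -> 1 on real samples and D(G(z)) -> 0 on fakes.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: push D(G(z)) toward 1.
    return -math.log(d_fake)
```

At the theoretical equilibrium, where the discriminator outputs 0.5 everywhere, the discriminator loss equals 2·log 2 and neither network can improve unilaterally.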
Transformer Architectures in Vision
[2018 ICML] Image Transformer
[2019 CVPR] Video Action Transformer Network
[2020 ECCV] End-to-End Object Detection with Transformers
[2021 ICLR] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
This document is a slide presentation on recent advances in deep learning. It discusses self-supervised learning, which involves using unlabeled data to learn representations by predicting structural information within the data. The presentation covers pretext tasks, invariance-based approaches, and generation-based approaches for self-supervised learning in computer vision and natural language processing. It provides examples of specific self-supervised methods like predicting image rotations, clustering representations to generate pseudo-labels, and masked language modeling.
Tutorial on Theory and Application of Generative Adversarial Networks (MLReview)
Description
Generative adversarial network (GAN) has recently emerged as a promising generative modeling approach. It consists of a generative network and a discriminative network. Through the competition between the two networks, it learns to model the data distribution. In addition to modeling the image/video distribution in computer vision problems, the framework finds use in defining visual concept using examples. To a large extent, it eliminates the need of hand-crafting objective functions for various computer vision problems. In this tutorial, we will present an overview of generative adversarial network research. We will cover several recent theoretical studies as well as training techniques and will also cover several vision applications of generative adversarial networks.
It is roughly 30 years since AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. But the over-inflated expectations ended in a crash, followed by a period of absent funding and interest: the so-called AI winter. The last three years, however, have changed everything again. Deep learning, a machine learning technique inspired by the human brain, has successfully crushed one benchmark after another, and tech companies like Google, Facebook, and Microsoft have started to invest billions in AI research. "The pace of progress in artificial general intelligence is incredibly fast" (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that "would be either the best or the worst thing ever to happen to humanity" (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let's look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO of Google) recently announced that "machine learning is a core transformative way by which Google is rethinking everything they are doing" and explain why "deep learning is probably one of the most exciting things that is happening in the computer industry" (Jen-Hsun Huang, CEO of NVIDIA).
Either a new AI "winter is coming" (Ned Stark, House Stark) or this new wave of innovation might turn out to be the "last invention humans ever need to make" (Nick Bostrom, AI philosopher). Or maybe it is just another great technology helping humans achieve more.
GANs are the hottest new topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design, and most importantly their code implementation, has been causing headaches for ML practitioners, especially when moving to production.
Starting from the very basics of what a GAN is, passing through a TensorFlow implementation using the most cutting-edge APIs available in the framework, and finally arriving at production-ready serving at scale using Google Cloud ML Engine.
Slides for the talk: https://www.pycon.it/conference/talks/deep-diving-into-gans-form-theory-to-production
Github repo: https://github.com/zurutech/gans-from-theory-to-production
PR-355: Masked Autoencoders Are Scalable Vision Learners (Jinwon Lee)
- Masked Autoencoders Are Scalable Vision Learners presents a new self-supervised learning method called Masked Autoencoder (MAE) for computer vision.
- MAE works by masking random patches of input images, encoding the visible patches, and decoding to reconstruct the full image. This forces the model to learn visual representations from incomplete views of images.
- Experiments on ImageNet show that MAE achieves superior results compared to supervised pre-training from scratch as well as other self-supervised methods, scaling effectively to larger models. MAE representations also transfer well to downstream tasks like object detection, instance segmentation and semantic segmentation.
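The random patch masking at MAE's core can be sketched in a few lines; the 75% mask ratio below matches the paper's default, while the function itself is an illustrative stand-in rather than the paper's code:

```python
import random

def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set (fed to the encoder)
    and a masked set (reconstructed by the decoder)."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked
```

For a 224x224 image split into 16x16 patches there are 196 patches, so with a 75% mask ratio the encoder processes only 49 of them, which is where much of MAE's pre-training efficiency comes from.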
This document provides an agenda for a presentation on deep learning, neural networks, convolutional neural networks, and interesting applications. The presentation will include introductions to deep learning and how it differs from traditional machine learning by learning feature representations from data. It will cover the history of neural networks and breakthroughs that enabled training of deeper models. Convolutional neural network architectures will be overviewed, including convolutional, pooling, and dense layers. Applications like recommendation systems, natural language processing, and computer vision will also be discussed. There will be a question and answer section.
Generative Adversarial Networks (GANs) are a type of deep learning algorithm that use two neural networks - a generator and discriminator. The generator produces new data samples and the discriminator tries to determine whether samples are real or generated. The networks train simultaneously, with the generator trying to produce realistic samples and the discriminator accurately classifying samples. GANs can generate high-quality, realistic data and have applications such as image synthesis, but training can be unstable and outputs may be biased.
Synthetic data generation for machine learning (QuantUniversity)
As machine learning becomes more pervasive in the industry, data scientists and quants are realizing the challenges and limitations of machine learning models. One of the primary reasons machine learning applications fail is the lack of rich, diverse, and clean datasets needed to build models. Datasets may have missing values, may not incorporate enough samples for all use cases (for example, availability of fraudulent transaction records to train a model), and may not be easily sharable due to privacy concerns. While there are many data cleansing techniques to fix data-related issues, and we can always try to obtain new and rich datasets, the cost is at times prohibitive and at times impractical, leading many institutions to abandon machine learning and go back to rule-based methods.
Synthetic datasets and simulations are used to enrich and augment existing datasets to provide comprehensive samples when training machine learning models. In addition, synthetic datasets can be used for comprehensive scenario analysis, missing-value imputation, and privacy protection when building models. The advent of novel techniques like deep learning has rekindled interest in using techniques like GANs and encoder-decoder architectures for financial synthetic data generation.
In this workshop, we will discuss the state of the art in Synthetic data generation and will illustrate the various techniques and methods that can be used in practice. Through examples using QuSynthesize & QuSandbox, we will demonstrate how these techniques can be realized in practice.
The Transformer is an established architecture in natural language processing that utilizes a self-attention framework within a deep learning approach.
This presentation was delivered under the mentorship of Mr. Mukunthan Tharmakulasingam (University of Surrey, UK), as a part of the ScholarX program from Sustainable Education Foundation.
This document discusses generative adversarial networks (GANs) and provides several summaries:
1. GANs use two neural networks, a generator and discriminator, that compete in a game theoretic framework to generate new data instances that match the training data distribution.
2. Training GANs involves training the generator to generate more realistic samples to fool the discriminator while training the discriminator to better distinguish real and generated samples.
3. Several strategies for training GANs are discussed, including varying the update rates of the generator and discriminator, using cooperative or random update strategies, and applying penalties for noisy samples.
Deep neural network with GANs pre-training for tuberculosis type classificat... (Behzad Shomali)
The following presentation summarizes the bachelor's thesis (final project) of Behzad Shomali at the Ferdowsi University of Mashhad (FUM). The full text can be found at https://bit.ly/3xt4vc0
Minor Project Report on Denoising Diffusion Probabilistic Models (oxigoh238)
Denoising Diffusion Probabilistic Model
- Contrastive models like CLIP serve as a key inspiration.
- CLIP demonstrates robust image representations capturing both semantics and style.
Project objectives: a two-stage model is proposed:
- A prior that generates a CLIP image embedding from the given text.
- A decoder that generates an image conditioned on these CLIP image embeddings.
Score-based Generative Modeling through Stochastic Differential Equations (Sungchul Kim)
This document discusses score-based generative modeling using stochastic differential equations (SDEs). It introduces modeling data diffusion as an SDE from the data distribution to a simple prior and generating samples by reversing this diffusion process. It also describes estimating the score (gradient of the log probability density) needed for the reverse process using score matching. Finally, it notes that noise perturbation models like NCSN and DDPM can be viewed as discretizations of specific SDEs called variance exploding and variance preserving SDEs.
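The forward and reverse-time SDEs referred to above can be written compactly in Song et al.'s notation, with w a standard Wiener process and the score ∇_x log p_t(x) being the quantity estimated by score matching:

```latex
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w
\quad \text{(forward diffusion)},
\qquad
\mathrm{d}x = \left[ f(x, t) - g(t)^2 \,\nabla_x \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
\quad \text{(reverse-time SDE)}.
```

Sampling amounts to drawing from the simple prior and integrating the reverse-time SDE with the learned score plugged in for ∇_x log p_t(x).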
Unsupervised generative methods have undergone a recent renaissance, spurred on in large part by impressive photo-realistic results in image applications. These generative methods seek to yield models that understand data by learning how to generate samples through implicit and explicit likelihood optimization. However, despite the surge in interest, these models are limited in several key aspects. First, although methods with an explicit likelihood are, in principle, able to perform additional tasks like anomaly detection and imputation, biases in the learned likelihood render these models useless for such important tasks. For example, recent work has shown that modern methods assign high out-of-distribution likelihoods to data that is unlike seen training instances. Secondly, most current generative methods are limited to fixed-length vector or sequential data, leaving a substantial gap for the analysis of exchangeable data like sets and graphs. That is, modern generative models excel at modeling dependencies among features within a point, but are lacking in modeling dependencies among points in a collection. In this talk I discuss these shortcomings and suggest some possible avenues for improvement.
This document describes a bootstrap project analyzing population and sampling distributions using different bootstrap methods. It summarizes the general bootstrap method, bootstrap without replacement (BWO), and mirror-match approaches. Results show the bootstrap sampling distributions mimic the actual distributions and produce accurate estimates of statistics and variances. However, BWO and mirror-match had vastly greater processing times with no statistical advantage over the general bootstrap method for the stratified samples analyzed in this study.
Exploring Simple Siamese Representation Learning (Sungchul Kim)
This document discusses an unsupervised representation learning method called SimSiam. It proposes that SimSiam can be interpreted as an expectation-maximization algorithm that alternates between updating the encoder parameters and assigning representations to images. Key aspects discussed include how the stop-gradient operation prevents collapsed representations, the role of the predictor network, effects of batch size and batch normalization, and alternatives to the cosine similarity measure. Empirical results show that SimSiam learns meaningful representations without collapsing, and the various design choices affect performance but not the ability to prevent collapsed representations.
The document discusses the 2^k factorial design, which is a special case of the general factorial design with k factors at two levels. It provides examples of using 2^k factorial designs to investigate how multiple factors affect a response. For an unreplicated 2^k design, there are challenges in statistical testing due to having zero degrees of freedom for error. Various methods are discussed for analyzing the effects in an unreplicated 2^k design, such as normal probability plotting, Lenth's method, and conditional inference charts. Transformation of the response may also be needed to meet model assumptions such as equal variance.
Generational Layered Canvas Mechanism for Collaborative Web Applications (kata shin)
The document proposes a Generational Layered (GL) canvas mechanism to reduce unnecessary redraws in collaborative web applications using the HTML5 canvas. The GL canvas assigns drawings to dynamically divided layers based on their update frequency, as determined by an "age" parameter. Drawings are promoted to older layers if their age exceeds a threshold. An evaluation showed the GL canvas outperformed an earlier Drawing-Frequency based Layered canvas by automatically optimizing redraws without requiring developers to configure update frequencies.
Deep Implicit Layers: Learning Structured Problems with Neural Networks (Sangwoo Mo)
Deep implicit layers allow neural networks to solve structured problems by following algorithmic rules. They include layers for convex optimization, discrete optimization, differential equations, and more. The forward pass runs an algorithm, while the backward pass computes gradients using algorithmic properties like KKT conditions. This enables problems like structured prediction, meta-learning, and time series modeling to be solved reliably with neural networks by respecting their underlying structure.
This document proposes using hyperbolic space to embed hierarchical tree structures, like those that can represent sequences of events in reinforcement learning problems. Specifically, it suggests a method called S-RYM that applies spectral normalization to regularize gradients when training deep reinforcement learning agents with hyperbolic embeddings. This stabilization technique allows naive hyperbolic embeddings to outperform standard Euclidean embeddings. It works by reducing gradient norm explosions during training, allowing the entropy loss to converge properly. The document provides technical details on spectral normalization, hyperbolic space representations, and how S-RYM trains deep reinforcement learning agents with stabilized hyperbolic embeddings.
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model... (Sangwoo Mo)
Lab seminar introduces Ting Chen's recent 3 works:
- Pix2seq: A Language Modeling Framework for Object Detection (ICLR’22)
- A Unified Sequence Interface for Vision Tasks (NeurIPS’22)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (submitted to ICLR’23)
This is a review of paper #243 from the TensorFlow Korea paper-reading group PR12.
The paper is Designing Network Design Spaces from Facebook AI Research, known as RegNet.
When designing a CNN, are bottleneck layers really beneficial? Do more layers always yield higher performance? When the width and height of the activation map are halved (stride 2 or pooling), the number of channels is doubled; is that really the best choice? Might it be better to have no bottleneck layer at all, is there a magic number of layers for peak performance, and when the activation shrinks by half, could tripling the channels instead of doubling them work better?
Rather than designing a single good neural network, this paper is about designing a good design space, that is, a space populated by good neural networks, which techniques such as AutoML can then search. The authors start from a nearly unconstrained design space and progressively narrow it down to a good design space through a human-in-the-loop process. In the video below you can see which design space RegNet, which outperforms EfficientNet, emerged from, and which of the design choices we took for granted turn out to be questionable.
Video link: https://youtu.be/bnbKQRae_u4
Paper link: https://arxiv.org/abs/2003.13678
Image inpainting works well when you want to edit an image or repair old photographs. It also gives great results on occluded images and is useful for censoring content. Faithful reconstruction is one of its key features.
One of its main and most effective uses is completing images that have been corrupted over time on storage devices such as SSDs, or during data transfer over a transmission line or between two devices such as laptops and cellphones.
Hope you all enjoy it and find it a useful reference.
High-performance graph analysis is unlocking knowledge in computer security, bioinformatics, social networks, and many other data integration areas. Graphs provide a convenient abstraction for many data problems beyond linear algebra. Some problems map directly to linear algebra. Others, like community detection, look eerily similar to sparse linear algebra techniques. And then there are algorithms that strongly resist attempts at making them look like linear algebra. This talk will cover recent results with an emphasis on streaming graph problems where the graph changes and results need updated with minimal latency. We’ll also touch on issues of sensitivity and reliability where graph analysis needs to learn from numerical analysis and linear algebra.
Multiple patterning is a class of technologies for manufacturing integrated circuits (ICs), developed for photolithography to enhance the feature density. The simplest case of multiple patterning is double patterning, where a conventional lithography process is enhanced to produce double the expected number of features. The resolution of a photoresist pattern is believed to blur at around 45 nm half-pitch. For the semiconductor industry, therefore, double patterning was introduced for the 32 nm half-pitch node and below. This presentation gives us an insight of why multiple patterning is an important to give us a better resolution below 32nm.
PR-305: Exploring Simple Siamese Representation Learning (Sungchul Kim)
SimSiam is a self-supervised learning method that uses a Siamese network with stop-gradient to learn representations from unlabeled data. The paper finds that stop-gradient plays an essential role in preventing the model from collapsing to a degenerate solution. Additionally, it is hypothesized that SimSiam implicitly optimizes an Expectation-Maximization-like algorithm that alternates between updating the network parameters and assigning representations to samples in a manner analogous to k-means clustering.
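The symmetric negative-cosine loss with stop-gradient at SimSiam's core can be sketched as plain functions (here on Python lists; in the actual method the `z` arguments would be detached from the computation graph, which is what "stop-gradient" means):

```python
import math

def neg_cosine(p, z):
    # D(p, z) = -<p, z> / (||p|| * ||z||); z plays the stop-gradient role.
    dot = sum(a * b for a, b in zip(p, z))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z))
    return -dot / norm

def simsiam_loss(p1, p2, z1, z2):
    # Symmetrized loss: each predictor output is compared against the
    # *other* view's projection, which receives no gradient.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

The loss is minimized (value -1) when predictor outputs and projections align perfectly; the paper's finding is that without the stop-gradient this objective collapses to a constant representation.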
Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy... (WiMLDSMontreal)
"Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy Data"
By Sergül Aydöre, Assistant Professor at Stevens Institute of Technology
Abstract:
The use of complex models (with many parameters) is challenging with high-dimensional small-sample problems: indeed, they face rapid overfitting. Such situations are common when data collection is expensive, as in neuroscience, biology, or geology. Dedicated regularization can be crafted to tame overfit, typically via structured penalties. But rich penalties require mathematical expertise and entail large computational costs. Stochastic regularizers such as dropout are easier to implement: they prevent overfitting by random perturbations. Used inside a stochastic optimizer, they come with little additional cost. We propose a structured stochastic regularization that relies on feature grouping. Using a fast clustering algorithm, we define a family of groups of features that capture feature covariations. We then randomly select these groups inside a stochastic gradient descent loop. This procedure acts as a structured regularizer for high-dimensional correlated data without additional computational cost, and it has a denoising effect. We demonstrate the performance of our approach for logistic regression both on a sample-limited face image dataset with varying additive noise and on a typical high-dimensional learning problem, brain image classification.
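A minimal sketch of the grouping step, under the assumption that each feature is replaced by the mean of its (randomly selected) cluster before a gradient step; the clustering algorithm and the SGD loop themselves are omitted:

```python
def apply_feature_grouping(x, groups):
    """Replace each feature by its group mean (a denoising projection).
    x: list of feature values; groups: a partition of indices into clusters."""
    out = list(x)
    for g in groups:
        m = sum(x[i] for i in g) / len(g)
        for i in g:
            out[i] = m
    return out

# Two correlated features 1.0 and 3.0 collapse to their mean 2.0:
# apply_feature_grouping([1.0, 3.0, 5.0], [[0, 1], [2]]) -> [2.0, 2.0, 5.0]
```

Resampling the partition at each SGD step is what makes this a stochastic regularizer rather than a fixed preprocessing step.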
Towards Modularity in Live Visual Modeling: A case-study with OpenPonk and Ke... (ESUG)
This document discusses extending live visual modeling platforms like OpenPonk to support modular exploration of complex systems. It proposes a user interface with two views to display model definitions and simulation results simultaneously and update both views incrementally as definitions change. It also involves extending OpenPonk to support new diagram-friendly modular models with elements that can reference other models, and a "bridge" component to update dependent results when models change. The work aims to allow interactive modular composition and simulation of epidemiological models in Kendrick using this extended OpenPonk platform.
Brief History of Visual Representation Learning (Sangwoo Mo)
The document summarizes the history of visual representation learning in 3 eras: (1) 2012-2015 saw the evolution of deep learning architectures like AlexNet and ResNet; (2) 2016-2019 brought diverse learning paradigms for tasks like few-shot learning and self-supervised learning; (3) 2020-present focuses on scaling laws and foundation models through larger models, data and compute as well as self-supervised methods like MAE and multimodal models like CLIP. The field is now exploring how to scale up vision transformers to match natural language models and better combine self-supervision and generative models.
Learning Visual Representations from Uncurated Data (Sangwoo Mo)
Slide about the defense of my Ph.D. dissertation: "Learning Visual Representations from Uncurated Data"
It includes four papers about
- Learning from multi-object images for contrastive learning [1] and Vision Transformer (ViT) [2]
- Learning with limited labels (semi-sup) for image classification [3] and vision-language [4] models
[1] Mo*, Kang* et al. Object-aware Contrastive Learning for Debiased Scene Representation. NeurIPS’21.
[2] Kang*, Mo* et al. OAMixer: Object-aware Mixing Layer for Vision Transformers. CVPRW’22.
[3] Mo et al. RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data. ICLR’23.
[4] Mo et al. S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions. Under Review.
Deep Learning Theory Seminar (Chap 3, part 2) (Sangwoo Mo)
This document summarizes key points from a lecture on deep learning theory:
1) It discusses the Maurey sampling technique, which shows that a finite-sample approximation X̂ of a random variable X converges to X as the number of samples k goes to infinity.
2) It proposes extending this technique to sample finite-width neural networks by converting the weight distribution of an infinite network to a probability measure through normalization.
3) The approximation error between outputs of the infinite and finite networks is bounded using Maurey sampling, with the bound converging to zero as the number of samples increases.
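The rate behind Maurey sampling is the usual variance computation: if X̂_k = (1/k) Σ_{i=1}^{k} X_i averages k i.i.d. copies of X in a Hilbert space, then

```latex
\mathbb{E}\,\bigl\| \hat{X}_k - \mathbb{E}[X] \bigr\|^2
= \frac{1}{k}\left( \mathbb{E}\|X\|^2 - \|\mathbb{E}[X]\|^2 \right)
\le \frac{\mathbb{E}\|X\|^2}{k} \;\longrightarrow\; 0 .
```

Applied to networks, the "random variable" is a neuron drawn from the normalized weight distribution of the infinite-width network, and X̂_k is the finite-width network built from k sampled neurons.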
Deep Learning Theory Seminar (Chap 1-2, part 1) (Sangwoo Mo)
1. The document discusses the approximation capabilities of deep neural networks. It outlines topics that will be covered, including approximation, optimization, and generalization.
2. For approximation, it shows that a neural network can approximate any smooth function over a compact domain to any desired accuracy by bounding the function norm. Specifically, it presents constructive proofs that a univariate function can be approximated by a 2-layer network and a multivariate function by a 3-layer network.
3. The chapter will prove approximation capabilities of finite-width neural networks, including constructive proofs for specific activations and universal approximation for general activations. It will discuss approximating indicators with ReLU activations.
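The indicator approximation mentioned in point 3 can be sketched directly: the difference of two shifted ReLUs forms a steep ramp that tends to the step function 1[x ≥ a] as ε → 0 (a standard construction, written here as illustrative code rather than anything from the lecture itself):

```python
def relu(x):
    return max(x, 0.0)

def soft_indicator(x, a, eps=0.5):
    # (ReLU(x - a + eps) - ReLU(x - a)) / eps: equals 0 for x <= a - eps,
    # 1 for x >= a, and ramps linearly in between; -> 1[x >= a] as eps -> 0.
    return (relu(x - a + eps) - relu(x - a)) / eps
```

Sums of such indicator bumps give piecewise-constant approximations of a univariate function, which is the core of the 2-layer constructive proof.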
1) The document discusses object-region video transformers (ORViT) for video recognition. ORViT applies attention at both the patch and object levels.
2) ORViT considers three aspects of objects: the objects themselves, interactions between objects, and object dynamics over time.
3) Experimental results show ORViT outperforms baseline models on action recognition, compositional action recognition, and spatio-temporal action detection tasks. ORViT better captures object-level information and dynamics compared to patch-level attention alone.
Learning Theory 101 ...and Towards Learning the Flat Minima (Sangwoo Mo)
The document discusses recent theories on why deep neural networks generalize well despite being highly overparameterized. Classic learning theory, which assumes restricting the hypothesis space is necessary for generalization, fails to explain modern neural networks. Recent studies suggest neural networks generalize because 1) their complexity is underestimated and 2) SGD regularization finds flat minima. Sharpness-aware minimization (SAM) directly optimizes for flat minima and consistently improves generalization, especially for vision transformers which have sharper loss landscapes than ResNets. SAM produces more interpretable attention maps and significantly boosts performance of vision transformers and MLP-Mixers on in-domain and out-of-domain tasks.
Lab seminar on
- Sharpness-Aware Minimization for Efficiently Improving Generalization (ICLR 2021)
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations (under review)
This document summarizes research on reducing the computational complexity of self-attention in Transformer models from O(L²) to O(L log L) or O(L). It describes the Reformer model, which uses locality-sensitive hashing to achieve O(L log L) complexity; the Linformer model, which uses low-rank approximations and random projections to achieve O(L) complexity; and the Synthesizer model, which replaces self-attention with dense or random attention. It also briefly discusses the expressive power of sparse Transformer models.
This document summarizes two meta-learning papers:
1) "Meta-Learning with Implicit Gradients" which introduces Implicit Model-Agnostic Meta-Learning (iMAML), an efficient alternative to MAML that computes meta-gradients without differentiating through the inner loop.
2) "Modular Meta-Learning with Shrinkage" which proposes learning a separate set of parameters for each module with different levels of shrinkage, optimized in an alternating manner to avoid collapse.
Deep Learning for Natural Language Processing (Sangwoo Mo)
This document summarizes a lecture on recent advances in deep learning for natural language processing. It discusses improvements to network architectures like attention mechanisms and self-attention, which help models learn long-term dependencies and attend to relevant parts of the input. It also discusses improved training methods to reduce exposure bias and the loss-evaluation mismatch. Newer models presented include the Transformer, which uses only self-attention, and BERT, which introduces a pretrained bidirectional transformer encoder that achieves state-of-the-art results on many NLP tasks.
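Self-attention, the mechanism underlying both the Transformer and BERT, reduces to a few lines when written for plain Python lists; this is a didactic sketch of scaled dot-product attention, not a production implementation:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    where Q, K, V are lists of d-dimensional row vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Each output row is a convex combination of the value rows, weighted by how well the query matches each key; multi-head attention simply runs several of these in parallel on learned projections.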
Improved Training of Wasserstein GANs (WGAN-GP) (Sangwoo Mo)
This document summarizes improved training methods for Wasserstein GANs (WGANs). It begins with an overview of GANs and their limitations, such as gradient vanishing. It then introduces WGANs, which use the Wasserstein distance instead of Jensen-Shannon divergence to provide more meaningful gradients during training. However, weight clipping used in WGANs limits the function space and can cause optimization difficulties. The document proposes using gradient penalty instead of weight clipping to enforce a Lipschitz constraint. It also suggests sampling from an estimated optimal coupling rather than independently sampling real and generated samples to better match theory. Experimental results show the gradient penalty approach improves stability and performance of WGANs on image generation tasks.
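The gradient-penalty term can be sketched as a standalone function over precomputed critic gradient norms; the interpolation between real and fake samples and the autograd call are omitted, and λ = 10 follows the paper's default:

```python
def gradient_penalty(grad_norms, lam=10.0):
    # Penalize deviation of the critic's gradient norm from 1 (a soft
    # Lipschitz-1 constraint), averaged over interpolated samples.
    return lam * sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)
```

In the full method these norms are gradients of the critic output with respect to points sampled uniformly along lines between real and generated samples, and the penalty is added to the critic loss in place of weight clipping.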
Recursive neural networks (RNNs) were developed to model recursive structures like images, sentences, and phrases. RNNs construct feature representations recursively from components. Later models like recursive autoencoders (RAEs), matrix-vector RNNs (MV-RNNs), and recursive neural tensor networks (RNTNs) improved on RNNs by handling unlabeled data, incorporating different composition rules, and reducing parameters. These recursive models achieved strong performance on tasks like image segmentation, sentiment analysis, and paraphrase detection.
Emergence of Invariance and Disentangling in Deep RepresentationsSangwoo Mo
This document summarizes a paper on the emergence of invariance and disentangling in deep representations. The key points are:
1. The paper investigates the relationship between properties desired for representations such as sufficiency, invariance, and disentangling. It shows that under certain model assumptions, minimal sufficiency alone is sufficient to achieve invariance and disentangling.
2. The paper proposes using weight information as a measure of network complexity that can help explain generalization. Weight information is shown to be implicitly minimized by SGD.
3. The paper addresses claims that a new theory of generalization is needed for deep learning by showing weight information recovers the bias-variance tradeoff even when a
Reinforcement Learning with Deep Energy-Based PoliciesSangwoo Mo
This document discusses reinforcement learning with deep energy-based policies. It motivates using maximum entropy reinforcement learning to find policies that not only maximize reward but also explore possibilities. It presents an approach using energy-based models for the policy and soft Q-learning to find the optimal maximum entropy policy. The method uses neural networks to approximate the soft Q-function and a sampling network to draw samples from the policy. Experiments show maximum entropy policies provide better exploration, initialization, compositionality and robustness compared to deterministic policies.
Transcript: Details of description part II: Describing images in practice - T...BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
What's Next Web Development Trends to Watch.pdfSeasiaInfotech2
Explore the latest advancements and upcoming innovations in web development with our guide to the trends shaping the future of digital experiences. Read our article today for more information.
Quality Patents: Patents That Stand the Test of TimeAurora Consulting
Is your patent a vanity piece of paper for your office wall? Or is it a reliable, defendable, assertable, property right? The difference is often quality.
Is your patent simply a transactional cost and a large pile of legal bills for your startup? Or is it a leverageable asset worthy of attracting precious investment dollars, worth its cost in multiples of valuation? The difference is often quality.
Is your patent application only good enough to get through the examination process? Or has it been crafted to stand the tests of time and varied audiences if you later need to assert that document against an infringer, find yourself litigating with it in an Article 3 Court at the hands of a judge and jury, God forbid, end up having to defend its validity at the PTAB, or even needing to use it to block pirated imports at the International Trade Commission? The difference is often quality.
Quality will be our focus for a good chunk of the remainder of this season. What goes into a quality patent, and where possible, how do you get it without breaking the bank?
** Episode Overview **
In this first episode of our quality series, Kristen Hansen and the panel discuss:
⦿ What do we mean when we say patent quality?
⦿ Why is patent quality important?
⦿ How to balance quality and budget
⦿ The importance of searching, continuations, and draftsperson domain expertise
⦿ Very practical tips, tricks, examples, and Kristen’s Musts for drafting quality applications
https://www.aurorapatents.com/patently-strategic-podcast.html
Are you interested in learning about creating an attractive website? Here it is! Take part in the challenge that will broaden your knowledge about creating cool websites! Don't miss this opportunity, only in "Redesign Challenge"!
Data Protection in a Connected World: Sovereignty and Cyber Securityanupriti
Delve into the critical intersection of data sovereignty and cyber security in this presentation. Explore unconventional cyber threat vectors and strategies to safeguard data integrity and sovereignty in an increasingly interconnected world. Gain insights into emerging threats and proactive defense measures essential for modern digital ecosystems.
Blockchain and Cyber Defense Strategies in new genre timesanupriti
Explore robust defense strategies at the intersection of blockchain technology and cybersecurity. This presentation delves into proactive measures and innovative approaches to safeguarding blockchain networks against evolving cyber threats. Discover how secure blockchain implementations can enhance resilience, protect data integrity, and ensure trust in digital transactions. Gain insights into cutting-edge security protocols and best practices essential for mitigating risks in the blockchain ecosystem.
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
Implementations of Fused Deposition Modeling in real worldEmerging Tech
The presentation showcases the diverse real-world applications of Fused Deposition Modeling (FDM) across multiple industries:
1. **Manufacturing**: FDM is utilized in manufacturing for rapid prototyping, creating custom tools and fixtures, and producing functional end-use parts. Companies leverage its cost-effectiveness and flexibility to streamline production processes.
2. **Medical**: In the medical field, FDM is used to create patient-specific anatomical models, surgical guides, and prosthetics. Its ability to produce precise and biocompatible parts supports advancements in personalized healthcare solutions.
3. **Education**: FDM plays a crucial role in education by enabling students to learn about design and engineering through hands-on 3D printing projects. It promotes innovation and practical skill development in STEM disciplines.
4. **Science**: Researchers use FDM to prototype equipment for scientific experiments, build custom laboratory tools, and create models for visualization and testing purposes. It facilitates rapid iteration and customization in scientific endeavors.
5. **Automotive**: Automotive manufacturers employ FDM for prototyping vehicle components, tooling for assembly lines, and customized parts. It speeds up the design validation process and enhances efficiency in automotive engineering.
6. **Consumer Electronics**: FDM is utilized in consumer electronics for designing and prototyping product enclosures, casings, and internal components. It enables rapid iteration and customization to meet evolving consumer demands.
7. **Robotics**: Robotics engineers leverage FDM to prototype robot parts, create lightweight and durable components, and customize robot designs for specific applications. It supports innovation and optimization in robotic systems.
8. **Aerospace**: In aerospace, FDM is used to manufacture lightweight parts, complex geometries, and prototypes of aircraft components. It contributes to cost reduction, faster production cycles, and weight savings in aerospace engineering.
9. **Architecture**: Architects utilize FDM for creating detailed architectural models, prototypes of building components, and intricate designs. It aids in visualizing concepts, testing structural integrity, and communicating design ideas effectively.
Each industry example demonstrates how FDM enhances innovation, accelerates product development, and addresses specific challenges through advanced manufacturing capabilities.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Erasmo Purificato
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Interaction Latency: Square's User-Centric Mobile Performance MetricScyllaDB
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run).
However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion).
At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead?
This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired from the Web Vital metric Interaction to Next Paint"" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric and what thresholds you should target.
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datams and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
How RPA Help in the Transportation and Logistics Industry.pptxSynapseIndia
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
What Not to Document and Why_ (North Bay Python 2024)Margaret Fero
We’re hopefully all on board with writing documentation for our projects. However, especially with the rise of supply-chain attacks, there are some aspects of our projects that we really shouldn’t document, and should instead remediate as vulnerabilities. If we do document these aspects of a project, it may help someone compromise the project itself or our users. In this talk, you will learn why some aspects of documentation may help attackers more than users, how to recognize those aspects in your own projects, and what to do when you encounter such an issue.
These are slides as presented at North Bay Python 2024, with one minor modification to add the URL of a tweet screenshotted in the presentation.
2. • Diffusion models are SOTA on image generation
• Beat BigGAN and StyleGAN on high-resolution images
Diffusion Model Boom!
Dhariwal & Nichol. Diffusion Models Beat GANs on Image Synthesis. NeurIPS’21
3. • Diffusion models are SOTA on density estimation
• Beat autoregressive models on likelihood scores
Diffusion Model Boom!
Song et al. Maximum Likelihood Training of Score-Based Diffusion Models. NeurIPS’21
Kingma et al. Variational Diffusion Models. NeurIPS’21
4. • Diffusion models are useful for image editing
• Editing = Rough scribble + diffusion (i.e., naturalization)
• Scribbled images are unseen for GANs, but diffusion models can still denoise them
Diffusion Model Boom!
Meng et al. SDEdit: Image Synthesis and Editing with Stochastic Differential Equations. arXiv’21
5. • Diffusion models are useful for image editing
• Can also be combined with vision-and-language models
Diffusion Model Boom!
Nichol et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv’21
6. • Diffusion models are also effective for non-visual domains
• Continuous domains like speech, and even discrete domains like text
Diffusion Model Boom!
Kong et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis. ICLR’21
Austin et al. Structured Denoising Diffusion Models in Discrete State-Spaces. NeurIPS’21
7. • Trilemma of generative models: Quality vs. Diversity vs. Speed
• Diffusion models produce diverse and high-quality samples, but generation is slow
Diffusion Model is All We Need?
Xiao et al. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs. arXiv’21
8. • Today’s content
• Diffusion Probabilistic Model – ICML’15
• Denoising Diffusion Probabilistic Model (DDPM) – NeurIPS’20
• Improves quality & diversity of diffusion models
• Denoising Diffusion Implicit Model (DDIM) – ICLR’21
• Improves generation speed of diffusion models
• Not covering
• Relation of diffusion models and score matching
• Extension to stochastic differential equations → See Score SDE (ICLR’21)
• There are lots of new interesting works (see NeurIPS’21, ICLR’22)
Outline
Score SDE: Song et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR’21
9. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
→ The sample x_0 converges to complete noise x_T (e.g., ∼ 𝒩(0, 𝐼))
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
Forward (diffusion) process
10. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
→ The sample x_0 converges to complete noise x_T (e.g., ∼ 𝒩(0, 𝐼))
• Reverse step: Recover the original sample from the noise
→ Note that this is the “generation” procedure
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
Reverse process
Forward (diffusion) process
11. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
→ Technically, it is a product of conditional noise distributions q(x_t | x_{t−1})
• Usually, the parameters β_t are fixed (one can learn them jointly, but it is not beneficial)
• Noise annealing (i.e., reducing the noise scale, β_t < β_{t−1}) is crucial to the performance
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
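The iterative forward process can be sketched in a few lines. The linear β schedule below is an assumption (the common DDPM-style default, not specified on the slide); note that it increases β_t with t in the forward direction, so viewed from the generation (reverse) direction the noise scale anneals downward.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise scales beta_1, ..., beta_T (assumed DDPM-style linear schedule)."""
    return np.linspace(beta_start, beta_end, T)

def forward_process(x0, betas, rng):
    """Iteratively apply q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    x = x0
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
x0 = np.ones(10_000)  # a toy "sample" of constant values
xT = forward_process(x0, linear_beta_schedule(1000), rng)
# After T = 1000 steps, x_T is statistically close to N(0, I).
```

With this schedule the signal coefficient shrinks to roughly e^{-5}, so almost no information about x_0 survives in x_T, matching the "complete noise" claim above.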
12. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
→ Technically, it is a product of conditional noise distributions q(x_t | x_{t−1})
• Reverse step: Recover the original sample from the noise
→ It is also a product of conditional (de)noise distributions p_θ(x_{t−1} | x_t)
• Uses the learned parameters: denoiser μ_θ (main part) and randomness Σ_θ
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
13. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
• Reverse step: Recover the original sample from the noise
• Training: Minimize the variational bound on the negative log-likelihood of the model p_θ(x_0)
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
14. • Diffusion model aims to learn the reverse of the noise generation procedure
• Forward step: (Iteratively) add noise to the original sample
• Reverse step: Recover the original sample from the noise
• Training: Minimize the variational bound on the negative log-likelihood of the model p_θ(x_0)
→ It can be decomposed into step-wise losses (one for each step t)
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
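The step-wise decomposition is the standard one, sketched here for reference (notation follows the DDPM paper cited below):

```latex
L = \mathbb{E}_q\Big[
    \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_T \mid \mathbf{x}_0)\,\|\,p(\mathbf{x}_T)\big)}_{L_T}
  + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)\,\|\,p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)\big)}_{L_{t-1}}
  - \underbrace{\log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1)}_{L_0}
\Big]
```

Each L_{t−1} term compares the learned reverse step against the true reverse step at one noise level, which is what makes per-step training possible.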
15. • Diffusion model aims to learn the reverse of the noise generation procedure
• Training: Minimize the variational bound on the negative log-likelihood of the model p_θ(x_0)
→ It can be decomposed into step-wise losses (one for each step t)
• Here, the true reverse step q(x_{t−1} | x_t, x_0) can be computed in closed form from β_t
• Note that we only define the true forward step q(x_t | x_{t−1})
• Since all distributions above are Gaussian, the KL divergences are tractable
Diffusion Probabilistic Model
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
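The closed form mentioned above is standard (with α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s, as in the DDPM paper):

```latex
q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)
  = \mathcal{N}\big(\mathbf{x}_{t-1};\ \tilde{\mu}_t(\mathbf{x}_t, \mathbf{x}_0),\ \tilde{\beta}_t \mathbf{I}\big),
\qquad
\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\mathbf{x}_0
  + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,\mathbf{x}_t,
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
```

Both the mean and variance depend only on the fixed β schedule, which is why no extra parameters are needed for the true reverse step.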
16. • Diffusion model aims to learn the reverse of the noise generation procedure
• Network: Use image-to-image translation (e.g., U-Net) architectures
• Recall that the input is x_t and the output is x_{t−1}; both are images
• It is expensive since both input and output are high-dimensional
• Note that the denoiser μ_θ(x_t, t) shares weights across steps, but is conditioned on the step t
Diffusion Probabilistic Model
* Image from the pix2pix-HD paper
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
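One common way to condition the shared denoiser on the step t is a sinusoidal embedding; the concrete scheme below follows Transformer-style positional encodings and is an illustrative assumption, not something the slide specifies.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Map an integer step t to a dim-sized vector the denoiser can consume."""
    half = dim // 2
    # Geometrically spaced frequencies, as in Transformer positional encodings.
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = timestep_embedding(50, 128)  # would be fed to the U-Net alongside x_t
```

Because the embedding is a smooth function of t, a single weight-shared network can behave differently at each noise level without needing T separate models.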
17. • Diffusion model aims to learn the reverse of the noise generation procedure
• Sampling: Draw a random noise x_T, then apply the reverse step p_θ(x_{t−1} | x_t)
• It often requires hundreds of reverse steps (very slow)
• Early and late steps change the high- and low-level attributes, respectively
Diffusion Probabilistic Model
* Image from the DDPM paper
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. ICML’15
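The sampling procedure can be sketched as below; mu_theta stands in for the learned denoiser μ_θ and sigmas for the per-step noise scales, both hypothetical placeholders here since the slides do not fix concrete ones.

```python
import numpy as np

def sample(mu_theta, sigmas, shape, rng):
    """Draw x_T ~ N(0, I), then apply the reverse step p_theta(x_{t-1} | x_t)."""
    T = len(sigmas)
    x = rng.standard_normal(shape)  # x_T: pure noise
    for t in range(T, 0, -1):
        noise = rng.standard_normal(shape) if t > 1 else 0.0  # no noise at the final step
        x = mu_theta(x, t) + sigmas[t - 1] * noise
    return x

rng = np.random.default_rng(0)
# Hypothetical toy denoiser that just shrinks toward zero, with no added noise:
x = sample(lambda x, t: 0.5 * x, sigmas=np.zeros(50), shape=(4,), rng=rng)
```

The loop structure makes the cost obvious: one full network evaluation per step, hence hundreds of evaluations per sample.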
18. • DDPM reparametrizes the reverse distributions of diffusion models
• Key idea: The original reverse step fully creates the denoised mean μ_θ(x_t, t) from x_t
• However, x_{t−1} and x_t share most information, so this is redundant
→ Instead, predict the residual ε_θ(x_t, t) and combine it with the original x_t
Denoising Diffusion Probabilistic Model (DDPM)
Ho et al. Denoising Diffusion Probabilistic Models. NeurIPS'20
19. • DDPM reparametrizes the reverse distributions of diffusion models
• Key idea: The original reverse step fully creates the denoised mean μ_θ(x_t, t) from x_t
• However, x_{t−1} and x_t share most information, so this is redundant
→ Instead, predict the residual ε_θ(x_t, t) and combine it with the original x_t
• Formally, DDPM reparametrizes the learned reverse distribution as1
μ_θ(x_t, t) = (1/√α_t) (x_t − (β_t/√(1 − ᾱ_t)) ε_θ(x_t, t))
and the step-wise objective L_{t−1} can be reformulated as2
L_{t−1} ∝ E_{x_0, ε} [ ‖ε − ε_θ(√ᾱ_t x_0 + √(1 − ᾱ_t) ε, t)‖² ]
Denoising Diffusion Probabilistic Model (DDPM)
1. α_t and ᾱ_t are constants determined by β_t
2. Note that we need no “intermediate” samples; we only compare the forward noise ε and the reverse noise ε_θ conditioned on x_0
Ho et al. Denoising Diffusion Probabilistic Models. NeurIPS'20
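The reparametrized objective can be sketched as follows; eps_theta stands in for the noise-prediction network, and the zero predictor in the usage line is only a placeholder, not a real model.

```python
import numpy as np

def ddpm_loss(eps_theta, x0, betas, rng):
    """Simplified DDPM objective: || eps - eps_theta(x_t, t) ||^2, where
    x_t = sqrt(alpha_bar_t) x0 + sqrt(1 - alpha_bar_t) eps requires no
    intermediate samples x_1, ..., x_{t-1}."""
    alpha_bar = np.cumprod(1.0 - betas)    # alpha_bar_t = prod_{s<=t} (1 - beta_s)
    t = rng.integers(1, len(betas) + 1)    # uniform step t in {1, ..., T}
    eps = rng.standard_normal(x0.shape)    # forward noise
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return np.mean((eps - eps_theta(x_t, t)) ** 2)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
loss = ddpm_loss(lambda x, t: np.zeros_like(x), np.ones(16), betas, rng)
```

Note how each training step touches only x_0, a sampled t, and a single ε: this is exactly the "no intermediate samples" property of the reparametrization.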
20. • DDPM initiated the diffusion model boom
• Achieved SOTA on CIFAR-10, with high-resolution scalability
• Produces more diverse samples than GANs (no mode collapse)
Denoising Diffusion Probabilistic Model (DDPM)
Ho et al. Denoising Diffusion Probabilistic Models. NeurIPS'20
21. • DDIM roughly sketches the final sample, then refines it with the reverse process
• Motivation:
• Diffusion models are slow due to the iterative procedure
• GAN/VAE creates the sample in a single forward pass
• ⇒ Can we combine their advantages for fast sampling of diffusion models?
• Technical spoiler:
• Instead of naïvely stacking a diffusion model on top of a GAN/VAE,
DDIM proposes a principled approach of rough sketch + refinement
Denoising Diffusion Implicit Model (DDIM)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
22. • DDIM roughly sketches the final sample, then refines it with the reverse process
• Key idea:
• Given x_t, generate the rough sketch x_0, then refine with p_θ(x_{t−1} | x_t, x_0)1
• Unlike the original diffusion model, the structure is not Markovian
Denoising Diffusion Implicit Model (DDIM)
1. Recall that the original diffusion model uses p_θ(x_{t−1} | x_t)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
23. • DDIM roughly sketches the final sample, then refines it with the reverse process
• Key idea: Given x_t, generate the rough sketch x_0, then refine with q(x_{t−1} | x_t, x_0)
• Formulation: Define the reverse-conditional distribution q(x_{t−1} | x_t, x_0);
the forward process is then derived from Bayes’ rule
Denoising Diffusion Implicit Model (DDIM)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
24. • DDIM roughly sketches the final sample, then refines it with the reverse process
• Key idea: Given x_t, generate the rough sketch x_0, then refine with q(x_{t−1} | x_t, x_0)
• Formulation: The forward process follows from the defined q(x_{t−1} | x_t, x_0),
and the reverse process replaces x_0 with its prediction from the noise estimate ε_θ(x_t, t)
Denoising Diffusion Implicit Model (DDIM)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
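The deterministic special case (η = 0) of the update above reduces to a simple "sketch, then refine" rule; alpha_bar denotes the cumulative product of (1 − β), as in DDPM, and eps is the predicted noise.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0): predict the rough sketch x0_hat
    from x_t and the predicted noise eps, then re-noise it to level t-1."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps

# Sanity check: with the true forward noise and alpha_bar_prev = 1,
# a single step recovers x_0 exactly.
x0 = np.array([1.0, -2.0])
eps = np.array([0.5, 0.25])
x_t = np.sqrt(0.3) * x0 + np.sqrt(0.7) * eps
x_rec = ddim_step(x_t, eps, alpha_bar_t=0.3, alpha_bar_prev=1.0)
```

Because nothing in the update requires adjacent steps, alpha_bar_prev can belong to a step many levels earlier, which is what lets DDIM skip most of the trajectory.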
25. • DDIM roughly sketches the final sample, then refines it with the reverse process
• Key idea: Given x_t, generate the rough sketch x_0, then refine with q(x_{t−1} | x_t, x_0)
• Training: The variational lower bound of DDIM is identical to that of DDPM1
• This is surprising, since the forward/reverse formulation is totally different
Denoising Diffusion Implicit Model (DDIM)
1. Precisely, the bound is different, but the solution is identical under some assumptions (though violated in practice)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
26. • DDIM significantly reduces the sampling steps of diffusion models
• Creates the outline of the sample after only 10 steps (DDPM needs hundreds)
Denoising Diffusion Implicit Model (DDIM)
Song et al. Denoising Diffusion Implicit Models. ICLR’21
27. • New golden era of generative models
• Competition of various approaches: GAN, VAE, flow, diffusion model1
• Also, lots of hybrid approaches (e.g., score SDE = diffusion + continuous flow)
• Which model to use?
• Diffusion models seem to be a nice option for high-quality generation
• However, GANs are (currently) still the more practical solution when fast sampling is needed (e.g., real-time apps)
Take-home Message
1. VAE also shows promising generation performance (see NVAE, very deep VAE)