Image segmentation refers to partitioning a digital image into multiple regions or sets of pixels based on characteristics like color or texture. The goal is to simplify the image representation to make it easier to analyze. Some applications in medical imaging include locating tumors, measuring tissue volumes, and computer-guided surgery. Common segmentation techniques include thresholding, edge detection, region growing, and split-and-merge approaches.
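As an illustration of the simplest of these techniques, here is a minimal sketch of Otsu thresholding (not from the summarized document; `img` is a synthetic two-population image used only for demonstration):

```python
import numpy as np

def otsu_threshold(img):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = img.size
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = hist[:t].sum() / total          # background weight
        w1 = 1.0 - w0                        # foreground weight
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * hist[:t]).sum() / hist[:t].sum()       # background mean
        m1 = (np.arange(t, 256) * hist[t:]).sum() / hist[t:].sum()  # foreground mean
        var = w0 * w1 * (m0 - m1) ** 2       # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Two well-separated intensity populations: the threshold falls between them.
img = np.concatenate([np.full(500, 50), np.full(500, 200)]).astype(np.uint8)
t = otsu_threshold(img)
mask = img >= t   # binary segmentation mask
```

Edge detection, region growing, and split-and-merge follow the same pattern of turning an intensity criterion into a pixel partition, but with spatial rather than purely histogram-based rules.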
Lec4: Pre-Processing Medical Images (II), Ulaş Bağcı
2017 Spring, UCF Medical Image Computing. CAVA: Computer Aided Visualization and Analysis • CAD: Computer Aided Diagnosis • Definitions and Terminologies • Coordinate Systems • Pre-Processing Images (Volume of Interest, Region of Interest, Intensity of Interest, Image Enhancement) • Filtering • Smoothing • Introduction to Medical Image Computing and Toolkits • Image Filtering, Enhancement, Noise Reduction, and Signal Processing • Medical Image Registration • Medical Image Segmentation • Medical Image Visualization • Machine Learning in Medical Imaging • Shape Modeling/Analysis of Medical Images • Deep Learning in Radiology
Conditional Image Generation with PixelCNN Decoders, suga93
The document summarizes research on conditional image generation using PixelCNN decoders. It discusses how PixelCNNs sequentially predict pixel values rather than the whole image at once. Previous work used PixelRNNs, but these were slow to train. The proposed approach uses a Gated PixelCNN that removes blind spots in the receptive field by combining horizontal and vertical feature maps. It also conditions PixelCNN layers on class labels or embeddings to generate conditional images. Experimental results show the Gated PixelCNN outperforms PixelCNN and achieves performance close to PixelRNN on CIFAR-10 and ImageNet, while training faster. It can also generate portraits conditioned on embeddings of people.
Lec12: Shape Models and Medical Image Segmentation, Ulaş Bağcı
Shape Modeling
– M-reps
– Active Shape Models (ASM)
– Oriented Active Shape Models (OASM)
– Application in anatomy recognition and segmentation
– Comparison of ASM and OASM
Active Contour (Snake) • Level Set • Applications • Image Filtering, Enhancement, Noise Reduction, and Signal Processing • Medical Image Registration • Medical Image Segmentation • Medical Image Visualization • Machine Learning in Medical Imaging • Shape Modeling/Analysis of Medical Images • Deep Learning in Radiology • Fuzzy Connectivity (FC) – Affinity functions • Absolute FC • Relative FC (and Iterative Relative FC) • Successful example applications of FC in medical imaging • Segmentation of Airway and Airway Walls using an RFC-based method • Energy functional – Data and Smoothness terms • Graph Cut – Min Cut / Max Flow • Applications in Radiology Images
Tutorial on Generalization in Neural Fields (CVPR 2022 Tutorial on Neural Fields in Computer Vision), Vincent Sitzmann
Slides for the "generalization" session of our CVPR 2022 tutorial on Neural Fields in Computer Vision.
Neural Fields are an emerging technique to parameterize signals that live in spatial coordinates plus time. They parameterize a signal as a continuous function that maps a space-time coordinate to whatever is at that spacetime coordinate - for instance, the geometry of a 3D scene could be encoded in a function that maps a 3D coordinate to whether that coordinate is occupied or not. A neural field parameterizes that function as a neural network.
In this session, I gave a high-level overview over how we may use neural fields as the output of a variety of inference algorithms, for instance to reconstruct a complete 3D shape from partial observations in the form of a pointcloud, or to reconstruct a 3D scene from only a single image.
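The coordinate-to-signal mapping described above can be sketched as a tiny MLP with random weights (an illustration of the parameterization only, not code from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
# A tiny random-weight MLP mapping a 3-D coordinate to an occupancy logit.
W1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)), np.zeros(1)

def field(xyz):
    """Continuous field: (N, 3) space coordinates -> (N, 1) occupancy logits."""
    h = np.maximum(xyz @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2

# Any point in continuous space can be queried; no grid or voxelization needed.
coords = rng.uniform(-1, 1, size=(5, 3))
logits = field(coords)
```

Because the signal lives in the network weights rather than a discrete grid, the representation's resolution is limited only by the network's capacity.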
You are free to use the slides for any purpose, as long as you keep a note on the slides that acknowledges their source.
Neural Fields database: https://neuralfields.cs.brown.edu/
Tutorial website: https://neuralfields.cs.brown.edu/cvpr22
The document discusses super resolution imaging techniques. Super resolution aims to enhance image resolution and clarity by processing multiple low resolution images to generate a single high resolution output image. It combines non-repetitive information from multiple low resolution images. Common approaches involve image registration, interpolation using techniques like nearest neighbor, bilinear, and bicubic interpolation, followed by noise removal. The techniques can help obtain higher resolution images for applications like satellite imaging, medical imaging, and surveillance.
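Bilinear interpolation, one of the techniques listed, can be sketched in a few lines for a single-channel image (a minimal illustration, not the document's implementation):

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Upscale a 2-D array by an integer `factor` using bilinear interpolation."""
    h, w = img.shape
    H, W = h * factor, w * factor
    # Map each output pixel back to fractional source coordinates.
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]            # vertical blend weights
    wx = (xs - x0)[None, :]            # horizontal blend weights
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

low = np.array([[0.0, 1.0], [1.0, 0.0]])
high = bilinear_upscale(low, 2)   # 4x4 output; corner values are preserved
```

Nearest-neighbor replaces the blend with a rounding step, and bicubic widens the neighborhood to 4x4 source pixels with cubic weights.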
Three key points about structure from motion:
1. Given multiple images of 3D points, structure from motion aims to estimate the 3D structure and camera motion from 2D point correspondences across images.
2. For affine cameras, factorization methods can be used to decompose the measurement matrix and obtain the motion and structure matrices up to an affine ambiguity.
3. For projective cameras, an iterative procedure alternates between factorization to estimate motion/structure and re-solving for depths to handle the projective ambiguity. At least 7 point correspondences are needed for a two-camera case.
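The affine factorization in point 2 can be sketched with an SVD in the style of Tomasi-Kanade (synthetic data; the metric-upgrade step that resolves the affine ambiguity is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
P, F = 10, 4                         # number of 3-D points and frames
X = rng.normal(size=(3, P))          # true 3-D structure

# Build a 2F x P measurement matrix from random affine cameras.
W = np.vstack([rng.normal(size=(2, 3)) @ X for _ in range(F)])
W -= W.mean(axis=1, keepdims=True)   # center each row (removes translation)

# The centered measurement matrix has rank 3; factor it with an SVD
# into motion (2F x 3) and structure (3 x P), up to an affine ambiguity.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
M = U[:, :3] * s[:3]                 # motion (stacked camera rows)
S = Vt[:3]                           # structure (3-D points)
```

The rank-3 property is exactly what makes the factorization well-posed: the fourth and later singular values of the centered measurement matrix vanish (up to noise).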
In this project, we propose methods for semantic segmentation using state-of-the-art deep learning models. Moreover, we filter the segmentation down to the objects relevant to a given application: instead of spending effort on unnecessary objects, we focus on the ones of interest, making the approach more specialized and efficient for its purpose. Furthermore, we leverage models suited to face segmentation; the models used in this project are Mask R-CNN and DeepLabv3. The experimental results indicate that the illustrated approach is efficient and robust in the segmentation task compared to previous work in the field. The models reach mean Intersection over Union (mIoU) scores of 74.4 and 86.6, respectively. Visual results of the models are shown in the Appendix.
Lec5: Pre-Processing Medical Images (III) (MRI Intensity Standardization), Ulaş Bağcı
2017 Spring, UCF Medical Image Computing
PR-132: SSD: Single Shot MultiBox Detector, Jinwon Lee
SSD is a single-shot object detector that processes the entire image at once, rather than proposing regions of interest. It uses a base VGG16 network with additional convolutional layers to predict bounding boxes and class probabilities at multiple feature-map scales simultaneously. SSD achieves state-of-the-art accuracy while running significantly faster than two-stage detectors like Faster R-CNN. It introduces techniques like default boxes, hard negative mining, and data augmentation to address class imbalance and improve results on small objects. On PASCAL VOC 2007, SSD detects objects at 59 FPS with 74.3% mAP, comparable to Faster R-CNN but much faster.
This document discusses different techniques for image segmentation. It begins by defining image segmentation as dividing an image into regions based on similarity and differences between adjacent regions. The main approaches discussed are discontinuity-based segmentation, which looks for sudden changes in pixel intensity (edges), and similarity-based segmentation, which groups similar pixels into regions. The document then examines various methods for detecting edges, linking edges, thresholding, and region-based segmentation using techniques like region growing and splitting/merging.
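Region growing, one of the similarity-based methods described, can be sketched as a breadth-first search from a seed pixel, adding 4-connected neighbors whose intensity is close to the seed's (an illustrative sketch, not the document's code):

```python
from collections import deque
import numpy as np

def region_grow(img, seed, tol):
    """Grow a region from `seed`, adding 4-connected pixels whose
    intensity is within `tol` of the seed intensity."""
    h, w = img.shape
    ref = img[seed]
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(int(img[ny, nx]) - int(ref)) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

img = np.array([[10, 12, 90],
                [11, 13, 95],
                [88, 92, 99]], dtype=np.uint8)
mask = region_grow(img, (0, 0), tol=5)   # grows over the dark top-left patch
```

Splitting/merging works in the opposite direction: it recursively subdivides regions that fail a homogeneity test, then merges adjacent regions that pass it.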
Object Detection and Instance Segmentation, Hichem Felouat
The document discusses object detection and instance segmentation models like YOLOv5, Faster R-CNN, EfficientDet, Mask R-CNN, and TensorFlow's object detection API. It provides information on labeling images with bounding boxes for training these models, including open-source and commercial annotation tools. The document also covers evaluating object detection models using metrics like mean average precision (mAP) and intersection over union (IoU). It includes an example of training YOLOv5 on a custom dataset.
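The IoU metric mentioned above has a short closed form for axis-aligned boxes; a minimal sketch (boxes given as hypothetical `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

score = iou((0, 0, 10, 10), (5, 5, 15, 15))   # intersection 25, union 175
```

mAP builds on this: a predicted box counts as a true positive only when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), and average precision is computed over the resulting precision-recall curve per class.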
This document discusses Motaz El Saban's research experience and interests which focus on analyzing, modeling, learning from, and predicting digital media content such as text, images, and speech. Some key areas of research include real-time video stitching, annotating mobile videos, object and activity recognition from videos, and facial expression recognition using deep learning techniques. The document also outlines El Saban's educational background and provides an agenda for his upcoming presentation.
Image Segmentation Using Deep Learning: A Survey, NUPUR YADAV
1. The document discusses various deep learning models for image segmentation, including fully convolutional networks, encoder-decoder models, multi-scale pyramid networks, and dilated convolutional models.
2. It provides details on popular architectures like U-Net, SegNet, and models from the DeepLab family.
3. The document also reviews datasets commonly used to evaluate image segmentation methods and reports accuracies of different models on the Cityscapes dataset.
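The mean IoU metric used on Cityscapes (and most segmentation benchmarks) can be computed from a confusion matrix of per-pixel labels; a small sketch with synthetic labels:

```python
import numpy as np

def mean_iou(y_true, y_pred, n_classes):
    """Mean IoU over classes, computed from a per-pixel confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)        # accumulate (true, pred) counts
    inter = np.diag(cm)                       # per-class intersection
    union = cm.sum(0) + cm.sum(1) - inter     # per-class union
    return (inter / np.maximum(union, 1)).mean()

y_true = np.array([0, 0, 1, 1])   # flattened ground-truth labels
y_pred = np.array([0, 1, 1, 1])   # flattened predictions
miou = mean_iou(y_true, y_pred, 2)
```

Averaging the per-class IoUs rather than pooling all pixels keeps rare classes from being drowned out by large ones like road or sky.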
This document summarizes a project on real-time object detection using computer vision techniques. It discusses using a system that can recognize objects in a video stream from a camera and label them with bounding boxes and labels. It notes that most video surveillance footage is uninteresting unless there are moving objects. The project aims to address this by building an accurate, fast object detection system that can run on resource-constrained devices. It proposes using a hybrid CNN-SVM model trained on a large dataset to recognize objects and discusses the training and detection phases of the system.
Real Time Object Detection Using Machine Learning, pratik pratyay
This document discusses the development of a real-time object detection system using computer vision techniques. It aims to recognize and label moving objects in video streams from monitoring cameras with high accuracy and in a short amount of time. The system will use a hybrid model of convolutional neural networks and support vector machines for feature extraction and classification of objects from camera feeds into predefined classes. It is intended to help analyze surveillance video by only flagging clips that contain objects of interest like people or vehicles, reducing wasted storage and review time.
Computer Vision Landscape: Present and Future, Sanghamitra Deb
Millions of people all around the world learn with Chegg. Education at Chegg is powered by the depth and diversity of the content that we have, and a huge part of our content is in the form of images. These images could be uploaded by students or by content creators. Images contain text that is extracted using a transcription service. Very often uploaded images are noisy, which leads to irrelevant characters or words in the transcribed text. Using object detection techniques, we developed a service that extracts the relevant parts of the image and uses a transcription service to get clean text. In the first part of the presentation, I will talk about building an object detection model using YOLO for cropping and masking images to obtain cleaner text from transcription. YOLO is a deep learning object detection and recognition modeling framework that is able to produce highly accurate results with low latency. In the next part of my presentation, I will talk about building the Computer Vision landscape at Chegg. Starting from images of academic materials composed of elements such as text, equations, and diagrams, we create a pipeline for extracting these image elements. Using state-of-the-art deep learning techniques, we create embeddings for these elements to enhance downstream machine learning models such as content quality and similarity.
1) The document discusses using data in deep learning models, including understanding the limitations of data and how it is acquired.
2) It describes techniques for image matching using multi-view geometry, including finding corresponding points across images and triangulating them to determine camera pose.
3) Recent works aim to improve localization of objects in images using multiple instance learning approaches that can learn without full supervision or through more stable optimization methods like linearizing sampling operations.
The document outlines a presentation on multimedia data mining. It discusses three articles: 1) a tool for visually mining multimedia data for social studies, 2) a framework for mining traffic video sequences, and 3) using voice mining to understand customer feedback. It also provides an introduction to multimedia data mining and recommendations.
The document provides an overview of deep learning based object detection models. It discusses early approaches like R-CNN, Fast R-CNN, and Faster R-CNN, as well as more recent single-shot detectors like YOLO, SSD, RetinaNet, and CenterNet. It covers performance metrics like mean average precision (mAP) and compares the speed and accuracy of different models. The document concludes by outlining general guidelines for choosing an object detection model based on priorities like accuracy, speed, model size, and portability.
Neural Networks for Machine Learning and Deep Learning, comifa7406
This document discusses autoencoders and their use in dimensionality reduction and retrieval tasks. It begins by explaining principal component analysis (PCA) and how autoencoders can learn PCA through backpropagation by minimizing reconstruction error. Deep autoencoders are then introduced as a way to perform nonlinear dimensionality reduction by encoding data onto a manifold. Applications discussed include document retrieval, visualization, and hashing. Binary codes learned through deep autoencoders are shown to work well for image retrieval.
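The PCA-as-minimum-reconstruction-error view described above can be sketched with an SVD, which is exactly the objective a linear autoencoder with a one-dimensional bottleneck minimizes (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Data that is essentially 1-D: a dominant direction plus small noise.
X = np.outer(rng.normal(size=100), [3.0, 1.0]) + 0.01 * rng.normal(size=(100, 2))
X -= X.mean(axis=0)                  # center, as PCA requires

# PCA: project onto the top principal component and reconstruct.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
code = X @ Vt[:1].T                  # 1-D "bottleneck" representation
recon = code @ Vt[:1]                # reconstruction from the code
err = np.mean((X - recon) ** 2)      # reconstruction error is tiny
```

A deep autoencoder replaces the two linear maps with nonlinear encoder and decoder networks, which lets the code lie on a curved manifold rather than a linear subspace.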
Deep learning fundamentals and research project on IBM POWER9 system from NUS, Ganesan Narayanasamy
Moving object recognition (MOR) corresponds to the localisation and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial sites monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari is going to talk about the deep learning algorithms to identify both locations and corresponding categories of moving objects with a convolutional network. The challenges in developing such algorithms will be discussed. The discourse will also include the implementation details of these models in both conventional and UAV videos.
Journal club done with Vid Stojevic for PointNet:
https://arxiv.org/abs/1612.00593
https://github.com/charlesq34/pointnet
http://stanford.edu/~rqi/pointnet/
Deep learning for indoor point cloud processing. PointNet provides a unified architecture operating directly on unordered point clouds, without voxelisation, for applications ranging from object classification and part segmentation to scene semantic parsing.
Alternative download link:
https://www.dropbox.com/s/ziyhgi627vg9lyi/3D_v2017_initReport.pdf?dl=0
Camera-Based Road Lane Detection by Deep Learning II, Yu Huang
lane detection, deep learning, autonomous driving, CNN, RNN, LSTM, GRU, lane localization, lane fitting, ego lane, end-to-end, vanishing point, segmentation, FCN, regression, classification
The document introduces various computer vision topics including convolutional neural networks, popular CNN architectures, data augmentation, transfer learning, object detection, neural style transfer, generative adversarial networks, and variational autoencoders. It provides overviews of each topic and discusses concepts such as how convolutions work, common CNN architectures like ResNet and VGG, why data augmentation is important, how transfer learning can utilize pre-trained models, how object detection algorithms like YOLO work, the content and style losses used in neural style transfer, how GANs use generators and discriminators, and how VAEs describe images with probability distributions. The document aims to discuss these topics at a practical level and provide insights through examples.
Content-based image retrieval (CBIR) uses computer vision techniques to search for and retrieve images from large databases based on visual similarities. CBIR systems typically extract features from images and measure similarities to return images matching a query image. Popular applications include Google Images, eBay, and Pinterest. Evaluation of CBIR systems focuses on precision and recall metrics, as precision alone is insufficient without also considering recall. Training siamese networks for CBIR requires loss functions that pull similar images closer together and push dissimilar images farther apart.
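Precision and recall for a ranked retrieval list can be computed as follows (a minimal sketch with hypothetical item IDs):

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall for the top-k items of a ranked result list."""
    top_k = retrieved[:k]
    hits = len(set(top_k) & set(relevant))
    return hits / k, hits / len(relevant)

retrieved = ["a", "b", "c", "d"]   # ranked results for a query
relevant = {"a", "c", "e"}         # ground-truth matches for that query
p, r = precision_recall_at_k(retrieved, relevant, k=4)
```

The example makes the trade-off concrete: returning more items can only raise recall, while precision measures how much of what was returned is actually relevant, which is why neither metric is meaningful alone.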
Unsupervised anomaly detection using style distillation, LEE HOSEONG
The document discusses using convolutional autoencoders for unsupervised anomaly detection. It describes training a convolutional autoencoder model on normal data to learn the distribution of normal examples, then using the model to detect anomalies in new data based on the reconstruction error. The process involves training the autoencoder to minimize the difference between inputs and outputs, then using the trained model to encode new data and flag examples with a high reconstruction error as anomalies.
Do Adversarially Robust ImageNet Models Transfer Better?, LEE HOSEONG
The document discusses an experiment comparing the transfer learning performance of standard ImageNet models versus adversarially robust ImageNet models. The experiment finds that robust models consistently match or outperform standard models on a variety of downstream transfer learning tasks, despite having lower accuracy on ImageNet. Further analysis shows robust models improve with increased width and that the optimal level of robustness depends on properties of the downstream task like dataset granularity. Overall, the findings suggest adversarially robust models transfer learned representations better than standard models.
This document discusses mixed precision training techniques for deep neural networks. It introduces three techniques to train models with half-precision floating point without losing accuracy: 1) Maintaining a FP32 master copy of weights, 2) Scaling the loss to prevent small gradients, and 3) Performing certain arithmetic like dot products in FP32. Experimental results show these techniques allow a variety of networks to match the accuracy of FP32 training while reducing memory and bandwidth. The document also discusses related work and PyTorch's new Automatic Mixed Precision features.
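The loss-scaling idea (technique 2 above) can be demonstrated in a few lines: a gradient too small for FP16 underflows to zero, but survives when scaled up before the cast and unscaled afterwards in FP32 (the specific values are illustrative):

```python
import numpy as np

grad = 1e-8                                # a gradient value below FP16 range
naive = np.float16(grad)                   # underflows to zero in FP16
scale = 1024.0
scaled = np.float16(grad * scale)          # survives as an FP16 subnormal
recovered = np.float32(scaled) / scale     # unscale in FP32: value preserved
```

In practice the same scale factor is applied to the loss before backpropagation, so every gradient in the chain is shifted into FP16's representable range, then divided out before the FP32 master weights are updated.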
YOLOv4: Optimal Speed and Accuracy of Object Detection review, LEE HOSEONG
YOLOv4 builds upon previous YOLO models and introduces techniques like CSPDarknet53, SPP, PAN, Mosaic data augmentation, and modifications to existing methods to achieve state-of-the-art object detection speed and accuracy while being trainable on a single GPU. Experiments show that combining these techniques through a "bag of freebies" and "bag of specials" approach improves classifier and detector performance over baselines on standard datasets. The paper contributes an efficient object detection model suitable for production use with limited resources.
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence, LEE HOSEONG
This document summarizes the FixMatch paper, which proposes a simple semi-supervised learning method that achieves state-of-the-art results. FixMatch combines pseudo-labeling and consistency regularization by generating pseudo-labels for unlabeled data using a model's prediction on a weakly augmented version and enforcing consistency on a strongly augmented version. Extensive ablation studies show that FixMatch outperforms previous methods on standard benchmarks even with limited labeled data and identifies consistency regularization and pseudo-labeling as the most important factors for its success.
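The confidence-thresholding step at the heart of FixMatch can be sketched as follows (a minimal illustration, not the paper's code; `tau` stands in for the confidence threshold, and the probabilities are made up):

```python
import numpy as np

def pseudo_labels(probs, tau=0.95):
    """Hard pseudo-labels plus a mask keeping only confident predictions."""
    conf = probs.max(axis=1)          # confidence = max class probability
    labels = probs.argmax(axis=1)     # hard pseudo-label
    mask = conf >= tau                # keep only confident examples
    return labels, mask

# Predictions on weakly augmented unlabeled examples (rows sum to 1).
probs = np.array([[0.98, 0.01, 0.01],    # confident -> pseudo-label kept
                  [0.40, 0.35, 0.25]])   # unconfident -> discarded
labels, mask = pseudo_labels(probs)
```

The kept pseudo-labels then serve as training targets for the model's predictions on strongly augmented versions of the same images, which is the consistency-regularization half of the method.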
"Revisiting Self-Supervised Visual Representation Learning" Paper Review, LEE HOSEONG
This paper revisits self-supervised visual representation learning techniques. It conducts a large-scale study comparing different CNN architectures (ResNet, RevNet, VGG) and self-supervised techniques (rotation, exemplar, jigsaw, relative patch location). The study finds that using modern CNN architectures like ResNet instead of older AlexNet models significantly improves performance. Increasing the width of networks also boosts performance of self-supervised learning. Evaluation of representations on a new dataset shows the learned features generalize well.
Self-supervised learning uses unlabeled data to learn visual representations through pretext tasks like predicting relative patch location, solving jigsaw puzzles, or image rotation. These tasks require semantic understanding to solve but only use unlabeled data. The features learned through pretraining on pretext tasks can then be transferred to downstream tasks like image classification and object detection, often outperforming supervised pretraining. Several papers introduce different pretext tasks and evaluate feature transfer on datasets like ImageNet and PASCAL VOC. Recent work combines multiple pretext tasks and shows improved generalization across tasks and datasets.
Human Uncertainty Makes Classification More Robust, ICCV 2019 Review, LEE HOSEONG
1. The document summarizes a research paper that proposes training deep neural networks on soft labels representing human uncertainty in image classification, which improves generalization and robustness compared to training on hard labels.
2. Experiments show that models trained on soft labels constructed from human responses better fit patterns of human uncertainty and improve accuracy, cross-entropy, and a new second-best accuracy measure on various generalization datasets.
3. Alternative soft label methods are also explored, finding that human uncertainty provides a more important contribution than soft labels alone. While robustness to adversarial attacks is improved, defenses are still needed.
This document provides an overview of single image super resolution using deep learning. It discusses how super resolution can be used to generate a high resolution image from a low resolution input. Deep learning models like SRCNN were early approaches for super resolution but newer models use deeper networks and perceptual losses. Generative adversarial networks have also been applied to improve perceptual quality. Key applications are in satellite imagery, medical imaging, and video enhancement. Metrics like PSNR and SSIM are commonly used but may not correlate with human perception. Overall, deep learning has advanced super resolution techniques but challenges remain in fully evaluating perceptual quality.
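PSNR, one of the metrics mentioned, is straightforward to compute; a minimal sketch on a synthetic pair of images:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 110          # a single 10-level error in one pixel
val = psnr(ref, noisy)     # around 46 dB for this tiny distortion
```

Its simplicity is also its weakness: PSNR penalizes every pixel deviation equally, so a perceptually pleasing but slightly shifted texture can score worse than an oversmoothed blur, which is why perceptual and adversarial losses are used alongside it.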
This document provides a review of the paper "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks" presented at ICLR 2019. The paper proposes that dense neural networks contain sparse subnetworks that are capable of learning in isolation with the same accuracy in fewer iterations if they retain their original initialization weights. Through experiments on MNIST, CIFAR10 and ImageNet datasets, the paper finds evidence that iterative pruning can discover such "winning tickets" and achieve better performance than one-shot pruning or training sparse subnetworks from random initialization. However, further work is needed to test the hypothesis on larger datasets and optimize the resulting architectures.
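One round of the magnitude pruning used to search for winning tickets can be sketched as follows (illustrative only; the paper prunes iteratively, typically per layer, and rewinds the surviving weights to their original initialization):

```python
import numpy as np

def prune_smallest(weights, frac):
    """Mask out the `frac` smallest-magnitude weights (one pruning round)."""
    k = int(weights.size * frac)
    thresh = np.sort(np.abs(weights), axis=None)[k]   # k-th smallest magnitude
    mask = np.abs(weights) >= thresh                  # keep large weights
    return weights * mask, mask

w = np.array([0.5, -0.01, 0.2, 0.03, -0.8])
pruned, mask = prune_smallest(w, frac=0.4)   # removes the two smallest weights
```

Repeating this train-prune-rewind loop is what distinguishes iterative pruning from one-shot pruning, and it is the iterative variant that the paper finds most reliably uncovers winning tickets.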
Smart mobility refers to the integration of advanced technologies and innovative solutions to create efficient, sustainable, and interconnected transportation systems. It encompasses various aspects of transportation, including public transit, shared mobility services, intelligent transportation systems, electric vehicles, and connected infrastructure. Smart mobility aims to improve the overall mobility experience by leveraging data, connectivity, and automation to enhance safety, reduce congestion, optimize transportation networks, and minimize environmental impacts.
Develop Secure Enterprise Solutions with iOS Mobile App Development Services, Damco Solutions
The security of enterprise apps should not be overlooked by organizations. Since these apps handle confidential finance/user data and business operations, ensuring greater security is crucial. That's why businesses should hire dedicated iOS mobile application development services providers for creating highly secure enterprise apps. By incorporating sophisticated security mechanisms, these developers make enterprise apps resistant to a range of cyber threats.
Content source - https://www.bizbangboom.com/articles/enterprise-mobile-app-development-with-ios-augmenting-business-security
Read more - https://www.damcogroup.com/ios-application-development-services
The Zaitechno Handheld Raman Spectrometer is a powerful and portable tool for rapid, non-destructive chemical analysis. It utilizes Raman spectroscopy, a technique that analyzes the vibrational fingerprint of molecules to identify their chemical composition. This handheld instrument allows for on-site analysis of materials, making it ideal for a variety of applications, including:
Material identification: Identify unknown materials, minerals, and contaminants.
Quality control: Ensure the quality and consistency of raw materials and finished products.
Pharmaceutical analysis: Verify the identity and purity of pharmaceutical compounds.
Food safety testing: Detect contaminants and adulterants in food products.
Field analysis: Analyze materials in the field, such as during environmental monitoring or forensic investigations.
The Zaitechno Handheld Raman Spectrometer is easy to use and features a user-friendly interface. It is compact and lightweight, making it ideal for field applications. With its rapid analysis capabilities, the Zaitechno Handheld Raman Spectrometer can help you improve efficiency and productivity in your research or quality control workflows.
It's your unstructured data: How to get your GenAI app to production (and spe..., Zilliz
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while addressing data privacy and PII. Our Service for RAG helps AI developers and data scientists to scale their app to 1000s to millions of users using corporate unstructured data. Aparavi’s AI Data Loader cleans, prepares and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do - building a great AI application with useful and relevant context. All within your environment and never having to share private corporate data with anyone - not even Aparavi.
Discovery Series - Zero to Hero - Task Mining Session 1, DianaGray10
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
How UiPath Discovery Suite supports identification of Agentic Process Automat..., DianaGray10
📚 Understand the basics of the newly persona-based LLM-powered Agentic Process Automation and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Retrieval Augmented Generation Evaluation with Ragas, Zilliz
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data in the prompt. Using large language models (LLMs) as judge has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will talk about and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages, SAI KAILASH R
Explore the advantages and disadvantages of blockchain technology in this comprehensive SlideShare presentation. Blockchain, the backbone of cryptocurrencies like Bitcoin, is revolutionizing various industries by offering enhanced security, transparency, and efficiency. However, it also comes with challenges such as scalability issues and energy consumption. This presentation provides an in-depth analysis of the key benefits and drawbacks of blockchain, helping you understand its potential impact on the future of technology and business.
Improving Learning Content Efficiency with Reusable Learning Content, Enterprise Knowledge
Enterprise Knowledge’s Emily Crockett, Content Engineering Consultant, presented “Improve Learning Content Efficiency with Reusable Learning Content” at the Learning Ideas conference on June 13th, 2024.
This presentation explored the basics of reusable learning content, including the types of reuse and the key benefits of reuse such as improved content maintenance efficiency, reduced organizational risk, and scalable differentiated instruction & personalization. After this primer on reuse, Crockett laid out the basic steps to start building reusable learning content alongside a real-life example and the technology stack needed to support dynamic content. Key objectives included:
- Be able to explain the difference between reusable learning content and duplicate content
- Explore how a well-designed learning content model can reduce duplicate content and improve your team’s efficiency
- Identify key tasks and steps in creating a learning content model
"Hands-on development experience using wasm Blazor", Furdak Vladyslav, Fwdays
I will share my personal experience of full-time development on wasm Blazor
The difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, and which technology stack and architectural patterns we chose
The conclusions we drew and the mistakes we made
Finetuning GenAI For Hacking and Defending, Priyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
This PDF delves into the aspects of information security from a forensic perspective, focusing on privacy leaks. It provides insights into the methods and tools used in forensic investigations to uncover and mitigate privacy breaches in mobile and cloud environments.
3.
CVPR 2019 Statistics
• What is CVPR?
• Conference on Computer Vision and Pattern Recognition (CVPR)
• CVPR was first held in 1983 and has been held annually
• CVPR 2019: June 16th – June 20th in Long Beach, CA
4.
CVPR 2019 Statistics
• The total number of papers is increasing every year, and this year it increased significantly!
• We can visualize the main topics using paper titles and a simple Python script!
• https://github.com/hoya012/CVPR-Paper-Statistics
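The counting behind such a visualization fits in a few lines. The repository linked above holds the real analysis; the sketch below is only a minimal illustration of the idea, with a made-up stopword list and a handful of titles:

```python
from collections import Counter
import re

STOPWORDS = {"a", "an", "and", "the", "for", "with", "of", "to", "in", "using", "by"}

def keyword_counts(titles):
    """Count how often each non-stopword appears across paper titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Toy example with a few CVPR 2019 titles
titles = [
    "Semantic Image Synthesis with Spatially-Adaptive Normalization",
    "Image Super-Resolution by Neural Texture Transfer",
    "UPSNet: A Unified Panoptic Segmentation Network",
]
print(keyword_counts(titles).most_common(3))
```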
6.
CVPR 2018 Statistics
• 2018 CVPR paper statistics, for comparison with 2019
7.
CVPR 2018 vs CVPR 2019 Statistics
• Most of the top keywords were maintained
• Image, detection, 3d, object, video, segmentation, adversarial, recognition, visual …
• “graph”, “cloud”, and “representation” are two to three times as frequent
• graph: 15 → 45
• representation: 25 → 48
• cloud: 16 → 35
8.
Before beginning..
• A paper's absence from this list does not mean it is not interesting.
• Since I mainly studied Computer Vision, most papers I will discuss today are Computer Vision papers..
• Topics not covered today
• Natural Language Processing
• Reinforcement Learning
• Robotics
• Etc..?
9.
1. Learning to Synthesize Motion Blur (oral)
• Synthesizing a motion-blurred image from a pair of unblurred sequential images
• Motion blur is important in cinematography and artistic photography
• Generate a large-scale synthetic training dataset of motion blurred images
Recommended reference: “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation”, 2018 CVPR
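For intuition, motion blur can be approximated by averaging frames interpolated between the two sharp inputs. This is only a crude stand-in: the paper learns the intermediate motion (a line-prediction layer) rather than assuming linear intensity change, and the function name here is mine, not the authors':

```python
import numpy as np

def synthesize_motion_blur(frame_a, frame_b, n_steps=17):
    """Crude motion-blur synthesis: average frames linearly interpolated
    between two sharp input frames. The paper instead *learns* the
    intermediate motion, so treat this as a naive baseline sketch."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    frames = [(1.0 - a) * frame_a + a * frame_b for a in alphas]
    return np.mean(frames, axis=0)

a = np.zeros((4, 4))  # dark frame
b = np.ones((4, 4))   # bright frame
blurred = synthesize_motion_blur(a, b)  # every pixel ends up midway, at 0.5
```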
10.
2. Semantic Image Synthesis with Spatially-Adaptive Normalization (oral)
• Synthesizing photorealistic images given an input semantic layout
• Spatially-adaptive normalization can preserve semantic information
• This model gives the user control over both semantic layout and style when synthesizing images
Demo Code: https://github.com/NVlabs/SPADE
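The key idea is that the normalization's scale and shift vary per pixel, driven by the semantic layout, instead of being a single learned pair per channel. A simplified NumPy sketch (in the paper, the gamma/beta maps are produced by small convnets over the segmentation mask; here they are passed in directly, and the function name is mine):

```python
import numpy as np

def spade_normalize(x, gamma_map, beta_map, eps=1e-5):
    """Spatially-adaptive (de)normalization, simplified.

    x:         feature map, shape (C, H, W)
    gamma_map: per-pixel scale predicted from the semantic layout, (C, H, W)
    beta_map:  per-pixel shift predicted from the semantic layout, (C, H, W)
    """
    mean = x.mean(axis=(1, 2), keepdims=True)  # per-channel statistics
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    # Unlike plain BatchNorm, the modulation varies per spatial location,
    # which is what lets the semantic layout survive normalization.
    return gamma_map * x_norm + beta_map
```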
11.
3. SiCloPe: Silhouette-Based Clothed People (Oral)
• Reconstruct a complete and textured 3D model of a person from a single image
• Use 2D silhouettes and 3D joints of a body pose to reconstruct the 3D mesh
• An effective two-stage 3D shape reconstruction pipeline
• Predicting multi-view 2D silhouettes from a single input segmentation
• Deep-visual-hull-based mesh reconstruction technique
Recommended reference: “BodyNet: Volumetric Inference of 3D Human Body Shapes”, 2018 ECCV
12.
4. Im2Pencil: Controllable Pencil Illustration from Photographs
• Propose a controllable photo-to-pencil translation method
• Models pencil outlines (rough, clean) and pencil shading (4 styles)
• Creates training data pairs from online websites (e.g., Pinterest) using image filtering techniques
Demo Code: https://github.com/Yijunmaverick/Im2Pencil
13.
5. End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image
• End-to-end solution to synthesize a time-lapse video from a single image
• Use time-lapse videos and image sequences during training
• Use only single image during inference
14.
6. StoryGAN: A Sequential Conditional GAN for Story Visualization
• Propose a new task called Story Visualization using GANs
• Sequential conditional GAN based StoryGAN
• Story Encoder – stochastic mapping from a story to a low-dimensional embedding vector
• Context Encoder – captures contextual information during sequential image generation
• Two Discriminators – Image Discriminator & Story Discriminator
15.
7. Image Super-Resolution by Neural Texture Transfer (oral)
• Improves “RefSR” even when irrelevant reference images are provided
• Traditional single-image super-resolution is extremely challenging (an ill-posed problem)
• Reference-based SR (RefSR) utilizes rich texture from HR references, but prior methods require similar reference images
• Adaptively transferring the texture from Ref Images according to their texture similarity
Recommended reference: “CrossNet: An end-to-end reference-based super resolution network using cross-scale warping”, 2018 ECCV
16.
8. DVC: An End-to-End Deep Video Compression Framework (oral)
• Propose the first end-to-end deep video compression model
• Conventional video compression uses a predictive coding architecture and encodes the corresponding motion information and residual information
• Taking advantage of both classical compression and neural network
• Use learning based optical flow estimation
17.
9. Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search (oral)
• Defends against adversarial attacks using big data and the image manifold
• Assumes that adversarial attacks move the image away from the image manifold
• A successful defense mechanism should aim to project the images back onto the image manifold
• From tens of billions of images, search for the nearest-neighbor images (K=50) and use them
• Also propose two novel attack methods to break nearest neighbor defenses
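The projection step can be sketched with a toy nearest-neighbor average. This is a stand-in under my own naming: the paper operates at web scale and combines neighbor *predictions*, not raw feature averages as done here:

```python
import numpy as np

def knn_project(query, database, k=2):
    """Project a (possibly adversarial) feature vector back toward the data
    manifold by averaging its k nearest neighbors in a reference database.

    A toy stand-in for the paper's web-scale pipeline over billions of
    images (where K=50 and neighbor predictions are combined instead)."""
    dists = np.linalg.norm(database - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of k closest
    return database[nearest].mean(axis=0)
```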
18.
10. Bag of Tricks for Image Classification with Convolutional Neural Networks
• Examine a collection of training refinements and empirically evaluate their impact
• Improve ResNet-50’s accuracy from 75.3% to 79.29% on ImageNet with these refinements
• Efficient Training
• FP32 with BS=256 → FP16 with BS=1024, with some techniques
• Training Refinements:
• Cosine Learning Rate Decay / Label Smoothing / Knowledge Distillation / Mixup Training
• Improvements transfer from classification to object detection and semantic segmentation
• Efficient-training techniques: linear scaling of LR, LR warmup, zero-γ initialization in BN, no bias decay
[Figures: results of efficient training / results of training refinements / ResNet tweaks]
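Three of the refinements above are small enough to sketch directly. These are generic textbook forms of the tricks, not the paper's code; the hyperparameters (base_lr=0.1, eps=0.1) are common defaults, not the paper's exact settings:

```python
import numpy as np

def cosine_lr(step, total_steps, base_lr=0.1):
    """Cosine learning-rate decay: base_lr at step 0, decaying to 0."""
    return 0.5 * base_lr * (1 + np.cos(np.pi * step / total_steps))

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass off the true class
    and spread it uniformly over all classes."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / n_classes

def mixup(x1, y1, x2, y2, lam):
    """Mixup: train on a convex combination of two examples and labels."""
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```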
19.
11. Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
• Automatically learn the group structure in the training stage in an end-to-end manner
• Outperform standard group convolution
• Propose an efficient strategy for index re-ordering
20.
12. ScratchDet: Exploring to Train Single-Shot Object Detectors from Scratch (oral)
• Explore how to train object detectors from scratch robustly
• Almost all SOTA detectors are fine-tuned from a pretrained CNN (e.g., on ImageNet)
• Classification and detection have different degrees of sensitivity to translation
• The architecture is limited by the classification network (backbone) → inconvenient!
• Find that one of the overlooked points is BatchNorm!
Recommended reference: “DSOD: Learning Deeply Supervised Object Detectors from Scratch”, 2017 ICCV
21.
13. Precise Detection in Densely Packed Scenes
• Propose precise detection in densely packed scenes
• In the real world, there are many applications of object detection (e.g., detecting and counting objects)
• In densely packed scenes, SOTA detectors can’t detect accurately
• Contributions: (1) a layer for estimating the Jaccard index, (2) a novel EM merging unit, (3) release of the SKU-110K dataset
22.
14. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
• Present a large-scale dataset and establish a baseline for security-inspection X-ray
• 1,059,231 X-ray images in total, containing 8,929 prohibited items in 6 classes
• Propose an approach named class-balanced hierarchical refinement (CHR) and a class-balanced loss function
23.
15. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
• Address the weaknesses of IoU and introduce a generalized version (GIoU)
• Intersection over Union (IoU) is the most popular evaluation metric used in object detection
• But there is a gap between optimizing distance losses and maximizing IoU
• Introducing generalized IoU as both a new loss and a new metric
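The metric itself is simple: GIoU = IoU − (area(C) − area(A ∪ B)) / area(C), where C is the smallest box enclosing both. A self-contained implementation for axis-aligned boxes (my own helper, following the published definition):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2).

    Unlike plain IoU, GIoU stays informative (goes negative) even when the
    boxes do not overlap at all, so 1 - GIoU works directly as a loss."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection area (zero if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # smallest enclosing box C
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (area_c - union) / area_c
```

For identical boxes GIoU is 1; for far-apart boxes it approaches −1, giving the regressor a gradient that plain IoU (stuck at 0) cannot provide.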
24.
16. Bounding Box Regression with Uncertainty for Accurate Object Detection
• Propose a novel bounding box regression loss with uncertainty
• Most datasets have ambiguities and labeling noise in bounding box coordinates
• The network learns to predict a localization variance for each coordinate
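The variance-aware loss boils down to a Gaussian negative log-likelihood per coordinate. A sketch of that core term (the paper combines it with a smooth-L1 variant; this simplified form and the function name are mine):

```python
import math

def uncertainty_box_loss(pred, target, log_var):
    """Gaussian negative log-likelihood for one box coordinate.

    The network predicts the coordinate `pred` and a log-variance
    `log_var`; an ambiguous box can be absorbed by a large predicted
    variance instead of producing a huge squared error."""
    var = math.exp(log_var)
    return 0.5 * (pred - target) ** 2 / var + 0.5 * log_var
```

Predicting log-variance (rather than variance) keeps the value unconstrained and numerically stable.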
25.
17. UPSNet: A Unified Panoptic Segmentation Network (oral)
• Propose a unified panoptic segmentation network (UPSNet)
• Semantic segmentation + instance segmentation = panoptic segmentation
• Semantic Head + Instance Head + Panoptic head → end-to-end manner
Recommended reference: “Panoptic Segmentation”, 2018 arXiv
Countable objects → things
Uncountable objects → stuff
[Figure: Deformable Conv / Mask R-CNN / parameter-free heads]
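To see what "panoptic" means concretely, here is the obvious greedy baseline for fusing the two outputs: keep the semantic ("stuff") labels, then paste instance ("thing") masks in score order. UPSNet replaces this heuristic with its learned, parameter-free panoptic head; the function and the id convention below are mine:

```python
import numpy as np

def merge_panoptic(semantic, instance_masks, instance_scores, thing_id0=1000):
    """Greedy panoptic merge: start from per-pixel semantic ('stuff')
    labels, then paste instance ('thing') masks in descending score
    order so higher-confidence instances claim contested pixels first."""
    panoptic = semantic.copy()  # stuff labels assumed < thing_id0
    for rank, idx in enumerate(np.argsort(instance_scores)[::-1]):
        free = instance_masks[idx] & (panoptic < thing_id0)  # still 'stuff'
        panoptic[free] = thing_id0 + rank  # unique id per instance
    return panoptic
```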
26.
18. SFNet: Learning Object-aware Semantic Correspondence (Oral)
• Propose SFNet for the semantic correspondence problem
• Propose to use images annotated with binary foreground masks and synthetic geometric deformations during training
• Manually selecting point correspondences is very expensive!!
• Outperform SOTA on standard benchmarks by a significant margin
27.
19. Fast Interactive Object Annotation with Curve-GCN
• Propose an end-to-end fast interactive object annotation tool (Curve-GCN)
• Predicts all vertices simultaneously using a Graph Convolutional Network (unlike the sequential Polygon-RNN)
• A human annotator can correct any wrong point, and only the neighboring points are affected
Recommended reference: “Efficient interactive annotation of segmentation datasets with polygon-rnn++ ”, 2018 CVPR
Code: https://github.com/fidler-lab/curve-gcn
28.
20. FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
• Propose an image-level WSSS method using stochastic inference (dropout)
• Localization maps (CAMs) only focus on small parts of objects → problem
• FickleNet allows a single network to generate multiple CAMs from a single image
• Does not require any additional training steps and only adds a simple layer
[Figure: fully stochastic dropout, applied in both training and inference]
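The "multiple maps from one network" trick is just re-running the classifier with a fresh dropout mask each pass. A sketch reduced to a linear classifier over spatial features (the real method drops hidden units inside a CNN; names and shapes here are my own simplification):

```python
import numpy as np

def stochastic_localization_maps(features, class_weights, n_samples=5,
                                 drop_prob=0.5, seed=0):
    """Generate several different localization maps from one image by
    applying a different random dropout mask on each forward pass
    (the core FickleNet idea, reduced to a linear classifier).

    features:      (H*W, C) spatial feature vectors
    class_weights: (C,) classifier weights for one class
    """
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_samples):
        mask = rng.random(features.shape) >= drop_prob  # random dropout
        maps.append((features * mask) @ class_weights)  # (H*W,) score map
    # Pixel-wise max over samples covers more of the object than one CAM
    return np.maximum.reduce(maps)
```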
29.
Related Post..
• My personal blog has similar write-ups for:
• SIGGRAPH 2018
• NeurIPS 2018
• ICLR 2019
https://hoya012.github.io/