In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The ongoing development and sharing of best practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the HPC User Forum at Argonne, Andrew Siegel from Argonne presents: ECP Application Development.
"The Exascale Computing Project is accelerating delivery of a capable exascale computing ecosystem for breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development."
Watch the video: https://wp.me/p3RLHQ-kSL
Learn more: https://www.exascaleproject.org
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document discusses clouds at CERN. It provides background on CERN, including that it was founded in 1954 by 12 European states for "Science for Peace" and now has 20 member states. It notes CERN has around 2,300 staff, 1,000 other paid personnel, and over 11,000 users. The document discusses the challenge of scaling IT infrastructure with fixed staff and budgets, and outlines CERN's approach of moving to cloud models built on open source tools. It details the status of OpenStack deployments at CERN and its experiments, and outlines next steps such as moving to new OpenStack releases and using cells to scale capacity.
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures (Intel® Software)
This session discusses the implementation and performance of the K-nearest neighbor (KNN) computation on a distributed architecture using the Intel® Xeon Phi™ processor.
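As a back-of-the-envelope illustration of the kernel the session optimizes, a brute-force KNN query can be sketched in a few lines of NumPy. This is a serial toy, not the distributed Xeon Phi implementation from the talk:

```python
import numpy as np

def knn(train, query, k):
    """Brute-force K-nearest neighbors: for each query point,
    return the indices of the k closest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query row
    return np.argsort(d, axis=1)[:, :k]

train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
query = np.array([[0.9, 0.9]])
print(knn(train, query, 2))  # nearest two points: indices 1 and 0
```

The distributed versions discussed in such sessions typically partition `train` across nodes, compute local candidates, and merge the per-node top-k lists; the distance kernel above is the part that vectorizes well on wide-SIMD hardware.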
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ... (inside-BigData.com)
In this deck from the 2019 Stanford HPC Conference, DK Panda from Ohio State University presents: How to Design Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems.
"This talk will focus on challenges in designing HPC, Deep Learning, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X (PGAS - OpenSHMEM/UPC/CAF/UPC++, OpenMP, and CUDA) programming models taking into account support for multi-core systems (Xeon, OpenPower, and ARM), high-performance networks, GPGPUs (including GPUDirect RDMA), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu) will be presented. For the Deep Learning domain, we will focus on popular Deep Learning frameworks (Caffe, CNTK, and TensorFlow) to extract performance and scalability with the MVAPICH2-GDR MPI library and RDMA-enabled Big Data stacks. Finally, we will outline the challenges in moving middleware to Cloud environments."
Watch the video: https://youtu.be/hR8cnFVF8Zg
Learn more: http://www.cse.ohio-state.edu/~panda
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses how the TeraGrid initiative provides US researchers with access to large scale high performance computing resources for physics research. It describes the diverse computing resources available through TeraGrid including supercomputers, clusters, and visualization resources. It provides examples of how physics domains like lattice QCD, astrophysics, and nanoscale electronic structure are using TeraGrid resources to enable large simulations and address research challenges. Training and support resources for users are also summarized.
HPC, Grid and Cloud Computing - The Past, Present, and Future Challenge (Jason Shih)
This document discusses trends in high performance computing (HPC), grid computing, and cloud computing. It provides an overview of HPC cluster performance and interconnects. Grid computing enabled large-scale scientific collaboration through infrastructures like EGEE. The LHC requires petascale computing capabilities. Cloud computing hype is discussed alongside observations of performance and virtualization challenges. The future of computing may involve more sophisticated tools and dynamic, small computing elements.
High Performance Computing - Challenges on the Road to Exascale Computing (Heiko Joerg Schick)
The document discusses challenges in achieving exascale computing capabilities by 2018. It outlines how standard technology scaling will not be enough, and compromises will need to be made. These include reduced node performance, lower network bandwidth and fewer pins. Blue Gene architecture is presented as an example of a balanced system that achieves high performance through optimized interconnects and packaging density. A thought experiment proposes integrating significant solid state storage at each node to create an "active storage" machine based on Blue Gene architecture.
Experiences in Application Specific Supercomputer Design - Reasons, Challenge... (Heiko Joerg Schick)
The document discusses challenges faced with application specific supercomputer design. It provides an example of QPACE, a supercomputer designed for quantum chromodynamics (QCD) computations. Key challenges discussed include data ordering issues when using InfiniBand networking that could cause computations to use invalid data if ordering of writes to memory was not enforced. Ensuring proper data ordering is important to avoid software consuming data before it is valid.
Petascale Analytics - The World of Big Data Requires Big Analytics (Heiko Joerg Schick)
The document discusses big data and analytics technologies. It describes how new technologies like Hadoop and MapReduce enable processing of extremely large datasets. It also discusses future technologies like exascale computing and storage class memory that will be needed to manage increasing data volumes and support real-time analytics.
Dynamic Resource Allocation Algorithm using Containers (IRJET Journal)
1) The document proposes a dynamic resource allocation algorithm using containers to optimize resource utilization in server farms.
2) It uses Docker to deploy applications in lightweight containers instead of virtual machines to reduce overhead. A node selection algorithm uses fuzzy logic to determine the most suitable node for container deployment based on resource availability and workload.
3) The proposed approach is tested on a small cluster using Docker, Hadoop and the node selection algorithm to process queries. Results show increased processing speed and better resource utilization compared to traditional virtualization methods.
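The node-selection step can be sketched as a toy scoring function. The inputs, weights, and thresholds below are illustrative assumptions, not the membership functions from the paper:

```python
def suitability(cpu_free, mem_free, load):
    """Toy fuzzy-style score in [0, 1]: combine normalized resource
    availability and (inverted) workload with fixed weights."""
    # Clamp inputs to [0, 1] so the score stays well-defined
    clamp = lambda x: max(0.0, min(1.0, x))
    cpu, mem, idle = clamp(cpu_free), clamp(mem_free), clamp(1.0 - load)
    # A weighted average stands in for the defuzzification step
    return 0.4 * cpu + 0.4 * mem + 0.2 * idle

def select_node(nodes):
    """Pick the node with the highest suitability score.
    `nodes` maps node name -> (cpu_free, mem_free, load)."""
    return max(nodes, key=lambda n: suitability(*nodes[n]))

cluster = {"node-a": (0.2, 0.3, 0.9), "node-b": (0.7, 0.6, 0.4)}
print(select_node(cluster))  # node-b: more free resources, lighter load
```

A real fuzzy-logic selector would replace the weighted average with membership functions and inference rules, but the shape of the decision, rank nodes by combined resource availability and deploy the container on the winner, is the same.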
This is a presentation by Prof. Anne Elster at the International Workshop on Open Source Supercomputing held in conjunction with the 2017 ISC High Performance Computing Conference.
In this deck from the 2017 MVAPICH User Group, Adam Moody from Lawrence Livermore National Laboratory presents: MVAPICH: How a Bunch of Buckeyes Crack Tough Nuts.
"High-performance computing is being applied to solve the world's most daunting problems, including researching climate change, studying fusion physics, and curing cancer. MPI is a key component in this work, and as such, the MVAPICH team plays a critical role in these efforts. In this talk, I will discuss recent science that MVAPICH has enabled and describe future research that is planned. I will detail how the MVAPICH team has responded to address past problems and list the requirements that future work will demand."
Watch the video: https://wp.me/p3RLHQ-hp6
This document discusses the concept of a Science DMZ, which consists of three key components: 1) a dedicated "friction-free" network path with high-performance networking devices located near the site perimeter to facilitate science data transfer, 2) dedicated high-performance data transfer nodes optimized for data transfer tools, and 3) a performance measurement/test node. It contrasts this approach with the typical ad-hoc deployment of a data transfer node wherever space allows, which often fails to provide necessary performance. Details of an example Science DMZ deployment at Lawrence Berkeley National Laboratory are provided.
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati... (Nane Kratzke)
The document discusses a proposed lightweight multi-cloud domain-specific language (DSL) for defining elastic and transferable cloud-native applications. It begins by outlining the research context and motivation to avoid vendor lock-in and make applications portable across different cloud infrastructures. The presentation then describes requirements for a cloud programming language, including supporting containerized deployments, application scaling, lightweight definitions, multi-cloud operations, and infrastructure independence. It proposes a core DSL model and shows how it can be made platform agnostic. An evaluation demonstrates deploying an application to different clouds and runtime environments and transferring it between infrastructures. The DSL is found to fulfill the intended requirements within the limitations of its scope.
Scalable Deep Learning in ExtremeEarth - phiweek19 (ExtremeEarth)
This document summarizes a presentation about scalable deep learning techniques for analyzing Copernicus Earth observation data using the Hopsworks platform. The presentation discusses Hopsworks' end-to-end machine learning pipelines, feature engineering capabilities, distributed deep learning techniques like data parallel training, and applications of these techniques to challenges in classifying satellite imagery like sea ice mapping. Deep learning architectures, preprocessing steps, and distributed training methods are highlighted as areas of ongoing work and improvement for analyzing large volumes of remote sensing data on Hopsworks.
Blue Waters and Resource Management - Now and in the Future (inside-BigData.com)
In this presentation from Moabcon 2013, Bill Kramer from NCSA presents: Blue Waters and Resource Management - Now and in the Future.
Watch the video of this presentation: http://insidehpc.com/?p=36343
Benchmark Analysis of Multi-core Processor Memory Contention, April 2009 (James McGalliard)
This document summarizes benchmark testing of a cubed sphere climate modeling application on a multi-core cluster. The testing showed that using fewer cores per node improved performance. Runtime was reduced by 38% when using 2 cores per node instead of 8 cores. MPI performance and cache access times also degraded with increased core density per node. Overall, the results indicate that job scheduling should aim to use fewer cores per node to optimize runtime in multi-core environments where resource contention can occur.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
The document discusses cloud computing and the AIST Super Cloud. It provides details on 3 common cloud platforms: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It then describes AIST's transition from its Super Cluster to the Green Cloud and newer Super Cloud, including hardware specifications. Key cloud management platforms are outlined, including Rocks, Eucalyptus, OpenStack, and OpenNebula. The document focuses on AIST's adoption and use of cloud computing technologies for research.
Four TeraGrid sites have focal points:
• SDSC – the Data Place: large-scale, high-performance data analysis and handling; every cluster node is directly attached to the SAN
• NCSA – the Compute Place: large-scale, high-flops computation
• Argonne – the Viz Place: scalable visualization walls
• Caltech – the Applications Place: data and flops for applications, especially some of the GriPhyN apps
Specific machine configurations reflect these focal points.
1) Scaling up data center networks (DCNs) requires new switching technologies as hyperscale DCNs continue growing dramatically in size and traffic.
2) Optical switching technologies such as optical time-slot switching show potential for deployments in hybrid optical/electrical DCNs by providing higher switching capacity and bandwidth than electrical switches alone.
3) The University of Bristol has explored optical time-slot switching and its scheduling algorithms, demonstrating SDN control of prototype optical switches for DCN virtualization.
Accelerating TensorFlow with RDMA for high-performance deep learning (DataWorks Summit)
Google’s TensorFlow is one of the most popular deep learning (DL) frameworks. In distributed TensorFlow, gradient updates are a critical step governing the total model training time. These updates incur a massive volume of data transfer over the network.
In this talk, we first present a thorough analysis of the communication patterns in distributed TensorFlow. Then we propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance. Our design includes advanced features such as message pipelining, message coalescing, zero-copy transmission, etc. The performance evaluations show that our proposed design can significantly speed up gRPC throughput by up to 1.5x compared to the default gRPC design. By integrating our RDMA-gRPC with TensorFlow, we are able to achieve up to 35% performance improvement for TensorFlow training with CNN models.
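One of the listed features, message coalescing, can be illustrated independently of gRPC and RDMA: batch many small gradient messages into fewer large transfers so each transfer amortizes its fixed per-message cost. This is a minimal sketch of the idea, not the RDMA-gRPC code:

```python
def coalesce(messages, max_bytes):
    """Group small messages into batches no larger than max_bytes,
    so each network transfer amortizes its fixed per-message cost."""
    batches, current, size = [], [], 0
    for msg in messages:
        # Flush the current batch if adding this message would overflow it
        if current and size + len(msg) > max_bytes:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(msg)
        size += len(msg)
    if current:  # flush the final partial batch
        batches.append(b"".join(current))
    return batches

grads = [b"x" * 30, b"y" * 50, b"z" * 40]
print([len(b) for b in coalesce(grads, 100)])  # [80, 40]
```

In a real transport, each coalesced batch would also carry framing metadata so the receiver can split it back into individual tensors; pipelining overlaps the serialization of one batch with the transmission of the previous one.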
Speakers
Dhabaleswar K (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
Xiaoyi Lu, Research Scientist, The Ohio State University
NERSC is the production high-performance computing (HPC) center for the United States Department of Energy (DOE) Office of Science. The center supports over 6,000 users in 600 projects, using a variety of applications in materials science, chemistry, biology, astrophysics, high energy physics, climate science, fusion science, and more.
NERSC deployed the Cori system on over 9,000 Intel® Xeon Phi™ processors. This session describes the optimization strategy for porting codes that target traditional manycore architectures to the processors. We also discuss highlights and lessons learned from the optimization process on 20 applications associated with the NERSC Exascale Science Application Program (NESAP).
The document discusses accelerating science discovery with AI inference-as-a-service. It describes showcases using this approach for high energy physics and gravitational wave experiments. It outlines the vision of the A3D3 institute to unite domain scientists, computer scientists, and engineers to achieve real-time AI and transform science. Examples are provided of using AI inference-as-a-service to accelerate workflows for CMS, ProtoDUNE, LIGO, and other experiments.
In this video from the Argonne Training Program on Extreme-Scale Computing 2019, Jeffrey Vetter from ORNL presents: The Coming Age of Extreme Heterogeneity.
"In this talk, I'm going to talk about the high-level trends guiding our industry. Moore’s Law as we know it is definitely ending for either economic or technical reasons by 2025. Our community must aggressively explore emerging technologies now!"
Watch the video: https://wp.me/p3RLHQ-lic
Learn more: https://ft.ornl.gov/~vetter/
and
https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The SKA Project - The World's Largest Streaming Data Processor (inside-BigData.com)
In this presentation from the 2014 HPC Advisory Council Europe Conference, Paul Calleja from University of Cambridge presents: The SKA Project - The World's Largest Streaming Data Processor.
"The Square Kilometre Array Design Studies is an international effort to investigate and develop technologies which will enable us to build an enormous radio astronomy telescope with a million square meters of collecting area."
Watch the video presentation: http://wp.me/p3RLHQ-cot
The Barcelona Supercomputing Center (BSC) was established in 2005 and hosts MareNostrum, one of the most powerful supercomputers in Spain. We are the pioneering supercomputing center in Spain. Our specialty is high-performance computing (HPC), and our mission is twofold: to provide supercomputing infrastructure and services to Spanish and European scientists, and to generate knowledge and technology for transfer to society. We are a Severo Ochoa Center of Excellence, a first-level member of the European research infrastructure PRACE (Partnership for Advanced Computing in Europe), and we manage the Spanish Supercomputing Network (RES). As a research center, we have more than 456 experts from 45 countries, organized into four major research areas: computer sciences, life sciences, earth sciences, and computational applications in science and engineering.
Linac Coherent Light Source (LCLS) Data Transfer Requirements (inside-BigData.com)
In this deck from the Stanford HPC Conference, Les Cottrell from the SLAC National Accelerator Laboratory, at Stanford University presents: Linac Coherent Light Source (LCLS) Data Transfer Requirements.
"Funded by the U.S. Department of Energy (DOE) the LCLS is the world’s first hard X-ray free-electron laser. Its strobe-like pulses are just a few millionths of a billionth of a second long, and a billion times brighter than previous X-ray sources. Scientists use LCLS to take crisp pictures of atomic motions, watch chemical reactions unfold, probe the properties of materials and explore fundamental processes in living things.
Its performance to date, over the first few years of operation, has already provided a breathtaking array of world-leading results, published in the most prestigious academic journals and has inspired other XFEL facilities to be commissioned around the world.
LCLS-II will build from the success of LCLS to ensure that the U.S. maintains a world-leading capability for advanced research in chemistry, materials, biology and energy. It is planned to see first light in 2020.
LCLS-II will provide a major jump in capability – moving from 120 pulses per second to 1 million pulses per second. This will enable researchers to perform experiments in a wide range of fields that are now impossible. The unique capabilities of LCLS-II will yield a host of discoveries to advance technology, new energy solutions and our quality of life.
Analysis of the data will require transporting huge amounts of data from SLAC to supercomputers at other sites to provide near real-time analysis results and feedback to the experiments.
The talk will introduce LCLS and LCLS-II with a short video, discuss its data reduction, collection, data transfer needs and current progress in meeting these needs."
Watch the video: https://youtu.be/LkwwGh7YdPI
Learn more: https://www6.slac.stanford.edu/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Design Considerations, Installation, and Commissioning of the RedRaider Cluster at the Texas Tech University High Performance Computing Center
Outline of this talk:
• HPCC staff and students
• Previous clusters: history, performance, usage patterns, and experience
• Motivation for upgrades: compute capacity goals and related considerations
• Installation and benchmarks
• Conclusions and Q&A
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput... (zionsaint)
John Hennessy gave a talk outlining the end of Moore's law and faster general purpose computing, and opportunities for a new golden age. He discussed how three key changes - the end of Dennard scaling, slowing Moore's law transistor gains, and architectural limitations - have converged to end the steady performance increases of the past. This marks the end of an era of stunning microprocessor progress. Domain specific architectures and languages that better match applications to tailored hardware designs provide new opportunities for more efficient computing. Research into cheaper hardware development, new technologies, and the co-evolution of domains, languages and architectures could enable a new golden age.
The document discusses the GreenDroid mobile application processor, which uses specialized "conservation cores" (c-cores) to execute frequently used portions of application code and reduce energy consumption by 11x compared to conventional designs. It achieves this by filling the "dark silicon" areas of chips with these automatically generated c-cores. The c-cores are highly efficient because they remove unnecessary structures like instruction decoding. This approach converts unused silicon into significant energy savings while maintaining performance.
Java Thread and Process Performance for Parallel Machine Learning on Multicor... (Saliya Ekanayake)
The growing use of Big Data frameworks on large machines highlights the importance of performance issues and the value of High Performance Computing (HPC) technology. This paper looks carefully at three major frameworks, Spark, Flink, and Message Passing Interface (MPI), both in scaling across nodes and internally over the many cores inside modern nodes. We focus on the special challenges of the Java Virtual Machine (JVM) using an Intel Haswell HPC cluster with 24 cores per node. Two parallel machine learning algorithms, K-Means clustering and Multidimensional Scaling (MDS), are used in our performance studies. We identify three major issues, thread models, affinity patterns, and communication mechanisms, as factors affecting performance by large factors, and show how to optimize them so that Java can match the performance of traditional HPC languages like C. Further, we suggest approaches that preserve the user interface and elegant dataflow approach of Flink and Spark but modify the runtime so that these Big Data frameworks can achieve excellent performance and realize the goals of HPC-Big Data convergence.
Exploring emerging technologies in the HPC co-design space (jsvetter)
This document discusses emerging technologies for high performance computing (HPC), focusing on heterogeneous computing and non-volatile memory. It provides an overview of HPC architectures past and present, highlighting the trend toward more heterogeneous systems using GPUs and other accelerators. The document discusses challenges for applications to adapt to these changing architectures. It also explores potential future technologies like 3D memory and discusses the Department of Energy's efforts in codesign centers to facilitate collaboration between application developers and emerging hardware.
MIG 5th Data Centre Summit 2016 PTS Presentation v1 (blewington)
This document provides an overview and predictions for enterprise data center trends from 2020 onwards. It discusses how IT requirements are evolving more rapidly than traditional data center design timelines, with higher power densities and virtualization. This is driving trends towards software-defined infrastructure, hyperconverged systems, and new cooling solutions as average rack power increases. Data centers will need to adapt cooling methods to address densities reaching 10kW and beyond per rack. The pace of IT evolution poses challenges for data center operators to continuously keep up with changing power and infrastructure demands.
PLNOG 8: Ivan Pepelnjak - Data Center Fabrics - What Really Matters (PROIDEA)
The document discusses data center fabric architectures. It describes how traditional data center designs focus on north-south traffic but modern applications generate more east-west traffic between servers. New fabric architectures are needed to provide flexible workload placement and mobility. Common fabric approaches use leaf-spine Clos network designs with non-blocking switching fabrics to provide any-to-any connectivity between endpoints. Large-scale fabrics can be built today using existing switching equipment and protocols like ECMP routing rather than new technologies. The key is to keep layer 2 domains small and use overlay encapsulation for virtual networks.
This document discusses next-generation optical access networks and moving toward providing 10 Gbps connectivity everywhere. It outlines several key points:
1) It discusses the business and architectural issues with current networks and the need for a paradigm shift toward more flexible, dynamically reconfigurable networks.
2) It proposes an ultimate optical network architecture using a common infrastructure for access, metro, and backbone networks to gain statistical multiplexing benefits across different traffic patterns and usage.
3) It introduces a quantitative analysis framework using an extended equivalent circuit rate (ECR) metric to define and measure a requirement of "10 Gbps everywhere" in a quantifiable way for different network architectures.
This document proposes a framework for intelligently placing datacenters to optimize response time, availability, costs and emissions. It defines relevant parameters like costs, response time, consistency delay and availability. It formulates the placement problem as an optimization problem aiming to minimize costs while meeting constraints. The problem is solved using simulated annealing and linear programming. A tool is developed to automatically select datacenter locations based on this approach. Experimental results demonstrate millions of dollars can be saved through optimized placement.
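The simulated-annealing part of this approach can be sketched generically. The candidate sites and the single-term cost function below are made-up placeholders for the framework's multi-objective formulation (cost, response time, availability, emissions):

```python
import math
import random

def anneal(candidates, cost, iters=5000, t0=10.0, seed=0):
    """Generic simulated annealing over a discrete set of placements:
    accept worse moves with probability exp(-delta/T) as T cools."""
    rng = random.Random(seed)
    best = current = rng.choice(candidates)
    for i in range(1, iters + 1):
        t = t0 / i  # simple cooling schedule
        nxt = rng.choice(candidates)
        delta = cost(nxt) - cost(current)
        # Always accept improvements; accept regressions with decaying odds
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current = nxt
        if cost(current) < cost(best):
            best = current
    return best

# Hypothetical per-site annual costs; a real run would combine cost,
# latency, consistency-delay, availability, and emissions terms.
sites = ["oregon", "virginia", "frankfurt", "singapore"]
price = {"oregon": 3.1, "virginia": 2.8, "frankfurt": 4.0, "singapore": 4.5}
print(anneal(sites, lambda s: price[s]))  # virginia (lowest-cost site)
```

The document pairs annealing with linear programming; annealing handles the discrete site-selection search, while an LP can size capacity at the chosen sites subject to the response-time and availability constraints.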
Grid computing involves distributing computing resources across a network to tackle large problems. The Worldwide LHC Computing Grid (WLCG) was established to support the Large Hadron Collider (LHC) experiment, which produces around 15 petabytes of data annually. The WLCG uses a four-tiered model, with raw data stored at Tier-0 (CERN), copies distributed to Tier-1 data centers, computational resources provided by Tier-2 centers, and Tier-3 facilities providing additional analysis capabilities. This distributed model has proven effective in supporting the first year of LHC data collection and analysis through globally shared computing resources.
The document discusses the top 5 technologies that all organizations must understand: digital transformation, quantum computing, IoT, 5G, and AI/HPC. It provides an overview of each technology including opportunities and threats to organizations. The document emphasizes that understanding these emerging technologies is mandatory as the information revolution changes many aspects of life and business.
Preparing to program Aurora at Exascale - Early experiences and future direct... (inside-BigData.com)
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with their High Performance Servers. We provide business critical hardware to the world's leading telecom and networking equipment manufacturers with both standard and customized products. Our High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost effective, highly available and optimized to meet next generation networking and media processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market,” said Ween Niu, Advantech Vice President of Networks & Communications Group. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility, making networks easier to install, secure, automate and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence.
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: How to Achieve High-Performance, Scalable and Distributed DNN Training on Modern HPC Systems?
"This talk will start with an overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of- core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented."
Watch the video: https://youtu.be/LeUNoKZVuwQ
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
In this deck from the Stanford HPC Conference, Nick Nystrom and Paola Buitrago provide an update from the Pittsburgh Supercomputing Center.
Nick Nystrom is Chief Scientist at the Pittsburgh Supercomputing Center (PSC). Nick is architect and PI for Bridges, PSC's flagship system that successfully pioneered the convergence of HPC, AI, and Big Data. He is also PI for the NIH Human Biomolecular Atlas Program’s HIVE Infrastructure Component and co-PI for projects that bring emerging AI technologies to research (Open Compass), apply machine learning to biomedical data for breast and lung cancer (Big Data for Better Health), and identify causal relationships in biomedical big data (the Center for Causal Discovery, an NIH Big Data to Knowledge Center of Excellence). His current research interests include hardware and software architecture, applications of machine learning to multimodal data (particularly for the life sciences) and to enhance simulation, and graph analytics.
Watch the video: https://youtu.be/LWEU1L1o7yY
Learn more: https://www.psc.edu/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses using systems intelligence and artificial intelligence/neural networks to enhance semiconductor electronic design automation (EDA) workflows: telemetry data is collected from EDA jobs and infrastructure and analyzed using complex event processing, machine learning models, and messaging substrates to provide insights that can optimize EDA pipelines and infrastructure. The approach aims to allow both internal and external augmentation of EDA processes and environments through unsupervised and incremental learning.
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
In this deck from the Stanford HPC Conference, Nicole Xu from Stanford University describes how she transformed a common jellyfish into a bionic creature that is part animal and part machine.
"Animal locomotion and bioinspiration have the potential to expand the performance capabilities of robots, but current implementations are limited. Mechanical soft robots leverage engineered materials and are highly controllable, but these biomimetic robots consume more power than corresponding animal counterparts. Biological soft robots from a bottom-up approach offer advantages such as speed and controllability but are limited to survival in cell media. Instead, biohybrid robots that comprise live animals and self- contained microelectronic systems leverage the animals’ own metabolism to reduce power constraints and body as an natural scaffold with damage tolerance. We demonstrate that by integrating onboard microelectronics into live jellyfish, we can enhance propulsion up to threefold, using only 10 mW of external power input to the microelectronics and at only a twofold increase in cost of transport to the animal. This robotic system uses 10 to 1000 times less external power per mass than existing swimming robots in literature and can be used in future applications for ocean monitoring to track environmental changes."
Watch the video: https://youtu.be/HrmJFyvInj8
Learn more: https://sanfrancisco.cbslocal.com/2020/02/05/stanford-research-project-common-jellyfish-bionic-sea-creatures/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance usability of weather forecasts. I will than talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter contributes to the development and optimization of weather and climate models for modern supercomputers. He focuses on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud-resolving simulations with ECMWF's forecast model, and on the use of machine learning, in particular deep learning, to improve workflows and predictions. Peter graduated in Physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as a postdoc with Tim Palmer at the University of Oxford and took up a position as a University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Gilad Shainer from the HPC AI Advisory Council describes how this organization fosters innovation in the high performance computing community.
"The HPC-AI Advisory Council’s mission is to bridge the gap between high-performance computing (HPC) and Artificial Intelligence (AI) use and its potential, bring the beneficial capabilities of HPC and AI to new users for better research, education, innovation and product manufacturing, bring users the expertise needed to operate HPC and AI systems, provide application designers with the tools needed to enable parallel computing, and to strengthen the qualification and integration of HPC and AI system products."
Watch the video: https://wp.me/p3RLHQ-lNz
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today RIKEN in Japan announced that the Fugaku supercomputer will be made available for research projects aimed at combating COVID-19.
"Fugaku is currently being installed and is scheduled to be available to the public in 2021. However, faced with the devastating disaster unfolding before our eyes, RIKEN and MEXT decided to make a portion of the computational resources of Fugaku available for COVID-19-related projects ahead of schedule while continuing the installation process.
Fugaku is being developed not only for progress in science, but also to help build the society dubbed “Society 5.0” by the Japanese government, in which all people will live safe and comfortable lives. The current initiative to fight the novel coronavirus is driven by the philosophy behind the development of Fugaku."
Initial Projects
* Exploring new drug candidates for COVID-19 by "Fugaku" (Yasushi Okuno, RIKEN / Kyoto University)
* Prediction of conformational dynamics of proteins on the surface of SARS-CoV-2 using Fugaku (Yuji Sugita, RIKEN)
* Simulation analysis of pandemic phenomena (Nobuyasu Ito, RIKEN)
* Fragment molecular orbital calculations for COVID-19 proteins (Yuji Mochizuki, Rikkyo University)
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, which implements the READEX methodology, is also presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime; manual instrumentation, which can be combined with the automatic approach, allows the code developer to annotate regions of particular interest."
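The READEX flow described above can be sketched in a few lines: runtime situations with the same optimal settings are grouped into scenarios at design time, and at runtime each annotated region looks up its scenario's configuration and applies it before executing. All names, regions, and frequency values below are invented for illustration and are not MERIC's actual API.

```python
# Hypothetical sketch of dynamic tuning via a design-time tuning model.

TUNING_MODEL = {
    # scenario -> system configuration determined at design time
    "compute_bound": {"core_freq_ghz": 2.8, "uncore_freq_ghz": 1.6},
    "memory_bound":  {"core_freq_ghz": 2.0, "uncore_freq_ghz": 2.4},
}

REGION_SCENARIO = {
    # significant regions (annotated via instrumentation) -> scenario
    "dense_solve": "compute_bound",
    "halo_exchange": "memory_bound",
}

applied = []

def apply_config(cfg):
    # A real tool would program DVFS knobs here; we just record the switch.
    applied.append(cfg)

def run_region(name):
    # Entering a region switches the system configuration dynamically.
    apply_config(TUNING_MODEL[REGION_SCENARIO[name]])
    # ... region body executes under the switched configuration ...

for region in ["dense_solve", "halo_exchange", "dense_solve"]:
    run_region(region)
```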
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document discusses how DDN A3I storage solutions and Nvidia's SuperPOD platform can enable HPC at scale. It provides details on DDN's A3I appliances that are optimized for AI and deep learning workloads and validated for Nvidia's DGX-2 SuperPOD reference architecture. The solutions are said to deliver the fastest performance, effortless scaling, reliability and flexibility for data-intensive workloads.
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to Aarch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Acceleration
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to-market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this video from the Rice Oil & Gas Conference, Chin Fang from Zettar presents: Moving Massive Amounts of Data across Any Distance Efficiently.
The objective of this talk is to present two ongoing projects aimed at improving and ensuring highly efficient bulk transfer or streaming of massive amounts of data over digital connections across any distance. It examines the current state of the art, a few very common misconceptions, the differences among the three major types of data movement solutions, a current initiative attempting to improve data movement efficiency from the ground up, and another multi-stage project that shows how to conduct long-distance, large-scale data movement at speed and scale internationally. Both projects have real-world motivations, e.g. the ambitious data transfer requirements of the Linac Coherent Light Source II (LCLS-II) [1], a premier preparation project of the U.S. DOE Exascale Computing Initiative (ECI) [2]. Their immediate goals are described and explained, together with the solution used for each. Findings and early results are reported, and possible future work is outlined.
Watch the video: https://wp.me/p3RLHQ-lBX
Learn more: https://www.zettar.com/
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Rice Oil & Gas Conference, Bradley McCredie from AMD presents: Scaling TCO in a Post Moore's Law Era.
"While foundries bravely drive forward to overcome the technical and economic challenges posed by scaling to 5nm and beyond, Moore’s law alone can provide only a fraction of the performance / watt and performance / dollar gains needed to satisfy the demands of today’s high performance computing and artificial intelligence applications. To close the gap, multiple strategies are required. First, new levels of innovation and design efficiency will supplement technology gains to continue to deliver meaningful improvements in SoC performance. Second, heterogenous compute architectures will create x-factor increases of performance efficiency for the most critical applications. Finally, open software frameworks, APIs, and toolsets will enable broad ecosystems of application level innovation."
Watch the video:
Learn more: http://amd.com
and
https://rice2020oghpc.rice.edu/program-2/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from FOSDEM 2020, Colin Sauze from Aberystwyth University describes the development of a RaspberryPi cluster for teaching an introduction to HPC.
"The motivation for this was to overcome four key problems faced by new HPC users:
* The availability of a real HPC system and the effect that running training courses can have on it; conversely, the availability of spare resources on the real system can cause problems for the training course.
* A fear of using a large and expensive HPC system for the first time, and worries that doing something wrong might damage the system.
* That HPC systems are very abstract machines sitting in data centres that users never see, making it difficult for them to understand exactly what it is they are using.
* That new users fail to understand resource limitations; because of the vast resources in modern HPC systems, a lot of mistakes can be made before running out of resources. A more resource-constrained system makes this easier to understand.
The talk will also discuss some of the technical challenges in deploying an HPC environment to a Raspberry Pi and attempts to keep that environment as close to a "real" HPC system as possible. The issues in trying to automate the installation process will also be covered."
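The resource-limitation point above can be made concrete with a toy scheduler for a Pi-sized cluster: jobs that exceed the free cores are rejected immediately, so exhaustion becomes visible quickly. This is a hypothetical sketch (the class, core counts, and messages are invented); a real training cluster would use a batch system such as Slurm.

```python
# Hypothetical sketch: a toy scheduler that makes core limits visible.

class TinyScheduler:
    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.used = 0

    def submit(self, name, cores):
        # Reject jobs that would oversubscribe the cluster.
        free = self.total_cores - self.used
        if cores > free:
            return f"{name}: rejected (only {free} cores free)"
        self.used += cores
        return f"{name}: running on {cores} cores"

sched = TinyScheduler(total_cores=16)   # e.g. four 4-core Raspberry Pis
print(sched.submit("job1", 8))
print(sched.submit("job2", 8))
print(sched.submit("job3", 4))          # exceeds capacity -> rejected
```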
Learn more: https://github.com/colinsauze/pi_cluster
and
https://fosdem.org/2020/schedule/events/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
1. 40 powers of 10
Simulating the Universe from
quarks to galaxy clusters with the
DiRAC HPC Facility
Dr Mark Wilkinson
University of Leicester
Director, DiRAC HPC Facility
DiRAC
2. DiRAC
Distributed HPC Facility for theoretical astrophysics,
particle physics, cosmology and nuclear physics
Combined:
• ~90,000 cores
• ~5 Pflop/s
• >10 PB storage
Extreme Scaling (Edinburgh)
Data Intensive (Cambridge)
Data Intensive (Leicester)
Memory Intensive (Durham)
3. The DiRAC Facility, a brief history
2009: DiRAC-1: systems installed at 13 host sites
Nov 2011: DiRAC-2 awarded £15M capital by BIS
- 5 systems at 4 sites - Cambridge, Durham, Edinburgh, Leicester
- Procurement completed in 100 days
Dec 2012: DiRAC-2 operations begin
- Systems full within 1 week - usage >90%
- Access via international peer-review process
- Free to use for STFC-funded researchers
April 2017: DiRAC-2.5 operations begin
- 3 services: Extreme Scaling, Memory Intensive, Data Intensive
April 2018: DiRAC-2.5x
- interim funding to replace 2012 hardware and support 2018/19
science programme
Dec 2018: DiRAC-2.5y
- BEIS-funded stop-gap 2x upgrade of all DiRAC services
4. Computing in theory research
• Direct numerical simulation and modelling is a core theory research activity
• HPC systems are the main scientific instruments for theory research
• Computational requirements of models are increasing due to:
  • Increased resolution: running models with existing physics at finer scales
  • Increased complexity: introducing new physics into models to reflect progress in theoretical understanding; often needed to match resolution
  • Coupling of models: multi-physics, multi-scale modelling
  • Quantification of modelling uncertainty using large ensembles of simulations to provide robust statistics
• Constant process of refining and re-defining our tools
• Growing requirement for simulations and modelling concurrent with observations, so that models evolve in line with the data acquired
• Observational facilities need access to significant local computing capabilities, as well as the option to burst out to the larger, national facilities
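The ensemble-based uncertainty quantification mentioned above can be sketched in a few lines: run a model many times with perturbed inputs and summarise the spread. `model()` here is a hypothetical stand-in for an expensive simulation; the perturbation size is an illustrative assumption, not taken from the deck.

```python
# Minimal ensemble UQ sketch: perturb an input, run the model many times,
# and report the ensemble mean and spread as a robust-statistics summary.
import random
import statistics

def model(x: float) -> float:
    return x * x  # placeholder for an expensive simulation code

random.seed(1)  # reproducible ensemble
ensemble = [model(1.0 + random.gauss(0.0, 0.1)) for _ in range(1000)]
mean = statistics.mean(ensemble)
spread = statistics.stdev(ensemble)
print(f"mean={mean:.3f}  spread={spread:.3f}")
```

The spread gives a modelling-uncertainty estimate that a single run cannot provide, which is why large ensembles drive up computational requirements.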
5. The DiRAC approach to service design
[Diagram: Science case → Workflow assessment → Technical case → Technical design → Individual service specifications]
• Science case: peer-reviewed, scientific justification for resources
• Technical case: peer-reviewed, high-level technical specifications
• Technical design: co-design with industry partners, producing individual service specifications
6. Diverse science cases require heterogeneous architectures
• Extreme Scaling "Tesseract" (Edinburgh): 2 Pflop/s to support the largest lattice-QCD simulations
• Memory Intensive "COSMA" (Durham): 230 TB RAM to support the largest cosmological simulations
• Data Intensive "DIaL" and "CSD3" (Leicester & Cambridge): heterogeneous architecture to support complex simulation and modelling workflows
DiRAC
7. DiRAC Science Impact Analysis
[Figure 3-7: Facility Citations and Paper Record 2015-2018 - scientific impact is highly significant, sustained and of world-leading quality]
• Refereed publications since 2012: > 950 papers; > 49000 citations
• Second-most cited paper in all astronomy in 2015
• 199 refereed papers in 2017
10. Inside the proton
Image: Arpad Horvath - Own work, CC BY-SA 2.5,
https://commons.wikimedia.org/w/index.php?curid=637353
quark
Strong interactions
carried by gluons
11. Understanding the strong force is key to testing the Standard Model of Particle Physics - it binds quarks into the hadrons that we see in experiment
ATLAS@Large Hadron
Collider (LHC)
Connecting observed hadron
properties to those of quarks
requires full non-perturbative
treatment of Quantum
Chromodynamics - lattice QCD
[Event display: DALI Run=16449 Evt=4055, showing B and D meson decays into K, π and e tracks]
Calculations of hadron masses and rates for simple weak decays allow tests of the Standard Model
12. Scalable QCD: sparse matrix PDE solver communications (Boyle et al.)
• L^4 local volume (space + time)
• Finite-difference operator: 8-point stencil
• ~1/L of data references come from off-node
• Scaling the QCD sparse matrix requires interconnect bandwidth for halo exchange:
      B_network ~ (B_memory / L) x R
  where R is the reuse factor obtained for the stencil in caches
• Aim: distribute 100^4 data points over 10^4 nodes
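The bandwidth relation on this slide is easy to turn into a back-of-envelope calculator. The function below implements B_network ~ (B_memory / L) x R directly; the numbers in the example are illustrative assumptions, not DiRAC measurements.

```python
# Estimate the off-node bandwidth a node needs for QCD halo exchange,
# following B_network ~ (B_memory / L) * R from the slide above.

def network_bandwidth_needed(b_memory_gbs: float, local_extent: int,
                             reuse_factor: float) -> float:
    """Off-node bandwidth (GB/s) when ~1/local_extent of data references
    are remote and each fetched element is reused reuse_factor times."""
    return b_memory_gbs / local_extent * reuse_factor

# Example (hypothetical numbers): 100 GB/s memory bandwidth,
# a 16^4 local volume, and a cache reuse factor of 2.
need = network_bandwidth_needed(100.0, 16, 2.0)
print(f"{need:.1f} GB/s per node")  # 12.5 GB/s per node
```

The 1/L surface-to-volume scaling is why larger local volumes relax the interconnect requirement, while strong scaling (shrinking L) makes the network the bottleneck.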
14. Grid QCD code (Boyle et al.)
Design considerations:
• Performance portable across multi- and many-core CPUs: SIMD ⊗ OpenMP ⊗ MPI
• Performance portable to GPUs: SIMT ⊗ offload ⊗ MPI
• N-dimensional Cartesian arrays
• Multiple grids
• Data-parallel C++ layer: Connection Machine inspired
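The layered SIMD ⊗ OpenMP ⊗ MPI decomposition above can be illustrated with simple arithmetic: MPI ranks split the global lattice into local volumes, threads split the local volume, and SIMD lanes pack several sites per vector. The numbers below are hypothetical and are not Grid's actual internals.

```python
# Illustrative sketch of a layered lattice decomposition per dimension:
# global lattice -> per-rank local volume -> SIMD vectors of packed sites.

def decompose(global_extent: int, ranks_per_dim: int, simd_lanes: int):
    """Return (sites per rank per dim, SIMD vectors per dim)."""
    local = global_extent // ranks_per_dim   # sites owned by each MPI rank
    vectors = local // simd_lanes            # vectors after SIMD packing
    return local, vectors

# A 32^4 lattice split 2 ways per dimension with 4-wide SIMD:
local, vectors = decompose(32, 2, 4)
print(local, vectors)  # 16 4
```

Making all three layers explicit in the data layout is what lets the same source target SIMD on CPUs and SIMT on GPUs.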
15. DiRAC HPE ICE-XA hypercube network
• Edinburgh HPE 8600 system (installed March 2018, 2 days ahead of schedule)
• Low-end Skylake Silver 4116, 12-core parts
• Single-rail Omni-Path interconnect
• Relatively cheap node: high node count and scalability
• Improve price per core and bandwidth per core; reduce power
[Plot: Tesseract performance per node (GF/s per node, 0-600) vs. node count (1-256) for 12^4, 16^4 and 24^4 local volumes]
• 16 nodes (single switch) deliver bidirectional 25 GB/s to every node (wire speed)
• 512 nodes, topology-aware: bidirectional 19 GB/s
• 76% of wire speed using every link in the system concurrently
• Small project with SGI/HPE on Mellanox EDR networks (James Southern)
• Embed a 2^n QCD torus inside the hypercube so that nearest-neighbour comms travel a single hop
• 4x speed-up over default MPI Cartesian communicators on large systems
⇒ Customise the HPE 8600 (SGI ICE-XA) to use 16 = 2^4 nodes per leaf switch
Boyle et al.
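The classic way to embed a periodic 2^n torus dimension in a hypercube, so that ring neighbours are one switch hop apart, is a binary-reflected Gray code: consecutive ring positions map to node labels differing in exactly one bit. This sketch demonstrates the mapping idea only; the actual HPE/SGI implementation is not described in the deck.

```python
# Gray-code embedding of a 2^n ring into a hypercube: neighbouring ring
# positions map to hypercube labels that differ in exactly one bit,
# i.e. nearest-neighbour halo exchange is a single hop.

def gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)

def one_hop(a: int, b: int) -> bool:
    """True if hypercube labels a and b differ in exactly one bit."""
    return bin(a ^ b).count("1") == 1

n = 4  # 2^4 = 16 nodes per leaf switch, as on the slide above
ring = [gray(i) for i in range(2 ** n)]
# Every neighbouring pair on the periodic ring, wrap-around included,
# is a single hypercube hop apart:
ok = all(one_hop(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))
print(ok)  # True
```

A default MPI Cartesian communicator makes no such guarantee, which is why a topology-aware mapping can win the 4x reported above.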
16. Other Turing work: DiRAC / Alan Turing Institute / Intel collaboration
• Intel's US Patent Application 20190042544 (FP16-S7E8 MIXED PRECISION FOR DEEP LEARNING AND OTHER ALGORITHMS)
• Authors: Boyle (ATI, Edinburgh), Sid Kashyap, Angus Lepper (Intel, former DiRAC RSEs)
• Systematic study using the GNU Multiple Precision (GMP) library
• BF16 displays greater numerical stability for machine-learning training
• Understanding: histogram of multiply results during SGD gradient calculations
• Patent application full text searchable on uspto.gov
17. Software Innovation - AI on HPC (Boyle et al., 2017)
[Fig. 5: Wall-clock time per reduction call vs. vector length, before and after optimisation. Large-vector reduction performance is ten times better after the optimisations; the gain includes both computation and communication acceleration.]
• The interface to memory management was not ideal: the dealloc and alloc routines provided by collectives.h should have been used consistently, with any other "free" operation declared illegal; this would also have enabled simpler modification of the allocation and deallocation implementation
• It would have been better to separate the vector allocation from the reduction operation, so that in hot loops the programmer could choose to reuse the same allocation
• The first optimisations were high level: (i) remove the expectation that the caller deallocates the returned vector
• Demonstration of a factor-10 speed-up in the Baidu Research optimised reduction code - a publicly available code designed to optimise the performance-limiting steps in distributed machine learning (ML)
• Potentially disruptive implications for the design of cloud systems - shows that ML workflows can achieve a 10x performance improvement when fully optimised and implemented on traditional HPC architectures
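The allocation-reuse pattern described on this slide - separate buffer allocation from the reduction so hot loops reuse one allocation - can be sketched in plain Python. This is a stand-in for the real allreduce code; the class and function names are illustrative, not the actual API.

```python
# Two versions of a sum-reduction over chunks of equal-length vectors.

def reduce_alloc_per_call(chunks):
    """Naive version: allocates a fresh output vector on every call."""
    out = [0.0] * len(chunks[0])
    for c in chunks:
        for i, v in enumerate(c):
            out[i] += v
    return out

class Reducer:
    """Optimised pattern: allocation separated from the hot reduction loop."""
    def __init__(self, n):
        self._buf = [0.0] * n          # allocated once, reused every call
    def reduce(self, chunks):
        for i in range(len(self._buf)):
            self._buf[i] = 0.0         # reset in place, no reallocation
        for c in chunks:
            for i, v in enumerate(c):
                self._buf[i] += v
        return self._buf

r = Reducer(4)
print(r.reduce([[1.0] * 4, [2.0] * 4]))  # [3.0, 3.0, 3.0, 3.0]
```

In a tight training loop called millions of times, moving the allocation out of the call is exactly the kind of high-level change that contributed to the factor-10 gain reported above.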
[Figure 1: Left panel: a 'fan' plot of the light-quark ud and us decay constants (normalised with the average ud, us decay constant) for one lattice spacing, plotted against the pion mass. Right panel: the heavy-quark ub and sb decay constants, again normalised, for several lattice spacings, also plotted against the pion mass. This programme has now been extended to the heavier quark masses, in particular the b [2]; the right panel gives the equivalent plot for the B and Bs mesons, together with the present (FLAG16) values.]
Bornyakov et al. 2017; Hollitt et al. 2018
Testing the Standard Model of particle physics
• Mesons are composed of quark-antiquark pairs
• Their decay rates into other particles provide a strong test of the Standard Model
• Anomalies may point to new physics
• DiRAC researchers have developed a new, computationally efficient technique for estimating decay rates
19. DiRAC@Durham - Memory Intensive (MI)
• COSMA6 (2016)
• IBM/Lenovo/DDN
• 8192 Intel SandyBridge, 2.6 GHz cores
• Combined 65 Tbyte of RAM;
• Mellanox FDR10 interconnect in 2:1 blocking;
• 2.5 Pbyte of Lustre data storage
• COSMA7 (2018)
• Dell
• 23,600 Intel Skylake 4116 cores
• Interconnect: Mellanox EDR in a 2:1 blocking configuration with
islands of 24 nodes;
• Combined 230 Tbyte of RAM;
• a fast checkpointing i/o system (343 Tbyte) with peak performance
of 185 Gbyte/second write and read;
• >4 PB storage
Durham – MI
m was delivered by Dell in March 2018;
y Alces
service started on 1 May 2018
engagement:
closely with the Industrial Strategy of
partment for Business, Energy and
ial Strategy (BEIS).
g leads to industrial engagement; this
in innovation, both leading to
ts academia and the wider industry.
20. The Universe, a brief history
Credit: NASA / WMAP Science Team
DiRAC supports
calculations on all
physical scales
21. The components of the Universe
Credit: NASA / WMAP Science Team & Planck Satellite Team
26.8%
4.9%
68.3%
22. ICC Durham; Virgo Consortium
The EAGLE Simulation
DiRAC
Dark matter (N-body)
24. Research Software Engineers
“With great power comes great responsibility”
• Science requirements for DiRAC-3 demand 10-40x increases in
computing power to stay competitive
- hardware alone cannot deliver this
• We can no longer rely on “free lunch” from the Xeon era
• Vectorisation and code efficiency now critical
• Next generation hardware is more difficult to program efficiently
• RSEs are increasingly important
- RSEs can help with code profiling, optimisation, porting, etc
• DiRAC now has 3 RSEs: effort allocated via peer review process
[Plot: algorithmic vs. computational gains - David Keyes]
25. The Evolution of the Universe
[Timeline: age of the Universe from the Big Bang to the present day (1-13 Gyr), marking reionization, the first galaxies, the Milky Way and the formation of the Solar System]
The first simulation on the DiRAC-2.5y Memory Intensive system at Durham was carried out with SWIFT, a revolutionary new cosmological hydrodynamics code developed at Durham which is 20x faster than the state of the art.
Key problems:
• Origin of galaxies
• Identity of the dark matter
• Nature of the dark energy
26. • A fast checkpointing i/o system (343 Tbyte) with peak performance of
185 Gbyte/second write and read
• 15 Lustre Object Storage Servers on Dell 640 nodes:
• 2 x Intel Skylake 5120 processors
• 192 Gbyte of RAM
• 8 x 3.2 TB NVMe SFF drives.
• 1 Mellanox EDR card
• A user code benchmark produced 180 Gbyte/second write and read –
this is almost wire-speed!
• At time of installation this was the fastest filesystem in production in
Europe
Memory Intensive:
burst buffer for checkpointing
28. Memory Intensive: burst buffer for checkpointing
Core hours lost in checkpointing over 5 years (Heck 2018):

Write speed   Hours/snapshot   24-hr   12-hr   6-hr    4-hr    2-hr    1-hr
30 GB/sec     0.95             7.1M    14.2M   28.4M   42.5M   85.1M   170.1M
120 GB/sec    0.24             1.8M    3.5M    7.1M    10.6M   21.3M   42.5M
140 GB/sec    0.20             1.5M    3.0M    6.1M    9.1M    18.2M   36.5M

• Total number of available CPU hours per year: 36M (4,116 cores)
• ~13% gain in available core hours due to faster checkpointing
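The table's arithmetic is straightforward to reproduce. The snapshot size below (~102.6 TB) is not stated in the deck; it is inferred from the 0.95 hours per snapshot at 30 GB/sec, so treat it as an assumption.

```python
# Core hours lost to checkpointing over 5 years for a given write speed
# and snapshot cadence, reproducing the table's first column.

def core_hours_lost(write_gb_per_s: float, snapshot_period_hr: float,
                    snapshot_tb: float = 102.6, cores: int = 4116,
                    years: float = 5.0) -> float:
    hours_per_snapshot = snapshot_tb * 1000 / write_gb_per_s / 3600
    snapshots = years * 365 * 24 / snapshot_period_hr
    return hours_per_snapshot * snapshots * cores

# 30 GB/sec with daily snapshots loses ~7.1M core hours over 5 years,
# matching the 24-hr cell of the table.
print(f"{core_hours_lost(30.0, 24.0) / 1e6:.1f}M")  # 7.1M
```

Rerunning with 140 GB/sec shows the burst buffer's payoff: the loss drops by roughly the factor of the bandwidth increase, which is where the ~13% gain in available core hours comes from.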
29. Planet collisions: Uranus collision?
A giant impact explains Uranus' spin axis and cold temperature.
[Figure 6 (Kegerreis et al. 2019, subm.): mid-collision snapshot of a grazing impact with 10^8 SPH particles (compared with the more head-on collision in Fig. 5), coloured by material and specific internal energy (10^6-10^8 J/kg), showing the detailed evolution and mixing that can now be resolved. Light and dark grey show the target's ice and rock material; purple and brown show the same for the impactor; light blue is the target's H-He atmosphere.]
• 10^8-particle simulations carried out on the DiRAC Memory Intensive service (Durham) using the new hydro + N-body code, SWIFT
• Hardware and software for this calculation were developed with DiRAC support.
30. DiRAC & the discovery of Gravitational waves
On September 14, 2015 at 09:50:45 UTC, the LIGO Hanford, WA, and Livingston, LA, observatories detected the gravitational-wave signal GW150914 from a binary black hole merger.
• Simulations of binary black hole mergers performed on the DiRAC DataCentric system (COSMA5)
• Crucial for the interpretation of the LIGO gravitational-wave detection
Abbott et al., (2016)
DiRAC
31. DiRAC@Leicester – Data Intensive (DIaL)
• HPE system
• 400 compute nodes
• 2x Intel Xeon Skylake 6140, 2.3GHz, 18-core processors
• Dual FMA AVX512
• 192 GB RAM
• 1 x 6TB SuperDome Flex with 144 X6154 cores (3.0GHz)
• 3 x 1.5TB fat nodes with 36 X6140 cores
• Mellanox EDR interconnect in a 2:1 blocking setup
• 3PB Lustre storage.
• 150 TB flash storage for data intensive workflows
HPE/Arm/SUSE Catalyst UK Arm system
• 4,000-core ThunderX2-based cluster installed in January 2019
• InfiniBand all-to-all interconnect
32. Shared platform with local Cambridge and EPSRC Tier 2 clusters
• Dell multi-architecture system (Skylake, KNL, Nvidia GPU)
• 484 Skylake nodes
• 2 x Intel Xeon Skylake 6142 2.6GHz 16-core processors
• Mix of 192 GB/node and 384 GB/node
• 44 Intel KNL nodes
• Intel Xeon Phi CPU 7210 @ 1.30GHz
• 96GB of RAM per node
• 12 NVIDIA GPU nodes
• four NVIDIA Tesla P100 GPUs
• 96GB memory per node
• connected by Mellanox EDR Infiniband
• Intel OmniPath interconnect in 2:1 blocking configuration
• 1.5PB of Lustre disk storage;
• 0.5PB flash storage-based “Data Accelerator”
DiRAC@Cambridge – Data Intensive (DIaC)
34. DiRAC HPC Training
• DiRAC provides access to training from wide pool of providers
• Currently offering:
- DiRAC Essentials Test: now available online (and
compulsory!)
- Workshops and Hackathons
• Coming soon:
- Domain-specific workshops
- Online individual training portal
Why do we do this?
- maximise DiRAC science output
- flexibility to adopt most cost-effective technologies
- future-proofing our software and skills
- contributes to increasing skills of wider economy
35. (Better systems) + (Better software) = Better science
This requires:
• More engagement in hardware and software co-design
• Enhanced training and knowledge transfer
DiRAC