I gave this talk at the DevOps meetup in Kraków, Poland. It was a lightning talk covering High Availability solutions: architecture, planning and deployment.
High Availability can be a curiously nebulous term, and most people probably don't care about it until they can't access their online banking service, or their plane crashes.
This presentation examines some of the considerations necessary when building highly available computer systems, then focuses on the HA infrastructure software currently available from the Corosync/OpenAIS, Linux-HA and Pacemaker projects.
Originally presented at Linux Users Victoria in April 2010 (http://luv.asn.au/2010/04/06)
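Availability targets are usually expressed as a percentage of uptime ("nines"). As an illustration that does not come from the talk itself, here is a minimal Python sketch that turns a target into a yearly downtime budget and compares a chain of components with redundant replicas. It assumes a 365-day year, independent failures and instantaneous failover, which real Pacemaker/Corosync clusters only approximate.

```python
# Availability arithmetic for HA planning (illustrative sketch).
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget for an availability between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """Every component must be up (e.g. app -> database -> storage)."""
    result = 1.0
    for a in components:
        result *= a
    return result


def redundant_availability(a: float, replicas: int) -> float:
    """Any one of `replicas` independent nodes is enough (ideal failover)."""
    return 1.0 - (1.0 - a) ** replicas


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} availability -> {allowed_downtime_minutes(target):.1f} min/year")
```

Two 99% components in series give about 98% (0.99 × 0.99 = 0.9801), while two redundant 99% nodes give about 99.99%, which is why HA designs focus on removing single points of failure rather than only hardening each box.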
IBM Configuration Assistant for z/OS Communications Server update (zOSCommserver)
New capabilities for managing TCP/IP disaster recovery and planned outage configurations were shipped for the Network Configuration Assistant (formerly Configuration Assistant) via APAR PI97737 in July 2018. This presentation will provide a brief background on TCP/IP configuration with Network Configuration Assistant, and then introduce and explain the new capabilities.
Rundeck is an open source automation tool that allows users to break processes down into reusable workflows called jobs. It provides a central platform for visibility of operations tasks and enables teams to easily share tasks. Rundeck aims to connect disparate tools and resources through its APIs. The document discusses how Rundeck is used in different organizations for tasks like continuous delivery, data processing, test environment provisioning, and more. It provides demonstrations of Rundeck's job scheduling capabilities and plugin ecosystem. The document outlines Rundeck's system architecture and roadmap and encourages users to get involved through discussions, writing plugins, or sponsoring features.
Server virtualization concepts allow partitioning of physical servers into multiple virtual servers using virtualization software and hardware techniques. This improves resource utilization by running multiple virtual machines on a single physical server. Server virtualization provides benefits like reduced costs, higher efficiency, lower power consumption, and improved availability compared to running each application on its own physical server. Key components of server virtualization include virtual machines, hypervisors, CPU virtualization using techniques like Intel VT-x or AMD-V, memory virtualization, and I/O virtualization through methods like emulated, paravirtualized or direct I/O. KVM and QEMU are popular open source virtualization solutions, with KVM providing kernel-level virtualization support and Q
Virtualization allows multiple operating systems to run on a single physical system by sharing hardware resources. It is enabled by a hypervisor, which controls the host system's processor and resources and allocates them to guest virtual machines. Virtualization improves resource utilization and reduces costs by pooling physical hardware and allocating virtual resources on demand. However, the hypervisor is a potential point of failure, because it controls the entire system and is part of the trusted computing base. Some approaches aim to reduce this risk by removing the hypervisor or restricting its control.
The document discusses managing egress traffic from Kubernetes applications using Istio. It describes the challenge of needing a stable outbound IP for compliance or security reasons. It then provides examples of solutions like VM-based NAT and external proxies before introducing Istio's egress gateway as a cluster-native solution. The document demonstrates how to configure an egress gateway to direct traffic to external services and shows how it provides a stable IP and protocol controls. It also discusses new features in Istio 1.2 like improved testing and release processes.
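[SC03] Disaster recovery for Active Directory: natural disasters, human error, cyberattacks. When they strike, can your IT infrastructure keep running? (de:code 2017)
This session uses concrete examples to show what IT administrators should do when Active Directory or Azure Active Directory can no longer be used normally, whether because of a large-scale disaster, an administrator's operational mistake, or the takeover of an administrator account in a targeted attack, and what they should prepare in advance.
Target audience: engineers responsible for designing and operating Active Directory
Products/technologies: Microsoft Azure / identity (AD/Azure AD) / cloud / business continuity / operations / security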
渡辺 元気
NTT Communications Corporation
Cloud Services Division
Senior Specialist (主査)
The document discusses load balancing with NSX-T. It covers the key building blocks of load balancing: load balancers, virtual servers, pools, and monitors. It also describes load balancing modes such as inline and one-arm, and algorithms such as round robin, least connection, and IP hash. The document concludes with steps to set up an inline load balancing configuration in NSX-T.
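The three algorithms named above can be sketched in a few lines of Python. This is a hypothetical illustration of the general techniques, not NSX-T's implementation; the member names are placeholders.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out pool members in a fixed rotation."""

    def __init__(self, members):
        self._cycle = itertools.cycle(members)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnection:
    """Send each new connection to the member with the fewest active ones."""

    def __init__(self, members):
        self.active = {m: 0 for m in members}

    def pick(self, client_ip=None):
        # min() returns the first member in pool order when counts tie.
        member = min(self.active, key=self.active.get)
        self.active[member] += 1
        return member

    def release(self, member):
        """Call when a connection to `member` closes."""
        self.active[member] -= 1


class IpHash:
    """Always map the same client IP to the same member (source affinity)."""

    def __init__(self, members):
        self.members = list(members)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.members[int.from_bytes(digest[:4], "big") % len(self.members)]
```

IP hash keeps a client on one server without shared session state, at the cost of uneven load when a few clients dominate traffic; with plain modulo hashing, adding or removing a member also remaps most clients.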
The Linux crontab lets you schedule routine jobs to run automatically in the background at specific times or on specific days. The document provides 15 examples of cron job configurations, including jobs that run daily, weekly, monthly, at startup or reboot, and during specific time ranges. It also covers viewing, editing, and installing cron jobs, as well as redirecting output and specifying environment variables. Anacron is introduced as an alternative for machines that do not run 24/7, because it runs jobs that were missed while the machine was off.
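To make the five crontab fields concrete (minute, hour, day of month, month, day of week), here is a minimal Python sketch that checks whether a schedule fires at a given minute. It handles `*`, single values, ranges, lists and `/` steps, but not names such as `MON` or `JAN`, nor shortcuts like `@reboot`; the day-of-month/day-of-week rule follows crontab(5) for Vixie-derived crons.

```python
from datetime import datetime

# (low, high) bounds for: minute, hour, day of month, month, day of week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one crontab field ('*', '5', '1-5', '*/15', '1,15') to a set."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < low or end > high or start > end:
            raise ValueError(f"value out of range in field {field!r}")
        values.update(range(start, end + 1, step))
    return values


def matches(expr: str, when: datetime) -> bool:
    """Return True if a 5-field crontab schedule fires at minute `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: min hour dom month dow")
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if 7 in dow:
        dow.add(0)  # both 0 and 7 mean Sunday
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in dom
    dow_ok = cron_dow in dow
    # crontab(5): if both day fields are restricted (don't start with '*'),
    # the job runs when EITHER matches; otherwise both must match.
    if fields[2].startswith("*") or fields[4].startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok)
```

For example, `matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0))` is true because 1 January 2024 was a Monday, so a weekday 09:00 job fires.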
Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1) - AWS re:... (Amazon Web Services)
Many customers are using Amazon EC2 instances to run applications with high performance networking requirements. In this session, we provide an overview of Amazon EC2 network performance features— including enhanced networking, ENA, and placement groups—and discuss how we are innovating on behalf of our customers to improve networking performance in a scalable and cost-efficient manner. We share best practices and performance tips for getting the best networking performance out of your Amazon EC2 instances.
This document discusses virtualization, containers, and hyperconvergence. It provides an overview of virtualization and its benefits including hardware abstraction and multi-tenancy. However, virtualization also has challenges like significant overhead and repetitive configuration tasks. Containers provide similar benefits with less overhead by abstracting at the operating system level. The document then discusses how hyperconvergence combines compute, storage, and networking to simplify deployment and operations. It notes that many hyperconverged solutions still face virtualization challenges. The presentation argues that combining containers and hyperconvergence can provide both the benefits of containers' efficiency and hyperconvergence's scale. Stratoscale is presented as a solution that provides containers as a service with multi-tenancy, SLA-driven performance
In this session, Lucian talks about monitoring CloudStack and its related components: what the best practices are, and what you need to track closely to ensure your cloud's reliability.
Lucian is a long-time sysadmin and Apache CloudStack user and contributor. He has a background in hosting, virtualisation and datacentre operations, but is now working full time on CloudStack.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14-16 November in Sofia, Bulgaria, and online. The hybrid event brought together 370 attendees from the global CloudStack community and hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, including technical talks, user stories, and presentations of new features and integrations.
Virtualization involves dividing the resources of a computer system into multiple execution environments. It works by inserting a thin virtualization layer that allows multiple operating systems to run concurrently on a single physical machine while sharing hardware resources. Virtualization provides significant benefits such as improved hardware utilization, simplified management, reduced costs, and improved fault tolerance.
The document provides instructions for deploying Prometheus and the Kube Prometheus Stack on NKS. Key steps include:
1. Deploying Prometheus using Helm with custom storage class and service type settings.
2. Verifying successful deployment by checking pods, services, and accessing the Prometheus UI.
3. Deploying the Kube Prometheus Stack using Helm, again with custom storage class and service type settings.
4. Verifying successful deployment, including checking pods and services and logging in to the Grafana UI with the default credentials to view pre-configured dashboards populated with Prometheus data.
This document provides an introduction to Docker. It discusses why Docker is useful for isolation, being lightweight, simplicity, workflow, and community. It describes the Docker engine, daemon, and CLI. It explains how Docker Hub provides image storage and automated builds. It outlines the Docker installation process and common workflows like finding images, pulling, running, stopping, and removing containers and images. It promotes Docker for building local images and using host volumes.
1) Apache Kafka is a distributed streaming platform that can be used for publish-subscribe messaging and storing and processing streams of data. However, there are many potential anti-patterns to be aware of when using Kafka.
2) Some common anti-patterns include not configuring data durability properly, ignoring errors and exceptions, failing to use Kafka's built-in retries and idempotence features, and not embracing Kafka's at-least-once processing semantics.
3) It is also important to configure Kafka properly for production: tuning OS settings, reading the documentation on best practices, implementing monitoring, and designing topics and partitioning carefully.
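Embracing at-least-once semantics usually means making the consumer side idempotent, because a record can be redelivered (for example, after a rebalance that happens before offsets are committed). A minimal sketch in plain Python, with no Kafka client: the `Record` type and `IdempotentSink` class are hypothetical, and it assumes each partition is processed in offset order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    topic: str
    partition: int
    offset: int
    value: int


@dataclass
class IdempotentSink:
    """Applies each (topic, partition, offset) at most once.

    In a real system the stored offsets and the results should be written
    in the same transaction so that they cannot diverge after a crash.
    """
    total: int = 0
    last_offset: dict = field(default_factory=dict)

    def apply(self, record: Record) -> bool:
        key = (record.topic, record.partition)
        if record.offset <= self.last_offset.get(key, -1):
            return False  # redelivered duplicate: skip the side effect
        self.total += record.value
        self.last_offset[key] = record.offset
        return True
```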
Introduction To Docker, Docker Compose, Docker Swarm (An Nguyen)
This document provides an introduction to Docker, Docker Compose, and Docker Swarm. It begins with an agenda and questions to gauge audience familiarity. It then defines Docker as a container engine that packages applications and dependencies into standardized units. Key differences between containers and virtual machines are outlined. Docker Compose is introduced as a tool to define and run multi-container applications with YAML files. Docker Swarm is a clustering tool that allows managing Docker nodes as a single virtual system for scaling and updating applications. The document demonstrates several Docker concepts and commands.
The document discusses Docker and containerization. It introduces Docker Enterprise Edition, which provides end-to-end features for container apps along with enterprise-grade security and support. It also discusses Docker Assemble, a tool that can build an optimized Docker image from source code without a Dockerfile by detecting frameworks, adding dependencies, and optimizing the image. The document demonstrates using Docker Assemble and deploying containers to Docker Universal Control Plane (UCP) for cluster management.
High Availability (HA) Explained - second edition (Maciej Lasyk)
I gave this talk at one of the biggest Linux conferences in Poland, the 11th Linux Session, which took place in Wrocław on 5-6 April 2014. It was a lightning talk covering High Availability solutions: architecture, planning and deployment.
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ... (confluent)
We recently learned about “Fault Tree Analysis” and decided to apply the technique to bulletproof our Apache Kafka deployments. In this talk, learn about fault tree analysis and what you should focus on to make your Apache Kafka clusters resilient.
This talk should provide a framework for answering the following common questions a Kafka operator or user might have:
What guarantees can I promise my users?
What should my replication factor be?
What should the ISR setting be?
Should I use RAID or not?
Should I use external storage such as EBS or local disks?
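For the replication-factor and ISR questions, a common back-of-the-envelope model (not necessarily the framework from the talk) assumes producers use `acks=all`, `unclean.leader.election.enable=false`, and that all replicas were in sync before any failure. The Python sketch below returns how many broker failures a single partition can absorb under that model.

```python
def kafka_fault_tolerance(replication_factor: int, min_insync_replicas: int) -> dict:
    """Rough per-partition failure budget for producers using acks=all.

    Assumes unclean leader election is disabled and that all replicas
    were in sync before the failures started.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        # acks=all writes are rejected once the ISR shrinks below min.insync.replicas
        "broker_failures_tolerated_for_writes": replication_factor - min_insync_replicas,
        # an acknowledged write exists on at least min.insync.replicas brokers
        "broker_failures_tolerated_for_acked_data": min_insync_replicas - 1,
    }
```

Under this model, a replication factor of 3 with `min.insync.replicas=2` (an often-recommended pairing) survives one broker failure on both counts, while `min.insync.replicas=1` keeps writes flowing longer but lets an acknowledged write live on the leader alone.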
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ... (Amazon Web Services)
The Dynamo paper started a revolution in distributed systems. Its contributions still shape the design and practices of some of the world's largest distributed systems, including those at Amazon.com and beyond. Building distributed systems is hard, but our goal in this session is to simplify the topic and empower the hacker in you! Have you been bitten by the eventual consistency bug lately? We show you how to tame eventual consistency and make it a great scaling asset. As you scale up, you must be ready to deal with node, rack, and data center failures. We share insights on how to limit the blast radius of the individual components of your system, battle-tested techniques for simulating failures (network partitions, data center failure), and how we used core distributed systems fundamentals to build highly scalable, performant, durable, and resilient systems. Come watch us uncover the secret sauce behind Amazon DynamoDB, Amazon SQS, Amazon SNS, and the fundamental tenets that define them as Internet-scale services. To turn this session into a hacker's dream, we go over design and implementation practices you can follow to build an application with virtually limitless scalability on AWS within an hour. We even share insights and secret tips on how to make the most of one of the services released during the morning keynote.
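The Dynamo paper makes consistency tunable with three parameters: N replicas, a read quorum R and a write quorum W. With strict (not sloppy) quorums, R + W > N guarantees that every read quorum overlaps every write quorum, so a read reaches at least one replica holding the latest acknowledged write. A small Python sketch checks that rule against brute-force enumeration; it illustrates the paper's idea only and says nothing about how DynamoDB is implemented internally.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style rule: every read quorum intersects every write quorum."""
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating all replica subsets."""
    replicas = range(n)
    return all(
        set(read) & set(write)  # an empty intersection counts as False
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )
```

For example, N=3, R=2, W=2 overlaps, while R=1, W=1 favours latency and availability and is only eventually consistent.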
Scaling your Kafka streaming pipeline can be a pain - but it doesn't have to ... (HostedbyConfluent)
Kafka data pipeline maintenance can be painful.
It usually comes with complicated and lengthy recovery processes, scaling difficulties, traffic ‘moodiness’, and latency issues after downtimes and outages.
It doesn’t have to be that way!
We’ll examine one of our multi-petabyte-scale Kafka pipelines and go over some of the pitfalls we’ve encountered. We’ll offer solutions that alleviate those problems and compare the before and after. We’ll then explain why some common-sense solutions do not work well, and offer an improved, scalable and resilient way of processing your stream.
We’ll cover:
• Costs of processing in stream compared to in batch
• Scaling out for bursts and reprocessing
• Making the tradeoff between wait times and costs
• Recovering from outages
• And much more…
The document discusses MySQL Group Replication, which is a plugin that provides multi-master replication capability for MySQL. It allows data to be replicated between multiple MySQL servers so that they can stay in sync. The replication works by having each server send transaction writesets to other servers through a group communication system, and then each server certifies and applies the changes locally in an asynchronous manner.
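The certification step can be illustrated with a deliberately simplified Python model. It is not MySQL's actual implementation (which works with hashed write sets and GTID-based snapshot information); it only shows the idea that, because every member certifies the same transactions in the same delivery order, all members independently reach the same commit or abort decision. The row keys and version numbers are hypothetical.

```python
class Certifier:
    """Toy, deterministic certification in the spirit of Group Replication.

    Every member runs the same certifier over transactions delivered in the
    same total order by the group communication system.
    """

    def __init__(self):
        self.version = 0        # number of transactions certified so far
        self.last_write = {}    # row key -> version that last wrote it

    def certify(self, snapshot_version: int, writeset: set) -> bool:
        # Conflict: a row in this writeset was changed by a transaction that
        # was certified after the snapshot this transaction executed on.
        if any(self.last_write.get(key, 0) > snapshot_version for key in writeset):
            return False
        self.version += 1
        for key in writeset:
            self.last_write[key] = self.version
        return True
```

Two transactions that both ran on snapshot 0 and updated the same row conflict, so only the first one delivered commits; a transaction that ran on a newer snapshot that already included the first update passes.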
I gave this presentation at the Oracle InSync09 Conference in Sydney in May 2009. It's all about Oracle Coherence - Napster for the enterprise - and how you can use it to get the most out of your applications.
Use Coherence the way our customers do today:
- Sharing web session state across multiple portals
- Caching the results of calls to back-end systems
- State management for stateful services: bringing the processing to the data
- Processing large XML payloads more quickly and efficiently
The document discusses various topics related to optimizing MySQL performance, including database engines, basic settings, useful utilities, and queries. It begins by describing different MySQL storage engines like InnoDB, MyISAM, Memory, NDB and others. It then covers important configuration settings like query_cache_size, innodb_buffer_pool_size, and others. Utilities for MySQL performance analysis and tuning are presented, such as tuning-primer.sh, mysql-tuner.pl, and tools from the maatkit collection. Best practices for query optimization are also covered, such as using ORM frameworks, proper indexing, and making queries concrete. The document concludes by providing contact details for the author.
The document discusses continuous deployment practices at Outbrain, an online content recommendation company. It emphasizes the importance of short feedback loops between code changes and user exposure through practices like deploying new code multiple times daily and testing code changes automatically before deployment. Infrastructure is codified and deployment is automated using tools like Chef to further streamline the process.
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na... (HostedbyConfluent)
This document discusses whether it is better to process data using a stream or batch approach. It describes how one company evolved their data pipeline from a micro-batch streaming process to a batch approach. The streaming process was very expensive, costing $400,000 per year to run. It also had issues with wasted resources during idle times, slow processing during bursts of data, and long recovery times from outages. The company rearchitected the process to use discrete time windows run in isolated batch jobs. This new batch approach reduced costs by 60% to $160,000 per year and improved processing efficiency and outage recovery times.
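The cost figures above are consistent with each other. A short Python check, using integer dollars to avoid floating-point rounding, also gives the implied annual saving:

```python
def annual_cost_after(before_usd: int, reduction_pct: int) -> int:
    """Apply a percentage reduction using integer arithmetic (no float drift)."""
    return before_usd * (100 - reduction_pct) // 100


streaming_cost = 400_000                      # USD per year, figure quoted above
batch_cost = annual_cost_after(streaming_cost, 60)
print(batch_cost, streaming_cost - batch_cost)  # 160000 240000
```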
Microservices: 5 things I wish I'd known, Codemotion (Vincent Kok)
Microservices are hot! A lot of companies are experimenting with this architectural pattern, which greatly benefits the software development process. When adopting new patterns, we always encounter that moment where we think 'if only I had known this three months ago'. This talk will be a sneak peek into the world of microservices at Atlassian and reveal what we've learned about microservices: how to arrange, configure and build your code efficiently; deployment and testing; and how to operate effectively in this environment. In this talk you will learn five simple strategies you can apply immediately.
This document provides an overview of performance tuning for Java applications. It discusses top-down and bottom-up performance analysis approaches. It also covers choosing the right garbage collector and JVM tuning basics like calculating allocation rates and live data size from GC logs. The document shows examples of tuning JVM settings for latency using CMS and G1 collectors as well as tuning for throughput using ParallelOldGC.
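Estimating the allocation rate from GC logs reduces to simple arithmetic once the log is parsed: everything allocated between two young collections shows up as young-generation growth from the end of one GC to the start of the next. A minimal Python sketch, assuming the events have already been extracted from the log (log formats differ between JVM versions and collectors) and ignoring objects allocated directly in the old generation; the `YoungGc` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YoungGc:
    """One young-generation collection, already parsed from a GC log."""
    time_s: float           # seconds since JVM start
    young_before_mb: float  # young-gen occupancy when the GC started
    young_after_mb: float   # young-gen occupancy when the GC finished


def allocation_rate_mb_s(events: list[YoungGc]) -> float:
    """Average allocation rate between the first and last collection."""
    if len(events) < 2:
        raise ValueError("need at least two collections")
    allocated = sum(
        cur.young_before_mb - prev.young_after_mb
        for prev, cur in zip(events, events[1:])
    )
    elapsed = events[-1].time_s - events[0].time_s
    return allocated / elapsed
```

A high allocation rate points at young-generation sizing or at reducing allocations in the code, while the live data size (old-generation occupancy after a full GC) is what drives heap sizing.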
Science Of Saving With AWS Reserved Instances - 9/11/14 (Cloudability)
Choosing the right Reserved Instances isn’t an art; it’s a science. Perfecting that science could save you up to 65% on your AWS bill.
In this presentation, you’ll learn the math and science used by thousands of AWS users to optimize their Reserved Instance portfolios.
Topics include:
- Identifying the right Reserved Instances for your company's usage
- Avoiding common Reserved Instance pitfalls
- Maintaining long-term savings as your usage changes
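One piece of the math behind these topics is the break-even utilization: the fraction of the term an instance must actually run for a reservation to cost less than on-demand. A minimal Python sketch, assuming the reservation is billed for every hour of its term while on-demand is billed only for hours used; the prices are placeholders, not real AWS pricing, and the method is an illustration rather than the one used in the presentation.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def break_even_utilization(on_demand_hourly: float, ri_upfront: float,
                           ri_hourly: float, term_years: int = 1) -> float:
    """Fraction of the term an instance must run for the RI to pay off."""
    term_hours = HOURS_PER_YEAR * term_years
    ri_total = ri_upfront + ri_hourly * term_hours       # paid regardless of use
    on_demand_full_term = on_demand_hourly * term_hours  # cost at 100% usage
    return ri_total / on_demand_full_term


# Placeholder prices for illustration only.
print(f"{break_even_utilization(0.10, 300.0, 0.02):.1%}")
```

With these placeholder prices the reservation pays off only if the instance runs more than about 54% of the year, which is why matching reservations to steady baseline usage matters.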
Using Kubernetes to deliver a “serverless” service (DoKC)
Serverless promises to change the way we consume software. It allows us to potentially pay for only that which we use and can help drive down operational costs to the minimal amount of resources necessary.
Architecting for serverless requires a unique look at app logic and the way it is deployed. It takes a combination of the logical and physical worlds. An architectural pattern has emerged where we can scale ephemeral compute separate from services that need to persist.
We use Kubernetes to deliver exactly this. A “serverless” experience that is driven and enabled by compute pods and storage pods. We also have used our experience running thousands of database clusters on Kubernetes to automate the operational expertise of managing a distributed database.
In this talk, we will take a dive deep into the architecture of our application and share:
* A definition and outline of the challenges of serverless
* How we reworked our logic for a serverless approach
* How we use Kubernetes to gain serverless autoscaling
This talk was given by Jim Walker for DoK Day Europe @ KubeCon 2022.
Présentation du FME World Tour du 13 avril 2017 à QuebecGuillaume Genest
Présentation de l'événement FME World Tour 2017 qui a eu lieu le 13 avril 2017 à Québec. Découvrez les nouveautés de FME 2017 et FME Server 2017. Voyez les trucs et astuces pour optimiser la performance de vos workbench, une solution pour comparer des workspaces ensemble, un portail de chargement et téléchargement de données avec FME Server ainsi que des outils de validation et correction topologique.
This document discusses the history and development of Docker. It notes that Docker was originally created at dotCloud as the engine for their Platform as a Service (PaaS), but in 2013 as PaaS times were hard, Docker was open sourced. Docker was based on LXC and created for a single purpose. dotCloud then pivoted to create Docker Inc. and make Docker their main product. The document also discusses Docker 1.11's integration with runC and systemd, as well as the transition to using the Open Container Initiative specification.
Programowanie AWSa z CLI, boto, Ansiblem i libcloudemMaciej Lasyk
The document describes a session that demonstrates how to program AWS using the AWS CLI, Boto, and Ansible. It provides an agenda for the session that includes a short AWS introduction, demonstrations of the AWS console, AWS CLI, AWS shell, Boto library, Ansible configuration management tool, and Libcloud library. Contact information is also provided for learning more about AWS programming and joining the training organization.
This document discusses Linux security and SELinux. It provides an overview of SELinux and how it works to provide mandatory access control on Linux systems. It discusses how SELinux labels processes and files to confine programs and prevent unauthorized access. It also discusses using SELinux with Docker containers to provide security isolation between containers.
Under the Dome (of failure driven pipeline)Maciej Lasyk
The document discusses various topics related to DevOps including:
1. Different types of shells (login, non-login, interactive, non-interactive, su, sudo su, sudo -i, sudo /bin/bash, sudo -s) and how they affect environment variables and profile files.
2. Stories of organizational "anti-types" that go against DevOps principles like not seeing the need for operations teams.
3. How automation, consistency, and reducing errors leads to stable environments and less unplanned work, allowing teams to focus on delivery.
This document discusses integrating security into DevOps practices through continuous delivery. It proposes including security automation and monitoring at each stage of the software development pipeline from development through production. Specific techniques mentioned include performing continuous security scanning, integrating security testing with other testing stages, automating security tasks using tools like Ansible, and sharing security data and lessons learned across teams to improve processes over time. The overall message is that security should be built into delivery rather than treated separately to avoid slowing software releases while still maintaining quality.
Orchestrating docker containers at scale (#DockerKRK edition)Maciej Lasyk
Slightly different version (original is here http://www.slideshare.net/d0cent/orchestrating-docker-containersatscale). This version was presented during first #Docker meetup in Kraków / Poland.
Orchestrating docker containers at scale (PJUG edition)Maciej Lasyk
Slightly changed version (original is here http://www.slideshare.net/d0cent/orchestrating-docker-containersatscale). This version was presented during Polish Java User Group meetup JavaCamp#13 in Kraków / Poland.
Orchestrating Docker containers at scaleMaciej Lasyk
Many of us already poked around Docker. Let's recap what we know and then think what do we know about scaling apps & whole environments which are Docker - based? Should we PaaS, IaaS or go with bare? Which tools to use on a given scale?
This document contains a list of various tools related to terminals, privacy, communication, productivity, and mobile topics. It discusses terminal emulators like guake and iterm2, VPN services like OpenVPN, messaging clients like IRC and XMPP, note taking apps like Evernote and Geeknote, and more. It concludes by inviting questions about any of the topics mentioned.
How could one create very sophisticated, open - source based monitoring solution that is very scalable and easy to deploy?
I gave this talk during on of the biggest Linux conferences in Poland: 11 Linux Session which took place in Wrocław on 5/6-04-2013
I gave this talk during first Infosec meetup in Kraków/Poland on 13th March 2014. After viewing this presentation you'll know how and why you should use SELinux (or others LSMs).
Is Red Hat / Fedora / Centos ready for lightweight Docker containers? Is Docker secure enough? How about SELinux? How could we deploy Jboss or Django within Docker / RHEL?
I gave this talk at DevOPS meetup in Krakow at 2014-02-26.
How to run system administrator recruitment process? By creating platform based on open source parts in just 2 nights! I gave this talk in Poland / Kraków OWASP chapter meeting on 17th Octomber 2013 at our local Google for Entrepreneurs site. It's focused on security and also shows how to create recruitment process in CTF / challenge way.
This story covers mostly security details of this whole platform. There's great chance, that I will give another talk about this system but this time focusing on technical details. Stay tuned ;)
9 Ways Pastors Will Use AI Everyday By 2029
These future use cases are only a handful of the many many options generative AI is providing pastors and leaders everywhere. If you learn how AI might enhance and support your ministry, you'll enter into a world that's full of hope for the Gospel.
Learn more at http://www.AIforChurchLeaders.com and http://www.churchtechtoday.com
Dev Dives: Mining your data with AI-powered Continuous DiscoveryUiPathCommunity
Want to learn how AI and Continuous Discovery can uncover impactful automation opportunities? Watch this webinar to find out more about UiPath Discovery products!
Watch this session and:
👉 See the power of UiPath Discovery products, including Process Mining, Task Mining, Communications Mining, and Automation Hub
👉 Watch the demo of how to leverage system data, desktop data, or unstructured communications data to gain deeper understanding of existing processes
👉 Learn how you can benefit from each of the discovery products as an Automation Developer
🗣 Speakers:
Jyoti Raghav, Principal Technical Enablement Engineer @UiPath
Anja le Clercq, Principal Technical Enablement Engineer @UiPath
⏩ Register for our upcoming Dev Dives July session: Boosting Tester Productivity with Coded Automation and Autopilot™
👉 Link: https://bit.ly/Dev_Dives_July
This session was streamed live on June 27, 2024.
Check out all our upcoming Dev Dives 2024 sessions at:
🚩 https://bit.ly/Dev_Dives_2024
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCynthia Thomas
Identities are a crucial part of running workloads on Kubernetes. How do you ensure Pods can securely access Cloud resources? In this lightning talk, you will learn how large Cloud providers work together to share Identity Provider responsibilities in order to federate identities in multi-cloud environments.
The document discusses testing throughout the software development life cycle. It describes different software development models including sequential, incremental, and iterative models. It also covers different test levels from component and integration testing to system and acceptance testing. The document discusses different types of testing including functional and non-functional testing. It also covers topics like maintenance testing and triggers for additional testing when changes are made. Also covers concepts of Agile including DevOps, Shift Left Approach, TDD, BDD, ATDD, Retrospective and Process Improvement
Chapter 3 of ISTQB Foundation 2018 syllabus with sample questions. Answers about what is static testing, what is review, types of review, informal review, walkthrough, technical review, inspection.
Quantum Communications Q&A with Gemini LLM. These are based on Shannon's Noisy channel Theorem and offers how the classical theory applies to the quantum world.
How to Optimize Call Monitoring: Automate QA and Elevate Customer ExperienceAggregage
The traditional method of manual call monitoring is no longer cutting it in today's fast-paced call center environment. Join this webinar where industry experts Angie Kronlage and April Wiita from Working Solutions will explore the power of automation to revolutionize outdated call review processes!
Metadata Lakes for Next-Gen AI/ML - DatastratoZilliz
As data catalogs evolve to meet the growing and new demands of high-velocity, unstructured data, we see them taking a new shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level for AI-enablement in RAG pipelines in response to the new demands of the ecosystem. We will also discuss Apache (incubating) Gravitino and its open source-first approach to data cataloging across multi-cloud and geo-distributed architectures.
This slide deck is a deep dive the Salesforce latest release - Summer 24, by the famous Stephen Stanley. He has examined the release notes very carefully, and summarised them for the Wellington Salesforce user group, virtual meeting June 27 2024.
Building an Agentic RAG locally with Ollama and MilvusZilliz
With the rise of Open-Source LLMs like Llama, Mistral, Gemma, and more, it has become apparent that LLMs might also be useful even when run locally. In this talk, we will see how to deploy an Agentic Retrieval Augmented Generation (RAG) setup using Ollama, with Milvus as the vector database on your laptop. That way, you can also avoid being Rate Limited by OpenAI like I have been in the past.
Quality Patents: Patents That Stand the Test of TimeAurora Consulting
Is your patent a vanity piece of paper for your office wall? Or is it a reliable, defendable, assertable, property right? The difference is often quality.
Is your patent simply a transactional cost and a large pile of legal bills for your startup? Or is it a leverageable asset worthy of attracting precious investment dollars, worth its cost in multiples of valuation? The difference is often quality.
Is your patent application only good enough to get through the examination process? Or has it been crafted to stand the tests of time and varied audiences if you later need to assert that document against an infringer, find yourself litigating with it in an Article 3 Court at the hands of a judge and jury, God forbid, end up having to defend its validity at the PTAB, or even needing to use it to block pirated imports at the International Trade Commission? The difference is often quality.
Quality will be our focus for a good chunk of the remainder of this season. What goes into a quality patent, and where possible, how do you get it without breaking the bank?
** Episode Overview **
In this first episode of our quality series, Kristen Hansen and the panel discuss:
⦿ What do we mean when we say patent quality?
⦿ Why is patent quality important?
⦿ How to balance quality and budget
⦿ The importance of searching, continuations, and draftsperson domain expertise
⦿ Very practical tips, tricks, examples, and Kristen’s Musts for drafting quality applications
https://www.aurorapatents.com/patently-strategic-podcast.html
An Introduction to All Data Enterprise IntegrationSafe Software
Are you spending more time wrestling with your data than actually using it? You’re not alone. For many organizations, managing data from various sources can feel like an uphill battle. But what if you could turn that around and make your data work for you effortlessly? That’s where FME comes in.
We’ve designed FME to tackle these exact issues, transforming your data chaos into a streamlined, efficient process. Join us for an introduction to All Data Enterprise Integration and discover how FME can be your game-changer.
During this webinar, you’ll learn:
- Why Data Integration Matters: How FME can streamline your data process.
- The Role of Spatial Data: Why spatial data is crucial for your organization.
- Connecting & Viewing Data: See how FME connects to your data sources, with a flash demo to showcase.
- Transforming Your Data: Find out how FME can transform your data to fit your needs. We’ll bring this process to life with a demo leveraging both geometry and attribute validation.
- Automating Your Workflows: Learn how FME can save you time and money with automation.
Don’t miss this chance to learn how FME can bring your data integration strategy to life, making your workflows more efficient and saving you valuable time and resources. Join us and take the first step toward a more integrated, efficient, data-driven future!
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment.
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
Details of description part II: Describing images in practice - Tech Forum 2024BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and transcript: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
Database Management Myths for DevelopersJohn Sterrett
Myths, Mistakes, and Lessons learned about Managing SQL Server databases. We also focus on automating and validating your critical database management tasks.
Test Management as Chapter 5 of ISTQB Foundation. Topics covered are Test Organization, Test Planning and Estimation, Test Monitoring and Control, Test Execution Schedule, Test Strategy, Risk Management, Defect Management
2. “Anything that can go wrong, will go wrong”
Murphy's law
An electrical explosion and fire Saturday at a Houston data center operated by The Planet has taken the entire facility offline. The company claimed power to the facility was interrupted when a transformer exploded. Official reports say that three walls were blown down, causing a fire.
Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.
Maciej Lasyk, High Availability Explained
2/14
6. High Availability is in the eye of the beholder
CEO: we don't lose sales
Sales: we can extend our offer based on the HA level
Account managers: we don't upset our customers (as often)
Developers: we can be proud – our services are working ;)
System engineers: we can sleep well (and fsck, we love to!)
Technical support: no calls? Back to WoW then.. ;)
13. So how many 9's?
Monthly (30 days = 720 hours): 1 hour of outage means 100% - 0.13889% ≈ 99.86111% availability
Yearly (365 days = 8760 hours): 1 hour of outage means 100% - 0.01142% ≈ 99.98858% availability
Availability             | Downtime (year) | Downtime (month)
90% (“one nine”)         | 36.5 days       | 72 hours
95%                      | 18.25 days      | 36 hours
97%                      | 10.95 days      | 21.6 hours
98%                      | 7.30 days       | 14.4 hours
99% (“two nines”)        | 3.65 days       | 7.2 hours
99.5%                    | 1.83 days       | 3.6 hours
99.8%                    | 17.52 hours     | 86.4 minutes
99.9% (“three nines”)    | 8.76 hours      | 43.2 minutes
99.99% (“four nines”)    | 52.56 minutes   | 4.32 minutes
99.999% (“five nines”)   | 5.26 minutes    | 25.9 seconds
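The table rows follow from one line of arithmetic: allowed downtime = (1 - availability) × period. A minimal Python sketch, assuming a 365-day year and a 30-day month (the assumptions that give 72 hours per month at 90%):

```python
# Allowed downtime for an availability target, and availability after an outage.
# Assumption: 365-day year and 30-day month, as in the table above.
HOURS_PER_YEAR = 365 * 24   # 8760 hours
HOURS_PER_MONTH = 30 * 24   # 720 hours


def downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Allowed downtime in hours for an availability given in percent."""
    return (1 - availability_pct / 100) * period_hours


def availability_pct(outage_hours: float, period_hours: float) -> float:
    """Availability in percent after outage_hours of downtime in the period."""
    return 100 * (1 - outage_hours / period_hours)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        per_year = downtime_hours(target, HOURS_PER_YEAR) * 60
        per_month = downtime_hours(target, HOURS_PER_MONTH) * 60
        print(f"{target}%: {per_year:.2f} min/year, {per_month:.2f} min/month")
    # The slide's examples: one hour of outage per month and per year.
    print(f"monthly: {availability_pct(1, HOURS_PER_MONTH):.5f}%")
    print(f"yearly:  {availability_pct(1, HOURS_PER_YEAR):.5f}%")
```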
https://jazz.net/wiki/bin/view/Deployment/HighAvailability
19. HA terminology
RPO: Recovery Point Objective; how much data can we lose?
RTO: Recovery Time Objective; how long does it take to recover?
MTBF: Mean Time Between Failures; the average time between failures
(failure density function -> reliability function)
https://en.wikipedia.org/wiki/Mean_time_between_failures
SLA: Service Level Agreement; formal definitions (customer <-> provider)
OLA: Operational Level Agreement; definitions within the organization that help us keep the SLAs we provide
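The slides list MTBF without a formula. A commonly used steady-state approximation (an addition here, not from the talk) is A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair, roughly what an RTO puts a ceiling on. A small sketch with hypothetical numbers:

```python
# Steady-state availability from MTBF and MTTR (mean time to repair).
# Assumption: the standard A = MTBF / (MTBF + MTTR) relation, not taken from the slides.


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_for_target(mtbf_hours: float, target: float) -> float:
    """Longest mean repair time that still meets an availability target in (0, 1]."""
    # Solve target = MTBF / (MTBF + MTTR) for MTTR.
    return mtbf_hours * (1 - target) / target


if __name__ == "__main__":
    # Hypothetical numbers: one failure every 1000 hours on average.
    print(f"{steady_state_availability(1000, 1):.4%}")  # 1 hour to repair
    print(f"{max_mttr_for_target(1000, 0.999):.2f} h")  # repair budget for 99.9%
```

The second function turns the question around: given how often things break, how fast must recovery be to hit the target.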
24. SLAs..
So what is written in SLAs?
Availability             | Downtime (year) | Downtime (month)
90%                      | 36.5 days       | 72 hours
95%                      | 18.25 days      | 36 hours
97%                      | 10.95 days      | 21.6 hours
98%                      | 7.30 days       | 14.4 hours
99%                      | 3.65 days       | 7.2 hours
99.5% (EC2, EBS)         | 1.83 days       | 3.6 hours
99.8%                    | 17.52 hours     | 86.4 minutes
99.9% (SoftLayer, IBM)   | 8.76 hours      | 43.2 minutes
99.99%                   | 52.56 minutes   | 4.32 minutes
99.999%                  | 5.26 minutes    | 25.9 seconds
http://aws.amazon.com/ec2/sla/
http://www.softlayer.com/about/service-level-agreement
26. SLAs..
The availability levels in SLAs are only the service provider's goals
Usually, when a goal is not met, the provider compensates the customer by refunding part of the fees
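Checking a month against an SLA goal is the same downtime arithmetic applied to measured outages. The sketch below assumes a 30-day month and outage intervals given in minutes from the start of the month; real SLAs (such as the EC2 one linked above) define their own measurement period and exclusions, so treat this as an illustration only:

```python
from typing import List, Tuple

# Assumption: a 30-day month, as in the table; real SLAs define their own period.
MINUTES_PER_MONTH = 30 * 24 * 60


def monthly_uptime_pct(outages: List[Tuple[float, float]],
                       period_min: float = MINUTES_PER_MONTH) -> float:
    """Uptime in percent, given (start, end) outage intervals in minutes.

    Overlapping intervals (e.g. the same outage reported by two monitors)
    are merged so downtime is not counted twice.
    """
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(outages):
        start, end = max(0.0, start), min(period_min, end)  # clip to the period
        if end <= start:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    downtime = sum(end - start for start, end in merged)
    return 100 * (1 - downtime / period_min)


def sla_goal_met(outages: List[Tuple[float, float]], target_pct: float) -> bool:
    """True if the measured monthly uptime reaches the SLA target."""
    return monthly_uptime_pct(outages) >= target_pct
```

For example, two overlapping reports of one outage, `[(100, 130), (120, 140)]`, count as 40 minutes, which still fits the 43.2-minute monthly budget of 99.9%.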
27. How deep is this hole?
app layer (core, db, cache)
data storage
operating system
hardware
networking
location
So we would like to achieve 99.9999%, which is about 30 s of downtime per year
Even a Proof of Concept is very hard to provide: 5 s of downtime per layer yearly!
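The 5-seconds-per-layer figure comes from splitting the roughly 31.5 s yearly budget of 99.9999% across the six layers listed. The sketch assumes a series system (every layer must be up, failures are independent), in which availabilities multiply, and an even split of the budget:

```python
import math

# Assumptions: every layer must be up for the service to be up (a series
# system), layers fail independently, and the budget is split evenly.
SECONDS_PER_YEAR = 365 * 24 * 3600
LAYERS = ["app layer", "data storage", "operating system",
          "hardware", "networking", "location"]


def series_availability(availabilities):
    """Availability of a chain of components that are all required."""
    return math.prod(availabilities)


def per_layer_budget_seconds(target: float, n_layers: int) -> float:
    """Yearly downtime each layer may use if the total budget is split evenly."""
    return (1 - target) * SECONDS_PER_YEAR / n_layers


if __name__ == "__main__":
    target = 0.999999  # "six nines", as on the slide
    total = (1 - target) * SECONDS_PER_YEAR
    per_layer = per_layer_budget_seconds(target, len(LAYERS))
    print(f"total: {total:.1f} s/year, per layer: {per_layer:.2f} s/year")
    # Six layers that each reach "only" 99.99% give a much weaker whole:
    whole = series_availability([0.9999] * len(LAYERS))
    print(f"6 x 99.99% -> {whole:.4%} ({(1 - whole) * SECONDS_PER_YEAR / 3600:.1f} h/year)")
```

The multiplication is why the slide calls this a deep hole: each extra required layer eats into the same budget.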
31. LB – 4th layer or 7th?
4th layer:
- high performance
- low cost
- just do the LB work!
- good for quickfixes / patches
- reliable
- not that scalable
7th layer:
- scalable
- low performance
- complex codebase
- custom code for protocols
- cookies? what about memcache..
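To make the difference concrete, a toy Python sketch (not any real load balancer's code): an L4 balancer only sees addresses and ports, while an L7 balancer parses the request and can route on path or cookies. The backend addresses and the `backend=` sticky cookie are hypothetical:

```python
import hashlib
from typing import Dict, List

# Hypothetical backend pool (documentation addresses).
BACKENDS = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]


def _bucket(key: str, n: int) -> int:
    """Stable hash of a string into one of n buckets."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def l4_pick(client_ip: str, client_port: int, backends: List[str]) -> str:
    """Layer 4: only addresses and ports are visible, so hash them."""
    return backends[_bucket(f"{client_ip}:{client_port}", len(backends))]


def l7_pick(path: str, headers: Dict[str, str], backends: List[str],
            static_backend: str) -> str:
    """Layer 7: the HTTP request is parsed, so path and cookies can drive routing."""
    if path.startswith("/static/"):
        return static_backend
    # Hypothetical sticky-session cookie "backend=<address>".
    for part in headers.get("Cookie", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "backend" and value in backends:
            return value
    return backends[_bucket(path, len(backends))]
```

The cookie branch is what the "cookies?" bullet hints at: sticky sessions pin a user to one backend, which a shared session store such as memcache makes unnecessary.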
36. Planning for failure
Everything starts here - DNS:
- keep TTLs low (300 s). Can't get them under 60 minutes? That's bad!
- check the SLA of your DNS servers (dnsmadeeasy.com history)
- what do you know about your DNS servers?
- zero downtime here is a must!
- this can be achieved with complicated network abracadabra
- remember what 99.9999% means?
- round robin is a load balancer, but without failover!
- GSLB (Global Server Load Balancing) – killed by OS/browser/server caching
- GlobalIP (SoftLayer etc.) – a workaround for GSLB via routing
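Why round robin is "a load balancer, but without failover": in this toy model the DNS server keeps rotating the same records whether or not the hosts behind them are alive. The addresses and the failover-window helper are illustrative assumptions:

```python
from itertools import count
from typing import Callable, List, Optional


class RoundRobinDNS:
    """Toy DNS round robin: the same A records, rotated on every query."""

    def __init__(self, records: List[str]):
        self.records = list(records)
        self._queries = count()

    def resolve(self) -> List[str]:
        shift = next(self._queries) % len(self.records)
        return self.records[shift:] + self.records[:shift]


def first_reachable(answer: List[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Round robin keeps handing out dead addresses; skipping them is left
    to the client (or to a health-checking layer in front of DNS)."""
    for address in answer:
        if is_up(address):
            return address
    return None


def worst_case_failover_s(detect_s: float, update_s: float, ttl_s: float) -> float:
    """Rough upper bound before clients stop using a removed record:
    detection + DNS update + TTL. Assumes resolvers honor the TTL,
    which OS/browser/server caches often do not."""
    return detect_s + update_s + ttl_s
```

With a 300 s TTL the TTL term alone dominates; with a TTL of hours, no amount of fast detection helps.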
37. Planning for failure
E-mail servers:
- it's as simple as MX records (delivering)
- it's almost as simple, though it takes a more complicated system of SMTP servers (sending)
- it's not that simple with IMAP locking over a DFS (reading)
5 gmail-smtp-in.l.google.com.
10 alt1.gmail-smtp-in.l.google.com.
20 alt2.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.
When MXing – watch out for spam!
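The numbers in front of the MX hosts above are preferences: a sending server tries the lowest value first and falls back to the next one. A minimal sketch of that selection (RFC 5321 behavior, simplified), with `try_host` as a hypothetical stand-in for an SMTP delivery attempt:

```python
from typing import Callable, List, Optional, Tuple

# The MX records from the slide as (preference, exchange) pairs.
MX_RECORDS: List[Tuple[int, str]] = [
    (5, "gmail-smtp-in.l.google.com."),
    (10, "alt1.gmail-smtp-in.l.google.com."),
    (20, "alt2.gmail-smtp-in.l.google.com."),
    (30, "alt3.gmail-smtp-in.l.google.com."),
    (40, "alt4.gmail-smtp-in.l.google.com."),
]


def delivery_order(records: List[Tuple[int, str]]) -> List[str]:
    """Lowest preference value first. (RFC 5321 also asks senders to randomize
    among equal preferences; this sketch keeps input order for those.)"""
    return [host for _, host in sorted(records, key=lambda r: r[0])]


def deliver(records: List[Tuple[int, str]],
            try_host: Callable[[str], bool]) -> Optional[str]:
    """Try each exchange in order; return the one that accepted the message."""
    for host in delivery_order(records):
        if try_host(host):
            return host
    return None  # every MX failed: a real MTA would queue and retry later
```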
38. Planning for failure
WEB servers:
- it's as simple as some frontend load balancer
- did you really stick user sessions to a particular server? Memcache!
- LB balancing algorithm
- how many LBs?
- what if the LB goes down?
10/14
39. Planning for failure
DB servers:
- it's.. not that simple
- replication (master – master? App should be aware..)
- replication ring? Complicated, works, but in case of failure...
- let's talk about MySQL:
- NoSPOF solution: MySQL cluster
- MySQL Galera cluster – synch, active-active multi-master
- master – master – simply works
- Failover? Matsunobu Yoshinori mysql-master-ha
- MySQL utilities (http://www.clusterdb.com/mysql/mysql-utilities-webinar-qa-replay-now-available/)
Maciej Lasyk, High Availability Explained
10/14
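One concrete reason the app (or at least the schema) must be aware of master-master: both masters can hand out the same auto-increment key. MySQL's auto_increment_increment and auto_increment_offset variables are the usual workaround (for example, increment 2 on both masters, offset 1 on one and 2 on the other). The Python below only models the key series those settings produce; it does not talk to MySQL:

```python
def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Smallest key in the series offset + k * increment (k >= 0) above current_max.

    Models what MySQL generates with auto_increment_increment / auto_increment_offset;
    simplified: assumes 1 <= offset <= increment.
    """
    if not 1 <= offset <= increment:
        raise ValueError("expected 1 <= offset <= increment")
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


if __name__ == "__main__":
    # Hypothetical two-master setup: increment 2 on both, offsets 1 and 2.
    a_max = b_max = 0
    a_keys, b_keys = [], []
    for _ in range(3):
        a_max = next_auto_increment(a_max, 2, 1)
        b_max = next_auto_increment(b_max, 2, 2)
        a_keys.append(a_max)
        b_keys.append(b_max)
    print(a_keys, b_keys)  # [1, 3, 5] [2, 4, 6]
```

Odd keys come from one master and even keys from the other, so concurrent inserts cannot collide even before replication catches up.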
40. Planning for failure
Caching servers:
- this is a cache, for God's sake – why would we need HA here?
- just use a proper architecture, like... redundancy.
Load balancers:
- remember about failover IP addresses!
Storage – DFSes:
- GlusterFS – we'll see it in action in a minute
- NFS? Could be – over some SAN / NAS (high-cost solution)
- CephFS – just like GlusterFS – it's great and does the job
- DRBD – lower level, works on the block-device layer – slow...
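For the load-balancer bullet, the idea behind a failover IP is that a backup node takes over a shared (virtual) address once the primary stops sending heartbeats, as VRRP-based tools do. A toy model; the class, the heartbeat loop and the three-miss threshold are hypothetical:

```python
class VirtualIPFailover:
    """Toy active/backup takeover of a shared (virtual) IP, the idea behind
    VRRP-style tools. Names, the heartbeat model and the threshold are hypothetical."""

    def __init__(self, primary: str, backup: str, max_missed: int = 3):
        self.primary, self.backup = primary, backup
        self.max_missed = max_missed
        self.owner = primary   # node currently answering on the virtual IP
        self.missed = 0        # consecutive missed heartbeats from the primary

    def tick(self, primary_heartbeat_seen: bool) -> str:
        """Evaluate one heartbeat interval and return the current owner."""
        if primary_heartbeat_seen:
            self.missed = 0
            self.owner = self.primary  # assumption: the primary takes the IP back
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.owner = self.backup
        return self.owner
```

Requiring several missed heartbeats trades a slower takeover for fewer false failovers on a single lost packet.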
44. Planning for failure
GlusterFS: replicated volumes vs Geo-replication
- replicated:
  - mirrors data
  - provides HA
  - synchronous replication
- Geo-replication:
  - mirrors data across geo-distributed clusters
  - ensures data is backed up for DR (disaster recovery)
  - asynchronous replication (periodic checks)
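The replicated vs geo-replicated distinction comes down to when a write is acknowledged. A toy in-memory model (not how GlusterFS is implemented): synchronous replication acknowledges only after every replica has the data, while asynchronous geo-replication acknowledges locally and ships changes later, so whatever has not been shipped yet is what you lose in a disaster:

```python
from typing import Dict, List, Tuple


class SyncReplicatedVolume:
    """A write is acknowledged only after every replica has it."""

    def __init__(self, n_replicas: int):
        self.replicas: List[Dict[str, str]] = [{} for _ in range(n_replicas)]

    def write(self, key: str, value: str) -> None:
        for replica in self.replicas:
            replica[key] = value


class AsyncGeoReplicatedVolume:
    """A write is acknowledged once the local copy has it; a periodic job
    ships the changes to the remote site."""

    def __init__(self):
        self.local: Dict[str, str] = {}
        self.remote: Dict[str, str] = {}
        self._pending: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.local[key] = value
        self._pending.append((key, value))

    def periodic_sync(self) -> None:
        for key, value in self._pending:
            self.remote[key] = value
        self._pending.clear()

    def writes_at_risk(self) -> int:
        """Writes lost if the local site disappears now (the RPO in practice)."""
        return len(self._pending)
```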
45. Planning for failure
HA for virtualization solutions?
- it's really complicated, like...
47. Tools
The most important tool would be the conclusion from the picture below:
50. Tools
- DNS: round robin, GSLB, low TTLs, GlobalIP
55. Turn on HA thinking!
Main goal of HA? Improve user experience!
- keep the app fully functional
- keep the app resistant and tolerant to faults
- provide a way to pass an audit successfully
- sleep well (anyone awake?) ;)
56. Thank you :)
High Availability Explained
Maciej Lasyk
Kraków, devOPS meetup #2
2014-01-28
http://maciek.lasyk.info/sysop
maciek@lasyk.info
@docent-net