"Kafka data pipeline maintenance can be painful.
It usually comes with complicated and lengthy recovery processes, scaling difficulties, traffic ‘moodiness’, and latency issues after downtimes and outages.
It doesn’t have to be that way!
We’ll examine one of our multi-petabyte scale Kafka pipelines, and go over some of the pitfalls we’ve encountered. We’ll offer solutions that alleviate those problems, and go over comparisons between the before and after . We’ll then explain why some common sense solutions do not work well and offer an improved, scalable and resilient way of processing your stream.
We’ll cover:
• Costs of processing in stream compared to in batch
• Scaling out for bursts and reprocessing
• Making the tradeoff between wait times and costs
• Recovering from outages
• And much more…"
Report
Share
Report
Share
1 of 76
Download to read offline
More Related Content
Similar to Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to be!! with Opher Dubrovsky and Ido Nadler | Kafka Summit London 2022
AWS provides a range of Compute Services – Amazon EC2, Amazon ECS and AWS Lambda. We will provide an intro level overview of these services and highlight suitable use cases. Amazon Elastic Compute Cloud (Amazon EC2) itself provides a broad selection of instance types to accommodate a diverse mix of workloads. Going a bit deeper on EC2 we will provide background on the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current-generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances, both from a performance and cost perspective.
1) AWS provides a range of application and infrastructure services including compute, storage, database, and networking.
2) Amazon Redshift is a fast, powerful petabyte-scale data warehouse service that is delivered as a managed service.
3) Jaspersoft integrates seamlessly with Amazon Redshift through automatic discovery of the PostgreSQL SQL driver and can provide business intelligence capabilities in the cloud for less than $2 per hour.
Scylla Summit 2018: OLAP or OLTP? Why Not Both?ScyllaDB
OLTP and Analytics are very different. One is characterized by many concurrent small requests, with a high sensitivity to latency, while the other typically processes large streams of data with more emphasis on throughput.
The talk will cover:
- the different requirements of the two workloads
- how ScyllaDB optimizes for both
- performance isolation of different workloads within ScyllaDB
- how ScyllaDB supports concurrent OLTP and Analytics without sacrificing either latency or throughput
- measurements
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
In this talk, we will share the experiences of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading company of Machinery manufacturing, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved the online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for valued-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed in the China Meteorological Administration to manage the Meteorological data. We design a hybrid schema to support both slice query and time window based query efficiently. Also, we explored the optimized compaction and deletion strategy for meteorological data in this case.
Business in a Flash: How to increase performance and lower costs in the data...Violin Memory
Find out how Flash fabric architecture improves performance and dramatically lowers costs in the data center.
In this presentation, you will learn about
* Storage challenges in application deployments
* Flash fabric architecture
* Revolution of economics in the data center
* Case studies: a global telcom company, Juniper Networks, Fortune 500 retailer, multiple quotations about improved performance and lowered costs from real customers
Cost is often the conversation starter when customers think about moving to the cloud. AWS helps lower costs for customers through its “pay only for what you use” pricing model, frequent price drops, and pricing model choice to support variable & stable workloads. In this session, you will learn about the financial considerations of owning and operating a traditional data center or managed hosting provider versus utilizing AWS. We will detail our TCO methodology and showcase cost comparisons for some common customer use-cases. We’ll also cover a few AWS cost optimization areas, including Spot and Reserved Instances, EC2 Auto Scaling, and consolidated billing.
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Amazon Web Services
With Amazon EC2, you have the flexibility to mix-and-match purchasing models to suit your business needs. By combining pay-as-you-go (On-Demand), reserve ahead of time for discounts (Reserved), and high-discount spare capacity (Spot) purchasing models, you can optimize cost, grow your compute capacity and throughput, and enable new types of cloud computing applications. This presentation will guide you on how to achieve high performance and availability at the lowest TCO. We will explore how to best combine EC2's purchasing models across several common applications with immediately actionable takeaways.
1. Microsoft Azure StorSimple provides a hybrid cloud storage solution that connects on-premises servers to Azure Storage with no application changes, allowing inactive data to be automatically tiered to the cloud.
2. It offers benefits like 40-60% lower storage costs, simplified data protection and disaster recovery, and increased business agility.
3. The solution includes StorSimple hybrid storage arrays, StorSimple Manager for consolidated management, and StorSimple Virtual Appliance for accessing enterprise data from Azure.
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
In the last decade, the development of modern horizontally scalable open-source Big Data technologies such as Apache Cassandra (for data storage), and Apache Kafka (for data streaming) enabled cost-effective, highly scalable, reliable, low-latency applications, and made these technologies increasingly ubiquitous. To enable reliable horizontal scalability, both Cassandra and Kafka utilize partitioning (for concurrency) and replication (for reliability and availability) across clustered servers. But building scalable applications isn’t as easy as just throwing more servers at the clusters, and unexpected speed humps are common. Consequently, you also need to understand the performance impact of partitions, replication, and clusters; monitor the correct metrics to have an end-to-end view of applications and clusters; conduct careful benchmarking, and scale and tune iteratively to take into account performance insights and optimizations. In this presentation, I will explore some of the performance goals, challenges, solutions, and results I discovered over the last 5 years building multiple realistic demonstration applications. The examples will include trade-offs with elastic Cassandra auto-scaling, scaling a Cassandra and Kafka anomaly detection application to 19 Billion checks per day, and building low-latency streaming data pipelines using Kafka Connect for multiple heterogeneous source and sink systems.
Invited keynote for 5th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2022) https://hotcloudperf.spec.org/ at ICPE 2022 https://icpe2022.spec.org/
This document summarizes a presentation about Amazon Aurora. It discusses how Aurora provides the speed and availability of commercial databases at a lower cost than open source databases. Aurora is a MySQL and PostgreSQL compatible database that is managed as a service, automating administrative tasks. It utilizes a distributed, self-healing storage system to provide high availability and durability across availability zones.
This document discusses how to reduce spending on AWS through various techniques:
1. Paying for cloud resources only when they are used through the pay-as-you-go model avoids upfront costs and allows turning off unused capacity.
2. Using reserved instances when capacity needs are predictable provides significant discounts compared to on-demand pricing.
3. Architecting applications in a "cost aware" manner, such as leveraging caching, auto-scaling, managed services, and right-sizing instances can optimize costs.
4. Taking advantage of AWS's economies of scale through consolidated billing and free services helps lower overall spend. Planning workload usage of spot instances can achieve up to 85% savings.
The document announces and provides details about Amazon Elastic File System (EFS), a new fully managed file storage service for Amazon EC2 instances. Some key points:
- EFS provides a simple way to set up and scale shared file storage for use with EC2 instances.
- It replicates data across Availability Zones for high availability and durability. Storage grows elastically up to petabyte scale on demand.
- EFS offers very high throughput and low latency SSD-based storage that supports thousands of concurrent NFS connections.
- A preview of EFS is available in winter 2015, with customers able to register on the AWS website.
This document discusses database backup and disaster recovery in the cloud. It defines key terms like disaster recovery (DR), which is preparing for and recovering from disasters, and defines a disaster as any event negatively impacting business continuity or finances. It notes statistics on businesses failing after data loss or disasters without recovery plans. Recovery Time Objective (RTO) is the time to restore systems after a disaster, while Recovery Point Objective (RPO) is acceptable data loss in time. The document discusses different disaster recovery strategies and their costs and impact on RTO and RPO, including local backups, online backups, pilot light, warm standby, and moving applications to the cloud for recovery.
Slides from the High Performance Cloud Computing tutorial at Supercomputing 2011 in Seattle. Additional materials available from: cloudsupercomputing.net.
Intro to Joyent's Manta Object Storage ServiceRod Boothby
Rod Boothby introduces Manta, Joyent's new object storage service. Some key points:
- Manta provides scalable object storage like Amazon S3 but allows running analytics jobs directly on the stored data, eliminating the need to move data for processing.
- With Manta, big data analytics can be done in a single step versus the complex multi-step process required with AWS S3.
- Manta offers competitive pricing and partnerships with file interface providers for easy adoption. Analytics jobs on Manta are billed based on compute resources used.
- Manta uses a REST API and SDKs for common languages, providing easy access and integration with applications and data platforms. Underlying technology includes
AWS has different pricing models to match your needs. One example is the different instance types available such as On-Demand, Reserved and Spot Instances. Customers can develop cost-saving strategies based upon their usage patterns, models and growth expectations. In some cases, a set of larger instances can be cheaper than multiple small instances. Learn how to size your AWS applications to maximize your use and minimize your spend. Companies such as Pinterest take very active roles to constantly reduce their spend; learn how they do it and develop your own cost-saving approaches.
Amazon Elastic Compute Cloud (Amazon EC2) provides a broad selection of instance types to accommodate a diverse mix of workloads. In this technical session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current-generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances. Other Compute options such as ECS and Lambda for processing in the cloud will be introduced and explained at a high level.
AWS provides a range of Compute Services – Amazon EC2, Amazon ECS and AWS Lambda. We will provide an intro level overview of these services and highlight suitable use cases. Amazon Elastic Compute Cloud (Amazon EC2) itself provides a broad selection of instance types to accommodate a diverse mix of workloads. Going a bit deeper on EC2 we will provide background on the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current-generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances, both from a performance and cost perspective.
Similar to Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to be!! with Opher Dubrovsky and Ido Nadler | Kafka Summit London 2022 (20)
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth(marked by a growing number of producers, consumers, and daily messages), ensuring proper monitoring and consistency. Consistency is crucial, especially in instances where we employ Single Message Transforms to manipulate records like filtering based on their keys or converting a JSON Object into a JSON string.
Monitoring our cluster's health is key and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with NewRelic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and methods employed to monitor the health of our clusters."
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment."
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once when using Kafka as your source and sink. But this API has turned out to not be well suited for use by high level streaming systems, requiring various work arounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, and how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing."
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and the usage of a serverless cloud to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds."
What is tiered storage and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, like for example when will the first upload to the remote object storage take place.
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of stream process portal"
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow."
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightening talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the ""test, test and test"" mantra that are helping Fidelity Investments reach its goal of a future with zero down-time."
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration, and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform.0 characters0 characters"
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLM's are fascinating - so let's grab a drink and find out how these systems are built and dive deep into their inner workings. In the length of time it to enjoy a round of drinks, you'll understand the inner workings of these models. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices in fine tuning your monitoring. We will delve into which metrics are the indicators for cluster’s availability and performance and are the most helpful when debugging client applications."
Kafka Streams relies on state restoration for maintaining standby tasks as failure recovery mechanism as well as for restoring the state after rebalance scenarios. When you are scaling up or down your application instances, it is necessary to know the current state of the restoration process for each active and standby task in order to prevent a long restoration process as much as possible. During this presentation, you will get an understanding of how KIP-869 provides valuable information about the current active task restoration after a rebalance and KIP-988 opens a window to the continuous process of standby restoration. When you encounter a situation in which you need to choose whether or not to scale up or down your application instances, both KIPs will be an invaluable ally for you.
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internal and workflow
- Understanding the producer configs like linger.ms, batch.size, buffer.memory and their impact on performance
- Learning about producer configs like max.block.ms, delivery.timeout.ms, request.timeout.ms and retries to make producer more resilient.
- Discuss configs like enable.idempotence, max.in.flight.requests.per.connection and transaction related configs to achieve delivery guarantees.
- Q&A session with attendees to address specific questions and concerns."
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract:
discuss the anatomy of a data contract, its main elements and, how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. A major technical breakthrough and enabler in multimedia systems research and of industrial networked multimedia services certainly was the HTTP Adaptive Streaming (HAS) technique. This resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects if not both. This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry. In this talk, we will present selected novel approaches and research results of the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.
Blockchain and Cyber Defense Strategies in new genre timesanupriti
Explore robust defense strategies at the intersection of blockchain technology and cybersecurity. This presentation delves into proactive measures and innovative approaches to safeguarding blockchain networks against evolving cyber threats. Discover how secure blockchain implementations can enhance resilience, protect data integrity, and ensure trust in digital transactions. Gain insights into cutting-edge security protocols and best practices essential for mitigating risks in the blockchain ecosystem.
Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threatsanupriti
In the rapidly evolving landscape of blockchain technology, the advent of quantum computing poses unprecedented challenges to traditional cryptographic methods. As quantum computing capabilities advance, the vulnerabilities of current cryptographic standards become increasingly apparent.
This presentation, "Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threats," explores the intersection of blockchain technology and quantum computing. It delves into the urgent need for resilient cryptographic solutions that can withstand the computational power of quantum adversaries.
Key topics covered include:
An overview of quantum computing and its implications for blockchain security.
Current cryptographic standards and their vulnerabilities in the face of quantum threats.
Emerging post-quantum cryptographic algorithms and their applicability to blockchain systems.
Case studies and real-world implications of quantum-resistant blockchain implementations.
Strategies for integrating post-quantum cryptography into existing blockchain frameworks.
Join us as we navigate the complexities of securing blockchain networks in a quantum-enabled future. Gain insights into the latest advancements and best practices for safeguarding data integrity and privacy in the era of quantum threats.
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datams and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
The Rise of Supernetwork Data Intensive ComputingLarry Smarr
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
this resume for sadika shaikh bca studentSadikaShaikh7
I am a dedicated BCA student with a strong foundation in web technologies, including PHP and MySQL. I have hands-on experience in Java and Python, and a solid understanding of data structures. My technical skills are complemented by my ability to learn quickly and adapt to new challenges in the ever-evolving field of computer science.
AC Atlassian Coimbatore Session Slides( 22/06/2024)apoorva2579
This is the combined Sessions of ACE Atlassian Coimbatore event happened on 22nd June 2024
The session order is as follows:
1.AI and future of help desk by Rajesh Shanmugam
2. Harnessing the power of GenAI for your business by Siddharth
3. Fallacies of GenAI by Raju Kandaswamy
UiPath Community Day Kraków: Devs4Devs ConferenceUiPathCommunity
We are honored to launch and host this event for our UiPath Polish Community, with the help of our partners - Proservartner!
We certainly hope we have managed to spike your interest in the subjects to be presented and the incredible networking opportunities at hand, too!
Check out our proposed agenda below 👇👇
08:30 ☕ Welcome coffee (30')
09:00 Opening note/ Intro to UiPath Community (10')
Cristina Vidu, Global Manager, Marketing Community @UiPath
Dawid Kot, Digital Transformation Lead @Proservartner
09:10 Cloud migration - Proservartner & DOVISTA case study (30')
Marcin Drozdowski, Automation CoE Manager @DOVISTA
Pawel Kamiński, RPA developer @DOVISTA
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
09:40 From bottlenecks to breakthroughs: Citizen Development in action (25')
Pawel Poplawski, Director, Improvement and Automation @McCormick & Company
Michał Cieślak, Senior Manager, Automation Programs @McCormick & Company
10:05 Next-level bots: API integration in UiPath Studio (30')
Mikolaj Zielinski, UiPath MVP, Senior Solutions Engineer @Proservartner
10:35 ☕ Coffee Break (15')
10:50 Document Understanding with my RPA Companion (45')
Ewa Gruszka, Enterprise Sales Specialist, AI & ML @UiPath
11:35 Power up your Robots: GenAI and GPT in REFramework (45')
Krzysztof Karaszewski, Global RPA Product Manager
12:20 🍕 Lunch Break (1hr)
13:20 From Concept to Quality: UiPath Test Suite for AI-powered Knowledge Bots (30')
Kamil Miśko, UiPath MVP, Senior RPA Developer @Zurich Insurance
13:50 Communications Mining - focus on AI capabilities (30')
Thomasz Wierzbicki, Business Analyst @Office Samurai
14:20 Polish MVP panel: Insights on MVP award achievements and career profiling
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfjackson110191
These fighter aircraft have uses outside of traditional combat situations. They are essential in defending India's territorial integrity, averting dangers, and delivering aid to those in need during natural calamities. Additionally, the IAF improves its interoperability and fortifies international military alliances by working together and conducting joint exercises with other air forces.
Quality Patents: Patents That Stand the Test of TimeAurora Consulting
Is your patent a vanity piece of paper for your office wall? Or is it a reliable, defendable, assertable, property right? The difference is often quality.
Is your patent simply a transactional cost and a large pile of legal bills for your startup? Or is it a leverageable asset worthy of attracting precious investment dollars, worth its cost in multiples of valuation? The difference is often quality.
Is your patent application only good enough to get through the examination process? Or has it been crafted to stand the tests of time and varied audiences if you later need to assert that document against an infringer, find yourself litigating with it in an Article 3 Court at the hands of a judge and jury, God forbid, end up having to defend its validity at the PTAB, or even needing to use it to block pirated imports at the International Trade Commission? The difference is often quality.
Quality will be our focus for a good chunk of the remainder of this season. What goes into a quality patent, and where possible, how do you get it without breaking the bank?
** Episode Overview **
In this first episode of our quality series, Kristen Hansen and the panel discuss:
⦿ What do we mean when we say patent quality?
⦿ Why is patent quality important?
⦿ How to balance quality and budget
⦿ The importance of searching, continuations, and draftsperson domain expertise
⦿ Very practical tips, tricks, examples, and Kristen’s Musts for drafting quality applications
https://www.aurorapatents.com/patently-strategic-podcast.html
How Netflix Builds High Performance Applications at Global ScaleScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Transcript: Details of description part II: Describing images in practice - T...BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
How to Avoid Learning the Linux-Kernel Memory ModelScyllaDB
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve?
This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with a simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!
25. Issue 4 - Slow Recovery
System Restored
Downtime
start
Recovery
Process
Retention
Timeout
Bomb Exploded. Data Lost !!!
Recovery
completed
Lost
Data
29. Common sense solutions:
Scaling the Pipeline
No Choice But to Add Partitions !
○ Add more consumer machines à requires partitions
○ Move to larger machines
30. Adding More Partitions
Max Throughput = (# Partitions) x (Consumer Throughput)
BUT
1. Implications for Kafka cluster performance
2. Small data files
3. Hard to scale down partitions
More Partitions More Consumers
31. Assume we need 2
days data retention
The Cost of Slow Recovery à Storage
Faster Burst Processing à Save $$$
Reprocessing 2 days
takes 3 days
We will need 5 days of data retention.
2.5X the cluster cost !!
$810,000 /y
47. Using Discreet Time Slots
time
4:00 5:00 6:00 7:00
3:00 8:00 9:00
Task
Task
Task
48. ● Consumer independence
○ No need to syncronize offsets
● Cost savings
○ Work at max throuput
○ Terminate when done
Benefits
49. Task (& Cluster) Independence
Processing
Hour
4:00
5:00
6:00
8:00
3:00
7:00
9:00
1 hour 2 hours 3 hours
Some tasks are short
No more paying for
idle time !
Some tasks >1 hour
Next task
50. Pay as You Go
time
Hour
4:00
5:00
6:00
8:00
3:00
7:00
9:00
3:00 4:00 5:00
Savings
6:00 7:00
51. Cluster Efficiency and Cost
Warm-up
~0% load
Processing
~90% load
Processing time
Cluster
termination
Total efficiency ~75%
Improve warm-up time à higher efficiency !!
52. Comparing to Previous System
Cluster
Capacity
Wasted
Processing
Efficiency was 30% ONLY !!
New System is 60% Cheaper !
65. Can We Use Multiple Consumers?
Consumer
Partition
Consumer group
NO !
66. Manage & Control Consumers’ Offsets
Offsets[1000, 10000]
Consumers
Partition
67. Scale Up – Hand Out the Offsets
Offsets[1000, 4000]
Consumers
Partition
Scale Up
Offsets[4001, 7000]
Offsets[7001, 10000]
68. Process Time Single Partition
0
50
100
150
200
250
300
350
400
1 2 3 4 5 6 7 8
Seconds
# of Consumers
Processing time / Consumers
Doubling of cores ~~ ½ Process time
72. Recap - What we’ve talked about
THE COST OF NOT SCALING UP / DOWN
KAFKA PIPELINE SCALING DIFFICULTIES
OPTIMIZING A KAFKA PIPELINE
ARCHITECTURE INSIGHTS
73. Recap - What we’ve talked about
o Streaming is expensive
o Fixed clusters are never perfect
o Off hours - $$$
o High loads - Slow
o Recovery is critical
o Isolation and parallelism allows you to
o Easily scale
o Save on costs
o Deal with loads
74. Take outs
Prepare for the
worse
Consider your
stream….
Split your work
into isolated
tasks
Remove
dependencies