How could one create a very sophisticated, open-source-based monitoring solution that is highly scalable and easy to deploy?
I gave this talk at one of the biggest Linux conferences in Poland: the 11th Linux Session (11. Sesja Linuksowa), which took place in Wrocław on 5-6 April 2014.
Monitoring with Nagios and Ganglia
1. Maciej Lasyk, Ganglia & Nagios
Maciej Lasyk
11. Sesja Linuksowa
Wrocław, 2014-04-06
1/25
Ganglia & Nagios
2. Ganglia... what?
Ganglion (plural: ganglia): a cluster or group of neurons found outside the central nervous system
6. Just a little about monitoring
- the need for monitoring
- measuring availability
- measuring performance
- gathering additional metrics
12. Monitoring is critical for HA
How to measure availability?
A = Uptime / (Uptime + Downtime)
MTTD (Mean Time to Diagnose)
The average time it takes to diagnose a problem
MTTR (Mean Time to Repair)
The average time it takes to fix a problem
MTTF (Mean Time to Failure)
The average time the service behaves correctly before it fails
MTBF (Mean Time Between Failures)
The average time between consecutive failures of the service
14. Monitoring is critical for HA
A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR)
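As a rough worked example of the formula above, the following Python sketch computes availability from the three mean times. The function name and the input figures are assumptions made up for illustration; they are not measurements from the talk.

```python
def availability(mttf_hours: float, mttd_hours: float, mttr_hours: float) -> float:
    """A = MTTF / MTBF, where MTBF = MTTF + MTTD + MTTR (all in the same unit)."""
    mtbf_hours = mttf_hours + mttd_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical figures: 999 h of correct behavior, 0.25 h to diagnose, 0.75 h to repair.
a = availability(999.0, 0.25, 0.75)
print(f"{a:.4%}")  # prints 99.9000%
```

Since MTTD sits in the denominator, anything that shortens diagnosis (which is what monitoring is for) raises availability directly.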
16. What should we monitor?
- hardware housing
- devices
- storage
- network
- hosts
- software (very deep hole)
Think dependencies!
20. When an outage hits us – don't panic!
- Notifications
- Escalations
L1 <-> L2 <-> L3 <-> L4 lol ;)
desktop support / devs / ops / networking / storage / middleware / DC / security
- The clock is ticking, so it should be simple
- What if a cell phone is offline or someone is out?
52. Nagios recap
Notifications
- periods
- groups
- which states to be notified about?
- escalations / rotations
- custom notification methods
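How periods, state filters and escalations fit together can be sketched in a few lines of Python. This is a simplified model with hypothetical names (Contact, should_notify, recipients), not Nagios's actual notification logic; the (first, last, group) tuples only mirror the idea behind first_notification / last_notification in escalation definitions.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Contact:
    name: str
    period_start: time   # start of the notification period
    period_end: time     # end of the notification period (same day)
    states: frozenset    # states this contact wants to hear about


def should_notify(contact, state, now):
    """True if the state is of interest and `now` is inside the period."""
    in_period = contact.period_start <= now <= contact.period_end
    return in_period and state in contact.states


def recipients(escalation_levels, notification_number):
    """Pick the group for the Nth notification of an ongoing problem."""
    for first, last, group in escalation_levels:
        if first <= notification_number <= last:
            return group
    return []


ops = Contact("ops", time(8, 0), time(18, 0), frozenset({"CRITICAL", "RECOVERY"}))
levels = [(1, 2, ["L1"]), (3, 5, ["L2"]), (6, 99, ["L3"])]
print(should_notify(ops, "CRITICAL", time(9, 30)), recipients(levels, 3))
```

A rotation could be modeled the same way by swapping the group lists per shift.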
53. Nagios recap
Monitoring remote hosts
- NRPE daemons
- checks via SSH
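Whether a check is run by an NRPE daemon or over SSH, the remote command follows the usual Nagios plugin convention: one line of output (optionally with performance data after a "|") and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. A minimal Python sketch, assuming a hypothetical load check with made-up thresholds:

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a plugin state using simple upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(load1, warn=4.0, crit=8.0):
    """Return (exit_code, output_line) for a 1-minute load average."""
    state = evaluate(load1, warn, crit)
    line = (f"LOAD {STATE_NAMES[state]} - load1={load1:.2f} "
            f"| load1={load1:.2f};{warn};{crit}")
    return state, line


def main():
    """Entry point for a real plugin: print the line, return the exit code."""
    try:
        state, line = check_load(os.getloadavg()[0])
    except (OSError, AttributeError) as exc:
        state, line = UNKNOWN, f"LOAD UNKNOWN - {exc}"
    print(line)
    return state
```

Saved as a script, it would end with `sys.exit(main())` so the scheduler sees the state as the process exit code.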
61. Ganglia – what is it?
Problems of big scale:
20k hosts with a zillion metrics probed every 10 seconds
It is fully redundant (until you break it)
It is very scalable
Regexp searches and ad hoc creation of views :)
77. Ganglia – web (events)
Events have a JSON-based API
Think: integration with whatever app :)
78. Ganglia – web (dashboards)
- Create view -> apply as dashboard
- Create dashboard from XML
- Generate graphs and add to views
87. Ganglia – metrics
- base / extended metrics
- own modules
- C / C++
- mod_python
- spoofing
- gmetric
- gmetric4j / Java
- Which to choose? gmetric / Python / C/C++?
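For the Python route, a gmond module is a plain Python file exposing metric_init(params), which returns a list of metric descriptors, plus metric_cleanup(). The sketch below follows that layout; the metric itself (counting entries in a spool directory) and the spool_dir parameter are hypothetical examples, not something from the talk.

```python
import os

_spool_dir = "/tmp"  # assumed default; overridden by module params in metric_init


def count_spool_files(name):
    """Callback: gmond passes the metric name and expects the current value."""
    try:
        return len(os.listdir(_spool_dir))
    except OSError:
        return 0


def metric_init(params):
    """Called once by gmond; returns the descriptors of the metrics we provide."""
    global _spool_dir
    _spool_dir = params.get("spool_dir", _spool_dir)
    return [{
        "name": "spool_files",
        "call_back": count_spool_files,
        "time_max": 90,
        "value_type": "uint",
        "units": "files",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the spool directory",
        "groups": "example",
    }]


def metric_cleanup():
    """Called once when gmond shuts down; nothing to release here."""
    pass
```

For a quick one-off value, calling gmetric from a script is less work; a module pays off when gmond should collect the metric on its own schedule.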
88. Ganglia and logfiles?
ganglia-logtailer
- https://bitbucket.org/maplebed/ganglia-logtailer
- parses logfiles (in real time)
- pushes data to Ganglia (via gmetric)
- yup, it relies on specific log formats
- but still, it is open source, so poke around ;)
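The idea (parse log lines, aggregate, push via gmetric) can be sketched as follows. This is not ganglia-logtailer's code: the log format, the metric names and the weblog_ prefix are assumptions for illustration. The sketch only builds the gmetric command lines rather than running them.

```python
import re

# Assumed (hypothetical) log format: '... status=<code> time_ms=<number>'
LINE_RE = re.compile(r"status=(?P<status>\d{3}) time_ms=(?P<ms>\d+(?:\.\d+)?)")


def summarize(lines):
    """Aggregate request count, 5xx count and average response time."""
    count = errors = 0
    total_ms = 0.0
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        count += 1
        total_ms += float(match.group("ms"))
        if match.group("status").startswith("5"):
            errors += 1
    avg = total_ms / count if count else 0.0
    return {"requests": count, "errors_5xx": errors, "avg_time_ms": avg}


def gmetric_commands(metrics, prefix="weblog_"):
    """Build one gmetric argument list per metric, sorted by metric name."""
    commands = []
    for name, value in sorted(metrics.items()):
        metric_type = "float" if isinstance(value, float) else "uint32"
        commands.append(["gmetric", "--name", prefix + name,
                         "--value", str(value), "--type", metric_type])
    return commands


sample = ["GET / status=200 time_ms=10", "GET /x status=500 time_ms=30", "noise"]
for cmd in gmetric_commands(summarize(sample)):
    print(" ".join(cmd))
```

On a host with Ganglia installed, each list could be handed to `subprocess.run(cmd, check=True)`.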
90. Nagios + Ganglia: ganglia-web/nagios
https://github.com/ganglia/ganglia-web
Sending Nagios Data to Ganglia
service_perfdata_command
Or replace Nagios checks with Ganglia!
- Check heartbeat.
- Check a single metric on a specific host.
- Check multiple metrics on a specific host.
- Check multiple metrics across a regex-defined range of hosts.
91. Nagios + Ganglia: ganglia-web/nagios
Nagios pulls info from Ganglia via HTTP
92. Nagios + Ganglia: ganglia-nagios-bridge
- https://github.com/ganglia/ganglia-nagios-bridge
- Python script run e.g. from crontab
- pulls data from Ganglia XML via sockets
- parses XML
- sends data to Nagios
- Nagios handles only passive check results
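The flow in the list above (Ganglia XML in, passive check results out) can be sketched in Python. This is an illustration of the idea only, not the bridge's actual implementation: the sample XML, thresholds and function names are assumptions, and the output uses the Nagios external command PROCESS_SERVICE_CHECK_RESULT as one possible way to submit passive results.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample of the XML that Ganglia serves.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmetad">
<GRID NAME="demo"><CLUSTER NAME="web">
<HOST NAME="host.example.com" IP="192.0.2.10">
<METRIC NAME="load_one" VAL="5.20" TYPE="float" UNITS=" "/>
</HOST></CLUSTER></GRID></GANGLIA_XML>"""


def metrics_from_xml(xml_text):
    """Yield (host, metric, value) tuples from Ganglia XML."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            yield host.get("NAME"), metric.get("NAME"), metric.get("VAL")


def passive_result(timestamp, host, service, value, warn, crit):
    """Format one passive service check result as an external command line."""
    number = float(value)
    if number >= crit:
        code, label = 2, "CRITICAL"
    elif number >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{label} - {service}={value}")


for host, metric, value in metrics_from_xml(SAMPLE_XML):
    print(passive_result(1400000000, host, metric, value, warn=4.0, crit=8.0))
```

In a real setup the XML would be read from the Ganglia socket and the service name would have to match a passive service defined in Nagios.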
93. Nagios + Ganglia: check_ganglia_metric
- https://pypi.python.org/pypi/check_ganglia_metric/
- basically a Nagios plugin
- pulls data from Ganglia XML via sockets
- check_ganglia_metric.py --gmetad_host=gmetad-server.example.com --metric_host=host.example.com --metric_name=cpu_idle
95. Nagios + Ganglia
Which integration should I use?
Seriously: try them yourself and test
96. Maciej Lasyk, Ganglia & Nagios
Freenode #ganglia
https://lists.sourceforge.net/lists/listinfo/ganglia-general
97. Sources?
- “Monitoring with Ganglia” book
- also nagios.org
- and “Web Operations” book
- plus some experience ;)
98. Maciej Lasyk
11. Sesja Linuksowa
2014-04-06, Wrocław
http://maciek.lasyk.info/sysop
maciek@lasyk.info
@docent-net
Ganglia & Nagios
Thank you :)