Maciej Lasyk, Ganglia & Nagios
Maciej Lasyk
11. Sesja Linuksowa
Wrocław, 2014-04-06
1/25
Ganglia & Nagios
Ganglia.. what?
Ganglia – cluster / group of neurons found outside
the central nervous system
Just a little about monitoring
- the need for monitoring
- measuring availability
- measuring performance
- gathering additional metrics
Monitoring is critical for HA
How to measure availability?
A = Uptime / (Uptime + Downtime)
MTTD (Mean Time to Diagnose)
The average time it takes to diagnose the problem
MTTR (Mean Time to Repair)
The average time it takes to fix a problem
MTTF (Mean Time to Failure)
The average time the service behaves correctly
MTBF (Mean Time Between Failures)
The average time between consecutive failures of the service
A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR)
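The two availability formulas above can be checked with a few lines of Python. This is only a sketch: the function names and the figures (hours per month) are made up for illustration and do not come from the talk.

```python
def availability_from_uptime(uptime: float, downtime: float) -> float:
    """A = Uptime / (Uptime + Downtime); both values in the same time unit."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def availability(mttf: float, mttd: float, mttr: float) -> float:
    """A = MTTF / MTBF, with MTBF = MTTF + MTTD + MTTR."""
    mtbf = mttf + mttd + mttr
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return mttf / mtbf


# Hypothetical figures: 719 h of correct behavior, 15 min to diagnose,
# 45 min to repair.
a = availability(mttf=719.0, mttd=0.25, mttr=0.75)
print(f"availability: {a:.4%}")
```

Shortening MTTD (faster diagnosis, which is what monitoring buys you) raises A just as much as shortening MTTR by the same amount.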
What should we monitor?
- hardware housing
- devices
- storage
- network
- hosts
- software (very deep hole)
Think dependencies!
When an outage hits us – don't panic!
- Notifications
- Escalations
L1 <-> L2 <-> L3 <-> L4 lol ;)
desktop support / devs / ops / networking / storage / middleware / dc / security
- Clock is ticking – it should be simple
- What if a cell phone is offline or someone is out?
Monitoring: notification issues
- false positives
- major events
- failover notifications?
- tolerance & critical thresholds
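A rough sketch of how thresholds and tolerance cut down false positives. The numeric state codes are the usual Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the "notify only after N bad checks in a row" rule are simplifications invented here, loosely modelled on retrying a check before alerting.

```python
from enum import IntEnum


class State(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(value, warning, critical):
    """Map a 'higher is worse' measurement onto a check state."""
    if value is None:
        return State.UNKNOWN
    if value >= critical:
        return State.CRITICAL
    if value >= warning:
        return State.WARNING
    return State.OK


def should_notify(history, max_attempts=3):
    """Tolerance: alert only when the last max_attempts checks were all non-OK."""
    recent = history[-max_attempts:]
    return len(recent) == max_attempts and all(s != State.OK for s in recent)


# Hypothetical disk-usage samples in percent, warning at 80, critical at 90.
samples = [70, 95, 72, 91, 93, 96]
states = [classify(v, warning=80, critical=90) for v in samples]
print([s.name for s in states], should_notify(states))
```

The single spike to 95 does not page anyone; three bad checks in a row do.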
Monitoring: reporting
- baseline
- correlation between incidents and change management
- trending info
- reporting
Monitoring: good practices
- don't NIH!
- DVCS
- testing envs
- think usability!
- passive checks
- automate – don't hardcode
- security
Last but not least...
“Quis custodiet ipsos custodes?”
(Who will guard the guards?)
Nagios recap
Host / Services / Contacts
- hosts, hostgroups
- services, service groups
- templates
- time periods
- host and service dependencies
- regular expressions
Checks and states
- frequencies & thresholds
- scheduling downtimes
- outages and flapping
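Flapping means a check keeps bouncing between states. Below is a simplified sketch of the idea: count state changes over a window of recent results and compare the percentage against a low and a high threshold. Nagios itself weights recent changes more heavily, so treat this unweighted version, the function names and the threshold defaults as illustrative assumptions only.

```python
def state_change_percent(states):
    """Percentage of consecutive result pairs in which the state changed."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, was_flapping=False, low=20.0, high=50.0):
    """Hysteresis: start flapping above `high`, stop only once below `low`."""
    pct = state_change_percent(states)
    if was_flapping:
        return pct >= low
    return pct > high


# 0 = OK, 2 = CRITICAL (hypothetical check history, oldest first)
bouncing = [0, 2, 0, 2, 0]
one_outage = [0, 0, 0, 0, 2]
print(is_flapping(bouncing), is_flapping(one_outage))
```

The two thresholds keep a service from entering and leaving the flapping state on every single check.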
Notifications
- periods
- groups
- which states to be notified about?
- escalations / rotations
- custom notification methods
Monitoring remotes
- NRPE daemons
- checks via SSH
Web interface
- tactical overview
- availability reports
- trends
- network maps
Networking recap
- Unicast
- Multicast
- Broadcast
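To make the three delivery modes concrete, here is a small sketch that classifies a destination address with Python's standard `ipaddress` module. The function name and the example network are hypothetical. 239.2.11.71 is used because it commonly shows up as the multicast group in default gmond configurations.

```python
import ipaddress


def delivery_mode(address: str, network: str) -> str:
    """Classify a destination as unicast, multicast or broadcast."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    if addr.is_multicast:
        return "multicast"  # one sender, every host that joined the group
    if addr == net.broadcast_address or address == "255.255.255.255":
        return "broadcast"  # one sender, every host on the segment
    return "unicast"        # one sender, exactly one receiver


for dest in ("192.168.1.10", "239.2.11.71", "192.168.1.255"):
    print(dest, delivery_mode(dest, "192.168.1.0/24"))
```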
Ganglia – what is it?
Problems of big scale:
20k hosts with a zillion metrics probed every 10 seconds
It is fully redundant (until you break it)
It is very scalable
Regexp searches and ad hoc creation of views :)
Ganglia – architecture
Ganglia – topologies
- Default multicast topology
- Deaf / mute multicast topology
- Unicast topology
- Gmetad topology
- Gmetad HA topology (active - active)
- Gmetad hierarchical topology
Ganglia – RRDcached
Ganglia – sFlow
Ganglia – web (grid view)
Ganglia – web (cluster view)
Ganglia – web (physical view)
Ganglia – web (host view)
Ganglia – web (compare hosts)
Ganglia – web (events)
Events have a JSON-based API
Think – integration with whatever app :)
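A sketch of what "integration with whatever app" could look like: a deploy script builds the request that registers an event. The endpoint path (`api/events.php`) and the parameter names (`action`, `start_time`, `summary`, `host_regex`) are recalled from the ganglia-web documentation and should be treated as assumptions to verify against your installed version. The base URL and helper name are hypothetical. The snippet only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def event_add_url(base_url, summary, host_regex, start_time="now"):
    """Build the URL that would add an event overlay for matching hosts."""
    query = urlencode({
        "action": "add",
        "start_time": start_time,
        "summary": summary,
        "host_regex": host_regex,
    })
    return f"{base_url.rstrip('/')}/api/events.php?{query}"


url = event_add_url("http://ganglia.example.com/ganglia", "Prod DB Backup", "db02")
print(url)
```

Fetching that URL from a backup job or a deployment tool would then mark the event on the graphs of the matching hosts.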
Ganglia – web (dashboards)
- Create view -> apply as dashboard
- Create dashboard from XML
- Generate graphs and add to views
Ganglia – web (graphs)
Ganglia – metrics
- base / extended metrics
- own modules
- c / c++
- mod_python
- spoofing
- gmetric
- gmetric4j / java
- Which to choose? gmetric / python / c/c++?
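For the mod_python route, a gmond Python module is a plain Python file exposing `metric_init(params)`, which returns a list of metric descriptors, and `metric_cleanup()`. A minimal sketch follows. The metric itself (number of entries in a directory), its name and the `path` parameter are invented for illustration.

```python
import os
import tempfile

_watched_dir = tempfile.gettempdir()  # hypothetical default, overridden via params


def dir_entries(name):
    """Callback: gmond passes the metric name and expects the current value."""
    return len(os.listdir(_watched_dir))


def metric_init(params):
    """Called once when gmond loads the module; returns metric descriptors."""
    global _watched_dir
    _watched_dir = params.get("path", _watched_dir)
    return [{
        "name": "example_dir_entries",
        "call_back": dir_entries,
        "time_max": 90,
        "value_type": "uint",
        "units": "entries",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the watched directory (example)",
        "groups": "example",
    }]


def metric_cleanup():
    """Called when gmond shuts down; nothing to release here."""


if __name__ == "__main__":
    # Stand-alone smoke test, handy before dropping the file into gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

As for "which to choose": gmetric is the quickest to script, a Python module runs inside gmond on its schedule, and C/C++ is there when overhead matters.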
Ganglia and logfiles?
ganglia-logtailer
- https://bitbucket.org/maplebed/ganglia-logtailer
- parses logfiles (in real time)
- pushes data to Ganglia (via gmetric)
- yup – based on specific log formats
- yet still – open source so poke around ;)
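The idea (parse log lines, push a number through gmetric) can be sketched in a few lines. This is not ganglia-logtailer's code: the regex, the function names and the metric names are assumptions for a common-log-style access log, and the snippet only builds the `gmetric` command line rather than running it.

```python
import re
from collections import Counter

# HTTP status code that follows the quoted request in a common-log-style line.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def count_statuses(lines):
    """Count responses per status class (2xx, 4xx, 5xx, ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts


def gmetric_args(name, value, units="requests"):
    """Argument list for the gmetric CLI (suitable for subprocess.run)."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", "uint32", "--units", units]


sample = [
    '127.0.0.1 - - [06/Apr/2014:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
    '127.0.0.1 - - [06/Apr/2014:10:00:01 +0200] "GET /a HTTP/1.1" 200 128',
    '127.0.0.1 - - [06/Apr/2014:10:00:02 +0200] "GET /b HTTP/1.1" 500 0',
]
for status_class, count in sorted(count_statuses(sample).items()):
    print(" ".join(gmetric_args(f"http_{status_class}", count)))
```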
So... Nagios + Ganglia!
3 ways of integration:
- ganglia-web/nagios (PHP & bash based)
https://github.com/ganglia/ganglia-web
- ganglia-nagios-bridge (Python & cron based)
https://github.com/ganglia/ganglia-nagios-bridge
- check_ganglia_metric (Python)
https://github.com/ganglia/ganglia_contrib
Nagios + Ganglia: ganglia-web/nagios
https://github.com/ganglia/ganglia-web
Sending Nagios Data to Ganglia
service_perfdata_command
Or replace Nagios checks with Ganglia!
- Check heartbeat.
- Check a single metric on a specific host.
- Check multiple metrics on a specific host.
- Check multiple metrics across a regex-defined range of hosts
Nagios pulls info from Ganglia via HTTP
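For the "Sending Nagios Data to Ganglia" direction, whatever script you hook into service_perfdata_command first has to pick the numbers out of the plugin's performance data string (`label=value[UOM];warn;crit;min;max`, space separated). A small parsing sketch follows. The function name is hypothetical and only the value and unit are extracted.

```python
import re
import shlex

# Leading number of a perfdata value, followed by an optional unit of measure.
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_perfdata(perfdata):
    """Return {label: (value, unit)} for every numeric perfdata item."""
    metrics = {}
    # shlex keeps quoted labels such as 'disk usage' together as one token.
    for token in shlex.split(perfdata):
        label, sep, rest = token.partition("=")
        if not sep:
            continue
        match = VALUE_RE.match(rest.split(";")[0])
        if match:
            metrics[label] = (float(match.group(1)), match.group(2))
    return metrics


print(parse_perfdata("time=0.012s;1.0;2.0;0.0 'disk usage'=42%;80;90"))
```

Each parsed item could then be handed to gmetric as a separate metric.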
Nagios + Ganglia: ganglia-nagios-bridge
- https://github.com/ganglia/ganglia-nagios-bridge
- Python script run e.g. from crontab
- pulls data from Ganglia XML via sockets
- parses XML
- sends data to Nagios
- Nagios handles only passive checks
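To show what a passive check result looks like on the Nagios side, here is a sketch that formats one as a `PROCESS_SERVICE_CHECK_RESULT` external command line. This is the generic external-command route and not necessarily how ganglia-nagios-bridge delivers its results. The host, service and helper names are made up.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one passive service check result as a Nagios external command."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


# return_code uses the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
line = passive_service_result("web01", "cpu_idle", 0, "OK - cpu_idle=93.5",
                              timestamp=1396778400)
print(line, end="")
```

Such a line is normally written to the Nagios external command file, and the service has to be defined in Nagios with passive checks enabled.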
Nagios + Ganglia: check_ganglia_metric
- https://pypi.python.org/pypi/check_ganglia_metric/
- basically a Nagios plugin
- pulls data from Ganglia XML via sockets
- check_ganglia_metric.py \
--gmetad_host=gmetad-server.example.com \
--metric_host=host.example.com --metric_name=cpu_idle
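"Pulls data from Ganglia XML via sockets" boils down to this: connect to the TCP port, read until the daemon closes the connection, then parse the XML dump. The sketch below is not the plugin's source. The function names are invented, port 8651 is assumed as the usual gmetad XML port (gmond usually answers on 8649), and the sample document is a hand-written, simplified stand-in for real output.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host, port=8651, timeout=10.0):
    """Read the whole XML dump; the daemon closes the socket when done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_value(xml_data, metric_host, metric_name):
    """Return the VAL attribute of one metric on one host, or None."""
    root = ET.fromstring(xml_data)
    for host in root.iter("HOST"):
        if host.get("NAME") != metric_host:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                return metric.get("VAL")
    return None


SAMPLE = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmetad">
<GRID NAME="example" AUTHORITY="http://example.com/ganglia/" LOCALTIME="1396778400">
<CLUSTER NAME="web" LOCALTIME="1396778400" OWNER="" LATLONG="" URL="">
<HOST NAME="host.example.com" IP="192.0.2.10" REPORTED="1396778395" TN="5" TMAX="20" DMAX="0">
<METRIC NAME="cpu_idle" VAL="93.5" TYPE="float" UNITS="%" TN="5" TMAX="90" DMAX="0" SLOPE="both"/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""

print(metric_value(SAMPLE, "host.example.com", "cpu_idle"))
```

Against a live daemon you would pass `fetch_ganglia_xml("gmetad-server.example.com")` instead of `SAMPLE`, compare the value with your thresholds and exit with the matching Nagios return code.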
Nagios + Ganglia
Which integration should I use?
Seriously – try it yourself and test
Freenode #ganglia
https://lists.sourceforge.net/lists/listinfo/ganglia-general
sources?
- “Monitoring with Ganglia” book
- also nagios.org
- and “Web Operations” book
- plus some experience ;)
Maciej Lasyk
11. Sesja Linuksowa
2014-04-06, Wrocław
http://maciek.lasyk.info/sysop
maciek@lasyk.info
@docent-net
Ganglia & Nagios
Thank you :)
Implementations of Fused Deposition Modeling in real world
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
5G bootcamp Sep 2020 (NPI initiative).pptx
5G bootcamp Sep 2020 (NPI initiative).pptx5G bootcamp Sep 2020 (NPI initiative).pptx
5G bootcamp Sep 2020 (NPI initiative).pptx
 
STKI Israeli Market Study 2024 final v1
STKI Israeli Market Study 2024 final  v1STKI Israeli Market Study 2024 final  v1
STKI Israeli Market Study 2024 final v1
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
Interaction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance MetricInteraction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance Metric
 

Monitoring with Nagios and Ganglia

  • 1. Maciej Lasyk, Ganglia & Nagios Maciej Lasyk 11. Sesja Linuksowa Wrocław, 2014-04-06 1/25 Ganglia & Nagios
  • 2. Ganglia.. what? Ganglia – cluster / group of neurons found outside the central nervous system
  • 6. Just a little about monitoring - the need for monitoring - measuring availability - measuring performance - gathering additional metrics
  • 12. Monitoring is critical for HA. How to measure availability? A = Uptime / (Uptime + Downtime). MTTD (Mean Time to Diagnose): the average time it takes to diagnose the problem. MTTR (Mean Time to Repair): the average time it takes to fix a problem. MTTF (Mean Time to Failure): the average time the service behaves correctly before it fails. MTBF (Mean Time Between Failures): the average time between successive failures of the service.
  • 14. Monitoring is critical for HA A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR)
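The availability formula from this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures (hours) are assumptions, not values from the talk.

```python
def availability(mttf: float, mttd: float, mttr: float) -> float:
    """A = MTTF / MTBF, where MTBF = MTTF + MTTD + MTTR.

    All three inputs must use the same time unit (e.g. hours).
    """
    mtbf = mttf + mttd + mttr
    if mtbf <= 0:
        raise ValueError("MTTF + MTTD + MTTR must be positive")
    return mttf / mtbf


if __name__ == "__main__":
    # Hypothetical service: 720 h of correct behavior, 0.5 h to diagnose,
    # 1.5 h to repair.
    a = availability(mttf=720.0, mttd=0.5, mttr=1.5)
    print(f"availability = {a:.4%}")
```

Because MTTD sits in the denominator, faster diagnosis (better monitoring) raises availability even when repair time stays the same.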
  • 16. What should we monitor? - hardware housing - devices - storage - network - hosts - software (very deep hole) Think dependencies!
  • 20. When an outage hits us – don't panic! - Notifications - Escalations L1 <-> L2 <-> L3 <-> L4 lol ;) desktop support / devs / ops / networking / storage / middleware / dc / security - Clock is ticking – it should be simple - What if a cell phone is offline or someone is out?
  • 24. Monitoring: notifications issues - false positives - major events - failover notifications? - tolerance & critical thresholds
  • 28. Monitoring: reporting - baseline - correlation between incidents and change management - trending info - reporting
  • 35. Monitoring: good practices - don't NIH! - DVCS - testing envs - think usability! - passive checks - automate – don't hardcode - security
  • 36. Monitoring: good practices Last but not least... “Quis custodiet ipsos custodes?” (Who will guard the guards?)
  • 42. Nagios recap Host / Services / Contacts - hosts, hostgroups - services, service groups - templates - time periods - host and service dependencies - regular expressions
  • 47. Nagios recap Checks and states - frequencies & thresholds - scheduling downtimes - outages and flapping
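To make "checks, states and thresholds" concrete, here is a minimal sketch of how a Nagios-style plugin maps a measured value onto the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The metric name `load1` and the threshold values are assumptions chosen for illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a plugin state; higher values are worse."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(value=3.2, warn=4.0, crit=8.0):
    """Print one status line with perfdata and return the exit code.

    A real plugin would measure `value` itself and finish with
    sys.exit(main()); the defaults here are hypothetical.
    """
    code = evaluate(value, warn, crit)
    # Text after '|' is performance data: label=value;warn;crit
    print(f"LOAD {LABELS[code]} - load1={value} | load1={value};{warn};{crit}")
    return code


if __name__ == "__main__":
    main()
```

Nagios reads the exit code to decide the service state; the check frequency and retry settings then determine when that state becomes a notification.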
  • 52. Nagios recap Notifications - periods - groups - which states to be notified about? - escalations / rotations - custom notification methods
  • 53. Nagios recap Monitoring remotes - NRPE daemons - checks via SSH
  • 54. Nagios recap Web interface – tactical overview
  • 55. Nagios recap Web interface – availability reports
  • 56. Nagios recap Web interface – trends
  • 57. Nagios recap Web interface – network maps
  • 58. Networking recap Unicast
  • 59. Networking recap Multicast
  • 60. Networking recap Broadcast
  • 61. Ganglia – what is it? Problems of big scale: 20k hosts with a zillion metrics probed every 10 seconds. It is fully redundant (until you spoil it). It is very scalable. Regexp searches and ad hoc creation of views :)
  • 62. Ganglia – architecture
  • 64. Ganglia – topologies Default multicast topology
  • 65. Ganglia – topologies Deaf / mute multicast topology
  • 66. Ganglia – topologies Unicast topology
  • 67. Ganglia – topologies Gmetad topology
  • 68. Ganglia – topologies Gmetad HA topology (active - active)
  • 69. Ganglia – topologies Gmetad hierarchical topology
  • 70. Ganglia – RRDcached
  • 71. Ganglia – sFlow
  • 72. Ganglia – web (grid view)
  • 73. Ganglia – web (cluster view)
  • 74. Ganglia – web (physical view)
  • 75. Ganglia – web (host view)
  • 76. Ganglia – web (compare hosts)
  • 77. Ganglia – web (events) Events have a JSON-based API. Think – integration with whatever app :)
  • 78. Ganglia – web (dashboards) - Create view -> apply as dashboard - Create dashboard from XML - Generate graphs and add to views
  • 79. Ganglia – web (graphs)
  • 87. Ganglia – metrics - base / extended metrics - own modules - c / c++ - mod_python - spoofing - gmetric - gmetric4j / java - Which to choose? gmetric / python / c/c++?
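For the mod_python option, a gmond Python metric module is a plain Python file exposing `metric_init`, a callback per metric, and `metric_cleanup`. The sketch below follows that layout; the metric itself (`example_dir_entries`, counting entries in a directory) and the `path` parameter are made up for illustration and are not from the talk.

```python
import os
import tempfile

# Hypothetical module setting; gmond would pass overrides via `params`.
_settings = {"path": tempfile.gettempdir()}


def count_entries(name):
    """Callback that gmond invokes with the metric name; returns the value."""
    return len(os.listdir(_settings["path"]))


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    if "path" in params:
        _settings["path"] = params["path"]
    descriptor = {
        "name": "example_dir_entries",   # hypothetical metric name
        "call_back": count_entries,
        "time_max": 90,
        "value_type": "uint",
        "units": "entries",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the watched directory",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass


if __name__ == "__main__":
    # Allows testing the module from the command line, outside gmond.
    for d in metric_init({}):
        print(d["name"], d["call_back"](d["name"]))
```

Compared with calling gmetric from cron, a module like this runs inside gmond on its own collection schedule, which is one factor in the "which to choose?" question.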
  • 88. Ganglia and logfiles? ganglia-logtailer - https://bitbucket.org/maplebed/ganglia-logtailer - parses logfiles (in real time) - pushes data to ganglia (via gmetric) - yup – based on specific log formats - yet still – open source so poke around ;)
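The general idea of "parse a log, push a number via gmetric" can be sketched as follows. This is not ganglia-logtailer's own code: the log lines, the pattern and the metric name `http_500` are assumptions, and only the long-form gmetric options `--name`, `--value`, `--type` and `--units` are used.

```python
import re
import subprocess


def count_matches(lines, pattern):
    """Count log lines matching a regular expression."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))


def gmetric_command(name, value, type_="uint32", units="count"):
    """Build the argument list for one gmetric invocation."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", type_, "--units", units]


def publish(command):
    """Run gmetric; requires the gmetric binary and a reachable gmond."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical access-log excerpt.
    sample = ["GET / 200", "GET /x 500", "POST /y 500"]
    errors = count_matches(sample, r" 500$")
    print(gmetric_command("http_500", errors))
```

The demo only prints the command; calling `publish()` on it would actually send the value.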
  • 89. So... Nagios + Ganglia! 3 ways of integration: - ganglia-web/nagios (PHP & bash based) https://github.com/ganglia/ganglia-web - ganglia-nagios-bridge (Python & cron based) https://github.com/ganglia/ganglia-nagios-bridge - check-ganglia-metric (Python) https://github.com/ganglia/ganglia_contrib
  • 90. Nagios + Ganglia: ganglia-web/nagios https://github.com/ganglia/ganglia-web Sending Nagios Data to Ganglia service_perfdata_command Or replace Nagios checks with Ganglia! - Check heartbeat. - Check a single metric on a specific host. - Check multiple metrics on a specific host. - Check multiple metrics across a regex-defined range of hosts
  • 91. Nagios + Ganglia: ganglia-web/nagios Nagios pulls info from Ganglia via HTTP
  • 92. Nagios + Ganglia: ganglia-nagios-bridge - https://github.com/ganglia/ganglia-nagios-bridge - Python script run e.g. from crontab - pulls data from Ganglia XML via sockets - parses XML - sends data to Nagios - Nagios handles only passive checks
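A passive check result reaches Nagios as one line written to its external command file, using the `PROCESS_SERVICE_CHECK_RESULT` command. The sketch below shows that line format only; it is not taken from ganglia-nagios-bridge, and the host, service and command-file path are hypothetical (the path depends on the installation).

```python
import time


def passive_check_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command.

    return_code follows plugin conventions: 0 OK, 1 WARNING,
    2 CRITICAL, 3 UNKNOWN.
    """
    ts = int(time.time() if now is None else now)
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(line, command_file):
    """Append the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names; submit() is not called here.
    print(passive_check_line("web01", "cpu_idle", 0, "OK - cpu_idle=97.5"), end="")
```

The service must exist in the Nagios configuration with passive checks enabled, otherwise the submitted result is discarded.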
  • 93. Nagios + Ganglia: check_ganglia_metric - https://pypi.python.org/pypi/check_ganglia_metric/ - basically a Nagios plugin - pulls data from Ganglia XML via sockets - check_ganglia_metric.py --gmetad_host=gmetad-server.example.com --metric_host=host.example.com --metric_name=cpu_idle
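"Pulls data from Ganglia XML via sockets" works because gmond (default TCP port 8649) and gmetad (default XML port 8651) write their full XML state to any client that connects. The following is a minimal sketch of that idea, not the code of check_ganglia_metric itself; the sample document is a trimmed, hypothetical stand-in for a real dump.

```python
import socket
import xml.etree.ElementTree as ET

# Trimmed, hypothetical example of the XML that gmond or gmetad returns.
SAMPLE = (
    b'<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">'
    b'<CLUSTER NAME="example">'
    b'<HOST NAME="host.example.com">'
    b'<METRIC NAME="cpu_idle" VAL="97.5" TYPE="float" UNITS="%"/>'
    b'</HOST></CLUSTER></GANGLIA_XML>'
)


def fetch_xml(host, port=8649, timeout=5.0):
    """Read the XML dump the daemon sends on connect, until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def find_metric(xml_bytes, metric_host, metric_name):
    """Return the VAL attribute of one metric on one host, or None."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        if host.get("NAME") != metric_host:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                return metric.get("VAL")
    return None


if __name__ == "__main__":
    # Uses the embedded sample; swap in fetch_xml("gmetad-server.example.com",
    # 8651) against a real daemon.
    print(find_metric(SAMPLE, "host.example.com", "cpu_idle"))
```

A Nagios plugin built this way would compare the returned value with warning and critical thresholds and exit with the matching status code.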
  • 95. Nagios + Ganglia Which integration should I use? Seriously – try it yourself and test
  • 96. Freenode #ganglia https://lists.sourceforge.net/lists/listinfo/ganglia-general
  • 97. sources? - “Monitoring with Ganglia” book - also nagios.org - and “Web Operations” book - plus some experience ;)
  • 98. Maciej Lasyk 11. Sesja Linuksowa 2014-04-06, Wrocław http://maciek.lasyk.info/sysop maciek@lasyk.info @docent-net Ganglia & Nagios Thank you :)