Maciej Lasyk, Ganglia & Nagios
Maciej Lasyk
11. Sesja Linuksowa
Wrocław, 2014-04-06
Ganglia & Nagios
Ganglia.. what?
Ganglia – cluster / group of neurons found outside
the central nervous system
Just a little about monitoring
- the need for monitoring
- measuring availability
- measuring performance
- gathering additional metrics
Monitoring is critical for HA
How to measure availability?
A = Uptime / (Uptime + Downtime)
MTTD (Mean Time to Diagnose)
The average time it takes to diagnose the problem
MTTR (Mean Time to Repair)
The average time it takes to fix a problem
MTTF (Mean Time to Failure)
The average time the service behaves correctly before it fails
MTBF (Mean Time Between Failures)
The average time between different failures of the service
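The availability formula above can be tried with concrete numbers. A minimal Python sketch; the function names and the sample figures are illustrative assumptions, not part of the talk. The second helper uses the common steady-state form A = MTTF / (MTTF + MTTR), which matches Uptime / (Uptime + Downtime) when both averages cover the same observation period.

```python
def availability(uptime_hours, downtime_hours):
    """A = Uptime / (Uptime + Downtime), as on the slide."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return float(uptime_hours) / total


def availability_from_means(mttf_hours, mttr_hours):
    """Steady-state form: A = MTTF / (MTTF + MTTR)."""
    return float(mttf_hours) / (mttf_hours + mttr_hours)


def allowed_downtime_hours(target, period_hours=365 * 24):
    """Downtime budget for a target availability over one period."""
    return (1.0 - target) * period_hours


# Hypothetical figures: 10 hours of downtime in a year of operation.
a = availability(uptime_hours=8750.0, downtime_hours=10.0)
print("availability: %.4f%%" % (a * 100))
print("budget at 99.9%%: %.2f h/year" % allowed_downtime_hours(0.999))
```

Turning a target such as 99.9% into an hourly budget tends to make the MTTD and MTTR discussion more tangible: every hour spent diagnosing comes out of that budget.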
What should we monitor?
- hardware housing
- devices
- storage
- network
- hosts
- software (very deep hole)
Think dependencies!
When outage hits us – don't panic!
- Notifications
- Escalations
L1 <-> L2 <-> L3 <-> L4 lol ;)
desktop support / devs / ops / networking / storage / middleware / dc / security
- Clock is ticking – it should be simple
- What if a cell phone is offline or someone is out?
Monitoring: notification issues
- false positives
- major events
- failover notifications?
- tolerance & critical thresholds
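One common way to reduce false positives is to combine warning/critical thresholds with a tolerance: notify only after several consecutive bad results. The sketch below is a minimal illustration under assumed names and numbers (`classify`, `ConfirmedAlert`, three attempts); it is similar in spirit to Nagios retrying a check `max_check_attempts` times before a state becomes hard, but it is not Nagios code.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warn, crit):
    """Map a measurement to a state; higher values are assumed to be worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


class ConfirmedAlert(object):
    """Signal a notification only after `attempts` consecutive non-OK results."""

    def __init__(self, attempts=3):
        self.attempts = attempts
        self.bad_streak = 0

    def update(self, state):
        """Feed one check result; return True when a notification is due."""
        if state == OK:
            self.bad_streak = 0
            return False
        self.bad_streak += 1
        # Fire exactly once, when the streak first reaches the limit.
        return self.bad_streak == self.attempts


# Hypothetical load samples: one short spike, then a sustained problem.
alert = ConfirmedAlert(attempts=3)
for load in [0.5, 9.0, 0.7, 9.0, 9.5, 9.9, 9.9]:
    if alert.update(classify(load, warn=4.0, crit=8.0)):
        print("notify: load %.1f confirmed after 3 checks" % load)
```

The single spike never reaches a human; the sustained problem produces one notification rather than one per check.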
Monitoring: reporting
- baseline
- correlation between incidents and change management
- trending info
- reporting
Monitoring: good practices
- don't NIH!
- testing envs
- think usability!
- passive checks
- automate – don't hardcode
- security
Last but not least...
“Quis custodiet ipsos custodes?”
(Who will guard the guards?)
Nagios recap
Hosts / Services / Contacts
- hosts, hostgroups
- services, service groups
- templates
- time periods
- host and service dependencies
- regular expressions
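Hosts, hostgroups and templates lend themselves to generated configuration rather than hand-edited files. A minimal sketch that renders Nagios host object definitions from an inventory list; the inventory, the addresses and the `generic-host` template name (as found in the sample configs) are assumptions made for illustration.

```python
HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {name}
    address    {address}
    hostgroups {groups}
}}
"""


def render_host(name, address, groups, template="generic-host"):
    """Render one host object that inherits from a template via `use`."""
    return HOST_TEMPLATE.format(
        template=template, name=name, address=address, groups=",".join(groups)
    )


def render_hosts(inventory):
    """Render all hosts, sorted by name so the output diffs cleanly."""
    ordered = sorted(inventory, key=lambda host: host["name"])
    return "\n".join(render_host(**host) for host in ordered)


# Hypothetical inventory; in practice this might come from a CMDB or a
# configuration management tool.
INVENTORY = [
    {"name": "web02", "address": "192.0.2.11", "groups": ["web", "linux"]},
    {"name": "web01", "address": "192.0.2.10", "groups": ["web", "linux"]},
]

print(render_hosts(INVENTORY))
```

Keeping thresholds, contacts and check intervals in the template means a new host only needs a name, an address and its groups.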
Nagios recap
Checks and states
- frequencies & thresholds
- scheduling downtimes
- outages and flapping
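Flapping means a check changes state too often for individual notifications to be useful. The idea can be sketched as the percentage of state changes over a window of recent results, with two thresholds for hysteresis. This is an unweighted simplification with made-up thresholds; Nagios' own flap detection is documented as keeping a fixed-length history and weighting recent changes more heavily.

```python
def percent_state_change(history):
    """Share of consecutive check pairs whose state differs, in percent."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def is_flapping(history, was_flapping=False, low=20.0, high=30.0):
    """Hysteresis: start flapping above `high`, stop only below `low`."""
    pct = percent_state_change(history)
    if was_flapping:
        return pct >= low
    return pct > high


# Hypothetical history of plugin states (0 = OK, 2 = CRITICAL), oldest first.
recent = [0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0]
print("state change: %.0f%%" % percent_state_change(recent))
print("flapping:", is_flapping(recent))
```

The two thresholds keep the flapping flag itself from oscillating when the rate hovers around a single cut-off.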
Nagios recap
- periods
- groups
- which states to be notified about?
- escalations / rotations
- custom notification methods
Nagios recap
Monitoring remotes
- NRPE daemons
- checks via SSH
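Whether a check is run through NRPE or over SSH, what executes on the remote host is an ordinary Nagios plugin: it prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of such a plugin; the metric (1-minute load average), the thresholds and the function names are assumptions chosen for illustration.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_load(load1, warn=4.0, crit=8.0):
    """Return (exit_code, status_line) for a 1-minute load average."""
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data in the form label=value;warn;crit.
    line = "LOAD %s - load1=%.2f | load1=%.2f;%.2f;%.2f" % (
        LABELS[code], load1, load1, warn, crit)
    return code, line


def main():
    """Measure, print the status line, and return the plugin exit code."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        # Not available on this platform: report UNKNOWN rather than guess.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = check_load(load1)
    print(line)
    return code

# When installed as a script, finish with: sys.exit(main())
```

Keeping the threshold logic in a pure function (`check_load`) makes the plugin testable without touching the host it runs on.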
Nagios recap
Web interface – tactical overview
Nagios recap
Web interface – availability reports
Nagios recap
Web interface – trends
Nagios recap
Web interface – network maps
Networking recap
Ganglia – what is it?
Problems of big scale:
20k hosts with a zillion metrics probed every 10 seconds
It is fully redundant (until you spoil it)
It is very scalable
Regexp searches and ad hoc creation of views :)
Ganglia – architecture
Ganglia – topologies
Default multicast topology
Ganglia – topologies
Deaf / mute multicast topology
Ganglia – topologies
Unicast topology
Ganglia – topologies
Gmetad topology
Ganglia – topologies
Gmetad HA topology (active - active)
Ganglia – topologies
Gmetad hierarchical topology
Ganglia – RRDcached
Ganglia – sFlow
Ganglia – web (grid view)
Ganglia – web (cluster view)
Ganglia – web (physical view)
Ganglia – web (host view)
Ganglia – web (compare hosts)
Ganglia – web (events)
Events have a JSON-based API
Think – integration with whatever app :)
Ganglia – web (dashboards)
- Create view -> apply as dashboard
- Create dashboard from XML
- Generate graphs and add to views
Ganglia – web (graphs)
Ganglia – metrics
- base / extended metrics
- own modules
- c / c++
- mod_python
- spoofing
- gmetric
- gmetric4j / java
- Which to choose? gmetric / python / c/c++?
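For the Python option, gmond loads a module and asks it which metrics it provides. The sketch below follows the documented module interface (`metric_init` returning a list of descriptor dicts, each with a `call_back`, plus `metric_cleanup`); the metric itself, `app_queue_depth`, and its random placeholder value are hypothetical. Loading it also needs the Python module support enabled in gmond and a matching `.pyconf` file, which are not shown here.

```python
import random


def queue_depth(name):
    """Callback invoked by gmond; `name` is the metric being collected."""
    # Placeholder value; a real module would read it from the application.
    return random.randint(0, 100)


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    return [{
        "name": "app_queue_depth",
        "call_back": queue_depth,
        "time_max": 90,
        "value_type": "uint",
        "units": "jobs",
        "slope": "both",
        "format": "%u",
        "description": "Jobs waiting in the (hypothetical) app queue",
        "groups": "application",
    }]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release here."""
    pass


# Stand-alone smoke test, useful before handing the module to gmond.
for descriptor in metric_init({}):
    print(descriptor["name"], descriptor["call_back"](descriptor["name"]))
```

As a rough rule of thumb, gmetric suits occasional values pushed from scripts, while a module suits metrics gmond should collect on its own schedule.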
Ganglia and logfiles?
- https://bitbucket.org/maplebed/ganglia-logtailer
- parses logfiles (in real time)
- pushes data to ganglia (via gmetric)
- yup – based on specific log formats
- yet still – open source so poke around ;)
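The parse-then-push flow can be sketched in a few lines. This is not the ganglia-logtailer API, only an illustration of the idea under assumptions: an access-log format in which the HTTP status follows the quoted request, a metric named `http_5xx`, and the `gmetric` command-line tool being on the PATH.

```python
import re
import subprocess

# Assumed log format: ... "GET /path HTTP/1.1" 200 512
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_server_errors(lines):
    """Count log lines whose HTTP status is 5xx."""
    count = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1).startswith("5"):
            count += 1
    return count


def gmetric_command(name, value, units, value_type="uint32"):
    """Build the gmetric invocation without running it."""
    return [
        "gmetric",
        "--name=%s" % name,
        "--value=%s" % value,
        "--type=%s" % value_type,
        "--units=%s" % units,
    ]


def push(name, value, units):
    """Send one value to Ganglia; requires gmetric to be installed."""
    subprocess.check_call(gmetric_command(name, value, units))


SAMPLE = [
    '192.0.2.1 - - [06/Apr/2014:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
    '192.0.2.2 - - [06/Apr/2014:10:00:01 +0200] "GET /api HTTP/1.1" 500 0',
    '192.0.2.3 - - [06/Apr/2014:10:00:02 +0200] "POST /x HTTP/1.1" 503 17',
]
print(" ".join(gmetric_command("http_5xx", count_server_errors(SAMPLE), "errors")))
```

Building the command separately from running it keeps the parsing testable on machines without Ganglia installed.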
So... Nagios + Ganglia!
3 ways of integration:
- ganglia-web/nagios (PHP & bash based)
- ganglia-nagios-bridge (Python & cron based)
- check_ganglia_metric (Python)
Nagios + Ganglia: ganglia-web/nagios
Sending Nagios Data to Ganglia
Or replace Nagios checks with Ganglia!
- Check heartbeat.
- Check a single metric on a specific host.
- Check multiple metrics on a specific host.
- Check multiple metrics across a regex-defined range of hosts
Nagios + Ganglia: ganglia-web/nagios
Nagios pulls info from Ganglia via HTTP
Nagios + Ganglia: ganglia-nagios-bridge
- https://github.com/ganglia/ganglia-nagios-bridge
- Python script run e.g. from crontab
- pulls data from Ganglia XML via sockets
- parses XML
- sends data to Nagios
- Nagios only processes passive checks
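The pull, parse and submit flow listed above can be illustrated with a short sketch. It is not the ganglia-nagios-bridge code: the function names, the single "critical below" threshold and the use of the metric name as the Nagios service description are assumptions. It relies on gmond answering a TCP connection (port 8649 by default) with an XML dump, and on the Nagios external command `PROCESS_SERVICE_CHECK_RESULT`.

```python
import socket
import time
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649):
    """Read the full XML dump that gmond writes to a TCP client."""
    chunks = []
    sock = socket.create_connection((host, port), timeout=10)
    try:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)


def passive_results(xml_bytes, metric, crit_below, now=None):
    """Yield one PROCESS_SERVICE_CHECK_RESULT line per host reporting `metric`."""
    now = int(time.time()) if now is None else now
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        for node in host.iter("METRIC"):
            if node.get("NAME") != metric:
                continue
            value = float(node.get("VAL"))
            code = 2 if value < crit_below else 0
            status = "CRITICAL" if code else "OK"
            # Format: [time] COMMAND;host;service;return_code;plugin_output
            yield "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s - %s=%s" % (
                now, host.get("NAME"), metric, code, status,
                metric, node.get("VAL"))
```

A cron job would then write each line, newline-terminated, to the Nagios command file, for example `for line in passive_results(read_gmond_xml(), "cpu_idle", crit_below=10.0): ...`. The matching hosts and services must already be defined in Nagios with passive checks enabled.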
Nagios + Ganglia: check_ganglia_metric
- https://pypi.python.org/pypi/check_ganglia_metric/
- basically Nagios plugin
- pulls data from Ganglia XML via sockets
- check_ganglia_metric.py --metric_host=host.example.com --metric_name=cpu_idle
Nagios + Ganglia
Which integration should I use?
Seriously – try them yourself and test
Freenode #ganglia
- “Monitoring with Ganglia” book
- also nagios.org
- and “Web Operations” book
- plus some experience ;)
Maciej Lasyk
11. Sesja Linuksowa
2014-04-06, Wrocław
Ganglia & Nagios
Thank you :)

Monitoring with Nagios and Ganglia

  • 1. Ganglia & Nagios. Maciej Lasyk, 11. Sesja Linuksowa, Wrocław, 2014-04-06
  • 2. Ganglia.. what? Ganglia – cluster / group of neurons found outside the central nervous system
  • 6. Just a little about monitoring
      - the need for monitoring
      - measuring availability
      - measuring performance
      - gathering additional metrics
  • 12. Monitoring is critical for HA. How to measure availability?
      A = Uptime / (Uptime + Downtime)
      - MTTD (Mean Time to Diagnose): the average time it takes to diagnose the problem
      - MTTR (Mean Time to Repair): the average time it takes to fix a problem
      - MTTF (Mean Time to Failure): the average time during which the service behaves correctly
      - MTBF (Mean Time Between Failures): the average time between different failures of the service
  • 14. Monitoring is critical for HA
      A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR)
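The formula on slide 14 can be tried out with a few lines of Python. This is only a sketch: the function name and the figures (in hours) are made up for illustration and do not come from the talk.

```python
def availability(mttf: float, mttd: float, mttr: float) -> float:
    """A = MTTF / MTBF, where MTBF = MTTF + MTTD + MTTR (same time unit)."""
    mtbf = mttf + mttd + mttr  # one full failure cycle
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return mttf / mtbf


# Hypothetical figures, in hours: 720 h of correct behavior per cycle,
# 0.5 h to diagnose and 1.5 h to repair each failure.
a = availability(mttf=720.0, mttd=0.5, mttr=1.5)
print(f"availability: {a:.4%}")
```

Cutting either MTTD or MTTR raises A by the same mechanism, which is why fast diagnosis (good monitoring) matters as much as fast repair.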
  • 16. What should we monitor?
      - hardware housing
      - devices
      - storage
      - network
      - hosts
      - software (very deep hole)
      Think dependencies!
  • 20. When an outage hits us – don't panic!
      - Notifications
      - Escalations: L1 <-> L2 <-> L3 <-> L4 lol ;) (desktop support / devs / ops / networking / storage / middleware / dc / security)
      - Clock is ticking – it should be simple
      - What if a cell phone is offline or someone is out?
  • 24. Monitoring: notification issues
      - false positives
      - major events
      - failover notifications?
      - tolerance & critical thresholds
  • 28. Monitoring: reporting
      - baseline
      - correlation between incidents and change management
      - trending info
      - reporting
  • 35. Monitoring: good practices
      - don't NIH!
      - DVCS
      - testing envs
      - think usability!
      - passive checks
      - automate – don't hardcode
      - security
  • 36. Monitoring: good practices. Last but not least... “Quis custodiet ipsos custodes?” (Who will guard the guards?)
  • 42. Nagios recap: Host / Services / Contacts
      - hosts, hostgroups
      - services, service groups
      - templates
      - time periods
      - host and service dependencies
      - regular expressions
  • 47. Nagios recap: Checks and states
      - frequencies & thresholds
      - scheduling downtimes
      - outages and flapping
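To make "thresholds" and "states" concrete: Nagios plugins report a state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and print one status line, optionally followed by performance data after a pipe. Below is a minimal sketch of that mapping. The function names, the `load1` label and the threshold values are illustrative assumptions, not taken from the slides.

```python
# Exit codes that Nagios plugins use to report a state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a plugin state; higher values are worse."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def plugin_output(label, value, warn, crit):
    """Return (exit_code, status_line); perfdata follows the pipe."""
    state = evaluate(value, warn, crit)
    if state == UNKNOWN:
        return state, f"{STATE_NAMES[state]} - no data for {label}"
    line = f"{STATE_NAMES[state]} - {label} is {value} | {label}={value};{warn};{crit}"
    return state, line


# Hypothetical reading: a 1-minute load average of 3.2.
code, line = plugin_output("load1", 3.2, warn=2.0, crit=4.0)
print(line)
# A real plugin would finish with sys.exit(code) so Nagios sees the state.
```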
  • 52. Nagios recap: Notifications
      - periods
      - groups
      - which states to be notified about?
      - escalations / rotations
      - custom notification methods
  • 53. Nagios recap: Monitoring remotes
      - NRPE daemons
      - checks via SSH
  • 54. Nagios recap: Web interface – tactical overview
  • 55. Nagios recap: Web interface – availability reports
  • 56. Nagios recap: Web interface – trends
  • 57. Nagios recap: Web interface – network maps
  • 58. Networking recap: Unicast
  • 59. Networking recap: Multicast
  • 60. Networking recap: Broadcast
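As a quick reminder of how the three delivery modes differ at the address level, here is a small sketch using Python's standard `ipaddress` module. The helper name and the sample subnet are assumptions for illustration; 239.2.11.71 is the multicast group that gmond joins in its default configuration.

```python
import ipaddress


def delivery_mode(address: str, network: str) -> str:
    """Classify an IPv4 destination as unicast, multicast or broadcast."""
    addr = ipaddress.IPv4Address(address)
    net = ipaddress.IPv4Network(network)
    if addr.is_multicast:  # anything in 224.0.0.0/4
        return "multicast"
    limited_broadcast = ipaddress.IPv4Address("255.255.255.255")
    if addr == net.broadcast_address or addr == limited_broadcast:
        return "broadcast"
    return "unicast"


# Hypothetical subnet 10.0.0.0/24.
for target in ("10.0.0.17", "239.2.11.71", "10.0.0.255"):
    print(target, "->", delivery_mode(target, "10.0.0.0/24"))
```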
  • 61. Ganglia – what is it?
      - Problems of big scale: 20k hosts with zillions of metrics probed every 10 seconds
      - It is fully redundant (until you spoil it)
      - It is very scalable
      - Regexp searches and ad hoc creation of views :)
  • 62. Ganglia – architecture
  • 64. Ganglia – topologies: Default multicast topology
  • 65. Ganglia – topologies: Deaf / mute multicast topology
  • 66. Ganglia – topologies: Unicast topology
  • 67. Ganglia – topologies: Gmetad topology
  • 68. Ganglia – topologies: Gmetad HA topology (active - active)
  • 69. Ganglia – topologies: Gmetad hierarchical topology
  • 70. Ganglia – RRDcached
  • 71. Ganglia – sFlow
  • 72. Ganglia – web (grid view)
  • 73. Ganglia – web (cluster view)
  • 74. Ganglia – web (physical view)
  • 75. Ganglia – web (host view)
  • 76. Ganglia – web (compare hosts)
  • 77. Ganglia – web (events)
      - Events have a JSON-based API
      - Think – integration with whatever app :)
  • 78. Ganglia – web (dashboards)
      - Create view -> apply as dashboard
      - Create dashboard from XML
      - Generate graphs and add to views
  • 79. Ganglia – web (graphs)
  • 87. Ganglia – metrics
      - base / extended metrics
      - own modules
      - c / c++
      - mod_python
      - spoofing
      - gmetric
      - gmetric4j / java
      - Which to choose? gmetric / python / c/c++?
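For the mod_python route, a gmond Python module is a plain Python file exposing `metric_init(params)`, which returns a list of metric descriptors, and `metric_cleanup()`. The sketch below follows that shape as I understand it from the Ganglia documentation. The metric itself (`spool_files`), the `path` parameter and the default directory are hypothetical, chosen only to keep the example self-contained.

```python
import os

_spool_dir = "/var/spool/example"  # hypothetical default, overridden by params


def count_files(name):
    """Callback: gmond passes the metric name and expects the current value."""
    try:
        return len(os.listdir(_spool_dir))
    except OSError:
        return 0  # directory missing or unreadable


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    global _spool_dir
    _spool_dir = params.get("path", _spool_dir)
    return [{
        "name": "spool_files",
        "call_back": count_files,
        "time_max": 90,
        "value_type": "uint",
        "units": "files",
        "slope": "both",
        "format": "%u",
        "description": "Number of entries in the spool directory",
        "groups": "example",
    }]


def metric_cleanup():
    """Called once when gmond shuts down; nothing to release here."""
    pass


if __name__ == "__main__":
    # Stand-alone check outside gmond, counting entries in the current directory.
    for d in metric_init({"path": "."}):
        print(d["name"], d["call_back"](d["name"]))
```

A one-off value from a shell script or cron job is usually easier to push with gmetric; a module like this fits better when gmond should collect the value on its own schedule.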
  • 88. Ganglia and logfiles? ganglia-logtailer
      - https://bitbucket.org/maplebed/ganglia-logtailer
      - parses logfiles (realtime)
      - pushes data to Ganglia (via gmetric)
      - yup – based on specific log formats
      - yet still – open source so poke around ;)
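The idea behind a log tailer (parse lines, aggregate, push via gmetric) can be sketched in a few lines. This is not ganglia-logtailer's code or API. The log format, the regex and the metric names are assumptions, and the gmetric command lines are only built and printed, never executed.

```python
import re
from collections import Counter

# Matches the status code in a common-log-format line (assumed log format).
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_status_classes(lines):
    """Count responses per status class ("2xx", "5xx", ...) in log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts


def gmetric_commands(counts, prefix="http"):
    """Build one gmetric argv list per status class; nothing is executed."""
    return [
        ["gmetric", "--name", f"{prefix}_{cls}", "--value", str(n),
         "--type", "uint32", "--units", "requests"]
        for cls, n in sorted(counts.items())
    ]


# Hypothetical access-log lines.
sample = [
    '10.0.0.1 - - [06/Apr/2014:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [06/Apr/2014:10:00:01 +0200] "GET /x HTTP/1.1" 404 128',
    '10.0.0.3 - - [06/Apr/2014:10:00:02 +0200] "POST /y HTTP/1.1" 500 0',
]
for argv in gmetric_commands(count_status_classes(sample)):
    print(" ".join(argv))
```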
  • 89. So... Nagios + Ganglia! 3 ways of integration:
      - ganglia-web/nagios (PHP & bash based): https://github.com/ganglia/ganglia-web
      - ganglia-nagios-bridge (Python & cron based): https://github.com/ganglia/ganglia-nagios-bridge
      - check-ganglia-metric (Python): https://github.com/ganglia/ganglia_contrib
  • 90. Nagios + Ganglia: ganglia-web/nagios (https://github.com/ganglia/ganglia-web)
      Sending Nagios data to Ganglia: service_perfdata_command
      Or replace Nagios checks with Ganglia!
      - Check heartbeat.
      - Check a single metric on a specific host.
      - Check multiple metrics on a specific host.
      - Check multiple metrics across a regex-defined range of hosts
  • 91. Nagios + Ganglia: ganglia-web/nagios. Nagios pulls info from Ganglia via HTTP
  • 92. Nagios + Ganglia: ganglia-nagios-bridge
      - https://github.com/ganglia/ganglia-nagios-bridge
      - Python script run e.g. from crontab
      - pulls data from Ganglia XML via sockets
      - parses XML
      - sends data to Nagios
      - Nagios only processes passive check results
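The pull, parse and submit flow can be illustrated roughly as follows. This is not the bridge's actual implementation: the XML is a trimmed, hand-written sample of what gmond/gmetad serve over their socket, the thresholds are invented, and the output uses the Nagios external-command format (`PROCESS_SERVICE_CHECK_RESULT`) as one possible way to submit a passive result.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written sample; real Ganglia XML carries more attributes.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="web" LOCALTIME="1396778400">
    <HOST NAME="host.example.com" IP="10.0.0.17" REPORTED="1396778395">
      <METRIC NAME="cpu_idle" VAL="4.5" TYPE="float" UNITS="%"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def passive_results(xml_text, metric, warn_below, crit_below, now):
    """Yield Nagios external-command lines for one metric on every host."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") != metric:
                continue
            value = float(m.get("VAL"))
            # Lower is worse here (e.g. idle CPU), so compare from below.
            if value < crit_below:
                code, state = 2, "CRITICAL"
            elif value < warn_below:
                code, state = 1, "WARNING"
            else:
                code, state = 0, "OK"
            yield (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host.get('NAME')};"
                   f"{metric};{code};{state} - {metric}={value}")


# Hypothetical thresholds: warn below 20 % idle, critical below 10 %.
for line in passive_results(SAMPLE_XML, "cpu_idle", 20.0, 10.0, now=1396778400):
    print(line)
```

One pass over the XML can cover every host and metric, so Nagios does not need to run a separate active check per service.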
  • 93. Nagios + Ganglia: check_ganglia_metric
      - https://pypi.python.org/pypi/check_ganglia_metric/
      - basically a Nagios plugin
      - pulls data from Ganglia XML via sockets
      - check_ganglia_metric.py --gmetad_host=gmetad-server.example.com --metric_host=host.example.com --metric_name=cpu_idle
  • 95. Nagios + Ganglia: which integration should I use? Seriously – try them yourself and test
  • 96. Freenode #ganglia, https://lists.sourceforge.net/lists/listinfo/ganglia-general
  • 97. sources?
      - “Monitoring with Ganglia” book
      - also nagios.org
      - and “Web Operations” book
      - plus some experience ;)
  • 98. Thank you :) Maciej Lasyk, http://maciek.lasyk.info/sysop, maciek@lasyk.info, @docent-net