Icinga 2 v2.5 released

We’ve come a long way with our new release Icinga 2 v2.5. After the 2.4 release in November we’ve focussed on fixing many of the remaining bugs. 2.5 isn’t just a feature release – it includes all the bugfixes from the past months.

 

InfluxDB

A big thank you to Simon Murray & DataCentred for contributing the new InfluxDB feature! Dive into the documentation details or just try it out yourself. We’ve also added a new Vagrant box “icinga2x-influxdb” just for Icinga 2, InfluxDB and Grafana :)
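If you want to try the feature by hand, a minimal feature configuration could look like the following sketch. The host, port and database values are assumptions for a local InfluxDB instance, not values taken from the release; adjust them to your setup and enable the feature with “icinga2 feature enable influxdb” afterwards.

```icinga2
/* Sketch of features-available/influxdb.conf; all values are assumptions
 * for a local test instance. */
library "perfdata"

object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"
  port = 8086
  database = "icinga2"
}
```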

 

Timeperiod Excludes

Don’t want to be notified during the holidays? Need an on-call exclusion for a specific time period? We’re really happy that Philipp Dallig contributed the long-awaited time period exclusion and inclusion feature to Icinga 2. You’ll also find updated examples for specific time ranges in the documentation.
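To illustrate the exclusion feature, here is a minimal sketch. The object names “holidays” and “workhours-without-holidays” and the chosen ranges are made up for this example; the “excludes” attribute is the new part.

```icinga2
/* Hypothetical example: names and ranges are assumptions, not shipped defaults. */
object TimePeriod "holidays" {
  import "legacy-timeperiod"
  display_name = "Holidays"
  ranges = {
    "january 1"   = "00:00-24:00"
    "december 25" = "00:00-24:00"
  }
}

object TimePeriod "workhours-without-holidays" {
  import "legacy-timeperiod"
  display_name = "Work hours, holidays excluded"
  ranges = {
    "monday"    = "09:00-17:00"
    "tuesday"   = "09:00-17:00"
    "wednesday" = "09:00-17:00"
    "thursday"  = "09:00-17:00"
    "friday"    = "09:00-17:00"
  }
  /* Time ranges of the listed periods are subtracted from this one. */
  excludes = [ "holidays" ]
}
```

A Notification object could then reference the second period in its “period” attribute, so that no notifications are sent on the listed holidays.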

 

IDO Performance

The IDO database feature now supports an incremental config dump. Future restarts only update what really changed instead of a full config dump. This tremendously decreases the database reconnect time on restart making it 10 times (!) faster than before. We’ve tested this with large scale customer environments (Example with 60k services, 2000 clients, 60k dependencies running in an HA cluster).

2.4.10

[2016-08-17 13:06:03 +0200] information/IdoMysqlConnection: Finished reconnecting to MySQL IDO database in 320.08 second(s).

2.5.0

[2016-08-17 14:31:17 +0200] information/IdoMysqlConnection: Finished reconnecting to MySQL IDO database in 29.4937 second(s).

 

API

Two new endpoints allow you to fetch global variables (/v1/variables) and defined template names (/v1/templates). Notification state and type filters can now be specified as string values (e.g. “OK”).

There’s also a new API action /v1/actions/generate-ticket which allows you to fetch the ticket required for client setups with CSR auto-signing. That way your automated setups will work like a breeze – make sure to check the updated documentation bits too.

 

Cluster

We’ve fixed a bug where one faulty client would cause other clients to disconnect. While analysing cluster stability issues we’ve also added more detailed log messages. There is a known issue with message routing for zones with more than two endpoints – we recommend having only two endpoints in a zone for now.

Uwe Ebel contributed the API/Cluster configuration attributes for the accept cipher list as well as the minimum TLS version – thanks a lot!
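As a rough sketch of how these two attributes fit into the API feature configuration: the certificate paths below mirror the layout commonly found in a 2.x features-available/api.conf and are an assumption here, and the TLS version and cipher list are example values in OpenSSL notation, not recommendations from the release notes.

```icinga2
/* Sketch only: paths and chosen values are assumptions for illustration. */
object ApiListener "api" {
  cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
  key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
  ca_path = SysconfDir + "/icinga2/pki/ca.crt"

  /* Refuse anything older than TLS 1.2. */
  tls_protocolmin = "TLSv1.2"

  /* Accepted ciphers, colon-separated in OpenSSL notation. */
  cipher_list = "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256"
}
```

Keep in mind that every endpoint in the cluster has to support the chosen minimum, otherwise connections will fail.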

 

Documentation

We weren’t happy with the documentation chapters explaining how the cluster works and how the Icinga 2 client has to be installed. They were complicated and our community channels were literally flooded with questions. We’ve taken the hard road and purged the old content.

You’ll now find two new chapters inside the documentation:

  • Service Monitoring – a good starting point for plugin integration and more examples based on the numerous CheckCommand definitions (thanks everyone for contributing!).
  • Distributed Monitoring with Master, Satellites and Clients – from roles to zones to setup to configuration modes (“top down” and “bottom up”) to scenarios. All done with real-world examples and newly added images helping you get your distributed environment going.

The distributed monitoring chapter also explains by example how to automate your client setup in a Docker client. We’ve also made sure to add best practices and advanced hints which we learned from you, our community :)

 

More Release Highlights

  • Debian and RHEL packages for vim/nano syntax highlighting.
  • DateTime type for formatting timestamps in the configuration DSL.
  • Performance improvements for config validation.
  • Many bugfixes for check execution, notifications, downtimes, etc.

 

Changes

When upgrading your distributed environment, make sure to upgrade your master and satellite instances to v2.5 first. Clients using v2.4.x may still work, but should be upgraded as well.

An IDO database schema update is required (2.5.0.sql). The categories attribute requires the array notation (deprecation warning is logged).

The icinga2.conf file includes plugins-contrib, manubulon, nscp, windows-plugin by default on new installations. This makes deploying checks even easier but may collide with your own CheckCommands synced in global zones.

 

Update Icinga 2

Prior to upgrading your production environment you should test the new release in your staging environment as always. Make sure to read the full Changelog. Note: There was a release critical bug in 2.5.0 so we decided to go for a fixed 2.5.1 release.

Updated packages will be available soon.

Special thanks to all contributors and testers making Icinga 2 v2.5 great! Simon, Philipp, Uwe, Rune, Tobias, Blerim, Bernd, Eric, Markus, Hannes, Dirk, Matthias, Emanuel … you know who you are :)

Share your Icinga 2 love

What really drives us making Icinga a great monitoring solution is community feedback and appreciation.

One thing which is really, really cool – when someone sends you an email and says “Look. Icinga 2 works fine. Awesome work.” – attaching a screenshot with a hell of a lot of CPU cores and RAM. I cannot tell you this time which company he’s working for (only that the company is a NETWAYS customer we’ve been working with). This is JUST AWESOME.

[Screenshot: htop on an Icinga 2 v2.4.10 host with 144 CPU cores]

For reference I’ve also kindly requested a screenshot from the config validation to get an idea about the numbers and time it takes. We’ve also seen customer environments even bigger with 100k services and 6000 Icinga 2 clients. More to come ;)

[Screenshot: Icinga 2 v2.4.10 config validation output]

Now that we are like “WTF”, “oh. wow.” and “I want that hardware.” – I’d like to ask you to do us a favor :)

No matter how big or small your Icinga 2 environment is – please send us your screenshots of “htop” and “icinga2 daemon -C” (the numbers at the end) to info@icinga.org :-)

Additional performance details are highly appreciated as well.

  • $ time icinga2 daemon -C (compare 2.4.10 and the upcoming 2.5.0)
  • IDO Mysql/Pgsql Reconnect-Logging (compare 2.4.10 and the upcoming 2.5.0)

Icinga 2 doesn’t phone home, so our motivation from seeing Icinga in production worldwide comes from you – share your Icinga love with us :)

I promise to highlight your stories here in the future. If we meet at an Icinga camp I’ll bring you some of those famous “dragee keksi” too ;)

Analyse Icinga 2 problems using the console & API

Lately we’ve been investigating a problem with the check scheduler. It caused check results to arrive late and wasn’t easy to pin down: the cause could have been the check scheduler, cluster messages or anything else. We’ve analysed customer and partner environments in depth and learned a lot ourselves, which we’d like to share with you here.

One thing you can normally do is to grep the debug log and analyse the problem in depth. It is also possible to query the Icinga 2 API and fetch interesting object attributes such as “last_check”, “next_check” or even “last_check_result”.

But what if you want to calculate things for better analysis, e.g. fetch the number of all services on an HA cluster node where the check results are late?

The “icinga2 console” CLI command connected to a running Icinga 2 node using the API is key here.

Primarily, the icinga2 console allows for testing config expressions, but it can also be used to fetch all objects. With the help of the Icinga 2 DSL capabilities, the console fires the “execute-script” action against the Icinga 2 API. Note: Now we are really into programming things here. If you say – hey, I’m not a coder – keep on learning the Icinga 2 DSL. If you require in-depth help with problems, kindly join the community channels and/or ask our partners for professional support.

 

Preparations

Start the “icinga2 console” using the --connect parameter. You can put the API credentials into your shell environment, which is more secure than passing them in the connect string.

$ ICINGA2_API_USERNAME=root ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://localhost:5665/'

 

Fetch the last check time from a service

The following example fetches the service object “icinga” for the local NodeName host and its “last_check” attribute. This involves a function call to get_service().

 => get_service(NodeName, "icinga").last_check
1469784497.333508

In case you prefer a human-readable timestamp, the upcoming 2.5 release adds the possibility to format the unix timestamp as a string (DateTime).

 => DateTime(get_service(NodeName, "icinga").last_check).to_string()
"2016-07-29 13:17:57 +0200"

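Two small variations on the calls above, as a sketch to type at the console prompt: the first computes how many seconds ago the last check happened, the second uses DateTime#format with a strftime-style format string (the chosen format is just an example). Both assume the same “icinga” service on the NodeName host and a 2.5 console for the DateTime part.

```icinga2
// Seconds elapsed since the last check of the "icinga" service
get_time() - get_service(NodeName, "icinga").last_check

// Custom timestamp format instead of the to_string() default
DateTime(get_service(NodeName, "icinga").last_check).format("%Y-%m-%d %H:%M")
```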
 

Fetch all services and their last check

Fetching all service objects and printing their name and last_check attribute involves a temporary array and a for loop iterating over all service objects. The final “res” call will print its output to the console.

 => var res = []; for (s in get_objects(Service)) { res.add([s.__name, s.last_check]) }; res

 

Fetch all services where the check result is late

Now it is time to apply a filter to the service list retrieved from “get_objects(Service)”. In versions prior to 2.5 you can solve this by adding your own custom prototype method to the Array class like this:

Array.prototype.filter = function(p) { var res = []; for (o in this) { if (p(o)) { res.add(o) } }; res }

In case you’re already using Icinga 2 v2.5 you can use the built-in method. Gunnar implemented that method as part of issue #12247. You may also persist this configuration inside the icinga2.conf file – it is just a restart away.

The Array#filter method requires a function callback as parameter. This function is executed and evaluated for each array element returning a boolean value. All elements which match will be inserted into the newly returned array.

You can either pass a globally defined function or define a lambda function inline. The following lambda takes “s” as parameter and checks whether “s.last_check” is less than the current time minus 2 times the value of “s.check_interval”. That way you can easily find late check results without any hardcoded offset, normalised to the configured check interval.

s => s.last_check < get_time() - 2 * s.check_interval
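For comparison, the globally defined variant could look like this sketch. The function name “is_late” is made up for this example; Array#filter and Array#map are the built-in 2.5 methods (or the prototype shims shown in this article on older versions).

```icinga2
// Hypothetical helper name; same predicate as the inline lambda
globals.is_late = function(s) { return s.last_check < get_time() - 2 * s.check_interval }

// Pass the function by name instead of writing the lambda inline
get_objects(Service).filter(is_late).map(s => s.__name)
```

A global function pays off once you reuse the same predicate in several queries during one debugging session.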

Now let’s fetch all service object names and the formatted last_check timestamp into the “res” array for those services whose last_check is older than twice their check_interval. Note: The get_time() function returns the current unix timestamp.

 => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res
[ [ "10807-host!10807-service", "2016-06-10 15:54:55 +0200" ], [ "mbmif.int.netways.de!disk /", "2016-01-26 16:32:29 +0100" ] ]

If you are not necessarily interested in names but a general count, just use Array#len on the returned “res” array.

 => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res.len()
2.000000
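If only the number matters, the temporary array is not strictly needed. A shorter sketch that should yield the same count calls len() directly on the filtered array:

```icinga2
// Count late services without building an intermediate result array
len(get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval))
```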

icinga2_console_connect_example

Count late check results in HA setup

If you are especially interested in the services being checked on a specific HA cluster node, the calculation needs some adjustments. A host or service object which is not checked on the current HA endpoint is marked as “paused = true”. Vice versa, all scheduled and checked objects are marked as “paused = false”.

The solution is simple based on what you’ve learned above already. Change the result set into a dictionary like this:

var res = {}

The key is taken from the “paused” attribute of the current service “s”, and the value increments the current count for this key. That way we’ll end up with a dictionary containing “false” and “true” as keys and the number of late check results for both. You might ask – why should I care about checked objects with “paused = true”, they are run on the other endpoint in my HA cluster? Simple as it sounds – the check results are replicated from the other node to the local one. If they are not fresh, they are either not actively scheduled/executed, or the cluster communication is not fully intact.

 => var res = {}; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res[s.paused] += 1 }; res
{
	@false = 2.000000
	@true = 1.000000
}

Depending on which HA node the icinga2 console is connected to, the counts for true and false should be swapped.
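To see which services are behind the local count, the paused flag can also go straight into the filter. This is a sketch reusing the lateness predicate from above; it lists only late services that the connected node is responsible for checking itself:

```icinga2
// Late services that are actively checked on the connected HA node
get_objects(Service).filter(s => !s.paused && s.last_check < get_time() - 2 * s.check_interval).map(s => s.__name)
```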

 

Check how often parent services are used

This example iterates over all Dependency objects and counts how often each “parent_service_name” occurs. We used that to gather insights with clients using command_endpoint and a possible health check. get_objects() works for any config object type.

 => var res = {}; for (dep in get_objects(Dependency)) { res[dep.parent_service_name] += 1 }; res
{
    "icinga" = 54297.000000
    "vmware-health" = 5.000000
}

Such an analysis helps to understand whether dependencies prevent checks from being executed or cause additional checks.
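Once a parent service stands out, you may want to know which children depend on it. The following sketch lists them for the “vmware-health” parent from the output above, using the child_host_name and child_service_name attributes of the Dependency object (for host dependencies the service part stays empty):

```icinga2
// Children depending on a given parent service name
get_objects(Dependency).filter(d => d.parent_service_name == "vmware-health").map(d => d.child_host_name + "!" + d.child_service_name)
```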

 

How many hosts with service checks using Command Endpoints are not connected

This analysis is based on the assumption that a command endpoint check (“Command Execution Bridge”) changes its state to UNKNOWN (3) and puts the string “connected” into its output.

This customer setup consists of a three level cluster – a HA master zone, satellites for specific regions and clients which are checked from the satellite zones using command endpoint.

The idea is to filter by the satellite zones and check how many client endpoints below those satellites are not connected.

  • Iterate over all service objects using get_objects(Service)
  • Check if their state is 3 (UNKNOWN) and the output of the last_check_result matches “connected”
  • Store the host name of the matching service in the “res” dictionary. Use the service zone as key and append the host name as array element

Now we would have a “res” dictionary with all zones as keys and an array of matching host names for each zone. That array may have duplicates (multiple services not connected for one host).

 => var res = {}; for (s in get_objects(Service)) { if (s.state==3) { if (match("*connected*", s.last_check_result.output)) { res[s.zone] += [s.host_name] } } };  res

Therefore we apply an additional iteration over the “res” dictionary. Note: Array#unique doesn’t exist in versions prior to 2.5 but you can build it like this:

Array.prototype.unique = function() { var res = []; for (o in this) { if (o !in res) { res.add(o) } }; res }

  • Iterate over the “res” dictionary and override the current key
  • Make the array elements unique and only store the length

Now we have a “res” dictionary which holds all zones and the number of hosts where one or more services are currently not connected.

 => var res = {}; for (s in get_objects(Service)) { if (s.state==3) { if (match("*connected*", s.last_check_result.output)) { res[s.zone] += [s.host_name] } } };  for (k => v in res) { res[k] = len(v.unique()) }; res
{
	Asia = 31.000000
	Europe = 214.000000
	USA = 207.000000
}

 

Find services which use a command_endpoint from a parent zone

Those checks won’t work due to security restrictions. However, it is tremendously hard to figure out why they are not executed. Gunnar has therefore used the Icinga 2 DSL to implement a function which provides such lookups.

This little helper function is registered globally for later usage. It just extracts the zone for the current requested endpoint name.

globals.zone_for_endpoint = function(endpoint) { for (zone in get_objects(Zone)) { if (endpoint.name in zone.endpoints) { return zone } }; null }

This is the big one which checks against the hierarchy for the used command_endpoint. We’re gonna use that as comparator function callback later on.

  • If there isn’t any command_endpoint attribute set, return false (“hierarchy is valid”)
  • Fetch the check_endpoint object from the given command_endpoint name
  • Fetch the check_zone_name string from the check_endpoint
  • Set the authoritative zone (auth_zone_name) from the checkable’s zone name
  • Walk up the zone hierarchy, starting with the parent of the current checkable zone (auth_zone_name)
  • If the command_endpoint’s zone (check_zone_name) matches one of these parent zones, return true (“hierarchy is invalid”)
  • If not, jump to the next zone level above and check again; once no parent zone is left, return false

globals.is_invalid_hierarchy = function(c) {
  if (!c.command_endpoint) {
    return false
  }
  var check_endpoint = get_object(Endpoint, c.command_endpoint)
  var check_zone_name = zone_for_endpoint(check_endpoint).name
  var auth_zone_name = c.zone
  while (auth_zone_name) {
    var auth_zone_name = get_object(Zone, auth_zone_name).parent
    if (auth_zone_name == check_zone_name) {
      return true
    }
  }
  return false
}

That way this function returns a boolean value which can be evaluated for all checkable objects. Note: Execute that from a satellite (or any child) zone. Filtering alone would print all matching service objects and their attributes. Another trick – Array#map takes a lambda function which replaces each array element (service object) with just the full service name (s.__name). In versions prior to 2.5 you can manually define it like this:

Array.prototype.map = function(m) { var res = []; for (o in this) { res.add(m(o)) }; res }

The result is pretty straightforward:

 => get_objects(Service).filter(s => is_invalid_hierarchy(s)).map(s => s.__name)
[ "icinga-master01.domain.com!proc ntp", "icinga-master02.domain.com!proc ntp" ]

 

Conclusion

While it may look terribly hard to implement and understand – once you’re in the flow you’ll never look back. Let us know which debugging tricks and analysis you’ve already done using the Icinga 2 console & API :)

Note: The icinga2 console is solely used read-only for debugging purposes in these examples. Keep in mind that an “execute-script” action pushed towards a running Icinga 2 is the same as you would operate as “root” on your server. Don’t try to modify or delete things unless you know what you’re doing.

Monthly Snap July – Dev Updates, Events & Social

Sometimes you have so much to do that it is pretty hard to give an update what’s going on. Sounds familiar? Welcome to my world ;-) I’d like to try a new format of information updates on a monthly basis – idea kindly borrowed from my employer’s blog.

These details should inform you what’s cool, going on, cooking, and coming up. Please let us know what you think about it!

Development Updates

 

Icinga 2

icinga_director_meme

Kudos to Christian Stankowic

Over the past month we’ve debugged Icinga 2 in depth at several customer environments. There are plenty of fixes coming with the next 2.5 major release. This includes a bug fix for command endpoint message routing, a fix for client disconnects when another client fails (“not signed by CA”) and numerous other bug fixes. One thing which came up – if you have more than two endpoints in one zone, there is a known bug with check result messages. It is currently advised to only have two endpoints until we have investigated the issue further.

Last week we fixed a bug in the check scheduler (one of those release critical issues). Right now we are heavily investigating a possible IDO deadlock and further notification issues. Once these are fixed we’ll happily continue testing Icinga 2 and make v2.5 a stable release you can count on. Our plans target August as release month. Get your hands dirty and help test the snapshot packages!

Icinga Web 2

Not much to say this time about the web framework. There is some consolidation work going on with official modules under the hood. The Icinga Director is under heavy development as always. Follow its development closely on GitHub and the issue tracker.

Icinga Exchange and Accounts

Under the hood the developers are working on bug fixes and integrating new features such as syncing git tags and releases or fixing the tag search. An upgrade of the live system is expected soon.

Upcoming events

If you are an early bird – there is a 25% discount for Icinga Camp Berlin 2017 waiting for you. We’re also looking for speakers and community members at our Icinga Camps in

Team Icinga will also attend OSMC in late November accompanied by lots of cool talks related to monitoring and Icinga.

Social

Community members are active everywhere. We do see a lot of questions asked over at monitoring-portal.org. Chime in and lend us a hand with sharing your knowledge! Thanks in advance :)

 

The Icinga 2 book (currently German only) gets a lot of nice feedback. The two authors Lennart and Thomas told us that they are in contact with the publishers to create an English version as well. For those waiting for an ebook – now available (again, German only).

 

Moving from Nagios to Icinga 2 – a journey worth a look? Even if you don’t speak German you should definitely follow her blog posts. So much good feedback – and Marianne is also actively reporting issues. Thanks a lot for your appreciation!

 

Jens is actively migrating the current Icinga 1.x environment to Icinga 2 at Müller. Most recently he discovered the possibilities of the Icinga Director.

 

The upcoming Nagstamon 2.0 release features Icinga Web 2 support. Kindly test and give the developer feedback!

 

Icinga Web 2 and also the Icinga Director are a result of many discussions and plenty of hours of development. We are proud that our users love it :)

 

Thomas (the author of the Icinga 2 book) was working on a Logstash check plugin for the new stats API available with 5.0. He didn’t realize that Jordan Sissel himself sent in a patch ;)

Last but not least – a Grafana Dashboard using the Icinga 2 API. What the heck? ;-)

 

Oh and Blerim is now officially part of our team. You’ll hear more from him in the next months :)

Monitoring MySQL database size

Our community support channels provide interesting insights into how things are being monitored in various environments. Sometimes it is not only about finding the right configuration syntax or fiddling with the perfect cluster setup. This time I’d like to share a solution for a common problem that I discovered while I was helping another Icinga user. :-)

Monitor the size of a database

Sounds easy if you are familiar with MySQL and the common check plugins – putting it all together might get complicated, especially for beginners.

Luckily, the question already provided a sample SQL query for fetching the database size:

MariaDB [(none)]> select sum(data_length + index_length) / 1024 / 1024 as "db size" from information_schema.tables where table_schema = 'icinga';
+-------------+
| db size     |
+-------------+
| 31.09375000 |
+-------------+
1 row in set (0.01 sec)

Two questions arise:

  • Is there a plugin which automatically checks the database size from a given parameter?
  • Alternatively, can I just run this query and compare the returned integer value in MB?

Find a plugin and integrate it

There’s a basic check_mysql plugin that is part of the monitoring plugins project. Additionally, check_mysql_health has proven itself in many environments: it offers fast and easy monitoring. This plugin is also part of the Icinga training sessions demonstrating its power.

Once you’ve successfully installed it into the default Icinga 2 PluginDir (I’m skipping detailed installation instructions here), let’s go for a CheckCommand definition. The Icinga 2 Template Library (ITL) already provides such a definition inside the contributed plugins section. Include the plugins in the file /etc/icinga2/icinga2.conf:

include <plugins-contrib>

Also, make sure to define the constant PluginContribDir in the file /etc/icinga2/constants.conf:

const PluginContribDir = "/usr/lib64/nagios/plugins"

Now it is time to read about the required parameters in the documentation. We’ll need that information for setting the appropriate custom attributes later.

Create a Host and Service Apply Rule

One thing I’m always keen on: use a custom attribute dictionary on the host and pass as many custom parameters to the service objects as possible. Combine this with an apply for rule and use the possibilities of the Icinga 2 DSL.

In order to use the mysql_health CheckCommand we’ll need to delegate at least the following custom attributes:

  • mysql_health_hostname: Defaults to the host’s address attribute (optional).
  • mysql_health_username: MySQL database user with the appropriate permissions for the information_schema database here.
  • mysql_health_password: MySQL database user password.
  • mysql_health_mode: “sql”, since we want to run a generic SQL query here.
  • mysql_health_name: SQL query string we want to execute; ensure that it returns a single number/count.
  • mysql_health_name2: In combination with the “sql” mode this sets the performance data label/output prefix.
  • mysql_health_units: The default calculation uses MB, so we’ll tell the plugin to use it as performance data unit.
  • mysql_health_warning: Warning threshold in MB.
  • mysql_health_critical: Critical threshold in MB.

This is a long list, but once you’ve carefully read the plugin documentation and tested the various parameters it will become clearer.

Let’s construct an apply for rule generating services based on the host custom attribute databases (this is a dictionary/hash with the database name as key and multiple parameters, e.g. for thresholds).

apply Service "db-size-" for (db_name => config in host.vars.databases) {

Define the intervals and include the mysql_health check command:

  check_interval = 1m
  retry_interval = 30s

  check_command = "mysql_health"

Check whether the host dictionary provides additional configuration (such as a different database username or password) and set a default otherwise. In this example the root user has access to information_schema.

  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
  } else {
    vars.mysql_health_username = "root"
  }
  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
  } else {
    vars.mysql_health_password = "icingar0xx"
  }

Now specify the sql mode and build the query. Cool thing – the query is based on the current database name that we are generating a service object for. That way we don’t need two apply rules for the databases icinga and icingaweb2 later on.

  vars.mysql_health_mode = "sql"
  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
  vars.mysql_health_name2 = "db_size"
  vars.mysql_health_units = "MB"

Optionally, inherit warning and critical thresholds defined in the host dictionary databases. Its value is mapped into the config dictionary in the local service apply for scope. Inherit additional parameters into the service custom attributes in vars.

  if (config.mysql_health_warning) {
    vars.mysql_health_warning = config.mysql_health_warning
  }
  if (config.mysql_health_critical) {
    vars.mysql_health_critical = config.mysql_health_critical
  }

  vars += config
}

Question for the reader: What should the host object look like in order to generate services? :-)

The answer is simple – based on existing examples and the documentation, it is pretty straightforward:

object Host "icingamaster" {
  address = "127.0.0.1"
  check_command = "hostalive"

  /* database checks */
  vars.databases["icinga"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
  vars.databases["icingaweb2"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
}
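Because the apply rule checks for mysql_health_username and mysql_health_password in each dictionary entry, a host can also override the default credentials per database. The following is a sketch with made-up values (host name, address, database name, user and password are all assumptions for illustration):

```icinga2
/* Hypothetical host: every name and value here is an example. */
object Host "dbserver" {
  address = "192.0.2.10"
  check_command = "hostalive"

  vars.databases["wordpress"] = {
    /* Overrides the "root" default from the apply rule */
    mysql_health_username = "monitoring"
    mysql_health_password = "secret"
    mysql_health_warning = 1024 //MB
    mysql_health_critical = 2048 //MB
  }
}
```

With the apply for rule above, this would generate a service named “db-size-wordpress” on that host.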

Voilà – validate your configuration and reload the Icinga 2 service.

Since this is a real world example, I’ve also integrated it into the icinga2x Vagrant box. :-)

Conclusion

While it is not always clear which plugin is the best, it’s always worth looking into the existing ITL CheckCommand definitions. Maybe there already is one which also provides the perfect answer to your questions. If not, hop onto Icinga Exchange and submit the newly created CheckCommand definition to the upstream. :-)

check_mysql_health provides many possibilities to monitor your databases (local or remote) and is fairly easy to set up. Once you have gathered the required monitoring metrics, e.g. by manually executing the plugin or querying your database, the rest is always the same (CheckCommand, host, and service configuration).

Once the Icinga 2 configuration validation returns OK, reload the daemon and enjoy fancy monitoring in Icinga Web 2 and Graphs in your preferred metrics dashboard.

 

 

Happy SysAdminDay!

Being a sysadmin involves a lot of things. Modern job descriptions do not necessarily call it “sysadmin” but “devops” or “manager of everything”. I’ve been a sysadmin myself in my previous job at the University of Vienna, and I sometimes get back to it when managing the Icinga server infrastructure as well. Does it count to install and deploy stuff with Vagrant and Puppet? ;-)

No matter how you define your job title – the most important thing is that you love what you do and solve things with passion. Even if they’re not exactly visible to others (sometimes only on failure). We’re happy that Icinga is able to make your job a tad easier, monitoring all the things and telling you that you are doing awesome work – “everything’s green” :-)

devotion_to_duty

Today is Friday – keep things relaxed, don’t force maintenance tasks and enjoy your day :)

Classic comic from xkcd #705

Watch out – and shop!

Most of us use Icinga 2 to monitor network services, host resources or server components – but wait, there is more! Why not ask Icinga 2 to watch items in your favourite online shop and send out notifications as soon as the price has dropped below a certain amount? Antony Stone has written a slightly unusual check to track the price of Amazon items with Icinga 2. Here is how it works:

  1. The monitoring plugin is a simple shell script (check_amazon) that accepts the Amazon product ID (Amazon Standard Identification Number, ASIN) as a mandatory parameter.
  2. Next, it uses the text-based web browser lynx to fetch the Amazon page for that item. Somewhere in the lynx-output is the information about the full name and the price of the product.
  3. Various grep commands filter the output, cut and tr remove sections and characters. As a result, the script prints the name of the item and the current price.
  4. The check_amazon plugin accepts two optional parameters: -w for the warning threshold (= the price is near to what you’d like to pay for it) and -c for the critical threshold (= the price is now below the maximum price you’re prepared to pay).

We’ve copied the plugin to /opt/monitoring/plugins and made it executable (chmod +x check_amazon). In the file /etc/icinga2/constants.conf we’ve created a new global constant CustomPluginDir:

...
/* My own check plugins live in /opt/monitoring/plugins: */
const CustomPluginDir = "/opt/monitoring/plugins"
...

The new configuration file /etc/icinga2/conf.d/amazon.conf defines a host group for all Amazon items, so they all get the same check. It also contains various host objects for all the items we want to monitor. They have no address, since it’s not possible to ping Amazon items. Icinga 2 insists on having a check_command, so we’re using dummy which does nothing. For the host objects we’ve also defined a few more variables for the alert thresholds:

object HostGroup "Amazon" {
  display_name = "Amazon Items to Watch"
}

object Host "Bottle-o-gin" {
        check_command   = "dummy"
        groups          += [ "Amazon" ]
        vars.asin       = "B00G3Z92CI"
        vars.price1     = 20.00
        vars.price2     = 15.00
}

object Host "Tonic" {
        check_command   = "dummy"
        groups          += [ "Amazon" ]
        vars.asin       = "B00H2WX11E"
        vars.price1     = 25.00
        vars.price2     = 20.00
}

...

We also need to tell Icinga 2 how to run the plugin, so we’ve defined the CheckCommand like this:

object CheckCommand "check_amazon" {
        import          "plugin-check-command"
        command         = [ CustomPluginDir + "/check_amazon" ]
        arguments       = {
                "-p"    = "$host.vars.asin$"
                "-w"    = "$host.vars.price1$"
                "-c"    = "$host.vars.price2$"
        }
}

Last, we define the check to run for all items in the group. We’ve applied a standard service definition (which is then linked to a command) to anything in the Amazon group:

apply Service "Price" {
        import                  "generic-service"
        assign where            "Amazon" in host.groups
        check_command           = "check_amazon"
        check_interval          = 24h
        retry_interval          = 1h
        max_check_attempts      = 3
}

That’s it – we’re now able to monitor the hosts (the Amazon items) and their services (the price). Got any suggestions or ideas for improvement? Drop us an email or get in touch via GitHub, Twitter, Facebook, or Google+!


Icinga in Amsterdam – Icinga Camp and DevOpsDays

Icinga Camp Amsterdam

We had a lovely time in Amsterdam meeting great community members at our Icinga Camp. Generously hosted at LeaseWeb and sponsored by OlinData, NETWAYS and Inuits, we had a lot of interesting talks and discussions. It was a full day of monitoring madness and connecting with the community, and we had the latest releases and ideas to share.

The talks covered getting started with monitoring, going further with automation using Puppet and Ansible – and last but not least a closer look at the possibilities of the Icinga 2 REST API. We’re also proud to have a revamped Icinga 2 Dashing dashboard available for you after the talk.

Blerim also gave a step-by-step introduction to performance data, all the way down to metrics in Graphite and Grafana correlated with additional monitoring information (thresholds, metadata such as downtimes). He further announced our very own Icinga 2 Grafana dashboard, available on Grafana.net.

Tom hacked on the Icinga Director before giving an interesting talk covering all the details – last commit 15 minutes before the talk ;-) And we even have a fine Icinga Director 1.1.0 release now.

Christian presented the new PowerShell toolstack for automating an Icinga 2 client install on Windows. Brain overflow included, we enjoyed a lovely BBQ dinner sponsored by OlinData.

You can find all slides, videos and pictures from Icinga Camp Amsterdam in the event archive and on our YouTube channel.

 

DevOpsDays Amsterdam

We gave an Icinga 2 workshop on the first day of DevOpsDays Amsterdam, based on the official Icinga 2 training material. Luckily we were also invited to the speakers’ dinner on a boat cruising through the Amsterdam docks while enjoying food, beer and good music.

The official schedule for the following two days featured interesting #devops topics, covering not only automation and containers but also important topics in tech. Erica Baker from Slack kicked off DevOpsDays Amsterdam with the keynote on diversity and inclusion in tech. After lunch Avishai Ish-Shalom started with mathematics, which turned into reliability and scaling pretty fast – the talk was both interesting and entertaining :)

As usual the afternoon was reserved for ignites and then open spaces. Blerim suggested discussing container monitoring in #devops environments in the open spaces, which gained a lot of traction and generated lots of interesting ideas and todos.

Last but not least, the final day of DevOpsDays Amsterdam on Friday provided interesting talks and the ignite session. Bernd talked about open source communities in an entertaining (“Chaos. German Style.”) but also to-the-point way. We really enjoyed the friendly atmosphere in Amsterdam, especially the Twitter wall and of course the nice sketch-up drawings. We’ll be back for sure! :)

PS: The DevOpsDays organisers are Lego addicts: the goodie bag contained a nice special mini-figure wearing the event’s t-shirt :)

 

Icinga Director 1.1.0 is here!

We are glad to announce that only three months after 1.0.0 we tagged our next Icinga Director release: 1.1.0. It comes with a lot of new features, and many little bugs have been squashed. Needless to say, this is without doubt the best Icinga Director release so far.

Our Kickstart helper has been improved, and so were our schema migrations. You’ll notice this as soon as you drop the new Director version into your Icinga Web 2 modules folder. Working with custom variables is much more fun, and you are now able to revert most if not all of your changes with a single click. Convinced? Then you’re ready for an upgrade!

History matters

Our history-related improvements do not end here. The Activity Log got more functionality and looks better than ever before. We keep a history of all import and synchronization actions and are now able to trace those to their corresponding single object modifications. You can also compare full configs very comfortably: as a whole or file by file, before deploying them and even months afterwards.


Dynamic applied configuration

It is now possible to use the powerful Icinga 2 Apply Rules to assign services and notifications based on other host or service properties. Work on applied dependencies and groups is ongoing; expect them to be available soon, too. You’ll learn to love some hidden features, like services added to host templates being rendered as apply rules in an automagic way.

Notifications are ready for daily use

Notifications were incomplete with Director 1.0. Right now, all components required to deploy notifications are available. ENV support for commands is still missing, but it’s pretty easy to work around this by passing parameters to your notification commands.

Automation is King

A lot of effort has been invested into automating your Director workflow. Import data from various data sources, trigger synchronization and immediately deploy any resulting changes to your Icinga masters? All of this and much more can be configured on the web, while all those actions run as a background service. Want this to happen only during office hours? Sure, why not. The configuration gives you very granular control over how automated Jobs should behave. Our purge mechanism now only purges objects that vanished at your external data source, so there is no more risk when using multiple data sources, even when combining them with manual configuration.

Simplified Icinga Agent handling

Director aims to reduce the effort involved when dealing with Icinga 2 Agent configuration. Quite some work went into simplifying this even more. We had a lot of input regarding this, which helped us get a better understanding of how you are using Icinga in your daily work. And this is what we want to target with our efforts: we want to ease your daily tasks and lower the entry barrier for beginners. Want to kickstart an Icinga Agent on Windows? For every Agent, Director provides a dedicated PowerShell script for download that takes care of certificate signing and initial configuration.

Hidden gems

Many improvements targeted the UI, to make Director look better and easier to use. But a lot of work went on behind the scenes. We restructured some important parts of the code for better testability. Our CLI learned new tricks, and the same goes for the REST API. Data Types work behind the scenes when you define Fields for Custom Variables. Want to define Booleans or Arrays? Feel free to do so!

Many thanks to our fantastic community!

I’d like to thank all of you for your bug reports and feature requests. Your feedback teaches us more about how you are using Icinga in your production environments. This is highly appreciated, as this is what helps us to grow and to make Icinga and its addons better with every single release.

Buch! Buch! I wer’ narrisch!*

It’s done. There is a book about Icinga 2 available as hardcopy (right now) and e-book (coming soon). There is only one minor drawback: It’s entirely in German. Since Lennart and I are native German speakers, we decided it would be best to write the book in German, too. (In fact I’m Austrian, but Austrian is so similar to German that linguists still argue whether it’s the same language or not.)

But don’t despair if you can’t read German and don’t have the time to learn it. We are still negotiating whether there will be an English version of the book or not. We will keep you updated.

If you are able to read German, here’s what is in the book:

  • Installation of Icinga 2 including the ClassicUI and Icinga Web 2
  • Thorough introduction to the Icinga 2 configuration language
  • Monitoring with the Icinga 2 agent, NRPE and SSH
  • Notifications
  • Distributed monitoring with satellites
  • Best practices for planning and implementing a monitoring setup
  • 100 pages about monitoring plugins
  • Graphing with Graphite and PNP
  • Log management with Logstash
  • Business Processes
  • Reporting
  • Appendices with lots of extra code examples

The whole book uses a fictional company for its examples, which are backed by actual code. The examples were not only tested in a test environment; some are taken from actual customer setups (we are Icinga consultants for a living).

So, if you want to order one (or more) copies, you can do so via Amazon, directly from the publisher or from many other sources. E-books are still in the pipeline but will be available via most stores, too. If you’re not sure whether you want to buy one, the publisher has some samples on the website.

* The headline means “Book! Book! I’m going crazy!” in Austrian. It’s a reference to the commentary on the soccer game Germany vs. Austria in Córdoba in 1978 (just replace Book with Tor, which means Goal), which is famous even outside the soccer fanbase. In fact I don’t like soccer at all, but I thought it would be a nice headline.

For your reference: