Author Archive for Michael Friedrich

Released: Icinga 1.0.3 & Icinga Web 1.0.3

Icinga reaches the next level of open source monitoring – releasing Icinga 1.0.3 and Icinga Web 1.0.3 to the world!

While Icinga Core brings the Classic UI, IDOUtils and API together under version 1.0.3, Icinga Web jumps from 1.0.1 directly to 1.0.3 (unstable), preparing for a unified release version in October.

Several new config options have been added to Icinga Core, and check execution has been reworked to use execvp. We’ve also fixed several bugs, e.g. wrong service alerts in the logs and persistent comments disappearing after a restart.

Icinga IDOUtils now provides extended syslog output, fixes major NULL binding errors on Oracle and increases column lengths for MySQL/PostgreSQL/Oracle.

We have also been working on an outstanding new feature for the Classic UI: selecting multiple hosts/services in the status views and sending commands for them to the Core.
In addition, we have added a pause/continue button for page updates, the option to show only HARD states in the tactical overview, and optional long_output in the status pages.

Icinga API now provides unit tests and its own debug log, as well as our own oci8 implementation instead of pdo_oci. Bugs have been fixed and more queries have been added too.

Icinga Web features a new tactical overview including an underlying template engine. Commands can now be sent to specific instances only or broadcast to all of them. A session expiry watchdog has been added, as well as HTTP basic auth.
Several IE and other bugs have been fixed, code quality has been improved, and configure now allows you to set the API credentials (e.g. for the IDOUtils DB) directly. Watch out for our cool Icinga throbber after login!

Check out the Changelog or the What’s New section in the docs for more information!

Please report feedback and/or bugs to our development tracker, the mailing lists and the Icinga Portal! :-)

Enjoy Icinga 1.0.3 and stay tuned for more to come! =)

News from Core, CGIs & IDOUtils – Part III

Now that you have read about IDOUtils and the CGIs, it is time for the big one :-)

Icinga Core

None of the changes, fixes and enhancements affect compatibility with Nagios ™ – you’ll just get more fixes and enhancements if you decide to move over to Icinga.

The list of fixes and code improvements is rather long, thanks to Andreas Ericsson, who is working on his own Nagios ™ development branches. All of his recent commits have been reworked into Icinga Core (if not already done). There were some nifty patches making developers’ lives easier and the source code a bit more readable and reusable.

Furthermore, protection against typos in macro names has been added, along with the missing NOTIFICATIONISESCALATED macro. Performance data files are now closed correctly, and the pipes are also set properly on configuration re-read.

A SIGSEGV in checks on Solaris has been fixed thanks to Thorsten Huebler. Some other fixes for Solaris are currently in development (thanks Alexander Skwar).

The fix by Ton Voon for choosing the next valid time on the day of a DST change, when clocks go back one hour, is also in 1.0.2.

In addition, Ton Voon provided the in-sync retention facility on the core by Opsera Ltd, which has been reworked into Icinga – we think this might be useful. There was also a Nagios ™ patch adding a new is_volatile setting of 2 for services, which respects the re-notification interval for notifications; it has been applied and tested in Icinga Core as well.

There was a bug when removing comments – now it is fixed, and removing one comment will no longer remove all of them.

Scheduling a downtime for the host and all its services now works as expected. Also, custom notifications are no longer sent during downtimes (thanks Sven Nierlein). Inheritance of notification_period for services has been fixed using a patch by Gordon Messmer.

Two further notification bugs have been fixed: notifications were not sent out when a scheduled downtime was cancelled, and the first notification delay was calculated incorrectly, potentially causing notifications to go out early.

The initscript has been slightly reworked to offer showing config errors as a separate option. Furthermore, the output is saved into a file which can be looked up after a normal start. The initscript also no longer removes the pid file if Icinga did not stop in a timely manner. If a lockfile without a running PID is found during startup, it will be removed instead of bailing out.
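The initscript itself is a shell script, so the following is only an illustration of the stale-lockfile rule, written as a minimal Python sketch. It assumes the lockfile contains nothing but the daemon’s PID; the function names are hypothetical and not taken from the Icinga sources.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 performs only the existence/permission check; nothing is sent."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def clear_stale_lockfile(lockfile: str) -> bool:
    """Remove `lockfile` if the PID stored in it is not running.

    Returns True if a stale lockfile was removed, False if there was no
    lockfile or the daemon is still running (startup should then refuse
    to start a second instance)."""
    try:
        with open(lockfile) as fh:
            content = fh.read().strip()
    except FileNotFoundError:
        return False
    try:
        pid = int(content)
    except ValueError:
        pid = -1  # unparsable content is treated as stale in this sketch
    if pid_is_running(pid):
        return False
    os.remove(lockfile)
    return True
```

Sending signal 0 is the usual POSIX way to test whether a PID exists without affecting the process.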

Starting the Core now throws an error if contactgroups do not match. The same now happens if a service description is missing in a service object definition (if it is defined in a used template, there won’t be an error!).

Servicechecks with timeperiods containing ‘exclude’ directives are now correctly re-scheduled – this is noted in the Nagios ™ Changelog for 3.4 and will be fixed in Icinga 1.0.2.

Steven D. Morrey implemented a patch for an extended scheduling queue, which has been slightly reworked and improved for Icinga. The -S option works much like -s, but also dumps the entire scheduling queue as it would run, in addition to providing the summary data.

Steven also created another patch a long time ago, adding an event profiling option for stats on event counts and the time taken for events. We integrated it as a config option in icinga.cfg and took the chance to add those stats to the current CGIs in ‘Performance Info’ – if the option is enabled, of course.
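To illustrate what such event profiling collects, here is a minimal Python sketch that counts events per type and sums the time spent on each. The class and event type names are hypothetical; the actual option name in icinga.cfg and the Core’s C implementation are not shown here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class EventProfiler:
    """Count handled events per type and accumulate the time spent on them."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled
        self.counts = defaultdict(int)
        self.total_time = defaultdict(float)

    @contextmanager
    def profile(self, event_type: str):
        if not self.enabled:
            # Profiling switched off: run the event without measuring it.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.counts[event_type] += 1
            self.total_time[event_type] += time.perf_counter() - start

    def report(self):
        """Return (event type, count, total seconds, average seconds) rows."""
        return [
            (etype, n, self.total_time[etype], self.total_time[etype] / n)
            for etype, n in sorted(self.counts.items())
        ]
```

Usage: wrap each event handler in `with profiler.profile("service_check"):` and render `profiler.report()` in a stats page.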

We finally implemented the state-based escalation ranges feature by Mark Gius: “The directives first_notification and last_notification apply to the total count of notifications on a particular service or host. It is sometimes desirable to escalate after the Nth critical notification, rather than after a total number of N notifications have been sent.”
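The difference between counting all notifications and counting per state can be shown with a small Python sketch. It models the idea only: the helper names and parameters are hypothetical and do not reproduce Icinga’s escalation directives, and it assumes a last value of 0 means “no upper limit”, as with last_notification.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NotificationHistory:
    """Notifications sent for one service problem, in total and per state."""
    total: int = 0
    per_state: Counter = field(default_factory=Counter)

    def record(self, state: str) -> None:
        self.total += 1
        self.per_state[state] += 1


def in_range(n: int, first: int, last: int) -> bool:
    """Range check where last == 0 means "no upper bound"."""
    return n >= first and (last == 0 or n <= last)


def escalation_applies(history: NotificationHistory, state: str,
                       first: int, last: int, state_based: bool = False) -> bool:
    """Does an escalation covering notifications first..last apply to the
    notification that is about to be sent for `state`?"""
    sent = history.per_state[state] if state_based else history.total
    return in_range(sent + 1, first, last)
```

After two WARNING notifications and one CRITICAL one, the next CRITICAL notification is the 4th overall but only the 2nd CRITICAL one, so a state-based range of 2..2 matches while a total-based range of 2..2 does not.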

Max Schubert’s patch to add enhanced diagnostic output when a regular expression fails to compile also has been added to Icinga.

There have been questions about another syslog facility – Icinga can now send log messages to syslogd using a local facility instead of the default one. If enabled, you can choose a local facility from 0 to 7.
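A minimal Python sketch of choosing the facility, assuming two hypothetical settings (use a local facility yes/no, and its number 0 to 7) and assuming the default facility is LOG_USER. Python’s standard syslog module is Unix-only.

```python
import syslog

# Python's syslog module exposes LOG_LOCAL0 ... LOG_LOCAL7 on Unix systems.
LOCAL_FACILITIES = [getattr(syslog, f"LOG_LOCAL{i}") for i in range(8)]


def facility_for(use_local_facility: bool, local_facility: int = 0) -> int:
    """Map the two hypothetical settings to a syslog facility constant."""
    if not use_local_facility:
        return syslog.LOG_USER  # assumed default facility
    if not 0 <= local_facility <= 7:
        raise ValueError(f"local facility must be 0..7, got {local_facility}")
    return LOCAL_FACILITIES[local_facility]


def log_to_syslog(message: str, use_local_facility: bool = False,
                  local_facility: int = 0) -> None:
    """Send one informational message with the selected facility."""
    syslog.openlog("icinga", syslog.LOG_PID,
                   facility_for(use_local_facility, local_facility))
    syslog.syslog(syslog.LOG_INFO, message)
    syslog.closelog()
```

On the syslogd side, a rule for e.g. `local5.*` can then route these messages into their own file.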

Until now, Icinga used popen and system to run active check commands with shell interpretation. Using execv instead means no shell expansion is required, so one less process (sh) is needed to execute an active check, which should improve performance. Before running an active check, Icinga checks whether the command line contains any shell metacharacters. If it does, it falls back to the shell invocation; otherwise it uses the new execv method.
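The decision described above (use the shell only when the command line contains metacharacters) can be sketched in Python as follows. The metacharacter set is a conservative choice for this sketch, not necessarily the one Icinga uses. With a plain argument list, Python’s subprocess module executes the program directly, much like a fork plus execvp, without an intermediate sh.

```python
from __future__ import annotations

import subprocess

# Conservative set for this sketch: any of these means "let /bin/sh handle it".
SHELL_METACHARS = frozenset("!$^&*()~[]\\|{};<>?'\"`#\n")


def build_argv(command_line: str) -> list[str]:
    """Return the argv to execute for an already macro-expanded check command."""
    if any(ch in SHELL_METACHARS for ch in command_line):
        # Fallback: keep the old behaviour and let the shell interpret it.
        return ["/bin/sh", "-c", command_line]
    # No quoting or redirection involved, so whitespace splitting is enough.
    return command_line.split()


def run_check(command_line: str, timeout: float = 60.0) -> tuple[int, str]:
    """Run a check and return (exit code, first line of plugin output)."""
    argv = build_argv(command_line)
    # With a plain argv list (shell=False) subprocess executes the program
    # directly, so no intermediate sh process is started for simple commands.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    lines = proc.stdout.splitlines()
    return proc.returncode, (lines[0] if lines else "")
```

Because quote characters are treated as metacharacters, any command that would need real quoting rules is handed to the shell, which keeps the simple whitespace split safe.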

We had a speedup of parsing status.dat a while ago; now Matthieu Kermagoret has provided another patch that minimizes the loading time of the retention file. According to his reports, they used a standard setup with 1500 hosts, 19000 services and around 80 000 comments: before, a restart took 20 minutes; with the patch applied, it took only 2.6 seconds (!).

Icinga Core, CGIs & IDOUtils fit perfectly together with the Docs, the API and the new Web. Please help us test for the upcoming release on 30.6.2010 (counter is GMT+1) and report issues!!! :-)

Interested in Icinga development, (re-)implementing features and resolving performance issues? Then please get in touch:

* Mailing lists

* IRC: irc.freenode.net #icinga #icinga-devel

* Icinga Portal

* Twitter

News from Core, CGIs & IDOUtils – Part II

Part II of this series catches up on our work on the CGIs – what happened with them since 1.0.1?

Icinga CGIs

Alongside the new Icinga Web, there was some room to fix and enhance the current Classic UI (“the CGIs”).

Some minor typos reported by community users have been fixed, missing js files have been added, and the check_daemon_running function has been modified to work with Mac OS X again.

The quick search has been added again alongside the live search (which is now called “extended search”). While researching older patches, it turned out that if a user is authorized for a host, authorization to view all of that host’s services is derived from it. If you don’t want that, you can now set show_all_services_host_is_authorized_for in cgi.cfg to 0 (this only applies if the user is not globally allowed to view all services).
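The authorization rule can be sketched in Python as below. This is a simplified model with hypothetical names, not the CGIs’ C code; it assumes the setting defaults to 1, which keeps the previous behavior.

```python
from dataclasses import dataclass, field


@dataclass
class CgiUser:
    name: str
    authorized_for_all_services: bool = False
    contact_for_hosts: set = field(default_factory=set)
    contact_for_services: set = field(default_factory=set)  # (host, service) pairs


def can_view_service(user: CgiUser, host: str, service: str,
                     show_all_services_host_is_authorized_for: int = 1) -> bool:
    if user.authorized_for_all_services:
        return True  # global permission always wins
    if (host, service) in user.contact_for_services:
        return True  # direct service authorization
    # Derived authorization via the host, only while the cgi.cfg setting is 1.
    return bool(show_all_services_host_is_authorized_for) and host in user.contact_for_hosts
```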

The docs mentioned that display_name in host and service definitions would set an alternative name shown in the Classic UI. This is now available, exclusively in Icinga 1.0.2 – if you don’t set display_name, the default host_name/service_description will be used instead.

Thanks to Jochen Bern from LINworks GmbH, the CGIs now allow adding multiple URLs for notes_url/action_url in host/service object definitions – if you ever needed more of them (like me) :-)

Stay tuned for Part III – it will catch up on Icinga Core, and there are a lot of things to talk about =)

News from Core, CGIs & IDOUtils – Part I

Hi there,

it’s been a while since the release of 1.0.1 in March. Quite a lot of things happened – Hiren Patel and Massimo Forni joined the Core developer team while Hendrik moved on to new projects. But refreshing the team is not the only thing that makes Icinga Core, CGIs & IDOUtils more valuable this time.

Looking at the Git commit history and the issue roadmap for 1.0.2, you can imagine the evolution – but these are just historical listings that only show the basics: “who committed and fixed/created what on which date”.

Today Icinga will be “feature frozen” and up for testing – we need testers for the upcoming Icinga release!!! Guides are available in our development wiki :-)

What exactly happened since 1.0.1?

Many people were asking what exactly changed in Icinga on the core side, in an easy and readable way. So let’s try it here in three parts :-)

The changes and enhancements will be split into the Core itself, the CGIs and the IDOUtils – all of them summarized more or less in historical order.

Part I starts with …

Icinga IDOUtils

There were some bugs, including a major one causing data inconsistency, but also some enhancements regarding usage and performance.

The current database schema is centered on the objects table, on which all relations are built and joined. During startup of Icinga Core, old configs normally get deleted and existing objects are marked as inactive. After that, each object of the new config is checked against those existing objects, and only if none is found is a new one inserted. This is the expected behavior, but a bug resulting from the libdbi rewrite caused this check to fail, so a new object was always inserted. This caused an explosion of the objects table and decreased overall performance on select/update round trips.
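As a rough illustration of the expected behavior (not the actual IDOUtils C code), here is a Python sketch using SQLite as a stand-in RDBMS. The table is a simplified icinga_objects with only the columns needed here, and the helper names are hypothetical.

```python
import sqlite3

SCHEMA = """CREATE TABLE icinga_objects (
    object_id     INTEGER PRIMARY KEY,
    instance_id   INTEGER NOT NULL,
    objecttype_id INTEGER NOT NULL,   -- e.g. 1 = host, 2 = service in this sketch
    name1         TEXT NOT NULL,
    name2         TEXT,               -- NULL for hosts
    is_active     INTEGER NOT NULL DEFAULT 0)"""


def mark_all_inactive(conn, instance_id):
    """Startup step: existing objects of this instance become inactive."""
    conn.execute("UPDATE icinga_objects SET is_active = 0 WHERE instance_id = ?",
                 (instance_id,))


def get_or_insert_object(conn, instance_id, objecttype_id, name1, name2=None):
    """Reuse (and reactivate) an existing object row, insert only if none exists."""
    # Only the primary key is selected: we just need to know whether the row exists.
    # "IS ?" also matches a NULL name2, where "= NULL" would never be true.
    row = conn.execute(
        "SELECT object_id FROM icinga_objects WHERE instance_id = ? "
        "AND objecttype_id = ? AND name1 = ? AND name2 IS ?",
        (instance_id, objecttype_id, name1, name2)).fetchone()
    if row is not None:
        conn.execute("UPDATE icinga_objects SET is_active = 1 WHERE object_id = ?",
                     (row[0],))
        return row[0]
    cur = conn.execute(
        "INSERT INTO icinga_objects (instance_id, objecttype_id, name1, name2, is_active) "
        "VALUES (?, ?, ?, ?, 1)", (instance_id, objecttype_id, name1, name2))
    return cur.lastrowid
```

If the lookup never matches, every restart adds a fresh copy of every object, which is exactly the kind of table growth described above.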

Thanks to William Preston, the source is fixed, and the remaining data inconsistency between active and inactive objects related to historical checks in the RDBMS has also been fixed. In the docs you will find a more detailed description, and the upcoming 1.0.2 will include upgrade SQL scripts to keep your database consistent!

Next to that, the string escaping has been modified again so as not to provoke any more errors. Some RDBMS-specific fixes for wrong datatypes were added too.

The source has now been completely rewritten (s/ndo/ido/), and to keep everything clean, the core NEB API now provides an Icinga-specific object version which is used in IDOMOD 1.0.2. The old Nagios ™ one has been kept for compatibility. This implies upgrading both Core and IDOUtils to 1.0.2.

Another performance issue on MySQL: the binary selects were a nice idea but resulted in major memory and performance problems. Since they were only needed for case-sensitive comparison, this can be resolved by defining the correct collation on the affected columns – thanks again, William Preston.

The internal linked hash list for objects has been extended in order to minimize object SELECTs. This increases overall performance a bit – thanks to Opsera Ltd for their Altinity patch.
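The idea behind such an object cache can be sketched in Python with a plain dict standing in for the internal linked hash list: lookups hit memory first and only go to the database on a miss. The class name and the lookup callable are hypothetical.

```python
class ObjectCache:
    """Map (objecttype_id, name1, name2) -> object_id, filled once at startup
    so that most lookups never reach the database."""

    def __init__(self, lookup_in_db):
        self._lookup_in_db = lookup_in_db  # callable used only on a cache miss
        self._ids = {}
        self.db_lookups = 0

    def preload(self, rows):
        """Fill the cache from (objecttype_id, name1, name2, object_id) rows."""
        for objecttype_id, name1, name2, object_id in rows:
            self._ids[(objecttype_id, name1, name2)] = object_id

    def get(self, objecttype_id, name1, name2=None):
        key = (objecttype_id, name1, name2)
        if key not in self._ids:
            # Miss: ask the database once and remember the answer.
            self.db_lookups += 1
            self._ids[key] = self._lookup_in_db(*key)
        return self._ids[key]
```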

Some SELECT queries asked for all columns instead of just the primary key when they were only checking whether a row exists. Changing this reduces unnecessary RDBMS traffic.

IDO2DB now writes to syslog if it fails to connect to the RDBMS or if the database schema cannot be accessed, instead of just quitting without an error.

The IDO2DB initscript has been rewritten so that it no longer depends only on the lockfile (just like Icinga Core’s initscript). If the startup fails, this is now shown too, and the lockfile is removed.

Jan Drogi (ja5kier on irc.freenode.net #icinga) asked about keeping the configuration during a core restart, where IDOUtils cleans the config by default – e.g. to keep custom variable relations. Therefore two new config options have been added to ido2db.cfg: clean_realtime_tables_on_core_startup and clean_config_tables_on_core_startup. If set to 0, no startup cleaning will be performed.
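How the two options could drive the startup cleanup is sketched below in Python. It assumes simple key=value lines in ido2db.cfg and that both options default to 1 (clean, the previous behavior); the table group names are placeholders.

```python
def parse_ido2db_cfg(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key.strip()] = value.strip()
    return opts


def tables_to_clean(opts):
    """Decide which table groups to clean when the core (re)starts.
    Both options are assumed to default to 1, i.e. cleaning is on."""
    plan = []
    if opts.get("clean_realtime_tables_on_core_startup", "1") != "0":
        plan.append("realtime")
    if opts.get("clean_config_tables_on_core_startup", "1") != "0":
        plan.append("config")
    return plan
```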

Stay tuned for the second part of this series! Meanwhile, keep on testing :-)

Icinga reaches Debian

It’s been a while since Christoph Maser joined Team Icinga, sharing his knowledge about creating RPMs. Those packages can be found in RPMForge :-)

There were a lot of questions about getting Debian packages for Icinga and finally, we are happy to welcome Alexander Wirt onto Team Icinga!

He is the Debian packager for Nagios and now Icinga, and did a really great job bringing fresh Icinga Debian packages upstream.

Currently, Debian lenny, sid/squeeze and Ubuntu Karmic are supported. They can be found here – please check them out and tell us about it!

Take a look at README.Debian after installing IDOUtils and make sure to enable the Event Broker Module in icinga.cfg – patching configs during package installation is against packaging policy. That said, we are already working on a satisfying solution for that – check #162.

Icinga’s journey is not ending – are you working on BSD ports or any other applicable operating systems repository? Then please contact us and we will make sure to enlighten the path together for Icinga :-)

Update 2010-04-09: Icinga got accepted in Debian sid (http://packages.debian.org/sid/icinga) – fire up apt and enjoy =)

Icinga IDOUtils – More Improvements Part III

One last shot this time for upcoming Icinga 1.0.1 and IDOUtils:

After getting several core patches into master and also fixing duplicated service/hoststatus updates being sent to the NEB module (thanks to Matthieu Kermagoret), there will be more improvements for IDOUtils.

Since the threaded housekeeper is doing fine, it is possible to periodically clean more tables. By popular demand, the following options have been added to ido2db.cfg

They can be set to your liking; by default they are not set.

If you want to help us test for the upcoming release, you are very welcome to do so!

To help you with Git, there is now a quite detailed tutorial on how to use Git with Icinga in our Developer Wiki =)

Icinga Core – More Enhancements

First of all – many thanks to Vitali Voroth and DECOIT GmbH and also Bill McGonigle for providing such great stuff and improving Icinga.

So what’s it all about?

As you might know, we are “monitoring” the Nagios world too and recently on the developer mailing list, an interesting patch popped up:

Currently, the Icinga core sets the state to CRITICAL if a service check times out. This is hardcoded and can only be changed by recompiling the code. For several reasons you might want to define that yourself – and also, what does CRITICAL mean in this context? If the load on the monitoring box is too high, a service check may time out, not only because of a connection loss or similar.

We asked Bill McGonigle whether we could take his patch for Icinga (it is not applied in current Nagios CVS, against which it was built), test it and, if it works, apply it to give it back to the community. It’s a great idea to add service_check_timeout_state to icinga.cfg and let users decide which state will be set in this case. Bill suggested a new approach for Icinga too: changing the default state from CRITICAL to UNKNOWN. We think this is a great idea, and so it will be in the upcoming Icinga 1.0.1 :-)
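A minimal Python sketch of turning the configured timeout state into a check result. It assumes the option takes one-letter values (o, w, c, u); please check the Icinga docs for the exact accepted values. The table name and function are hypothetical.

```python
# Hypothetical mapping of one-letter option values to plugin return codes.
TIMEOUT_STATES = {
    "o": (0, "OK"),
    "w": (1, "WARNING"),
    "c": (2, "CRITICAL"),
    "u": (3, "UNKNOWN"),
}


def timeout_result(service_check_timeout_state: str = "u") -> tuple:
    """Return (return code, state name) to record when a service check times out.
    The default "u" reflects the new UNKNOWN default."""
    key = service_check_timeout_state.strip().lower()
    if key not in TIMEOUT_STATES:
        raise ValueError(
            f"invalid service_check_timeout_state: {service_check_timeout_state!r}")
    return TIMEOUT_STATES[key]
```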

That’s not all, folks …

Vitali Voroth on behalf of DECOIT GmbH sent a rather huge and exclusive improvement for Icinga core: escalation conditions.

This is best described with an excerpt from the docs:

Using a patch it is now possible to define an escalation_condition (similar to escalation_options [w,u,c,r]). An escalation with a defined condition will only be escalated if the current state of a particular host/service fits the condition. One possible example of use for this could be the following scenario:

Think of two different escalations for the same service foo. One of them should only escalate when service bar is OK, the other should escalate if bar is CRITICAL or WARNING. Now think about foo being the main service offered by a company, and the admin has to react immediately if it is down. bar could be a service indicating whether the admin is in the office or at home, and the escalation would react as follows:

* If the admin is in the office, send an email first, after 5 minutes send an SMS
* If the admin is at home, send an SMS first and after 30 minutes a second SMS to the admin and the head of department
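The scenario can be modeled with a small Python sketch: each escalation is only considered if the current state of the condition service (bar) is one of the allowed states. It assumes bar is OK while the admin is in the office; the class and field names are hypothetical and do not reproduce the real escalation_condition syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    name: str
    first_notification: int
    condition_service: str      # service whose state gates this escalation
    condition_states: frozenset  # states of that service that allow it


def matching_escalations(escalations, notification_number, current_states):
    """Names of escalations that apply, given the current service states."""
    return [
        e.name for e in escalations
        if notification_number >= e.first_notification
        and current_states.get(e.condition_service) in e.condition_states
    ]


ESCALATIONS_FOR_FOO = [
    Escalation("office-email", 1, "bar", frozenset({"OK"})),
    Escalation("home-sms", 1, "bar", frozenset({"WARNING", "CRITICAL"})),
]
```

With bar OK only the email escalation fires; with bar CRITICAL only the SMS one does.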

A really nice patch and Team Icinga is very happy about this core related enhancement! :-)

And as you would expect, Icinga Core provides the enhancements, while the documentation will be updated too for Icinga 1.0.1 =)

You want more?

If YOU ever wanted your ideas and patches in Nagios/Icinga, do not hesitate to contact us. And if you want to contribute to and develop Icinga, you are very welcome to do so!

Spread the word and show love for Icinga :-)

Icinga chose ocilib as Oracle db layer

Just to let you know:

Based on my finished Oracle implementation and the last blog post, I’ve dropped Vincent Rogier, the developer of ocilib, a few lines about my experience working with ocilib.

http://orclib.sourceforge.net/2010/02/icinga-chose-ocilib-as-oracle-db-layer/

This small diary entry describes how Icinga and ocilib came together :-)

Icinga IDOUtils – More Improvements Part II

As mentioned in the last post, there are other improvements for Icinga and IDOUtils.

This time, I want to give you a deeper look into database performance and the housekeeping stuff.

As you might know, the cost of selecting, updating or even deleting a row heavily depends on the table’s row count. If tables grow bigger, e.g. the historical tables in IDOUtils, those queries get slower and hold back the main process. The current approach of IDOUtils is one forked ido2db child per idomod connection, working sequentially on the received data.

So even a single slow SELECT slows down data processing, and in the worst case the socket blocks and idomod complains about being unable to write to the data sink.

But how to resolve those issues?

First of all, there were several approaches originally found in mysql-mods.sql: setting indexes on the table columns used in WHERE clauses. Since ido2db is not just an insert application, but also deletes historical data on demand (table trimming options), selects objects for caching and updates existing rows (e.g. service/hoststatus), we decided to add the most useful indexes to the table creation statements. This slows down inserts a bit, but the overall benefit is much bigger :-)
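The effect of such an index can be checked with SQLite, used here only as a convenient stand-in for MySQL/PostgreSQL/Oracle. The table is a simplified icinga_logentries and the index name is hypothetical; the sketch compares the query plan for a housekeeping-style query before and after creating the index.

```python
import sqlite3

LOGENTRIES_DDL = """CREATE TABLE icinga_logentries (
    logentry_id   INTEGER PRIMARY KEY,
    instance_id   INTEGER NOT NULL,
    logentry_time INTEGER NOT NULL,
    logentry_data TEXT)"""

INDEX_DDL = ("CREATE INDEX idx_logentries_instance_time "
             "ON icinga_logentries (instance_id, logentry_time)")

# Rows older than a cutoff for one instance, as periodic trimming would need.
TRIM_QUERY = ("SELECT logentry_id, logentry_data FROM icinga_logentries "
              "WHERE instance_id = ? AND logentry_time < ?")


def query_plan(conn, sql, params):
    """Return SQLite's query plan details as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute(LOGENTRIES_DDL)
    before = query_plan(conn, TRIM_QUERY, (1, 0))  # full table scan
    conn.execute(INDEX_DDL)
    after = query_plan(conn, TRIM_QUERY, (1, 0))   # index search
    conn.close()
    return before, after
```

The exact plan wording differs between SQLite versions, but after creating the index the plan names the index instead of scanning the whole table.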

The upcoming Icinga Web benefits from that too – e.g. SELECTs on the logentries table perform a lot faster when using the API with an RDBMS.

But that’s not all – indexes are only one approach to improvement. In the last few months, Hendrik, Christoph and I discussed the periodic housekeeping a lot. The basic approach was to move the housekeeping function out of the main data processing, simply because historical deletes on large tables take even longer and prevent new data from being written to the database.

There have been discussions about a cronjob and separate forked processes for housekeeping, but we wanted something within ido2db that is simple to use. So Hendrik came up with the idea of creating a dedicated thread within each ido2db child which runs completely separately from the main data processing flow – the so-called threaded housekeeper.

The thread just waits for the appropriate instance to get connected and then performs the periodic housekeeping, independently of the main flow, so it does not interfere with normal data processing. In short, it resolves a big performance issue within IDOUtils.

Basically, this is how it works:

  • sleep a while after creation and initialization
  • idle wait for the database connection and a connected instance from the main process
  • perform periodic maintenance without interfering with the main process
  • terminate when ido2db shuts down
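These four steps map naturally onto a thread with two events. ido2db implements this in C; the following is only a Python sketch with hypothetical names, where the maintenance callable stands in for the trimming DELETE statements.

```python
import threading
import time


class Housekeeper(threading.Thread):
    def __init__(self, maintenance, connected: threading.Event,
                 initial_delay: float = 0.0, interval: float = 60.0):
        super().__init__(daemon=True)
        self._maintenance = maintenance  # e.g. trims old historical rows
        self._connected = connected      # set by the main flow once connected
        self._initial_delay = initial_delay
        self._interval = interval
        self._stop_event = threading.Event()
        self.runs = 0

    def run(self):
        # 1. sleep a while after creation and initialization
        if self._stop_event.wait(self._initial_delay):
            return
        # 2. idle wait until the main process reports a DB connection + instance
        while not self._connected.wait(0.1):
            if self._stop_event.is_set():
                return
        # 3. periodic maintenance, independent of the main processing loop
        while not self._stop_event.is_set():
            self._maintenance()
            self.runs += 1
            self._stop_event.wait(self._interval)

    def stop(self):
        # 4. terminate when the daemon shuts down
        self._stop_event.set()
        self.join()
```

Waiting on an Event instead of calling time.sleep lets stop() interrupt the interval immediately rather than after the next full sleep.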

Best thing so far: it has been implemented, tested and improved for quite a while. Most of this was done in our own git branches, but the final solution is in current git master and will be one of the outstanding new features of Icinga IDOUtils in the upcoming Icinga 1.0.1 release.

Stay tuned for more updates!

… and prepare for Icinga 1.0.1! =)

Icinga IDOUtils – More improvements Part I

It’s been a while since I made several changes to the initial Oracle implementation in Icinga IDOUtils: the code has been split, I started using prepared statements and bound params with ocilib, and there were some other changes to the code.

In the last few weeks I have been investigating a lot how to implement more improvements and optimize the critical path of data input from Icinga Core.

I want to start with IDOUtils Oracle, more information on other improvements for Icinga and IDOUtils will follow :)

The Oracle implementation splits up into several parts to take care of:

  • Rewrite all queries to prepared statements and bind params at runtime
  • Add dynamic procedures for DELETE statements
  • Drop autoincrement emulation by one sequence and insert triggers
  • Add sequences for each table and use INSERT INTO … (id, …) VALUES (seq_name.nextval, …)
  • Add RETURNING id INTO :id for INSERT statements to save one round trip
  • MERGE does not support RETURNING INTO, so a SELECT seq_name.currval query has been added instead to fetch the last inserted id
  • Rewrite selecting cached objects from DB
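The first bullet (prepared statements with bind params) and the idea behind RETURNING id INTO :id can be illustrated with Python’s sqlite3 module as a stand-in for ocilib. The table is a simplified, hypothetical icinga_hostchecks; cursor.lastrowid plays the role of RETURNING here, giving the new id without a separate SELECT.

```python
import sqlite3

HOSTCHECK_DDL = """CREATE TABLE icinga_hostchecks (
    hostcheck_id   INTEGER PRIMARY KEY,
    instance_id    INTEGER NOT NULL,
    host_object_id INTEGER NOT NULL,
    state          INTEGER NOT NULL,
    output         TEXT)"""

# One statement text with placeholders; values are bound at execution time.
INSERT_HOSTCHECK = ("INSERT INTO icinga_hostchecks "
                    "(instance_id, host_object_id, state, output) VALUES (?, ?, ?, ?)")


def insert_hostcheck(conn: sqlite3.Connection, instance_id: int,
                     host_object_id: int, state: int, output: str) -> int:
    """Insert one row and return its id without issuing a second query."""
    cur = conn.execute(INSERT_HOSTCHECK, (instance_id, host_object_id, state, output))
    return cur.lastrowid
```

Because values are bound rather than pasted into the SQL text, quotes in plugin output need no escaping, and Python’s sqlite3 module can reuse the already prepared statement from its statement cache.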

The rewritten queries are divided as follows:

  • 1x SELECT latest data time as is (called only at startup)
  • dynamic procedure for DELETE on table by instance_id called at startup for cleaning config/status
  • dynamic procedure for DELETE on table by instance_id, field compared to time called during periodic cleanup
  • all other queries are prepared with their own statement handler
    • 4x DELETE
    • 52x MERGE
    • 9x INSERT
    • 9x UPDATE
    • 5x SELECT

This summarizes into about 8000 lines (+) and 2000 lines (-) of code modifications :-)

Furthermore, I have been thinking about how to provide an upgrade path for all existing IDOUtils Oracle users. Importing data using the newly applied sequences might lead to errors regarding the currval of each sequence. A basic upgrade procedure has been provided already – if you want to try it, get the latest GIT master.

Stay tuned for more interesting stories to tell :)

… and watch out for Icinga 1.0.1 and fresh IDOUtils Oracle!