The Franconian
Coder Studio

When Monitoring Says Everything Is Fine, But It’s Not


Monitoring tools can give a false sense of security when they report no errors. I share a story about car diagnostics and draw parallels to software systems, emphasizing the importance of understanding the basics beyond relying on tools.

Finding the error when the monitoring says everything is fine.

A while ago I took our car to the garage because it kept showing error messages that needed checking. The mechanics read out the on-board computer, but the error memory was empty. We could nevertheless list several things that didn’t seem normal. Of course, none of them could be reproduced on the spot.

For the garage, that settled the matter: the computer said there was no error, and they couldn’t reproduce one either. So after a while we took the car home again without anything having been done.

We then took the car, with the same symptoms, to a small garage. The mechanics there had obviously still learned how a car really works, and they didn’t rely on the computer alone. After a little searching they located a fault and fixed it.

I see the same phenomenon coming our way in software and infrastructure. Not because the systems are becoming too complex, but because we only learn to use tools. Often the basic knowledge is missing that would give us a feel for what could be causing the misbehavior.

Adding more logging, tracing, or monitoring is no use on its own. Learn to understand how the system works. Familiarize yourself with the basics.

I am therefore a big fan of continuously monitoring a production system against defined use cases. Of course, this does not replace normal testing. But at the end of the day, all that matters is whether the software does what it’s supposed to do. Deeper tests can be incomplete and patchy, and it is easy to start relying on the assumption that everything should actually work.
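To make the idea of use-case-based monitoring a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `FakeShop` is a hypothetical in-memory stand-in for a real system, and `check_add_to_cart` is an invented use case. The point is only that each check walks through something a user actually does and verifies the outcome, instead of asking whether an error was logged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UseCaseResult:
    name: str
    ok: bool
    detail: str


def run_use_cases(checks: Dict[str, Callable[[], None]]) -> List[UseCaseResult]:
    """Run every use-case check; a check signals failure by raising."""
    results = []
    for name, check in checks.items():
        try:
            check()
            results.append(UseCaseResult(name, True, "ok"))
        except Exception as exc:
            detail = f"{type(exc).__name__}: {exc}"
            results.append(UseCaseResult(name, False, detail))
    return results


class FakeShop:
    """Hypothetical stand-in for the production system being probed."""

    def __init__(self) -> None:
        self.carts: Dict[str, Dict[str, int]] = {}

    def add_to_cart(self, user: str, item: str, qty: int) -> None:
        cart = self.carts.setdefault(user, {})
        cart[item] = cart.get(item, 0) + qty

    def cart_total(self, user: str, prices: Dict[str, int]) -> int:
        cart = self.carts.get(user, {})
        return sum(prices[item] * qty for item, qty in cart.items())


def check_add_to_cart(shop: FakeShop) -> None:
    """Use case: a user adds two books and sees the correct total."""
    shop.add_to_cart("probe-user", "book", 2)
    total = shop.cart_total("probe-user", {"book": 10})
    if total != 20:
        raise RuntimeError(f"expected total 20, got {total}")
```

In a real setup, such checks would run on a schedule against the production system and raise an alert when a use case fails, regardless of whether any error log or metric looks suspicious.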

#monitoring #system diagnostics #software infrastructure #problem solving #tool reliance #system understanding