TwistedSifter

IT Worker Explains Why Even the Best Critical Infrastructure Can’t Always Prevent Human Mistakes

When setting up a system that runs critical infrastructure, companies use multiple layers of redundancy to ensure the equipment never gets turned off.

In these environments, however, a system is only as good as its weakest point. The technical engineer in this story was called to a critical location to troubleshoot a power outage with essential systems.

When he got there, he found that all the critical equipment (and the backup power supplies) was plugged into a single power strip. Worse, the power strip had been unplugged because the maintenance guy needed to charge one of his batteries.

This kind of story is a good reminder that when planning systems like this, you really need to account for human mistakes. It should have been impossible to unplug that strip by accident, and there should have been a redundant power feed in place to take over if it failed. Oh well, at least it is a funny story, and it was a simple fix. Read through the full story below.

The Mission Critical Battery Charger

At one point, I was doing work for a particular MAJOR pharmaceuticals company. A company with a name that everyone reading this has likely heard of.

Ok, this seems normal enough.

I get a call one day, and a ticket with a short SLA is generated for me to be on site within 4 hours.

The night before, some big storms had happened, so I figured it was power related.

Oh, wow. This is a huge problem.

I arrive on site earlier than needed, get escorted to the primary network room for the whole facility, and what do I find?

10 racks with no LEDs on them. I get told that the entire facility is down. No internet. The production line is down. The warehouse distribution is down. The kind of emergency companies pay consultants tens of thousands of dollars to ensure never happens.

This is a good setup.

I start checking through the racks to see if anything has power. UPS batteries are dead, so I head to the back of the rack and trace power cables from network equipment.

All of them go to PDUs, and all major routers and switches even have redundant power to multiple PDUs. I trace where the PDUs go to.

Setups like this should never have a single point of failure.

They all go to UPSs, with the redundant power split between two different UPSs. Then I trace where the UPS power comes from... and I start laughing my ass off.

The UPSs for this entire cluster of racks, the racks housing the entirety of the network equipment for this facility, have a single point of failure.

What? Who unplugged that critical power strip?

A large power strip that was zip tied to the wall. And lo and behold, I found the problem. The power strip’s power cable was dangling in the air.

Not plugged into the wall outlet like it should be. In the outlet’s place was... a 20V battery charger.

The maintenance guy is definitely going to get in trouble.

The maintenance guy had come in earlier that day, hadn’t had anywhere to charge a battery, so he unplugged what was apparently a mission-critical power strip and plugged in his battery charger.

A few hours later, when the UPSs died, the network team noticed the site went dark.

At least it was an easy problem to fix.

After I relayed the info to the engineer I was working with, we shared a laugh. I plugged everything back in, and verified everything came back online.

As a preventive measure (and because of the absurdity of the situation), I placed a large label on the power strip: “CRITICAL INFRASTRUCTURE. DO NOT REMOVE POWER WITHOUT AUTHORIZATION.”

To this day, I still get a laugh out of it.

He got the issue solved easily, but that single point of failure should still be eliminated. If it really is critical infrastructure, this is a horrible system.
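For readers who want to see why the redundancy in the story did not help, here is a minimal Python sketch of the power chain as the engineer traced it. The device names (`router`, `pdu_a`, `ups_a`, `power_strip`, and so on) are made up for illustration and are not from the original post. The model also ignores UPS battery runtime, which in the story only delayed the outage by a few hours. The idea is simple: a device stays powered if any one of its upstream feeds is still live, and a single point of failure is any device whose loss alone takes down at least one load.

```python
# Hypothetical model of the power chain described in the story.
# Each key is a device; its value lists the upstream devices that can feed it.
# Any one live feed is enough to keep a device powered (redundant feeds).
FEEDS = {
    "router": ["pdu_a", "pdu_b"],
    "switch": ["pdu_a", "pdu_b"],
    "pdu_a": ["ups_a", "ups_b"],
    "pdu_b": ["ups_a", "ups_b"],
    "ups_a": ["power_strip"],
    "ups_b": ["power_strip"],
    "power_strip": ["wall_outlet"],
    "wall_outlet": [],
}
SOURCES = {"wall_outlet"}      # where utility power enters the model
LOADS = ["router", "switch"]   # the equipment that must stay up


def is_powered(device, failed, feeds, sources, seen=None):
    """Return True if `device` can still reach a live source."""
    if device in failed:
        return False
    if device in sources:
        return True
    seen = set() if seen is None else seen
    if device in seen:  # already explored without success (or a cycle)
        return False
    seen.add(device)
    return any(
        is_powered(up, failed, feeds, sources, seen)
        for up in feeds.get(device, [])
    )


def single_points_of_failure(feeds, loads, sources):
    """List non-load devices whose loss alone cuts power to some load."""
    candidates = [d for d in feeds if d not in loads]
    return [
        d for d in candidates
        if any(not is_powered(load, {d}, feeds, sources) for load in loads)
    ]


if __name__ == "__main__":
    print(single_points_of_failure(FEEDS, LOADS, SOURCES))
    # prints ['power_strip', 'wall_outlet']
```

Losing either PDU or either UPS leaves the loads running, which is exactly what the redundant cabling was meant to achieve. The power strip and the one outlet behind it are where every path converges. In this sketch, feeding the second UPS from a separate strip on a separate outlet makes the list come back empty.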

Whoever set this critical infrastructure system up was a moron. It should never be possible to unplug a cabinet filled with this type of equipment by accident.

I can’t even blame the maintenance guy. He shouldn’t have had access to this room if the equipment was as critical as it seems.
