Latest Crowdstrike Update Issue: The issue seems widespread, affecting machines running various CrowdStrike sensor versions. CrowdStrike has acknowledged the problem and is currently investigating the cause. (Technology & Science News, Times Now)
This isn't a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn't log in.
Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled as some airlines' systems have also gone down. Gas stations and public transport systems are inoperable, and numerous other Windows systems and Microsoft services are affected. (At least according to one of my local MSM outlets.)
Seems insane to me that one company's messed-up update could cause so much global disruption and take so many systems down :/ This is exactly why centralisation of services, and large corporations gobbling up smaller companies to become behemoth services, is so dangerous.
The annoying aspect, for somebody with decades of IT experience, is this: what should happen is that CrowdStrike gets sued into oblivion, and the people responsible for buying that shit have an epiphany and properly look at how they are doing their infra.
But what will happen is that they'll just buy a new CrowdStrike product that promises to mitigate the fallout of them fucking up again.
Do any changes - especially upgrades - on local test environments before applying them in production?
The scary bit is what most in the industry already know: critical systems are held together with duct tape and maintained by juniors, 'cos they're the cheapest Big Money can find. And even if not, "There's no time" or "It's too expensive" are probably the most common answers a PowerPoint manager will give to a serious technical issue being raised.
Some years back I was the 'Head' of systems stuff at a national telco that provided the national telco infra. Part of my job was to manage the national systems upgrades. I had the stop/go decision to deploy, and indeed pushed the 'enter' button to do it. I was a complete PowerPoint Manager and had no clue what I was doing; it was total Accidental Empires, and I should not have been there. Luckily I got away with it for a few years. It was horrifically stressful and not the way to mitigate national risk. I feel for the CrowdStrike engineers. I wonder if the latest embargo on Russian oil sales is in any way connected?
Not OP, but that is how it used to be done. The issue is the attacks we have seen over the years, i.e. ransomware attacks etc. They have made corps feel they need to fix and update instantly to avoid attacks, so they depend on the corp they pay for the software to test the rollout.
Auto-update is a two-edged sword. Without it, attackers will take advantage of patching delays. With it... well, today.
It isn't even a Linux vs Windows thing, but a competent-at-your-job vs don't-know-what-the-fuck-you-are-doing thing. Critical systems are immutable and isolated, or as close to that as reasonably possible. They don't do live updates of third-party software, and certainly not of software that runs privileged and can crash the operating system.
I couldn't face working in corporate IT with this sort of bullshit going on.
This is just "what not to do in IT/dev/tech 101" right here. Ever since I've been in the industry, literally decades at this point, I was always told, even in school: "Never test in production, never roll anything out to production on a Friday, and if you're unsure, have someone senior code review." CrowdStrike failed to do all of the above. Even the most junior of junior devs should know better. So the fact that this update was allowed to go through... I mean, blame the juniors, the seniors, the PMs, the CTOs, everyone. If your shit is so critical that a couple of bad lines of poorly written code (which apparently is what it was) can cripple the majority of the world... yeah, CrowdStrike is done.
It’s incredible how an issue of this magnitude didn’t get discovered before they shipped it. It’s not exactly an issue that happens in some niche cases. It’s happening on all Windows computers!
This can only happen if they didn’t test their product at all before releasing to production. Or worse: maybe they did test, got the error, and they just “eh, it’s probably just something wrong with test systems”, and then shipped anyway.
It's also a "don't allow third-party proprietary shit into your kernel" issue. If the driver were open source, it would actually go through public code review and the issue would be more likely to get caught. Even if it did slip through, people would publicly have a fix by now with all the eyes on the code. It also wouldn't get pushed to everyone simultaneously under the control of a single company; it would get tested and packaged by distributions before making it to end users.
It's actually a "test things first and have a proper change control process" thing. Doesn't matter if it's open source, closed source scummy bullshit or even coded by God: you always test it first before hitting deploy.
It's not that clear-cut a problem. There seem to be two elements: the kernel driver had a memory safety bug, and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny, and static analysis should have told them this bug existed. The live updates are a bit different, since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can't wait for distribution maintainers to package their mitigation; it has to be deployed ASAP. They certainly should roll out definitions progressively and monitor for anything anomalous, but it has to be quick or the malware could beat them to it.
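The progressive rollout idea is simple to sketch. Here is a minimal, hypothetical ring-based gate in Python; the `deploy` and `healthy` primitives and the ring sizes are invented for illustration, not anything CrowdStrike actually runs:

```python
# Minimal sketch of a progressive (ring-based) rollout gate, assuming
# hypothetical deploy/health-check primitives. A real-time definitions push
# would compress this schedule to minutes, but the gating idea is the same:
# each ring must come back healthy before the next, larger ring is touched.
from typing import Callable, Sequence

def staged_rollout(
    rings: Sequence[Sequence[str]],     # e.g. [canary hosts, 1% of fleet, rest]
    deploy: Callable[[str], None],      # push the update to one host
    healthy: Callable[[str], bool],     # did the host survive the update?
) -> bool:
    for ring in rings:
        for host in ring:
            deploy(host)
        if not all(healthy(h) for h in ring):
            return False                # halt: later rings never see the update
    return True
```

Even a tiny canary ring of real Windows hosts would have stopped a boot-looping update at ring one, before it ever reached the wider fleet.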
This is more a code safety issue than CI/CD strategy. The bug was in the driver all along, but it had never been triggered before so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory safe languages like Rust.
More generally: delegate anything critical to a 3rd party and you've just put your business at the mercy of the quality (or lack thereof) of their business processes, which you do not control. That is especially dangerous in the current era of "as cheap as possible" hiring practices.
Having been in IT for almost 3 decades, a lesson I learned long ago, and which I've also applied to my own things (such as having my own domain for my own e-mail address rather than using something like Google), is that you should avoid as much as possible having your mission-critical or hard-to-replace stuff depend on a 3rd party, especially if the dependency is live (i.e. actively connected, rather than just buying and installing their software).
I've managed to avoid quite a lot of the recent enshittification exactly because I've been playing it safe in this domain for 2 decades.
Didn't Crowdstrike have a bad update to Debian systems back in April this year that caused a lot of problems? I don't think it was a big thing since not as many companies are using Crowdstrike on Debian.
Sounds like the issue here is Crowdstrike and not Windows.
They didn’t even bother to do a gradual rollout, like even small apps do.
The level of company-wide incompetence is astounding, but considering how organizations work and disregard technical people’s concerns, I’m never surprised when these things happen. It’s a social problem more than a technical one.
While I don't totally disagree with you, this has mostly nothing to do with Windows and everything to do with a piece of corporate spyware garbage that some IT Manager decided to install. If tools like that existed for Linux, doing what they do to the OS, trust me, we would be seeing kernel panics as well.
And if it was a kernel-level driver that failed, Linux machines would fail to boot too. The amount of people seeing this and saying “MS Bad,” (which is true, but has nothing to do with this) instead of “how does an 83 billion dollar IT security firm push an update this fucked” is hilarious
Hate to break it to you, but most IT Managers don't care about crowdstrike: they're forced to choose some kind of EDR to complete audits. But yes things like crowdstrike, huntress, sentinelone, even Microsoft Defender all run on Linux too.
I wouldn't call CrowdStrike corporate spyware garbage. I work as a Red Teamer in cybersecurity, and EDRs are the bane of my existence. They are useful, and pretty good at what they do. In the last few years, I'm struggling more and more with the engagements we do, because EDRs just get in the way and catch a lot of what would have passed undetected a month ago. Staying on top of them with our tooling is getting more and more difficult, and I would call that a good thing.
I've recently tested a company without EDR, and boy was it a treat. I'm not defending CrowdStrike; to call this a major fuckup is a great understatement. But calling it "corporate spyware garbage" feels a little unfair: EDRs do make a difference, and this wasn't an issue with their product in itself, but with the irresponsibility of their patch management.
Still, this fiasco proved once again that the biggest threat to IT is sometimes on the inside. At the end of the day, a bunch of people decided to buy CrowdStrike and got screwed over. Some of them actually had good reason to use a product like that; for others it was just paranoia and FOMO.
The problem is that managers view security as a product they can simply buy wholesale, instead of a service they need to hire a security guy (or a whole department) for.
Hmmm… but that goes up to the CEO level; people like to see everything as a product they can buy, because that carries fewer liabilities than hiring people. It also makes a lot more sense from an accounting perspective.
Why should it be? A faulty software update from a 3rd party crashed the operating system. The exact same thing could happen to Linux hosts, with how much access those security programs usually get.
MS allegedly pushed a bad update. Ok, it happens. Crowdstrike's initial statement seems to be blaming that.
CS software csagent.sys took exception to this and royally shit the bed, disabling the entire computer. I don't think it should EVER do that, so the weight of blame must lie with them.
The really problematic part is, of course, the need to manually remediate these machines. I've just spent the morning of my day off doing just that. Thanks, Crowdstrike.
EDIT: Turns out it was 100% Crowdstrike, and the update was theirs. The initial press release from CS seemed to be blaming Microsoft for an update, but that now looks to be misleading.
It is, in the sense that Windows admins are the ones that like to buy this kind of shit and use it. It's not, in the sense that Windows itself wasn't broken somehow.
Feel you there. 4 hours here. All of them cloud instances, where getting access to the actual console isn't as easy as it should be, and trying to hit F8 to get the boot menu and into safe mode can take a very long time.
Sadly not. Windows doesn't boot. You can boot it into safe mode with networking, at which point maybe with Ansible we could log in to delete the file, but since it's still manual work to get Windows into safe mode, there's not much point.
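For reference, the manual fix circulating among admins amounts to booting into safe mode and deleting the faulty channel file. A hedged Python sketch of that step; the `C-00000291*.sys` pattern and driver path are as widely reported at the time, not official guidance, so defer to CrowdStrike's Technical Alert before touching anything:

```python
# Sketch of the reported manual remediation: from Safe Mode, remove the
# channel file(s) matching C-00000291*.sys from the CrowdStrike driver
# directory. Filename pattern and path are as reported by admins, not
# official guidance - verify against CrowdStrike's Technical Alert first.
from pathlib import Path

def remediate(driver_dir: str, dry_run: bool = True) -> list:
    """List (and, when dry_run=False, delete) the faulty channel files."""
    targets = sorted(Path(driver_dir).glob("C-00000291*.sys"))
    if not dry_run:
        for f in targets:
            f.unlink()
    return targets

# On an affected host this would be invoked roughly as:
# remediate(r"C:\Windows\System32\drivers\CrowdStrike", dry_run=False)
```

The dry-run default is deliberate: on a machine this broken, you want to see exactly which files match before deleting anything in a driver directory.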
Crowdstrike already killed some Linux machines. Let's not pretend Windows is at fault here or Linux is magically better in this area. No one is immune from software that can run as a kernel module going bad.
Every system has its faults, and I'm still going to dogpile the system with the most faults. But hell, Microsoft did buy GitHub, Halo, Minecraft, and a million other things; they will probably find a way to buy Linux and ruin it for us just like they ruin everything else.
Let's see, ...we are somewhere in between Extend and Extinguish on the roadmap.
Edit: Case in point, RIP Red Hat & IBM, and GitHub Copilot, what a great idea. RIP Atom editor, and probably a million other things. Do we have a KilledByMicrosoft website yet? I hope people in the pharmacy could get their prescriptions, or we might have to add people's names to the list.
I work in hospitality and our systems are completely down. No POS, no card processing, no reservations, we're completely f'ked.
Our only saving grace is the fact that we are in a remote location and we have power outages frequently. So operating without a POS is semi-normal for us.
1st, please do not believe the bull that there was no problem. Many folks like me were paid to fix it before it became an issue. So, other than at a few companies, few saw the result: not because it did not exist, but because we were warned. People make jokes about the over-panic, but if that had not happened, it would have been years to fix, not days, because without the panic most corporations would have ignored it. Honestly, the panic scared shareholders, so boards of directors had to get experts to confirm the systems were compliant. And so much dependent crap was found running that it was insane.
But the exaggerations of planes falling out of the sky etc. were also bull. Most systems would have failed, but BSODs would have been rare; code would crash, or would keep working with errors, some shutting down cleanly, some going undiscovered until a short while later, as accounting or other errors showed up.
As others have said, the issue was that since the 1960s computers had been set up to treat years as 2 digits, so they had no way to handle 2000 other than assuming it was 1900. From the early 90s most systems were built with ways to adapt to it, but not all were, as many shops were only developing top-layer stuff, and many libraries etc. had not been checked for this issue. Huge amounts of the infra of the world's IT ran on legacy systems, especially in the financial sector where I worked at the time.
The internet was a fairly new thing, so often stuff had been running for decades with no one needing to change it, or having any real knowledge of how it was coded. Folks like me were forced to hunt through code, or often replace systems that were badly documented, or more often not documented at all.
A lot of modern software development practices grew out of discovering what a fucking mess can grow if people accept an "if it ain't broke, don't touch it" mentality.
I was there patching systems and testing that they survived the rollover, months before it happened.
One piece of software managed the rollover but failed the year after. They had quickly coded in an explicit exception for '00', but then promptly forgot to fix it properly!
Kinda, I guess. It was about clocks rolling over from 1999 to 2000 and causing an overflow that would supposedly crash systems everywhere, causing the country to come to a halt.
Most old systems used two digits for years, so the year would go from 99 to 0. Any software doing a date comparison would get a garbage result. If a task needs to run every 5 minutes, what will the software do when the last run suddenly looks like it happened 99 years in the future? It will not work properly.
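The failure mode is easy to demonstrate. A toy Python illustration of two-digit-year arithmetic, plus the "windowing" fix that was commonly applied (the cutoff of 50 is one common convention, not universal):

```python
# Toy illustration of the Y2K bug: with two-digit years, the rollover from
# 99 to 00 turns "time since last run" into garbage (a negative interval).

def years_since(last_run_yy: int, now_yy: int) -> int:
    return now_yy - last_run_yy         # naive legacy two-digit arithmetic

# A task last run in 1999, checked again in 2000, appears to have last run
# 99 years in the future:
assert years_since(99, 0) == -99

# A common remediation, "windowing": map 00-49 to 2000-2049, 50-99 to 1950-1999.
def expand(yy: int) -> int:
    return 2000 + yy if yy < 50 else 1900 + yy

assert expand(0) - expand(99) == 1      # one year elapsed, as expected
```

Windowing was the cheap patch; the proper fix was widening the year field everywhere, which is exactly the code-hunting work described above.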
Governments and businesses spent lots of money and time patching critical systems to handle the date change. The media made a circus out of it, but when the year rolled over, everything was fine.
I love how everyone understands the issue wrong. It's not about being on Windows or Linux; it's about the ecosystem that is commonplace, and that people are used to, on Windows or Linux. On Windows it's accepted that every stupid anticheat can drop its filthy paws into ring 0, and normies don't mind. Linux has fostered a less clueless community, but ultimately it's a reminder to stay vigilant and strive for pure, well-documented open source with the correct permissions.
While that is true, it makes sense for antivirus/edr software to run in kernelspace. This is a fuck-up of a giant company that sells very expensive software. I wholeheartedly agree with your sentiment, but I mostly see this as a cautionary tale against putting excessive trust and power in the hands of one organization/company.
Imagine if this was actually malicious instead of the product of incompetence, and the update instead ran ransomware.
If it were malicious, it wouldn't have had the reach a trusted platform has. That's what made the xz exploit so scary: the reach, plus the malicious intent.
I like open source software, but that's one big benefit of proprietary software: not all of it is bad. We should recognize the vendors doing their best to avoid anti-consumer practices and genuinely trying to serve their customers' needs to the best of their abilities.
I deployed it for my last employer in our Linux environment. My buddies who still work there said Linux was fine, while they had to help the Windows admins fix their hosts.
That's precisely why I didn't blame Windows in my post, but the Windows-consumer mentality of "yeah, install with privileges, shove Genshin Impact into ring 0, why not".
I wanted to share the article with friends and copy a part of the text I wanted to draw attention to but the asshole site has selection disabled. Now I will not do that and timesnownews can go fuck themselves
Latest Crowdstrike Update Issue: Many Windows users are experiencing Blue Screen of Death (BSOD) errors due to a recent CrowdStrike update. The issue affects various sensor versions, and CrowdStrike has acknowledged the problem and is investigating the cause, as stated in a pinned message on the company's forum.

Who Has Been Affected
Australian banks, airlines, and TV broadcasters first reported the issue, which quickly spread to Europe as businesses began their workday. UK broadcaster Sky News couldn't air its morning news bulletins, while Ryanair experienced IT issues affecting flight departures. In the US, the Federal Aviation Administration grounded all Delta, United, and American Airlines flights due to communication problems, and Berlin airport warned of travel delays from technical issues.
In India too, numerous IT organisations were reporting company-wide issues. Akasa Airlines and SpiceJet experienced technical issues affecting online services. Akasa Airlines' booking and check-in systems were down at Mumbai and Delhi airports due to service provider infrastructure issues, prompting manual check-in and boarding. Passengers were advised to arrive early, and the airline assured swift resolution. SpiceJet also faced problems updating flight disruptions and was actively working to fix the issue. Both airlines apologized for the inconvenience caused and promised updates as soon as the problems were resolved.

Crowdstrike's Response
CrowdStrike acknowledged the problem, linked to their Falcon sensor, and reverted the faulty update. However, affected machines still require manual intervention. IT admins are resorting to booting into safe mode and deleting specific system files, a cumbersome process for cloud-based servers and remote laptops. Reports from IT professionals on Reddit highlight the severity, with entire companies offline and many devices stuck in boot loops. The outage underscores the vulnerability of interconnected systems and the critical need for robust cybersecurity solutions. IT teams worldwide face a long and challenging day to resolve the issues and restore normal operations.

What to Expect:
-A Technical Alert (TA) detailing the problem and potential workarounds is expected to be published shortly by CrowdStrike.
-The forum thread will remain pinned to provide users with easy access to updates and information.
What Users Should Do:
-Hold off on troubleshooting: Avoid attempting to fix the issue yourself until the official Technical Alert is released.
-Monitor the pinned thread: This thread will be updated with the latest information, including the TA and any temporary solutions.
-Be patient: Resolving software conflicts can take time. CrowdStrike is working on a solution, and updates will be posted as soon as they become available.
In an automated reply, CrowdStrike stated:
CrowdStrike is aware of reports of crashes on Windows hosts related to the Falcon Sensor. Symptoms include hosts experiencing a blue screen error related to the Falcon Sensor. Our Engineering teams are actively working to resolve this issue and there is no need to open a support ticket. Status updates will be posted as we have more information to share, including when the issue is resolved.

For Users Experiencing BSODs:
If you're encountering BSOD errors after a recent CrowdStrike update, you're not alone. This appears to be a widespread issue. The upcoming Technical Alert will likely provide specific details on affected CrowdStrike sensor versions and potential workarounds while a permanent fix is developed.
If you have urgent questions or concerns, consider contacting CrowdStrike support directly.
US and UK flights are grounded because of the issue; banks, media, and some businesses are not fully functioning. Likely we'll see more effects as the day goes on.
Same here. I was totally busy writing software in a new language and a new framework, and had a gazillion tabs on Google and stackexchange open. I didn't notice any network issues until I was on my way home, and the windows f-up was the one big thing in the radio news. Looks like Windows admins will have a busy weekend.
Seems to be some sort of kernel-embedded threat detection system. Which is why it was able to easily fuck the OS. It was running in the most trusted space.
Company offering new-age antivirus solutions, which is to say that instead of being mostly signature-based, it tries to look at application behavior instead. If Word is exploited because some user opened not_a_virus_please_open.docx from their spam folder, it might end up running malware that tries to encrypt the entire drive. The product is supposed to sniff out that 1. Word normally opens and saves like one document at a time and 2. some unknown program is being overly active, and so it should stop that and ring some very loud alarm bells at the IT department.
Basically they doubled down on heuristics-based detection, and on that basis they claim to be able to recognize and stop all kinds of new malware they haven't seen yet. My experience is that they're always the outlier at the top end of false positives in business AV tests (e.g. AV-Comparatives Q2 2024), and their advantage has mostly disappeared, since every AV has implemented that kind of behavior-based detection nowadays.
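The behavior-based idea can be caricatured in a few lines of Python. The baseline numbers and the threshold factor are invented for illustration; real EDRs model far richer telemetry than file counts:

```python
# Caricature of behavior-based (heuristic) detection: flag a process whose
# file activity wildly exceeds its learned per-process baseline.
# All numbers here are invented for illustration.

def is_anomalous(process: str, files_touched_per_min: int,
                 baseline: dict, factor: int = 10) -> bool:
    """True when a process touches far more files than its baseline suggests."""
    typical = baseline.get(process, 1)   # unknown processes get a tiny baseline
    return files_touched_per_min > factor * typical

baseline = {"winword.exe": 2}            # Word: roughly one or two docs at a time
assert not is_anomalous("winword.exe", 3, baseline)
assert is_anomalous("winword.exe", 500, baseline)   # mass-encryption pattern
```

The trade-off described above falls out directly: lower the factor and you catch more novel malware, but legitimate bursts of activity start tripping the alarm, which is exactly the false-positive problem the AV tests measure.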
I think you mean - the shareholders enjoy the profits of scale.
When a company scales up, prices are rarely reduced. Users do get increased community support through common experiences especially when official channels are congested through events like today, but that's about the only benefit the consumer sees.
Me too. Additionally, I use guix so if a system update ever broke my machine I can just rollback to a prior system version (either via the command line or grub menu).
True, then I'd be screwed. But because my system config is declared in a single file (plus a file for channels), I could re-install my system and be back in business relatively quickly. There's also guix home, but I haven't had a chance to try that.
Windows usage isn't the cause of dysfunction in corporate IT but a symptom of it. All you would get is badly managed Linux systems compromised by bloated insecure commercial security/management software.
I guess they would want some cybersecurity software like Crowdstrike in either case? If so, this could probably have happened on any system, as it's a bug in third party software that crashes the computer.
Not that I know much about this, but if this leads to a push towards Linux it would be if companies already wanted to make the switch, but were unwilling because they thought they needed Crowdstrike specifically. This might lead them to consider alternative cybersecurity software.
No because Windows Indoctrination starts with Academia.
There will have to be heavy monetary losses before IT is forced to leave their golden goose that keeps them employed with "problems" to "fix" that soak up hours each.
But maybe they will notice the monetary losses, and competitors not using their trash will pull ahead; that will get their attention. Still, it requires the cognition to understand the problem and select a solution, and the Linux jungle is hard for corporate minds to navigate without smart IT help.
It's proving that POSIX architecture is necessary even if it requires additional computer literacy on the part of users and admins.
The risk of hacking a monolithic system like Windows (which is essentially what CrowdStrike does to get so deeply embedded and be so effective at endpoint protection) is that if you screw up, the whole thing comes tumbling down.
From my understanding, they have some ring 0 thing that fucked up. Could that not in theory happen on our beloved Linux systems? Or does the kernel generally not give that option?
A third party driver could break things, but a combination of different things can break things too. Crowdstrike on RHEL was causing kernel panics within the past month until Red Hat updated their kernel.
I'm shocked they didn't have a test server to push the update to before deploying it. I always shut off my laptop at work, and happened to do so before the CrowdStrike update went out on our PCs. I was one of only 5 of the 30 people on our team who could even log in, as everyone working remotely had the blue screen of death. All our systems were down most of the day. This is nuts. Let's go back to DOS.
I'm pretty sure this update didn't get pushed to linux endpoints, but sure, linux machines running the CrowdStrike driver are probably vulnerable to panicking on malformed config files. There are a lot of weirdos claiming this is a uniquely Windows issue.
Hi there! Looks like you linked to a Lemmy community using a URL instead of its name, which doesn't work well for people on different instances. Try fixing it like this: !linux@lemmy.ml
Thanks for the tip, so glad Lemmy makes it easy to block communities.
Also: it seems everyone is claiming it didn't affect Linux, but as part of our corporate cleanup yesterday I had 8 Linux boxes I needed to drive to the office to throw a head on and reset their iDRAC. So sure, maybe they all just happened to fail at the same time, but in my 2 years at this site we've never had more than 1 down at a time, ever, and never for the same reason. I'm not the tech head of the site by any means, and it certainly could be unrelated, but people with significantly greater experience than me in my org chalked this up to CrowdStrike.