Quick fix for CrowdStrike update problem
Customer situation
At 06:00 CEST, CrowdStrike released a sensor configuration update for Windows systems. Sensor configuration updates are an ongoing part of the Falcon platform's protection mechanisms. This update triggered a logic error that caused affected systems to crash with a blue screen (BSOD). At 08:00 CEST, we received the first report of the update's impact from one of our customers whose Azure environment we manage. Immediate action was required to mitigate the problems caused by the update.
Proposed solution
Disk management strategy: attach the affected VM's disk to another VM and rename the problematic files there to alleviate the problem.
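The renaming step can be sketched in a few lines of Python. This is a minimal illustration, not the script used in this case: the folder and the file pattern C-00000291*.sys are taken from CrowdStrike's public remediation guidance, while the function name, the .bak suffix and the dry-run default are assumptions made for the example.

```python
from pathlib import Path

# Folder and pattern follow CrowdStrike's public remediation guidance.
DRIVER_DIR = Path("Windows") / "System32" / "drivers" / "CrowdStrike"
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def rename_channel_files(disk_root, suffix=".bak", dry_run=True):
    """Rename matching channel files on an attached disk.

    disk_root is the mount point of the attached disk on the helper VM
    (hypothetical, e.g. "F:/"). Returns the list of target paths.
    With dry_run=True nothing is changed on disk.
    """
    driver_dir = Path(disk_root) / DRIVER_DIR
    if not driver_dir.is_dir():
        return []

    targets = []
    for path in sorted(driver_dir.glob(CHANNEL_FILE_PATTERN)):
        target = path.with_name(path.name + suffix)
        if not dry_run:
            path.rename(target)  # keep the file under a new name instead of deleting it
        targets.append(target)
    return targets
```

Calling rename_channel_files("F:/") first lists what would be renamed; passing dry_run=False performs the change. Renaming rather than deleting keeps the original file available if it is needed later for analysis.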
Implementation steps:
- Create a snapshot of the OS disk (under 1 minute).
- Create a managed disk: convert the snapshot to a managed disk (under 5 minutes). Use clear naming conventions for easy tracking (e.g. VMname_Snapshot_date, VMname_recovered_date).
- File deletion workflow: attach the managed disk to the jump server, delete the necessary files, detach the disk, reattach it to the affected VM, and turn on the VM.
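The steps above can be sketched as a sequence of Azure CLI calls. The Python function below only builds the commands and does not run them. It is a hedged outline under stated assumptions, not the exact procedure used in this case: all resource names are hypothetical, the repaired disk is assumed to be swapped in as the VM's OS disk with az vm update --os-disk, and the VM is assumed to be deallocated before the swap.

```python
def build_recovery_plan(resource_group, vm_name, os_disk, jump_vm,
                        snapshot_name, recovered_disk):
    """Return the recovery steps as a list of Azure CLI argument lists."""
    rg = ["--resource-group", resource_group]
    return [
        # 1. Snapshot the OS disk of the affected VM.
        ["az", "snapshot", "create", *rg,
         "--name", snapshot_name, "--source", os_disk],
        # 2. Create a managed disk from the snapshot.
        ["az", "disk", "create", *rg,
         "--name", recovered_disk, "--source", snapshot_name],
        # 3. Attach the new disk to the jump server as a data disk.
        ["az", "vm", "disk", "attach", *rg,
         "--vm-name", jump_vm, "--name", recovered_disk],
        # (The problematic files are removed on the jump server at this point.)
        # 4. Detach the repaired disk from the jump server.
        ["az", "vm", "disk", "detach", *rg,
         "--vm-name", jump_vm, "--name", recovered_disk],
        # 5. Assumption: deallocate the affected VM before swapping its OS disk.
        ["az", "vm", "deallocate", *rg, "--name", vm_name],
        # 6. Assumption: swap the repaired disk in as the OS disk.
        ["az", "vm", "update", *rg,
         "--name", vm_name, "--os-disk", recovered_disk],
        # 7. Start the VM.
        ["az", "vm", "start", *rg, "--name", vm_name],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    plan = build_recovery_plan("rg-prod", "app01", "app01_OsDisk", "jump01",
                               "app01_Snapshot_20240719",
                               "app01_recovered_20240719")
    for command in plan:
        print(" ".join(command))
```

Printing the plan before running anything makes it easy to review the order of operations with the customer, which matters when the same sequence has to be repeated across many VMs.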
Key factors
- Timing: 08:00 CEST
- Customer: Global manufacturing company
- Reported problem: Azure environment disruption due to CrowdStrike update
- Initial reaction: Analysis began immediately upon notification from the customer's data center service manager.
- Communication: Regular message exchanges in Microsoft Teams with the customer's internal team. There was initial confusion because no reports yet indicated problems with services.
- Analysis: A direct discussion with the customer's data center service manager confirmed that the problem was indeed causing significant disruption.
Benefit provided
- All affected virtual machines in the customer's Azure environment were fully operational within 24 hours.
- Effective teamwork and rapid implementation of a customized solution minimized downtime and operational impact.
Added value
The importance of backups: Quick access to reliable backups is crucial for successful disaster recovery.
Effective communication: Keeping communication channels open makes it possible to share information quickly and adjust strategies.
Flexibility: Ability to quickly adapt standard solutions to specific environments and constraints.
Naming conventions: Clear and consistent naming conventions are essential for managing and tracking resources during a crisis.
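As an illustration of such a convention, the helper below derives both resource names from the VM name and a date, following the VMname_Snapshot_date and VMname_recovered_date pattern mentioned in the implementation steps. The function name and the YYYYMMDD date format are assumptions; the source only specifies "date".

```python
from datetime import date


def recovery_names(vm_name, day):
    """Build snapshot and recovered-disk names for one VM.

    The YYYYMMDD stamp is an assumed format chosen so that names sort
    chronologically.
    """
    stamp = day.strftime("%Y%m%d")
    return {
        "snapshot": f"{vm_name}_Snapshot_{stamp}",
        "recovered_disk": f"{vm_name}_recovered_{stamp}",
    }


if __name__ == "__main__":
    # Example values only.
    print(recovery_names("app01", date(2024, 7, 19)))
```

Generating the names in one place avoids typos when many VMs are processed in parallel by different team members.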
Conclusions of the project
The rapid and coordinated response to the CrowdStrike update problem demonstrates the effectiveness of our incident management processes. Through strong communication, detailed knowledge of the customer's environment and innovative problem solving, we successfully mitigated the potentially serious consequences of an outage in an extremely short period of time. This case highlights the importance of preparedness, teamwork and flexibility in IT service management.
Ursula Gorska
ISCG sp. z o.o.
Al. Jerozolimskie 178, 02-486 Warsaw
NIP: 5262798378
KRS: 0000220621