Incident Response: What Happens When Things Go Wrong
The structured process for handling security incidents - from detection to recovery. How IR teams contain breaches, investigate root cause, and prevent recurrence.
Ground Up: Defender's Playbook
Part 2 of 3
1. Logs and Monitoring: Your First Line of Defense
2. Incident Response: What Happens When Things Go Wrong
3. Hardening 101: Practical Steps to Secure Systems
Your SIEM fires an alert at 2 AM. A domain admin account just logged in from an IP in a country where you have no employees. Ten minutes later, that account accessed the file server containing customer data. PowerShell is running encoded commands on three workstations.
You’re being breached. Right now. What do you do?
This is where incident response (IR) comes in. IR is the structured process for detecting, containing, investigating, and recovering from security incidents. Without it, breaches turn into panic. With it, they turn into a process.
Why Structure Matters
When an incident hits, the natural reaction is to start pulling cables and shutting things down. That instinct causes problems:
- Shutting down a compromised server destroys volatile memory (RAM) that contains evidence
- Blocking an attacker before you know all their footholds tips them off, and they switch to a backdoor you haven’t found yet
- Restoring from backup without understanding the root cause means the attacker gets back in
- Not documenting actions makes legal and compliance response harder later
A structured IR process prevents these mistakes. It ensures you act quickly but deliberately.
The IR Lifecycle
The most widely cited framework is NIST SP 800-61 Rev. 2, which defines four phases:
1. Preparation → Before anything happens
2. Detection & Analysis → Finding and understanding the incident
3. Containment, Eradication & Recovery → Stopping, removing, and fixing
4. Post-Incident Activity → Learning from it
Phase 1: Preparation
This is everything you do before an incident occurs. When the breach happens at 2 AM, it’s too late to prepare.
IR Plan:
- Who gets called? (IR team roles and contact info)
- What’s the escalation path? (analyst → IR lead → CISO → legal → executives)
- When do you involve law enforcement?
- What authority does the IR team have? (Can they isolate systems without VP approval?)
Technical readiness:
- Log collection and SIEM configured (covered in the previous post)
- Forensic tools available (disk imaging, memory capture)
- Network diagrams and asset inventory current
- Backup integrity verified regularly
- Communication plan (how do you talk to each other if email is compromised?)
IR team roles:
| Role | Responsibility |
|---|---|
| IR Lead | Coordinates response, makes decisions |
| Analyst | Investigates, analyzes evidence |
| IT/Ops | Implements containment and recovery actions |
| Communications | Internal and external messaging |
| Legal | Regulatory requirements, law enforcement liaison |
| Management | Business decisions, resource allocation |
Phase 2: Detection and Analysis
Something triggered the investigation. Now you need to understand what you’re dealing with.
Detection sources:
- SIEM alerts and detection rules
- User reports (“I clicked a link and something weird happened”)
- Threat intelligence feeds
- External notification (FBI calls you, a researcher discloses a finding)
- Anomaly detection (unusual data transfers, new admin accounts)
Initial analysis questions:
1. What happened? → Malware? Unauthorized access? Data theft?
2. When did it start? → How long has the attacker been in?
3. What's affected? → Which systems, accounts, data?
4. Is it still active? → Is the attacker still in the environment?
5. What's the scope? → One machine or the whole domain?
Severity classification:
| Severity | Criteria | Example |
|---|---|---|
| Critical | Active data exfiltration, ransomware spreading, critical systems compromised | Domain controller compromised, ransomware encrypting file shares |
| High | Confirmed compromise, no active exfiltration yet | Attacker has a shell on a web server, hasn’t moved laterally |
| Medium | Suspicious activity, not confirmed as malicious | Multiple failed logins to admin accounts from unusual IP |
| Low | Minor policy violation, no compromise | Employee plugged in an unauthorized USB device |
Severity determines how many people get woken up at 2 AM.
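The table translates fairly directly into a triage helper. The sketch below is one way to encode it in Python, not a standard: `Signals`, its field names, and `classify` are made up for illustration, and your own criteria will differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Signals:
    """Hypothetical triage inputs; the field names are illustrative assumptions."""
    active_exfiltration: bool = False
    ransomware_spreading: bool = False
    critical_system_compromised: bool = False
    confirmed_compromise: bool = False
    suspicious_activity: bool = False


def classify(s: Signals) -> Severity:
    """Map triage signals to a severity, checking the worst case first."""
    if s.active_exfiltration or s.ransomware_spreading or s.critical_system_compromised:
        return Severity.CRITICAL
    if s.confirmed_compromise:
        return Severity.HIGH
    if s.suspicious_activity:
        return Severity.MEDIUM
    return Severity.LOW
```

Checking the worst case first matters: a confirmed compromise that is also spreading ransomware should come out as Critical, not High.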
Evidence collection:
Collect evidence before making changes. Once you restart a server, volatile data is gone. Once you wipe a disk, everything else on it is gone too.
Order of volatility (collect first → last):
1. Memory (RAM) → Running processes, network connections, encryption keys
2. Network state → Active connections, routing tables
3. Running processes → What's executing right now
4. Disk → Files, logs, registry
5. Remote logging → SIEM data, centralized logs
6. Backups → Historical state comparison
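If you script your collection, the same ordering can be enforced in code. This is a minimal sketch: the source names in `VOLATILITY_RANK` are labels invented here to mirror the list above, and `collection_order` is a hypothetical helper.

```python
# Rank of each evidence source, following the order of volatility listed above
# (lower rank = more volatile = collect sooner).
VOLATILITY_RANK = {
    "memory": 1,
    "network_state": 2,
    "running_processes": 3,
    "disk": 4,
    "remote_logging": 5,
    "backups": 6,
}


def collection_order(sources):
    """Return the given evidence sources sorted most-volatile first.

    Unknown sources raise KeyError so a typo cannot silently reorder the plan.
    """
    return sorted(sources, key=lambda name: VOLATILITY_RANK[name])
```

Given `["disk", "backups", "memory", "network_state"]`, the helper returns memory first and backups last.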
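```bash
# Linux - capture volatile data
# /evidence should be mounted from external media, not the suspect disk

# Memory dump (requires a dedicated tool such as LiME; not shown here)

# Active network connections (TCP and UDP, all states, with owning process)
ss -tunap > /evidence/network_connections.txt
# Running processes
ps auxf > /evidence/processes.txt
# Open files
lsof > /evidence/open_files.txt
# Login history
last > /evidence/logins.txt
```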
```powershell
# Windows - capture volatile data
# Active connections
netstat -ano > C:\evidence\connections.txt
# Running processes
Get-Process | Out-File C:\evidence\processes.txt
# Logged-in users
query user > C:\evidence\users.txt
# Recent event logs
wevtutil epl Security C:\evidence\security.evtx
```
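Collected files are easier to defend later if you can show they haven’t changed since capture. A common practice, which this post does not otherwise cover, is to record a SHA-256 hash and a timestamp for each file. The Python sketch below assumes the evidence sits in one directory; `sha256_of` and `build_manifest` are hypothetical helper names.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large memory or disk images do not fill RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(evidence_dir: str) -> list[dict]:
    """Return one record per file under evidence_dir: path, size, hash, time hashed."""
    root = Path(evidence_dir)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        records.append({
            "file": str(path.relative_to(root)),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
            "hashed_at_utc": datetime.now(timezone.utc).isoformat(),
        })
    return records
```

Store the manifest somewhere the attacker can’t reach. A hash kept next to the file it describes proves very little.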
Phase 3: Containment, Eradication, and Recovery
Three sub-phases, each with a distinct goal.
Containment: Stop the Bleeding
The goal is to prevent the attacker from causing more damage without alerting them (if possible) and without destroying evidence.
Short-term containment (immediate actions):
| Action | When to Use |
|---|---|
| Isolate the system from the network | Confirmed compromise, active lateral movement |
| Disable compromised accounts | Credential theft confirmed |
| Block attacker IP at firewall | Known C2 communication |
| DNS sinkhole malicious domains | Malware phoning home |
| Increase logging/monitoring | Need more visibility before acting |
Long-term containment:
- Move compromised systems to an isolated VLAN (they stay running for investigation but can’t reach anything)
- Apply emergency patches to prevent the same exploit on other systems
- Reset credentials for at-risk accounts
- Deploy additional monitoring on systems the attacker touched
A critical decision: Do you let the attacker continue while you watch, or do you cut them off immediately?
- Watch first if: you need to understand the full scope, the attacker hasn’t reached critical data yet, you want to identify all compromised systems before acting
- Cut immediately if: ransomware is actively encrypting, data is being exfiltrated, critical systems are at risk
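The two bullets above can be read as a rule where active damage always wins. Here is a rough Python sketch of that reading. `Situation` and `containment_call` are invented names, and the final fallback (cut once the scope is understood) is an assumption of this sketch, not something the bullets state.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical snapshot of what the team knows; field names are assumptions."""
    ransomware_encrypting: bool
    data_exfiltrating: bool
    critical_systems_at_risk: bool
    scope_understood: bool


def containment_call(s: Situation) -> str:
    """Return 'cut' or 'watch' following the two bullets above.

    Any active-damage condition overrides the wish to keep observing.
    """
    if s.ransomware_encrypting or s.data_exfiltrating or s.critical_systems_at_risk:
        return "cut"
    if not s.scope_understood:
        return "watch"
    # Assumption: once the scope is known and nothing is burning,
    # there is nothing left to learn by waiting.
    return "cut"
```

In a real incident this is a judgment call for the IR lead. The value of writing it down is that the team agrees on the tie-breaker before 2 AM.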
Eradication: Remove the Threat
Once contained, remove every trace of the attacker:
- Remove malware from all affected systems
- Close the initial entry point (patch the vulnerability, disable the phished account)
- Remove persistence mechanisms - backdoor accounts, scheduled tasks, web shells, SSH keys, registry run keys (remember the kill chain installation stage)
- Reset all potentially compromised credentials - not just the ones you know about
The biggest mistake in eradication: thinking you found everything. Attackers establish multiple persistence mechanisms. If you find and remove three backdoors but miss the fourth, they’re back in tomorrow.
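One way to lower the odds of missing that fourth backdoor is to diff the current state against a known-good baseline instead of hunting only for things you already know are bad. The Python sketch below assumes you can export lists of accounts, scheduled tasks, SSH keys, and so on from before and after the incident. The function names and category labels are hypothetical.

```python
def unexpected_items(baseline, current):
    """Return items present now that were not in the known-good baseline."""
    return sorted(set(current) - set(baseline))


def persistence_report(baseline, current):
    """Diff each category (accounts, scheduled tasks, ...) against its baseline.

    Both arguments map a category name to an iterable of item names.
    Categories missing from the baseline are treated as entirely unexpected.
    """
    report = {}
    for category, items in current.items():
        extra = unexpected_items(baseline.get(category, []), items)
        if extra:
            report[category] = extra
    return report
```

For example, with a baseline of accounts `alice` and `bob` and a current list that also contains `svc_update` (an invented name), the report is `{'accounts': ['svc_update']}`. A diff like this only works if the baseline predates the compromise, which is one more reason to keep the asset inventory current.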
Recovery: Return to Normal
Bring systems back online carefully:
- Rebuild compromised systems from clean images (don’t just “clean” a compromised server - rebuild it)
- Restore data from verified clean backups
- Verify integrity of restored systems before reconnecting to the network
- Monitor closely for signs of re-compromise (increased logging, extra alerting)
- Staged reconnection - bring systems back one at a time, watching for anomalies
How do you know the attacker isn’t still in? You don’t, with certainty. That’s why monitoring during recovery is critical.
Phase 4: Post-Incident Activity
The incident is over. Now make sure it doesn’t happen the same way again.
Post-incident review (blameless):
What happened? → Timeline of events from initial compromise to recovery
How was it detected? → Alert, user report, external notification?
What went well? → Fast containment, good communication, effective tools
What didn't go well? → Slow detection, missing logs, unclear escalation
What's the root cause? → Unpatched server, phished employee, misconfiguration
What do we change? → Specific, actionable improvements
Key principle: blameless. If an employee clicked a phishing link, the question isn’t “why did they click?” It’s “why did the phishing email get through our filters?” and “why did clicking a link give the attacker code execution?”
Blaming individuals stops people from reporting incidents. That makes everything worse.
Documentation:
- Incident timeline
- Systems affected
- Evidence collected
- Actions taken
- Root cause analysis
- Recommendations
This documentation serves legal requirements, insurance claims, regulatory compliance, and most importantly - organizational learning.
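The incident timeline usually has to be stitched together from several log sources, often recorded in different time zones. This is a minimal Python sketch of that merge. The tuple layout and `build_timeline` are assumptions made for illustration, and timestamps are expected in ISO 8601 form with an explicit offset such as `+00:00`.

```python
from datetime import datetime


def build_timeline(*sources):
    """Merge (iso_timestamp, source, description) tuples into one sorted timeline.

    Timestamps are parsed before sorting so mixed UTC offsets order correctly.
    Either every timestamp carries an offset or none does; mixing raises TypeError.
    """
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda e: datetime.fromisoformat(e[0]))
```

Parsing before sorting is the point. A login recorded as `03:00:00+01:00` happened before a file access recorded as `02:10:00+00:00`, but sorting the raw strings would put them in the wrong order.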
Indicators of Compromise (IOCs):
Document everything the attacker used so you can detect it in the future:
| IOC Type | Example |
|---|---|
| IP addresses | C2 server IPs |
| Domains | Phishing domains, malware download sites |
| File hashes | Malware hashes (MD5, SHA256) |
| Email addresses | Phishing sender addresses |
| File paths | Where malware was dropped |
| Registry keys | Persistence mechanisms |
| User agents | Custom C2 user agents |
Share IOCs with your industry’s information sharing community (ISACs) so other organizations can detect the same threat.
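Once IOCs are written down, checking logs against them is mostly set intersection. The Python sketch below covers only IP addresses and domains. The IOC values are placeholders from ranges reserved for documentation, and `match_iocs` and the two regular expressions are illustrative, deliberately simple, and not a complete parser.

```python
import re

# Hypothetical IOC set; 203.0.113.0/24 and example.com are reserved for documentation.
IOCS = {
    "ip": {"203.0.113.50"},
    "domain": {"login-update.example.com"},
}

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def match_iocs(line, iocs=IOCS):
    """Return the sorted IOC values (IPs or domains) that appear in a log line."""
    found = set(IP_RE.findall(line)) & iocs["ip"]
    found |= {d.lower() for d in DOMAIN_RE.findall(line)} & iocs["domain"]
    return sorted(found)
```

Matching here is exact, so a subdomain of a listed domain would not be flagged. A SIEM or threat intelligence platform handles that kind of matching, plus hashes and the other IOC types, far better than a script.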
Common Incident Types
| Type | Key Actions |
|---|---|
| Phishing/credential theft | Reset credentials, check for mail forwarding rules, review login history, scan for persistence |
| Ransomware | Isolate immediately, assess backup integrity, determine variant, do not pay without legal/executive decision |
| Web application compromise | Take application offline, review web logs, check for web shells, patch vulnerability |
| Insider threat | Involve HR and legal early, preserve evidence carefully, monitor discreetly |
| Data breach | Determine what was accessed, legal/regulatory notification requirements, preserve evidence for potential litigation |
Incident Response Frameworks
| Framework | Use Case |
|---|---|
| NIST SP 800-61 | General IR guidance (what we covered here) |
| SANS Incident Handling | Six-phase model popular in training |
| MITRE ATT&CK | Map attacker techniques to detection and response |
| CISA Incident Response Playbooks | Government-focused playbooks |
Try It
You don’t need a real incident to practice. Tabletop exercises walk through scenarios:
Scenario: An employee reports their computer is running slowly. Your EDR tool shows PowerShell connecting to an external IP every 60 seconds. The employee received a suspicious email with an attachment yesterday.
Ask yourself:
- What phase are you in? (Detection & Analysis)
- What’s your first action? (Don’t turn off the computer - capture volatile evidence)
- What logs do you check? (EDR, email logs, DNS logs, proxy logs)
- How do you contain it? (Isolate from network, check if other machines are affected)
- Who do you notify? (IR lead, management if data exposure is possible)
Running these scenarios regularly builds the muscle memory that makes real incidents manageable.
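The "every 60 seconds" detail in the scenario is worth a closer look, because regular intervals are what make beaconing detectable. This rough Python sketch flags connection times whose gaps barely vary. `looks_like_beacon` and its thresholds are assumptions chosen for illustration, not tuned values, and real C2 traffic often adds jitter to defeat exactly this kind of check.

```python
from statistics import mean, pstdev


def looks_like_beacon(timestamps, max_jitter=0.1, min_events=5):
    """Flag connection times (in seconds) whose gaps are suspiciously regular.

    Returns True when there are at least min_events connections and the
    standard deviation of the gaps is at most max_jitter times the mean gap.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    return pstdev(gaps) / avg <= max_jitter
```

Connections at roughly 0, 60, 121, 180, 240, and 299 seconds would be flagged. A user browsing the web, with gaps ranging from seconds to many minutes, would not.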
What’s Next
You know how to detect incidents (logs and monitoring) and respond to them (IR process). The next post covers the proactive side - hardening systems so incidents are less likely to happen in the first place. Prevention is usually cheaper than response.
References
- NIST SP 800-61 Rev. 2 - Computer Security Incident Handling Guide
- SANS Incident Handler’s Handbook
- CISA Cybersecurity Incident Response
- The Incident Response Hierarchy of Needs
Incidents are inevitable. Chaos isn’t. The difference between a breach that costs millions and one that’s contained in hours is whether you have a process and practiced it before you needed it.