Incident Response: What Happens When Things Go Wrong

Your SIEM fires an alert at 2 AM. A domain admin account just logged in from an IP in a country where you have no employees. Ten minutes later, that account accessed the file server containing customer data. PowerShell is running encoded commands on three workstations.

You’re being breached. Right now. What do you do?

This is where incident response (IR) comes in. IR is the structured process for detecting, containing, investigating, and recovering from security incidents. Without it, breaches turn into panic. With it, they turn into a process.

Why Structure Matters

When an incident hits, the natural reaction is to start pulling cables and shutting things down. That instinct causes problems:

Shutting down a compromised server wipes volatile memory (RAM), which often holds evidence you can’t get anywhere else
Blocking an attacker too aggressively can tip them off, and they switch to a backdoor you haven’t found yet
Restoring from backup without understanding the root cause means the attacker gets back in
Not documenting actions makes legal and compliance response harder later

A structured IR process prevents these mistakes. It ensures you act quickly but deliberately.

The IR Lifecycle

The most widely used framework is NIST SP 800-61, the Computer Security Incident Handling Guide. Revision 2 describes four phases:

1. Preparation                          →  Before anything happens
2. Detection & Analysis                 →  Finding and understanding the incident
3. Containment, Eradication & Recovery  →  Stopping, removing, and fixing
4. Post-Incident Activity               →  Learning from it

Phase 1: Preparation

This is everything you do before an incident occurs. When the breach happens at 2 AM, it’s too late to prepare.

IR Plan:

Who gets called? (IR team roles and contact info)
What’s the escalation path? (analyst → IR lead → CISO → legal → executives)
When do you involve law enforcement?
What authority does the IR team have? (Can they isolate systems without VP approval?)

Technical readiness:

Log collection and SIEM configured (covered in the previous post)
Forensic tools available (disk imaging, memory capture)
Network diagrams and asset inventory current
Backup integrity verified regularly
Communication plan (how do you talk to each other if email is compromised?)

IR team roles:

Role	Responsibility
IR Lead	Coordinates response, makes decisions
Analyst	Investigates, analyzes evidence
IT/Ops	Implements containment and recovery actions
Communications	Internal and external messaging
Legal	Regulatory requirements, law enforcement liaison
Management	Business decisions, resource allocation

Phase 2: Detection and Analysis

Something triggered the investigation. Now you need to understand what you’re dealing with.

Detection sources:

SIEM alerts and detection rules
User reports (“I clicked a link and something weird happened”)
Threat intelligence feeds
External notification (FBI calls you, a researcher discloses a finding)
Anomaly detection (unusual data transfers, new admin accounts)

Initial analysis questions:

1. What happened?        → Malware? Unauthorized access? Data theft?
2. When did it start?    → How long has the attacker been in?
3. What's affected?      → Which systems, accounts, data?
4. Is it still active?   → Is the attacker still in the environment?
5. What's the scope?     → One machine or the whole domain?
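In the opening scenario, one quick way to start answering "What happened?" is to decode the PowerShell commands the alert captured. PowerShell's -EncodedCommand switch takes Base64 over UTF-16LE text, so a minimal sketch of the decoding looks like the following. The function name and the sample command are illustrative assumptions, not part of any tool mentioned in this post.

```python
import base64
import binascii


def decode_powershell_command(encoded: str) -> str:
    """Decode the argument of PowerShell's -EncodedCommand switch.

    The argument is Base64 over UTF-16LE text, so both steps are undone here.
    Raises ValueError if the input is not valid Base64 or not valid UTF-16LE.
    """
    try:
        raw = base64.b64decode(encoded, validate=True)
        return raw.decode("utf-16-le")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError(f"not a valid encoded command: {exc}") from exc


# Round-trip demo: a harmless command stands in for what the alert captured.
sample = base64.b64encode("Get-Process".encode("utf-16-le")).decode("ascii")
print(decode_powershell_command(sample))  # prints: Get-Process
```

Decode on an analysis machine, not on the compromised workstation, and treat the output as untrusted text: read it, don't run it.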

Severity classification:

Severity	Criteria	Example
Critical	Active data exfiltration, ransomware spreading, critical systems compromised	Domain controller compromised, ransomware encrypting file shares
High	Confirmed compromise, no active exfiltration yet	Attacker has a shell on a web server, hasn’t moved laterally
Medium	Suspicious activity, not confirmed as malicious	Multiple failed logins to admin accounts from an unusual IP
Low	Minor policy violation, no compromise	Employee plugged in an unauthorized USB device

Severity determines how many people get woken up at 2 AM.
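The table above can be written down as a small triage function so that two analysts looking at the same facts reach the same level. This is a sketch under assumptions: the flag names and the paging policy are hypothetical, and the escalation roles simply reuse the path from the IR plan section. Your own plan decides who actually gets paged.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Findings:
    """Hypothetical triage flags an analyst fills in during initial analysis."""
    active_exfiltration: bool = False
    ransomware_spreading: bool = False
    critical_system_compromised: bool = False
    compromise_confirmed: bool = False
    suspicious_activity: bool = False


def classify(f: Findings) -> Severity:
    """Map triage flags to the severity levels in the table, worst case first."""
    if f.active_exfiltration or f.ransomware_spreading or f.critical_system_compromised:
        return Severity.CRITICAL
    if f.compromise_confirmed:
        return Severity.HIGH
    if f.suspicious_activity:
        return Severity.MEDIUM
    return Severity.LOW


# Hypothetical paging policy: how far up the escalation path each level goes.
ESCALATION = {
    Severity.LOW: ["analyst"],
    Severity.MEDIUM: ["analyst", "IR lead"],
    Severity.HIGH: ["analyst", "IR lead", "CISO"],
    Severity.CRITICAL: ["analyst", "IR lead", "CISO", "legal", "executives"],
}

# Example: a shell on a web server with no lateral movement yet.
level = classify(Findings(compromise_confirmed=True))
print(level.name, ESCALATION[level])  # prints: HIGH ['analyst', 'IR lead', 'CISO']
```

Checking the worst conditions first means a single critical flag can never be masked by a milder one.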

Evidence collection:

Collect evidence before making changes. Restarting a server erases volatile data, and wiping or reimaging a disk destroys whatever was stored on it.

Order of volatility (collect first → last):
1. Memory (RAM)       → Running processes, network connections, encryption keys
2. Network state      → Active connections, routing tables
3. Running processes  → What's executing right now
4. Disk               → Files, logs, registry
5. Remote logging     → SIEM data, centralized logs
6. Backups            → Historical state comparison

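# Linux - capture volatile data
# Run as root and write to external or network storage, not the suspect disk
# Memory dump: requires a dedicated tool such as LiME (not shown here)
# Active network connections: all TCP/UDP sockets, numeric, with owning process
ss -tunap > /evidence/network_connections.txt
# Running processes, shown as a tree
ps auxf > /evidence/processes.txt
# Open files
lsof > /evidence/open_files.txt
# Login history
last > /evidence/logins.txt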

# Windows - capture volatile data (run from an elevated PowerShell prompt)
# Active connections with owning process IDs
netstat -ano > C:\evidence\connections.txt
# Running processes
Get-Process | Out-File C:\evidence\processes.txt
# Logged-in users
query user > C:\evidence\users.txt
# Export the full Security event log
wevtutil epl Security C:\evidence\security.evtx
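Once the files are collected, it helps to record a hash of each one so you can later show that the evidence hasn't changed. The sketch below is one way to do that, assuming a plain directory of collected files. The function names, manifest layout, and collector label are illustrative assumptions, not a standard format.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large memory or disk images do not fill RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(evidence_dir: Path, collector: str) -> dict:
    """Describe every file under evidence_dir: relative path, size, SHA-256."""
    files = sorted(p for p in evidence_dir.rglob("*") if p.is_file())
    return {
        "collector": collector,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": [
            {
                "path": str(p.relative_to(evidence_dir)),
                "size_bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


# Demo against a throwaway directory standing in for /evidence.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "processes.txt").write_text("PID 1 init\n")
    manifest = build_manifest(root, collector="analyst-on-call")
    print(json.dumps(manifest, indent=2))
```

Store the manifest somewhere other than the evidence directory itself, so a change to the evidence can't quietly change the record of it too.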

Phase 3: Containment, Eradication, and Recovery

Three sub-phases, each with a distinct goal.

Containment: Stop the Bleeding

The goal is to prevent the attacker from causing more damage without alerting them (if possible) and without destroying evidence.

Short-term containment (immediate actions):

Action	When to Use
Isolate the system from the network	Confirmed compromise, active lateral movement
Disable compromised accounts	Credential theft confirmed
Block attacker IP at firewall	Known C2 communication
DNS sinkhole malicious domains	Malware phoning home
Increase logging/monitoring	Need more visibility before acting

Long-term containment:

Move compromised systems to an isolated VLAN (they stay running for investigation but can’t reach anything)
Apply emergency patches to prevent the same exploit on other systems
Reset credentials for at-risk accounts
Deploy additional monitoring on systems the attacker touched

A critical decision: Do you let the attacker continue while you watch, or do you cut them off immediately?

Watch first if: you need to understand the full scope, the attacker hasn’t reached critical data yet, or you want to identify all compromised systems before acting
Cut immediately if: ransomware is actively encrypting, data is being exfiltrated, or critical systems are at risk
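Writing the watch-or-cut criteria down before an incident keeps the decision from being improvised at 2 AM. Here is a minimal sketch of that logic. The field names are hypothetical, and the final branch (cut once the scope is understood) is an assumption that goes slightly beyond the two rules above.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical snapshot of what the team knows at decision time."""
    ransomware_encrypting: bool = False
    data_exfiltrating: bool = False
    critical_systems_at_risk: bool = False
    scope_understood: bool = False


def containment_decision(s: Situation) -> str:
    """Return 'cut' or 'watch'.

    Active harm always wins: any of the three 'cut' conditions overrides
    the wish to keep observing.
    """
    if s.ransomware_encrypting or s.data_exfiltrating or s.critical_systems_at_risk:
        return "cut"
    if not s.scope_understood:
        return "watch"
    # Assumption: scope is known and nothing is burning, so there is no
    # remaining reason to leave the attacker in place.
    return "cut"


print(containment_decision(Situation()))                        # prints: watch
print(containment_decision(Situation(data_exfiltrating=True)))  # prints: cut
```

In practice this is a judgment call for the IR lead. The value of the sketch is the ordering: the conditions that force an immediate cut are checked first.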

Eradication: Remove the Threat

Once contained, remove every trace of the attacker:

Remove malware from all affected systems
Close the initial entry point (patch the vulnerability, disable the phished account)
Remove persistence mechanisms - backdoor accounts, scheduled tasks, web shells, SSH keys, registry run keys (remember the kill chain installation stage)
Reset all potentially compromised credentials - not just the ones you know about

The biggest mistake in eradication: thinking you found everything. Attackers establish multiple persistence mechanisms. If you find and remove three backdoors but miss the fourth, they’re back in tomorrow.
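One way to reduce the odds of missing that fourth backdoor is to compare what's on the system now against a known-good baseline, category by category. The sketch below assumes you already have both inventories as sets of strings. The category names and sample entries are made up for illustration.

```python
def unexpected_entries(baseline: dict[str, set[str]],
                       current: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return items present now that are absent from the known-good baseline."""
    findings = {}
    for category, items in current.items():
        extra = items - baseline.get(category, set())
        if extra:
            findings[category] = extra
    return findings


# Hypothetical inventories for one Linux server.
baseline = {
    "local_accounts": {"root", "deploy"},
    "cron_jobs": {"/etc/cron.daily/logrotate"},
}
current = {
    "local_accounts": {"root", "deploy", "svc_backup2"},
    "cron_jobs": {"/etc/cron.daily/logrotate", "/etc/cron.d/.update"},
    "ssh_keys": {"SHA256:hypothetical-fingerprint"},
}

for category, items in sorted(unexpected_entries(baseline, current).items()):
    print(category, sorted(items))
```

A diff like this only covers the categories you inventory, and it won't notice a legitimate entry that was modified in place. Treat it as a starting list for review, not proof that the system is clean.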

Recovery: Return to Normal

Bring systems back online carefully:

Rebuild compromised systems from clean images (don’t just “clean” a compromised server - rebuild it)
Restore data from verified clean backups
Verify integrity of restored systems before reconnecting to the network
Monitor closely for signs of re-compromise (increased logging, extra alerting)
Staged reconnection - bring systems back one at a time, watching for anomalies

How do you know the attacker isn’t still in? You don’t, with certainty. That’s why monitoring during recovery is critical.
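The staged reconnection step can be sketched as a loop that brings back one host, checks it, and stops the rollout at the first anomaly. The three callables are placeholders for whatever your network tooling and monitoring actually provide. They are assumptions, not real APIs.

```python
from typing import Callable, Iterable


def staged_reconnect(hosts: Iterable[str],
                     reconnect: Callable[[str], None],
                     isolate: Callable[[str], None],
                     looks_clean: Callable[[str], bool]) -> list[str]:
    """Reconnect hosts one at a time and stop at the first anomaly.

    Returns the hosts that were reconnected and still look clean.
    """
    restored = []
    for host in hosts:
        reconnect(host)
        if not looks_clean(host):
            isolate(host)  # pull it back off the network
            break          # stop the rollout and investigate
        restored.append(host)
    return restored


# Demo with stub callables; "files01" simulates a host that misbehaves.
log = []
result = staged_reconnect(
    ["web01", "files01", "db01"],
    reconnect=lambda h: log.append(f"reconnect {h}"),
    isolate=lambda h: log.append(f"isolate {h}"),
    looks_clean=lambda h: h != "files01",
)
print(result)  # prints: ['web01']
print(log)     # prints: ['reconnect web01', 'reconnect files01', 'isolate files01']
```

In a real recovery, the check would cover an observation window with the extra logging and alerting described above, not a single instant.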

Phase 4: Post-Incident Activity

The incident is over. Now make sure it doesn’t happen the same way again.

Post-incident review (blameless):

What happened?           → Timeline of events from initial compromise to recovery
How was it detected?     → Alert, user report, external notification?
What went well?          → Fast containment, good communication, effective tools
What didn't go well?     → Slow detection, missing logs, unclear escalation
What's the root cause?   → Unpatched server, phished employee, misconfiguration
What do we change?       → Specific, actionable improvements

Key principle: blameless. If an employee clicked a phishing link, the question isn’t “why did they click?” It’s “why did the phishing email get through our filters?” and “why did clicking a link give the attacker code execution?”

Blaming individuals stops people from reporting incidents. That makes everything worse.

Documentation:

Incident timeline
Systems affected
Evidence collected
Actions taken
Root cause analysis
Recommendations

This documentation serves legal requirements, insurance claims, regulatory compliance, and most importantly - organizational learning.
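The incident timeline is also where your review metrics come from. Here is a small sketch that sorts recorded events and derives time-to-detect and time-to-contain. The event kinds, timestamps, and notes are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    when: datetime
    kind: str
    note: str


def build_timeline(raw):
    """Parse (iso_timestamp, kind, note) tuples and sort them chronologically."""
    return sorted(Event(datetime.fromisoformat(ts), kind, note)
                  for ts, kind, note in raw)


def elapsed(timeline, start_kind: str, end_kind: str) -> timedelta:
    """Time between the first event of start_kind and the first of end_kind."""
    first = {}
    for ev in timeline:
        first.setdefault(ev.kind, ev.when)
    return first[end_kind] - first[start_kind]


# Hypothetical events, deliberately recorded out of order.
raw_events = [
    ("2024-03-02T02:10:00+00:00", "contained", "Account disabled, host isolated"),
    ("2024-03-01T22:41:00+00:00", "initial_compromise", "Phishing attachment opened"),
    ("2024-03-02T02:00:00+00:00", "detected", "SIEM alert on foreign login"),
]

timeline = build_timeline(raw_events)
print("time to detect: ", elapsed(timeline, "initial_compromise", "detected"))  # 3:19:00
print("time to contain:", elapsed(timeline, "detected", "contained"))           # 0:10:00
```

Record timestamps with an explicit UTC offset, as here. Mixing local times from different systems is a common source of wrong timelines.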

Indicators of Compromise (IOCs):

Document everything the attacker used so you can detect it in the future:

IOC Type	Example
IP addresses	C2 server IPs
Domains	Phishing domains, malware download sites
File hashes	Malware hashes (MD5, SHA256)
Email addresses	Phishing sender addresses
File paths	Where malware was dropped
Registry keys	Persistence mechanisms
User agents	Custom C2 user agents

Share IOCs with your industry’s information sharing community (ISACs) so other organizations can detect the same threat.
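Documented IOCs are only useful if something checks for them. The sketch below matches a single log record against an IOC set. The record field names (dst_ip, domain, sha256) are assumptions about your log format, and the indicators come from reserved documentation ranges, not from any real threat.

```python
import hashlib
import ipaddress
from dataclasses import dataclass, field


@dataclass
class IocSet:
    ips: set = field(default_factory=set)      # ipaddress objects
    domains: set = field(default_factory=set)  # lowercase, no trailing dot
    sha256: set = field(default_factory=set)   # lowercase hex digests


def domain_matches(name: str, bad_domains: set) -> bool:
    """True if name equals a listed domain or is a subdomain of one."""
    name = name.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in bad_domains)


def match_record(record: dict, iocs: IocSet) -> list[str]:
    """Return the IOC types that a single log record hits."""
    hits = []
    ip = record.get("dst_ip")
    if ip and ipaddress.ip_address(ip) in iocs.ips:
        hits.append("ip")
    domain = record.get("domain")
    if domain and domain_matches(domain, iocs.domains):
        hits.append("domain")
    if (record.get("sha256") or "").lower() in iocs.sha256:
        hits.append("sha256")
    return hits


# Hypothetical indicators: a documentation-range IP and an example domain.
sample_hash = hashlib.sha256(b"hypothetical malware sample").hexdigest()
iocs = IocSet(
    ips={ipaddress.ip_address("203.0.113.45")},
    domains={"updates.example.net"},
    sha256={sample_hash},
)

record = {"dst_ip": "203.0.113.45", "domain": "cdn.updates.example.net"}
print(match_record(record, iocs))  # prints: ['ip', 'domain']
```

Matching on the parsed address instead of the raw string avoids false hits such as 203.0.113.4 matching 203.0.113.45. A malformed address in a record raises ValueError, which a real pipeline would need to handle.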

Common Incident Types

Type	Key Actions
Phishing/credential theft	Reset credentials, check for mail forwarding rules, review login history, scan for persistence
Ransomware	Isolate immediately, assess backup integrity, determine variant, do not pay without legal/executive decision
Web application compromise	Take application offline, review web logs, check for web shells, patch vulnerability
Insider threat	Involve HR and legal early, preserve evidence carefully, monitor discreetly
Data breach	Determine what was accessed, legal/regulatory notification requirements, preserve evidence for potential litigation

Incident Response Frameworks

Framework	Use Case
NIST SP 800-61	General IR guidance (what we covered here)
SANS Incident Handling	Six-phase model popular in training
MITRE ATT&CK	Map attacker techniques to detection and response
CISA Incident Response Playbooks	Government-focused playbooks

Try It

You don’t need a real incident to practice. Tabletop exercises walk through scenarios:

Scenario: An employee reports their computer is running slowly. Your EDR tool shows PowerShell connecting to an external IP every 60 seconds. The employee received a suspicious email with an attachment yesterday.

Ask yourself:

What phase are you in? (Detection & Analysis)
What’s your first action? (Don’t turn off the computer - capture volatile evidence)
What logs do you check? (EDR, email logs, DNS logs, proxy logs)
How do you contain it? (Isolate from network, check if other machines are affected)
Who do you notify? (IR lead, management if data exposure is possible)

Running these scenarios regularly builds the muscle memory that makes real incidents manageable.
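The "every 60 seconds" detail in the scenario is the kind of pattern you can test for: beaconing malware tends to call home at nearly constant intervals, while human-driven traffic is bursty. Here is a rough sketch of that check. The thresholds are arbitrary starting points, not tuned values, and real implants often add jitter specifically to defeat checks like this.

```python
from statistics import mean, pstdev


def looks_like_beacon(timestamps: list[float],
                      min_events: int = 5,
                      max_jitter_ratio: float = 0.1) -> bool:
    """Flag a series of connection times whose gaps are nearly constant.

    timestamps are seconds (for example, epoch times) for one host talking
    to one destination. The ratio is standard deviation of gaps over mean gap.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    return pstdev(gaps) / avg <= max_jitter_ratio


# Connections roughly every 60 seconds versus bursty, human-looking traffic.
print(looks_like_beacon([0, 61, 119, 181, 240]))      # prints: True
print(looks_like_beacon([0, 5, 200, 210, 900, 905]))  # prints: False
```

Treat a hit as a reason to look closer, not as proof. Plenty of legitimate software (update checkers, monitoring agents) also polls on a fixed schedule.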

What’s Next

You know how to detect incidents (logs and monitoring) and respond to them (IR process). The next post covers the proactive side - hardening systems so incidents are less likely to happen in the first place. Prevention is usually cheaper than response.

Incidents are inevitable. Chaos isn’t. The difference between a breach that costs millions and one that’s contained in hours is whether you have a process and practiced it before you needed it.