@Sdmrf · 10 min read

Incident Response: What Happens When Things Go Wrong

The structured process for handling security incidents - from detection to recovery. How IR teams contain breaches, investigate root cause, and prevent recurrence.


Your SIEM fires an alert at 2 AM. A domain admin account just logged in from an IP in a country where you have no employees. Ten minutes later, that account accessed the file server containing customer data. PowerShell is running encoded commands on three workstations.

You’re being breached. Right now. What do you do?

This is where incident response (IR) comes in. IR is the structured process for detecting, containing, investigating, and recovering from security incidents. Without it, breaches turn into panic. With it, they turn into a process.

Why Structure Matters

When an incident hits, the natural reaction is to start pulling cables and shutting things down. That instinct causes problems:

  • Shutting down a compromised server destroys volatile memory (RAM) that contains evidence
  • Blocking an attacker too aggressively tips them off, and they switch to a different backdoor
  • Restoring from backup without understanding the root cause means the attacker gets back in
  • Not documenting actions makes legal and compliance response harder later

A structured IR process prevents these mistakes. It ensures you act quickly but deliberately.

The IR Lifecycle

The industry-standard framework comes from NIST (SP 800-61). Four phases:

1. Preparation                           →  Before anything happens
2. Detection & Analysis                  →  Finding and understanding the incident
3. Containment, Eradication & Recovery   →  Stopping, removing, and fixing
4. Post-Incident Activity                →  Learning from it

Phase 1: Preparation

This is everything you do before an incident occurs. When the breach happens at 2 AM, it’s too late to prepare.

IR Plan:

  • Who gets called? (IR team roles and contact info)
  • What’s the escalation path? (analyst → IR lead → CISO → legal → executives)
  • When do you involve law enforcement?
  • What authority does the IR team have? (Can they isolate systems without VP approval?)

Technical readiness:

  • Log collection and SIEM configured (covered in the previous post)
  • Forensic tools available (disk imaging, memory capture)
  • Network diagrams and asset inventory current
  • Backup integrity verified regularly (see the sketch after this list)
  • Communication plan (how do you talk to each other if email is compromised?)
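
Backup verification in particular is easy to claim and hard to prove. A minimal sketch, assuming backups land in dated directories (paths are illustrative):

# Record checksums when the backup completes
sha256sum /backups/2024-06-01/* > /backups/manifests/2024-06-01.sha256
# Verify on a schedule (e.g., from a weekly cron job)
sha256sum -c --quiet /backups/manifests/2024-06-01.sha256 || echo "BACKUP INTEGRITY CHECK FAILED"

Checksums only prove the files haven't changed since they were written; actually restoring a backup onto a spare machine now and then is the stronger test.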

IR team roles:

  • IR Lead - Coordinates the response, makes decisions
  • Analyst - Investigates and analyzes evidence
  • IT/Ops - Implements containment and recovery actions
  • Communications - Internal and external messaging
  • Legal - Regulatory requirements, law enforcement liaison
  • Management - Business decisions, resource allocation

Phase 2: Detection and Analysis

Something triggered the investigation. Now you need to understand what you’re dealing with.

Detection sources:

  • SIEM alerts and detection rules
  • User reports (“I clicked a link and something weird happened”)
  • Threat intelligence feeds
  • External notification (FBI calls you, a researcher discloses a finding)
  • Anomaly detection (unusual data transfers, new admin accounts)
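
As a small example of that last item, a scheduled job that diffs privileged group membership against a known-good snapshot will catch a quietly added admin account (group names and paths are illustrative):

# Compare current membership of privileged groups against a saved baseline
getent group sudo wheel adm > /tmp/admin_groups.now
diff /var/lib/ir/admin_groups.baseline /tmp/admin_groups.now || logger -p auth.warning "Privileged group membership changed"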

Initial analysis questions:

1. What happened?        → Malware? Unauthorized access? Data theft?
2. When did it start?    → How long has the attacker been in?
3. What's affected?      → Which systems, accounts, data?
4. Is it still active?   → Is the attacker still in the environment?
5. What's the scope?     → One machine or the whole domain?
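
To start answering questions 2 and 4, authentication logs are usually the fastest source. A rough first pass on a suspicious account might look like this (account name and log path are illustrative):

# When did this account first show up, and is it still active?
grep 'svc-backup' /var/log/auth.log | head -n 5    # earliest entries in the current log
grep 'svc-backup' /var/log/auth.log | tail -n 5    # most recent activity
last svc-backup                                    # interactive login history

Rotated logs and the SIEM will push the timeline back further than a single log file can.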

Severity classification:

  • Critical - Active data exfiltration, ransomware spreading, or critical systems compromised. Example: domain controller compromised, ransomware encrypting file shares.
  • High - Confirmed compromise, no active exfiltration yet. Example: attacker has a shell on a web server but hasn't moved laterally.
  • Medium - Suspicious activity, not confirmed as malicious. Example: multiple failed logins to admin accounts from an unusual IP.
  • Low - Minor policy violation, no compromise. Example: employee plugged in an unauthorized USB device.

Severity determines how many people get woken up at 2 AM.

Evidence collection:

Collect evidence before making changes. Once you restart a server or wipe a disk, volatile data is gone.

Order of volatility (collect first → last):
1. Memory (RAM)        → Running processes, network connections, encryption keys
2. Network state       → Active connections, routing tables
3. Running processes   → What's executing right now
4. Disk                → Files, logs, registry
5. Remote logging      → SIEM data, centralized logs
6. Backups             → Historical state comparison

# Linux - capture volatile data
# Memory dump (requires special tools like LiME)
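# Example with the LiME kernel module - the module must be built for the
# running kernel, and the filename below is illustrative:
# insmod ./lime-$(uname -r).ko "path=/evidence/memory.lime format=lime"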
# Active network connections (all TCP/UDP sockets, numeric, with owning process)
ss -tunap > /evidence/network_connections.txt
# Running processes
ps auxf > /evidence/processes.txt
# Open files
lsof > /evidence/open_files.txt
# Login history
last > /evidence/logins.txt

# Windows - capture volatile data
# Active connections
netstat -ano > C:\evidence\connections.txt
# Running processes
Get-Process | Out-File C:\evidence\processes.txt
# Logged-in users
query user > C:\evidence\users.txt
# Export the Security event log
wevtutil epl Security C:\evidence\security.evtx
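
Whatever you collect, hash it immediately so you can later show the evidence wasn't altered (on Windows, Get-FileHash does the same job). Paths are illustrative:

# Record checksums of the collected evidence and protect the manifest
sha256sum /evidence/*.txt > /evidence/manifest.sha256
chmod 400 /evidence/manifest.sha256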

Phase 3: Containment, Eradication, and Recovery

Three sub-phases, each with a distinct goal.

Containment: Stop the Bleeding

The goal is to prevent the attacker from causing more damage without alerting them (if possible) and without destroying evidence.

Short-term containment (immediate actions):

  • Isolate the system from the network - confirmed compromise, active lateral movement
  • Disable compromised accounts - credential theft confirmed
  • Block the attacker's IP at the firewall - known C2 communication
  • DNS sinkhole malicious domains - malware phoning home
  • Increase logging/monitoring - need more visibility before acting
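
As a rough illustration (not a playbook), here's what a couple of these actions might look like on a single Linux host; the IP and account name are placeholders:

# Block a known C2 address in both directions at the host firewall
iptables -I INPUT  -s 203.0.113.45 -j DROP
iptables -I OUTPUT -d 203.0.113.45 -j DROP
# Lock a compromised local account and kill its active sessions
usermod -L webadmin
pkill -KILL -u webadmin

In practice these decisions usually happen at the network firewall, the EDR console, or the identity provider rather than host by host.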

Long-term containment:

  • Move compromised systems to an isolated VLAN (they stay running for investigation but can’t reach anything)
  • Apply emergency patches to prevent the same exploit on other systems
  • Reset credentials for at-risk accounts
  • Deploy additional monitoring on systems the attacker touched

A critical decision: Do you let the attacker continue while you watch, or do you cut them off immediately?

  • Watch first if: you need to understand the full scope, the attacker hasn’t reached critical data yet, you want to identify all compromised systems before acting
  • Cut immediately if: ransomware is actively encrypting, data is being exfiltrated, critical systems are at risk

Eradication: Remove the Threat

Once contained, remove every trace of the attacker:

  • Remove malware from all affected systems
  • Close the initial entry point (patch the vulnerability, disable the phished account)
  • Remove persistence mechanisms - backdoor accounts, scheduled tasks, web shells, SSH keys, registry run keys (remember the kill chain installation stage)
  • Reset all potentially compromised credentials - not just the ones you know about

The biggest mistake in eradication: thinking you found everything. Attackers establish multiple persistence mechanisms. If you find and remove three backdoors but miss the fourth, they’re back in tomorrow.
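
On a Linux host, a quick sweep of the usual persistence locations looks something like this - a starting point, not proof of a clean system:

# Cron entries and scheduled jobs
ls -la /etc/cron.* /var/spool/cron/ 2>/dev/null
# Recently added or modified systemd units
find /etc/systemd/system /usr/lib/systemd/system -type f -mtime -30 2>/dev/null
# SSH keys the attacker may have planted
find / -name authorized_keys -exec ls -la {} \; 2>/dev/null
# Accounts with login shells that shouldn't have them
awk -F: '$7 !~ /(nologin|false)/ {print $1, $7}' /etc/passwd

Windows has its own list - run keys, scheduled tasks, services, WMI subscriptions - and a tool like Sysinternals Autoruns covers it far better than manual checks.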

Recovery: Return to Normal

Bring systems back online carefully:

  1. Rebuild compromised systems from clean images (don’t just “clean” a compromised server - rebuild it)
  2. Restore data from verified clean backups
  3. Verify integrity of restored systems before reconnecting to the network (see the sketch after this list)
  4. Monitor closely for signs of re-compromise (increased logging, extra alerting)
  5. Staged reconnection - bring systems back one at a time, watching for anomalies
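
One quick integrity check on a rebuilt or restored host is to verify installed files against the package manager's own checksums (this only covers packaged files, not web roots or configs):

# Debian/Ubuntu: report any packaged file that no longer matches its checksum
debsums -s
# RHEL/Fedora: the same idea via rpm's verify mode
rpm -Va

A clean result here doesn't prove the box is clean, which is why the extra monitoring in step 4 matters.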

How do you know the attacker isn’t still in? You don’t, with certainty. That’s why monitoring during recovery is critical.

Phase 4: Post-Incident Activity

The incident is over. Now make sure it doesn’t happen the same way again.

Post-incident review (blameless):

What happened?           → Timeline of events from initial compromise to recovery
How was it detected?     → Alert, user report, external notification?
What went well?          → Fast containment, good communication, effective tools
What didn't go well?     → Slow detection, missing logs, unclear escalation
What's the root cause?   → Unpatched server, phished employee, misconfiguration
What do we change?       → Specific, actionable improvements

Key principle: blameless. If an employee clicked a phishing link, the question isn’t “why did they click?” It’s “why did the phishing email get through our filters?” and “why did clicking a link give the attacker code execution?”

Blaming individuals stops people from reporting incidents. That makes everything worse.

Documentation:

  • Incident timeline
  • Systems affected
  • Evidence collected
  • Actions taken
  • Root cause analysis
  • Recommendations

This documentation serves legal requirements, insurance claims, regulatory compliance, and most importantly - organizational learning.

Indicators of Compromise (IOCs):

Document everything the attacker used so you can detect it in the future:

  • IP addresses - C2 server IPs
  • Domains - phishing domains, malware download sites
  • File hashes - malware hashes (MD5, SHA256)
  • Email addresses - phishing sender addresses
  • File paths - where malware was dropped
  • Registry keys - persistence mechanisms
  • User agents - custom C2 user agents
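
A lightweight way to keep IOCs reusable is a plain text list you can grep logs against later; the values below are documentation placeholders, not real indicators:

# Keep IOCs in a simple CSV: type,value,incident
cat > iocs.csv <<'EOF'
ip,203.0.113.45,IR-2024-07
domain,update-check.example.net,IR-2024-07
EOF
# Sweep proxy logs for any known-bad domain on the list (log path is illustrative)
grep '^domain,' iocs.csv | cut -d, -f2 | grep -F -f - /var/log/squid/access.log

Once the list outgrows a flat file, a threat intelligence platform such as MISP or OpenCTI does this properly.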

Share IOCs with your industry’s information sharing community (ISACs) so other organizations can detect the same threat.

Common Incident Types

  • Phishing/credential theft - reset credentials, check for mail forwarding rules, review login history, scan for persistence
  • Ransomware - isolate immediately, assess backup integrity, determine the variant, do not pay without a legal/executive decision
  • Web application compromise - take the application offline, review web logs, check for web shells, patch the vulnerability
  • Insider threat - involve HR and legal early, preserve evidence carefully, monitor discreetly
  • Data breach - determine what was accessed, work through legal/regulatory notification requirements, preserve evidence for potential litigation

Incident Response Frameworks

  • NIST SP 800-61 - general IR guidance (what we covered here)
  • SANS Incident Handling - six-phase model popular in training
  • MITRE ATT&CK - map attacker techniques to detection and response
  • CISA Incident Response Playbooks - government-focused playbooks

Try It

You don’t need a real incident to practice. Tabletop exercises walk through scenarios:

Scenario: An employee reports their computer is running slowly. Your EDR tool shows PowerShell connecting to an external IP every 60 seconds. The employee received a suspicious email with an attachment yesterday.

Ask yourself:

  1. What phase are you in? (Detection & Analysis)
  2. What’s your first action? (Don’t turn off the computer - capture volatile evidence)
  3. What logs do you check? (EDR, email logs, DNS logs, proxy logs)
  4. How do you contain it? (Isolate from network, check if other machines are affected)
  5. Who do you notify? (IR lead, management if data exposure is possible)
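
For question 3, one of the first answers you want from the proxy or firewall logs is how long the beaconing has been going on (the IP and log path are illustrative):

grep -c '203.0.113.45' /var/log/firewall.log           # total beacon hits
grep '203.0.113.45' /var/log/firewall.log | head -n 1  # earliest entry = rough start time

If the first hit lines up with yesterday's suspicious email, you have a working theory of initial access.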

Running these scenarios regularly builds the muscle memory that makes real incidents manageable.

What’s Next

You know how to detect incidents (logs and monitoring) and respond to them (IR process). The next post covers the proactive side - hardening systems so incidents are less likely to happen in the first place. Prevention is always cheaper than response.

Incidents are inevitable. Chaos isn’t. The difference between a breach that costs millions and one that’s contained in hours is whether you have a process and practiced it before you needed it.
