
Modern Incident Response: Tackling Malicious ML Artifacts

We are now living in an era dominated by automated systems, with artificial intelligence (AI) models emerging as one of the most prominent technologies. Organizations increasingly rely on AI to expand operations, enhance products, and deliver higher-quality services to their clients. While the impact of these systems is undeniable—and the demand for more robust models across various domains continues to grow—their associated risks are also escalating, exposing us to unprecedented and previously unimaginable threats.


Due to the specialized knowledge and resources required to develop and train machine learning (ML) models, it is common practice to share models with the broader community to foster collective progress. Popular platforms like Hugging Face have made it easier than ever to distribute and access pre-trained models, accelerating innovation but also widening the attack surface, as mentioned by ReversingLabs in this blog. While this openness is essential for advancing the state of the art, it also introduces new security concerns.


ML models are commonly shared using files such as .pkl, .pt, .onnx, and .pb, which serve as standard serialization formats. These formats are essential for distributing and deploying models across different environments. However, their flexibility can also be exploited—making them potential covert vectors for delivering malware. When a malicious model becomes the origin of a security breach, incident responders face distinct challenges in detection, forensic analysis, and attribution.


Below, we explore current detection and forensic techniques, real-world examples, and advanced response procedures for building timelines and attribution in model-based attacks. We also discuss how an IR firm like Security Joes can stand out by developing ML-native IR capabilities and sharing insights.


Want to see us in action? Book a demo


Anatomy of a Model-Based Breach

Model-based breaches represent a rapidly emerging and increasingly sophisticated category of cyber threats. These attacks exploit the implicit trust, complexity, and interconnected nature of machine learning ecosystems. Understanding how such breaches unfold is vital for developing precise detection techniques and response strategies.

Example of the potential attack flow of an ML-based breach.

Typically, these incidents follow a multi-stage lifecycle, each phase posing unique challenges for defenders:


1. Initial Access via Malicious Model Distribution: Attackers embed malicious payloads or exploit code into serialized ML model files (e.g., .pkl, .pt). These tainted models are then distributed through public repositories, included in seemingly legitimate software packages, or shared via forums. While platforms like Hugging Face play a critical role in democratizing AI, they may also serve as unwitting distribution vectors if proper validation and vetting processes are lacking.


2. Execution and Payload Activation: Upon deserialization or inference—often within an ML or data science workflow—the malicious code embedded in the model executes. This can result in arbitrary code execution, data theft, or broader system compromise. These payloads often bypass traditional security controls, such as antivirus software, which typically do not inspect the internals of model files.


3. Lateral Movement and Persistence: Once inside the environment, attackers may move laterally to access other systems, establish persistence mechanisms, or tamper with ML pipelines. In many cases, these intrusions occur in development environments, granting adversaries privileged access to sensitive datasets, proprietary models, or deployment infrastructure.


4. Data Exfiltration or Model Manipulation: The attacker’s objectives may include stealing training data, compromising intellectual property, or introducing subtle backdoors into models. These manipulations can degrade model performance or alter outcomes in ways that are difficult to detect, posing serious long-term risks to system integrity and reliability.


Detecting Model-Based Breaches

Model-based breaches are a distinct and emerging class of threats that remain relatively underexplored within the industry. As the adoption of machine learning models continues to accelerate, understanding and preparing for these risks is becoming increasingly critical. While initiatives like MITRE ATLAS have been established to track these evolving threats and assist defenders in understanding the techniques used by adversaries, continued research, practical experience, and greater maturity are essential to effectively address and mitigate these challenges.


That being said, here we share practical methods for detecting and responding to such breaches, with a focus on tools and strategies for monitoring model behavior, logging interactions, and applying anomaly detection techniques to uncover potentially malicious activity.


The Sour Side of Pickles

Pickle is a native serialization format in Python that enables the conversion of complex objects into byte streams for storage or transmission, and their reconstruction back into Python objects. It supports a range of protocol versions, from the original ASCII-based Protocol 0 to the more efficient binary Protocol 5 introduced in Python 3.8. Newer versions maintain backward compatibility for reading older formats, but older Python environments cannot read Pickle files created with newer protocols.


Pickle uses a stack-based virtual machine model to encode and decode data. The format is composed of a sequence of opcodes, each representing a specific instruction (e.g., pushing a value to the stack, creating a list, invoking a function). These opcodes are interpreted by the Pickle runtime to reconstruct the original object graph.


Key components include:


  • Opcodes: Instruction tokens that serialize or reconstruct data (e.g., MARK, PUT, GET, BUILD, STOP).

  • Memo Table: A reference map used to track object identities and manage shared references or cyclic structures.

  • Binary Data: Protocols ≥1 use binary representations for compactness and performance (e.g., BININT, BINUNICODE, BINPUT).

  • Custom Object Support: User-defined classes can control their serialization through methods like __reduce__ and __getstate__.


A Pickle stream typically ends with a STOP opcode, indicating that no further opcodes follow. The stream can be embedded in files, databases, or transmitted over networks. File extensions are not enforced, but .pkl or .pickle are commonly used.
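To make these opcodes concrete, here is a minimal sketch that serializes a trivial, benign object and disassembles it; the dictionary used is an arbitrary example.

import pickle
import pickletools

# Serialize a harmless object and disassemble it to see the opcode stream.
data = pickle.dumps({"weights": [1, 2, 3]}, protocol=3)
pickletools.dis(data)
# The printed stream begins with PROTO 3 and ends with STOP ('.');
# in between you will see opcodes such as EMPTY_DICT, BINUNICODE,
# EMPTY_LIST, and BINPUT entries that populate the memo table.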


From a security and incident response perspective, Pickle is inherently unsafe for use with untrusted data. This is because:

  • Executable Payloads: Pickle can serialize and deserialize Python functions, classes, and code that is executed during unpickling (e.g., via the __reduce__ method).

  • Trojan Delivery Mechanism: Malicious actors can craft Pickle payloads that execute arbitrary code upon deserialization, making Pickle a viable container for backdoors or remote code execution (RCE).

  • Lack of Isolation: Deserialization occurs in the Python interpreter without sandboxing or validation.


Due to their ability to encapsulate executable code, Pickle files should be considered high-risk artifacts during forensic investigations. Security analysts should treat any unsolicited or anomalous Pickle data as potentially malicious—especially within environments that frequently serialize and exchange objects, such as machine learning development workflows or model deployment pipelines.
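The risk is easy to demonstrate. The following minimal sketch uses a harmless echo command in place of a real payload to show how __reduce__ turns deserialization into code execution:

import os
import pickle

class EvilObject:
    # pickle calls __reduce__ during serialization; whatever callable it
    # returns is invoked with the given arguments at load time.
    def __reduce__(self):
        return (os.system, ("echo payload executed during unpickling",))

blob = pickle.dumps(EvilObject())
pickle.loads(blob)   # runs the echo command; a real payload would run here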


Static Scanning & Disassembly

One of the first steps is to analyze the suspect model file itself. Traditional endpoint detection may not flag a model file, so security teams use specialized scanners and disassemblers. Tools like PickleScan (used by Hugging Face), HiddenLayer’s Model Scanner, or Protect AI’s Guardian attempt to detect dangerous code patterns in model files. These work by scanning for known “dangerous” functions or opcodes in the serialization stream. For example, HiddenLayer released YARA rules to catch pickle files that invoke dangerous calls such as exec, eval, os.system, or subprocess creation.
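A simplified version of this kind of static check can be built on pickletools.genops, which walks the opcode stream without executing it. The sketch below only illustrates the idea behind such scanners; the list of flagged imports and the file name are illustrative, not exhaustive.

import pickletools

# Module/function pairs whose appearance in a GLOBAL opcode warrants review.
SUSPICIOUS = {
    ("builtins", "exec"), ("builtins", "eval"),
    ("os", "system"), ("subprocess", "Popen"),
}

def scan_pickle(path):
    findings = []
    with open(path, "rb") as f:
        data = f.read()
    # genops yields (opcode, argument, byte position) without running anything.
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST") and isinstance(arg, str):
            module, _, name = arg.partition(" ")
            if (module, name) in SUSPICIOUS:
                findings.append((pos, module, name))
    return findings

print(scan_pickle("suspect_model.pkl"))   # hypothetical file name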


Researchers have built pickle disassemblers to inspect opcodes and reveal hidden instructions. In one case, analysts disassembled a PyTorch .pt (pickle) model and found it contained a GLOBAL opcode referencing the built-in exec function, followed by Python script code, specifically ctypes calls used to load shellcode into memory. This static dissection immediately revealed the malicious intent (injecting a Cobalt Strike beacon via Windows APIs). Similar static analysis can be done for other formats (e.g., examining ONNX or TensorFlow model files for anomalies), though pickle-based models are currently the most at risk.

Basic analysis pipeline to examine a malicious ML model.

Analysis: Cobalt Strike Stager Hidden Inside a Pickle

Python’s pickle module is powerful because it can serialize and deserialize almost any Python object, including functions and classes. When you load a pickle file, Python doesn't just read static data; it reconstructs objects by executing Python bytecode-like instructions. This means the loading process can call functions, import modules, and even run code embedded in the pickle file — all automatically, without your explicit permission. Because of this, a malicious actor can craft a pickle file that runs harmful code when loaded.


Let's analyze the following malicious pickle file:

SHA256: 391f5d0cefba81be3e59e7b029649dfb32ea50f72c4d51663117fdd4d5d1e176

Pickle files are stored in a binary format, not plain text. That means they contain raw bytes, control characters, and encoded structures representing Python objects. For example, you'll see garbled or unreadable characters if you open one in a plain-text editor such as Notepad.

To start analyzing a pickle, you can use Python’s pickletools module, which disassembles pickle files, converting the binary format into a human-readable set of instructions. After downloading the file with this SHA256, we can disassemble it with the following Python code and see what's inside.

import pickletools

# Read the raw bytes and disassemble them into human-readable opcodes.
# pickletools.dis() never executes the payload, so this is safe to run.
with open("malicious_pickle.pkl", "rb") as f:
    data = f.read()

pickletools.dis(data)

This is the output, a disassembled version of this pickle:
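In simplified form (byte offsets omitted and the payload string heavily truncated), the opcode stream reads roughly as follows:

PROTO      3
GLOBAL     'builtins exec'
BINPUT     0
BINUNICODE 'import ctypes, urllib.request, base64, codecs ...'   (long obfuscated payload, truncated)
TUPLE1
REDUCE
STOP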

The stream starts with the PROTO opcode, setting the pickle protocol version to 3. Then, GLOBAL 'builtins exec' pushes the exec function (from the built-in namespace) onto the stack; exec() allows execution of arbitrary Python code. The following opcodes store the exec reference in memo slot 0 for reuse and then load a huge string of Python code (obfuscated, but decodable). A REDUCE opcode finally invokes exec() with that string as its argument, making it the payload that gets executed.


This string contains:

  • Imports of ctypes, urllib.request, base64, and codecs.

  • A base64-encoded shellcode blob assigned to a variable named AbCCDeBsaaSSfKK2.

  • The payload then:

    1. Decodes the shellcode

    2. Allocates memory

    3. Copies the shellcode into memory

    4. Creates a new thread to execute it via CreateThread()

    5. Waits for execution to finish

In short, it’s a memory-resident malware loader written in Python.


The Base64 is obfuscated: it is double-encoded, which hides the true intent of the data and helps it evade simple scanners looking for obvious patterns (like shellcode or scripts).

After decoding the Base64 twice, we are left with pure Windows x64 shellcode: binary machine instructions meant to be executed directly in memory.
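Recovering the raw shellcode is straightforward once the double encoding is recognized. A minimal sketch, assuming the encoded string has already been extracted from the pickle into a file named encoded_blob.txt:

import base64

# Hypothetical dump of the AbCCDeBsaaSSfKK2 string extracted from the payload.
with open("encoded_blob.txt", "rb") as f:
    blob = f.read()

# Two rounds of base64 decoding yield the raw x64 shellcode bytes.
shellcode = base64.b64decode(base64.b64decode(blob))

with open("shellcode.bin", "wb") as out:
    out.write(shellcode)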

This 64-bit Windows shellcode demonstrates classic fileless malware behavior.

The shellcode uses a custom hashing algorithm to dynamically resolve API functions (like LoadLibraryA, GetProcAddress, VirtualAlloc, etc.) by parsing the Export Address Table of DLLs in memory, avoiding reliance on hardcoded names to evade detection.


Once the required APIs are resolved, it proceeds to:

  • Load wininet.dll

  • Call WinInet functions (InternetOpenA, InternetConnectA, etc.) to establish a remote connection

  • Download and execute a secondary payload directly in memory using VirtualAlloc and likely CreateThread


If we look at the final bytes of the shellcode using HxD, we can see the IPv4 address in hexadecimal:

Hexadecimal representation of the IPv4

31 32 31 2E 31 39 39 2E 36 38 2E 32 31 30

Convert from Hex to Text
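The same conversion takes a couple of lines of Python:

# ASCII decoding of the trailing bytes reveals the hardcoded C2 address.
hex_bytes = "31 32 31 2E 31 39 39 2E 36 38 2E 32 31 30"
print(bytes.fromhex(hex_bytes.replace(" ", "")).decode("ascii"))   # 121.199.68.210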


And this is the shellcode hardcoded C2 IP address:

121.199.68.210

Pivoting on this IP address in VirusTotal provides additional reputation context and can surface related samples and infrastructure.

Overall, this is a textbook example of in-memory execution and dynamic API resolution, often seen in malware droppers and advanced initial access loaders.


Memory Forensics

When a malicious ML model is executed, memory forensics becomes essential, as such attacks often operate entirely in-memory without leaving disk artifacts. Incident Response teams can analyze memory snapshots or live environments to detect signs of compromise, such as injected bytecode, shellcode, suspicious hooks, or C2-related strings. Tools like Volatility help identify indicators of compromise (IoCs), including Python pickle opcodes or abnormal API calls (e.g., VirtualAlloc, CreateThread). This approach is critical for detecting fileless malware embedded in ML models that execute solely in RAM.
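Alongside full frameworks like Volatility, even a quick string sweep over a raw memory image can surface leads. A minimal sketch, with an illustrative dump file name and pattern list:

import mmap
import re

# Strings that often accompany in-memory pickle payloads and shellcode loaders.
PATTERNS = [rb"builtins\s+exec", rb"ctypes", rb"VirtualAlloc", rb"CreateThread"]

with open("memory.dmp", "rb") as f:                      # hypothetical memory image
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as dump:
        for pat in PATTERNS:
            for m in re.finditer(pat, dump):
                print(hex(m.start()), pat.decode())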


Execution Monitoring & Sandboxing

Sandboxing is another key technique for analyzing potentially malicious ML models. By loading the model in an isolated, network-restricted environment, responders can safely observe its behavior. This approach helps detect if the model spawns subprocesses, initiates network connections, or modifies the file system—revealing actions like launching cryptominers or reverse shells. Treating the model like suspect code, sandboxing enables dynamic analysis, aids in collecting indicators of compromise (IoCs), and is especially effective when paired with model conversion to safer formats like safetensors.
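One lightweight way to instrument such a run is Python’s audit hook mechanism (PEP 578), which logs sensitive operations triggered while the model loads. A minimal sketch; the hook only observes, so the load must still happen inside a disposable, network-restricted environment:

import pickle
import sys

# Audit events that a benign model load should never trigger.
WATCHED = {"exec", "os.system", "subprocess.Popen", "socket.connect"}

def audit(event, args):
    if event in WATCHED:
        print(f"[!] suspicious audit event: {event} {args!r}")

sys.addaudithook(audit)

with open("suspect_model.pkl", "rb") as f:   # hypothetical sample
    pickle.load(f)                           # only ever run inside the sandbox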


System and Audit Log Correlation

Traditional forensic analysis remains valuable in model-based incidents. Incident responders should correlate system and application logs around the time the model was loaded—such as Jupyter notebook events, OS logs, or EDR data. For example, if a model is loaded at 14:05 and suspicious activity like outbound connections or process spawning begins at 14:06, this temporal link is a strong indicator. Malicious models often invoke OS resources, leaving behind traces. By pivoting on process IDs or user accounts, responders can trace post-compromise behavior back to the model, helping to confirm it as the attack vector.
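The temporal pivot can be partially automated. A rough sketch, assuming model-load and process-creation events have already been exported to CSV files with ISO-format timestamps (file and column names are hypothetical):

import csv
from datetime import datetime, timedelta

def load_events(path):
    with open(path, newline="") as f:
        return [(datetime.fromisoformat(r["timestamp"]), r) for r in csv.DictReader(f)]

model_loads = load_events("model_load_events.csv")        # e.g. Jupyter/EDR telemetry
proc_events = load_events("process_creation_events.csv")

# Flag any process created within two minutes of a model being loaded.
WINDOW = timedelta(minutes=2)
for loaded_at, load in model_loads:
    for created_at, proc in proc_events:
        if timedelta(0) <= created_at - loaded_at <= WINDOW:
            print(f"{load['model_path']} -> {proc['image']} at {created_at}")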


Pickle File Forensics

When Python pickle files are suspected, responders turn to pickle-specific forensic techniques. Tools like Fickling enable safe inspection of pickle contents without execution, helping analysts identify malicious payloads. Key opcodes like GLOBAL and REDUCE are examined for calls to dangerous functions (e.g., os.system, subprocess.Popen). Unusual file characteristics—like small size and high entropy—may signal obfuscated code. Teams also extract embedded strings or encoded data (e.g., base64) for further analysis. Even more static formats like ONNX can be scrutinized for suspicious operator patterns or metadata, and runtime instrumentation may reveal exploits targeting vulnerabilities in model-loading libraries.
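The “small but high-entropy” heuristic is easy to apply in practice. A minimal sketch that computes Shannon entropy over a suspect file:

import math
from collections import Counter

def shannon_entropy(path):
    data = open(path, "rb").read()
    counts = Counter(data)
    total = len(data)
    # Entropy in bits per byte: values near 8 suggest encrypted or encoded
    # content, which is unusual for a small pickle that should mostly hold tensors.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("suspect_model.pkl"))   # hypothetical sample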


Memory and Host Artifacts

Once a malicious model executes, it may leave behind artifacts commonly associated with malware. Incident responders search for Indicators of Compromise (IoCs) such as new files, registry modifications, reverse shell activity, or unusual process behaviors. For example, ransomware triggered by a model may result in encrypted files and ransom notes—clues that help trace the incident back to the model execution. Even in fileless cases, EDR tools might log suspicious API calls (e.g., code injection or credential access) tied to the Python process.

To detect such threats, responders must combine multiple layers of analysis: static inspection of the model’s contents, dynamic behavioral monitoring (via memory forensics and sandboxing), and host-level telemetry review. This holistic approach is essential, as model-based attacks are designed to evade static detection by executing payloads purely in-memory upon deserialization. Real-time monitoring and forensic tracing are critical to catching these stealthy threats in action.


Case Studies: Malicious Model Incidents


Although incidents involving poisoned ML models are still emerging, there are already several real-world examples and public reports demonstrating this threat:


The arXiv Paper: “Models Are Codes”

A recent study ("Models Are Codes") uncovered 91 malicious AI models on Hugging Face using a scanning framework called MalHug. These included reverse shells and reconnaissance payloads — reinforcing the urgency of securing model ingestion pipelines.


ReversingLabs “NullifAI” Findings 

Researchers at ReversingLabs uncovered two malicious PyTorch models on the Hugging Face model hub containing embedded Python reverse shell code within their pickle data. These models, undetected by Hugging Face’s automated scanners, used a novel evasion technique dubbed “NullifAI” to silently open backdoors upon loading. The malware connected to a hardcoded IP address and was wrapped in an atypical 7z-compressed format to bypass detection rules expecting standard .pt files.


ReversingLabs identified the threat through a combination of behavioral correlation (network connections initiated by the model) and static analysis of pickle opcodes. This case serves as a clear warning that threat actors are planting backdoored ML models in public repositories and emphasizes the need for incident response teams to incorporate model forensics into their investigations.


HiddenLayer Malicious Pickle “in the Wild”

Security firm HiddenLayer identified at least three malicious ML model files uploaded to public repositories that were used to deliver post-exploitation frameworks like Cobalt Strike, Metasploit, and Mythic. In one case, a January 2022 pickle file embedded a Python script that injected a 64-bit Cobalt Strike beacon into memory using ctypes—effectively functioning as a malware dropper.


These models, discovered on platforms like GitHub or community model hubs, demonstrate active exploitation of pickle serialization in the wild. While specific victim incidents weren’t disclosed, the models posed real threats if loaded by unsuspecting users. HiddenLayer’s analysis, including IoCs such as file hashes and C2 IPs, offers a blueprint for incident response. Notably, one IP address was linked to known threat actors like TrickBot and APT29, reinforcing the importance of forensic inspection and threat attribution when dealing with suspect ML artifacts.


Trail of Bits “Sleepy Pickle”

Trail of Bits introduced Sleepy Pickle, a proof-of-concept technique for embedding stealthy malware in machine learning models. Unlike overt attacks, this method relies on trojanized models that activate malicious behavior under specific conditions—such as producing harmful outputs or leaking input data—without immediately revealing themselves. The payload can remain dormant, making detection by traditional incident response methods difficult.


The researchers also unveiled Sticky Pickle, a more advanced variant capable of self-propagation: any new model derived from the infected one (e.g., via fine-tuning) inherits the malicious payload. This poses a significant challenge for DFIR teams, as one poisoned model could compromise an entire ML pipeline, complicating containment and scope analysis.

While these are currently proofs-of-concept, they highlight a new frontier in model-based threats, where the malware is embedded not in code but in the learned parameters of the model itself.


Community Incident – Crypto Miner via ML Pipeline

In late 2024, users of the ComfyUI Stable Diffusion tool uncovered a cryptojacking script embedded in an extension pack, specifically within the ComfyUI-Impact-Pack that used the Ultralytics library. The attack was traced to a compromised Python script (downloads.py) that silently downloaded and executed a mining binary, causing high CPU/GPU usage and network traffic to a crypto mining pool. This was a supply chain attack targeting the ML tooling—not a model file itself, but a script within the environment.


The incident, publicly reported by a vigilant user, highlights that threats in ML workflows extend beyond model files. Any component—scripts, libraries, or environment files—can be subverted. For incident response teams, this underscores the importance of examining the entire ML ecosystem during investigations, not just the model artifacts.


While public postmortems of ML-based breaches remain rare, the ComfyUI case, alongside prior examples, illustrates key detection methods: treating ML assets as suspect code, disassembling model files or scripts, correlating suspicious system activity with model usage, and leveraging threat intel. These early cases offer valuable lessons for defenders as the threat landscape around AI systems continues to evolve.


Want to see us in action? Book a demo

Security Joes is exploring automated incident classification pipelines that include AI model artifacts as first-class citizens in digital forensics—because modern breaches don’t always start with EXEs.


