In this answer we're going to explore what exactly the pickle protocol allows an attacker to do. This means we're only going to rely on documented features of the protocol, not implementation details (with a few exceptions). In other words, we'll assume that the source code of the pickle module is correct and bug-free and allows us to do exactly what the documentation says and nothing more.
What does the pickle protocol allow an attacker to do?
Pickle allows classes to customize how their instances are pickled. During the unpickling process, we can:
- Call (almost) any class's __setstate__ method (as long as we manage to unpickle an instance of that class).
- Invoke arbitrary callables with arbitrary arguments, thanks to the __reduce__ method (as long as we can gain access to the callable somehow).
- Invoke (almost) any unpickled object's append, extend and __setitem__ methods, once again thanks to __reduce__.
- Access any attribute that Unpickler.find_class allows us to.
- Freely create instances of the following types: str, bytes, list, tuple, dict, int, float, bool. This is not documented, but these types are built into the protocol itself and don't go through Unpickler.find_class.
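To make the __reduce__ point concrete, here is a minimal sketch. Payload is a hypothetical class invented for this illustration, and sorted is deliberately harmless; it stands in for whatever callable an attacker could actually reach.

```python
import pickle


class Payload:
    """Hypothetical class whose pickled form is nothing but a call instruction."""

    def __reduce__(self):
        # Returns (callable, args); the unpickler will call sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())

# No Payload instance comes back: loads() returns whatever the callable returned.
result = pickle.loads(data)
print(result)  # [1, 2, 3]
```

Note that the pickle doesn't reference Payload at all. It only stores the callable and its arguments, so the side receiving the data doesn't need to have the class defined for the call to happen.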
The most useful (from an attacker's perspective) feature here is the ability to invoke callables. If they can access exec or eval, they can make us execute arbitrary code. If they can access os.system or subprocess.Popen, they can run arbitrary shell commands. Of course, we can deny them access to these with Unpickler.find_class. But how exactly should we implement our find_class method? Which functions and classes are safe, and which are dangerous?
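Before answering that, here's roughly what hooking find_class looks like. This is only a sketch: DenyAllUnpickler and restricted_loads are names made up for illustration, and refusing every lookup is the most extreme policy, not a recommendation. It also shows that the types built into the protocol still load, since they never go through find_class.

```python
import io
import os
import pickle


class DenyAllUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that refuses every global lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Unpickle a bytes object with the deny-all policy."""
    return DenyAllUnpickler(io.BytesIO(data)).load()


# Plain containers and scalars are encoded by dedicated opcodes, so they load fine.
plain = {"a": [1, 2.5, True], "b": ("x", b"y")}
loaded = restricted_loads(pickle.dumps(plain))

# A reference to os.system has to go through find_class and is rejected.
# (Pickling the function only stores its name; nothing is executed here.)
try:
    restricted_loads(pickle.dumps(os.system))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(loaded == plain, blocked)  # True True
```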
An attacker's toolbox
Here I'll try to explain some methods an attacker can use to do evil things. Giving an attacker access to any of these functions/classes means you're in danger.
- Arbitrary code execution during unpickling:
  - exec and eval (duh)
  - os.system, os.popen, subprocess.Popen and all other subprocess functions
  - types.FunctionType, which allows creating a function from a code object (which can be created with compile or types.CodeType)
  - typing.get_type_hints. Yes, you read that right. How, you ask? Well, typing.get_type_hints evaluates forward references. So all you need is an object with __annotations__ like {'x': 'os.system("rm -rf /")'} and get_type_hints will run the code for you.
  - functools.singledispatch. I see you shaking your head in disbelief, but it's true. Single-dispatch functions have a register method, which internally calls typing.get_type_hints.
  - ... and probably a few more
- Accessing things without going through Unpickler.find_class:
  Just because our find_class method prevents an attacker from accessing something directly doesn't mean there's no indirect way of accessing that thing.
  See Ned Batchelder's "Eval is really dangerous" to find out how an attacker can use such indirect access to reach pretty much everything.
- Code execution after unpickling:
  An attacker doesn't necessarily have to do something dangerous during the unpickling process - they can also try to return a dangerous object and let you call a dangerous function by accident. Maybe you call typing.get_type_hints on the unpickled object, or maybe you expect to unpickle a CuteBunny but instead unpickle a FerociousDragon and get your hand bitten off when you try to .pet() it. Always make sure the unpickled object is of the type you expect, its attributes are of the types you expect, and it doesn't have any attributes you don't expect it to have.
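Since the typing.get_type_hints entry tends to surprise people, here is a small sketch of it evaluating a string annotation. The function victim and the calls list are hypothetical stand-ins, and the "payload" only appends to a list. I pass globalns explicitly to keep the snippet self-contained; without it, the function's own module globals would be used instead.

```python
import typing

calls = []  # records the side effect so the evaluation is observable


def victim(x):
    """Hypothetical stand-in for any object that carries __annotations__."""


# A string annotation is treated as a forward reference, i.e. as source code.
# The trailing "or int" makes the expression evaluate to a valid type.
victim.__annotations__ = {"x": 'calls.append("evaluated") or int'}

hints = typing.get_type_hints(victim, globalns={"calls": calls})

print(calls)  # ['evaluated']
```

Nothing here calls the string's code directly: simply asking for the type hints was enough to run it.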
At this point, it should be obvious that there aren't many modules/classes/functions you can trust. When you implement your find_class method, never ever write a blacklist - always write a whitelist, and only include things you're sure can't be abused.
So what's the answer to the question?
If you really only allow access to bool, str, bytes, bytearray, int, float, complex, tuple, list, dict, set and frozenset, then you're most likely safe. But let's be honest - you should probably use JSON instead.
In general, I think most classes are safe - with exceptions like subprocess.Popen, of course. The worst thing an attacker can do is call the class - which generally shouldn't do anything more dangerous than return an instance of that class.
What you really need to be careful about is allowing access to functions (and other non-class callables), and how you handle the unpickled object.
Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss will gaze back…