Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Hook into pickle to obtain pre-binary representation?

If I understand correctly, pickle converts the state of an object into something like a dict that includes the object's class, and then writes that data to a binary file. Obtaining the state of an object is done via a complex interface: in the simplest case it accesses the object's __dict__, but it may involve user-defined methods such as __getstate__, __setstate__, etc. When a pickle file is loaded, the binary data is read into a dict-like representation, and this is converted back into objects.
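For reference, the state extraction described here is exposed through the `__reduce_ex__` protocol that pickle itself calls. A minimal sketch, assuming a hypothetical `Point` class with ordinary instance attributes: at protocol 2 or higher, the default `object.__reduce_ex__` returns a tuple whose first three items are a reconstruction callable (`copyreg.__newobj__`), its arguments, and the state, which for such a class is the instance's attribute dict.

```python
import copyreg

class Point:
    """Hypothetical example class with plain instance attributes."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# Ask the object for the same reduction pickle would use at protocol 2.
rv = p.__reduce_ex__(2)
func, args, state = rv[0], rv[1], rv[2]
print(func is copyreg.__newobj__)  # True
print(args)   # a 1-tuple holding the Point class
print(state)  # {'x': 1, 'y': 2}
```

This tuple is effectively the "pre-binary" representation: pickle only turns it into opcodes afterwards.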

My question: Is it possible to hook into pickle at the point after obtaining the object state but before writing the binary data, and the same in the other direction (after reading binary data but before restoring objects)?
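On the dumping side, Python 3.8+ documents a per-object hook: a `pickle.Pickler` subclass can define `reducer_override`, which receives objects before they are written and may return a reduction tuple, or `NotImplemented` to fall back to the default handling. The sketch below is only an illustration (the `RecordingPickler` and `Point` names are hypothetical): it captures the reduction of each `Point` and hands it back to pickle unchanged. Per the docs, the hook may be skipped for exact built-in types such as `int`, `list` and `dict`, so it is best suited to user-defined classes.

```python
import io
import pickle

class Point:
    """Hypothetical example class."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class RecordingPickler(pickle.Pickler):
    """Pickler that captures each Point's reduction before it is serialized."""
    def __init__(self, file, protocol=4):
        super().__init__(file, protocol=protocol)
        self._protocol = protocol
        self.reductions = []

    def reducer_override(self, obj):
        if isinstance(obj, Point):
            rv = obj.__reduce_ex__(self._protocol)
            self.reductions.append(rv)  # state obtained, nothing written yet
            return rv                   # let pickle encode it as usual
        return NotImplemented           # everything else: default handling

buf = io.BytesIO()
pickler = RecordingPickler(buf)
pickler.dump([Point(1, 2), Point(3, 4)])
print(pickler.reductions[0][2])  # {'x': 1, 'y': 2}

restored = pickle.loads(buf.getvalue())
print(restored[1].y)  # 4
```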

Background: I'm thinking of implementing something similar to jsonpickle and hickle, i.e. having the same dump and load interface but using another file format to store data (here: JSON and HDF5). If possible, I would like to avoid reproducing the lengths pickle goes to in accessing and restoring object states, reuse that part instead, and only create a new "backend".
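A toy version of such a backend can reuse pickle's state extraction and replace only the storage format. This is a hypothetical sketch, not jsonpickle's or hickle's implementation: `to_jsonable`, `from_jsonable` and the `"py/object"` key are invented for illustration. It handles JSON-native values plus plain class instances whose `__reduce_ex__` result has the simple `copyreg.__newobj__` form, and raises `TypeError` for anything else (tuples, sets, objects with constructor arguments, and so on).

```python
import copyreg
import importlib
import json

class Point:
    """Hypothetical example class."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def to_jsonable(obj, protocol=2):
    """Encode obj as JSON-compatible data, reusing pickle's reduction protocol."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, list):
        return [to_jsonable(v, protocol) for v in obj]
    if isinstance(obj, dict) and all(isinstance(k, str) for k in obj):
        return {k: to_jsonable(v, protocol) for k, v in obj.items()}
    rv = obj.__reduce_ex__(protocol)
    # Only "cls.__new__(cls), then restore state" is handled; reductions with
    # constructor args or list/dict items are rejected rather than dropped.
    simple = (isinstance(rv, tuple) and rv[0] is copyreg.__newobj__
              and len(rv[1]) == 1 and all(x is None for x in rv[3:]))
    if not simple:
        raise TypeError(f"unsupported object: {obj!r}")
    cls = rv[1][0]
    state = rv[2] if len(rv) > 2 else None
    return {"py/object": f"{cls.__module__}:{cls.__qualname__}",
            "state": to_jsonable(state, protocol)}

def from_jsonable(data):
    """Inverse of to_jsonable: rebuild objects the way pickle's BUILD step does."""
    if isinstance(data, list):
        return [from_jsonable(v) for v in data]
    if isinstance(data, dict):
        if "py/object" not in data:
            return {k: from_jsonable(v) for k, v in data.items()}
        module_name, qualname = data["py/object"].split(":")
        cls = importlib.import_module(module_name)
        for part in qualname.split("."):
            cls = getattr(cls, part)
        obj = cls.__new__(cls)
        state = from_jsonable(data["state"])
        if state is not None:
            setstate = getattr(obj, "__setstate__", None)
            if setstate is not None:
                setstate(state)
            else:
                obj.__dict__.update(state)
        return obj
    return data

text = json.dumps(to_jsonable({"points": [Point(1, 2), Point(3, 4)]}))
restored = from_jsonable(json.loads(text))
print(restored["points"][1].x)  # 3
```

A real backend would also need memoization for shared or cyclic references, which pickle handles internally and this sketch ignores.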

A solution using dill would be just as good.

Question from: https://stackoverflow.com/questions/65940313/hook-into-pickle-to-obtain-pre-binary-representation

1 Answer

If you dumps an object and look at the module pickle.py (https://github.com/python/cpython/blob/3.9/Lib/pickle.py#L107), you'll see that pickle converts an object to a series of opcodes (and recursively stored data). This is basically what is written to disk when you use dump. I authored the part of hickle that stores arbitrary objects: it first uses dill.dumps to generate a string of opcodes and data, then uses HDF5 to store that string. If you turn on tracing in dill, you can see how the opcodes and data are stored in the string.

>>> x = dict(a=[1,2,3], b=set((4,5,6)))
>>> import dill
>>> dill.detect.trace(True)
>>> dill.dumps(x)
D2: <dict object at 0x11023c870>
T1: <class 'set'>
F2: <function _load_type at 0x11070f2f0>
# F2
# T1
# D2
b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01]q\x02(K\x01K\x02K\x03eX\x01\x00\x00\x00bq\x03cdill._dill\n_load_type\nq\x04X\x03\x00\x00\x00setq\x05\x85q\x06Rq\x07]q\x08(K\x04K\x05K\x06e\x85q\tRq\nu.'

The pickle creates a dict, stores a list of ints (no special function needed), then stores a special function (_load_type) to help reconstitute the set, and finally stores the set's ints. The opcode at the very beginning (\x80\x03) records the pickle protocol version (protocol 3 here).
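The same kind of opcode stream can be inspected with the standard library alone. `pickletools.genops` decodes a pickle into `(opcode, argument, position)` triples without executing it, so no objects are rebuilt; at the opcode level this is a view of the data "after reading binary data but before restoring objects". A minimal sketch with plain `pickle` at protocol 3 (its output differs from dill's: for example, the set is rebuilt from a global reference to `set` plus a REDUCE opcode rather than via `dill._dill._load_type`):

```python
import pickle
import pickletools

x = dict(a=[1, 2, 3], b=set((4, 5, 6)))
data = pickle.dumps(x, protocol=3)

# genops yields (opcode, argument, position) triples without running the
# pickle, i.e. without reconstructing any objects.
for opcode, arg, pos in pickletools.genops(data):
    print(pos, opcode.name, arg)

names = [op.name for op, _, _ in pickletools.genops(data)]
```

`pickletools.dis(data)` prints the same information as an annotated listing.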

So, yes: you can access the state in serialized form (the bytes returned by dumps) before it is written to a file.
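As a concrete instance of this approach (serialize with `dumps`, then hand the bytes to a different container), here is a stdlib-only sketch. hickle stores the string in HDF5; to avoid a dependency, this sketch swaps in JSON with base64-encoded bytes instead, and the `save_json`/`load_json` names and the `"pickle"` key are hypothetical. `dill.dumps`/`dill.loads` could be substituted for the `pickle` calls.

```python
import base64
import json
import os
import pickle
import tempfile

def save_json(obj, path):
    """Serialize obj with pickle, then store the bytes inside a JSON document."""
    payload = pickle.dumps(obj)  # serialized state, before any file I/O
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"pickle": base64.b64encode(payload).decode("ascii")}, f)

def load_json(path):
    """Read the JSON document back and unpickle the embedded bytes."""
    with open(path, encoding="utf-8") as f:
        payload = base64.b64decode(json.load(f)["pickle"])
    return pickle.loads(payload)

path = os.path.join(tempfile.mkdtemp(), "x.json")
x = dict(a=[1, 2, 3], b=set((4, 5, 6)))
save_json(x, path)
y = load_json(path)
```

The trade-off of this design is that the container only wraps an opaque pickle: the object state is not human-readable inside the JSON file, unlike a backend that maps the reduced state onto the target format directly.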

...