Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How do I convert to ONNX a Spark ML model with multiple input columns and use it for scoring dynamic batch size?

I converted a logistic regression model with a dynamic batch size from Spark ML to ONNX using this:

from onnxmltools import convert_sparkml
from onnxmltools.convert.common.data_types import FloatTensorType
initial_types = [('Features', FloatTensorType([None, 5]))]
onnx_model = convert_sparkml(s_clf, 'Occupancy detection Pyspark Logistic Regression model', initial_types, spark_session=sess)

Then I successfully scored df1, a dynamic batch of samples whose shape is (12417, 5), using the code below:

import numpy as np
import onnxruntime as rt
sess = rt.InferenceSession(bmodel)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
df1 = df[features_cols]
predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]

Now I am trying to build a pipeline and convert it to ONNX. I tried converting its first stage, which is just a VectorAssembler, using:

initial_types = [
    ('Temperature', FloatTensorType([None, 1])),
    ('Humidity', FloatTensorType([None, 1])),
    ('Light', FloatTensorType([None, 1])),
    ('CO2', FloatTensorType([None, 1])),
    ('HumidityRatio', FloatTensorType([None, 1])),
]
onnx_model = convert_sparkml(assembler, 'Occupancy detection Pyspark Assembler of features', initial_types, spark_session=sess)

Trying to consume it using this code:

predictions = sess.run(
    [label_name],
    {
        "Temperature": [df1.Temperature.values.astype(np.float32)],
        "Humidity": [df1.Humidity.values.astype(np.float32)],
        "Light": [df1.Light.values.astype(np.float32)],
        "CO2": [df1.CO2.values.astype(np.float32)],
        "HumidityRatio": [df1.HumidityRatio.values.astype(np.float32)],
    },
)[0]

fails, with [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: Light for the following indices index: 1 Got: 12417 Expected: 1.

Just for testing, I selected a single sample by adding df1 = df1[:1]; with that change, the code above works.
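
Both the error message (index 1: got 12417, expected 1) and the single-sample observation point to a shape mismatch. Wrapping a 1-D array in a list produces shape (1, N), while each input was declared as [None, 1], i.e. (N, 1). With one sample both shapes are (1, 1), which would explain why df1[:1] works. Below is a minimal NumPy-only sketch of the difference. The columns dict and the helper names as_row_vector, as_column and build_feed are hypothetical, made up for illustration, and the sample values are placeholders rather than the real data. Whether the converted VectorAssembler then scores correctly with such a feed is not verified here.

```python
import numpy as np

# Hypothetical stand-in for df1: 4 samples instead of 12417, placeholder values.
columns = {
    "Temperature": [23.18, 23.15, 23.10, 23.05],
    "Humidity": [27.27, 27.26, 27.24, 27.20],
    "Light": [426.0, 429.5, 426.0, 419.0],
    "CO2": [721.2, 714.0, 713.5, 708.2],
    "HumidityRatio": [0.0047, 0.0047, 0.0047, 0.0047],
}


def as_row_vector(values):
    """Mirror the failing call: a 1-D array wrapped in a list becomes (1, N)."""
    return np.asarray([np.asarray(values, dtype=np.float32)])


def as_column(values):
    """Reshape a 1-D column to (N, 1), matching FloatTensorType([None, 1])."""
    return np.asarray(values, dtype=np.float32).reshape(-1, 1)


def build_feed(cols):
    """Build a feed dict with one (N, 1) float32 array per input name."""
    return {name: as_column(vals) for name, vals in cols.items()}


feed = build_feed(columns)
for name, arr in feed.items():
    print(name, as_row_vector(columns[name]).shape, "->", arr.shape)
```

With the real DataFrame, the equivalent per-column expression would be df1.Temperature.values.astype(np.float32).reshape(-1, 1), and the resulting dict would be passed as the second argument of sess.run.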

How can I export a model with multiple input columns like the one above, so that I can score it with onnxruntime on a dynamic batch size? Why does logistic regression work flawlessly while this simple VectorAssembler fails?

Thanks for your help, Adi

Question from: https://stackoverflow.com/questions/65886029/how-do-i-convert-to-onnx-a-spark-ml-model-with-multiple-input-columns-and-use-it
