Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Google Cloud Platform - High wall time with synchronous calls to Datastore in Apache Beam transform

I have been investigating a Beam job (running on GCP Dataflow) that has been running very slowly. Using the GCP profiler, I identified that most of the wall time is spent waiting on calls to Datastore (within a Beam transform) made with the standard Java Datastore library (https://cloud.google.com/datastore/docs/reference/libraries). I wrote a test job that just makes simple Datastore reads within a Beam transform and saw the same results. Here is a screenshot of the profiler:

[Screenshot: GCP profiler output]

You can see in the screenshot that, on average, it takes around 19s to run these simple Datastore queries. I understand that the best practice is to use the Beam Datastore library to parallelize/batch these calls, but our use case requires sequential, synchronous reads within the Beam transform. Does anyone have any insight into why these calls might be taking as long as they do? The screenshot shows that most of the wall time is spent waiting in the SocketInputStream.socketRead0() method, which indicates possible network or remote server latency.

Here is a screenshot of the DoFn code from the test job that I ran the above profile on:

[Screenshot: test job DoFn code]
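
To illustrate how per-call latency adds up when reads are issued sequentially and synchronously, here is a minimal, standard-library-only sketch. It is not the job's DoFn and does not call Datastore: `SequentialCallTimer`, `timeSequentially`, `totalMillis`, and `simulatedRead` are hypothetical names, and the blocking read is simulated with `Thread.sleep`, assuming a fixed round-trip latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SequentialCallTimer {

    /** Runs the call n times back to back and records each call's wall time in nanoseconds. */
    public static List<Long> timeSequentially(int n, Supplier<?> call) {
        List<Long> nanos = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            call.get(); // blocks until the (simulated) remote read returns
            nanos.add(System.nanoTime() - start);
        }
        return nanos;
    }

    /** Sums the recorded per-call durations and converts the total to milliseconds. */
    public static long totalMillis(List<Long> nanos) {
        long sum = 0;
        for (long v : nanos) {
            sum += v;
        }
        return sum / 1_000_000;
    }

    /** Hypothetical stand-in for a blocking read; sleeps to mimic network round-trip latency. */
    public static String simulatedRead(long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return "entity";
    }

    public static void main(String[] args) {
        // Five sequential reads with an assumed 20 ms latency each.
        List<Long> nanos = timeSequentially(5, () -> simulatedRead(20));
        System.out.println("calls: " + nanos.size() + ", total ms: " + totalMillis(nanos));
    }
}
```

Wrapping the real client call in a timer like this, in place of `simulatedRead`, would show whether each individual read is slow or whether many moderately slow reads are accumulating within one profiled frame.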

Question from: https://stackoverflow.com/questions/65924579/high-wall-time-with-synchronous-calls-to-datastore-in-apache-beam-transform


