Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

amazon web services - AWS SageMaker - ClientError: Data download failed:Could not download

I encountered an error when I launched my training job from my notebook instance. This is what it says: "UnexpectedStatusException: Error for Training job tensorflow-training-2021-01-26-09-55-05-768: Failed. Reason: ClientError: Data download failed:Could not download s3://forex-model-data/data/train2001_2020.npz: insufficient disk space"

I launch the training job on different instance types, for 3 epochs each, to compare them. I use ml.c5.4xlarge, ml.c5.18xlarge, and ml.m5.24xlarge. I also have two sets of training data: train2001_2020.npz and train2016_2020.npz.

First, I ran train2001_2020 on ml.c5.18xlarge and ml.c5.18xlarge, and the training job completed. Then I switched to train2016_2020 and ran it on ml.c5.4xlarge and ml.c5.18xlarge, and that went well too. But when I tried ml.m5.24xlarge, I got the error quoted above, even though my dataset was train2016_2020, not train2001_2020. When I reran the job on all the other instances, I got the same error. What happened?

I stopped the instances and refreshed everything, but I encountered the same issue.

question from:https://stackoverflow.com/questions/65902366/aws-sagemaker-clienterror-data-download-failedcould-not-download


1 Answer


It's not entirely clear which tests you are running, but that error usually means there is not enough disk space on the instance used for the training job. You can try increasing the additional storage volume attached to the instance (if you are using the SageMaker Python SDK in a notebook, you can set this in the estimator parameters).
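As a rough sketch of that suggestion: in version 2 of the SageMaker Python SDK, estimators accept a `volume_size` argument (in GB) for the storage volume attached to each training instance. The helper names below (`suggested_volume_size_gb`, `build_estimator_kwargs`), the 2x headroom factor, the 30 GB floor, and the 40 GiB dataset size are all assumptions made for illustration; they do not come from the question. The actual estimator call is left as a comment because it needs an AWS role and a training script.

```python
import math


def suggested_volume_size_gb(dataset_bytes, headroom_factor=2.0, minimum_gb=30):
    """Pick a volume size in GB with some headroom over the dataset size.

    headroom_factor and minimum_gb are illustrative defaults, not AWS guidance.
    """
    needed_gb = math.ceil(dataset_bytes * headroom_factor / 1024**3)
    return max(minimum_gb, needed_gb)


def build_estimator_kwargs(instance_type, dataset_bytes):
    """Collect the storage-related keyword arguments for an SDK v2 estimator."""
    return {
        "instance_type": instance_type,
        "instance_count": 1,
        # Size in GB of the storage volume attached to the training instance.
        "volume_size": suggested_volume_size_gb(dataset_bytes),
    }


if __name__ == "__main__":
    # Hypothetical dataset size of 40 GiB; substitute the real size of your .npz file.
    kwargs = build_estimator_kwargs("ml.m5.24xlarge", 40 * 1024**3)
    print(kwargs)

    # With the SageMaker SDK installed and a role configured, the kwargs
    # would be passed along roughly like this (other arguments are placeholders):
    #
    #   from sagemaker.tensorflow import TensorFlow
    #   estimator = TensorFlow(entry_point="train.py", role=role,
    #                          framework_version="2.3", py_version="py37",
    #                          **kwargs)
    #   estimator.fit({"training": "s3://forex-model-data/data/"})
```

If you are on version 1 of the SDK, the corresponding parameters carry a `train_` prefix (for example `train_volume_size`), so check which version your notebook has installed.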

