I'm training a ResNet34 model on the CIFAR-10 dataset. I'm simulating federated learning: I have one server model and multiple local models; all training happens on the local models, and I occasionally synchronize them with the server model. Since I store the models as lists of trainable parameters, synchronization looks like this:
def communicate_to_server(*, local_data, prev_local_data, server_data, n_machines):
    for i in range(len(local_data)):
        server_data[i] += (local_data[i] - prev_local_data[i]) / n_machines
        local_data[i][:] = server_data[i]
        prev_local_data[i][:] = server_data[i]
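For what it's worth, wrapping this synchronization in torch.no_grad() guarantees that autograd never records the in-place updates, regardless of where the tensors came from. A minimal self-contained sketch on plain tensors (the single-machine toy values here are my own, just to show the arithmetic):

```python
import torch

def communicate_to_server(*, local_data, prev_local_data, server_data, n_machines):
    # All updates happen outside autograd, so nothing is recorded in a graph.
    with torch.no_grad():
        for i in range(len(local_data)):
            server_data[i] += (local_data[i] - prev_local_data[i]) / n_machines
            local_data[i][:] = server_data[i]
            prev_local_data[i][:] = server_data[i]

# Toy check with one machine: the server absorbs the full local update.
server = [torch.zeros(3)]
local = [torch.ones(3)]
prev_local = [torch.zeros(3)]
communicate_to_server(local_data=local, prev_local_data=prev_local,
                      server_data=server, n_machines=1)
print(server[0])  # tensor([1., 1., 1.])
```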
So the server model accumulates the averaged local updates and then shares its parameters back with the local model.
To train a local model, I keep one instance of the ResNet34 network. When I want to take an SGD step, I set the network's parameters to the local model's values:
def set_params(model, data):
    params = [param for param in model.parameters() if param.requires_grad]
    for param, d in zip(params, data):
        param.data = d
And then I do the usual forward pass, loss computation, and backpropagation (I'm using optim.SGD with zero momentum).
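To make the write-through behavior concrete, here is a toy SGD step with a one-layer stand-in for ResNet34 (the linear layer, input, and learning rate are all made up): after set_params-style rebinding, optimizer.step() mutates the local_data tensors directly, which is what lets the synchronization code see the accumulated updates.

```python
import torch

# Hypothetical tiny "model": one linear layer standing in for ResNet34.
model = torch.nn.Linear(2, 1, bias=False)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0)

# Local copy of the trainable parameters, as in the question.
local_data = [torch.ones_like(p) for p in model.parameters()]

# Rebind the model's parameters to the local tensors (set_params).
for param, d in zip(model.parameters(), local_data):
    param.data = d

x = torch.tensor([[1.0, 2.0]])
loss = model(x).pow(2).sum()
opt.zero_grad()
loss.backward()
opt.step()

# Because .data was rebound, the optimizer's in-place update is
# visible in local_data as well: the two share storage.
print(torch.allclose(local_data[0], model.weight.detach()))  # True
```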
My question: am I allowed to do these operations? That is, can I simply substitute param.data with new values? Are assignments like local_data[i][:] = server_data[i] safe? (I tried inserting .detach() in all possible places, and it didn't seem to change anything.) Can I be sure that I'm not creating any weird computational-graph connections or breaking existing ones?
While I don't have direct evidence that something is wrong, I'm concerned that my test accuracy is much lower than what plain SGD achieves (77% vs. 90%), while my training accuracy is higher (100% vs. 99%).
question from:
https://stackoverflow.com/questions/65895214/manually-setting-parameters-data-in-pytorch-model