I have an SSH.py
with the goal of connecting to many servers over SSH to run a Python script (worker.py
). I am using Paramiko, but am very new to it and learning as I go. On each server I ssh over with, I need to keep the Python script running -- this is for training a model parallely and so the script needs to run on all machines as to update model parameters/train jointly. The Python script on the servers need to be running so either all the SSH connections cannot close or I have to figure out a way for the Python script on the servers to keep running even if I close the connection.
From extensive googling, it looks like you can achieve this with nohup
or:
client = paramiko.SSHClient()
client.connect(ip_address, username, password)
transport = client.get_transport()
channel = transport.open_session()
channel.exec_command("python worker.py > /logs/'command output' 2>&1")
However, what is unclear to me is how do we close/exit all SSH connections? I am running the SSH.py
file on cmd.exe
, would closing the cmd.exe
be enough for all processes remotely to close?
In addition, is my use of client.close()
correct for my purposes?
Please see below what I have for my code.
# SSH.py
import paramiko
import argparse
import os
path = "path"
python_script = "worker.py"
# definitions for ssh connection and cluster
ip_list = ['XXX.XXX.XXX.XXX', XXX.XXX.XXX.XXX', XXX.XXX.XXX.XXX']
port_list = [':XXXX', ':XXXX', ':XXXX']
user_list = ['user', 'user', 'user']
password_list = ['pass', 'pass', 'pass']
node_list = list(map(lambda x: f'-node{x + 1} ', list(range(len(ip_list)))))
cluster = ' '.join([node + ip + port for node, ip, port in zip(node_list, ip_list, port_list)])
# run script on command line of local machine
os.system(f"cd {path} && python {python_script} {cluster} -type worker -index 0 -batch 64 > {path}/logs/'command output'/{ip_list[0]}.log 2>&1")
# loop for IP and password
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
try:
print("Open session in: " + ip + "...")
client = paramiko.SSHClient()
client.connect(ip, user, password)
transport = client.get_transport()
channel = transport.open_session()
except paramiko.SSHException:
print("Connection Failed")
quit()
try:
channel.exec_command(f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1", timeout=30)
client.close() # here I am closing connection but above command should be running, my question is can I safely close cmd.exe on which I am running SSH.py?
except paramiko.SSHException:
print("Cannot run file. Continue with other IPs in list...")
client.close()
continue
The code is based on Running process of remote SSH server in the background using Python Paramiko
Edit: It seems like the channel.exec_command() is not executing the command
f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1"
So I wonder if it is because of client.close()
? What would happen if I comment out all the lines with client.close()
? Would this help? Is this dangerous? When I quit my local Python script, would this close all my SSH connections and hence, no need for client.close()
?
Also all my machines have Windows OS.
See Question&Answers more detail:
os