I am trying to run a Python script that works fine on 1 processor. However, when I run it with MPI on 6 processors, it fails with the following error:
MPI_INIT has failed because at least one MPI process is unreachable
from another. This usually means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
- Check the output of ompi_info to see which BTL/MTL plugins are
available.
- Run your application with MPI_THREAD_SINGLE.
- Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
I am not sure what is causing this problem. My other MPI jobs run on the same cluster without any issues. For this job I created a Python virtual environment, in which I installed plotly, tqdm, h5py, mpi4py and numpy.
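To help narrow this down, here is a minimal diagnostic sketch that could be run inside the virtual environment. It rests on an assumption the setup above does not confirm: that the mpi4py in the virtual environment may have been built against a different MPI installation than the launcher the cluster provides. The function name `mpi_environment_report` is made up for illustration. It only imports the top-level `mpi4py` package and calls `mpi4py.get_config()`, so it should not trigger `MPI_INIT` itself.

```python
import importlib.util
import shutil
import sys


def mpi_environment_report():
    """Collect basic facts about the MPI setup without initializing MPI.

    Hypothetical helper for illustration; not part of mpi4py or Open MPI.
    """
    report = {
        # Interpreter in use; should point inside the virtual environment.
        "python": sys.executable,
        # Launcher and info tool found on PATH (None if not found).
        "mpirun": shutil.which("mpirun"),
        "ompi_info": shutil.which("ompi_info"),
        # Build-time configuration of mpi4py, if it is installed.
        "mpi4py_config": None,
    }
    if importlib.util.find_spec("mpi4py") is not None:
        import mpi4py  # top-level package only; does not call MPI_Init

        # get_config() returns the settings mpi4py was built with
        # (for example, the compiler wrapper that was used).
        report["mpi4py_config"] = mpi4py.get_config()
    return report


if __name__ == "__main__":
    for key, value in mpi_environment_report().items():
        print(f"{key}: {value}")
```

If the compiler wrapper reported under `mpi4py_config` lives in a different MPI installation than the `mpirun` found on `PATH`, that mismatch would be worth investigating first. The verbose output suggested in the error message can be requested at launch with something like `mpirun --mca btl_base_verbose 100 -np 6 python script.py`, where `script.py` stands in for the actual script.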
question from:
https://stackoverflow.com/questions/65907482/python-script-failing-on-mpi