When I run PySpark code created in a Jupyter Notebook from the Web Interfaces tab of a Dataproc cluster, I find that the running code does not use all of the resources of the master node or the worker nodes. It uses only part of them.
I found a suggested solution to this issue in an answer to a question here that said to change the scheduler properties to FIFO.
I have two questions: 1) How can I change the scheduler properties?
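For what it's worth, one way to set scheduler properties on Dataproc is at cluster creation time via `--properties` (the `spark:` prefix targets `spark-defaults.conf`). This is only a sketch; the cluster name is hypothetical:

```shell
# Set the Spark scheduler mode cluster-wide at creation time.
# "my-cluster" is a placeholder name; the "spark:" prefix writes the key
# into /etc/spark/conf/spark-defaults.conf on the cluster nodes.
gcloud dataproc clusters create my-cluster \
    --properties "spark:spark.scheduler.mode=FIFO"
```

Note that `FIFO` is already Spark's default value for `spark.scheduler.mode` (the alternative is `FAIR`), so changing it may not be the fix by itself. Properties can also be set per session in the notebook with `SparkSession.builder.config("spark.scheduler.mode", "FIFO")` before `getOrCreate()`.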
2) Other than changing the scheduler properties, is there any other way to make PySpark use all of the resources?
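Separately from the scheduler, a common reason a job leaves resources idle is that the executor settings do not cover the cluster (too few executors, cores, or memory per executor). A minimal sketch of the sizing arithmetic, assuming a hypothetical cluster shape of 2 workers with 8 vCPUs and 30 GB of usable memory each (these numbers are illustrative, not read from any real cluster):

```python
# Back-of-the-envelope executor sizing: aim for
#   instances * cores  ~= total worker vCPUs
#   instances * memory ~= total usable worker RAM
# The node shape passed in below is a hypothetical example.

def size_executors(workers, vcpus_per_node, mem_gb_per_node, cores_per_executor=4):
    """Return Spark properties that roughly fill the worker nodes."""
    executors_per_node = vcpus_per_node // cores_per_executor
    instances = workers * executors_per_node
    # Leave ~10% of node memory for OS/daemon overhead, split the rest evenly.
    mem_per_executor_gb = int(mem_gb_per_node * 0.9) // executors_per_node
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{mem_per_executor_gb}g",
    }

props = size_executors(workers=2, vcpus_per_node=8, mem_gb_per_node=30)
print(props)  # e.g. 4 executors x 4 cores x 13g on this hypothetical cluster
```

These properties could then be passed with `SparkSession.builder.config(...)` in the notebook, or via `--properties` on `gcloud dataproc jobs submit pyspark`. Dataproc also enables Spark dynamic allocation by default, which itself changes how many executors a job claims, so that setting is worth checking too.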
Thanks in advance.
Asked by Hatem Elattar, translated from Stack Overflow.