Configuring a bunch of Servers can be a pain, recently I had to setup tc priority queues for limiting the bandwidth and response time on 10 Servers (going in an out), that would mean 100 command line calls, and since they are all different of course doing it via parallel-ssh would not help. Being a Python guy it immediately sprang to mind that there is a good CSV lib, and since 10×10 Servers is a Matrix it can easily be represented in a CSV. I wrote about handling CSV files earlier, so check out Handling CSV Files. The tricky part is the handling of SSH commands in the script, but lucky me, theres a lib for that, it’s called Paramiko. With a little abstraction done by Commandline.org it is easily useable to setup connections pass commands and get results. The Tutorial on Commandline.org is actually quite nice to get started on SSH in Python.
The setup so far:
- CSV with 10×10 Rows/Columns including the hostnames and ports to connect to
- A script parsing the CSV and passing the needed commands via ssh.py to the host
The problem:
- It takes forever!
Since the whole script runs sequentially, and connection via SSH is not the fastest thing in the world, the script is painfully slow to run. The solution: Threading! Since all connections are obviously independent they can easily be run in parallel without worrying about race conditions. This is where the nice Python API comes in. A Thread is just a class derived from Thread.
class Controller(Thread):
def init(self, var1, var2):
Thread.init(self)
self.var1 = var1
self.var2 = var2
All there is to do now, is implementing the run method and doing whatever is needed there
def run(self):
do something cool
Now to integrate it in the Main Script
controllers = []
for r in rows:
for c in columns:
current = Controller(r, c)
controllers.append(current)
for i in controllers:
i.join()
All that does is create a separate thread per item, and append to a array of threads. By running join it makes sure that every thread is done before the scripts exists. And thats it, you now got a threaded script running ssh commands on a bunch of hosts.
Then why do so many people complain about GIL being a parallelization killer?
LikeLike
Well in that case the GIL does not matter as the time being spent is mostly outside python and the GIL only limits concurrent threads in python. Almost all the time here is spent on network.
LikeLike