NOTE: This is a summary of the official documentation.
Installation and Required Packages
To install the required package, use either conda or pip (one of the two is enough):
conda install ipyparallel
pip install ipyparallel
After the installation is complete, enable the Jupyter extension by executing:
ipcluster nbextension enable
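As a quick sanity check after installing, the sketch below reports whether ipyparallel can be found by the current Python interpreter. It uses only the standard library; the helper name ipyparallel_status is made up for this illustration and is not part of ipyparallel.

```python
import importlib.util
from importlib import metadata


def ipyparallel_status():
    """Return the installed ipyparallel version, or None if it is not importable.

    Hypothetical helper for illustration; not part of ipyparallel itself.
    """
    if importlib.util.find_spec("ipyparallel") is None:
        return None
    try:
        return metadata.version("ipyparallel")
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found.
        return "unknown"


if __name__ == "__main__":
    version = ipyparallel_status()
    if version is None:
        print("ipyparallel not found in this environment")
    else:
        print(f"ipyparallel version: {version}")
```

If the check reports that the package is missing, the most likely cause is that it was installed into a different environment than the one running the script.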
IPython has four main components, which enable multiple types of parallel processing (single program, multiple data; multiple program, multiple data; task farming; etc.):
The IPython engine: [...] The engine listens for requests over the network, runs code, and returns results. [...] When multiple engines are started, parallel and distributed computing becomes possible.
The IPython controller: The IPython controller processes provide an interface for working with a set of engines. [...] The controller is composed of a Hub and a collection of Schedulers. These Schedulers typically run in separate processes on the same machine as the Hub, but they can run anywhere, from local threads to remote machines. The controller also provides a single point of contact for users who wish to access the engines connected to the controller.
The Hub: The center of an IPython cluster is the Hub. This is the process that keeps track of engine connections, schedulers, clients, as well as all task requests and results. [...]
The Schedulers: All actions that can be performed on the engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines.
IPython client and views: There is one primary object, the Client, for connecting to a cluster. For each execution model, there is a corresponding View. These views allow users to interact with a set of engines through the interface. Here are the two default views: the DirectView class for explicit addressing, and the LoadBalancedView class for destination-agnostic scheduling.
All components are part of the
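To make the Client and the two default views more concrete, here is a minimal sketch of how they are commonly used from Python. It assumes a cluster is already running under the default profile (so that Client() can find its connection file); the function names square and demo are invented for this example.

```python
import os


def square(x):
    """Toy task that stands in for real work sent to the engines."""
    return x * x


def demo():
    """Exercise both default views. Requires a running cluster."""
    # Imported here so this file can be loaded where ipyparallel is absent.
    import ipyparallel as ipp

    rc = ipp.Client()  # assumes the default profile's connection file exists
    print("engine ids:", rc.ids)

    # DirectView: explicit addressing. rc[:] targets every engine.
    dview = rc[:]
    print("engine pids:", dview.apply_sync(os.getpid))  # one result per engine

    # LoadBalancedView: the scheduler decides which engine runs each task.
    lview = rc.load_balanced_view()
    print("squares:", lview.map_sync(square, range(8)))

    # Asynchronous call: returns an AsyncResult right away,
    # while the engine itself blocks on the user code.
    ar = lview.apply_async(square, 7)
    print("async result:", ar.get(timeout=10))
```

Calling demo() from a script or notebook, with the cluster up, prints the engine ids, one process id per engine, and the computed results. The asynchronous call illustrates the point made about the Schedulers: the client gets a handle back immediately and only waits when it asks for the result.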
For testing purposes, it is useful to run a cluster on a single machine. To launch a controller with 4 engines on localhost, run:
ipcluster start -n 4
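If the test cluster needs to be started from a Python script rather than a terminal, one option is to wrap the same command with subprocess. This is only a sketch: the helper names ipcluster_command and start_local_cluster are hypothetical, and it assumes the ipcluster executable is on the PATH.

```python
import shutil
import subprocess


def ipcluster_command(action, n_engines=None):
    """Build an ipcluster command line as a list of arguments."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["ipcluster", action]
    if action == "start" and n_engines is not None:
        cmd += ["-n", str(n_engines)]
    return cmd


def start_local_cluster(n_engines=4):
    """Launch a local controller plus engines; returns the Popen handle."""
    if shutil.which("ipcluster") is None:
        raise RuntimeError("ipcluster not found on PATH; is ipyparallel installed?")
    return subprocess.Popen(ipcluster_command("start", n_engines))


if __name__ == "__main__":
    # Only prints the command; call start_local_cluster() to actually launch.
    print(" ".join(ipcluster_command("start", 4)))  # ipcluster start -n 4
```

The engines take a few seconds to register with the controller, so a client that connects immediately may see fewer than the requested number of engines at first. Running ipcluster stop from a terminal shuts the cluster down again.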
A more realistic setup has multiple machines running the IPython engines and/or the controller. The ipengine and ipcontroller commands start each of these instances, respectively.
One more consideration is how the engines are launched and connected to the controller. This can be done using SSH, MPI (which requires mpi4py: conda install mpi4py), etc. (see the documentation).