.. schwimmbad documentation master file, created by sphinx-quickstart on Tue Aug 1 15:54:14 2017. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. ************** The Schwimmbad ************** .. image:: http://img.shields.io/badge/license-MIT-blue.svg?style=flat :target: https://github.com/adrn/schwimmbad/blob/master/LICENSE ``schwimmbad`` provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with :py:mod:`multiprocessing`) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib). .. toctree:: :maxdepth: 1 install contributing Getting started =============== The utilities provided by ``schwimmbad`` require that your tasks or data can be "chunked" and your code can be "mapped" onto the chunked tasks. For example, the very common use-case of executing the same function or code on every datum in a large data set:: data = list(range(10000)) def do_the_processing(x): # Do something with each value in the data set! # For example, here we just square each value return x**2 values = [] for x in data: values.append(do_the_processing(x)) In the example above, instead of looping over each item in ``data``, we could have equally well used the Python built-in :func:`map` function:: values = list(map(do_the_processing, data)) This applies the function (passed as the first argument) to each element in the iterable (second argument). In Python 3, :func:`map` returns a generator object, so we call :class:`list` on this to get out the values. The above :func:`map` example executes the function in *serial* over each element in ``data``. That is, it just goes one by one through the ``data`` object, executes the function, returns, and carries on, all on the same processor core. If we can write our code in this style (using :func:`map`), we can easily swap in the ``Pool`` classes provided by ``schwimmbad`` to allow us to switch between various parallel processing frameworks. The easiest to understand is the :class:`~schwimmbad.SerialPool`, as this is just a class-based wrapper of the built-in (serial) :func:`map` function. The same example above can be run with :class:`~schwimmbad.SerialPool` by doing:: from schwimmbad import SerialPool pool = SerialPool() values = list(pool.map(do_the_processing, data)) The only difference here is we call the ``.map()`` *method* of the instantiated ``Pool`` class. If we wanted to switch to using multiprocessing, using, for example, the Python built-in :mod:`multiprocessing` package to utilize multiple cores on the same processor, we can just swap out the pool definition to instead use the :class:`~schwimmbad.MultiPool` class:: from schwimmbad import MultiPool with MultiPool() as pool: values = list(pool.map(do_the_processing, data)) Note that the core processing line (``values = ...``) looks identical to the above, but we've only changed the definition of the ``pool`` object. In detail, we're now using a `context manager `_ (using a Python ``with`` statement) to handle creating and *closing* the multiprocessing pool. With the exception of the :class:`~schwimmbad.SerialPool`, all other pool classes need to be explicity closed after processing. We could have also written:: pool = MultiPool() values = list(pool.map(do_the_processing, data)) pool.close() See the examples listed below for demonstrations of using the :class:`~schwimmbad.MPIPool` and :class:`~schwimmbad.JoblibPool`. Examples -------- .. toctree:: :maxdepth: 2 examples/index API documentation ================= .. toctree:: :maxdepth: 1 api