\section{Parallelization layer} Mitsuba is built on top of a flexible parallelization layer, which spreads out various types of computation over local and remote cores. The guiding principle is that if an operation can potentially take longer than a few seconds, it ought to use all the cores it can get. Here, we will go through a basic example, which will hopefully provide sufficient intuition to realize more complex tasks. To obtain good (i.e. close to linear) speedups, the parallelization layer depends on several key assumptions of the task to be parallelized: \begin{itemize} \item The task can easily be split up into a discrete number of \emph{work units}, which requires a negligible amount of computation. \item Each work unit is small in footprint so that it can easily be transferred over the network or shared memory. \item A work unit constitutes a significant amount of computation, which by far outweighs the cost of transmitting it to another node. \item The \emph{work result} obtained by processing a work unit is again small in footprint, so that it can easily be transferred back. \item Merging all work results to a solution of the whole problem requires a negligible amount of additional computation. \end{itemize} This essentially corresponds to a parallel version of \emph{Map} (one part of \emph{Map\&Reduce}) and is ideally suited for most rendering workloads. The example we consider here computes a \code{ROT13} ``encryption'' of a string, which most certainly violates the ``significant amount of computation'' assumption. It was chosen due to the inherent parallelism and simplicity of this task. While of course over-engineered to the extreme, the example hopefully communicates how this framework might be used in more complex scenarios. We will implement this program as a plugin for the utility launcher \code{mtsutil}, which frees us from having to write lots of code to set up the framework, prepare the scheduler, etc. We start by creating the utility skeleton file \code{src/utils/rot13.cpp}: \begin{cpp} #include <mitsuba/render/util.h> MTS_NAMESPACE_BEGIN class ROT13Encoder : public Utility { public: int run(int argc, char **argv) { cout << "Hello world!" << endl; return 0; } MTS_DECLARE_UTILITY() }; MTS_EXPORT_UTILITY(ROT13Encoder, "Perform a ROT13 encryption of a string") MTS_NAMESPACE_END \end{cpp} The file must also be added to the build system: insert the line \begin{shell} plugins += $\texttt{env}$.SharedLibrary('plugins/rot13', ['src/utils/rot13.cpp']) \end{shell} into the SConscript (near the comment ``\code{Build the plugins -- utilities}''). After compiling using \code{scons}, the \code{mtsutil} binary should automatically pick up your new utility plugin: \begin{shell} $\texttt{\$}$ mtsutil .. The following utilities are available: addimages Generate linear combinations of EXR images rot13 Perform a ROT13 encryption of a string \end{shell} It can be executed as follows: \begin{shell} $\texttt{\$}$ mtsutil rot13 2010-08-16 18:38:27 INFO main [src/mitsuba/mtsutil.cpp:276] Mitsuba version 0.1.1, Copyright (c) 2010 Wenzel Jakob 2010-08-16 18:38:27 INFO main [src/mitsuba/mtsutil.cpp:350] Loading utility "rot13" .. Hello world! \end{shell} Our approach for implementing distributed ROT13 will be to treat each character as an indpendent work unit. Since the ordering is lost when sending out work units, we must also include the position of the character in both the work units and the work results. All of the relevant interfaces are contained in \code{include/mitsuba/core/sched.h}. For reference, here are the interfaces of \code{WorkUnit} and \code{WorkResult}: \newpage \begin{cpp} /** * Abstract work unit. Represents a small amount of information * that encodes part of a larger processing task. */ class MTS_EXPORT_CORE WorkUnit : public Object { public: /// Copy the content of another work unit of the same type virtual void set(const WorkUnit *workUnit) = 0; /// Fill the work unit with content acquired from a binary data stream virtual void load(Stream *stream) = 0; /// Serialize a work unit to a binary data stream virtual void save(Stream *stream) const = 0; /// Return a string representation virtual std::string toString() const = 0; MTS_DECLARE_CLASS() protected: /// Virtual destructor virtual ~WorkUnit() { } }; /** * Abstract work result. Represents the information that encodes * the result of a processed <tt>WorkUnit</tt> instance. */ class MTS_EXPORT_CORE WorkResult : public Object { public: /// Fill the work result with content acquired from a binary data stream virtual void load(Stream *stream) = 0; /// Serialize a work result to a binary data stream virtual void save(Stream *stream) const = 0; /// Return a string representation virtual std::string toString() const = 0; MTS_DECLARE_CLASS() protected: /// Virtual destructor virtual ~WorkResult() { } }; \end{cpp} \newpage In our case, the \code{WorkUnit} implementation then looks like this: \begin{cpp} class ROT13WorkUnit : public WorkUnit { public: void set(const WorkUnit *workUnit) { const ROT13WorkUnit *wu = static_cast<const ROT13WorkUnit *>(workUnit); m_char = wu->m_char; m_pos = wu->m_pos; } void load(Stream *stream) { m_char = stream->readChar(); m_pos = stream->readInt(); } void save(Stream *stream) const { stream->writeChar(m_char); stream->writeInt(m_pos); } std::string toString() const { std::ostringstream oss; oss << "ROT13WorkUnit[" << endl << " char = '" << m_char << "'," << endl << " pos = " << m_pos << endl << "]"; return oss.str(); } inline char getChar() const { return m_char; } inline void setChar(char value) { m_char = value; } inline int getPos() const { return m_pos; } inline void setPos(int value) { m_pos = value; } MTS_DECLARE_CLASS() private: char m_char; int m_pos; }; MTS_IMPLEMENT_CLASS(ROT13WorkUnit, false, WorkUnit) \end{cpp} The \code{ROT13WorkResult} implementation is not reproduced since it is almost identical (except that it doesn't need the \code{set} method). The similarity is not true in general: for most algorithms, the work unit and result will look completely different. Next, we need a class, which does the actual work of turning a work unit into a work result (a subclass of \code{WorkProcessor}). Again, we need to implement a range of support methods to enable the various ways in which work processor instances will be submitted to remote worker nodes and replicated amongst local threads. \begin{cpp} class ROT13WorkProcessor : public WorkProcessor { public: /// Construct a new work processor ROT13WorkProcessor() : WorkProcessor() { } /// Unserialize from a binary data stream (nothing to do in our case) ROT13WorkProcessor(Stream *stream, InstanceManager *manager) : WorkProcessor(stream, manager) { } /// Serialize to a binary data stream (nothing to do in our case) void serialize(Stream *stream, InstanceManager *manager) const { } ref<WorkUnit> createWorkUnit() const { return new ROT13WorkUnit(); } ref<WorkResult> createWorkResult() const { return new ROT13WorkResult(); } ref<WorkProcessor> clone() const { return new ROT13WorkProcessor(); // No state to clone in our case } /// No internal state, thus no preparation is necessary void prepare() { } /// Do the actual computation void process(const WorkUnit *workUnit, WorkResult *workResult, const bool &stop) { const ROT13WorkUnit *wu = static_cast<const ROT13WorkUnit *>(workUnit); ROT13WorkResult *wr = static_cast<ROT13WorkResult *>(workResult); wr->setPos(wu->getPos()); wr->setChar((std::toupper(wu->getChar()) - 'A' + 13) % 26 + 'A'); } MTS_DECLARE_CLASS() }; MTS_IMPLEMENT_CLASS_S(ROT13WorkProcessor, false, WorkProcessor) \end{cpp} Since our work processor has no state, most of the implementations are rather trivial. Note the \code{stop} field in the \code{process} method. This field is used to abort running jobs at the users requests, hence it is a good idea to periodically check its value during lengthy computations. Finally, we need a so-called \emph{parallel process} instance, which is responsible for creating work units and stitching work results back into a solution of the whole problem. The \code{ROT13} implementation might look as follows: \begin{cpp} class ROT13Process : public ParallelProcess { public: ROT13Process(const std::string &input) : m_input(input), m_pos(0) { m_output.resize(m_input.length()); } ref<WorkProcessor> createWorkProcessor() const { return new ROT13WorkProcessor(); } std::vector<std::string> getRequiredPlugins() { std::vector<std::string> result; result.push_back("rot13"); return result; } EStatus generateWork(WorkUnit *unit, int worker /* unused */) { if (m_pos >= (int) m_input.length()) return EFailure; ROT13WorkUnit *wu = static_cast<ROT13WorkUnit *>(unit); wu->setPos(m_pos); wu->setChar(m_input[m_pos++]); return ESuccess; } void processResult(const WorkResult *result, bool cancelled) { if (cancelled) // indicates a work unit, which was return; // cancelled partly through its execution const ROT13WorkResult *wr = static_cast<const ROT13WorkResult *>(result); m_output[wr->getPos()] = wr->getChar(); } inline const std::string &getOutput() { return m_output; } MTS_DECLARE_CLASS() public: std::string m_input; std::string m_output; int m_pos; }; MTS_IMPLEMENT_CLASS(ROT13Process, false, ParallelProcess) \end{cpp} The \code{generateWork} method produces work units until we have moved past the end of the string, after which it returns the status code \code{EFailure}. Note the method \code{getRequiredPlugins()}: this is necessary to use the utility across machines. When communicating with another node, it ensures that the remote side loads the \code{ROT13*} classes at the right moment. To actually use the \code{ROT13} encoder, we must first launch the newly created parallel process from the main utility function (the `Hello World' code we wrote earlier). We can adapt it as follows: \begin{cpp} int run(int argc, char **argv) { if (argc < 2) { cout << "Syntax: mtsutil rot13 <text>" << endl; return -1; } ref<ROT13Process> proc = new ROT13Process(argv[1]); ref<Scheduler> sched = Scheduler::getInstance(); /* Submit the encryption job to the scheduler */ sched->schedule(proc); /* Wait for its completion */ sched->wait(proc); cout << "Result: " << proc->getOutput() << endl; return 0; } \end{cpp} After compiling everything using \code{scons}, an exemplary usage would be to encode a string (e.g. \code{SECUREBYDESIGN}), while forwarding all computation to a network machine. (\code{-p0} disables all local worker threads). Adding a verbose flag (\code{-v}) shows some additional scheduling information: \begin{shell} $\texttt{\$}$ mtsutil -vc feynman -p0 rot13 SECUREBYDESIGN 2010-08-17 01:35:46 INFO main [src/mitsuba/mtsutil.cpp:201] Mitsuba version 0.1.1, Copyright (c) 2010 Wenzel Jakob 2010-08-17 01:35:46 INFO main [SocketStream] Connecting to "feynman:7554" 2010-08-17 01:35:46 DEBUG main [Thread] Spawning thread "net0_r" 2010-08-17 01:35:46 DEBUG main [RemoteWorker] Connection to "feynman" established (2 cores). 2010-08-17 01:35:46 DEBUG main [Scheduler] Starting .. 2010-08-17 01:35:46 DEBUG main [Thread] Spawning thread "net0" 2010-08-17 01:35:46 INFO main [src/mitsuba/mtsutil.cpp:275] Loading utility "rot13" .. 2010-08-17 01:35:46 DEBUG main [Scheduler] Scheduling process 0: ROT13Process[unknown].. 2010-08-17 01:35:46 DEBUG main [Scheduler] Waiting $\texttt{for}$ process 0 2010-08-17 01:35:46 DEBUG net0 [Scheduler] Process 0 has finished generating work 2010-08-17 01:35:46 DEBUG net0_r[Scheduler] Process 0 is $\texttt{complete}$. Result: FRPHEROLQRFVTA 2010-08-17 01:35:46 DEBUG main [Scheduler] Pausing .. 2010-08-17 01:35:46 DEBUG net0 [Thread] Thread "net0" has finished 2010-08-17 01:35:46 DEBUG main [Scheduler] Stopping .. 2010-08-17 01:35:46 DEBUG main [RemoteWorker] Shutting down 2010-08-17 01:35:46 DEBUG net0_r[Thread] Thread "net0_r" has finished \end{shell} \subsection{Global Resources} TBD.