/************************************************************************

MODULE: BasicThreadPool

SUMMARY:

Some simple thread pooling.

You create a thread pool by constructing a BasicThreadPool object.
For example:

   long nthreads = 4;
   BasicThreadPool pool(nthreads);

creates a thread pool of 4 threads. These threads will exist
until the destructor for pool is called.

The simplest way to use a thread pool is as follows.
Suppose you have a task that consists of N subtasks,
indexed 0..N-1. Then you can write:

   pool.exec_range(N,
      [&](long first, long last) {
         for (long i = first; i < last; i++) {
            ... code to process subtask i ...
         }
      }
   );

The second argument to exec_range is a C++11 "lambda".
Each invocation of the lambda handles subtasks first..last-1.
The "[&]" indicates that all local variables in the calling
context are captured by reference, so the lambda body can
reference all visible local variables directly.
C++11 provides other methods for capturing local variables.

As a more concrete example, we could parallelize the following
calculation:

   void mul(ZZ *x, const ZZ *a, const ZZ *b, long n)
   {
      for (long i = 0; i < n; i++)
         mul(x[i], a[i], b[i]);
   }

as follows:

   void mul(ZZ *x, const ZZ *a, const ZZ *b, long n,
            BasicThreadPool *pool)
   {
      pool->exec_range(n,
         [&](long first, long last) {
            for (long i = first; i < last; i++)
               mul(x[i], a[i], b[i]);
         }
      );
   }

As another example, we could parallelize the following
calculation:

   void mul(ZZ_p *x, const ZZ_p *a, const ZZ_p *b, long n)
   {
      for (long i = 0; i < n; i++)
         mul(x[i], a[i], b[i]);
   }

as follows:

   void mul(ZZ_p *x, const ZZ_p *a, const ZZ_p *b, long n,
            BasicThreadPool *pool)
   {
      ZZ_pContext context;
      context.save();

      pool->exec_range(n,
         [&](long first, long last) {
            context.restore();
            for (long i = first; i < last; i++)
               mul(x[i], a[i], b[i]);
         }
      );
   }

This illustrates a simple and efficient means for ensuring that
all threads are working with the same ZZ_p modulus.

====================================================================

A lower-level interface is also provided.
One can write:

   pool.exec_index(n,
      [&](long index) {
         ... code to process the given index ...
      }
   );

This will activate n threads with indices 0..n-1, and execute
the given code on each index. The parameter n must be
in the range 0..nthreads; otherwise, an error is raised.

This lower-level interface is useful in some cases,
especially when memory is managed in some special way.
For convenience, a method is provided to break subtasks up into
smaller, almost-equal-sized groups of subtasks:

   Vec<long> pvec;
   long n = pool.SplitProblems(N, pvec);

Here, N is the number of subtasks, indexed 0..N-1. This method will
compute n as needed by exec_index, and the range of subtasks to be
processed by a given index in the range 0..n-1 is
pvec[index]..pvec[index+1]-1.

Thus, the logic of the exec_range example above can be written
using the lower-level exec_index interface as follows:

   Vec<long> pvec;
   long n = pool.SplitProblems(N, pvec);
   pool.exec_index(n,
      [&](long index) {
         long first = pvec[index];
         long last = pvec[index+1];
         for (long i = first; i < last; i++) {
            ... code to process subtask i ...
         }
      }
   );

However, with this approach, memory or other resources can be
assigned to each index = 0..n-1, and managed externally.

====================================================================

NOTES:

When one activates a thread pool with nthreads threads,
the *current* thread (the one activating the pool) will also
participate in the computation. This means that the thread pool
only contains nthreads-1 other threads.

If, during an activation, any thread throws an exception, it will
be caught and rethrown in the activating thread when all the threads
complete. If more than one thread throws an exception, the first one
that is caught is the one that is rethrown.

Methods are also provided for adding, deleting, and moving threads
in and among thread pools.

If NTL_THREADS=off, the corresponding header file may be included,
but the BasicThreadPool class is not defined.

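The participation and exception rules in these notes can be illustrated
with a small stand-alone sketch. It does not use NTL and is not how
BasicThreadPool is implemented: exec_index_sketch is a hypothetical helper
that creates fresh std::thread objects on every call, whereas a real pool
keeps its threads alive between activations. It only shows index 0 running
on the calling thread, and the first caught exception being rethrown once
every thread has finished.

```cpp
#include <cassert>
#include <exception>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of NTL): runs fct(index) for
// index = 0..cnt-1. Index 0 runs on the calling thread; each other
// index runs on its own newly created thread.
template <class Fct>
void exec_index_sketch(long cnt, Fct fct)
{
   assert(cnt >= 0);
   if (cnt == 0) return;

   std::exception_ptr first_error;  // first exception caught, if any
   std::mutex m;                    // protects first_error

   // Wrapper that runs one index and records the first exception.
   auto run = [&](long index) {
      try {
         fct(index);
      }
      catch (...) {
         std::lock_guard<std::mutex> lock(m);
         if (!first_error) first_error = std::current_exception();
      }
   };

   // Start cnt-1 helper threads for indices 1..cnt-1.
   std::vector<std::thread> helpers;
   for (long index = 1; index < cnt; index++)
      helpers.emplace_back(run, index);

   // The activating thread takes part in the computation.
   run(0);

   // Wait for all helpers before reporting any error.
   for (std::thread& t : helpers) t.join();

   if (first_error) std::rethrow_exception(first_error);
}
```

For example, exec_index_sketch(4, [&](long index) { ... }) runs indices
1..3 on three helper threads and index 0 on the caller, and any exception
thrown by the lambda surfaces in the caller only after all four have
finished.
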
THREAD BOOSTING:

While users are free to use a thread pool as they wish, NTL can be
configured so that it *internally* uses a thread pool to speed up
certain computations. This is currently a work in progress.

To use this feature, NTL should be configured with NTL_THREAD_BOOST=on.
The user can then initialize the (thread local) variable NTLThreadPool,
either directly or via the convenience function SetNumThreads (see below).

NTL applications may use the NTLThreadPool themselves: the logic is
designed so that if this pool is already activated when entering a
thread-boosted routine, the thread-boosting is temporarily disabled.
This means that an application can seamlessly use higher-level
parallelization when possible (which is usually more effective) or rely
on NTL's internal parallelization at a lower level.

***************************************************************************/


class BasicThreadPool
{
private:

   BasicThreadPool(const BasicThreadPool&); // disabled
   void operator=(const BasicThreadPool&); // disabled

public:

   BasicThreadPool(long nthreads);
   // creates a pool with nthreads threads, including the current thread
   // (so nthreads-1 other threads get created)

   template<class Fct>
   void exec_index(long cnt, Fct fct);
   // activate by index (see example usage above)

   template<class Fct>
   void exec_range(long sz, Fct fct);
   // activate by range (see example usage above)

   long SplitProblems(long nproblems, Vec<long>& pvec) const;
   // splits nproblems problems among (at most) nthreads threads.
   // returns the actual number of threads nt to be used, and
   // initializes pvec to have length nt+1, so that for t = 0..nt-1,
   // thread t processes subproblems pvec[t]..pvec[t+1]-1

   long NumThreads() const;
   // return number of threads (including current thread)

   bool active() const;
   // indicates an activation is in progress

   void add(long n = 1);
   // add n threads to the pool

   void remove(long n = 1);
   // remove n threads from the pool

   void move(BasicThreadPool& other, long n = 1);
   // move n threads from other pool to this pool

};



// THREAD BOOSTING FEATURES:

extern thread_local BasicThreadPool *NTLThreadPool;
// pool used internally by NTL with NTL_THREAD_BOOST=on

void SetNumThreads(long n);
// convenience routine to set NTLThreadPool (created using new)
// If NTL_THREADS=off, then this is still defined, but does nothing
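

/************************************************************************

EXAMPLE (illustrative sketch, not part of the NTL interface):

The contract stated for SplitProblems above (at most nthreads groups,
pvec of length nt+1, group t covering pvec[t]..pvec[t+1]-1) can be
sketched without NTL as follows. The name split_problems_sketch, the use
of std::vector in place of Vec<long>, the explicit nthreads parameter, and
the choice to give the extra subtasks to the lowest-numbered groups are
all assumptions made for illustration; the library routine may distribute
the remainder differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for BasicThreadPool::SplitProblems.
// Splits nproblems subtasks into nt almost-equal groups, where
// nt = min(nproblems, nthreads), and returns nt.
long split_problems_sketch(long nproblems, long nthreads,
                           std::vector<long>& pvec)
{
   assert(nproblems >= 0 && nthreads >= 1);

   // Never use more groups than there are subtasks.
   long nt = std::min(nproblems, nthreads);

   // pvec has nt+1 entries; pvec[0] is always 0.
   pvec.assign(nt + 1, 0);
   if (nt == 0) return 0;

   long base = nproblems / nt;   // minimum group size
   long extra = nproblems % nt;  // number of groups with one more subtask

   // Assumption: the first `extra` groups receive the larger size.
   for (long t = 0; t < nt; t++)
      pvec[t + 1] = pvec[t] + base + (t < extra ? 1 : 0);

   return nt;
}
```

With nproblems = 10 and nthreads = 4, this sketch yields nt = 4 and
pvec = 0 3 6 8 10, so the groups have sizes 3, 3, 2, 2.

************************************************************************/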