Work-Span Model

Definitions

Work \(W(n)\): Total number of operations (as if run on 1 processor)
Span \(S(n)\): Length of longest dependency chain (critical path, as if infinite processors)
Parallelism: \(\dfrac{W(n)}{S(n)}\) = max useful processors
Efficient: polylogarithmic span and polylogarithmic overhead
Optimal: polylogarithmic span and constant overhead
Overhead: \(\dfrac{W_{\text{parallel}}(n)}{W_{\text{sequential}}(n)}\)

Brent's Theorem

\[ T_p \leq \frac{W}{p} + S \]

With \(p\) processors, time is at most work divided by processors plus span. You can't do better than either \(\frac{W}{p}\) (work-limited) or \(S\) (span-limited).

How to Compute Work and Span of Recursive Algorithms

Recipe:

Identify the recursive calls and how they divide the problem
Identify the non-recursive work at each level
Set up recurrences:
- Work: sum ALL work (recursive + local). E.g., \(W(n) = 2W\!\left(\frac{n}{2}\right) + O(n)\)
- Span: take MAX of parallel branches + sequential work. E.g., \(S(n) = S\!\left(\frac{n}{2}\right) + O(n)\) [if two calls are parallel, take max = one call]
Solve recurrences (Master theorem or expansion)

Common Recurrence Solutions

Recurrence	Solution
\(T(n) = 2T\!\left(\frac{n}{2}\right) + O(1)\)	\(O(n)\)
\(T(n) = 2T\!\left(\frac{n}{2}\right) + O(n)\)	\(O(n \log n)\)
\(T(n) = 2T\!\left(\frac{n}{2}\right) + O(\log n)\)	\(O(n)\)
\(T(n) = T\!\left(\frac{n}{2}\right) + O(1)\)	\(O(\log n)\)
\(T(n) = T\!\left(\frac{n}{2}\right) + O(n)\)	\(O(n)\)
\(T(n) = 3T\!\left(\frac{n}{2}\right) + O(n)\)	\(O\!\left(n^{\log_2 3}\right) \approx O\!\left(n^{1.58}\right)\)
\(T(n) = 3T\!\left(\frac{n}{3}\right) + O(n)\)	\(O(n \log n)\)
\(T(n) = T\!\left(\frac{n}{2}\right) + O\!\left(\sqrt{n}\right)\)	\(O\!\left(\sqrt{n}\right)\) [geometric series]

Processor Count Questions

"How many processors for Tina's algorithm to be worthwhile over Charles'?"
- Sequential: \(T_{\text{seq}}(n)\). Parallel with \(p\) processors: \(\frac{W}{p} + S\).
- Need \(\frac{W}{p} + S < T_{\text{seq}}(n)\). Solve for \(p\).
- Usually: \(p > \frac{W(n)}{T_{\text{seq}}(n)}\) to START being worthwhile
"Maximum useful processors?"
- \(p_{\max} = \frac{W(n)}{S(n)}\) (parallelism). Beyond this, adding processors doesn't help.

Worked Example: 2024 Exam Q4 ("eagle" algorithm)

procedure eagle(ps)
  n = length(ps)
  if n == 0 { return 0 }
  a = eagle(slice(ps, 0, n * 0.5))       -- recursive, size n/2
  b = eagle(slice(ps, n * 0.5, n))       -- recursive, size n/2 (parallel with a)
  c = 0
  for (i = 0; i * i < n; i++) {          -- sqrt(n) iterations
    c += ps[i * i]
    ps[i * i] += b
  }
  return c

Span: \(S(n) = S\!\left(\frac{n}{2}\right) + O\!\left(\sqrt{n}\right)\) [two recursive calls in parallel → take max = \(S(n/2)\); loop is \(O(\sqrt{n})\)]
- \(S(n) = O\!\left(\sqrt{n}\right)\) (geometric series dominated by largest term)
Work: \(W(n) = 2W\!\left(\frac{n}{2}\right) + O\!\left(\sqrt{n}\right)\)
- \(W(n) = O(n)\) (Master theorem: \(a=2,\; b=2,\; f(n)=\sqrt{n},\; n^{\log_b a} = n > \sqrt{n}\))

Worked Example: 2024 Retake Q5 ("erumpent" algorithm)

procedure erumpent(ps)
  n = length(ps)
  if n == 0 { return 0 }; if n == 1 { return ps[0] }
  qs = map (\p -> p * n + 1) ps           -- O(n) work, O(1) span
  a = erumpent(slice(qs, 0, n * 0.5))     -- size n/2
  b = erumpent(slice(qs, n*0.25, n*0.75)) -- size n/2
  c = erumpent(slice(qs, n*0.5, n*1))     -- size n/2, all three parallel
  return a + b + c

Span: \(S(n) = S\!\left(\frac{n}{2}\right) + O(1)\) [three parallel calls → max = \(S(n/2)\); map is \(O(1)\) span]
- \(S(n) = O(\log n)\)
Work: \(W(n) = 3W\!\left(\frac{n}{2}\right) + O(n)\)
- \(W(n) = O\!\left(n^{\log_2 3}\right) \approx O\!\left(n^{1.58}\right)\) (Master theorem: \(a=3,\; b=2,\; n^{\log_2 3} > n\))