From e65fb4ea5f4aaaf1f0b6c8c0c7619eb087f94497 Mon Sep 17 00:00:00 2001
From: Troels Henriksen <athas@sigkill.dk>
Date: Sat, 21 Dec 2024 17:17:14 +0100
Subject: [PATCH] More remarks.

---
 openmp.tex | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/openmp.tex b/openmp.tex
index 552868b..f3d8736 100644
--- a/openmp.tex
+++ b/openmp.tex
@@ -674,7 +674,10 @@ \subsection{Summation with Parallel Regions}
 parallel region where each thread processes a chunk of the input and
 writes a result to its corresponding element to the results array,
 then after the final region we have a sequential loop that aggregates
-the results array to a single final result.
+the results array to a single final result. Since this final loop only
+has one iteration per thread, which is usually a very low number (and
+is constant irrespective of the input size), sequential execution poses
+no performance problem in this case.
 
 \Cref{lst:openmp-partition-sum} shows an implementation of vector
 summation using this technique. The integer array \texttt{sums}
@@ -800,7 +803,7 @@ \subsection{Filtering with parallel regions}
   return p;
 \end{lstlisting}
 
-Although it only runs for \texttt{P} iterations, the \texttt{memcpy}
+Although it only runs for \texttt{P} iterations, the \texttt{memcpy()}
 operation is likely to be expensive. Because this loop both reads and
 writes the output index \texttt{p}, it \emph{must} be sequential. For
 some filtering problems this may be acceptable: if we expect that the