From e65fb4ea5f4aaaf1f0b6c8c0c7619eb087f94497 Mon Sep 17 00:00:00 2001
From: Troels Henriksen
Date: Sat, 21 Dec 2024 17:17:14 +0100
Subject: [PATCH] More remarks.

---
 openmp.tex | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/openmp.tex b/openmp.tex
index 552868b..f3d8736 100644
--- a/openmp.tex
+++ b/openmp.tex
@@ -674,7 +674,10 @@ \subsection{Summation with Parallel Regions}
 parallel region where each thread processes a chunk of the input and
 writes a result to its corresponding element to the results array,
 then after the final region we have a sequential loop that aggregates
-the results array to a single final result.
+the results array to a single final result. Since this final loop only
+has one iteration per thread, which is usually a very low number (and
+is constant irrespective of the input size), sequential execution poses
+no performance problem in this case.
 
 \Cref{lst:openmp-partition-sum} shows an implementation of vector
 summation using this technique. The integer array \texttt{sums}
@@ -800,7 +803,7 @@ \subsection{Filtering with parallel regions}
 return p;
 \end{lstlisting}
 
-Although it only runs for \texttt{P} iterations, the \texttt{memcpy}
+Although it only runs for \texttt{P} iterations, the \texttt{memcpy()}
 operation is likely to be expensive. Because this loop both reads and
 writes the output index \texttt{p}, it \emph{must} be sequential. For
 some filtering problems this may be acceptable: if we expect that the