Doc: Use ++n instead of +p in charmrun examples #3780

Open · wants to merge 2 commits into main
4 changes: 2 additions & 2 deletions README.md
@@ -272,7 +272,7 @@ executable named `nqueen`.

Following the previous example, to run the program on two processors, type

$ ./charmrun +p2 ./nqueen 12 6
$ ./charmrun ++n 2 ./nqueen 12 6

This should run for a few seconds, and print out:
`There are 14200 Solutions to 12 queens. Time=0.109440 End time=0.112752`
@@ -307,7 +307,7 @@ want to run program on only one machine, for example, your laptop. This
can save you all the hassle of setting up ssh daemons.
To use this option, just type:

$ ./charmrun ++local ./nqueen 12 100 +p2
$ ./charmrun ++local ./nqueen 12 100 ++n 2

However, for best performance, you should launch one node program per processor.

4 changes: 2 additions & 2 deletions doc/ampi/02-building.rst
@@ -175,7 +175,7 @@ arguments. A typical invocation of an AMPI program ``pgm`` with

.. code-block:: bash

$ ./charmrun +p16 ./pgm +vp64
$ ./charmrun ++n 16 ./pgm +vp64

Here, the AMPI program ``pgm`` is run on 16 physical processors with 64
total virtual ranks (which will be mapped 4 per processor initially).
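
On an SMP build, the same 16 PEs can also be expressed as a few
multi-threaded processes. A minimal sketch, assuming a split into two
processes of eight worker threads each (the split is illustrative, not
taken from the manual text above):

.. code-block:: bash

   # 2 processes x 8 worker threads = 16 PEs; with +vp64 this is still
   # 4 AMPI virtual ranks per PE
   $ ./charmrun ++n 2 ++ppn 8 ./pgm +vp64
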
@@ -189,7 +189,7 @@ example:

.. code-block:: bash

$ ./charmrun +p16 ./pgm +vp128 +tcharm_stacksize 32K +balancer RefineLB
$ ./charmrun ++n 16 ./pgm +vp128 +tcharm_stacksize 32K +balancer RefineLB

Running with ampirun
~~~~~~~~~~~~~~~~~~~~
4 changes: 2 additions & 2 deletions doc/ampi/04-extensions.rst
@@ -566,15 +566,15 @@ of the AMPI program with some additional command line options.

.. code-block:: bash

$ ./charmrun ./pgm +p4 +vp4 +msgLogWrite +msgLogRank 2 +msgLogFilename "msg2.log"
$ ./charmrun ./pgm ++n 4 +vp4 +msgLogWrite +msgLogRank 2 +msgLogFilename "msg2.log"

In the above example, a parallel run with 4 worker threads and 4 AMPI
ranks will be executed, and the changes in the MPI environment of worker
thread 2 (also rank 2, starting from 0) will get logged into diskfile
"msg2.log".

Unlike the first run, the re-run is a sequential program, so it is not
invoked by charmrun (and omitting charmrun options like +p4 and +vp4),
invoked by charmrun (and omitting charmrun options like ++n 4 and +vp4),
and additional command line options are required as well.

.. code-block:: bash
34 changes: 17 additions & 17 deletions doc/ampi/05-examples.rst
@@ -31,7 +31,7 @@ MiniFE
program.

- Refer to the ``README`` file on how to run the program. For example:
``./charmrun +p4 ./miniFE.x nx=30 ny=30 nz=30 +vp32``
``./charmrun ++n 4 ./miniFE.x nx=30 ny=30 nz=30 +vp32``

MiniMD v2.0
~~~~~~~~~~~
@@ -44,7 +44,7 @@ MiniMD v2.0
execute ``make ampi`` to build the program.

- Refer to the ``README`` file on how to run the program. For example:
``./charmrun +p4 ./miniMD_ampi +vp32``
``./charmrun ++n 4 ./miniMD_ampi +vp32``

CoMD v1.1
~~~~~~~~~
@@ -72,7 +72,7 @@ MiniXYCE v1.0
``test/``.

- Example run command:
``./charmrun +p3 ./miniXyce.x +vp3 -circuit ../tests/cir1.net -t_start 1e-6 -pf params.txt``
``./charmrun ++n 3 ./miniXyce.x +vp3 -circuit ../tests/cir1.net -t_start 1e-6 -pf params.txt``

HPCCG v1.0
~~~~~~~~~~
@@ -84,7 +84,7 @@ HPCCG v1.0
AMPI compilers.

- Run with a command such as:
``./charmrun +p2 ./test_HPCCG 20 30 10 +vp16``
``./charmrun ++n 2 ./test_HPCCG 20 30 10 +vp16``

MiniAMR v1.0
~~~~~~~~~~~~
@@ -140,7 +140,7 @@ Lassen v1.0

- No changes necessary to enable AMPI virtualization. Requires some
C++11 support. Set ``AMPIDIR`` in Makefile and ``make``. Run with:
``./charmrun +p4 ./lassen_mpi +vp8 default 2 2 2 50 50 50``
``./charmrun ++n 4 ./lassen_mpi +vp8 default 2 2 2 50 50 50``

Kripke v1.1
~~~~~~~~~~~
@@ -167,7 +167,7 @@ Kripke v1.1

.. code-block:: bash

$ ./charmrun +p8 ./src/tools/kripke +vp8 --zones 64,64,64 --procs 2,2,2 --nest ZDG
$ ./charmrun ++n 8 ./src/tools/kripke +vp8 --zones 64,64,64 --procs 2,2,2 --nest ZDG

MCB v1.0.3 (2013)
~~~~~~~~~~~~~~~~~
@@ -181,7 +181,7 @@ MCB v1.0.3 (2013)

.. code-block:: bash

$ OMP_NUM_THREADS=1 ./charmrun +p4 ./../src/MCBenchmark.exe --weakScaling
$ OMP_NUM_THREADS=1 ./charmrun ++n 4 ./../src/MCBenchmark.exe --weakScaling
--distributedSource --nCores=1 --numParticles=20000 --multiSigma --nThreadCore=1 +vp16

.. _not-yet-ampi-zed-reason-1:
@@ -228,7 +228,7 @@ SNAP v1.01 (C version)
while the C version works out of the box on all platforms.

- Edit the Makefile for AMPI compiler paths and run with:
``./charmrun +p4 ./snap +vp4 --fi center_src/fin01 --fo center_src/fout01``
``./charmrun ++n 4 ./snap +vp4 --fi center_src/fin01 --fo center_src/fout01``

Sweep3D
~~~~~~~
@@ -248,7 +248,7 @@ Sweep3D

- Modify file ``input`` to set the different parameters. Refer to
file ``README`` on how to change those parameters. Run with:
``./charmrun ./sweep3d.mpi +p8 +vp16``
``./charmrun ./sweep3d.mpi ++n 8 +vp16``

PENNANT v0.8
~~~~~~~~~~~~
@@ -264,7 +264,7 @@ PENNANT v0.8

- For PENNANT-v0.8, point CC in Makefile to AMPICC and just ’make’. Run
with the provided input files, such as:
``./charmrun +p2 ./build/pennant +vp8 test/noh/noh.pnt``
``./charmrun ++n 2 ./build/pennant +vp8 test/noh/noh.pnt``

Benchmarks
----------
@@ -307,7 +307,7 @@ NAS Parallel Benchmarks (NPB 3.3)
*cg.256.C* will appear in the *CG* and ``bin/`` directories. To
run the particular benchmark, you must follow the standard
procedure of running AMPI programs:
``./charmrun ./cg.C.256 +p64 +vp256 ++nodelist nodelist``
``./charmrun ./cg.C.256 ++n 64 +vp256 ++nodelist nodelist``

NAS PB Multi-Zone Version (NPB-MZ 3.3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -340,7 +340,7 @@ NAS PB Multi-Zone Version (NPB-MZ 3.3)
directory. In the previous example, a file *bt-mz.256.C* will be
created in the ``bin`` directory. To run the particular benchmark,
you must follow the standard procedure of running AMPI programs:
``./charmrun ./bt-mz.C.256 +p64 +vp256 ++nodelist nodelist``
``./charmrun ./bt-mz.C.256 ++n 64 +vp256 ++nodelist nodelist``

HPCG v3.0
~~~~~~~~~
@@ -352,7 +352,7 @@ HPCG v3.0
- No AMPI-ization needed. To build, modify ``setup/Make.AMPI`` for
compiler paths, do
``mkdir build && cd build && configure ../setup/Make.AMPI && make``.
To run, do ``./charmrun +p16 ./bin/xhpcg +vp64``
To run, do ``./charmrun ++n 16 ./bin/xhpcg +vp64``

Intel Parallel Research Kernels (PRK) v2.16
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -408,7 +408,7 @@ HYPRE-2.11.1
``LIBFLAGS``. Then run ``make``.

- To run the ``new_ij`` test, run:
``./charmrun +p64 ./new_ij -n 128 128 128 -P 4 4 4 -intertype 6 -tol 1e-8 -CF 0 -solver 61 -agg_nl 1 27pt -Pmx 6 -ns 4 -mu 1 -hmis -rlx 13 +vp64``
``./charmrun ++n 64 ./new_ij -n 128 128 128 -P 4 4 4 -intertype 6 -tol 1e-8 -CF 0 -solver 61 -agg_nl 1 27pt -Pmx 6 -ns 4 -mu 1 -hmis -rlx 13 +vp64``

MFEM-3.2
~~~~~~~~
@@ -440,7 +440,7 @@ MFEM-3.2
- ``make parallel MFEM_USE_MPI=YES MPICXX=~/charm/bin/ampicxx HYPRE_DIR=~/hypre-2.11.1/src/hypre METIS_DIR=~/metis-4.0.3``

- To run an example, do
``./charmrun +p4 ./ex15p -m ../data/amr-quad.mesh +vp16``. You may
``./charmrun ++n 4 ./ex15p -m ../data/amr-quad.mesh +vp16``. You may
want to add the runtime options ``-no-vis`` and ``-no-visit`` to
speed things up.

@@ -464,10 +464,10 @@ XBraid-1.1
HYPRE in their Makefiles and ``make``.

- To run an example, do
``./charmrun +p2 ./ex-02 -pgrid 1 1 8 -ml 15 -nt 128 -nx 33 33 -mi 100 +vp8 ++local``.
``./charmrun ++n 2 ./ex-02 -pgrid 1 1 8 -ml 15 -nt 128 -nx 33 33 -mi 100 +vp8 ++local``.

- To run a driver, do
``./charmrun +p4 ./drive-03 -pgrid 2 2 2 2 -nl 32 32 32 -nt 16 -ml 15 +vp16 ++local``
``./charmrun ++n 4 ./drive-03 -pgrid 2 2 2 2 -nl 32 32 32 -nt 16 -ml 15 +vp16 ++local``

Other AMPI codes
----------------
4 changes: 2 additions & 2 deletions doc/charisma/manual.rst
@@ -483,7 +483,7 @@ Turing Cluster, use the customized job launcher ``rjq`` or ``rj``).

.. code-block:: bash

$ charmrun pgm +p4
$ charmrun pgm ++n 4

Please refer to Charm++'s manual and tutorial for more details of
building and running a Charm++ program.
@@ -619,7 +619,7 @@ instance, the following command uses ``RefineLB``.

.. code-block:: bash

$ ./charmrun ./pgm +p16 +balancer RefineLB
$ ./charmrun ./pgm ++n 16 +balancer RefineLB

.. _secsparse:

61 changes: 35 additions & 26 deletions doc/charm++/manual.rst
@@ -8452,7 +8452,7 @@ mode. For example:

.. code-block:: bash

$ ./charmrun hello +p4 +restart log
$ ./charmrun hello ++n 4 +restart log

Restarting is the reverse process of checkpointing. Charm++ allows
restarting the old checkpoint on a different number of physical
@@ -8481,7 +8481,7 @@ After a failure, the system may contain fewer or more processors. Once
the failed components have been repaired, some processors may become
available again. Therefore, the user may need the flexibility to restart
on a different number of processors than in the checkpointing phase.
This is allowed by giving a different ``++n N`` option at runtime. One
This is allowable by giving a different ``++n N`` option at runtime. One
thing to note is that the new load distribution might differ from the
previous one at checkpoint time, so running a load balancer (see
Section :numref:`loadbalancing`) after restart is suggested.
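
For instance, a checkpoint written earlier into a directory ``log``
could be brought back up on eight processors with a load balancer
enabled. A minimal sketch, assuming the program name ``hello`` and the
``RefineLB`` strategy:

.. code-block:: bash

   # restart on a different processor count and rebalance the new
   # load distribution
   $ ./charmrun hello ++n 8 +restart log +balancer RefineLB
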
@@ -8618,9 +8618,9 @@ it stores them in the local disk. The checkpoint files are named
Users can pass the runtime option ``+ftc_disk`` to activate this mode. For
example:

.. code-block:: c++
.. code-block:: bash

./charmrun hello +p8 +ftc_disk
./charmrun hello ++n 8 +ftc_disk

Building Instructions
^^^^^^^^^^^^^^^^^^^^^
@@ -8629,7 +8629,7 @@ In order to have the double local-storage checkpoint/restart
functionality available, the parameter ``syncft`` must be provided at
build time:

.. code-block:: c++
.. code-block:: bash

./build charm++ netlrts-linux-x86_64 syncft

@@ -8656,7 +8656,7 @@ name:

.. code-block:: bash

$ ./charmrun hello +p8 +kill_file <file>
$ ./charmrun hello ++n 8 +kill_file <file>

An example of this usage can be found in the ``syncfttest`` targets in
``tests/charm++/jacobi3d``.
@@ -9967,7 +9967,7 @@ program

.. code-block:: bash

$ ./charmrun pgm +p1000 +balancer RandCentLB +LBDump 2 +LBDumpSteps 4 +LBDumpFile lbsim.dat
$ ./charmrun pgm ++n 1000 +balancer RandCentLB +LBDump 2 +LBDumpSteps 4 +LBDumpFile lbsim.dat

This will collect data on files lbsim.dat.2,3,4,5. We can use this data
to analyze the performance of various centralized strategies using:
@@ -11330,7 +11330,7 @@ used, and a port number to listen the shrink/expand commands:

.. code-block:: bash

$ ./charmrun +p4 ./jacobi2d 200 20 +balancer GreedyLB ++nodelist ./mynodelistfile ++server ++server-port 1234
$ ./charmrun ++n 4 ./jacobi2d 200 20 +balancer GreedyLB ++nodelist ./mynodelistfile ++server ++server-port 1234

The CCS client to send shrink/expand commands needs to specify the
hostname, port number, the old(current) number of processor and the
@@ -11988,7 +11988,7 @@ To run a Charm++ program named “pgm” on four processors, type:

.. code-block:: bash

$ charmrun pgm +p4
$ charmrun pgm ++n 4

Execution on platforms which use platform specific launchers, (i.e.,
**aprun**, **ibrun**), can proceed without charmrun, or charmrun can be
@@ -12122,7 +12122,7 @@ advanced options are available:
``++p N``
Total number of processing elements to create. In SMP mode, this
refers to worker threads (where
:math:`\texttt{n} * \texttt{ppn} = \texttt{p}`), otherwise it refers
:math:`\texttt{n} \times \texttt{ppn} = \texttt{p}`), otherwise it refers
to processes (:math:`\texttt{n} = \texttt{p}`). The default is 1. Use
of ``++p`` is discouraged in favor of ``++processPer*`` (and
``++oneWthPer*`` in SMP mode) where desirable, or ``++n`` (and
@@ -12230,7 +12230,7 @@ The remaining options cover details of process launch and connectivity:

.. code-block:: bash

$ ./charmrun +p4 ./pgm 100 2 3 ++runscript ./set_env_script
$ ./charmrun ++n 4 ./pgm 100 2 3 ++runscript ./set_env_script

In this case, ``set_env_script`` is invoked on each node. **Note:** When this
is provided, ``charmrun`` will not invoke the program directly, instead only
@@ -12400,20 +12400,29 @@ like:

$ ./charmrun ++ppn 3 +p6 +pemap 1-3,5-7 +commap 0,4 ./app <args>

This will create two logical nodes/OS processes (2 = 6 PEs/3 PEs per
node), each with three worker threads/PEs (``++ppn 3``). The worker
threads/PEs will be mapped thusly: PE 0 to core 1, PE 1 to core 2, PE 2
to core 3 and PE 4 to core 5, PE 5 to core 6, and PE 6 to core 7.
PEs/worker threads 0-2 compromise the first logical node and 3-5 are the
second logical node. Additionally, the communication threads will be
mapped to core 0, for the communication thread of the first logical
node, and to core 4, for the communication thread of the second logical
node.

Please keep in mind that ``+p`` always specifies the total number of PEs
created by Charm++, regardless of mode (the same number as returned by
``CkNumPes()``). The ``+p`` option does not include the communication
thread, there will always be exactly one of those per logical node.
``CkNumPes()``). So this will create two logical nodes/OS processes
(2 = 6 PEs/3 PEs per node), each with three worker threads/PEs
(``++ppn 3``).

We recommend using ``++n``, especially with ``++ppn``. Recall
that :math:`\texttt{n} \times \texttt{ppn} = \texttt{p}`. So the example becomes:

.. code-block:: bash

$ ./charmrun ++ppn 3 ++n 2 +pemap 1-3,5-7 +commap 0,4 ./app <args>

The worker threads/PEs will be mapped as follows (``+pemap``): PE 0 to
core 1, PE 1 to core 2, PE 2 to core 3, PE 3 to core 5, PE 4 to
core 6, and PE 5 to core 7. PEs/worker threads 0-2 comprise the first
logical node and 3-5 the second logical node. Additionally, the
communication thread of the first logical node will be mapped to
core 0, and that of the second logical node to core 4 (``+commap``).

Note that the ``+p`` option does not include the communication
thread. There will always be exactly one of those per logical node.

Multicore Options
^^^^^^^^^^^^^^^^^
@@ -12526,7 +12535,7 @@ nodes than there are hosts in the group, it will reuse hosts. Thus,

.. code-block:: bash

$ charmrun pgm ++nodegroup kale-sun +p6
$ charmrun pgm ++nodegroup kale-sun ++n 6

uses hosts (charm, dp, grace, dagger, charm, dp) respectively as nodes
(0, 1, 2, 3, 4, 5).
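
A group such as ``kale-sun`` is simply a named block in the nodelist
file. A minimal sketch, assuming the hosts from the example above and
``~/.nodelist`` as the file location:

.. code-block:: bash

   # hypothetical nodelist defining the "kale-sun" group
   $ cat > ~/.nodelist <<'EOF'
   group kale-sun
   host charm
   host dp
   host grace
   host dagger
   EOF
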
@@ -12536,7 +12545,7 @@ Thus, if one specifies

.. code-block:: bash

$ charmrun pgm +p4
$ charmrun pgm ++n 4

it will use “localhost” four times. “localhost” is a Unix trick; it
always finds a name for whatever machine you’re on.
@@ -13237,7 +13246,7 @@ of the above incantation, for various kinds of process launchers:

.. code-block:: bash

$ ./charmrun +p2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments...
$ ./charmrun ++n 2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments...
$ aprun -n 2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments...

The first adaptation is to use :literal:`\`which valgrind\`` to obtain a
2 changes: 1 addition & 1 deletion doc/faq/manual.rst
@@ -204,7 +204,7 @@ following command:

.. code-block:: bash

./charmrun +p14 ./pgm ++ppn 7 +commap 0 +pemap 1-7
./charmrun ++n 2 ./pgm ++ppn 7 +commap 0 +pemap 1-7

See :ref:`sec-smpopts` of the Charm++ manual for more information.

2 changes: 1 addition & 1 deletion doc/libraries/manual.rst
@@ -36,7 +36,7 @@ client is a small Java program. A typical use of this is:

cd charm/examples/charm++/wave2d
make
./charmrun ./wave2d +p2 ++server ++server-port 1234
./charmrun ./wave2d ++n 2 ++server ++server-port 1234
~/ccs_tools/bin/liveViz localhost 1234

Use git to obtain a copy of ccs_tools (prior to using liveViz) and build