
[IE CLDNN] Updated GPU device config (openvinotoolkit#6040)
vladimir-paramuzov authored Jun 9, 2021
1 parent f0cf1df commit aa4a18d
Showing 29 changed files with 301 additions and 278 deletions.
15 changes: 12 additions & 3 deletions docs/IE_DG/API_Changes.md
@@ -14,6 +14,15 @@ The sections below contain detailed list of changes made to the Inference Engine
* InferenceEngine::Parameter(std::shared_ptr<ngraph::Variant>& var)
* std::shared_ptr<ngraph::Variant> InferenceEngine::Parameter::asVariant() const
* InferenceEngine::Parameter::operator std::shared_ptr<ngraph::Variant>() const
* KEY_CLDNN_NV12_TWO_INPUTS GPU plugin option. Use KEY_GPU_NV12_TWO_INPUTS instead
* KEY_CLDNN_PLUGIN_PRIORITY GPU plugin option. Use KEY_GPU_PLUGIN_PRIORITY instead
* KEY_CLDNN_PLUGIN_THROTTLE GPU plugin option. Use KEY_GPU_PLUGIN_THROTTLE instead
* KEY_CLDNN_MEM_POOL GPU plugin option
* KEY_CLDNN_GRAPH_DUMPS_DIR GPU plugin option
* KEY_CLDNN_SOURCES_DUMPS_DIR GPU plugin option
* KEY_DUMP_KERNELS GPU plugin option
* KEY_TUNING_MODE GPU plugin option
* KEY_TUNING_FILE GPU plugin option
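
The sketch below is a minimal migration example for the renamed GPU options, assuming the replacement keys are exposed through a `GPU_CONFIG_KEY` macro in `<gpu/gpu_config.hpp>` (analogous to the old `CLDNN_CONFIG_KEY` macro) and that `"1"` is a valid throttle level; it is an illustration, not the plugin's documented usage:

```cpp
#include <inference_engine.hpp>
#include <gpu/gpu_config.hpp>  // assumed location of the GPU_CONFIG_KEY macros

int main() {
    InferenceEngine::Core core;

    // Before (deprecated): CLDNN-prefixed keys, e.g.
    // core.SetConfig({{CLDNN_CONFIG_KEY(PLUGIN_THROTTLE), "1"}}, "GPU");

    // After: GPU-prefixed keys (names taken from the deprecation notes above).
    core.SetConfig({{GPU_CONFIG_KEY(PLUGIN_THROTTLE), "1"}}, "GPU");
    core.SetConfig({{GPU_CONFIG_KEY(NV12_TWO_INPUTS),
                     InferenceEngine::PluginConfigParams::YES}}, "GPU");
    return 0;
}
```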

## 2021.3

@@ -528,7 +537,7 @@ The sections below contain detailed list of changes made to the Inference Engine
* DLIA_CONFIG_KEY(ENABLE_STREAMING) config key

### Removed API

* InferenceEngine::EltwiseLayer::Select from InferenceEngine::EltwiseLayer::eOperation enumeration

## 2019 R2
@@ -577,7 +586,7 @@ The sections below contain detailed list of changes made to the Inference Engine
* DLIA_CONFIG_KEY(IO_TRANSFORMATIONS_NATIVE) config key
* DLIA_CONFIG_KEY(DUMP_SUPPORTED_LAYERS_INFORMATION) config key
* GNA_CONFIG_VALUE(SW_FP32) config value for GNA_CONFIG_KEY(DEVICE_MODE) key
* MULTI_CONFIG_KEY(DEVICE_PRIORITIES) config key for `MULTI` device
* InferenceEngine::CNNNetReader::ReadNetwork(const std::wstring &filepath) new method
* InferenceEngine::CNNNetReader::ReadWeights(const std::wstring &filepath) new method
* InferenceEngine::ExecutableNetwork::ExecutableNetwork(IExecutableNetwork::Ptr actual, InferenceEnginePluginPtr plg) constructor with additional `plg` parameter
@@ -593,7 +602,7 @@ The sections below contain detailed list of changes made to the Inference Engine
* InferenceEngine::EltwiseLayer::Logical_NOT, InferenceEngine::EltwiseLayer::Mean, InferenceEngine::EltwiseLayer::Select extensions to InferenceEngine::EltwiseLayer::eOperation enumeration
* InferenceEngine::OneHotLayer new class
* InferenceEngine::SelectLayer new class
* InferenceEngine::BroadcastLayer new class
* InferenceEngine::MathLayer new class
* InferenceEngine::ReduceLayer new class
* InferenceEngine::TopKLayer new class
16 changes: 0 additions & 16 deletions docs/IE_DG/Extensibility_DG/GPU_Kernel.md
@@ -219,22 +219,6 @@ __kernel void example_relu_kernel(
## Debugging Tips<a name="debugging-tips"></a>
* **Dumping the Resulting Kernels**.
It is recommended to get a dump of the kernel with all of
the values set by the Inference Engine, such as tensor sizes,
floating-point, and integer kernel parameters. To get the dump, add the
following line to your code that configures the GPU plugin to output the
custom kernels:
@snippet snippets/GPU_Kernel.cpp part1
When the Inference Engine compiles the kernels for the specific network,
it also outputs the resulting code for the custom kernels. In the
directory of your executable, find files like
`clDNN_program0.cl`, `clDNN_program1.cl`. There are as many files as there are
distinct sets of parameters for your custom kernel: different input
tensor sizes and kernel parameters.
* **Using `printf` in the OpenCL™ Kernels**.
To debug the specific values, you can use `printf` in your kernels.
However, be careful: for instance, do not output excessively
39 changes: 0 additions & 39 deletions docs/IE_DG/GPU_Kernels_Tuning.md

This file was deleted.

20 changes: 5 additions & 15 deletions docs/IE_DG/Intro_to_Performance.md
@@ -22,7 +22,7 @@ $ benchmark_app -m <model.xml> -enforcebf16=false
Notice that for quantized (e.g., INT8) models the bfloat16 calculations (of the layers that remain in FP32) are disabled by default.
Refer to the [CPU Plugin documentation](supported_plugins/CPU.md) for more details.

Similarly, the GPU device has a dedicated config key to enable FP16 execution of the layers that remain in FP32 in quantized models (as quantization is typically performed on FP32 models); refer to the ENABLE_FP16_FOR_QUANTIZED_MODELS key in the [GPU Plugin documentation](supported_plugins/CL_DNN.md)
Similarly, the GPU device has a dedicated config key to enable FP16 execution of the layers that remain in FP32 in quantized models (as quantization is typically performed on FP32 models); refer to the ENABLE_FP16_FOR_QUANTIZED_MODELS key in the [GPU Plugin documentation](supported_plugins/GPU.md)

## Latency vs. Throughput
One way to increase computational efficiency is batching, which combines many (potentially tens) of
@@ -72,30 +72,20 @@ Instead, it is possible to keep a separate infer request per camera or another s

## Benchmark App
The [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample is the best performance reference.
It has a lot of device-specific knobs, but the primary usage is as simple as:
```bash
$ ./benchmark_app -d GPU -m <model> -i <input>
```
to measure the performance of the model on the GPU.
Or
```bash
$ ./benchmark_app -d CPU -m <model> -i <input>
```
to execute on the CPU instead.

For example, for the CPU throughput mode from the previous section, you can play with the number of streams (the `-nstreams` command-line parameter).
Try different values of the `-nstreams` argument from `1` to the number of CPU cores and find the one that provides the best performance. For example, on an 8-core CPU, compare `-nstreams 1` (a latency-oriented scenario) to `2`, `4`, and `8` streams. Notice that `benchmark_app` automatically queries/creates/runs the number of requests required to saturate the given number of streams.

Finally, notice that when you don't specify the number of streams with `-nstreams`, the "AUTO" value for the streams is used, e.g. for the CPU this is [CPU_THROUGHPUT_AUTO](supported_plugins/CPU.md). You can spot the actual value behind "AUTO" for your machine in the application output.
Notice that the "AUTO" number is not necessarily the most optimal, so it is generally recommended to experiment either with `benchmark_app`'s `-nstreams` as described above, or via the [new Workbench tool](@ref workbench_docs_Workbench_DG_Introduction). This allows you to simplify the app logic, as you don't need to combine multiple inputs into a batch to achieve good CPU performance.
Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using the Async API.
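
The same streams knob can also be set from application code. Below is a rough sketch (assuming a hypothetical `model.xml` and the standard `CONFIG_KEY`/`CONFIG_VALUE` macros from `ie_plugin_config.hpp`) that requests the AUTO value and then reads back what the CPU plugin resolved it to for the loaded network:

```cpp
#include <iostream>
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;

    // Let the CPU plugin pick the number of streams ("AUTO" behavior).
    core.SetConfig({{CONFIG_KEY(CPU_THROUGHPUT_STREAMS),
                     CONFIG_VALUE(CPU_THROUGHPUT_AUTO)}}, "CPU");

    // A fixed value found by sweeping -nstreams could be set instead, e.g. "4":
    // core.SetConfig({{CONFIG_KEY(CPU_THROUGHPUT_STREAMS), "4"}}, "CPU");

    auto network = core.ReadNetwork("model.xml");  // hypothetical model path
    auto exec = core.LoadNetwork(network, "CPU");

    // The executable network is expected to report the resolved stream count here.
    std::cout << exec.GetConfig(CONFIG_KEY(CPU_THROUGHPUT_STREAMS)).as<std::string>()
              << std::endl;
    return 0;
}
```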

## Kernels Tuning for GPU

The GPU backend comes with a feature that allows model tuning, so the workload is configured to better fit the hardware.

Tuning is a time-consuming process that internally executes every layer several (or even hundreds of) times to find the most performant configuration.

This configuration is saved into a JSON-formatted file whose name can be passed as a plugin parameter for the network. The GPU backend will process this data to configure kernels for the best performance.

For more details about kernels tuning and how to use it, please refer to [GPU Kernels Tuning](GPU_Kernels_Tuning.md).
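
For reference, here is a minimal sketch of how a tuning file was typically passed to the GPU plugin before these keys were deprecated; it assumes the generic `CONFIG_KEY`/`CONFIG_VALUE` macros (KEY_TUNING_MODE / KEY_TUNING_FILE) and uses a hypothetical model path and cache file name:

```cpp
#include <map>
#include <string>
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // hypothetical model path

    // Create (or extend) a JSON tuning cache while compiling the network for GPU.
    // Note: KEY_TUNING_MODE and KEY_TUNING_FILE are deprecated by this commit.
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(TUNING_MODE), CONFIG_VALUE(TUNING_CREATE)},
        {CONFIG_KEY(TUNING_FILE), "gpu_tuning_cache.json"}  // hypothetical file name
    };
    auto exec = core.LoadNetwork(network, "GPU", config);
    return 0;
}
```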
