
DTT1 - Test module if Ansible execution fails not report to Workflow #5411

Closed
2 tasks done
fcaffieri opened this issue May 21, 2024 · 7 comments · Fixed by #5423
fcaffieri commented May 21, 2024

Description

The objective of the issue is to solve two bugs:

  1. Bug when deleting the known hosts, since it produces the following error:
stderr: Host not found in /home/user/.ssh/known_hosts
  2. Bug where Ansible does not report errors to the Workflow, causing the Workflow itself not to fail when the tests fail

Tasks

  • Fix the iteration bug in the Ansible execution
  • Fix the problem that Ansible does not report the error to the Workflow when a playbook fails

Related

@fcaffieri fcaffieri self-assigned this May 21, 2024
@wazuhci wazuhci moved this to Backlog in Release 4.9.0 May 21, 2024
@fcaffieri (Member Author)

Update report

Analyzing bug number 1:

The problem occurs because Jinja is being used to render the playbooks, for example:

- hosts: localhost
  become: true
  become_user: "{{ current_user }}"
  tasks:
    - name: Cleaning old key ssh-keygen registries
      command: "ssh-keygen -f /home/{{ current_user }}/.ssh/known_hosts -R '{{ item }}'"
      loop: "{{ hosts_ip }}"

When the playbook is rendered with Jinja, it tries to substitute the value of item, which should instead be left as the iterable for Ansible's loop. By the time the playbook reaches Ansible, the item variable has already been replaced with an empty string, so Ansible cannot iterate the loop.
The implementation was modified so that templates that do not require it are not rendered with Jinja (within the test module the Jinja render is not required).
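The failure mode can be reproduced with a minimal sketch (it assumes only the jinja2 package; the playbook text is abbreviated from the example above). Jinja's default Undefined silently renders unknown variables as empty strings, which is exactly what destroys the loop placeholders:

```python
from jinja2 import Template

# Abbreviated playbook text containing both a workflow variable
# (current_user) and Ansible loop placeholders (item, hosts_ip).
playbook_text = (
    'command: "ssh-keygen -f /home/{{ current_user }}/.ssh/known_hosts -R \'{{ item }}\'"\n'
    'loop: "{{ hosts_ip }}"\n'
)

# Only current_user is known at render time; {{ item }} and {{ hosts_ip }}
# are resolved to empty strings by Jinja's default Undefined, so Ansible
# later has nothing to loop over.
rendered = Template(playbook_text).render(current_user="user")
print(rendered)
```

Skipping the Jinja render for these templates, as described above, leaves the `{{ item }}` and `{{ hosts_ip }}` expressions intact for Ansible to resolve.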

Test:

TASK [Cleaning old key ssh-keygen registries] **********************************
changed: [localhost] => (item=ec2-44-212-5-227.compute-1.amazonaws.com) => changed=true
   ansible_loop_var: item
   cmd:
   - ssh-keygen
   - -f
   - ~/.ssh/known_hosts
   - -R
   - ec2-44-212-5-227.compute-1.amazonaws.com
   delta: '0:00:00.007695'
   end: '2024-05-21 17:05:02.493476'
   item: ec2-44-212-5-227.compute-1.amazonaws.com
   msg: ''
   rc: 0
   start: '2024-05-21 17:05:02.485781'
   stderr: Host ec2-44-212-5-227.compute-1.amazonaws.com not found in /home/fcaffieri/.ssh/known_hosts
   stderr_lines: <omitted>
   stdout: ''
   stdout_lines: <omitted>

PLAY RECAP *********************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

The values are now replaced correctly.

@wazuhci wazuhci moved this from Backlog to In progress in Release 4.9.0 May 21, 2024
fcaffieri commented May 21, 2024

Update report

Analyzing bug number 2.

During the analysis of this bug, an error was found in the execution of commands that require sudo. In detail:

You have to execute this command, which installs the manager with the installation wizard:

commands = [
    f"curl -sO https://{s3_url}/{release}/wazuh-install.sh && bash wazuh-install.sh --wazuh-server {node_name} --ignore-check"
]
ConnectionManager.execute_commands(inventory_path, commands)

execute_commands runs the command over SSH like this:

ssh_command = [
    "ssh",
    "-i", data.get('private_key_path'),
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-p", str(data.get('port')),
    f"{data.get('username')}@{data.get('host')}",
    "sudo",
    command
]

sudo is prepended to all the commands that require it. The problem is that when two or more commands are chained with "&&", only the first command runs with sudo and the subsequent ones do not. In fact, this bug was found with the example mentioned above:

bash ./wazuh-install.sh --wazuh-server wazuh-1 --ignore-check -o
./wazuh-install.sh: line 2586: /var/log/wazuh-install.log: Permission denied
This script must be run as root.

Therefore, all executions of chained commands must be converted into lists of commands, since the executor accepts a list of commands and runs them one by one, prepending sudo to each. For the aforementioned example, it would look like this:

commands = [
    f"curl -sO https://{s3_url}/{release}/wazuh-install.sh",
    f"bash wazuh-install.sh --wazuh-server {node_name} --ignore-check"
]
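The per-command sudo behavior can be sketched as follows. This is a simplified illustration, not the project's actual ConnectionManager API: the helper name build_ssh_commands and its arguments are hypothetical.

```python
def build_ssh_commands(ssh_base, chained_command):
    """Split a '&&'-chained shell command into its parts and build one
    ssh argv per part, prepending sudo to each one. Hypothetical helper
    sketching the fix described above."""
    parts = [part.strip() for part in chained_command.split("&&")]
    return [ssh_base + ["sudo", part] for part in parts]

# Example: the chained install command becomes two ssh invocations,
# each carrying its own sudo (URL and key path are placeholders).
ssh_base = ["ssh", "-i", "key.pem", "user@host"]
chained = "curl -sO https://example.com/wazuh-install.sh && bash wazuh-install.sh --ignore-check"
for argv in build_ssh_commands(ssh_base, chained):
    print(argv)
```

Running each part as its own SSH command is what guarantees that sudo applies to every command, instead of only to the first word of the chain.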

After these changes the tests were satisfactory.


Analyzing bug number 2: it was detected that the result of the playbook execution is not analyzed; a failure is only detected if an exception is raised, not if the executed playbook(s) fail.
I am working on the fix so that, if any playbook fails, the failure is reported and the corresponding error is raised.
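The intended check can be sketched like this. It is a simplified illustration, not the project's code: the exception name PlaybookFailedError is hypothetical, and the result dict is assumed to be shaped like ansible-runner's per-host stats (failures / dark counters plus a return code).

```python
class PlaybookFailedError(Exception):
    """Raised when a playbook run reports failures (hypothetical name)."""

def check_playbook_result(result):
    """Inspect the playbook result instead of relying only on exceptions.

    `result` is assumed to be a dict with an "rc" return code and a
    "stats" mapping with per-host "failures" and "dark" (unreachable)
    counters, similar to ansible-runner's stats output.
    """
    stats = result.get("stats", {})
    failed_hosts = {}
    for key in ("failures", "dark"):  # "dark" counts unreachable hosts
        for host, count in stats.get(key, {}).items():
            if count > 0:
                failed_hosts[host] = failed_hosts.get(host, 0) + count
    if result.get("rc", 0) != 0 or failed_hosts:
        raise PlaybookFailedError(
            f"Playbook failed: rc={result.get('rc')}, failed hosts={failed_hosts}"
        )
```

With a check like this after every playbook run, a failed playbook raises an error that propagates to the Workflow instead of being silently swallowed.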

@fcaffieri (Member Author)

Update report

All modifications to fix the bugs were made.

Test forcing a failure in one playbook:

Input yaml:

version: 0.1
description: This workflow is used to test manager deployment for DDT1 PoC
variables:
  manager-os:
    - linux-ubuntu-20.04-amd64
  infra-provider: aws
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager}"
          - inventory-output: "{working-dir}/manager-{manager}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
    on-error: "abort-all"
    foreach:
      - variable: manager-os
        as: manager
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager}/track.yaml"

  # Generic manager test task
  - task: "run-manager-tests"
    description: "Run tests install for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-linux-ubuntu-20.04-amd64/inventory.yaml"
          - tests: "install,restart,stop,uninstall"
          - component: "manager"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: False
    depends-on:
      - "allocate-manager-linux-ubuntu-20.04-amd64"


The test shows the real error and does not continue with the rest of the tests:


Full log:

test-manager-aws.log

@fcaffieri fcaffieri linked a pull request May 22, 2024 that will close this issue
@fcaffieri (Member Author)

Update report

Full test:

Input yaml:

version: 0.1
description: This workflow is used to test agents deployment for DDT1 PoC
variables:
  agent-os:
    - linux-redhat-8-amd64
    - linux-redhat-8-arm64
    - linux-centos-8-amd64
    - linux-centos-8-arm64
    - linux-debian-12-amd64
    - linux-debian-12-arm64
    - linux-ubuntu-20.04-amd64
    - linux-ubuntu-20.04-arm64
    - linux-oracle-9-amd64
    - linux-amazon-2023-amd64
    - linux-amazon-2023-arm64

  windows-agent-os:
    - windows-server-2019-amd64
    - windows-server-2022-amd64

  macos-agent-os:
    - macos-ventura-13-amd64
    - macos-ventura-13-arm64

  manager-os: linux-ubuntu-22.04-amd64
  infra-provider: aws
  macos-infra-provider: vagrant
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager-os}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager-os}"
          - inventory-output: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"

  # Unique agent allocate task
  - task: "allocate-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique macOS agent allocate task
  - task: "allocate-macos-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{macos-infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: macos-agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique Windows agent allocate task
  - task: "allocate-windows-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: windows-agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique manager provision task
  - task: "provision-manager-{manager-os}"
    description: "Provision the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
            - component: wazuh-manager
              type: assistant
              version: 4.7.4
              live: True
    depends-on:
      - "allocate-manager-{manager-os}"
    on-error: "abort-all"

  # Generic agent test task
  - task: "run-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,connection,basic_info,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: agent-os
        as: agent
    depends-on:
      - "allocate-agent-{agent}"
      - "provision-manager-{manager-os}"

  # Generic macOS agent test task
  - task: "run-macos-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,basic_info,connection,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: macos-agent-os
        as: agent
    depends-on:
      - "allocate-macos-agent-{agent}"
      - "provision-manager-{manager-os}"

  # Generic Windows agent test task
  - task: "run-windows-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,basic_info,connection,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: windows-agent-os
        as: agent
    depends-on:
      - "allocate-windows-agent-{agent}"
      - "provision-manager-{manager-os}"

Result:
Failures are observed in the Windows and Debian 12 tests, as expected per the log; the purpose of the test is to show that the failure is returned to the Workflow so that the error is reported:

test-full-agent-with-aws-debug.log

@fcaffieri (Member Author)

New test without debug:

Result:

In this test, the failure is observed on Windows Server 2022; the error is displayed without the need for debug mode and is reported to the Workflow.


test-full-agent-with-aws.log

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.9.0 May 22, 2024
@wazuhci wazuhci moved this from Pending review to In review in Release 4.9.0 May 23, 2024
@pro-akim (Member)

Review Notes

LGTM

@wazuhci wazuhci moved this from In review to Pending final review in Release 4.9.0 May 23, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.9.0 May 23, 2024
rauldpm commented May 23, 2024

LGTM

@rauldpm rauldpm closed this as completed May 23, 2024
@wazuhci wazuhci moved this from In final review to Done in Release 4.9.0 May 23, 2024