
DTT1 - Test module if Ansible execution fails not report to Workflow #5411

Closed
2 tasks done
fcaffieri opened this issue May 21, 2024 · 7 comments · Fixed by #5423
fcaffieri commented May 21, 2024

Description

The objective of the issue is to solve two bugs:

  1. Bug when deleting the known hosts, since it produces the following error:
stderr: Host not found in /home/user/.ssh/known_hosts
  2. Bug where Ansible does not report errors to the Workflow, causing the Workflow itself not to fail when the tests fail

Tasks

  • Fix the iteration bug in the Ansible execution
  • Fix the problem that Ansible does not report the error to the Workflow when a playbook fails

Related

@fcaffieri fcaffieri self-assigned this May 21, 2024
@wazuhci wazuhci moved this to Backlog in Release 4.9.0 May 21, 2024
@fcaffieri (Member Author)

Update report

Analyzing bug number 1:

The problem occurs because Jinja is being used to render the playbooks, for example:

- hosts: localhost
  become: true
  become_user: "{{ current_user }}"
  tasks:
    - name: Cleaning old key ssh-keygen registries
      command: "ssh-keygen -f /home/{{ current_user }}/.ssh/known_hosts -R '{{ item }}'"
      loop: "{{ hosts_ip }}"

When the playbook is rendered with Jinja, it tries to substitute the value of item, which should instead be left as the iterable for Ansible's loop. By the time the playbook reaches Ansible, the item variable has already been replaced with an empty string, so Ansible cannot iterate the loop.
The implementation was modified so that templates that do not require it are not rendered with Jinja (within the test module the Jinja render is not required).
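The failure mode can be reproduced with a minimal sketch (it assumes only the jinja2 package; the playbook text is abbreviated from the example above). Jinja's default Undefined silently renders unknown variables as empty strings, which is exactly what destroys the loop placeholders:

```python
from jinja2 import Template

# Abbreviated playbook text containing both a workflow variable
# (current_user) and Ansible loop placeholders (item, hosts_ip).
playbook_text = (
    'command: "ssh-keygen -f /home/{{ current_user }}/.ssh/known_hosts -R \'{{ item }}\'"\n'
    'loop: "{{ hosts_ip }}"\n'
)

# Only current_user is known at render time; {{ item }} and {{ hosts_ip }}
# are resolved to empty strings by Jinja's default Undefined, so Ansible
# later has nothing to loop over.
rendered = Template(playbook_text).render(current_user="user")
print(rendered)
```

Skipping the Jinja render for these templates, as described above, leaves the `{{ item }}` and `{{ hosts_ip }}` expressions intact for Ansible to resolve.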

Test:

TASK [Cleaning old key ssh-keygen registries] **********************************
changed: [localhost] => (item=ec2-44-212-5-227.compute-1.amazonaws.com) => changed=true
   ansible_loop_var: item
   cmd:
   - ssh-keygen
   - -f
   - ~/.ssh/known_hosts
   - -R
   - ec2-44-212-5-227.compute-1.amazonaws.com
   delta: '0:00:00.007695'
   end: '2024-05-21 17:05:02.493476'
   item: ec2-44-212-5-227.compute-1.amazonaws.com
   msg: ''
   rc: 0
   start: '2024-05-21 17:05:02.485781'
   stderr: Host ec2-44-212-5-227.compute-1.amazonaws.com not found in /home/fcaffieri/.ssh/known_hosts
   stderr_lines: <omitted>
   stdout: ''
   stdout_lines: <omitted>

PLAY RECAP *********************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

The values are now replaced correctly.

@wazuhci wazuhci moved this from Backlog to In progress in Release 4.9.0 May 21, 2024
fcaffieri commented May 21, 2024

Update report

Analyzing bug number 2.

During the analysis of this bug, an error was found in the execution of commands that require sudo. In detail:

You have to execute this command, which installs the manager with the installation wizard:

commands = [
    f"curl -sO https://{s3_url}/{release}/wazuh-install.sh && bash wazuh-install.sh --wazuh-server {node_name} --ignore-check"
]
ConnectionManager.execute_commands(inventory_path, commands)

execute_commands runs the command over SSH like this:

ssh_command = [
    "ssh",
    "-i", data.get('private_key_path'),
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-p", str(data.get('port')),
    f"{data.get('username')}@{data.get('host')}",
    "sudo",
    command
]

sudo is prepended to all the commands that require it. The problem is that when two or more commands are chained with "&&", only the first command runs with sudo and the subsequent ones do not. In fact, this bug was found with the example mentioned above:

bash ./wazuh-install.sh --wazuh-server wazuh-1 --ignore-check -o
./wazuh-install.sh: line 2586: /var/log/wazuh-install.log: Permission denied
This script must be run as root.

Therefore, all executions of chained commands must be converted into lists of commands, since the executor accepts a list of commands and runs them one by one, prepending sudo to each. For the aforementioned example, it would look like this:

commands = [
    f"curl -sO https://{s3_url}/{release}/wazuh-install.sh",
    f"bash wazuh-install.sh --wazuh-server {node_name} --ignore-check"
]
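The per-command sudo behavior can be sketched as follows. This is a simplified illustration, not the project's actual ConnectionManager API: the helper name build_ssh_commands and its arguments are hypothetical.

```python
def build_ssh_commands(ssh_base, chained_command):
    """Split a '&&'-chained shell command into its parts and build one
    ssh argv per part, prepending sudo to each one. Hypothetical helper
    sketching the fix described above."""
    parts = [part.strip() for part in chained_command.split("&&")]
    return [ssh_base + ["sudo", part] for part in parts]

# Example: the chained install command becomes two ssh invocations,
# each carrying its own sudo (URL and key path are placeholders).
ssh_base = ["ssh", "-i", "key.pem", "user@host"]
chained = "curl -sO https://example.com/wazuh-install.sh && bash wazuh-install.sh --ignore-check"
for argv in build_ssh_commands(ssh_base, chained):
    print(argv)
```

Running each part as its own SSH command is what guarantees that sudo applies to every command, instead of only to the first word of the chain.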

After these changes the tests were satisfactory.


Analyzing bug number 2: it was detected that the result of the playbook execution is not analyzed; a failure is only detected if an exception is raised, not if the executed playbook(s) fail.
I am working on the fix so that, if any playbook fails, the failure is reported and the corresponding error is raised.
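The intended check can be sketched like this. It is a simplified illustration, not the project's code: the exception name PlaybookFailedError is hypothetical, and the result dict is assumed to be shaped like ansible-runner's per-host stats (failures / dark counters plus a return code).

```python
class PlaybookFailedError(Exception):
    """Raised when a playbook run reports failures (hypothetical name)."""

def check_playbook_result(result):
    """Inspect the playbook result instead of relying only on exceptions.

    `result` is assumed to be a dict with an "rc" return code and a
    "stats" mapping with per-host "failures" and "dark" (unreachable)
    counters, similar to ansible-runner's stats output.
    """
    stats = result.get("stats", {})
    failed_hosts = {}
    for key in ("failures", "dark"):  # "dark" counts unreachable hosts
        for host, count in stats.get(key, {}).items():
            if count > 0:
                failed_hosts[host] = failed_hosts.get(host, 0) + count
    if result.get("rc", 0) != 0 or failed_hosts:
        raise PlaybookFailedError(
            f"Playbook failed: rc={result.get('rc')}, failed hosts={failed_hosts}"
        )
```

With a check like this after every playbook run, a failed playbook raises an error that propagates to the Workflow instead of being silently swallowed.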

@fcaffieri (Member Author)

Update report

All modifications to fix the bugs were made.

Test forcing a failure in one playbook:

Input yaml:

version: 0.1
description: This workflow is used to test manager deployment for DDT1 PoC
variables:
  manager-os:
    - linux-ubuntu-20.04-amd64
  infra-provider: aws
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager}"
          - inventory-output: "{working-dir}/manager-{manager}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
    on-error: "abort-all"
    foreach:
      - variable: manager-os
        as: manager
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager}/track.yaml"

  # Generic manager test task
  - task: "run-manager-tests"
    description: "Run tests install for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-linux-ubuntu-20.04-amd64/inventory.yaml"
          - tests: "install,restart,stop,uninstall"
          - component: "manager"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: False
    depends-on:
      - "allocate-manager-linux-ubuntu-20.04-amd64"


The test shows the real error and does not continue with the rest of the tests:


Full log:

test-manager-aws.log

@fcaffieri fcaffieri linked a pull request May 22, 2024 that will close this issue
@fcaffieri (Member Author)

Update report

Full test:

Input yaml:

version: 0.1
description: This workflow is used to test agents deployment for DDT1 PoC
variables:
  agent-os:
    - linux-redhat-8-amd64
    - linux-redhat-8-arm64
    - linux-centos-8-amd64
    - linux-centos-8-arm64
    - linux-debian-12-amd64
    - linux-debian-12-arm64
    - linux-ubuntu-20.04-amd64
    - linux-ubuntu-20.04-arm64
    - linux-oracle-9-amd64
    - linux-amazon-2023-amd64
    - linux-amazon-2023-arm64

  windows-agent-os:
    - windows-server-2019-amd64
    - windows-server-2022-amd64

  macos-agent-os:
    - macos-ventura-13-amd64
    - macos-ventura-13-arm64

  manager-os: linux-ubuntu-22.04-amd64
  infra-provider: aws
  macos-infra-provider: vagrant
  working-dir: /tmp/dtt1-poc

tasks:
  # Unique manager allocate task
  - task: "allocate-manager-{manager-os}"
    description: "Allocate resources for the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{manager-os}"
          - inventory-output: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/manager-{manager-os}/track.yaml"

  # Unique agent allocate task
  - task: "allocate-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique macOS agent allocate task
  - task: "allocate-macos-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{macos-infra-provider}"
          - size: small
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: macos-agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique Windows agent allocate task
  - task: "allocate-windows-agent-{agent}"
    description: "Allocate resources for the agent."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: create
          - provider: "{infra-provider}"
          - size: large
          - composite-name: "{agent}"
          - inventory-output: "{working-dir}/agent-{agent}/inventory.yaml"
          - track-output: "{working-dir}/agent-{agent}/track.yaml"
          - label-termination-date: "1d"
          - label-team: "qa"
          - label-issue: "https://github.com/wazuh/wazuh-qa/issues/5191"
    on-error: "abort-all"
    foreach:
      - variable: windows-agent-os
        as: agent
    cleanup:
      this: process
      with:
        path: python3
        args:
          - modules/allocation/main.py
          - action: delete
          - track-output: "{working-dir}/agent-{agent}/track.yaml"

  # Unique manager provision task
  - task: "provision-manager-{manager-os}"
    description: "Provision the manager."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/provision/main.py
          - inventory: "{working-dir}/manager-{manager-os}/inventory.yaml"
          - install:
            - component: wazuh-manager
              type: assistant
              version: 4.7.4
              live: True
    depends-on:
      - "allocate-manager-{manager-os}"
    on-error: "abort-all"

  # Generic agent test task
  - task: "run-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,connection,basic_info,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: agent-os
        as: agent
    depends-on:
      - "allocate-agent-{agent}"
      - "provision-manager-{manager-os}"

  # Generic macOS agent test task
  - task: "run-macos-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,basic_info,connection,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: macos-agent-os
        as: agent
    depends-on:
      - "allocate-macos-agent-{agent}"
      - "provision-manager-{manager-os}"

  # Generic Windows agent test task
  - task: "run-windows-agent-{agent}-tests"
    description: "Run tests install for the agent {agent}."
    do:
      this: process
      with:
        path: python3
        args:
          - modules/testing/main.py
          - targets:
            - wazuh-1: "{working-dir}/manager-{manager-os}/inventory.yaml"
            - agent: "{working-dir}/agent-{agent}/inventory.yaml"
          - tests: "install,registration,basic_info,connection,restart,stop,uninstall"
          - component: "agent"
          - wazuh-version: "4.7.4"
          - wazuh-revision: "40717"
          - live: True
    foreach:
      - variable: windows-agent-os
        as: agent
    depends-on:
      - "allocate-windows-agent-{agent}"
      - "provision-manager-{manager-os}"

Result:
Failures are observed in the Windows and Debian 12 tests, as expected per the log; the purpose of the test is to show that the failure is returned to the Workflow so that the error is reported:

test-full-agent-with-aws-debug.log

@fcaffieri (Member Author)

New test without debug:

Result:

In this test, the failure is observed on Windows Server 2022; the error is displayed without the need for debug mode and is reported to the Workflow.


test-full-agent-with-aws.log

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.9.0 May 22, 2024
@wazuhci wazuhci moved this from Pending review to In review in Release 4.9.0 May 23, 2024
@pro-akim (Member)

Review Notes

LGTM

@wazuhci wazuhci moved this from In review to Pending final review in Release 4.9.0 May 23, 2024
@wazuhci wazuhci moved this from Pending final review to In final review in Release 4.9.0 May 23, 2024
rauldpm commented May 23, 2024

LGTM

@rauldpm rauldpm closed this as completed May 23, 2024
@wazuhci wazuhci moved this from In final review to Done in Release 4.9.0 May 23, 2024