
[feature idea] deployment-service action log #96

Open
stevendanna opened this issue Apr 16, 2019 · 1 comment


@stevendanna
Contributor

Motivation

A goal of Automate 2 is for all customers to trust our automatic management of upgrades and service state. Currently, however, Automate users and Chef Software support engineers cannot answer any of the following questions easily:

  • When was the last upgrade of this deployment?
  • What happened when this deployment was migrated from Automate 1?
  • When was the last time the user changed the deployment configuration?

The ability to answer such questions is essential for debugging problems with Chef Automate installations and upgrades, and for increasing overall trust in the system.

Feature Description

The Action Log will give users and support staff a command-line view of historical and in-progress modifications to their Chef Automate installation, covering both deployment-level and service-level events. This will include:

  • Logs for deployment-level events such as initial installation, upgrades, configuration requests, and service state remediations.
  • Logs for service-level events such as installs, loads, restarts, and reconfigurations.
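One way to picture the two event levels is as two record types, where each service-level record points back at the deployment-level record that caused it. The following is a minimal sketch assuming a Go implementation inside deployment-service; every type, field, and method name here (`DeploymentEvent`, `ServiceEvent`, `CauseID`, `CausedBy`) is hypothetical and not an existing Automate API.

```go
package main

import (
	"fmt"
	"time"
)

// Status is the outcome of an event. These hypothetical values mirror the
// Status column in the proposed CLI output.
type Status string

const (
	StatusInProgress Status = "In Progress"
	StatusSuccess    Status = "Success"
	StatusFailure    Status = "Failure"
)

// DeploymentEvent is a hypothetical record for a deployment-level action such
// as an initial installation, an upgrade, or a configuration request.
type DeploymentEvent struct {
	ID          string
	StartTime   time.Time
	EndTime     time.Time // zero while the event is still in progress
	Description string
	Status      Status
}

// ServiceEvent is a hypothetical record for a service-level action such as an
// install, load, restart, or reconfiguration.
type ServiceEvent struct {
	ID          string
	Service     string
	StartTime   time.Time
	Description string
	CauseID     string // ID of the DeploymentEvent that triggered this action
	Status      Status
}

// CausedBy reports whether s was triggered by the deployment event d.
func (s ServiceEvent) CausedBy(d DeploymentEvent) bool {
	return s.CauseID == d.ID
}

func main() {
	start := time.Date(2018, 9, 1, 10, 31, 22, 0, time.FixedZone("", 60*60))
	deploy := DeploymentEvent{ID: "17d37a16b", StartTime: start, Description: "Initial Deployment", Status: StatusSuccess}
	install := ServiceEvent{ID: "056eeacff", Service: "ingest-service", StartTime: start,
		Description: "Install", CauseID: deploy.ID, Status: StatusSuccess}
	fmt.Println(install.CausedBy(deploy)) // prints true
}
```

Keeping the link as a plain ID (rather than an embedded pointer) means a service event can still be stored and printed after its causing deployment event has been pruned.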

The action log will persist across deployment-service restarts, allowing users to query the entire history of their Chef Automate installation.

The following is example output of such commands (IDs and timestamps are reused or inconsistent in the example output for convenience).

Overall Deployment History

> chef-automate history
ID         StartTime                  Description          Status
17d37a16b  2018-09-01T10:31:22+01:00  Initial Deployment   Success
c6961888c  2018-09-03T09:13:41+01:00  Configuration Set    Success
c17a82a1a  2018-09-04T12:22:13+01:00  Automatic Upgrade    Success
87f4dab7e  2018-09-04T13:30:31+01:00  Service Remediation  Failure
e96644f81  2018-09-05T22:12:51+01:00  Service Remediation  In Progress

Or, for an A1 migration that failed once:

> chef-automate history
ID         StartTime                  Description        Status
17d37a16b  2018-09-01T10:31:22+01:00  Migration from A1  Failure
e96644f81  2018-09-01T10:31:29+01:00  Migration from A1  Success
c6961888c  2018-09-03T09:13:41+01:00  Configuration Set  Success
c17a82a1a  2018-09-04T12:22:13+01:00  Automatic Upgrade  In Progress

Individual Service History

Users can inspect the history of individual services. Every service-level action is tied to the deployment-level action that caused it, shown in the Cause column.

> chef-automate service-history ingest-service
ID         StartTime                  Description    Cause                           Status
056eeacff  2018-09-01T10:31:22+01:00  Install        Initial Deployment(17d37a16b)   Success
3cbaae510  2018-09-01T10:31:22+01:00  Configuration  Initial Deployment(17d37a16b)   Success
cf9978b43  2018-09-01T10:31:22+01:00  Load           Initial Deployment(17d37a16b)   Success
53a6e646e  2018-09-01T10:31:22+01:00  Install        Upgrade(c17a82a1a)              Success
88fafa013  2018-09-01T10:31:22+01:00  Configuration  Upgrade(c17a82a1a)              Success
3bf3a594c  2018-09-01T10:31:22+01:00  Load           Upgrade(c17a82a1a)              Success

Individual Event History

Users can poll the status of individual events. The output may vary by event type. Since the action log only holds deployment-service logs, the output also explains how to retrieve the full system logs for the duration of the event:

> chef-automate show-event c17a82a1a
      Event: c17a82a1a
Description: Automatic Upgrade
     Status: Success

Deployment Service Log:
time="2018-09-05T10:16:01Z" msg="Starting periodic converge" current_manifest=20180905100418 next_manifest=20180905100418
time="2018-09-05T10:16:01Z" msg="Found hart override" hart="&{chef deployment-service /go/src/github.com/chef/a2/results/chef-deployment-service-0.1.0-20180905100253-x86_64-linux.hart 0 1 0 20180905100253}" name=deployment-service origin=chef

... SNIP (actual log would be longer, with command to control output)...

To read the complete Chef Automate log (including all service
logs) for the duration of this event, run:

	journalctl -u chef-automate --since 2018-09-01T10:31:22+01:00 --until 2018-09-01T10:32:33+01:00

Data Retention and Key Events

The log should persist for the entire lifetime of an installation. However, since an installation may produce a large number of events, the action log will have data retention and pruning features to limit data growth while retaining key installation events.

Deployment service actions that resulted in no change (for instance, most periodic converges) will be aggressively pruned from the event history. Users will be able to configure the retention period for these events with a configuration value:

action_log.retention.unchanged_actions = "1h"

Deployment service actions that succeeded will be kept for a short period by default. Users will be able to configure the retention period for these events with a configuration value:

action_log.retention.successful_actions = "2d"

Deployment service actions that failed will be kept for a longer period:

action_log.retention.failed_actions = "20d"

Certain key events such as A1 migration and initial deployment will never be pruned.

Since the systemd journal (read with journalctl) will still contain the full logs, users who want long-term storage of all cluster events can configure their journal retention accordingly.

@jaym
Contributor

jaym commented Apr 16, 2019

Just a minor nit, but let's not call it "action log" because we already have something called actions in the product.
