Motivation
A goal of Automate 2 is for all customers to trust our automatic management of upgrades and service state. Currently, however, Automate users and Chef Software support engineers cannot answer any of the following questions easily:
When was the last upgrade of this deployment?
What happened when this deployment was migrated from Automate 1?
When was the last time the user changed the deployment configuration?
The ability to answer such questions is essential to debugging problems with Chef Automate installation and upgrades and increasing overall trust in the system.
Feature Description
The Action Log will provide users and support staff with information about the historical and in-progress modifications of their Chef Automate installation via a command-line interface. Information will be provided on both deployment- and service-level events. This would include:
Logs for deployment-level events such as initial installation, upgrades, configuration requests, and service state remediations.
Logs for service-level events such as installs, loads, restarts, and reconfigurations.
This action log would persist across deployment-service restarts and allow you to query information about the entire history of the Chef Automate installation.
The following is example output of such commands (IDs and timestamps are reused or inconsistent in the example output for convenience).
Overall Deployment History
> chef-automate history
ID         StartTime                  Description          Status
17d37a16b  2018-09-01T10:31:22+01:00  Initial Deployment   Success
c6961888c  2018-09-03T09:13:41+01:00  Configuration Set    Success
c17a82a1a  2018-09-04T12:22:13+01:00  Automatic Upgrade    Success
87f4dab7e  2018-09-04T13:30:31+01:00  Service Remediation  FAILURE
e96644f81  2018-09-05T22:12:51+01:00  Service Remediation  In Progress
Or, for an A1 migration that failed once:
> chef-automate history
ID         StartTime                  Description        Status
17d37a16b  2018-09-01T10:31:22+01:00  Migration from A1  Failure
e96644f81  2018-09-01T10:31:29+01:00  Migration from A1  Success
c6961888c  2018-09-03T09:13:41+01:00  Configuration Set  Success
c17a82a1a  2018-09-04T12:22:13+01:00  Automatic Upgrade  In Progress
Individual Service History
The user can inspect the history of individual services. Every service-level action is tied to some deployment-level action.
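The link between service-level and deployment-level actions could be modelled roughly as below. This is only a sketch to make the relationship concrete: the class names, field names, and the `service_history` helper are hypothetical and not part of any existing Chef Automate API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class Status(Enum):
    IN_PROGRESS = "In Progress"
    SUCCESS = "Success"
    FAILURE = "Failure"


@dataclass
class ServiceEvent:
    # Hypothetical record for one service-level action
    # (install, load, restart, reconfiguration).
    service: str
    action: str
    start_time: datetime
    status: Status
    deployment_event_id: str  # ID of the parent deployment-level event


@dataclass
class DeploymentEvent:
    # Hypothetical record for one deployment-level action
    # (initial deployment, upgrade, configuration set, remediation).
    event_id: str
    description: str
    start_time: datetime
    status: Status
    service_events: List[ServiceEvent] = field(default_factory=list)

    def add_service_event(self, service, action, start_time, status):
        """Record a service-level action and tie it to this event."""
        event = ServiceEvent(service, action, start_time, status, self.event_id)
        self.service_events.append(event)
        return event


def service_history(deployment_events, service):
    """Return every service-level event for one service, oldest first."""
    matches = [
        se
        for de in deployment_events
        for se in de.service_events
        if se.service == service
    ]
    return sorted(matches, key=lambda se: se.start_time)
```

With a model like this, a per-service history view is a filter over the deployment-level log, and each row can point back to the deployment-level event that caused it.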
Individual Event History
The user can poll the status of individual events. The output here might be event-specific. Since we only have deployment-service logs, we also provide information about how to get logs from the system for this event:
> chef-automate show-event c17a82a1a
Event: c17a82a1a
Description: Automate Upgrade
Status: Success
Deployment Service Log:
time="2018-09-05T10:16:01Z" msg="Starting periodic converge" current_manifest=20180905100418 next_manifest=20180905100418
time="2018-09-05T10:16:01Z" msg="Found hart override" hart="&{chef deployment-service /go/src/github.com/chef/a2/results/chef-deployment-service-0.1.0-20180905100253-x86_64-linux.hart 0 1 0 20180905100253}" name=deployment-service origin=chef
... SNIP (actual log would be longer, with command to control output)...
To read the complete Chef Automate log (including all service
logs) for the duration of this event run:
journalctl -u chef-automate --since 2018-09-01T10:31:22+01:00 --until 2018-09-01T10:32:33+01:00
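As an illustration of how the suggested log command could be derived from an event's start and end times, here is a minimal sketch. The function name `journalctl_command` is hypothetical. It uses journalctl's `--since` and `--until` options and converts both timestamps to UTC in the `YYYY-MM-DD HH:MM:SS UTC` form, on the assumption that this form is accepted by the journalctl version in use.

```python
import shlex
from datetime import datetime, timezone


def journalctl_command(start, end, unit="chef-automate"):
    """Build a journalctl command line covering the window [start, end].

    Both datetimes must be timezone-aware so that the conversion to UTC
    is unambiguous.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end < start:
        raise ValueError("end must not be earlier than start")

    fmt = "%Y-%m-%d %H:%M:%S UTC"
    since = start.astimezone(timezone.utc).strftime(fmt)
    until = end.astimezone(timezone.utc).strftime(fmt)

    args = ["journalctl", "-u", unit, "--since", since, "--until", until]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    start = datetime.fromisoformat("2018-09-01T10:31:22+01:00")
    end = datetime.fromisoformat("2018-09-01T10:32:33+01:00")
    print(journalctl_command(start, end))
```

Normalising to UTC avoids depending on the local timezone of the machine where the user eventually runs the command.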
Data Retention and Key Events
The log should be persistent for the entire lifetime of an installation. However, since an installation may produce a large number of events, the Action Log will have data retention and pruning features to limit data growth while retaining key installation events.
Deployment service actions that resulted in no change (for instance, most periodic converges) will be aggressively pruned from the event history. Users will be able to configure the retention period for these events with a configuration value:
action_log.retention.unchanged_actions = "1h"
Deployment service actions that succeeded will be kept for a small amount of time by default. Users will be able to configure the retention period of these events with a configuration value:
action_log.retention.successful_actions = "2d"
Deployment service actions that failed will be kept for a longer amount of time:
action_log.retention.failed_actions = "20d"
Certain key events such as A1 migration and initial deployment will never be pruned.
Since journalctl will still contain full logs, users who want long-term storage of all cluster events can configure their journal log retention accordingly.
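The retention rules above could be applied along the following lines. This is a sketch under stated assumptions, not the planned implementation: the event dictionary shape, the outcome names (`unchanged`, `success`, `failure`, `in_progress`), and the function names are hypothetical, and the duration syntax is assumed to be an integer followed by `s`, `m`, `h`, or `d` (only `h` and `d` appear in the examples above).

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed duration syntax: an integer followed by a single unit letter.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

# Defaults taken from the example configuration values above.
DEFAULT_RETENTION = {"unchanged": "1h", "success": "2d", "failure": "20d"}

# Key events that are never pruned (descriptions as in the example output).
KEY_DESCRIPTIONS = {"Initial Deployment", "Migration from A1"}


def parse_retention(value):
    """Convert a string such as "2d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def should_prune(event, now, retention=None):
    """Decide whether one event has outlived its retention period."""
    retention = DEFAULT_RETENTION if retention is None else retention
    if event["description"] in KEY_DESCRIPTIONS:
        return False  # key events are kept forever
    if event["outcome"] == "in_progress":
        return False  # never prune an action that has not finished
    age = now - event["start_time"]
    return age > parse_retention(retention[event["outcome"]])


def prune(events, now, retention=None):
    """Return only the events that should be kept."""
    return [e for e in events if not should_prune(e, now, retention)]
```

Because the retention values are looked up per outcome, tightening or loosening a single category only requires changing one configuration value.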