Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-666] - Robust health check #715

Merged
merged 28 commits into from
Nov 6, 2023
Merged

Conversation

clemire
Copy link
Contributor

@clemire clemire commented Oct 27, 2023

Changelist

Add HealthCheck interface along with an implementation, and migrate the pricefeed daemon to conform to the interface.

The pricefeed daemon actually does expect to update on a regular cadence, since the update loop does not do any work / computation besides iterating over a cache - so we continue to use this logic to determine daemon health.

Note: I removed the panic that was in the pricefeed daemon on invalid updates. We'll consolidate this logic and rely on this panic to be triggered by the update monitor instead. This also gives users the future option to disable panics for the pricefeed daemon, if they'd like, once we add flag support for delaying or disabling panics on daemon failure.

Upcoming PRs:

  • migrate liquidations with appropriate health-check logic
  • migrate bridge with appropriate health-check logic
  • update the UpdateMonitor code to use the HealthCheck interface instead of relying on daemon response frequency to detect errors.

Test Plan

Existing tests

  • added a new HealthCheck unit tests that validates expected daemon health state after a successful or failed price update.

Author/Reviewer Checklist

  • If this PR has changes that result in a different app state given the same prior state and transaction list, manually add the state-breaking label.
  • If this PR isn't state-breaking but has changes that modify behavior in PrepareProposal or ProcessProposal, manually add the label proposal-breaking.
  • If this PR is one of many that implement a specific feature, manually label them all feature:[feature-name].
  • If you wish to for mergify-bot to automatically create a PR to backport your change to a release branch, manually add the label backport/[branch-name].
  • Manually add any of the following labels: refactor, chore, bug.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2023

Walkthrough

The changes primarily focus on enhancing the health check functionality of the pricefeed daemon in the protocol/daemons/pricefeed/client package. The Client struct now includes a HealthCheckable interface to track the daemon's health. The StartPriceUpdater function has been modified to take an additional parameter c *Client. New test cases have been added to validate the health check functionality. A new error type ErrEmptyMarketPriceUpdate has been introduced to handle empty market price updates.

Changes

File(s) Summary
protocol/daemons/pricefeed/client/client.go Added sync/atomic package import. Added isHealthy field of type atomic.Bool to the Client struct. Implemented the DaemonClient interface for the Client struct. Added HealthCheck method to check the health of the daemon. Added setHealth method to set the health of the daemon. Modified the newClient function to initialize the isHealthy field. Modified the newTickerWithStop method to accept an interval in milliseconds. Added StartPriceUpdater method to the subTaskRunner with the Client as a parameter. Added c.isHealthy.Store(false) in the Stop method to set the daemon as unhealthy.
protocol/daemons/pricefeed/client/client_test.go The diff includes changes to the FakeSubTaskRunner struct, the StartPriceUpdater function, the TestStart_InvalidConfig function, the TestPriceUpdater_Mixed function, and the addition of the TestHealthCheck function. The StartPriceUpdater function now takes an additional parameter c *Client. The TestStart_InvalidConfig function now checks for an error message related to the health of the pricefeed daemon. The TestPriceUpdater_Mixed function includes a new test case with an empty market price update error. The TestHealthCheck function tests the health check functionality.
protocol/daemons/pricefeed/client/sub_task_runner.go The StartPriceUpdater method of the SubTaskRunner interface now takes an additional parameter c *Client. The StartPriceUpdater method of the SubTaskRunnerImpl struct now takes an additional parameter c *Client and sets the health status of the client based on the success of the price updater task. The RunPriceUpdaterTaskLoop function now returns an error types.ErrEmptyMarketPriceUpdate if the length of the market price updates is 0.
protocol/daemons/pricefeed/client/types/errors.go Added a new file types with a package-level variable ErrEmptyMarketPriceUpdate of type error that represents an error when a market price update has a length of 0. No alterations to existing code or signatures.
protocol/daemons/types/daemon_client.go Added a new interface called DaemonClient in the types package. The DaemonClient interface extends the existing HealthCheckable interface. Additionally, there is a commented out method called Stoppable which is marked as a TODO item with a reference to a ticket (CORE-29). The diff does not include any alterations to the signatures of exported functions, global data structures, global variables, or return values.
protocol/daemons/types/health_checkable.go Added a new package types with an interface HealthCheckable that defines a method HealthCheck which takes a context and returns an error if the service is unhealthy. This interface is intended to be implemented by services that can be health checked.
protocol/daemons/pricefeed/client/client.go The diff introduces the following changes:
  • The sync/atomic package is imported.
  • A new field isHealthy of type atomic.Bool is added to the Client struct.
  • The HealthCheck method is added to the Client struct, which checks the health of the daemon based on the isHealthy field.
  • The setHealth method is added to the Client struct, which sets the health of the daemon.
  • The setHealth method is called in the start method to indicate that the daemon is healthy.
  • The isHealthy field is set to false in the Stop method to indicate that the daemon is not healthy.
  • The StartPriceUpdater method is called with an additional argument c (the client) in the start method. |
    | protocol/daemons/pricefeed/client/client_test.go | The diff includes changes to the FakeSubTaskRunner struct, StartPriceUpdater function, TestStart_InvalidConfig function, TestPriceUpdater_Mixed function, and the addition of the TestHealthCheck function. The StartPriceUpdater function now takes an additional parameter c *Client. The TestStart_InvalidConfig function now checks for an error message related to the daemon being uninitialized. The TestPriceUpdater_Mixed function includes a new test case with an empty market price update error. The TestHealthCheck function tests the health check functionality of the price updater. |
    | protocol/daemons/pricefeed/client/sub_task_runner.go | Added a new parameter c *Client to the StartPriceUpdater function in the SubTaskRunner interface and its implementation. The StartPriceUpdater function now sets the health status of the client based on the success or failure of the price updater task. In the RunPriceUpdaterTaskLoop function, if the length of the market price updates is 0, it returns an error types.ErrEmptyMarketPriceUpdate. |
    | protocol/daemons/pricefeed/client/types/errors.go | Added a new file types with a package declaration. The file defines a new error variable ErrEmptyMarketPriceUpdate with the value "Market price update has length of 0". No alterations to existing code or signatures are made. |
    | protocol/daemons/types/health_checkable.go | Added a new interface called HealthCheckable in the types package. The interface defines a method HealthCheck that takes a context and returns an error if the service is unhealthy. No alterations to existing code or global variables. |
    | protocol/daemons/types/health_checkable_test.go | The diff introduces a new test file types_test.go that tests the HealthCheckableImpl struct in the types package. The tests cover various scenarios of updating and checking the health status. The HealthCheckableImpl struct is initialized with a time provider and has methods to record update successes and failures, as well as perform health checks. The tests verify the expected behavior of the health check based on the recorded updates. No alterations to the signatures of exported functions, global data structures, global variables, interfaces, return values, or thrown exceptions are made. |
    | protocol/daemons/pricefeed/client/client.go | The diff introduces the following changes:
  • The Client struct now includes the HealthCheckableImpl field to track the health of the daemon.
  • The Client struct implements the HealthCheckable interface.
  • The newClient function initializes the health status of the daemon.
  • The newTickerWithStop function now expects the Client as a parameter.
  • The Stop method toggles the daemon into an unhealthy state when it stops.
  • The start method now expects the Client as a parameter when calling subTaskRunner.StartPriceUpdater. |
    | protocol/daemons/pricefeed/client/client_test.go | The diff includes changes to the FakeSubTaskRunner struct, StartPriceUpdater function, TestStart_InvalidConfig function, TestPriceUpdater_Mixed function, and the addition of the TestHealthCheck_Mixed function. The StartPriceUpdater function now takes an additional parameter c *Client. The TestStart_InvalidConfig function now checks for an error message related to the health of the pricefeed daemon. The TestPriceUpdater_Mixed function includes a new test case with an empty market price update error. The TestHealthCheck_Mixed function tests the health check functionality of the price updater. |
    | protocol/daemons/pricefeed/client/sub_task_runner.go | Added error handling and logging in the StartPriceUpdater function of the SubTaskRunnerImpl struct. The StartPriceUpdater function now takes a c *Client parameter. In the RunPriceUpdaterTaskLoop function, if the length of the request.MarketPriceUpdates is 0, it returns an error types.ErrEmptyMarketPriceUpdate. |
    | protocol/daemons/types/health_checkable.go | The diff introduces a new package called "types" that contains an interface called "HealthCheckable" and a struct called "timeBoundedHealthCheckable". The "HealthCheckable" interface defines methods for health checking a service, reporting success or failure, and performing a health check. The "timeBoundedHealthCheckable" struct implements the "HealthCheckable" interface and tracks the timestamps of the last successful and failed updates. It also provides methods for reporting success or failure and performing health checks. The implementation includes synchronization using a mutex. No alterations to existing code or global variables are mentioned in the diff. |
    | protocol/daemons/types/health_checkable_test.go | The diff introduces a new test function TestHealthCheckableImpl_Mixed in the types_test package. This test function tests the behavior of a health checkable implementation in different scenarios. It includes test cases for both healthy and unhealthy states based on the updates reported to the health checkable instance. The test cases cover scenarios such as no updates, no successful updates, recent successful updates, recent failed updates, and updates exceeding a certain time limit. The test function uses a mock time provider to control the timestamps used during the test. No alterations to the signatures of exported functions, global data structures, global variables, interfaces, return values, or thrown exceptions are made. |
    | protocol/daemons/types/health_checkable.go | The diff introduces a new package called "types" that contains an interface called "HealthCheckable" and a struct called "timeBoundedHealthCheckable". The "HealthCheckable" interface defines methods for health checking a service, reporting success or failure, and performing a health check. The "timeBoundedHealthCheckable" struct implements the "HealthCheckable" interface and tracks the timestamps of the last successful and failed updates. It provides methods for reporting success or failure, performing health checks, and initializing the health status. The implementation includes synchronization using a mutex. The diff also introduces a constant "MaximumAcceptableUpdateDelay" and a function "NewHealthCheckableImpl" for creating a new instance of "HealthCheckableImpl". |
    | protocol/daemons/types/health_checkable_test.go | The diff introduces a new test file for the types package. It includes tests for the HealthCheckableImpl struct and its methods. The tests cover different scenarios of health check status based on the recorded updates. The code changes include the addition of test cases, test data, and test assertions. |
    | protocol/daemons/pricefeed/client/client.go | The diff introduces the following changes:
  • Import of the "cosmossdk.io/errors" package.
  • Import of the "github.com/dydxprotocol/v4-chain/protocol/lib/time" package.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunner" interface and its implementation.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "SubTaskRunnerImpl" struct.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "StartPriceUpdater" function.
  • Addition of the "c *Client" parameter to the "StartPriceUpdater" function in the "

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

@clemire clemire force-pushed the crystal/CORE-666-robust-health-check branch from 108949f to 29919ca Compare October 27, 2023 17:54
@clemire clemire marked this pull request as ready for review October 27, 2023 22:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between da31d6e and b8b00c1.
Files selected for processing (6)
  • protocol/daemons/pricefeed/client/client.go (5 hunks)
  • protocol/daemons/pricefeed/client/client_test.go (4 hunks)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (5 hunks)
  • protocol/daemons/pricefeed/client/types/errors.go (1 hunks)
  • protocol/daemons/types/daemon_client.go (1 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • protocol/daemons/pricefeed/client/types/errors.go
Additional comments: 20
protocol/daemons/types/health_checkable.go (1)
  • 1-11: The HealthCheckable interface is well defined and follows the best practices of Go interfaces. It is simple, small, and provides a single responsibility, which is to check the health of a service. The use of context allows for timeout control, which is a good practice for potentially long-running operations.
protocol/daemons/types/daemon_client.go (1)
  • 1-8: The new DaemonClient interface is a good addition as it provides a clear contract for what a daemon client should be able to do. It currently extends the HealthCheckable interface, which is a good start. The TODO comment indicates that there are plans to extend it further with a Stoppable interface, which would be a good addition for managing the lifecycle of the daemons.
protocol/daemons/pricefeed/client/sub_task_runner.go (5)
  • 38-41: The StartPriceUpdater method in the SubTaskRunner interface now requires a Client parameter. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 80-83: The StartPriceUpdater method in the SubTaskRunnerImpl struct now requires a Client parameter. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 93-95: The error handling logic has been improved. Instead of panicking when an error occurs, the error is now logged and the health status of the client is set based on the success of the task.

  • 271-274: The comment has been updated to reflect the new error handling logic. Instead of panicking when the length of the market price updates is 0, an error is now returned.

  • 299-301: An error types.ErrEmptyMarketPriceUpdate is now returned when the length of the market price updates is 0. This is a good practice as it allows the caller to handle the error appropriately.

protocol/daemons/pricefeed/client/client_test.go (5)
  • 54-54: The StartPriceUpdater function signature has been updated to include a Client parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 251-253: The test now checks that the daemon is not healthy on startup and becomes healthy after the first successful market update. This is a good practice as it ensures that the health check functionality is working as expected.

  • 627-627: The test case "No exchange market prices, does not call UpdateMarketPrices" now expects an error types.ErrEmptyMarketPriceUpdate when there are no exchange market prices. This is a good practice as it ensures that the function behaves as expected when there are no exchange market prices.

  • 730-751: The singleTickTickerAndStop function creates a ticker that ticks once before the stop channel is signaled. This function is used in the tests to simulate a single tick of the price updater. This is a good practice as it allows for more precise control over the timing of the tests.

  • 753-811: The TestHealthCheck function tests the health check functionality of the daemon. It first tests that the daemon becomes healthy after a successful market price update, and then tests that the daemon becomes unhealthy after a failed market price update. This is a good practice as it ensures that the health check functionality is working as expected.

protocol/daemons/pricefeed/client/client.go (8)
  • 8-8: The import of sync/atomic is new and is used for atomic operations on the isHealthy field of the Client struct. This ensures that read and write operations on isHealthy are thread-safe.

  • 54-58: The isHealthy field has been added to the Client struct. It is an atomic boolean that indicates whether the daemon is healthy. The atomic boolean ensures that read and write operations on isHealthy are thread-safe.

  • 61-61: The assertion that Client implements the DaemonClient interface is new. This is a good practice as it will cause a compile-time error if Client does not satisfy the DaemonClient interface.

  • 67-75: The HealthCheck method has been added to the Client struct. It checks the isHealthy field and returns an error if the daemon is unhealthy. This is a good practice as it allows for health checks of the daemon.

  • 78-79: The setHealth method has been added to the Client struct. It sets the isHealthy field. This is a good practice as it allows for setting the health status of the daemon.

  • 97-97: The comment has been updated to remove the reference to client.CompleteStartup. Ensure that this change is correct and that the newTickerWithStop method is still being called from the StartNewClient method before the daemonStartup waitgroup signals.

  • 122-122: The isHealthy field is set to false in the Stop method. This is a good practice as it marks the daemon as unhealthy when it is stopped.

  • 281-281: The StartPriceUpdater method now takes an additional parameter c *Client. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

@clemire clemire force-pushed the crystal/CORE-666-robust-health-check branch from b8b00c1 to 4f499ad Compare October 31, 2023 22:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between cb8e3a1 and 4f499ad.
Files selected for processing (6)
  • protocol/daemons/pricefeed/client/client.go (5 hunks)
  • protocol/daemons/pricefeed/client/client_test.go (4 hunks)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (5 hunks)
  • protocol/daemons/pricefeed/client/types/errors.go (1 hunks)
  • protocol/daemons/types/daemon_client.go (1 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • protocol/daemons/pricefeed/client/types/errors.go
Additional comments: 16
protocol/daemons/types/health_checkable.go (1)
  • 1-11: The HealthCheckable interface is well defined and follows the Go best practices for interface design. It's simple, clear, and provides a single method HealthCheck that takes a context and returns an error. This allows for flexible implementation across different types of services that can be health checked.
protocol/daemons/types/daemon_client.go (1)
  • 1-8: The new DaemonClient interface extends HealthCheckable and has a placeholder for Stoppable. Ensure that the Stoppable method is implemented in the future as per the TODO comment. Also, make sure that all daemons that implement this interface provide an implementation for the methods of HealthCheckable.
protocol/daemons/pricefeed/client/sub_task_runner.go (4)
  • 35-41: The StartPriceUpdater function in the SubTaskRunner interface has been updated to include a new parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 77-83: The StartPriceUpdater function in the SubTaskRunnerImpl struct has been updated to include a new parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 90-98: The StartPriceUpdater function now sets the health status of the client based on the success or failure of the price updater task. This is a good practice as it allows for better monitoring and error handling.

  • 268-277: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [268-300]

The RunPriceUpdaterTaskLoop function now returns an error types.ErrEmptyMarketPriceUpdate if the length of the market price updates is 0. This is a good practice as it allows for better error handling and debugging.

protocol/daemons/pricefeed/client/client.go (6)
  • 5-10: The sync/atomic package is imported to provide atomic operations on the isHealthy field. This is a good practice for concurrent programming to avoid data races.

  • 50-58: The isHealthy field is added to the Client struct to track the health status of the daemon. The sync/atomic package is used to provide atomic operations on this field, which is a good practice for concurrent programming to avoid data races.

  • 63-75: The HealthCheck method is added to the Client struct to check the health of the daemon based on the isHealthy field. This is a good practice for monitoring the health status of the daemon.

  • 77-80: The setHealth method is added to the Client struct to set the health status of the daemon. This is a good practice for updating the health status of the daemon.

  • 122-123: The isHealthy field is set to false in the Stop method to indicate that the daemon is not healthy. This is a good practice for updating the health status of the daemon when it is stopped.

  • 278-284: The StartPriceUpdater method is called with an additional argument c (the client) in the start method. This is a good practice for passing the client to the StartPriceUpdater method so that it can update the health status of the client based on the success of the price updater task.

protocol/daemons/pricefeed/client/client_test.go (4)
  • 51-57: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [54-58]

The StartPriceUpdater function now takes an additional parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 251-253: The test case now checks for an error message related to the health of the pricefeed daemon. This is a good practice as it ensures that the health check functionality is working as expected.

  • 627-628: A new test case has been added to handle the scenario where the length of the market price updates is 0. This is a good practice as it ensures that the function can handle edge cases.

  • 753-811: A new test function TestHealthCheck has been added. This function tests the health check functionality of the pricefeed daemon. It checks that the daemon's health status changes based on the success or failure of the price update task. This is a good practice as it ensures that the health check functionality is working as expected.

protocol/daemons/pricefeed/client/client.go Outdated Show resolved Hide resolved
protocol/daemons/types/daemon_client.go Outdated Show resolved Hide resolved
protocol/daemons/pricefeed/client/client_test.go Outdated Show resolved Hide resolved
protocol/daemons/pricefeed/client/client.go Outdated Show resolved Hide resolved
// HealthCheck returns an error if a service is unhealthy. This method takes
// a context to allow the caller to use a context-based timeout with the call.
// If the service is healthy, this method returns nil.
HealthCheck(ctx context.Context) (err error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be the behavior if the daemon is in STOPPED state? Or is this not a concern because HealthCheck method won't be able to get called at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the stopped state, the last successful update timestamp will never be refreshed and the daemon will become unhealthy after a waiting period of 5 min.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 4f499ad and d1e2aee.
Files selected for processing (5)
  • protocol/daemons/liquidation/client/sub_task_runner.go (1 hunks)
  • protocol/daemons/pricefeed/client/client.go (5 hunks)
  • protocol/daemons/pricefeed/client/client_test.go (4 hunks)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (6 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • protocol/daemons/types/health_checkable.go
Additional comments: 15
protocol/daemons/pricefeed/client/sub_task_runner.go (4)
  • 36-42: The StartPriceUpdater function now takes a c *Client parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 78-84: The StartPriceUpdater function now takes a c *Client parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 91-101: The StartPriceUpdater function now sets the health status of the client based on the success of the price updater task. This is a good practice as it allows for better monitoring and error handling.

  • 270-279: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [270-304]

The RunPriceUpdaterTaskLoop function now returns an error types.ErrEmptyMarketPriceUpdate if the length of the market price updates is 0. This is a good practice as it allows for better error handling and debugging.

protocol/daemons/pricefeed/client/client.go (7)
  • 2-8: The import of the "cosmossdk.io/errors" package is new. Ensure that it is used correctly in the code.

  • 50-59: The healthLock and healthError fields are added to the Client struct. These are used to track the health of the daemon and synchronize access to the healthError field.

  • 65-77: The HealthCheck method is added to the Client struct. This method checks the health of the daemon and returns an error if the daemon is unhealthy. The method is synchronized by the healthLock.

  • 79-86: The setHealth method is added to the Client struct. This method sets the health of the daemon and is synchronized by the healthLock.

  • 88-93: The newClient function is modified to initialize the healthError field with an error indicating that the daemon is uninitialized.

  • 128-130: The Stop method is modified to set the daemon as unhealthy when it is stopped.

  • 286-291: The StartPriceUpdater method of the SubTaskRunner interface and SubTaskRunnerImpl struct now takes an additional parameter c *Client and sets the health status of the client based on the success of the price updater task.

protocol/daemons/pricefeed/client/client_test.go (4)
  • 51-57: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [54-58]

The StartPriceUpdater function now takes an additional parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 251-253: The test now checks for an error message related to the daemon being uninitialized. This is a good practice as it ensures that the daemon is in the expected state at startup.

  • 627-628: A new test case has been added to check for an error when the length of the market price updates is 0. This is a good practice as it ensures that the function behaves as expected when given an empty input.

  • 753-811: A new test function TestHealthCheck_Mixed has been added to test the health check functionality of the daemon. This is a good practice as it ensures that the health check functionality is working as expected.

protocol/daemons/liquidation/client/sub_task_runner.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d1e2aee and 6c75e51.
Files selected for processing (2)
  • protocol/daemons/types/health_checkable.go (1 hunks)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • protocol/daemons/types/health_checkable_test.go
Additional comments: 9
protocol/daemons/types/health_checkable.go (9)
  • 1-8: The import section looks clean and organized. All the necessary packages are imported.

  • 10-12: The constant MaximumAcceptableUpdateDelay is well defined and its purpose is clear.

  • 14-18: The HealthCheckable interface is well defined and its purpose is clear.

  • 20-40: The HealthCheckableImpl struct is well defined and its fields are well documented. The use of a mutex for synchronization is a good practice to prevent data races.

  • 42-47: The NewHealthCheckableImpl function is well defined and its purpose is clear.

  • 49-58: The InitializeHealthStatus method is well defined and its purpose is clear. The use of a mutex for synchronization is a good practice to prevent data races.

  • 60-66: The RecordUpdateSuccess method is well defined and its purpose is clear. The use of a mutex for synchronization is a good practice to prevent data races.

  • 68-75: The RecordUpdateFailure method is well defined and its purpose is clear. The use of a mutex for synchronization is a good practice to prevent data races.

  • 77-113: The HealthCheck method is well defined and its purpose is clear. The use of a mutex for synchronization is a good practice to prevent data races. The error handling is well implemented.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 6c75e51 and 3795643.
Files selected for processing (4)
  • protocol/daemons/pricefeed/client/client.go (4 hunks)
  • protocol/daemons/pricefeed/client/client_test.go (5 hunks)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (6 hunks)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)
Additional comments: 16
protocol/daemons/types/health_checkable_test.go (1)
  • 1-137: The tests are well-structured and cover a variety of scenarios. The use of table-driven tests makes it easy to add new test cases in the future. The use of mocks for the time provider is a good practice as it allows for precise control over the time in tests. The error messages in the tests are descriptive and helpful. The tests also check for both the presence and absence of errors, which is a good practice.
protocol/daemons/pricefeed/client/client.go (5)
  • 49-55: The Client struct now includes the HealthCheckableImpl field to track the health of the daemon. This is a good practice for monitoring the health of long-running services.

  • 65-79: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [60-69]

The newClient function initializes the health status of the daemon. This is a good practice to ensure that the health status is properly set when the client is created.

  • 65-79: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [72-84]

The newTickerWithStop function now expects the Client as a parameter. This is a good practice to ensure that the client's state is properly managed within the function.

  • 98-108: The Stop method toggles the daemon into an unhealthy state when it stops. This is a good practice to ensure that the health status is properly updated when the daemon is stopped.

  • 262-268: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [262-269]

The start method now expects the Client as a parameter when calling subTaskRunner.StartPriceUpdater. This is a good practice to ensure that the client's state is properly managed within the function.

protocol/daemons/pricefeed/client/sub_task_runner.go (4)
  • 2-13: The imports look fine. Ensure that the newly added packages "cosmossdk.io/errors" and "github.com/dydxprotocol/v4-chain/protocol/lib/time" are used in the code.

  • 37-43: The StartPriceUpdater function in the SubTaskRunner interface now takes a *Client as a parameter. Ensure that all implementations of this interface have been updated accordingly.

  • 79-85: The StartPriceUpdater function in the SubTaskRunnerImpl struct now takes a *Client as a parameter. This is consistent with the interface change.

  • 91-107: The StartPriceUpdater function now updates the health status of the client based on the success of the price updater task. This is a good practice as it allows for better monitoring of the system's health.

protocol/daemons/pricefeed/client/client_test.go (6)
  • 14-14: The import of the libtime package is added. Ensure that it is used in the code.

  • 55-58: The StartPriceUpdater function now takes a Client as a parameter. This change should be reflected in all calls to this function.

  • 252-258: The test case checks that the daemon is not healthy on startup and becomes healthy after the first successful market update. This is a good practice to ensure the health check functionality is working as expected.

  • 632-633: The test case checks for an error when there are no exchange market prices. This is a good practice to ensure the system can handle edge cases.

  • 735-756: The singleTickTickerAndStop function creates a ticker that ticks once before the stop channel is signaled. This function is used in the test cases to simulate the behavior of the price updater task loop.

  • 758-816: The TestHealthCheck_Mixed function tests the HealthCheck method under different scenarios. This is a good practice to ensure the health check functionality is working as expected under different conditions.

protocol/daemons/pricefeed/client/sub_task_runner.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/pricefeed/client/client.go Outdated Show resolved Hide resolved
if err != nil {
panic(err)

if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should something similar happen for the following subtasks too?

  • StartPriceEncoder
  • StartPriceFetcher
  • StartMarketParamUpdater

Why or why not?

Copy link
Contributor Author

@clemire clemire Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • StartPriceEncoder
  • StartPriceFetcher

No, because I think measuring the contents of the market update already captures more or less the same information, and is much simpler, than instrumenting these subtasks as well.

  • StartMarketParamUpdater

I think we made the decision that if a market update fails, we will continue to operate the protocol with the current market configuration on the daemon so that trading can continue for existing markets. (The reasonable assumption here IMO is that market param updates are failing not just for one validator, but across the network.) Marking unhealthy and panicking on a failure here could bring down the network instead of allowing degraded performance by continuing to support trading for existing markets. The better approach would probably be to continue the network and allow another governance vote to fix any bad parameters in the exchange_config_json.

One possibility to consider is that some daemons could fail and others succeed due to disagreeing app versions, but I think we should catch this by upgrading via governance to enforce compatibility if there is a backwards-incompatible change in daemon configuration.

Copy link
Contributor

@ttl33 ttl33 Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense.

For StartMarketParamUpdater do we have clear err log msgs to indicate this failed? Do we have dashboard + alerting on this?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 3795643 and 393b53b.
Files selected for processing (5)
  • protocol/daemons/pricefeed/client/client.go (4 hunks)
  • protocol/daemons/pricefeed/client/client_test.go (4 hunks)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (6 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)
Additional comments: 21
protocol/daemons/types/health_checkable_test.go (1)
  • 1-146: The test cases are comprehensive and cover a variety of scenarios. The use of table-driven tests makes it easy to add new test cases in the future. The use of mocks for time provides control over the test environment. Good job!
protocol/daemons/pricefeed/client/client.go (4)
  • 31-39: The HealthCheckable interface is added to the Client struct to track the health of the daemon. This is a good practice for monitoring the health status of long-running services.

  • 60-68: The newClient function is updated to initialize the HealthCheckable field with a new TimeBoundedHealthCheckable instance. This is a good practice for initializing the health check mechanism.

  • 80-82: The newTickerWithStop method is updated to take a *Client parameter. This is a good practice for encapsulating the logic for creating a new ticker and a stop channel within the Client struct.

  • 259-265: The StartPriceUpdater method is updated to take a *Client parameter. This is a good practice for encapsulating the logic for starting the price updater within the Client struct.

protocol/daemons/types/health_checkable.go (9)
  • 1-8: The import section is clean and well-organized. All the necessary packages are imported.

  • 10-12: The constant MaxAcceptableUpdateDelay is well defined and its purpose is clear.

  • 14-22: The HealthCheckable interface is well defined with clear method signatures.

  • 24-41: The timestampWithError struct and its methods are well defined. The Update method updates the timestamp and error, and the Timestamp and Error methods return the timestamp and error respectively.

  • 43-61: The timeBoundedHealthCheckable struct is well defined and implements the HealthCheckable interface. It tracks the timestamps of the last successful and failed updates and uses a time provider to determine the current time.

  • 63-71: The NewTimeBoundedHealthCheckable function creates a new HealthCheckable instance and initializes it to an unhealthy state by reporting an error. This is a good practice as it ensures that the health check is not falsely reporting a healthy state before the first update.

  • 73-79: The ReportSuccess method records a successful update. It is thread-safe as it uses a mutex lock to prevent data races.

  • 81-86: The ReportFailure method records a failed update. It is also thread-safe as it uses a mutex lock.

  • 88-129: The HealthCheck method checks the health of the service and returns an error if the service is unhealthy. The conditions for unhealthiness are well defined and the error messages are informative. The method is thread-safe as it uses a mutex lock.

protocol/daemons/pricefeed/client/sub_task_runner.go (4)
  • 2-8: The import of "cosmossdk.io/errors" is added. Ensure that this package is available and correctly imported.

  • 36-42: The StartPriceUpdater function in the SubTaskRunner interface now takes an additional parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 78-84: The StartPriceUpdater method in the SubTaskRunnerImpl struct now takes an additional parameter c *Client. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 90-104: The StartPriceUpdater method now checks if the RunPriceUpdaterTaskLoop function returns an error. If it does, it logs the error and calls the c.ReportFailure method. This is a good practice as it ensures that errors are properly logged and reported.

protocol/daemons/pricefeed/client/client_test.go (3)
  • 53-57: The StartPriceUpdater function now takes an additional parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 248-261: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [248-304]

The TestStop function tests the Stop method of the Client struct. It sets up a daemon and gRPC servers, runs them, and then stops the daemon. The test checks that the Stop method executes successfully without any hangs. This test seems to be well-written and covers the necessary scenarios.

  • 628-634: The TestHealthCheck_Mixed function tests the HealthCheck method of the Client struct in various scenarios. It checks the health status of the daemon based on the UpdateMarketPrices error response. This test seems to be well-written and covers the necessary scenarios.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 393b53b and 0c48c23.
Files selected for processing (1)
  • protocol/daemons/pricefeed/client/client_test.go (4 hunks)
Additional comments: 5
protocol/daemons/pricefeed/client/client_test.go (5)
  • 51-57: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [54-58]

The StartPriceUpdater function now takes an additional parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 251-257: The test TestStart_InvalidConfig now checks for an error in the client.HealthCheck() function and expects the error message to contain "no successful update has occurred". This is a good practice to ensure that the health check functionality is working as expected.

  • 631-633: The TestPriceUpdater_Mixed test now includes a new test case where priceUpdateError is set to types.ErrEmptyMarketPriceUpdate. This is a good practice to test for edge cases and error conditions.

  • 734-760: The singleTickTickerAndStop function is added to create a ticker that ticks once before the stop channel is signaled. This is a good practice to simulate real-world scenarios in unit tests.

  • 762-820: The TestHealthCheck_Mixed test is added to test the health check functionality of the price updater. This is a good practice to ensure that the health check functionality is working as expected.

protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Show resolved Hide resolved
protocol/daemons/types/health_checkable.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0c48c23 and 9db3da6.
Files selected for processing (3)
  • protocol/daemons/pricefeed/client/client_test.go (5 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
  • protocol/testutil/daemons/common.go (1 hunks)
Additional comments: 14
protocol/daemons/types/health_checkable.go (9)
  • 1-8: The import section is clean and well-organized. All the necessary packages for this file are imported correctly.

  • 10-12: The constant MaxAcceptableUpdateDelay is defined correctly. It's good to see that the delay time is defined as a constant at the top of the file, which makes it easier to change if needed.

  • 14-24: The HealthCheckable interface is well-defined with clear method signatures. The comments provide a good explanation of what each method does.

  • 26-43: The timestampWithError struct and its methods are well-implemented. The Update method allows for easy updating of the timestamp and error, and the Timestamp and Error methods provide easy access to these fields.

  • 45-63: The timeBoundedHealthCheckable struct is well-implemented. It correctly implements the HealthCheckable interface and includes a mutex for thread safety. The comments provide a good explanation of the struct and its fields.

  • 65-73: The NewTimeBoundedHealthCheckable function is well-implemented. It correctly creates a new timeBoundedHealthCheckable and initializes it to an unhealthy state.

  • 75-81: The ReportSuccess method is well-implemented. It correctly updates the lastSuccessfulUpdate field in a thread-safe manner.

  • 83-88: The ReportFailure method is well-implemented. It correctly updates the lastFailedUpdate field in a thread-safe manner.

  • 90-131: The HealthCheck method is well-implemented. It correctly checks the health of the service based on the timestamps of the last successful and failed updates. The error messages provide clear information about the reason for the service being unhealthy.

protocol/daemons/pricefeed/client/client_test.go (5)
  • 15-15: The new import daemontestutils is used for testing utilities related to daemons. Ensure that this package is available and accessible.

  • 55-58: The StartPriceUpdater function now takes a new parameter c *Client. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 252-258: The health check is expected to fail on startup until the first successful market update. This is a good practice as it ensures that the daemon is functioning correctly before reporting a healthy status.

  • 632-633: A new test case has been added to handle an empty market price update. This is a good practice as it ensures that the function behaves correctly when no market prices are available.

  • 735-793: A new test function TestHealthCheck_Mixed has been added to test the health check functionality. This is a good practice as it ensures that the health check behaves correctly under different conditions.

protocol/testutil/daemons/common.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 9db3da6 and 0d8d97b.
Files selected for processing (1)
  • protocol/daemons/types/health_checkable.go (1 hunks)
Additional comments: 1
protocol/daemons/types/health_checkable.go (1)
  • 1-133: The HealthCheckable interface and timeBoundedHealthCheckable struct are well designed and thread-safe. The health check mechanism is based on the timestamps of the last successful and failed updates, which is a good approach. The error messages are informative and provide useful context about the health status of the service. The use of a time provider for timestamping and checking update staleness is a good practice. The NewTimeBoundedHealthCheckable function initializes the health checkable to an unhealthy state, which is a safe default. The ReportSuccess and ReportFailure methods are thread-safe and update the relevant timestamps correctly. The HealthCheck method checks the health status correctly based on the timestamps and returns informative error messages. Overall, the code is well written, follows best practices, and is easy to understand.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0d8d97b and 4d14a5b.
Files selected for processing (3)
  • protocol/daemons/pricefeed/client/sub_task_runner.go (6 hunks)
  • protocol/daemons/types/health_checkable.go (1 hunks)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • protocol/daemons/types/health_checkable_test.go
Additional comments: 13
protocol/daemons/types/health_checkable.go (9)
  • 1-8: The import section looks clean and organized. All the necessary packages for this file are imported correctly.

  • 10-12: The constant MaxAcceptableUpdateDelay is defined correctly. It's good to see that magic numbers are avoided and constants are used instead.

  • 14-24: The HealthCheckable interface is well defined with clear method signatures. It's a good practice to define an interface for services that can be health checked.

  • 26-43: The timestampWithError struct and its methods are well defined. It couples a timestamp and error to make it easier to update them in tandem.

  • 45-64: The timeBoundedHealthCheckable struct is well defined. It implements the HealthCheckable interface by tracking the timestamps of the last successful and failed updates. The comments provide a clear explanation of the struct and its purpose.

  • 66-74: The NewTimeBoundedHealthCheckable function is well defined. It creates a new HealthCheckable instance and initializes it to an unhealthy state by reporting an error. This is a good practice as it ensures that the health check status is not undefined at the start.

  • 76-82: The ReportSuccess method is well defined. It records a successful update and is thread-safe.

  • 84-89: The ReportFailure method is well defined. It records a failed update and is thread-safe.

  • 91-132: The HealthCheck method is well defined. It returns an error if a service is unhealthy based on the conditions defined in the comments. The method is thread-safe and the error messages are descriptive, which will help in debugging.

protocol/daemons/pricefeed/client/sub_task_runner.go (4)
  • 2-8: The import of the "cosmossdk.io/errors" package is new. Ensure that it is used correctly in the code.

  • 36-42: The StartPriceUpdater function in the SubTaskRunner interface now takes an additional Client parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 78-84: The StartPriceUpdater function in the SubTaskRunnerImpl struct now takes an additional Client parameter. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 90-103: The StartPriceUpdater function now reports success or failure to the client and logs errors. This is a good practice as it allows for better monitoring and error handling.

Comment on lines 274 to 283
metrics.Latency,
)

// On startup the length of request will likely be 0. However, sending a request of length 0
// is a fatal error.
// On startup the length of request will likely be 0. Even so, we return an error here because this
// is unexpected behavior once the daemon reaches a steady state. The daemon health check process should
// be robust enough to ignore temporarily unhealthy daemons.
// Sending a request of length 0, however, causes a panic.
// panic: rpc error: code = Unknown desc = Market price update has length of 0.
if len(request.MarketPriceUpdates) > 0 {
_, err := priceFeedServiceClient.UpdateMarketPrices(ctx, request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [274-306]

The RunPriceUpdaterTaskLoop function now returns an error of type types.ErrEmptyMarketPriceUpdate if the length of request.MarketPriceUpdates is 0. This is a good practice as it prevents sending empty updates which could cause a panic. However, it might be better to check this condition before starting the telemetry measurement on line 278 to avoid unnecessary computations for empty updates.

- if len(request.MarketPriceUpdates) > 0 {
- 	_, err := priceFeedServiceClient.UpdateMarketPrices(ctx, request)
- 	if err != nil {
- 		logger.Error("Failed to run price updater task loop for price daemon", "error", err)
- 		telemetry.IncrCounter(
- 			1,
- 			metrics.PricefeedDaemon,
- 			metrics.PriceUpdaterTaskLoop,
- 			metrics.Error,
- 		)
- 		return err
- 	}
- } else {
- 	logger.Info("Price update had length of 0")
- 	telemetry.IncrCounter(
- 		1,
- 		metrics.PricefeedDaemon,
- 		metrics.PriceUpdaterZeroPrices,
- 		metrics.Count,
- 	)
- 	return types.ErrEmptyMarketPriceUpdate
- }
+ if len(request.MarketPriceUpdates) == 0 {
+ 	logger.Info("Price update had length of 0")
+ 	telemetry.IncrCounter(
+ 		1,
+ 		metrics.PricefeedDaemon,
+ 		metrics.PriceUpdaterZeroPrices,
+ 		metrics.Count,
+ 	)
+ 	return types.ErrEmptyMarketPriceUpdate
+ }
+ _, err := priceFeedServiceClient.UpdateMarketPrices(ctx, request)
+ if err != nil {
+ 	logger.Error("Failed to run price updater task loop for price daemon", "error", err)
+ 	telemetry.IncrCounter(
+ 		1,
+ 		metrics.PricefeedDaemon,
+ 		metrics.PriceUpdaterTaskLoop,
+ 		metrics.Error,
+ 	)
+ 	return err
+ }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 4d14a5b and 7f4ae9b.
Files selected for processing (1)
  • protocol/daemons/types/health_checkable.go (1 hunks)
Additional comments: 1
protocol/daemons/types/health_checkable.go (1)
  • 1-132: The HealthCheckable interface and its implementation timeBoundedHealthCheckable are well designed. The thread-safety of the methods is ensured by using mutex locks. The health check mechanism is robust and covers all possible scenarios. The use of a time provider for timestamping is a good practice as it allows for easier testing and more control over time-related operations. The error messages are descriptive and provide useful information about the state of the service. The code is clean, well-structured, and follows best practices.

if err != nil {
panic(err)

if err == nil {
Copy link
Contributor

@ttl33 ttl33 Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense.

For StartMarketParamUpdater do we have clear err log msgs to indicate this failed? Do we have dashboard + alerting on this?

protocol/daemons/types/health_checkable_test.go Outdated Show resolved Hide resolved
protocol/daemons/types/health_checkable_test.go Outdated Show resolved Hide resolved
@clemire
Copy link
Contributor Author

clemire commented Nov 4, 2023

... for failed market param updates, yes we have monitoring, error logs and dashboard. alert here

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 7f4ae9b and a87f57f.
Files selected for processing (1)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)

Comment on lines 1 to 163
),
},
"healthy: one recent successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time2,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: one recent successful update, followed by a failed update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
{Time2, TestError}, // failed update
},
healthCheckTime: Time3,
expectedHealthStatus: fmt.Errorf(
"last update failed at %v with error: '%w', most recent successful update occurred at %v",
Time2,
TestError,
Time1,
),
},
"healthy: one recent failed update followed by a successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, TestError}, // failed update
{Time2, nil}, // successful update
},
healthCheckTime: Time3,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: last successful update was more than 5 minutes ago": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time_5Minutes_And_2Seconds,
expectedHealthStatus: fmt.Errorf(
"last successful update occurred at %v, which is more than %v ago. "+
"Last failure occurred at %v with error '%w'",
Time1,
types.MaxAcceptableUpdateDelay,
Time0,
InitializingStatus,
),
},
}
for name, tc := range tests {
t.Run(name, func(t *testing.T) {
// Setup.

// Construct list of timestamps to provide for the timeProvider stored within the health checkable instance.
timestamps := make([]time.Time, 0, len(tc.updates)+2)

// The first timestamp is used during HealthCheckable initialization.
timestamps = append(timestamps, Time0)
// One timestamp used for each update.
for _, update := range tc.updates {
timestamps = append(timestamps, update.timestamp)
}
// A final timestamp is consumed by the HealthCheck call.
timestamps = append(timestamps, tc.healthCheckTime)

// Create a new time-bounded health checkable instance.
hci := types.NewTimeBoundedHealthCheckable(
"test",
mockTimeProviderWithTimestamps(timestamps),
)

// Act.
// Report the test sequence of successful / failed updates.
for _, update := range tc.updates {
if update.err == nil {
hci.ReportSuccess()
} else {
hci.ReportFailure(update.err)
}
}

// Assert.
// Check the health status after all updates have been reported.
err := hci.HealthCheck()
if tc.expectedHealthStatus == nil {
require.NoError(t, err)
} else {
require.ErrorContains(t, err, tc.expectedHealthStatus.Error())
}
})
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases are comprehensive and cover a variety of scenarios. The use of a mock time provider to control the timestamps is a good practice. However, the error messages in the test cases could be more descriptive to provide better context when a test fails. For example, the error message in line 103 could be updated to include the time of the last successful update and the maximum acceptable delay.

- "unhealthy: last successful update was more than 5 minutes ago"
+ "unhealthy: last successful update was at %v, which is more than the max delay of %v ago"

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines and indentaion issues.

Suggested change
package types_test
import (
"fmt"
"github.com/dydxprotocol/v4-chain/protocol/daemons/types"
libtime "github.com/dydxprotocol/v4-chain/protocol/lib/time"
"github.com/dydxprotocol/v4-chain/protocol/mocks"
"github.com/stretchr/testify/require"
"testing"
"time"
)
var (
Time0 = time.Unix(0, 0)
Time1 = Time0.Add(time.Second)
Time2 = Time0.Add(2 * time.Second)
Time3 = Time0.Add(3 * time.Second)
Time4 = Time0.Add(4 * time.Second)
Time_5Minutes_And_2Seconds = Time0.Add(5*time.Minute + 2*time.Second)
TestError = fmt.Errorf("test error")
InitializingStatus = fmt.Errorf("test is initializing")
)
// mockTimeProviderWithTimestamps returns a TimeProvider that returns the given timestamps in order.
func mockTimeProviderWithTimestamps(times []time.Time) libtime.TimeProvider {
m := mocks.TimeProvider{}
for _, timestamp := range times {
m.On("Now").Return(timestamp).Once()
}
return &m
}
func TestHealthCheckableImpl_Mixed(t *testing.T) {
tests := map[string]struct {
updates []struct {
timestamp time.Time
// leave error nil for a successful update
err error
}
healthCheckTime time.Time
expectedHealthStatus error
}{
"unhealthy: no updates, returns initializing error": {
healthCheckTime: Time1,
expectedHealthStatus: fmt.Errorf(
"no successful update has occurred; last failed update occurred at %v with error '%w'",
Time0,
InitializingStatus,
),
},
"unhealthy: no successful updates": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, TestError}, // failed update
},
healthCheckTime: Time2,
expectedHealthStatus: fmt.Errorf(
"no successful update has occurred; last failed update occurred at %v with error '%w'",
Time1,
TestError,
),
},
"healthy: one recent successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time2,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: one recent successful update, followed by a failed update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
{Time2, TestError}, // failed update
},
healthCheckTime: Time3,
expectedHealthStatus: fmt.Errorf(
"last update failed at %v with error: '%w', most recent successful update occurred at %v",
Time2,
TestError,
Time1,
),
},
"healthy: one recent failed update followed by a successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, TestError}, // failed update
{Time2, nil}, // successful update
},
healthCheckTime: Time3,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: last successful update was more than 5 minutes ago": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time_5Minutes_And_2Seconds,
expectedHealthStatus: fmt.Errorf(
"last successful update occurred at %v, which is more than %v ago. "+
"Last failure occurred at %v with error '%w'",
Time1,
types.MaxAcceptableUpdateDelay,
Time0,
InitializingStatus,
),
},
}
for name, tc := range tests {
t.Run(name, func(t *testing.T) {
// Setup.
// Construct list of timestamps to provide for the timeProvider stored within the health checkable instance.
timestamps := make([]time.Time, 0, len(tc.updates)+2)
// The first timestamp is used during HealthCheckable initialization.
timestamps = append(timestamps, Time0)
// One timestamp used for each update.
for _, update := range tc.updates {
timestamps = append(timestamps, update.timestamp)
}
// A final timestamp is consumed by the HealthCheck call.
timestamps = append(timestamps, tc.healthCheckTime)
// Create a new time-bounded health checkable instance.
hci := types.NewTimeBoundedHealthCheckable(
"test",
mockTimeProviderWithTimestamps(timestamps),
)
// Act.
// Report the test sequence of successful / failed updates.
for _, update := range tc.updates {
if update.err == nil {
hci.ReportSuccess()
} else {
hci.ReportFailure(update.err)
}
}
// Assert.
// Check the health status after all updates have been reported.
err := hci.HealthCheck()
if tc.expectedHealthStatus == nil {
require.NoError(t, err)
} else {
require.ErrorContains(t, err, tc.expectedHealthStatus.Error())
}
})
}
}
package types_test
import (
"fmt"
"github.com/dydxprotocol/v4-chain/protocol/daemons/types"
libtime "github.com/dydxprotocol/v4-chain/protocol/lib/time"
"github.com/dydxprotocol/v4-chain/protocol/mocks"
"github.com/stretchr/testify/require"
"testing"
"time"
)
var (
Time0 = time.Unix(0, 0)
Time1 = Time0.Add(time.Second)
Time2 = Time0.Add(2 * time.Second)
Time3 = Time0.Add(3 * time.Second)
Time4 = Time0.Add(4 * time.Second)
Time_5Minutes_And_2Seconds = Time0.Add(5*time.Minute + 2*time.Second)
TestError = fmt.Errorf("test error")
InitializingStatus = fmt.Errorf("test is initializing")
)
// mockTimeProviderWithTimestamps returns a TimeProvider that returns the given timestamps in order.
func mockTimeProviderWithTimestamps(times []time.Time) libtime.TimeProvider {
m := mocks.TimeProvider{}
for _, timestamp := range times {
m.On("Now").Return(timestamp).Once()
}
return &m
}
func TestHealthCheckableImpl_Mixed(t *testing.T) {
tests := map[string]struct {
updates []struct {
timestamp time.Time
// leave error nil for a successful update
err error
}
healthCheckTime time.Time
expectedHealthStatus error
}{
"unhealthy: no updates, returns initializing error": {
healthCheckTime: Time1,
expectedHealthStatus: fmt.Errorf(
"no successful update has occurred; last failed update occurred at %v with error '%w'",
Time0,
InitializingStatus,
),
},
"unhealthy: no successful updates": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, TestError}, // failed update
},
healthCheckTime: Time2,
expectedHealthStatus: fmt.Errorf(
"no successful update has occurred; last failed update occurred at %v with error '%w'",
Time1,
TestError,
),
},
"healthy: one recent successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time2,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: one recent successful update, followed by a failed update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
{Time2, TestError}, // failed update
},
healthCheckTime: Time3,
expectedHealthStatus: fmt.Errorf(
"last update failed at %v with error: '%w', most recent successful update occurred at %v",
Time2,
TestError,
Time1,
),
},
"healthy: one recent failed update followed by a successful update": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, TestError}, // failed update
{Time2, nil}, // successful update
},
healthCheckTime: Time3,
expectedHealthStatus: nil, // expect healthy
},
"unhealthy: last successful update was more than 5 minutes ago": {
updates: []struct {
timestamp time.Time
err error
}{
{Time1, nil}, // successful update
},
healthCheckTime: Time_5Minutes_And_2Seconds,
expectedHealthStatus: fmt.Errorf(
"last successful update occurred at %v, which is more than the max delay of %v ago. "+
"Last failure occurred at %v with error '%w'",
Time1,
types.MaxAcceptableUpdateDelay,
Time0,
InitializingStatus,
),
},
}
for name, tc := range tests {
t.Run(name, func(t *testing.T) {
// Setup.
// Construct list of timestamps to provide for the timeProvider stored within the health checkable instance.
timestamps := make([]time.Time, 0, len(tc.updates)+2)
// The first timestamp is used during HealthCheckable initialization.
timestamps = append(timestamps, Time0)
// One timestamp used for each update.
for _, update := range tc.updates {
timestamps = append(timestamps, update.timestamp)
}
// A final timestamp is consumed by the HealthCheck call.
timestamps = append(timestamps, tc.healthCheckTime)
// Create a new time-bounded health checkable instance.
hci := types.NewTimeBounded
</details>
<!-- suggestion_end -->
<!-- This is an auto-generated comment by CodeRabbit -->

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between a87f57f and 1ac3bd9.
Files selected for processing (1)
  • protocol/daemons/types/health_checkable_test.go (1 hunks)
Additional comments: 1
protocol/daemons/types/health_checkable_test.go (1)
  • 1-163: The test cases are well-structured and cover a variety of scenarios. The use of a mock time provider to control the timestamps is a good practice. The error messages are descriptive and provide useful context. The test checks both the success and failure paths of the health check functionality. The code is clean, readable, and follows good practices for writing unit tests.

@CodeRabbitSupport
Copy link

CodeRabbit experienced a temporary issue resulting in some of the walkthrough comments getting garbled. This should be resolved with subsequent commits to the PR.

@clemire clemire merged commit cc9317d into main Nov 6, 2023
15 of 16 checks passed
@clemire clemire deleted the crystal/CORE-666-robust-health-check branch November 6, 2023 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

4 participants