Create replibyte debug command #159

Open
wtait1-ff opened this issue Jun 8, 2022 · 7 comments
Labels: enhancement (New feature or request), feature (New feature request)

Comments

@wtait1-ff
Contributor

Proposal

It seems to be a common theme that when bug reports / help issues are filed, the author is asked for certain info like

  • replibyte version
  • contents of config file

It would be helpful for people both authoring and triaging issues if a user could run a command like replibyte debug that would collect all that useful info automatically. Then they would just need to copy that output and paste it in the issue.

Other thoughts

  • at a minimum, it would be useful to output the locally available info stated above (replibyte version + config file). It would also be cool if the command could try to connect to the configured database and retrieve extra information, such as the database version
  • this could work well for this other issue, as running this command could be one of the checklist items for filing a bug-type issue
@evoxmusic
Contributor

It makes total sense to me and will help troubleshoot issues and even build a reproducible environment. Do you think you can propose a PR for this?

@wtait1-ff
Contributor Author

I'm happy to give it a go yes 👍

Oh, also, I forgot to mention this in the original issue, but seeing as the config file obviously contains sensitive data like database credentials and cloud account credentials:

  1. this command would have to anonymize some fields in the config file before creating the debug output
  2. since replibyte specializes in anonymizing data, it would be nice if the existing transformers could be re-used (I might need a bit of guidance on that part)

@evoxmusic
Contributor

💯 , you can take inspiration from (or even re-use) this part of telemetry.rs, which anonymizes sensitive data from conf.yaml.

@evoxmusic added the enhancement and feature labels on Jun 12, 2022
@evoxmusic
Contributor

Hi @wtait1-ff , let me know if you need any help 👍🏽

@thomasgouveia

Hi @evoxmusic,

It seems that there has been no activity on this issue for a while, and I'm interested in working on it as my first contribution to the project. Is that possible? If so, can we agree on the information that the debug command should return to the caller?

In my mind, to avoid a big PR, it would be better to split the issue into two more atomic issues:

  • Implement a basic debug command with the replibyte version and the configuration file redacted.
  • Enhance this command once ready to add more information, such as storage version, database version, etc.

In a very simple form, the debug command could display something like this to the caller:

replibyte debug

# Output

Replibyte $VERSION (running on $OS/$ARCH)
config :
  # here the config file with all sensitive data redacted
  $CONFIG

I think we should display at least the following four pieces of information to help reproduce a bug:

  • $VERSION: The replibyte version
  • $OS: The operating system
  • $ARCH: The system architecture
  • $CONFIG: The configuration file with all sensitive data redacted

What do you think?

Thanks!
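
The output format proposed above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the function names `debug_report` and `debug_report_for_host` are hypothetical, the config is passed in as text that the caller has already redacted, and the version falls back to "unknown" when the code is built outside Cargo.

```rust
use std::env::consts::{ARCH, OS};

/// Version baked in by Cargo at compile time; "unknown" when the file
/// is compiled without Cargo (assumption made for this sketch).
const VERSION: &str = match option_env!("CARGO_PKG_VERSION") {
    Some(v) => v,
    None => "unknown",
};

/// Builds the report text. `redacted_config` must already have had its
/// sensitive fields removed by the caller; nothing is redacted here.
fn debug_report(version: &str, os: &str, arch: &str, redacted_config: &str) -> String {
    let mut out = format!(
        "Replibyte {} (running on {}/{})\nconfig:\n",
        version, os, arch
    );
    // Indent every config line by two spaces under the `config:` key.
    for line in redacted_config.lines() {
        out.push_str("  ");
        out.push_str(line);
        out.push('\n');
    }
    out
}

/// Report for the current build and host machine.
fn debug_report_for_host(redacted_config: &str) -> String {
    debug_report(VERSION, OS, ARCH, redacted_config)
}
```

Keeping the formatting in a pure function that takes the version, OS, and architecture as arguments makes the output easy to test without depending on the machine the tests run on.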

@evoxmusic
Contributor

Hi @thomasgouveia , thank you for your help. I'm happy to speak with you to see how we can add this because it would be super helpful. I am working on adding some benchmarking to Replibyte to improve the overall performance, which could also be added to the debug part in some way.

The $CONFIG element is one of the most important elements, and it is often missing when we try to debug. Something that would also be helpful is a stack trace of what happened, like a profiling file but without the data. Do you have any ideas here?

@thomasgouveia

Happy to help! For sure, it makes sense to build something that helps with debugging issues or provides context for an issue.

Just to be sure we agree on what we want to achieve, at first, the idea was to provide a replibyte debug command that will output to the user information about the environment where replibyte is executed. I think this command should be independent and simply output something like I gave in my previous message.

You said that it would be nice to have a kind of "profiling" file with a stack trace. I totally agree, because it is clearly a better way to analyze the software's behavior. I suppose your idea is to trace what happens during the execution of any command, for example a dump. If so, to me that sounds more like a global --debug (or --verbose) flag that provides additional context (whether or not an error occurs during the command's execution). We could probably build something with tracing crates (or create our own) to register events at different levels of the code and, at the end of the execution, create the profiling file from those events, along with additional data such as the replibyte version, the OS/arch, etc.

Do you agree with that? Do you have any idea for this profiling file? I will be pleased to give it a try!
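
The "create our own" variant of the event-recording idea above could look roughly like the following. It is a minimal sketch using only the standard library rather than any tracing crate; `Event`, `EventLog`, and the line format are all hypothetical and not part of replibyte.

```rust
use std::time::{Duration, Instant};

/// One recorded step of a command run (hypothetical shape).
struct Event {
    at: Duration,
    label: String,
}

/// Collects events while a command runs. Only labels and timings are
/// stored, never the data being dumped or restored.
struct EventLog {
    started: Instant,
    events: Vec<Event>,
}

impl EventLog {
    fn new() -> Self {
        EventLog {
            started: Instant::now(),
            events: Vec::new(),
        }
    }

    /// Records a labelled event with the time elapsed since the log was created.
    fn record(&mut self, label: &str) {
        self.events.push(Event {
            at: self.started.elapsed(),
            label: label.to_string(),
        });
    }

    /// Renders the profiling file content: a header line followed by
    /// one line per event, in the order the events were recorded.
    fn render(&self, header: &str) -> String {
        let mut out = String::from(header);
        out.push('\n');
        for e in &self.events {
            out.push_str(&format!("[{:>8.3}s] {}\n", e.at.as_secs_f64(), e.label));
        }
        out
    }
}
```

A command would call `record` at each interesting step and, when the flag is set, write the result of `render` to a file at the end of the run, using the version and OS/arch line as the header.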

Another point, about redacting the configuration: I saw that you use transformers in telemetry.rs to hide sensitive data that comes from the configuration file. I have another approach for that, using traits. I tried something with the current configuration:

trait Redact {
    fn redact(&self) -> Self;
}

This trait would need to be implemented by each configuration-related struct, so that each block can redact its own sensitive data:

impl Redact for Config {
    fn redact(&self) -> Self {
        // We create a mutable deep copy of the current element
        // as we don't want to alter our base configuration
        let mut copy = self.clone();

        if copy.encryption_key.is_some() {
            copy.encryption_key = Some(REDACTED.to_string());
        }

        if let Some(source) = copy.source {
            copy.source = Some(source.redact())
        }

        if let Some(destination) = copy.destination {
            copy.destination = Some(destination.redact())
        }

        copy.datastore = match copy.datastore {
            DatastoreConfig::AWS(cfg) => DatastoreConfig::AWS(cfg.redact()),
            DatastoreConfig::GCP(cfg) => DatastoreConfig::GCP(cfg.redact()),
            // We don't need to redact this piece of configuration, it does not contain any sensitive information
            DatastoreConfig::LocalDisk(cfg) => DatastoreConfig::LocalDisk(cfg)
        };

        copy
    }
}

In the end, we can simply call config.redact() to get a copy of our configuration with all sensitive data wiped. We can also easily test each block independently to check that our logic for hiding sensitive data is correct. Does that work for you?
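
To illustrate the point about testing each block independently, here is a self-contained sketch of the trait approach. The structs are simplified, hypothetical stand-ins for the real configuration types, and the value of `REDACTED` is an assumption, since the thread references the constant without showing it.

```rust
/// Placeholder written over every sensitive value (assumed value).
const REDACTED: &str = "<REDACTED>";

trait Redact {
    fn redact(&self) -> Self;
}

// Hypothetical, simplified stand-ins for the real configuration structs.
#[derive(Clone, Debug, PartialEq)]
struct SourceConfig {
    connection_uri: String,
    compression: bool,
}

#[derive(Clone, Debug, PartialEq)]
struct Config {
    encryption_key: Option<String>,
    source: Option<SourceConfig>,
}

impl Redact for SourceConfig {
    fn redact(&self) -> Self {
        // Only the connection URI can carry credentials; every other
        // field is copied unchanged from a clone of `self`.
        SourceConfig {
            connection_uri: REDACTED.to_string(),
            ..self.clone()
        }
    }
}

impl Redact for Config {
    fn redact(&self) -> Self {
        Config {
            // A missing key stays `None`; a present key is replaced.
            encryption_key: self.encryption_key.as_ref().map(|_| REDACTED.to_string()),
            // Each nested block redacts itself.
            source: self.source.as_ref().map(|s| s.redact()),
        }
    }
}
```

Because `redact` takes `&self` and returns a new value, the original configuration is left untouched, and each `impl` can be covered by a small unit test that compares the redacted struct with an expected one.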
