fixing progress counter bug #197

Merged
merged 7 commits into from
Oct 25, 2023
Collaborator

@mieslep mieslep commented Sep 4, 2023

  1. Consolidates progress logging into helper class JobCounter
  2. Separates counts within each thread from the global counts
  3. Increments global counts as each thread completes
  4. Introduces new parameter to enable per-part logging
  5. Per-part logging on a single line per part
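
The counting scheme in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the `JobCounter` class from this PR: the class name `JobCounterSketch` and the methods `threadIncrement`, `globalIncrement`, `processPart`, and `runParts` are assumptions made for the example. Each thread counts records in its own thread-local value and folds that value into the shared total only once, when its part completes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: names and structure are assumptions, not the PR's JobCounter.
class JobCounterSketch {
    // Shared total, updated only when a thread finishes a part.
    private final AtomicLong globalCount = new AtomicLong();
    // Per-thread running count; each thread sees its own value.
    private final ThreadLocal<Long> threadCount = ThreadLocal.withInitial(() -> 0L);

    // Count one record against the current thread only.
    void threadIncrement() {
        threadCount.set(threadCount.get() + 1);
    }

    // Fold the current thread's count into the global total, then reset it.
    void globalIncrement() {
        globalCount.addAndGet(threadCount.get());
        threadCount.set(0L);
    }

    long getThreadCount() { return threadCount.get(); }

    long getGlobalCount() { return globalCount.get(); }

    // Simulate one part: count each record locally, publish once at the end.
    void processPart(int records) {
        for (int i = 0; i < records; i++) {
            threadIncrement();
        }
        globalIncrement();
    }

    // Run each part on its own thread and return the resulting global total.
    static long runParts(int... recordsPerPart) {
        JobCounterSketch counter = new JobCounterSketch();
        Thread[] threads = new Thread[recordsPerPart.length];
        for (int i = 0; i < threads.length; i++) {
            final int records = recordsPerPart[i];
            threads[i] = new Thread(() -> counter.processPart(records));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.getGlobalCount();
    }
}
```

Because the global total changes only at part boundaries, a partially processed part never contributes to it, which is one plausible way to avoid the kind of inaccurate progress counts this PR targets.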

Which issue(s) this PR fixes:
Fixes #196

Checklist:

  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@mieslep mieslep requested a review from a team as a code owner September 4, 2023 11:58
@@ -6,3 +6,6 @@ spark.cdm.schema.target.keyspaceTable target.regression_performance

spark.cdm.perfops.numParts 32
spark.cdm.perfops.batchSize 1

spark.cdm.perfops.printStatsAfter 450
Collaborator

LGTM. Let's please coordinate with the docs crew to publish this on the https://docs.datastax.com/en/astra-serverless/docs/migrate/cassandra-data-migrator.html page.


    public void printProgress() {
        printProgress(false);
        printProgress(true);
Collaborator

I don't understand what we're doing here by calling it with both false and true.

Collaborator Author

Yeah, that's a bit of an interesting choice, and I'm not entirely happy with it. false is for the thread itself, and I was trying to hide this messiness from the rest of the application. There are two kinds of progress counts: the existing printStatsAfter, which gives feedback on how we are progressing through the global dataset, and the new (and noisier) printStatsPerPart, which prints an entry as each part completes.

    public void printProgress(boolean global) {
        if (global) {
            if (shouldPrintGlobalProgress()) {
                printAndLogProgress("Progress Counts: ", true);
            }
        } else {
            if (printPerThread) {
                printAndLogProgress("Thread Counts: ", false);
            }
        }
    }

Collaborator

This should be just 2 separate methods (printGlobalProgress & printThreadProgress) to make it cleaner.
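
A rough sketch of what that two-method split might look like. Everything outside the two method names is an assumption for illustration: the class `ProgressPrinterSketch`, its fields, the `record` helper, and in particular the threshold rule inside `shouldPrintGlobalProgress` are hypothetical stand-ins, not the code under review. The sketch is single-threaded for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class under review; fields and threshold logic are assumed.
class ProgressPrinterSketch {
    private final boolean printPerThread;
    private final long printStatsAfter;
    private long threadCount;
    private long globalCount;
    private long lastPrintedAt;
    // Lines emitted so far, kept so the behaviour can be inspected.
    final List<String> printed = new ArrayList<>();

    ProgressPrinterSketch(boolean printPerThread, long printStatsAfter) {
        this.printPerThread = printPerThread;
        this.printStatsAfter = printStatsAfter;
    }

    // Simplification: one call updates both counters.
    void record(long n) {
        threadCount += n;
        globalCount += n;
    }

    // Assumed rule: print once printStatsAfter records have accumulated since the last print.
    private boolean shouldPrintGlobalProgress() {
        return globalCount - lastPrintedAt >= printStatsAfter;
    }

    // Per-thread (per-part) output, controlled only by the per-thread flag.
    public void printThreadProgress() {
        if (printPerThread) {
            printAndLogProgress("Thread Counts: ", false);
        }
    }

    // Global output, controlled only by the global threshold.
    public void printGlobalProgress() {
        if (shouldPrintGlobalProgress()) {
            printAndLogProgress("Progress Counts: ", true);
            lastPrintedAt = globalCount;
        }
    }

    private void printAndLogProgress(String prefix, boolean global) {
        String line = prefix + (global ? globalCount : threadCount);
        printed.add(line);
        System.out.println(line);
    }
}
```

With this shape, callers state which kind of progress they mean, and no boolean flag has to be decoded at the call site.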

Collaborator

Or remove the printProgress(boolean global) method completely and refactor the printProgress() method to something like the following:

    public void printProgress() {
        if (printPerThread) {
            printAndLogProgress("Thread Counts: ", false);
        }
        if (shouldPrintGlobalProgress()) {
            printAndLogProgress("Progress Counts: ", true);
        }
    }

Collaborator

msmygit commented Sep 5, 2023

Any idea why we're receiving these ERRORs in the build output, and how we can address them?

Collaborator Author

mieslep commented Sep 6, 2023

@msmygit the ERROR log messages are generated by the JUnit tests, which probe for various conditions, both positive and negative. I'm not sure exactly how to do this, but we could feasibly direct log output somewhere other than the console. I tried a few things to get that working, but TBH the logging configuration here seems messy enough (e.g. we have log4j exclusions in the pom.xml), and my Maven-fu is not strong.


Collaborator

msmygit commented Oct 25, 2023

I've incorporated the review comments and pushed the changes. Merging here.

@msmygit msmygit merged commit 1fc5fc6 into main Oct 25, 2023
3 checks passed
@msmygit msmygit deleted the bug/fix-chunk-counting branch October 25, 2023 19:07
Successfully merging this pull request may close these issues.

progress counts are inaccurate
3 participants