Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. #7337

Draft
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Jan 29, 2025

Description of PR

Upgrading JUnit from 4 to 5 is a key task in the process of upgrading our Hadoop project to JDK 17. This task is somewhat complex due to the large number of unit tests and the high coupling between modules. The good news is that our team members have already submitted several upgrade-related PRs, which will significantly ease the workload for the subsequent upgrades. However, the previous upgrade efforts lacked a unified summary and tracking, making it difficult to monitor progress and manage the modules that still need to be upgraded. Therefore, the purpose of this PR is to centralize the tracking of the entire process of upgrading JUnit from 4 to 5, summarize the progress, address the issues encountered during the upgrade, and discuss key challenges.

I have thoroughly reviewed all of our modules and summarized the current upgrade progress.

  • The following modules do not require an upgrade:
hadoop-assemblies
hadoop-build-tools
hadoop-client
hadoop-client-api
hadoop-client-check-invariants
hadoop-client-check-test-invariants
hadoop-client-minicluster
hadoop-client-runtime
hadoop-cloud-storage
hadoop-annotations
hadoop-auth-example
hadoop-dist
hadoop-hdfs-native-client
hadoop-hdfs-nfs
hadoop-maven-plugins
hadoop-minicluster
hadoop-project
hadoop-project-dist
hadoop-benchmark
hadoop-dynamometer-dist
hadoop-openstack
hadoop-pipes
hadoop-tools-dist
hadoop-yarn-applications-catalog-docker
hadoop-yarn-registry
hadoop-yarn-site
hadoop-yarn-ui

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 21s #7337 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7337/1/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@jojochuang
Copy link
Contributor

Wow! Appreciate such a huge work!
You'd probably find it easier to divide it into subtasks, component by component. This is impossible to review too.

@slfan1989 slfan1989 changed the title [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common. [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5. Jan 30, 2025
@slfan1989
Copy link
Contributor Author

Wow! Appreciate such a huge work!
You'd probably find it easier to divide it into subtasks, component by component. This is impossible to review too.

Thank you for your message! This PR is currently an experimental version (it cannot be merged directly). I have attempted to upgrade some modules related to hadoop-common from JUnit 4 to JUnit 5. During the upgrade process, there are some issues that need to be discussed with everyone to establish the basic standards for this module upgrade.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Jan 30, 2025

@steveloughran @ayushtkn @Hexiaoqiao @jojochuang @aajisaka @cnauroth @szetszwo

Upgrading JUnit from 4 to 5 is not only a technical upgrade opportunity but also a good chance for us to improve the test-related code and fix potential issues. I’d like to discuss whether the improvement strategies I’ve been considering are reasonable.

  1. Import Optimization

There are currently two ways to reference JUnit methods in the project:

  • The first way is by importing static methods, allowing us to directly use assertTrue in unit tests.
  • The second way is by importing org.junit.Assert, which requires us to use the fully qualified name (e.g., Assert.assertTrue) in unit tests.

I hope to unify all references to the first method (static imports) in this update, but I'm unsure if this is feasible.

  1. Import * Optimization

I’ve noticed that many unit tests use the import * approach. Do we need to expand these imports in the unit tests?

  1. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos

There are several Checkstyle issues in the code, such as incomplete JavaDocs and typos. Should we also address these issues in this update?

Some logs use + for string concatenation instead of placeholders. Some methods could also be optimized using lambda expressions. Should we address these issues in this update as well?

  1. Improvements for Large Modules

For example, the hadoop-common module, I’ve noticed, has many downstream dependencies, affecting several other modules, as shown below:

hadoop-azure
hadoop-aws
hadoop-distcp
hadoop-azure-datalake
hadoop-hdfs
hadoop-hdfs-rbf
hadoop-cos
hadoop-aliyun
hadoop-mapreduce-client-core

We can’t make all the changes at once because there are too many involved modules, and some test classes have dependencies on each other.

I’ve found that JUnit 4 and JUnit 5 can coexist, meaning that a module can have both JUnit 4 and JUnit 5 test styles, and the tests will still run successfully. For very large modules, we can modify them incrementally, based on the package structure. For example, for hadoop-common, we can submit multiple PRs to complete the module’s upgrade step by step. Make sure that each PR submitted doesn’t include changes to more than 50 classes, as this will make the review process easier.

HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common Part1.

cli、conf、constants、crypto、fs

HADOOP-19415. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-common Part2.

ha、http

If we encounter unit tests that depend on other modules, we can leave them unchanged for now. Once the non-dependent parts of the other modules are upgraded, we can submit a single PR to complete the upgrade of the dependencies.

Let’s illustrate with an example:

The FCStatisticsBaseTest class in hadoop-common has dependencies on the following three modules:
hadoop-aws, hadoop-common, and hadoop-aliyun.

image

Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I’d also like to sincerely thank @zhtttylz for providing the unit test upgrade tool. With this tool, I was able to complete the PR, which involves multiple modules, in just two days, significantly improving efficiency. If we can confirm the coding guidelines, @zhtttylz and I will submit the relevant PRs together to complete the JUnit version upgrade.

I’m also willing to volunteer to drive and follow up on the upgrade work to ensure it’s completed in a shorter time frame. I estimate the upgrade should take about 1-3 months.

@cnauroth
Copy link
Contributor

Wow, this is awesome @slfan1989 and @zhtttylz ! Here are my opinions on how to proceed.

  1. Import Optimization
    ...
  2. Import * Optimization
    ...
  3. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos
    ...

I'd like to suggest that in this phase, we stick to a minimal effort of making a straight translation to JUnit 5 in the interest of completing a JDK 17 release as soon as possible. These are going to be large pull requests to review, even after we break them into sub-tasks. It will take longer to code review if we start including other unrelated improvements.

All of these proposals are good code quality improvements though. We can pick them up as future work after completing the JUnit 5 migration and after the JDK 17 release.

I've also heard there's a desire to go toward AssertJ. If so, then that might change the plan on points 1 and 2.

  1. Improvements for Large Modules
    ...
    Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I agree with your strategy here. It sounds like this is the best way to manage pull request size and code review process.

@slfan1989
Copy link
Contributor Author

Wow, this is awesome @slfan1989 and @zhtttylz ! Here are my opinions on how to proceed.

  1. Import Optimization
    ...
  2. Import * Optimization
    ...
  3. Addressing Checkstyle Issues, Incomplete JavaDocs, and Typos
    ...

I'd like to suggest that in this phase, we stick to a minimal effort of making a straight translation to JUnit 5 in the interest of completing a JDK 17 release as soon as possible. These are going to be large pull requests to review, even after we break them into sub-tasks. It will take longer to code review if we start including other unrelated improvements.

All of these proposals are good code quality improvements though. We can pick them up as future work after completing the JUnit 5 migration and after the JDK 17 release.

I've also heard there's a desire to go toward AssertJ. If so, then that might change the plan on points 1 and 2.

  1. Improvements for Large Modules
    ...
    Due to these dependencies, we will not upgrade this abstract class for now. Instead, we will parallelly upgrade the parts of these three modules(hadoop-aws, hadoop-common, and hadoop-aliyun) that have no dependencies. Once the other parts are upgraded, we will come back to upgrade the unit tests with dependencies.

I agree with your strategy here. It sounds like this is the best way to manage pull request size and code review process.

@cnauroth Thank you for your message! I agree with your point of view. We will only make the necessary changes and will not adjust other improvements for now. Once the JUnit 5 upgrade is complete, we will proceed with further optimizations.

Here are the improvement principles I've summarized:

  1. Focus on making the necessary changes to ensure the upgrade is completed smoothly.
  2. If the Yetus report highlights Checkstyle issues or other problems, we will address them to achieve a successful compile result.
  3. Break down large modules into smaller PRs to make the code review process more efficient and user-friendly.

cc: @steveloughran @ayushtkn @Hexiaoqiao

@cnauroth
Copy link
Contributor

@slfan1989 , great summary. Thank you for these patches!

@slfan1989 slfan1989 changed the title [DISCUESS] [JDK17] Upgrade JUnit from 4 to 5. [DISCUESS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. Feb 1, 2025
@slfan1989
Copy link
Contributor Author

After the merge of HADOOP-15984, we noticed that some unit tests stopped running. To resolve this, we found a solution that allows the unit tests to run again. We can refer to HADOOP-19405. hadoop-aws and hadoop-azure tests have stopped running.(#7335) for more details.

However, during the process of completing this PR, a new issue was exposed — we encountered a large number of Javadoc errors. During the JDK 11 upgrade, we addressed issues in some modules, but many modules still have Javadoc problems. We plan to address these issues during the JDK 17 upgrade process.

I do not intend to create a separate JIRA and PR for each module to fix the Javadoc errors, as this would generate significant noise. Improving Javadoc does not pose any execution risks since the PRs won’t modify the code itself.

For larger modules, such as hadoop-common, hdfs, hdfs-rbf, resourcemanager, and nodemanager, we will create individual JIRAs. For other modules, we will submit the changes under larger module scopes, such as hadoop-tools, yarn, mapreduce, and hdfs.

@slfan1989
Copy link
Contributor Author

To sync the recent progress, I have completed the upgrade of all MapReduce module tests from JUnit4 to JUnit5 locally (there may still be some issues, but we'll wait for Yetus feedback). I plan to start submitting the PRs tomorrow.

@steveloughran steveloughran changed the title [DISCUESS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. [DISCUSS] [JDK17] Upgrade JUnit / JavaDoc from 4 to 5. Feb 5, 2025
@slfan1989
Copy link
Contributor Author

@cnauroth @steveloughran

The following PR is ready, please help to review it

MapReduce:
MAPREDUCE-7420. [JDK17] Upgrade Junit 4 to 5 in hadoop-mapreduce-client-core Part1. #7363
MAPREDUCE-7421. [JDK17] Upgrade Junit 4 to 5 in hadoop-mapreduce-client-jobclient Part1. #7358

HADOOP:
HADOOP-19431. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-distcp. #7368
HADOOP-19425. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-azure Part1. #7369

HDFS:
HDFS-17719. [JDK17] Upgrade JUnit from 4 to 5 in hadoop-hdfs-httpfs Part1. #7371

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants