
General Retrospective for January 2025 Releases #64

Open
8 tasks
adamfarley opened this issue Nov 18, 2024 · 7 comments

Summary

A retrospective for all efforts surrounding the titular releases.

All community members are welcome to contribute to the agenda via comments below.

This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.

On the day of the meeting we'll review the agenda and add a list of actions at the end.

Invited: Everyone.

Time, Date, and URL

Time:
Date:
URL:

Details

Retrospective Owner Tasks (in order):

  • Post the retro URL in #release around the start of the new release.
  • Wait until most builds are released, with no signs of a respin.
  • Announce the retrospective's date and time in #release a week in advance.
  • Host the retrospective:
    • Go through the agenda.
    • Create a list of actions.
  • Process each action:
    • Create a "WIP" issue including the source comment.
    • Add the issue to the current iteration.
    • Add an issue link to the action list.
  • Create a new retrospective issue for the next release.
  • Set a calendar reminder so you remember to do step 1 before the next release.
  • Close this issue.

TLDR

Add proposed agenda items as comments below.

@adamfarley adamfarley self-assigned this Nov 18, 2024
@adamfarley adamfarley changed the title General Retrospective for January 2024 Releases General Retrospective for January 2025 Releases Nov 18, 2024
@Haroon-Khel (Contributor) commented Jan 15, 2025

Release pipelines with errors in their downstream jobs cannot generate the release summary report in TRSS.
As an example https://trss.adoptium.net/resultSummary?parentId=6780fa67f66194006d2f37b1 which corresponds to https://ci.adoptium.net/job/build-scripts/job/release-openjdk11-pipeline/51/

SL/Jan15: I think this is related to the number of failures: it grows to the point where the report essentially exceeds the character limit. Pipelines with some errors can still generate a report; pipelines with a massive amount of information to 'share' exceed the limit. aqa-test-tools/issues/xxxx is to print out a message if the limit is hit (so the user knows no report will be generated).
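A minimal sketch of the guard suggested above, assuming a hypothetical character limit and report variable (the real limit lives in the TRSS report generator, not here):

```shell
# Hypothetical guard: warn instead of silently skipping the report when the
# generated summary exceeds a posting limit. LIMIT and the report content are
# illustrative stand-ins, not TRSS's actual values.
LIMIT=4000
report=$(printf 'x%.0s' $(seq 1 5000))   # stand-in for a 5000-char report
if [ ${#report} -gt $LIMIT ]; then
  echo "Release summary (${#report} chars) exceeds the ${LIMIT}-char limit; no report will be generated."
fi
```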

@Haroon-Khel (Contributor) commented Jan 15, 2025

https://adoptium.slack.com/archives/CLCFNV2JG/p1736788317222669?thread_ts=1736429650.249329&cid=CLCFNV2JG

Regarding the release trigger getting confused by the tags, the picture below shows an example of how this can happen:

[image: screenshot of the repository's tag list]

The top three entries are fine: we have a tag jdk-23.0.2+22 which we want to use as a dry run, so we push a dryrun-ga tag, jdk-23.0.2-dryrun-ga, with the same commit SHA as jdk-23.0.2+22, and jdk-23.0.2+22 has a corresponding jdk-23.0.2+22_adopt tag present. All is good, except that jdk-23.0.1-dryrun-ga is also present and shares the same commit SHA as jdk-23.0.2+22. This will confuse the trigger and/or the downstream release pipeline it kicks off. The solution is to delete the jdk-23.0.1-dryrun-ga tag.
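The cleanup is plain git. A runnable sketch using a throwaway local bare repo as "origin" (paths are illustrative; against the real repo only the last two git commands apply):

```shell
# Demonstrate deleting a stale dryrun tag locally and on the remote,
# using a temporary bare repo standing in for the real origin.
set -e
tmp=$(mktemp -d)
git init --bare -q "$tmp/origin.git"
git init -q "$tmp/work"
cd "$tmp/work"
git -c user.email=ci@example.org -c user.name=ci commit -q --allow-empty -m init
git remote add origin "$tmp/origin.git"
git tag jdk-23.0.1-dryrun-ga
git push -q origin --tags

# The actual fix: remove the stale tag from both the clone and the remote.
git tag -d jdk-23.0.1-dryrun-ga
git push -q origin :refs/tags/jdk-23.0.1-dryrun-ga
git ls-remote --tags origin    # the stale tag is no longer listed
```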

@smlambert (Contributor) commented:

I have manually increased the TIME_LIMIT on https://ci.adoptium.net/job/Test_openjdk23_hs_extended.openjdk_riscv64_linux/ as it appears to be hitting its 25-hour limit and aborting before completion.

FYI @Haroon-Khel

We will need to investigate why it's timing out (my guess is that certain test case failures hang and run long until each one hits its timeout). If there is time, we should figure that out during the dry run assessment; if not, a longer time limit will hopefully allow the jobs to finish without aborting, even with the timeouts.

@Haroon-Khel (Contributor) commented Jan 21, 2025

https://adoptium.slack.com/archives/C09NW3L2J/p1737466419332739

https://ci.adoptium.net/view/git-mirrors/job/git-mirrors/job/adoptium/job/git-skara-jdk23u/2688/console

+ git push origin master --tags
To github.com:adoptium/jdk23u
 ! [rejected]                jdk-23.0.2-dryrun-ga -> jdk-23.0.2-dryrun-ga (already exists)
error: failed to push some refs to 'github.com:adoptium/jdk23u'
hint: Updates were rejected because the tag already exists in the remote.

Steps to resolve this from Andrew:

The problem was that we manually pushed the jdk-23.0.2+00_adopt tag, rather than letting the mirror job do it, so when the job tried to tag it got a mismatch.
To resolve, I:
  • Deleted the local cache on the jenkins-worker: rm -rf /home/jenkins/workspace/git-mirrors/adoptium/git-skara-jdk23u/workspace/jdk23u
  • Deleted the jdk-23.0.2+00_adopt tag from the mirror repo
  • Re-ran the mirror job

@sophia-guo (Contributor) commented Jan 23, 2025

Some test jobs timed out during this release and dry run even with timeout=25 hours. The reason is that TRSS was unavailable, so test durations fell back to assumed values (very small, essentially random numbers) and the test list count was set to 1, i.e. no parallelism at all for the JDK 21, 17, 11, and 8 releases. Only the JDK 23 tests ran in parallel. This is why tests ran slowly this release, even on some primary platforms.

The timeouts caused the builds to fail with no test results archived, so we had to rerun the jobs to get the test results.

For example:
https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_x86-64_mac/81/
https://ci.adoptium.net/job/Test_openjdk17_hs_extended.openjdk_x86-64_linux/233/
https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_aarch64_linux/127/

19:00:04  Starting to generate parallel test lists.
19:00:04  
19:00:05  Parsing /home/jenkins/workspace/Test_openjdk17_hs_extended.openjdk_x86-64_linux/aqa-tests/TKG/../openjdk/playlist.xml
19:00:06  Attempting to get test duration data from TRSS.
19:00:06  curl --silent --max-time 120 -L -k https://trss.adoptopenjdk.net/api/getTestAvgDuration?limit=10&jdkVersion=17&impl=hs&platform=x86-64_linux&group=openjdk&level=extended
19:00:06  Warning: cannot parse data from TRSS.
19:00:06  Unexpected character (e) at position 0.
19:00:06  	at org.json.simple.parser.Yylex.yylex(Yylex.java:610)
19:00:06  	at org.json.simple.parser.JSONParser.nextToken(JSONParser.java:269)
19:00:06  	at org.json.simple.parser.JSONParser.parse(JSONParser.java:118)
19:00:06  	at org.json.simple.parser.JSONParser.parse(JSONParser.java:92)
19:00:06  	at org.testKitGen.TestDivider.parseDuration(TestDivider.java:162)
19:00:06  	at org.testKitGen.TestDivider.getDataFromTRSS(TestDivider.java:252)
19:00:06  	at org.testKitGen.TestDivider.createDurationQueue(TestDivider.java:281)
19:00:06  	at org.testKitGen.TestDivider.divideTests(TestDivider.java:404)
19:00:06  	at org.testKitGen.TestDivider.generateLists(TestDivider.java:425)
19:00:06  	at org.testKitGen.MainRunner.genParallelList(MainRunner.java:74)
19:00:06  	at org.testKitGen.MainRunner.main(MainRunner.java:38)
19:00:06  Attempting to get test duration data from cached files.
19:00:06  
19:00:06  TEST DURATION
19:00:06  ====================================================================================
19:00:06  Total number of tests searched: 86
19:00:06  Number of test durations found: 0
19:00:06  No test duration data found.
19:00:06  (Default duration assigned, executed tests: 40s; not executed tests: 0s.)
19:00:06  ====================================================================================
19:00:06  
19:00:06  Test target is split into 1 lists.
19:00:06  Reducing estimated test running time from 28m40s to 28m40s.
19:00:06  
19:00:06  -------------------------------------testList_0-------------------------------------
19:00:06  Number of tests: 86
19:00:06  Estimated running time: 28m40s

It may be better to fall back to a pre-defined number of parallel test lists when this happens (even though it rarely happens).
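A hedged sketch of that fallback idea; the function and variable names are invented for illustration (TKG's real decision lives in TestDivider.java):

```shell
# pick_lists decides how many parallel test lists to build: split by measured
# durations when the TRSS response looks like JSON, otherwise fall back to a
# fixed count instead of a single serial list.
pick_lists() {
  resp=$1
  fallback=$2
  case $resp in
    \[*|\{*) echo "durations" ;;   # JSON payload: split by measured durations
    *)       echo "$fallback" ;;   # TRSS down or garbage: fixed parallel split
  esac
}

pick_lists '<html>503 Service Unavailable</html>' 4   # -> 4
pick_lists '[{"avgDuration":40}]' 4                   # -> durations
```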

@sxa (Member) commented Jan 27, 2025

Should "Re-run in Grinder" links always set PARALLEL=None by default?
Reason: the "blocks" in Jenkins display differently depending on whether parallel is on or off, so if there is a mix it won't show both types, only the most recent one.
If everything were set to non-parallel, the Jenkins blocks for the pipeline stages would be more visible to the different people running tests, which would make it easier to monitor progress.

For example at the time of originally posting this comment it's only showing two jobs in the main display:

[image: Jenkins main display showing only two jobs]

despite there being other jobs prior to that still running which were initiated with the re-run links, which have PARALLEL=Dynamic. I usually try to switch mine to use PARALLEL=None, but it would be preferable to have that as the default (since it's usually a small number of targets, you don't benefit much from running in parallel, particularly when it makes collating the results more complex).

@sxa (Member) commented Jan 30, 2025

Should we cherry-pick cacerts updates that land in the master branch between the branching for the dry run and the final GA builds?
OpenJ9 noticed an issue where their cacerts differed from ours because they are not currently basing things on our release branches.
References:
