Enhancement: Fine tune distances between RSEs #837

Open
d-ylee opened this issue Aug 20, 2024 · 12 comments · May be fixed by #857

@d-ylee
Contributor

d-ylee commented Aug 20, 2024

Enhancement Description

Currently, two RSEs that do not share a region are given the slowest/longest distance value of 13, but that probably doesn't make sense.

Use Case

The fine tuning of distance could allow more optimal transfers.

Possible Solution

We may want to try a 4x4 matrix where the two European regions are declared better connected than Europe/US and where Europe and US are better connected than anything to Other.

The logic here could be modified:

DEFAULT_DISTANCE_RULES = {'site': 1, 'region&country': 4, 'country': 7, 'region': 10, 'other': 13}

Example Distances Comparison:
Same Site < Europe+Europe or US+US < Europe+US < Other
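
As a rough illustration of how rules like these could be applied, here is a minimal sketch. It is not the CMSRucio implementation: the `rule_for` and `distance` helpers, the dict layout with `site`, `country` and `region` keys, and the example RSE descriptors are all hypothetical, and the order in which the rules are tested is an assumption read off the rule names.

```python
DEFAULT_DISTANCE_RULES = {'site': 1, 'region&country': 4, 'country': 7, 'region': 10, 'other': 13}


def rule_for(src, dst):
    """Return the name of the most specific rule matching two RSEs.

    `src` and `dst` are plain dicts with hypothetical 'site', 'country'
    and 'region' keys. A missing or None value never counts as a match.
    """
    def same(key):
        return src.get(key) is not None and src.get(key) == dst.get(key)

    if same('site'):
        return 'site'
    if same('region') and same('country'):
        return 'region&country'
    if same('country'):
        return 'country'
    if same('region'):
        return 'region'
    return 'other'


def distance(src, dst, rules=DEFAULT_DISTANCE_RULES):
    """Look up the distance for the rule that matches the pair."""
    return rules[rule_for(src, dst)]


# Made-up RSE descriptors, for illustration only.
rse_1 = {'site': 'SITE_1', 'country': 'US', 'region': 'B'}
rse_2 = {'site': 'SITE_2', 'country': 'US', 'region': 'B'}
rse_3 = {'site': 'SITE_3', 'country': 'DE', 'region': 'A'}

print(distance(rse_1, rse_2))  # same region and country
print(distance(rse_1, rse_3))  # nothing shared, falls through to 'other'
```

In this sketch every pair of RSEs in different regions and countries lands on `other`, which is the behaviour this issue proposes to refine.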

Related Issues

No response

@d-ylee d-ylee self-assigned this Aug 20, 2024
@d-ylee
Contributor Author

d-ylee commented Aug 20, 2024

@KatyEllis @ericvaandering This issue is for what was discussed in our meeting today. Are there any more details that need to be added?

@d-ylee
Contributor Author

d-ylee commented Aug 22, 2024

I made a script to pull the RSE list in our CMS production node. I learned that each RSE is assigned a region (A, B, C, D), but I also found that not all RSEs are assigned a region. Would these RSEs need to have the region attribute applied? Are these regions defined somewhere?
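
The script itself is not shown in this thread. A minimal sketch of the grouping step, assuming the attributes have already been fetched into a plain mapping of RSE name to attribute dict, might look like this. The `group_by_region` helper is hypothetical; with the Rucio Python client the mapping could presumably be built from `list_rses()` and `list_rse_attributes()`, but that part is left out so the example runs without a server.

```python
from collections import defaultdict


def group_by_region(rse_attributes):
    """Group RSE names by their 'region' attribute.

    `rse_attributes` maps an RSE name to its attribute dict. RSEs with
    no 'region' attribute are collected under the string key "None".
    """
    groups = defaultdict(list)
    for rse, attrs in rse_attributes.items():
        groups[str(attrs.get('region'))].append(rse)
    return {region: sorted(names) for region, names in groups.items()}


# Small hand-written sample; a real run would use the full RSE list.
sample = {
    'T2_LB_HPC4L': {'region': 'A'},
    'T3_US_MIT': {},
    'T2_PL_Warsaw': {},
}

print(group_by_region(sample))
```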

@ericvaandering
Member

These regions were defined a long time ago based on some actual measurements. I guess the logic would have to be that sites without a region are treated worst of all for the region part of the calculation, but are still treated according to country if they have one, etc.

Which sites don't have a region? Are any of them Tier2s? My suspicion is those sites got added and no one bothered to set the region since it's done by hand.

@d-ylee
Contributor Author

d-ylee commented Aug 23, 2024

@ericvaandering These are the sites that did not have a region assigned as an RSE attribute.

{
  "None": [
    "T3_CH_CERN_CTA_RecallTest",
    "T3_US_MIT",
    "T2_PK_NCP_Temp",
    "T2_FR_GRIF_IRFU_Temp",
    "T2_HU_Budapest_Temp",
    "T2_US_Florida_Temp",
    "T2_US_MIT_Temp",
    "T2_US_Purdue_Temp",
    "T2_US_UCSD_Temp",
    "T2_US_Wisconsin_Temp",
    "T2_FR_IPHC_Temp",
    "T2_CH_CERN_Temp",
    "T1_FR_CCIN2P3_Tape",
    "T1_RU_JINR_Tape",
    "T1_DE_KIT_Tape",
    "T3_US_UMiss",
    "T3_US_NotreDame_Test",
    "T3_CH_CERN_CTA_CastorTest",
    "T3_IT_MIB_Temp",
    "T3_KR_KISTI_Test",
    "T3_KR_KISTI",
    "T3_US_CMU_Test",
    "T3_US_Rice",
    "T3_US_PuertoRico",
    "T2_PT_NCG_Lisbon_Temp",
    "T2_DE_DESY_Temp",
    "T2_RU_IHEP_Temp",
    "T2_IT_Pisa_Temp",
    "T2_TR_METU_Temp",
    "T2_PL_Swierk_Temp",
    "T1_US_FNAL_Tape_Test",
    "T1_FR_CCIN2P3_Tape_Test",
    "T3_US_Baylor",
    "T3_US_FNALLPC_Temp",
    "T3_BG_UNI_SOFIA",
    "T3_CH_PSI_Temp",
    "T3_US_NotreDame_Temp",
    "T2_CH_CSCS_Temp",
    "T2_AT_Vienna",
    "T3_CH_CERN_CTA_Test",
    "T3_HR_IRB",
    "T1_FR_CCIN2P3_Disk_Temp",
    "T3_US_Baylor_Test",
    "T2_UK_SGrid_Bristol_Temp",
    "T2_EE_Estonia_Temp",
    "T2_RU_ITEP_Temp",
    "T2_UK_London_IC_Temp",
    "T2_BR_SPRACE_Temp",
    "T2_AT_Vienna_Temp",
    "T2_KR_KISTI_Temp",
    "T1_IT_CNAF_Tape_Test",
    "T3_US_Princeton_ICSE",
    "T0_CH_CERN_Tape",
    "T1_RU_JINR_Disk_Temp",
    "T3_US_Rutgers_Test",
    "T1_IT_CNAF_Disk_Temp",
    "T3_IT_Trieste_Temp",
    "T3_US_UMD_Temp",
    "T0_CH_CERN_Disk",
    "T1_US_FNAL_Disk_Temp",
    "T3_US_CMU_Temp",
    "T3_TW_NTU_HEP_Test",
    "T1_UK_RAL_Disk_Temp",
    "T2_BE_IIHE_Temp",
    "T2_BE_UCL_Temp",
    "T2_US_Caltech_Temp",
    "T2_DE_RWTH_Temp",
    "T2_PL_Warsaw",
    "T2_RU_INR_Temp",
    "T2_GR_Ioannina_Temp",
    "T2_IN_TIFR_Temp",
    "T3_US_NERSC",
    "T3_US_PuertoRico_Test",
    "T3_CH_PSI_Test",
    "T3_MX_Cinvestav_Temp",
    "T3_FR_IPNL_Temp",
    "T1_ES_PIC_Disk_Temp",
    "T3_US_NotreDame",
    "T3_MX_Cinvestav_Test",
    "T1_DE_KIT_Disk_Temp",
    "T3_CH_CERN_OpenData",
    "T3_US_UMD",
    "T3_US_Rutgers",
    "T0_CH_CERN_Tape_Test",
    "T3_US_OSU",
    "T3_DM_MOCK_RSE",
    "T3_FR_IPNL",
    "T3_US_Colorado",
    "T3_US_CMU",
    "T3_CH_CERNBOX",
    "T0_CH_CERN_Disk_Test",
    "T3_US_FNALLPC_Test",
    "T2_IT_Bari_Temp",
    "T2_RU_JINR_Temp",
    "T2_US_Nebraska_Temp",
    "T2_ES_IFCA_Temp",
    "T2_TW_NCHC_Temp",
    "T1_US_FNAL_Tape",
    "T1_ES_PIC_Tape",
    "T3_US_FNALLPC",
    "T3_MX_Cinvestav",
    "T3_US_PuertoRico_Temp",
    "T3_TW_TIDC_Test",
    "T1_UK_RAL_Tape",
    "T1_UK_RAL_Tape_Test",
    "T3_IT_Trieste",
    "T2_FR_GRIF_Temp",
    "T3_TW_TIDC_Temp",
    "T2_FR_GRIF_LLR_Temp",
    "T2_BR_UERJ_Temp",
    "T3_KR_KNU",
    "T3_KR_KNU_Temp",
    "T2_IT_Legnaro_Temp",
    "T2_UK_London_Brunel_Temp",
    "T2_IT_Rome_Temp",
    "T2_UK_SGrid_RALPP_Temp",
    "T2_ES_CIEMAT_Temp",
    "T1_ES_PIC_Tape_Test",
    "T3_IT_MIB",
    "T3_IR_IPM",
    "T2_US_MIT_Tape",
    "T3_US_UMiss_Test",
    "T3_US_UMD_Test",
    "T3_TW_TIDC",
    "T3_US_Brown",
    "T3_TW_NTU_HEP",
    "T3_US_MIT_Test",
    "T3_BG_UNI_SOFIA_Test",
    "T3_CY_UCY_Temp",
    "T2_RC_MOCK",
    "T2_CN_Beijing_Temp",
    "T2_FI_HIP_Temp",
    "T2_US_Vanderbilt_Temp",
    "T2_UA_KIPT_Temp",
    "T1_IT_CNAF_Tape",
    "T1_RU_JINR_Tape_Test",
    "T1_DE_KIT_Tape_Test",
    "T3_CH_PSI",
    "T3_KR_UOS",
    "T3_IT_MIB_Test",
    "T2_US_MIT_Tape_Test",
    "T3_US_Colorado_Test",
    "T3_KR_UOS_Test",
    "T3_DM_MOCK_RSE2",
    "T3_IT_Bologna_Test"
  ]
}

@ericvaandering
Member

OK, from what I can see these mostly fall into Tier3 and unused categories (Temp and Test), plus Tape, which surprises me but may make sense. Pulling off of tape should be a last resort (but getting to tape should use the best link, so...). @nsmith- may have some insight here. I didn't go through exhaustively, but I only see Warsaw as a Tier2 with no region.

As I recall, the regions were roughly Western Europe, Eastern Europe, North America, and Other. Does that match what you see?
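
A quick way to make this kind of check exhaustive is to tally the names by tier and suffix. The sketch below is only an illustration: the `classify` helper and its suffix rules (`_Temp`, `_Test`, `_Tape`) are assumptions based on the naming pattern visible in the list, and the sample holds just a few names copied from it.

```python
from collections import Counter


def classify(rse):
    """Split an RSE name like 'T2_US_MIT_Temp' into (tier, kind)."""
    tier = rse.split('_', 1)[0]
    for suffix in ('Temp', 'Test', 'Tape'):
        if rse.endswith('_' + suffix):
            return tier, suffix
    return tier, 'Other'


# A few names copied from the list of RSEs without a region.
no_region = [
    'T3_US_MIT',
    'T2_US_MIT_Temp',
    'T1_DE_KIT_Tape',
    'T1_FR_CCIN2P3_Tape_Test',
    'T2_PL_Warsaw',
    'T3_KR_KISTI_Test',
]

kinds = Counter(classify(rse)[1] for rse in no_region)
plain_t2 = [rse for rse in no_region if classify(rse) == ('T2', 'Other')]

print(kinds)
print(plain_t2)
```

Run over the full list, the `plain_t2` filter would show every Tier2 entry that is not a Temp, Test, or Tape endpoint and still lacks a region.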

@d-ylee
Contributor Author

d-ylee commented Sep 17, 2024

These are the countries of all RSEs, grouped by region and tier:

{
  "A": {
    "T1": [
      "Germany",
      "United Kingdom",
      "France"
    ],
    "T2": [
      "Hungary",
      "Switzerland",
      "Austria",
      "Italy",
      "United Kingdom",
      "Germany",
      "Lebanon",
      "France"
    ]
  },
  "B": {
    "T1": [
      "United States"
    ],
    "T2": [
      "Brazil",
      "United States"
    ]
  },
  "C": {
    "T1": [
      "Italy",
      "Spain"
    ],
    "T2": [
      "Greece",
      "T\u00fcrkiye",
      "Belgium",
      "Germany",
      "Italy",
      "United Kingdom",
      "Portugal",
      "Spain",
      "Ukraine"
    ],
    "T3": [
      "Cyprus"
    ]
  },
  "D": {
    "T1": [
      "Russian Federation"
    ],
    "T2": [
      "Pakistan",
      "Finland",
      "Estonia",
      "China",
      "Poland",
      "Russian Federation",
      "India",
      "Taiwan",
      "Korea, Republic of"
    ],
    "T3": [
      "China",
      "Korea, Republic of"
    ]
  },
  "None": {
    "T0": [
      "Switzerland"
    ],
    "T1": [
      "Germany",
      "United States",
      "Italy",
      "United Kingdom",
      "Russian Federation",
      "Spain",
      "France"
    ],
    "T2": [
      "T\u00fcrkiye",
      "Finland",
      "Belgium",
      "United States",
      "India",
      "Ukraine",
      "Greece",
      "Switzerland",
      "Estonia",
      "Poland",
      "Portugal",
      "Spain",
      "France",
      "Korea, Republic of",
      "Hungary",
      "RC",
      "United Kingdom",
      "Russian Federation",
      "China",
      "Brazil",
      "Pakistan",
      "Germany",
      "Italy",
      "Austria",
      "Taiwan"
    ],
    "T3": [
      "Croatia",
      "Mexico",
      "Switzerland",
      "Cyprus",
      "Dominica",
      "United States",
      "Italy",
      "Bulgaria",
      "France",
      "Iran, Islamic Republic of",
      "Taiwan",
      "Korea, Republic of"
    ]
  }
}

From what I see,
A: Mostly Western Europe (+ Lebanon (T2_LB_HPC4L)?)
B: Americas (US + Brazil)
C: Mix of Western and Eastern Europe
D: Eastern Europe + Asia (Also has Finland)

@nsmith-
Contributor

nsmith- commented Sep 18, 2024

We (Fernando and I) used FTS logs to determine the rates and transfer efficiencies per link, then created a graph and used some tool (Gephi, I think) to cluster the graph into 4 regions. I'm sure there is a presentation with details somewhere, but I can't find it at the moment.

@d-ylee
Contributor Author

d-ylee commented Sep 30, 2024

At least from what I see, the regional distances should be the following, from shortest to longest:

  1. A <-> C
  2. A <-> B
  3. B <-> C
  4. A <-> D

Would these have values between region (10) and country (7)?

@nsmith-
Contributor

nsmith- commented Sep 30, 2024

Don't forget B<->D and C<->D :)

@d-ylee
Contributor Author

d-ylee commented Oct 15, 2024

Based on a discussion with @KatyEllis:
For A <-> C, we will set the distance to 11
For C <-> D, we will set the distance to 12
For ACD <-> B, we will set it to 13 (Others)
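
One way these values could be encoded is a symmetric lookup keyed on the unordered pair of regions. This is a sketch only and is not taken from PR #857: the `REGION_PAIR_DISTANCES` table and `cross_region_distance` helper are hypothetical names. A <-> D is not given a value above, so in this sketch it falls back to `other` (13), as does any pair where a region is missing.

```python
DEFAULT_DISTANCE_RULES = {'site': 1, 'region&country': 4, 'country': 7, 'region': 10, 'other': 13}

# Pairs agreed above; frozenset keys make the lookup order-independent.
REGION_PAIR_DISTANCES = {
    frozenset({'A', 'C'}): 11,
    frozenset({'C', 'D'}): 12,
}


def cross_region_distance(region_a, region_b):
    """Distance for two RSEs that share neither site nor country."""
    if region_a is None or region_b is None:
        # Assumption: an RSE with no region is treated as worst case.
        return DEFAULT_DISTANCE_RULES['other']
    if region_a == region_b:
        return DEFAULT_DISTANCE_RULES['region']
    pair = frozenset({region_a, region_b})
    # Unlisted pairs (e.g. anything involving B) use the 'other' value.
    return REGION_PAIR_DISTANCES.get(pair, DEFAULT_DISTANCE_RULES['other'])


for a, b in [('A', 'C'), ('D', 'C'), ('A', 'B'), ('A', 'A'), (None, 'A')]:
    print(a, b, cross_region_distance(a, b))
```

Keying on `frozenset` means A <-> C and C <-> A share one entry, so the table cannot become asymmetric by accident.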

d-ylee added a commit to d-ylee/CMSRucio that referenced this issue Oct 15, 2024
Set distances between:
A (West Europe) and C (East Europe) regions to 11
C (East Europe) and D (Asia) regions to 12
@d-ylee d-ylee linked a pull request Oct 15, 2024 that will close this issue
@dynamic-entropy
Contributor

Hello @d-ylee
Were any changes made to the distances in production recently?

@d-ylee
Contributor Author

d-ylee commented Nov 14, 2024

@dynamic-entropy I don't think so. The changes in PR #857 were not yet merged.
