New Machine requirement: Replacement for Equinix x64 servers #3597

Open
sxa opened this issue Dec 19, 2023 · 34 comments

@sxa
Member

sxa commented Dec 19, 2023

Equinix have been sponsoring the Node.js infrastructure by providing a generous amount of capacity. This is now coming to an end, and we need to make a plan for migrating our systems away from Equinix. (Note: this does not affect the aarch64 Altras, which are supplied as part of the Works On Arm project but are hosted by Equinix.)

  • infra
    • joyent:
      debian10-x64-1: {ip: 147.28.162.110, alias: grafana}
      smartos15-x64-1: {ip: 147.28.183.83, alias: backup}
      ubuntu1604-x64-1: {ip: 147.28.162.105, alias: unencrypted}
  • release
    • joyent:
      smartos18-x64-2: {ip: 147.28.162.101}
      smartos20-x64-2: {ip: 147.28.162.108}
      ubuntu1804_docker-x64-1: {ip: 147.28.162.104, user: ubuntu}
  • test
    • equinix:
      ubuntu2204-x64-1: {ip: 147.75.72.255, alias: jenkins-workspace-7}
      ubuntu2204-x64-2: {ip: 145.40.96.123, alias: jenkins-workspace-8}
    • equinix_mnx:
      smartos18-x64-3: {ip: 147.28.162.102}
      smartos18-x64-4: {ip: 147.28.162.103}
      smartos20-x64-3: {ip: 147.28.162.107}
      smartos20-x64-4: {ip: 147.28.162.109}
      ubuntu1804-x64-1: {ip: 147.28.162.99, user: ubuntu}

The joyent and equinix_mnx ones are in the nodecore project in the portal; the two test equinix ones are in the Node.js project.
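For anyone cross-referencing this list against Jenkins, the full machine names used later in this thread (for example test-equinix-ubuntu2204-x64-1) appear to be the type, provider and host key joined with hyphens. A minimal sketch of that flattening follows, assuming the nesting shown in the list above; the `flatten_inventory` helper is hypothetical and not part of the Build WG tooling, and only a few of the hosts are reproduced.

```python
# Hypothetical sketch: the nesting mirrors the list in this comment
# (type -> provider -> host). Only a subset of the hosts is included.
INVENTORY = {
    "test": {
        "equinix": {
            "ubuntu2204-x64-1": {"ip": "147.75.72.255", "alias": "jenkins-workspace-7"},
            "ubuntu2204-x64-2": {"ip": "145.40.96.123", "alias": "jenkins-workspace-8"},
        },
        "equinix_mnx": {
            "smartos18-x64-3": {"ip": "147.28.162.102"},
        },
    },
}


def flatten_inventory(inventory):
    """Yield (full_name, attributes), joining type, provider and host with '-'."""
    for machine_type, providers in inventory.items():
        for provider, hosts in providers.items():
            for host, attrs in hosts.items():
                yield f"{machine_type}-{provider}-{host}", attrs


for name, attrs in flatten_inventory(INVENTORY):
    # Alias is optional, so fall back to an empty string.
    print(name, attrs["ip"], attrs.get("alias", ""))
```

This is only meant to make the naming convention explicit when matching hosts in this list to agents in Jenkins.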

@richardlau
Member

The Joyent/equinix_mnx machines are in a separate account (Nodecore) -- I think MNX are paying for those so would hopefully be unaffected (cc @bahamat).

@richardlau
Member

FWIW the Jenkins workspace machines are c3.small.x86 which are:
1x Intel Xeon E-2278G 8-Core Processor @ 3.40GHz
32GB RAM
2x 480GB SSD
2x 10Gbps
1x Intel HD Graphics P630

I think we're only using one of the two disks.

By contrast, the third jenkins-workspace machine, which is hosted on IBM Cloud rather than Equinix, is:
2 vCPU | 4 GB
25 GB SAN boot disk
1 TB SAN disk
100 Mbps

So I think the takeaway here is disk space. Also jenkins-workspace-7 is where our temp binary git repository (used in the arm and Windows fanned jobs) currently resides.

@vielmetti

@richardlau the Nodecore systems referenced are also on an account that's currently 100% subsidized, and that subsidy is ending.

I'm currently investigating what I can do about pricing discounts, but I know that "free" is not continuing for these.

@bahamat

bahamat commented Dec 19, 2023

We'd like to offer hosting those instances on mnx.io.

This would be like when they were hosted at Joyent. We'd set up a dedicated Triton account with individual instances (rather than two dedicated physical servers). The account billing will be covered by us (MNX). I will assist in getting everything set up and provide credentials to anyone that needs it.

We're also adding another datacenter which will be publicly available in the coming months for the offsite backup instance.

@sxa
Member Author

sxa commented Dec 20, 2023

Thank you @bahamat - that's great to hear! Let us know when that's in place.

@richardlau
Member

@bahamat That sounds great. For clarity, would that include the two machines in the Node.js account or just the ones in the Nodecore one?

@bahamat

bahamat commented Dec 20, 2023

@richardlau The NodeCore account is the only one I have access to, so that’s the one I meant.

For the others, I’d need to know what the requirements are, then I need to check if we have available capacity for it.

If you have VMs there, I need the cpu/ram/storage for them, then I can see how much more we can provide.

@UlisesGascon UlisesGascon pinned this issue Dec 20, 2023
@richardlau
Member

@bahamat Details are in #3597 (comment). There are two machines with that configuration (c3.small.x86 in Equinix). I think we might not need as much CPU/RAM, but they are consuming disk space, i.e.

jenkins-workspace-7:

root@test-equinix-ubuntu2204-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       439G  354G   63G  86% /
root@test-equinix-ubuntu2204-x64-1:/home/iojs# du -hs /home/iojs/build/workspace/
240G    /home/iojs/build/workspace/
root@test-equinix-ubuntu2204-x64-1:/home/iojs# du -hs /home/iojs/build/binary_tmp.git
56G     /home/iojs/build/binary_tmp.git
root@test-equinix-ubuntu2204-x64-1:/home/iojs#

jenkins-workspace-8:

root@test-equinix-ubuntu2204-x64-2:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       438G  107G  309G  26% /
root@test-equinix-ubuntu2204-x64-2:/home/iojs# du -hs /home/iojs/build/workspace/
101G    /home/iojs/build/workspace/
root@test-equinix-ubuntu2204-x64-2:/home/iojs#
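As a rough way to reconcile these numbers when sizing replacements, the sketch below parses the jenkins-workspace-7 `df -h` line and the two `du -hs` totals shown above. The helper names are hypothetical, and the results are approximate because `df -h` and `du -h` round their output.

```python
def parse_size(text):
    """Convert a df/du -h style size such as '439G' to bytes (powers of 1024)."""
    units = {"K": 1, "M": 2, "G": 3, "T": 4}
    suffix = text[-1].upper()
    if suffix in units:
        return float(text[:-1]) * 1024 ** units[suffix]
    return float(text)


def parse_df_line(line):
    """Split one data row of `df -h` output into named fields."""
    filesystem, size, used, avail, use_pct, mount = line.split()
    return {
        "filesystem": filesystem,
        "size": parse_size(size),
        "used": parse_size(used),
        "avail": parse_size(avail),
        "use_pct": int(use_pct.rstrip("%")),
        "mount": mount,
    }


GIB = 1024 ** 3

# Figures copied from the jenkins-workspace-7 output above.
root = parse_df_line("/dev/sda3       439G  354G   63G  86% /")
workspace = parse_size("240G")
binary_tmp = parse_size("56G")
other = root["used"] - workspace - binary_tmp

print(f"workspace + binary_tmp.git: {(workspace + binary_tmp) / GIB:.0f}G")
print(f"everything else on /: {other / GIB:.0f}G")
```

On these figures the workspace and the temp binary repository together account for about 296G of the 354G used, which is the portion a replacement for jenkins-workspace-7 would need to budget for.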

@richardlau
Member

@bahamat Have you been able to review #3597 (comment) and #3597 (comment) as to whether the two Jenkins workspace machines could be included in the mnx offer, or if we'll need to source replacements elsewhere?

@ryanaslett
Contributor

What is the timeline for this issue? Is there a deadline from Equinix?

@UlisesGascon
Member

> What is the timeline for this issue? Is there a deadline from Equinix?

I was checking the notes, and it seems the start of April is the current deadline.

Do we feel comfortable with the given deadline?

@UlisesGascon
Member

@vielmetti

Reading through the meeting minutes now, thanks @UlisesGascon.

@vielmetti

Regarding deadlines, the message I want to convey is that we want the project to continue to succeed and we don't want to disrupt operations, but it's also important now to have a transition plan with a timeline in place. We'll support you through that timeline (and if you need more time, let me know, but don't delay unnecessarily).

@richardlau
Member

This is a list of the affected machines:

In the Nodecore organization (these VMs are spread over two instances in Equinix Metal):

  • infra-joyent-debian10-x64-1 aka grafana
  • infra-joyent-smartos15-x64-1 aka backup
  • infra-joyent-ubuntu1604-x64-1 aka unencrypted
  • release-joyent-smartos18-x64-2
  • release-joyent-smartos20-x64-2
  • release-joyent-ubuntu1804_docker-x64-1
  • test-equinix_mnx-smartos18-x64-3
  • test-equinix_mnx-smartos18-x64-4
  • test-equinix_mnx-smartos20-x64-3
  • test-equinix_mnx-smartos20-x64-4
  • test-equinix_mnx-ubuntu1804-x64-1

In the Node.js organization (these are separate instances in Equinix Metal, current specs #3597 (comment)):

  • test-equinix-ubuntu2204-x64-1 aka jenkins-workspace-7
  • test-equinix-ubuntu2204-x64-2 aka jenkins-workspace-8

@richardlau
Member

Oh, I've just noticed we already listed these in the issue description up top, except the release machines are missing from that list -- I'll update the list in the description as well 🙂.

@ryanaslett
Contributor

As mentioned in the call today, I'd like to add the [email protected] account to the equinix accounts in question so we can get a handle on all the details.

@mhdawson
Member

mhdawson commented Mar 8, 2024

@nodejs/build can you chime in with your OK, or any concerns/objections, about adding the linuxIT operations account to the Equinix accounts?

+1 from me.

@richardlau
Member

richardlau commented Mar 8, 2024 via email

@bahamat

bahamat commented Mar 8, 2024

@richardlau @mhdawson I've been able to confirm that MNX is happy to host the additional machines from the Node.js org from Equinix Metal, as well as the NodeCore instances.

@vielmetti

Thank you @bahamat glad to see this.

@targos
Member

targos commented May 14, 2024

Jenkins Workspace Machines

[Screenshot: CleanShot 2024-05-14 at 09 20 20]

root@test-equinix-ubuntu2204-x64-1:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.2G  1.5M  3.2G   1% /run
/dev/sda3       439G  359G   57G  87% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.2G  4.0K  3.2G   1% /run/user/1001
tmpfs           3.2G  4.0K  3.2G   1% /run/user/0

@richardlau
Member

@ryanaslett I've invited you to the "Node.js" Equinix Metal organization as an owner (this is the only one of the accounts owned by the Build WG) which contains the two Jenkins workspace machines. The other machines to be migrated are in the "Nodecore" Equinix Metal organization, which I think @bahamat would need to invite you to.

We should open a separate issue solely around additional access (i.e. admin on both Jenkins).

I can't ssh into grafana.

backup, as you surmise, does contain a lot of data.

unencrypted is a mirror of www (hosted on Digital Ocean) and is configured to be the failover server in Cloudflare should www be unavailable.

For smartos versions, talk to @bahamat and the folks at mnx -- they're likely to be in a much better position to advise on which smartos versions we should be testing on.

@bahamat

bahamat commented May 15, 2024

> I can't ssh into grafana.

@richardlau I looked at the grafana instance today. Looks like it crashed with a full disk and consequently didn't boot properly. I cleared the boot prompt so that it would come up and ssh is available now, but I didn't do anything to address the disk issue so I don't think Grafana is healthy yet. I figured addressing the disk space was better left to your team.

I discussed this with @ryanaslett earlier today, so this may be old news to some folks already.
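For whoever picks up the disk issue, a small check along the following lines could flag a filesystem before it fills completely. This is only a sketch: the `disk_alert` helper and the 85% threshold are assumptions for illustration, not something currently deployed on the grafana instance.

```python
import shutil


def disk_alert(path="/", threshold=0.85):
    """Return (over_threshold, used_fraction) for the filesystem holding `path`.

    The fraction is used/total, which can differ slightly from the Use%
    column of `df`, since df excludes blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as not full.
        return False, 0.0
    fraction = usage.used / usage.total
    return fraction >= threshold, fraction


over, fraction = disk_alert("/")
print(f"/ is {fraction:.0%} full{' (over threshold)' if over else ''}")
```

Run from cron or a similar scheduler, a check like this would give some warning before a full disk prevents a clean boot.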

@richardlau
Member

@ryanaslett While the security release is still being tested (we're waiting for the security release to be done before changing anything), in preparation would it be possible to PR the new machines into the Ansible inventory? I think they've had secrets added, but no corresponding entries with IP addresses (and/or account).

@richardlau
Member

richardlau commented Jul 9, 2024

@ryanaslett and I have migrated the two Equinix-hosted jenkins-workspace machines.

These are now offline in Jenkins

replaced by

The temp binary git repository has been cloned across to test-mnx-ubuntu2204-x64-1 and the two Jenkins variables (TEMP_REPO and TEMP_REPO_SERVER) updated to the new machine's IP address. We've also added entries to known_hosts for the VMs that need to push/pull to the temp binary git repository.

Let's run like this for a few days and if no new issues arise we can turn off the Equinix jenkins-workspace machines.
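For reference, the general shape of such a move can be sketched as below. This is not a record of the exact commands we ran: the `migration_commands` helper, the user name and the new IP address (taken from the 203.0.113.0/24 documentation range) are all placeholders, and the sketch only builds the command lines without executing them.

```python
import shlex


def migration_commands(old_host, new_host, repo_path, user="hypothetical-user"):
    """Build (but do not run) commands for moving a bare git repository.

    The first command is meant to run on the new host; the output of the
    second would be appended to known_hosts on each machine that pushes
    to or pulls from the repository.
    """
    source_url = f"ssh://{user}@{old_host}{repo_path}"
    return [
        # --mirror copies all refs, so the clone is a complete bare copy.
        ["git", "clone", "--mirror", source_url, repo_path],
        # -H hashes the host name in the emitted known_hosts entries.
        ["ssh-keyscan", "-H", new_host],
    ]


commands = migration_commands(
    old_host="147.75.72.255",   # jenkins-workspace-7, from the issue description
    new_host="203.0.113.10",    # placeholder, not the real new address
    repo_path="/home/iojs/build/binary_tmp.git",
)
for argv in commands:
    print(shlex.join(argv))
```

The Jenkins variables TEMP_REPO and TEMP_REPO_SERVER would then be pointed at the new address, as described above.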

@vielmetti

Thanks for all your work getting this moved over - appreciate it.

@ryanaslett
Contributor

ryanaslett commented Jul 16, 2024

Status Update: The two jenkins-workspace machines have been successfully replaced, and I have removed them from the nodejs.org organization at Equinix Metal (ef5bd919-c911-4c87-a101-dff7872396a4).

The backup server has also been replaced by its new counterpart at mnx.io, and I have removed the backup server from the nodecore organization (a988c5d8-0f10-4d90-a6b4-f348757355d7).

The final server has 4 more services to transition:

  • docker host that houses the release and test containers for arm cross container jobs
  • The unencrypted host which is a current failover host for nodewww releases
  • grafana host
  • smartos testing nodes (these require patches from the smartos community to transition)

I will focus on the docker host and the release standby server next.
It was decided that the Grafana host can be decommissioned as is; we may stand up another later, but it shouldn't be a blocker to the existing decommissioning.
The smartos testing nodes will require a meeting and coordination with the smartos community.

@ryanaslett
Contributor

There is one additional host,
ubuntu1804-x64-1: {ip: 147.28.162.99, user: ubuntu}

It's tied to the ubuntu1804-64 label, which it shares with another server hosted on Digital Ocean.

https://ci.nodejs.org/computer/test%2Dequinix%5Fmnx%2Dubuntu1804%2Dx64%2D1/

Are jobs still running on ubuntu1804? I can't seem to find any evidence of jobs that have run recently on those hosts.

@targos
Member

targos commented Jul 18, 2024

I looked at the Jenkins config backups and there is no major job that depends on the ubuntu1804-64 label. Both hosts can probably be deleted.
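A check of this kind can be sketched roughly as follows, assuming the backups are the jobs' `config.xml` files. The `jobs_using_label` helper and the sample job names are hypothetical, and it only looks at `<assignedLabel>` elements, so matrix axes and labels set inside pipeline scripts would need a separate look.

```python
import re
from html import unescape

# Label expression assigned to a job in its config.xml.
ASSIGNED_LABEL = re.compile(r"<assignedLabel>(.*?)</assignedLabel>", re.DOTALL)


def jobs_using_label(configs, label):
    """Return sorted job names whose assignedLabel expression names `label`.

    `configs` maps a job name to the text of its config.xml.
    """
    matches = []
    for job, xml_text in configs.items():
        for expression in ASSIGNED_LABEL.findall(xml_text):
            # Undo XML escaping (e.g. &amp;&amp;), then split the label
            # expression on operators and whitespace to get exact labels.
            tokens = re.split(r"[\s|&!()]+", unescape(expression))
            if label in tokens:
                matches.append(job)
                break
    return sorted(matches)


# Hypothetical sample data, not real job configurations.
SAMPLE = {
    "hypothetical-job-a": "<project><assignedLabel>ubuntu1804-64</assignedLabel></project>",
    "hypothetical-job-b": "<project><assignedLabel>ubuntu2204-64</assignedLabel></project>",
    "hypothetical-job-c": "<project><assignedLabel>ubuntu1804-64 &amp;&amp; docker</assignedLabel></project>",
}
print(jobs_using_label(SAMPLE, "ubuntu1804-64"))
```

Matching on whole tokens rather than substrings avoids false positives from labels that merely share a prefix.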

@ryanaslett
Contributor

Great. I will just "not migrate" that host then.

@richardlau
Member

Update from Equinix:

23 January 2025

Dear Equinix Customer,

We recently announced that we are sunsetting our Equinix Metal product with an end-of-life date of June 30, 2026. We appreciate the trust you place in Equinix as your digital infrastructure provider and are committed to working with you to find an alternative solution for Metal, including colocation, managed services, or third-party options, well in advance of June 2026.

As part of this wind-down process, we are ending our credit program. We will continue to provide credits through April 30, 2025, after which all unused credits will expire. After that date, customers can continue to use Metal through June 2026 at the regular market-on-demand rate; if you have any reserved instances, those will continue month to month at their current rates.

We will ensure the continued performance, security, and stability of the product through the end-of-life transition, and there will be no changes to the current infrastructure or support experience.

Our team is dedicated to delivering a smooth process and assisting you every step of the way during this transition, and if you have any questions in the meantime or need further information, please don't hesitate to reach out to [email protected].

cc @ryanaslett
