Commit

Merge branch 'rc-2024.12.1-hotfix6' into stable
jacderida committed Jan 20, 2025
2 parents 26eee3a + 2ef95c4 commit 92ed942
Showing 20 changed files with 79 additions and 90 deletions.
16 changes: 16 additions & 0 deletions .github/workflows/memcheck.yml
@@ -164,6 +164,22 @@ jobs:
run: pgrep antnode | wc -l
if: always()

+- name: confirm opened FDs
+shell: bash
+timeout-minutes: 1
+run: |
+fd_cap="30"
+pids=$(pgrep antnode)
+for pid in $pids; do
+fd_count=$(ls /proc/$pid/fd | wc -l)
+echo "Process $pid - File Descriptors: $fd_count"
+if (( $(echo "$fd_count > $fd_cap" | bc -l) )); then
+echo "Process $pid holding FD exceeded threshold: $fd_cap"
+exit 1
+fi
+done
+if: always()
+
- name: Stop the local network and upload logs
if: always()
uses: maidsafe/ant-local-testnet-action@main
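
The "confirm opened FDs" step added in this hunk counts the entries under `/proc/<pid>/fd` for each `antnode` process and fails when a count exceeds the cap of 30. As a rough illustration only, the same check might be sketched in Rust as below. The helper names `count_entries` and `exceeds_cap` are hypothetical and do not appear in the repository; the workflow itself uses the shell commands shown above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Cap mirroring `fd_cap="30"` in the workflow step.
#[allow(dead_code)]
const FD_CAP: usize = 30;

/// Hypothetical helper: count the entries in a directory such as
/// `/proc/<pid>/fd`, which on Linux holds one entry per open descriptor.
#[allow(dead_code)]
fn count_entries(dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        entry?; // surface I/O errors instead of silently skipping entries
        count += 1;
    }
    Ok(count)
}

/// Hypothetical helper mirroring the shell test `fd_count > fd_cap`.
#[allow(dead_code)]
fn exceeds_cap(fd_count: usize, cap: usize) -> bool {
    fd_count > cap
}
```

On a Linux host, calling `count_entries(Path::new("/proc/self/fd"))` and passing the result to `exceeds_cap(n, FD_CAP)` would apply the same threshold to the current process. The comparison is strict, so a process holding exactly 30 descriptors passes.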
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,35 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

*When editing this file, please respect a line length of 100.*

+## 2025-01-20
+
+### Client
+
+#### Fixed
+
+- Remove unallocated static IP from the bootstrap mechanism. We have five static IP addresses
+allocated to five hosts, each of which run nodes and a minimal web server. The web server makes a
+list of peers available to nodes and clients to enable them to join the network. These static IP
+addresses are hard-coded in the `antnode` and `ant` binaries. It was discovered we had accidentally
+added six IPs and one of those was unallocated. Removing the unallocated IP should reduce the time
+to connect to the network.
+
+### Network
+
+#### Changed
+
+- Reduce the frequency of metrics collection in the node's metrics server, from fifteen to sixty
+seconds. This should reduce resource usage and improve performance.
+- Do not refresh all CPU information in the metrics collection process in the node's metrics server.
+Again, this should reduce resource usage and improve performance.
+- Remove the 50% CPU usage safety measure. We added a safety measure to the node to cause the
+process to terminate if the system's CPU usage exceeded 50% for five consecutive minutes. This was
+to prevent cascading failures resulting from too much churn when a large node operator pulled the
+plug on tens of thousands of nodes in a very short period of time. If other operators had
+provisioned to max capacity and not left some buffer room for their own nodes, many other node
+processes could die from the resulting churn. After an internal discussion, the decision was taken
+to remove the safety measure.
+
## 2025-01-14

### Client
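
The 50% CPU safety measure described in the 2025-01-20 changelog entry amounts to a consecutive-sample counter: each check either increments the count (usage above the threshold) or resets it, and the node is asked to stop once the count reaches the limit. The following is a minimal sketch of that logic in isolation, assuming the threshold and limit values from the code removed in this commit. `HighCpuTracker` is a hypothetical name for illustration, not a type in the codebase, and the real task also applied a random initial delay and per-interval jitter that are omitted here.

```rust
/// Values taken from the constants in the removed monitoring task.
const CPU_USAGE_THRESHOLD: f32 = 50.0;
const HIGH_CPU_CONSECUTIVE_LIMIT: u8 = 5;

/// Hypothetical helper that isolates the counting logic.
#[allow(dead_code)]
struct HighCpuTracker {
    high_cpu_count: u8,
}

#[allow(dead_code)]
impl HighCpuTracker {
    fn new() -> Self {
        Self { high_cpu_count: 0 }
    }

    /// Record one CPU usage sample (in percent). Returns `true` once the
    /// number of consecutive samples above the threshold reaches the limit.
    fn observe(&mut self, cpu_usage: f32) -> bool {
        if cpu_usage > CPU_USAGE_THRESHOLD {
            self.high_cpu_count = self.high_cpu_count.saturating_add(1);
        } else {
            // Any sample at or below the threshold resets the streak.
            self.high_cpu_count = 0;
        }
        self.high_cpu_count >= HIGH_CPU_CONSECUTIVE_LIMIT
    }
}
```

Because a single low sample resets the streak, only sustained load triggers the stop. With one sample roughly every minute, five consecutive high samples correspond to the "five consecutive minutes" in the changelog.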
11 changes: 5 additions & 6 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions ant-bootstrap/Cargo.toml
@@ -7,13 +7,13 @@ license = "GPL-3.0"
name = "ant-bootstrap"
readme = "README.md"
repository = "https://github.com/maidsafe/autonomi"
-version = "0.1.3"
+version = "0.1.4"

[features]
local = []

[dependencies]
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
atomic-write-file = "0.2.2"
chrono = { version = "0.4", features = ["serde"] }
1 change: 0 additions & 1 deletion ant-bootstrap/src/contacts.rs
@@ -20,7 +20,6 @@ const MAINNET_CONTACTS: &[&str] = &[
"http://159.223.246.45/bootstrap_cache.json",
"http://139.59.201.153/bootstrap_cache.json",
"http://139.59.200.27/bootstrap_cache.json",
-"http://139.59.198.251/bootstrap_cache.json",
];

/// The client fetch timeout
2 changes: 1 addition & 1 deletion ant-build-info/src/release_info.rs
@@ -1,4 +1,4 @@
pub const RELEASE_YEAR: &str = "2024";
pub const RELEASE_MONTH: &str = "12";
pub const RELEASE_CYCLE: &str = "1";
-pub const RELEASE_CYCLE_COUNTER: &str = "9";
+pub const RELEASE_CYCLE_COUNTER: &str = "10";
6 changes: 3 additions & 3 deletions ant-cli/Cargo.toml
@@ -3,7 +3,7 @@ authors = ["MaidSafe Developers <[email protected]>"]
name = "ant-cli"
description = "CLI client for the Autonomi network"
license = "GPL-3.0"
-version = "0.3.4"
+version = "0.3.5"
edition = "2021"
homepage = "https://maidsafe.net"
readme = "README.md"
@@ -24,9 +24,9 @@ name = "files"
harness = false

[dependencies]
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
autonomi = { path = "../autonomi", version = "0.3.4", features = [
"fs",
2 changes: 1 addition & 1 deletion ant-logging/Cargo.toml
@@ -7,7 +7,7 @@ license = "GPL-3.0"
name = "ant-logging"
readme = "README.md"
repository = "https://github.com/maidsafe/autonomi"
-version = "0.2.44"
+version = "0.2.45"

[dependencies]
chrono = "~0.4.19"
4 changes: 2 additions & 2 deletions ant-logging/src/metrics.rs
@@ -11,7 +11,7 @@ use std::time::Duration;
use sysinfo::{self, Networks, Pid, System};
use tracing::{debug, error};

-const UPDATE_INTERVAL: Duration = Duration::from_secs(15);
+const UPDATE_INTERVAL: Duration = Duration::from_secs(60);
const TO_MB: u64 = 1_000_000;

// The following Metrics are collected and logged
@@ -44,7 +44,7 @@ struct ProcessMetrics {
// Obtains the system metrics every UPDATE_INTERVAL and logs it.
// The function should be spawned as a task and should be re-run if our main process is restarted.
pub async fn init_metrics(pid: u32) {
-let mut sys = System::new_all();
+let mut sys = System::new();
let mut networks = Networks::new_with_refreshed_list();
let pid = Pid::from_u32(pid);

4 changes: 2 additions & 2 deletions ant-networking/Cargo.toml
@@ -7,7 +7,7 @@ license = "GPL-3.0"
name = "ant-networking"
readme = "README.md"
repository = "https://github.com/maidsafe/autonomi"
-version = "0.3.3"
+version = "0.3.4"

[features]
default = []
@@ -20,7 +20,7 @@ upnp = ["libp2p/upnp"]

[dependencies]
aes-gcm-siv = "0.11.1"
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
ant-evm = { path = "../ant-evm", version = "0.1.8" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
4 changes: 2 additions & 2 deletions ant-networking/src/metrics/mod.rs
@@ -28,7 +28,7 @@ use prometheus_client::{
use sysinfo::{Pid, ProcessRefreshKind, System};
use tokio::time::Duration;

-const UPDATE_INTERVAL: Duration = Duration::from_secs(15);
+const UPDATE_INTERVAL: Duration = Duration::from_secs(60);
const TO_MB: u64 = 1_000_000;

/// The shared recorders that are used to record metrics.
@@ -246,7 +246,7 @@ impl NetworkMetricsRecorder {

let pid = Pid::from_u32(std::process::id());
let process_refresh_kind = ProcessRefreshKind::everything().without_disk_usage();
-let mut system = System::new_all();
+let mut system = System::new();
let physical_core_count = system.physical_core_count();

tokio::spawn(async move {
4 changes: 2 additions & 2 deletions ant-node-manager/Cargo.toml
@@ -30,10 +30,10 @@ tcp = []
websockets = []

[dependencies]
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
ant-evm = { path = "../ant-evm", version = "0.1.8" }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
ant-releases = { version = "0.4.0" }
ant-service-management = { path = "../ant-service-management", version = "0.4.7" }
4 changes: 2 additions & 2 deletions ant-node-rpc-client/Cargo.toml
@@ -18,9 +18,9 @@ nightly = []

[dependencies]
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3", features=["rpc"] }
-ant-node = { path = "../ant-node", version = "0.3.4" }
+ant-node = { path = "../ant-node", version = "0.3.5" }
ant-service-management = { path = "../ant-service-management", version = "0.4.7" }
async-trait = "0.1"
bls = { package = "blsttc", version = "8.0.1" }
9 changes: 4 additions & 5 deletions ant-node/Cargo.toml
@@ -2,7 +2,7 @@
authors = ["MaidSafe Developers <[email protected]>"]
description = "The Autonomi node binary"
name = "ant-node"
-version = "0.3.4"
+version = "0.3.5"
edition = "2021"
license = "GPL-3.0"
homepage = "https://maidsafe.net"
@@ -26,11 +26,11 @@ otlp = ["ant-logging/otlp"]
upnp = ["ant-networking/upnp"]

[dependencies]
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
ant-evm = { path = "../ant-evm", version = "0.1.8" }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
-ant-networking = { path = "../ant-networking", version = "0.3.3" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
+ant-networking = { path = "../ant-networking", version = "0.3.4" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
ant-registers = { path = "../ant-registers", version = "0.4.7" }
ant-service-management = { path = "../ant-service-management", version = "0.4.7" }
@@ -62,7 +62,6 @@ rayon = "1.8.0"
self_encryption = "~0.30.0"
serde = { version = "1.0.133", features = ["derive", "rc"] }
strum = { version = "0.26.2", features = ["derive"] }
-sysinfo = { version = "0.30.8", default-features = false }
thiserror = "1.0.23"
tokio = { version = "1.32.0", features = [
"io-util",
53 changes: 0 additions & 53 deletions ant-node/src/bin/antnode/main.rs
@@ -36,7 +36,6 @@ use std::{
process::Command,
time::Duration,
};
-use sysinfo::{self, System};
use tokio::{
runtime::Runtime,
sync::{broadcast::error::RecvError, mpsc},
@@ -412,58 +411,6 @@ You can check your reward balance by running:
error!("Failed to send node control msg to antnode bin main thread: {err}");
}
});
-let ctrl_tx_clone_cpu = ctrl_tx.clone();
-// Monitor host CPU usage
-tokio::spawn(async move {
-use rand::{thread_rng, Rng};
-
-const CPU_CHECK_INTERVAL: Duration = Duration::from_secs(60);
-const CPU_USAGE_THRESHOLD: f32 = 50.0;
-const HIGH_CPU_CONSECUTIVE_LIMIT: u8 = 5;
-const NODE_STOP_DELAY: Duration = Duration::from_secs(1);
-const INITIAL_DELAY_MIN_S: u64 = 10;
-const INITIAL_DELAY_MAX_S: u64 =
-HIGH_CPU_CONSECUTIVE_LIMIT as u64 * CPU_CHECK_INTERVAL.as_secs();
-const JITTER_MIN_S: u64 = 1;
-const JITTER_MAX_S: u64 = 15;
-
-let mut sys = System::new_all();
-
-let mut high_cpu_count: u8 = 0;
-
-// Random initial delay between 1 and 5 minutes
-let initial_delay =
-Duration::from_secs(thread_rng().gen_range(INITIAL_DELAY_MIN_S..=INITIAL_DELAY_MAX_S));
-tokio::time::sleep(initial_delay).await;
-
-loop {
-sys.refresh_cpu();
-let cpu_usage = sys.global_cpu_info().cpu_usage();
-
-if cpu_usage > CPU_USAGE_THRESHOLD {
-high_cpu_count += 1;
-} else {
-high_cpu_count = 0;
-}
-
-if high_cpu_count >= HIGH_CPU_CONSECUTIVE_LIMIT {
-if let Err(err) = ctrl_tx_clone_cpu
-.send(NodeCtrl::Stop {
-delay: NODE_STOP_DELAY,
-result: StopResult::Success(format!("Excess host CPU %{CPU_USAGE_THRESHOLD} detected for {HIGH_CPU_CONSECUTIVE_LIMIT} consecutive minutes!")),
-})
-.await
-{
-error!("Failed to send node control msg to antnode bin main thread: {err}");
-}
-break;
-}
-
-// Add jitter to the interval
-let jitter = Duration::from_secs(thread_rng().gen_range(JITTER_MIN_S..=JITTER_MAX_S));
-tokio::time::sleep(CPU_CHECK_INTERVAL + jitter).await;
-}
-});

// Start up gRPC interface if enabled by user
if let Some(addr) = rpc {
4 changes: 2 additions & 2 deletions ant-service-management/Cargo.toml
@@ -10,9 +10,9 @@ repository = "https://github.com/maidsafe/autonomi"
version = "0.4.7"

[dependencies]
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-evm = { path = "../ant-evm", version = "0.1.8" }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3", features = ["rpc"] }
async-trait = "0.1"
dirs-next = "2.0.0"
6 changes: 3 additions & 3 deletions autonomi/Cargo.toml
@@ -33,9 +33,9 @@ registers = []
vault = ["registers"]

[dependencies]
-ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.3" }
+ant-bootstrap = { path = "../ant-bootstrap", version = "0.1.4" }
ant-evm = { path = "../ant-evm", version = "0.1.8" }
-ant-networking = { path = "../ant-networking", version = "0.3.3" }
+ant-networking = { path = "../ant-networking", version = "0.3.4" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
ant-registers = { path = "../ant-registers", version = "0.4.7" }
bip39 = "2.0.0"
@@ -68,7 +68,7 @@ xor_name = "5.0.0"

[dev-dependencies]
alloy = { version = "0.7.3", default-features = false, features = ["contract", "json-rpc", "network", "node-bindings", "provider-http", "reqwest-rustls-tls", "rpc-client", "rpc-types", "signer-local", "std"] }
-ant-logging = { path = "../ant-logging", version = "0.2.44" }
+ant-logging = { path = "../ant-logging", version = "0.2.45" }
eyre = "0.6.5"
sha2 = "0.10.6"
# Do not specify the version field. Release process expects even the local dev deps to be published.
2 changes: 1 addition & 1 deletion nat-detection/Cargo.toml
@@ -18,7 +18,7 @@ nightly = []

[dependencies]
ant-build-info = { path = "../ant-build-info", version = "0.1.23" }
-ant-networking = { path = "../ant-networking", version = "0.3.3" }
+ant-networking = { path = "../ant-networking", version = "0.3.4" }
ant-protocol = { path = "../ant-protocol", version = "0.3.3" }
clap = { version = "4.5.4", features = ["derive"] }
clap-verbosity-flag = "2.2.0"