Drop rpc replace metrics #2670

Draft: wants to merge 2 commits into main

ermineJose (Contributor) commented Jan 25, 2025

Description

ant-node-manager currently uses the RPC server to collect node_information and network_information. This PR replaces the RPC server with the metrics server.
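For context, the metrics server exposes data in the Prometheus text exposition format, where each sample line looks like name{label="value",...} value. The PR delegates parsing to the prometheus_parse crate; the std-only sketch below, with a hypothetical parse_sample helper and a hypothetical metric name, only illustrates what the node manager reads from each line in place of an RPC response. It assumes no escaped quotes, no spaces inside label values, and no trailing timestamp.

```rust
use std::collections::HashMap;

/// One parsed sample: metric name, its labels, and its value.
#[derive(Debug, PartialEq)]
struct Sample {
    name: String,
    labels: HashMap<String, String>,
    value: f64,
}

/// Hypothetical helper: parse one line such as
/// `some_metric{peer_id="12D3KooW"} 5`. Returns None for comments,
/// blank lines, or anything it cannot parse (no escaping support).
fn parse_sample(line: &str) -> Option<Sample> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Split "name{labels} value" into the series part and the value part.
    let (series, value) = line.rsplit_once(' ')?;
    let value: f64 = value.parse().ok()?;
    let (name, labels) = match series.split_once('{') {
        Some((name, rest)) => {
            let body = rest.strip_suffix('}')?;
            let mut labels = HashMap::new();
            for pair in body.split(',').filter(|p| !p.is_empty()) {
                let (k, v) = pair.split_once('=')?;
                // Label values are double-quoted in the text format.
                let v = v.strip_prefix('"')?.strip_suffix('"')?;
                labels.insert(k.trim().to_string(), v.to_string());
            }
            (name, labels)
        }
        None => (series, HashMap::new()),
    };
    Some(Sample { name: name.to_string(), labels, value })
}

fn main() {
    // "ant_node_connected_peers" is a made-up metric name for illustration.
    if let Some(s) = parse_sample(r#"ant_node_connected_peers{peer_id="12D3KooW"} 5"#) {
        println!("{} {:?} {}", s.name, s.labels.get("peer_id"), s.value);
    }
}
```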

Related Issue

Fixes #<issue_number> (if applicable).

Type of Change

Please mark the types of changes made in this pull request.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Other (please describe):

Checklist

Please ensure all of the following tasks have been completed:

  • I have read the contributing guidelines.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have updated the documentation accordingly.
  • I have followed the conventional commits guidelines for commit messages.
  • I have verified my commit messages with commitlint.

ermineJose force-pushed the drop-Rpc-replace-Metrics branch from 3786eb7 to cb994a9 on January 28, 2025, 15:55
ant-networking/src/metrics/service.rs (alert dismissed)
let service = NodeService::new(node, Box::new(rpc_client));
// TODO: remove this as we have no way to know the reward balance of nodes since EVM payments!
let metric_client = MetricClient::new(node.metrics_port.unwrap());
let service = NodeService::new(node, Box::new(metric_client)); // TODO: remove this as we have no way to know the reward balance of nodes since EVM payments!

Check notice, Code scanning / devskim (Note): Suspicious comment. A "TODO" or similar was left in source code, possibly indicating incomplete functionality.
ant-service-management/src/metric.rs (alert fixed)
ermineJose force-pushed the drop-Rpc-replace-Metrics branch from 63cd487 to b0761aa on January 28, 2025, 16:48
ermineJose force-pushed the drop-Rpc-replace-Metrics branch from b0761aa to 35433fc on January 28, 2025, 16:53
@@ -438,6 +438,7 @@ impl NetworkBuilder {
is_client: bool,
req_res_protocol: ProtocolSupport,
upnp: bool,
root_dir: Option<PathBuf>
Review comment (Member):

Since this is optional, should it be a method on NetworkBuilder instead?
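A minimal sketch of the reviewer's suggestion, assuming a heavily simplified NetworkBuilder (the real one has many more fields and its own constructor) and a &mut self setter style, which is an assumption about how the builder's other options are set: required arguments stay in new, and the optional root_dir gets its own setter, so call sites that don't need it never pass None.

```rust
use std::path::PathBuf;

/// Simplified stand-in for NetworkBuilder; only the fields under discussion.
#[derive(Debug)]
struct NetworkBuilder {
    upnp: bool,
    root_dir: Option<PathBuf>,
}

impl NetworkBuilder {
    /// Required settings only; optional ones default to "unset".
    fn new(upnp: bool) -> Self {
        Self { upnp, root_dir: None }
    }

    /// Optional setting: callers that don't need a root dir skip this call.
    fn root_dir(&mut self, dir: PathBuf) -> &mut Self {
        self.root_dir = Some(dir);
        self
    }
}

fn main() {
    let mut builder = NetworkBuilder::new(false);
    builder.root_dir(PathBuf::from("/tmp/antnode"));
    println!("{builder:?}");
}
```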

Comment on lines +221 to +236
// Run the node
let runing_node_metrics = running_node.clone();
let _return_value = tokio::spawn(async move {
sleep(Duration::from_millis(200)).await;
let state = runing_node_metrics.get_swarm_local_state().await.expect("Failed to get swarm local state");
let connected_peers = state.connected_peers.iter().map(|p| p.to_string()).collect();
let listeners = state.listeners.iter().map(|m| m.to_string()).collect();
let network_info = NetworkInfoMetrics::new(connected_peers, listeners);

write_network_metrics_to_file(
runing_node_metrics.root_dir_path.clone(),
network_info,
runing_node_metrics.network.peer_id().to_string()
);
});

Review comment (Member):

Two things here:

  • It would be better if we didn't import things from service_management inside ant_node; the two should stay isolated.
  • Listeners can be added or removed at any point during the lifetime of the program (for private nodes), so it would be better to move this logic into the handling of the NetworkEvent::NewListenAddr event inside antnode and write the listen address to the file there.
    • Since we also have to deal with the removal of listen addresses (for private nodes), we should also create a new NetworkEvent::ListenerClosed event, emitted from SwarmEvent::ListenerClosed. This new event then has to be handled inside antnode, where that particular listen address should be removed from the file.
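A rough sketch of the event-driven approach described in this comment, assuming a simplified NetworkEvent that uses plain strings instead of libp2p Multiaddr values and a hypothetical ListenAddrFile helper. libp2p's SwarmEvent::ListenerClosed reports the addresses of the closed listener, so the sketch's ListenerClosed variant carries a list. The file is rewritten from an in-memory set on every event, which keeps additions and removals consistent without having to parse the file back.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical subset of NetworkEvent; addresses are plain strings here.
enum NetworkEvent {
    NewListenAddr(String),
    ListenerClosed(Vec<String>),
}

/// Tracks the node's current listen addresses and mirrors them to a file.
struct ListenAddrFile {
    path: PathBuf,
    addrs: BTreeSet<String>,
}

impl ListenAddrFile {
    fn new(path: impl AsRef<Path>) -> Self {
        Self { path: path.as_ref().to_path_buf(), addrs: BTreeSet::new() }
    }

    /// Apply one event to the set, then rewrite the whole file.
    fn handle(&mut self, event: &NetworkEvent) -> io::Result<()> {
        match event {
            NetworkEvent::NewListenAddr(addr) => {
                self.addrs.insert(addr.clone());
            }
            NetworkEvent::ListenerClosed(addrs) => {
                for addr in addrs {
                    self.addrs.remove(addr);
                }
            }
        }
        let lines: Vec<&str> = self.addrs.iter().map(String::as_str).collect();
        fs::write(&self.path, lines.join("\n"))
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("listen_addrs_example.txt");
    let mut file = ListenAddrFile::new(&path);
    file.handle(&NetworkEvent::NewListenAddr("/ip4/127.0.0.1/udp/4000/quic-v1".into()))?;
    file.handle(&NetworkEvent::ListenerClosed(vec!["/ip4/127.0.0.1/udp/4000/quic-v1".into()]))?;
    println!("{:?}", fs::read_to_string(&path)?);
    Ok(())
}
```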

for sample in scrape.samples.iter() {
for (key, value) in sample.labels.iter() {
match key.as_str() {
"peer_id" => node_info.peer_id = value.parse().unwrap(),
Review comment (Member):

We don't use unwrap or expect in our code because they can crash the program immediately; we should return error types here instead.

Review comment (Member):

All the unwraps have to be removed.
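A minimal sketch of label handling without unwrap, assuming the labels can be iterated as string key/value pairs, as the loop in the diff does. The NodeInfo fields, the pid label, and the apply_labels helper are hypothetical; the real code parses peer_id into a PeerId, and a failure there would be mapped into an error the same way instead of panicking.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the node info filled from metric labels.
#[derive(Debug, Default, PartialEq)]
struct NodeInfo {
    peer_id: String,
    pid: u32,
}

#[derive(Debug, PartialEq)]
enum LabelError {
    InvalidValue { key: String, value: String },
}

/// Copy known labels into `node_info`, returning an error instead of panicking.
fn apply_labels(
    labels: &HashMap<String, String>,
    node_info: &mut NodeInfo,
) -> Result<(), LabelError> {
    for (key, value) in labels {
        match key.as_str() {
            "peer_id" => node_info.peer_id = value.clone(),
            "pid" => {
                // `?` propagates the parse failure to the caller.
                node_info.pid = value.parse::<u32>().map_err(|_| LabelError::InvalidValue {
                    key: key.clone(),
                    value: value.clone(),
                })?
            }
            _ => {} // unknown labels are ignored
        }
    }
    Ok(())
}

fn main() {
    let mut labels = HashMap::new();
    labels.insert("pid".to_string(), "not-a-number".to_string());
    let mut info = NodeInfo::default();
    match apply_labels(&labels, &mut info) {
        Ok(()) => println!("parsed: {info:?}"),
        Err(e) => println!("bad metric label: {e:?}"),
    }
}
```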

}
}

pub async fn get_endpoint_metrics(&self, endpoint_name: &str) -> Result<prometheus_parse::Scrape, Box<dyn std::error::Error>> {
Review comment (Member):

You should probably create a new error.rs or, if one already exists, add new enum variants for the errors returned from these functions. Box<dyn std::error::Error> is a trait object, so we cannot easily downcast it back to the concrete error.
We use thiserror for error management; it is worth reading up on.
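As a rough illustration of this suggestion, here is a hypothetical MetricError enum such as might live in error.rs. To keep the sketch dependency-free it hand-writes the Display, Error, and From impls that #[derive(thiserror::Error)] with #[error(...)] and #[from] attributes would generate. The variant names and the parse_peer_count helper are assumptions; in real code the Request variant would more likely wrap reqwest::Error. The benefit shows up at the call site: callers match on concrete variants instead of downcasting a Box<dyn std::error::Error>.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for the metric client.
#[derive(Debug)]
pub enum MetricError {
    /// The request to the metrics endpoint failed.
    Request(String),
    /// A label or sample value could not be parsed.
    ParseInt(ParseIntError),
    /// An expected metric was missing from the scrape.
    MissingMetric(&'static str),
}

// thiserror's #[error("...")] attributes would generate this impl.
impl fmt::Display for MetricError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetricError::Request(msg) => write!(f, "metrics request failed: {msg}"),
            MetricError::ParseInt(e) => write!(f, "failed to parse metric value: {e}"),
            MetricError::MissingMetric(name) => write!(f, "metric `{name}` not found"),
        }
    }
}

impl std::error::Error for MetricError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            MetricError::ParseInt(e) => Some(e),
            _ => None,
        }
    }
}

// What #[from] would generate: lets `?` convert ParseIntError automatically.
impl From<ParseIntError> for MetricError {
    fn from(e: ParseIntError) -> Self {
        MetricError::ParseInt(e)
    }
}

pub type Result<T> = std::result::Result<T, MetricError>;

fn parse_peer_count(raw: &str) -> Result<u64> {
    Ok(raw.trim().parse::<u64>()?)
}

fn main() {
    match parse_peer_count("abc") {
        Ok(n) => println!("{n} peers"),
        // Callers match on the concrete variant; no downcasting needed.
        Err(MetricError::ParseInt(e)) => println!("bad value: {e}"),
        Err(e) => println!("other error: {e}"),
    }
}
```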

@@ -13,6 +13,7 @@ pub mod error;
pub mod faucet;
pub mod node;
pub mod rpc;
Review comment (Member):

I think all references to RPC should be removed from our codebase.

Comment on lines +160 to +161
impl RpcActions for MetricClient {
async fn node_info(&self) -> Result<NodeInfo> {
Review comment (Member):

Regarding the actions trait here:

  • We must rename RpcActions to MetricActions.
  • Remove the unused methods.
  • The expect also has to be removed.

Review comment (Contributor):

Yeah, I was also puzzled about the use of RpcActions. We shouldn't have references to RPC any more.
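One possible shape for the renamed trait, as a sketch only. Because NodeService::new takes the client boxed, the trait has to stay object-safe, so instead of a native async fn it returns a boxed future (the same alias shape as futures::future::BoxFuture, and roughly what an async-trait style macro generates; how the real trait is declared is not shown in the diff). NodeInfo's fields, MetricError, and FakeMetricClient are hypothetical, and block_on is a tiny std-only executor for the demo, not something to use in production.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical, trimmed-down node info and error types.
#[derive(Debug, Clone, PartialEq)]
struct NodeInfo {
    peer_id: String,
    pid: u32,
}

#[derive(Debug)]
enum MetricError {
    MissingMetric(&'static str),
}

/// Boxed future so the trait stays usable as Box<dyn MetricActions>.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Formerly RpcActions: renamed, with only the method still needed kept.
trait MetricActions {
    fn node_info(&self) -> BoxFuture<'_, Result<NodeInfo, MetricError>>;
}

/// Stand-in client that returns canned data instead of scraping an endpoint.
struct FakeMetricClient {
    info: Option<NodeInfo>,
}

impl MetricActions for FakeMetricClient {
    fn node_info(&self) -> BoxFuture<'_, Result<NodeInfo, MetricError>> {
        Box::pin(async move {
            // A real client would scrape the metrics endpoint here and
            // propagate failures with `?` instead of calling expect().
            self.info.clone().ok_or(MetricError::MissingMetric("node info"))
        })
    }
}

/// Demo-only executor: polls the future until it is ready.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

fn main() {
    let client: Box<dyn MetricActions> = Box::new(FakeMetricClient { info: None });
    match block_on(client.node_info()) {
        Ok(info) => println!("node info: {info:?}"),
        Err(e) => println!("could not read node info: {e:?}"),
    }
}
```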

self.get_node_info_from_metadata_extended(&scrape, &mut node_info);
let scrape = self.get_endpoint_metrics("metrics").await.expect("Failed to get endpoint metrics");
self.get_node_info_from_metrics(&scrape, &mut node_info);
println!("node_info: {:?}", node_info);
Review comment (Member):

I assume these println! calls are for debugging? It would be great if they could be tidied up.

self.endpoint_port
);

let body = reqwest::get(&format!("http://localhost:{}/{endpoint_name}", self.endpoint_port))

Check notice, Code scanning / devskim (Note): Do not leave debug code in production. Accessing localhost could indicate debug code, or could hinder scaling.
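The devskim notice concerns the hard-coded localhost. One way to address it, sketched here with a hypothetical metrics_url helper, is to pass the node's address in from wherever the node manager keeps each node's settings (an assumption) and format it through SocketAddr, whose Display impl adds the brackets that IPv6 literals need in URLs. Whether a loopback address is actually wrong for a node manager talking to local nodes is a judgment call; this only removes the hard-coding.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Hypothetical helper: build a metrics endpoint URL from a configurable
/// address instead of a hard-coded "localhost".
fn metrics_url(ip: IpAddr, port: u16, endpoint_name: &str) -> String {
    // SocketAddr's Display renders IPv6 as "[addr]:port", as URLs require.
    format!("http://{}/{}", SocketAddr::new(ip, port), endpoint_name)
}

fn main() {
    let v4 = metrics_url(IpAddr::V4(Ipv4Addr::LOCALHOST), 13001, "metrics");
    let v6 = metrics_url(IpAddr::V6(Ipv6Addr::LOCALHOST), 13001, "metrics");
    println!("{v4}\n{v6}");
}
```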
ermineJose force-pushed the drop-Rpc-replace-Metrics branch 2 times, most recently from df91d89 to cb8c46f on January 30, 2025, 16:02
ermineJose force-pushed the drop-Rpc-replace-Metrics branch from cb8c46f to db7c11b on January 30, 2025, 18:16
Labels: None yet
Projects: None yet
3 participants