SSVM System VM fails to add static route to NFS server (ip route add x.x.x.x via null) #10163

Open
KobesM opened this issue Jan 5, 2025 · 5 comments

Comments

@KobesM

KobesM commented Jan 5, 2025

ISSUE TYPE
  • Bug Report
COMPONENT NAME
secondarystoragevm
CLOUDSTACK VERSION
4.20.0.0
CONFIGURATION

Advanced Networking
Single Zone
physical network

  • Management gw 10.143.51.1, mask 255.255.255.0, vlan://untagged, start 10.143.51.151, end 10.143.51.200
  • Public gw 10.143.51.1, mask 255.255.255.0, vlan://51, start 10.143.51.101, end 10.143.51.150
OS / ENVIRONMENT

AlmaLinux 9.5 on the management server and KVM hosts

SUMMARY

I just deployed Apache CloudStack 4.20 as a fresh install but can't upload images to the secondary storage. When uploading an image through the UI I receive the following error: "Failed to upload Template - Error: Network Error".

STEPS TO REPRODUCE

After logging into the secondarystoragevm (ssh from the KVM host) I found the following errors in /var/log/cloud.log:

2025-01-05T20:45:36,290 WARN  [cloud.agent.Agent] (agentRequest-Handler-3:[]) Caught: com.cloud.utils.exception.CloudRuntimeException: Failed to get root directory from secondary storage URL [nfs://<nfs path>], using NFS version [null], due to [Unable to mount /192.168.1.21:<nfs path> at /mnt/SecStorage/d096bb1f-552a-3424-ab65-63c694685108 due to mount.nfs: No route to host].

and

2025-01-05T20:19:35,110 WARN  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:[]) Execution of process [3587] for command [/bin/bash -c ip route add 192.168.1.21 via null ] failed.

The route table on the SSVM has no route to 192.168.1.21 (the NFS server):

# ip route
default via 10.143.51.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.196
10.143.51.0/24 dev eth2 proto kernel scope link src 10.143.51.101
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.46.109
192.168.1.22 via 10.143.51.1 dev eth1
EXPECTED RESULTS
System VM would add a route to the NFS server.
ACTUAL RESULTS
It seems that the system VM is unable to add the correct route to the NFS server, which in this case is a Synology NAS. The log shows it trying to add a static route via the gateway "null". The consoleproxy VM does not have this issue and adds the static route to 192.168.1.21 without problems (the Synology NAS also acts as the internal DNS server).
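For context on where the literal "null" may come from: in Java, concatenating a null reference into a string yields the text "null". The sketch below only demonstrates that language behavior. The class and method names are hypothetical and this is not CloudStack's actual code, so treat it as one plausible explanation rather than a confirmed root cause.

```java
// Hypothetical illustration only; not taken from the CloudStack source.
final class NullGatewayDemo {
    // Builds the command by plain string concatenation, without checking the gateway.
    static String buildRouteCommand(String destination, String gateway) {
        return "ip route add " + destination + " via " + gateway;
    }

    public static void main(String[] args) {
        // Stands in for a gateway value that was never populated.
        String gateway = null;
        // Prints: ip route add 192.168.1.21 via null
        System.out.println(buildRouteCommand("192.168.1.21", gateway));
    }
}
```

If something like this happens in the SSVM agent, the question becomes why the gateway for the management or storage NIC is unset when the command is built.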
@iishitahere

Dear @KobesM @DaanHoogland ,

I hope this message finds you well.

I have reviewed the details of the issue regarding the secondary storage VM failing to add the correct route to the NFS server. It seems like a routing misconfiguration or a network connectivity issue is causing the problem.

I would like to work on resolving this issue. If it is still open, could you please assign it to me? I will provide regular updates and propose a fix after investigating the root cause.

Looking forward to your confirmation and any additional input or guidance you may have regarding this issue.

Best regards,
Ishita Jaiswal

@iishitahere

Hi @DaanHoogland ,
I have reviewed the issue and outlined my approach to addressing it. The problem seems to be that the system VM is unable to add the correct route to the NFS server: the error logs show null being used as the gateway in the ip route add command, which leads to the "No route to host" error.

My approach to solve this issue would involve the following steps:

Investigate Route Table Configuration: I'll start by reviewing the route table configuration on the system VM and identifying why the static route to the NFS server is not being added correctly. It seems that a null value is being passed where an actual gateway address should be specified.

Verify Network Interfaces and Gateway Configuration: I will check the network interfaces on the secondary storage VM and confirm that the correct gateway is set, ensuring it is able to route traffic to the 192.168.1.21 NFS server.

Check NFS Mount Configuration: I'll verify the NFS configuration and ensure that the system VM has proper access permissions and that the NFS server is reachable from the secondary storage VM.

Manually Add Route (Testing): I’ll test adding the static route manually using the ip route add command to see if it resolves the issue and confirm that the system VM can reach the NFS server.

Implement Solution in Code: Once I have identified the root cause, I will make any necessary changes to the code to ensure the system VM correctly adds the route to the NFS server. This may involve updating the networking or route handling logic within CloudStack.
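One possible shape for such a change, assuming the gateway reaches the route-adding code as a null reference or as the string "null", is to refuse to build the command unless both addresses are IPv4 literals. This is a minimal sketch: `StaticRouteGuard`, `isIpv4Literal`, and `routeAddCommand` are hypothetical names, not existing CloudStack APIs, and the real fix may belong elsewhere (for example, wherever the gateway value is supposed to be populated).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not taken from CloudStack.
final class StaticRouteGuard {
    // Returns true only for a dotted-quad IPv4 literal such as 10.143.51.1.
    static boolean isIpv4Literal(String value) {
        if (value == null) {
            return false;
        }
        String[] parts = value.trim().split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    // Returns the argument list for "ip route add <dest> via <gw>",
    // or empty when either address is missing or malformed.
    static Optional<List<String>> routeAddCommand(String destination, String gateway) {
        if (!isIpv4Literal(destination) || !isIpv4Literal(gateway)) {
            return Optional.empty();
        }
        return Optional.of(
                List.of("ip", "route", "add", destination.trim(), "via", gateway.trim()));
    }
}
```

Returning an empty `Optional` lets the caller log a clear "no gateway configured" message instead of running a command that is certain to fail. Building an argument list rather than a single shell string also avoids passing unchecked values through `/bin/bash -c`.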

I will proceed with debugging the issue and keep you updated on my progress. Let me know if there are any additional details that could assist in troubleshooting.

Best Regards,
Ishita Jaiswal

@weizhouapache
Member

@iishitahere
In my opinion, the root cause is obvious and the fix seems simple.
You can create a PR if you have the fix.

By the way, do you have an environment to verify the fix?

@iishitahere

Yes, I do have an environment to verify the fix. I'm currently working on the PR and will update you once it's ready.
I will submit it by the day after tomorrow.

@KobesM
Author

KobesM commented Jan 10, 2025

A small update from my side: I verified whether manually adding the route to the NFS server would work.

After logging into the SSVM, /var/log/cloud.log shows the same warning as reported earlier:

2025-01-08T21:53:23,824 WARN  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:[]) Execution of process [3611] for command [/bin/bash -c ip route add 192.168.1.21 via null ] failed.

At this point there is no route to 192.168.1.21:

root@s-2-VM:~# ip route
default via 10.143.71.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111
10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84
192.168.1.22 via 10.143.51.1 dev eth1

So I manually added the route:

root@s-2-VM:~# ip route add 192.168.1.21 via 10.143.51.1

The route to 192.168.1.21 now exists:

root@s-2-VM:~# ip route
default via 10.143.71.1 dev eth2
10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111
10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84
192.168.1.21 via 10.143.51.1 dev eth1
192.168.1.22 via 10.143.51.1 dev eth1

/var/log/cloud.log also reports that it found the NFS server and created the required folders:

2025-01-10T19:34:57,532 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) Determined host nfs01.local.ironhive.nl corresponds to IP 192.168.1.21
2025-01-10T19:34:57,927 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) snapshots directory created/exists on Secondary Storage.
2025-01-10T19:34:57,939 INFO  [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-5:[]) volumes directory created/exists on Secondary Storage.

At this point I was also able to upload an image from the UI.
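For anyone who wants to repeat this check in a script, here is a minimal sketch that looks for a host route in captured `ip route` output. `RouteTableCheck` and `hasHostRoute` are hypothetical names. The code only inspects text and does not call `ip` itself, and it matches host routes of the form `<destination> via <gateway> ...` rather than evaluating subnet or default routes.

```java
import java.util.List;

// Hypothetical helper for inspecting saved "ip route" output.
final class RouteTableCheck {
    // True when some line has the form "<destination> via <gateway> ...".
    static boolean hasHostRoute(List<String> ipRouteLines, String destination) {
        for (String line : ipRouteLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 3
                    && fields[0].equals(destination)
                    && fields[1].equals("via")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Route table from the SSVM before the manual fix, as posted above.
        List<String> before = List.of(
                "default via 10.143.71.1 dev eth2",
                "10.143.51.0/24 dev eth1 proto kernel scope link src 10.143.51.111",
                "10.143.71.0/24 dev eth2 proto kernel scope link src 10.143.71.102",
                "169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.17.84",
                "192.168.1.22 via 10.143.51.1 dev eth1");
        System.out.println(hasHostRoute(before, "192.168.1.21")); // false
        System.out.println(hasHostRoute(before, "192.168.1.22")); // true
    }
}
```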
