Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloustack agent in KVM zone not connect, It stuck in alert when cloustack-managerment restart. #10154

Open
leduyquy opened this issue Jan 1, 2025 · 7 comments · May be fixed by #10158
Open
Milestone

Comments

@leduyquy
Copy link

leduyquy commented Jan 1, 2025

ISSUE TYPE
  • Bug Report
COMPONENT NAME
Agent
KVM hypervisor
CLOUDSTACK VERSION
4.19.1
CONFIGURATION

I setup 1 zone with 1 host use KVM and local disk for backend storage instance

OS / ENVIRONMENT

Ubuntu 22.04 for managerment server and KVM hypervisor

SUMMARY
STEPS TO REPRODUCE
down managerment server host disconnect.
up managerment server host stuck in alert state 
EXPECTED RESULTS
agent in host connect normal
ACTUAL RESULTS
agnet in host stuck in alert state.

Log managerment server.
`
2025-01-01 13:59:33,355 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Failed to handle host connection: 
com.cloud.utils.exception.CloudRuntimeException: Unable to connect 73
	at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection([AgentManagerImpl.java:591](http://agentmanagerimpl.java:591/))
	at com.cloud.agent.manager.AgentManagerImpl.sendReadyAndGetAttache([AgentManagerImpl.java:1150](http://agentmanagerimpl.java:1150/))
	at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent([AgentManagerImpl.java:1168](http://agentmanagerimpl.java:1168/))
	at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext([AgentManagerImpl.java:1252](http://agentmanagerimpl.java:1252/))
	at [org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run](http://org.apache.cloudstack.managed.context.managedcontextrunnable$1.run/)([ManagedContextRunnable.java:48](http://managedcontextrunnable.java:48/))
	at [org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call](http://org.apache.cloudstack.managed.context.impl.defaultmanagedcontext$1.call/)([DefaultManagedContext.java:55](http://defaultmanagedcontext.java:55/))
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext([DefaultManagedContext.java:102](http://defaultmanagedcontext.java:102/))
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext([DefaultManagedContext.java:52](http://defaultmanagedcontext.java:52/))
	at [org.apache.cloudstack.managed.context.ManagedContextRunnable.run](http://org.apache.cloudstack.managed.context.managedcontextrunnable.run/)([ManagedContextRunnable.java:45](http://managedcontextrunnable.java:45/))
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker([ThreadPoolExecutor.java:1128](http://threadpoolexecutor.java:1128/))
	at java.base/java.util.concurrent.ThreadPoolExecutor$[Worker.run](http://worker.run/)([ThreadPoolExecutor.java:628](http://threadpoolexecutor.java:628/))
	at java.base/java.lang.Thread.run([Thread.java:829](http://thread.java:829/))
Caused by: java.lang.NullPointerException
2025-01-01 13:59:33,355 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Failed to send ready command:java.nio.channels.ClosedChannelException
2025-01-01 13:59:33,356 WARN  [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Unable to create attache for agent: Seq 0-4730:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"2","cpus":"56","speed":"2600","memory":"404267446272","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.15.0-127-generic","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"22.04","host.volume.encryption":"true","host.instance.conversion":"true","secured":"true","Host.OS":"Ubuntu"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"5","pod":"5","cluster":"5","guid":"LibvirtComputingResource","name":"","id":"0","version":"","iqn":"i","privateIpAddress”:”,”privateMacAddress":"","privateNetmask":"","storageIpAddress":"","storageNetmask":"","storageMacAddress”:””,”resourceName":"LibvirtComputingResource","gatewayIpAddress":",”msHostList":"@static","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"","host":"","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":"(437.51 GB) 469771632640","availableBytes":"(423.22 GB) 454427123712"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"5","pod":"5","guid":"872c55c9-d6f1-390a-a270-d21d74f017cf-LibvirtComputingResource","name":"cd-kvm05","id":"0","version":"[4.19.1.2](http://4.19.1.2/)","resourceName":"LibvirtComputingResource","msHostList":"@static","wait":"0","bypassHostMaintenance":"false"}}] }`

`log agent service 
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [utils.nio.NioClient] (Agent-Handler-2:) (logid:) Connected to [managerment:8250](http://172.29.0.3:8250/)
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Command [lscpu | grep -i 'Model name' | head -n 1 | egrep -o '[[:digit:]].[[:digit:]]+GHz' | sed 's/GHz//g'] resulted in the value [2600] for CPU speed.
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses control group [cgroup2fs].
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating the max shares of the host.
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max shares of the host is [145600].
Jan 01 07:00:29 hostnameagent sudo[196375]:     root : PWD=/ ; USER=root ; COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Jan 01 07:00:29 hostnameagent sudo[196375]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Jan 01 07:00:29 hostnameagent sudo[196375]: pam_unix(sudo:session): session closed for user root
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool 53f3fab7-ff3f-41d3-afc(Filesystem) in libvirt
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool 53f3fab7-ff3f-41d3-afcf-, using it.
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool 53f3fab7-ff3f-41d3-afcf-from libvirt
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Set agent id 0
Jan 01 07:00:29 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0
Jan 01 07:00:29 hostnameagent java[9624]: WARN  [cloud.agent.Agent] (Agent-Handler-5:) (logid:481bb3fe) Unable to send response: null
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Connected to the host: [managerment](http://172.29.0.3/)
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Lost connection to host: [managerment](http://172.29.0.3/). Attempting reconnection while we still have 0 commands in progress.
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [utils.nio.NioClient] (Agent-Handler-2:) (logid:) NioClient connection closed
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Reconnecting to host:managerment
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [utils.nio.NioClient] (Agent-Handler-2:) (logid:) Connecting to [managerment:8250](http://172.29.0.3:8250/)
Jan 01 07:00:34 hostnameagent java[9624]: INFO  [[utils.nio.Link](http://utils.nio.link/)] (Agent-Handler-2:) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [utils.nio.NioClient] (Agent-Handler-2:) (logid:) SSL: Handshake done
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [utils.nio.NioClient] (Agent-Handler-2:) (logid:) Connected to [managerment:8250](http://172.29.0.3:8250/)
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [utils.linux.KVMHostInfo] (Agent-Handler-1:) (logid:) Command [lscpu | grep -i 'Model name' | head -n 1 | egrep -o '[[:digit:]].[[:digit:]]+GHz' | sed 's/GHz//g'] resulted in the value [2600] for CPU speed.
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses control group [cgroup2fs].
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating the max shares of the host.
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max shares of the host is [145600].
Jan 01 07:00:35 hostnameagent sudo[196458]:     root : PWD=/ ; USER=root ; COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Jan 01 07:00:35 hostnameagent sudo[196458]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Jan 01 07:00:35 hostnameagent sudo[196458]: pam_unix(sudo:session): session closed for user root
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool 53f3fab7-ff3f-41d3-afcf-(Filesystem) in libvirt
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool 53f3fab7-ff3f-41d3-afcf-], using it.
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool 53f3fab7-ff3f-41d3-afcf-from libvirt
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Set agent id 0
Jan 01 07:00:35 hostnameagent java[9624]: INFO  [cloud.agent.Agent] (Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0`

i have remove and change some information like IP, hostname,uuid for my policy
@DaanHoogland
Copy link
Contributor

@leduyquy can you find a line containing "Sending Connect to listener: " in the logs

and in general can you provide the MS logs for ctx-76fa1734 and for logid:f8b676f6. The agent logs are not to relevant I think as this is a NullPointerException occuring on the MS side.

@DaanHoogland DaanHoogland linked a pull request Jan 2, 2025 that will close this issue
14 tasks
@leduyquy
Copy link
Author

leduyquy commented Jan 3, 2025

Hi @DaanHoogland.
This is full logs for logid:f8b676f6.

2025-01-01 13:59:33,167 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Seq 0-4730: Processing the first command { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"2","cpus":"56","speed":"2600","memory":"404267446272","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.15.0-127-generic","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"22.04","host.volume.encryption":"true","host.instance.conversion":"true","secured":"true","Host.OS":"Ubuntu"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"5","pod":"5","cluster":"5","guid":"872c55c9-d6f1-390a-a270-d21d74f017cf-LibvirtComputingResource","name":"agent-hostname","id":"0","version":"4.19.1.2","iqn":"iqn.2004-10.com.ubuntu:01:e7f6625b59c","privateIpAddress":"agent-IP","privateMacAddress":"ae:a9:6e:83:7e:0b","privateNetmask":"255.255.255.0","storageIpAddress":"agent-IP","storageNetmask":"255.255.255.0","storageMacAddress":"ae:a9:6e:83:7e:0b","resourceName":"LibvirtComputingResource","gatewayIpAddress":"172.29.32.254","msHostList":""managerment-IP@static","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"53f3fab7-ff3f-41d3-afcf-6eea31db978e","host":"agent-IP","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":"(437.51 GB) 469771632640","availableBytes":"(423.22 GB) 454427123712"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"5","pod":"5","guid":"872c55c9-d6f1-390a-a270-d21d74f017cf-LibvirtComputingResource","name":"agent-hostname","id":"0","version":"4.19.1.2","resourceName":"LibvirtComputingResource","msHostList":""managerment-IP@static","wait":"0","bypassHostMaintenance":"false"}}] } 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to CiscoVnmcElement 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BareMetalDiscoverer 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NetscalerElement 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to HypervServerDiscoverer 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BaremetalPxeManagerImpl 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to XcpServerDiscoverer 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NiciraNvp 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to BrocadeVcsElement 2025-01-01 13:59:33,173 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to Ovm3Discoverer 2025-01-01 13:59:33,174 DEBUG [c.c.h.o.r.Ovm3Discoverer] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) createHostVOForConnectedAgent: Host {"id":73,"name":"agent-hostname","type":"Routing","uuid":"22e8befd-8b89-46f7-8fbf-622f99cbeca6"} 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to LxcServerDiscoverer 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to NetworkUsageManagerImpl 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to PremiumSecondaryStorageManagerImpl 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to Ovs 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to VmwareServerDiscoverer 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to ConsoleProxyManagerImpl 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to OvmDiscoverer 2025-01-01 13:59:33,174 DEBUG [c.c.r.ResourceManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Dispatching resource state event CREATE_HOST_VO_FOR_CONNECTED to KvmServerDiscoverer 2025-01-01 13:59:33,201 DEBUG [c.c.r.ResourceState] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Resource state update: [id = 73; name = agent-hostname; old state = Enabled; event = InternalCreated; new state = Enabled] 2025-01-01 13:59:33,201 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 73, name = agent-hostname] 2025-01-01 13:59:33,220 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Agent's management server host list is not up to date, sending list update:[]] 2025-01-01 13:59:33,220 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) create ClusteredAgentAttache for 73 2025-01-01 13:59:33,221 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: XcpServerDiscoverer 2025-01-01 13:59:33,221 DEBUG [c.c.h.x.d.XcpServerDiscoverer] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Not XenServer so moving on. 2025-01-01 13:59:33,221 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: HypervServerDiscoverer 2025-01-01 13:59:33,221 DEBUG [c.c.h.h.d.HypervServerDiscoverer] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Not Hyper-V hypervisor, so moving on. 2025-01-01 13:59:33,221 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: TungstenElement 2025-01-01 13:59:33,222 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: VmwareManagerImpl 2025-01-01 13:59:33,222 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: SecondaryStorageListener 2025-01-01 13:59:33,222 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: NetworkOrchestrator 2025-01-01 13:59:33,224 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Host's hypervisorType is: KVM 2025-01-01 13:59:33,231 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending CheckNetworkCommand to check the Network is setup correctly on Agent 2025-01-01 13:59:33,234 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Seq 73-6903455278805549057: Sending { Cmd , MgmtId: 345050445235, via: 73(agent-hostname), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.CheckNetworkCommand":{"networkInfoList":[{"physicalNetworkId":"209","privateNetworkName":"cloudbr0","storageNetworkName":"cloudbr0"},{"physicalNetworkId":"210","guestNetworkName":"cloudbr1"}],"wait":"0","bypassHostMaintenance":"false"}}] } 2025-01-01 13:59:33,278 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Seq 73-6903455278805549057: Received: { Ans: , MgmtId: 345050445235, via: 73(agent-hostname), Ver: v1, Flags: 110, { CheckNetworkAnswer } } 2025-01-01 13:59:33,282 DEBUG [o.a.c.e.o.NetworkOrchestrator] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Network setup is correct on Agent 2025-01-01 13:59:33,282 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Sending Connect to listener: SecurityGroupListener 2025-01-01 13:59:33,282 INFO [c.c.n.s.SecurityGroupListener] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Received a host startup notification 2025-01-01 13:59:33,284 DEBUG [c.c.a.t.Request] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Seq 73-6903455278805549058: Sending { Cmd , MgmtId: 345050445235, via: 73(agent-hostname), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CleanupNetworkRulesCmd":{"interval":"2413","wait":"0","bypassHostMaintenance":"false"}}] } 2025-01-01 13:59:33,284 INFO [c.c.n.s.SecurityGroupListener] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Scheduled network rules cleanup, interval=2413 2025-01-01 13:59:33,284 INFO [c.c.n.s.SecurityGroupListener] (AgentConnectTaskPool-10755:ctx-76fa1734) (logid:f8b676f6) Received a hos

@DaanHoogland
Copy link
Contributor

This is full logs for logid:f8b676f6.

Hi @leduyquy , it seems cut short as it does not include (go up to) the exception trace itself.

btw, I am trying to reproduce it myself in the meanwhile, and the fact that a NullPointerException is thrown is prawn to improvement anyway.

@leduyquy
Copy link
Author

leduyquy commented Jan 3, 2025

Hi @DaanHoogland , sorry my bad. Maybe it's cut off by my browser's character limit
I have attached full logs in the attached file. Please help me check it.
logs.txt

@DaanHoogland
Copy link
Contributor

@leduyquy the log looks good but it is not for the same ID

2025-01-03 15:46:17,798 ERROR [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-17819:ctx-02ff844d) (logid:0e3ec52f) Monitor DownloadListener says there is an error in the connect process for 73 due to null

sugests it remembered the bad state from the last run. Can you go to the log from the first of january and do the following

zgrep logid:f8b676f6 management-server.log.2025-01-01.gz >log.txt and
zgrep ctx-76fa1734 management-server.log.2025-01-01.gz >context.txt and attach the files, please?

@leduyquy
Copy link
Author

leduyquy commented Jan 3, 2025

Hi @DaanHoogland.

I have attached the information you need in the attached files. Besides. I deployed cloudstack with non-internet environment so it may encounter errors when downloading templates from the internet.
contents.txt
log.txt

@DaanHoogland
Copy link
Contributor

@leduyquy I have not found more info in your logs that could help and I tried to replicate your scenario, but no error occured. Can you think of anything else that might have been in play?

In the mean while I created a (purely theoretical) fix in #10158 . Can you try that ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging a pull request may close this issue.

2 participants