Solving Harvester Operational Challenges: Expert Troubleshooting Techniques
Master the art of troubleshooting Harvester with this detailed guide. From overcoming cluster deployment hurdles to ensuring smooth access to embedded dashboards, we provide you with the tools and insights needed to address common pitfalls. This guide not only offers solutions but also enhances your troubleshooting skills with additional steps and SEO-friendly content to ensure you can navigate through Harvester’s intricacies with ease.
Troubleshooting Harvester Operational Challenges
Harvester, a powerful HCI solution, can encounter operational challenges that require expert troubleshooting. This guide provides comprehensive solutions to common issues, ensuring a seamless Harvester experience.
Install hanging at rke2-images
When installing Harvester via the ISO image, the process may hang at the rke2-images
stage when using iDRAC or RSA KVM. This issue is caused by the installation process trying to pull the rke2-images
from the ISO image and failing to do so due to the source being too slow.
Solution
- Physically attach a USB drive or DVD drive to the server and mount the Harvester ISO image for the installation process.
- Don’t use iDRAC or RSA KVM over a slow network connection for the installation process IE a VPN.
- Ignore the
rke2-images
stage and continue with the installation process as the images will be pulled from the internet during the first boot.
The first node is stuck in the “Pending” state
When creating a Harvester cluster, the first node may get stuck in the “Pending” state, preventing the cluster from being created. This issue can be caused by a variety of factors, including network issues, misconfigured settings, or a lack of resources.
Solution
- Wait, as the first node may take some time (15~20mins) to initialize and become ready.
- SSH into the first node and check the logs for any errors or issues.
- Ensure the first node has enough resources to run Harvester. The minimum requirements are 8 CPU cores, 32GB of RAM, and 500GB of disk space.
Second node fails to join the cluster
After successfully creating the first node, the second node may fail to join the cluster, preventing the cluster from expanding. This issue can be caused by network issues, misconfigured, or first node not being ready.
Solution
- Ensure the first node is 100% ready and operational before attempting to add the second node.
- Check and verify the Harvester UI is accessible and operational.
- Verify the Harvester VIP is accessible and operational.
- Check firewall rules between the first and second node to ensure they can communicate with each other. See the RKE2 documentation for more information on the required ports and protocols. RKE2 Ports
Incorrect HTTP Proxy Configurations
When setting up a multi-node Harvester cluster, HTTP proxy configurations can cause issues, preventing the cluster from being created or expanded.
Solution
- It’s important to adjust the NO_PROXY environment variable to include the Harvester VIP and the IP addresses of the nodes in the cluster. For example,
localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,longhorn-system,cattle-system,cattle-system.svc,harvester-system,.svc,.cluster.local.
This will ensure that the nodes can communicate with each other and the Harvester VIP without going through the HTTP proxy IE we don’t want the nodes to go out to the proxy to communicate with each other. - If you’re using the Harvester ISO image, you can set the HTTP proxy during the installation process by adding
http_proxy=http://your-proxy:port
to the kernel command line. This will set the HTTP proxy for the installation process and the first boot.
CNI / Networking Issues
CNI / Networking issues can cause a variety of problems, including nodes not being able to communicate with each other, pods not being able to communicate with each other, and pods not being able to communicate with the outside world.
It’s important to ensure that the CNI / Networking is properly configured and operational to prevent these issues. In addition, it’s important to remember that Harvester uses RKE2, which uses canal with multus.
Solution
- SSH into one of the master nodes and run
kubectl get nodes
to ensure all the nodes are in theReady
state. - Run
kubectl -n kube-system get pods
to ensure all the pods are in theRunning
state. - Run the overlay network test to ensure the nodes can communicate with each other. This can be done by creating a pod on each node and pinging the pod on the other node. If the pods can communicate with each other, the overlay network is operational. See the official docs at KB000020831
Secure Connections to the Harvester UI
When accessing the Harvester UI by default the page uses a self-signed certificate, which can cause issues with browsers and other clients. It’s important to ensure that the connections to the Harvester UI are secure and trusted.
Solution
- Replace the self-signed certificate with a trusted certificate from a Certificate Authority (CA). See the official docs at Advanced Settings
Capturing Logs
If you’re still experiencing issues after following the solutions provided, it’s important to capture the logs and provide them to the Harvester team for further assistance.
- If you’re experiencing issues with the installation process, you can capture the logs by running
journalctl -u harvester-installer
andjournalctl -u harvester-installer-iso
on the installer node. - If you’re experiencing issues with the cluster, you can capture the logs by generating a support bundle from the Harvester UI. See the official docs at Support Bundle
- If you’re experiencing issues with the Harvester UI, you can capture the logs by following the official docs at Manually Download and Retain a Support Bundle File
- Access the Harvester UI and navigate to
System
>Support Bundle
>Download
to capture the logs. - If you can also access the hidden support page by navigating to
Preferences
and check theEnable Extension developer features
box underAdvanced Features
. Then navigate toSupport
at the bottom left of the page. - If you’re still experiencing issues with Longhorn, you can capture the logs by following the official docs at Longhorn Troubleshooting. Note: Longhorn is the default storage solution for Harvester and is tightly integrated with Harvester so it’s important to not make any changes to Longhorn without consulting the Harvester team and/or the Longhorn team.
Conclusion
By following the solutions provided in this guide, you can overcome common Harvester operational challenges and ensure a seamless experience. Elevate your troubleshooting skills and unlock the full potential of Harvester HCI.