Understanding CoreDNS and NodeLocalDNS: Troubleshooting and Monitoring
DNS is the cornerstone of any Kubernetes cluster, enabling seamless communication between services. CoreDNS and NodeLocalDNS are key components in ensuring efficient DNS resolution. In this blog post, we explore what CoreDNS and NodeLocalDNS are, their role in RKE2 clusters, how to troubleshoot common issues, and techniques for effective monitoring.
What is CoreDNS?
CoreDNS is a flexible, extensible DNS server that serves as the default DNS provider for Kubernetes clusters. It performs name resolution within the cluster and provides:
- Service discovery via DNS for Kubernetes services.
- DNS query forwarding for external domains.
- Plugin-based architecture for customization.
CoreDNS in RKE2
In RKE2, CoreDNS is deployed as a system Pod managed by the rke2-coredns-addon
. It resides in the kube-system
namespace and is integrated into the cluster’s networking stack. RKE2 customizes CoreDNS through its ConfigMap, enabling features such as:
- Cluster-wide DNS forwarding.
- Integration with the cluster’s service and pod network.
- Compatibility with RKE2-specific components like Canal or Calico.
What is NodeLocalDNS?
NodeLocalDNS is an extension to CoreDNS that runs as a DaemonSet on Kubernetes nodes. It improves DNS query performance by caching responses locally on each node. Benefits include:
- Reduced DNS query latency.
- Improved cluster resilience by reducing CoreDNS load.
- Optimized DNS performance for high-scale clusters.
NodeLocalDNS in RKE2
In RKE2, NodeLocalDNS is optional but recommended for clusters with high DNS query volumes or latency-sensitive applications. When enabled:
- A dedicated IP is assigned for DNS caching on each node.
- It intercepts DNS queries from Pods and serves cached responses or forwards them to CoreDNS.
- The configuration is managed via the
nodelocaldns
Helm chart, part of the RKE2 addons.
To enable NodeLocalDNS in RKE2, configure the following:
- Update the RKE2 config file (
/etc/rancher/rke2/config.yaml
) to include NodeLocalDNS settings. - Deploy the
nodelocaldns
Helm chart using RKE2’s Helm controller.
Applying the following configuration to the RKE2 config file will enable NodeLocalDNS if you are running cilium as the CNI:
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: rke2-cilium
namespace: kube-system
spec:
valuesContent: |-
kubeProxyReplacement: true
k8sServiceHost: 127.0.0.1
k8sServicePort: 6443
localRedirectPolicy: true
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: rke2-coredns
namespace: kube-system
spec:
valuesContent: |-
nodelocal:
enabled: true
use_cilium_lrp: true
---
If you are using Canal as the CNI, you can enable NodeLocalDNS by applying the following configuration:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: rke2-coredns
namespace: kube-system
spec:
valuesContent: |-
nodelocal:
enabled: true
Troubleshooting CoreDNS and NodeLocalDNS
Common CoreDNS Issues and Fixes
DNS Resolution Failures
- Symptoms: Applications report “unknown host” or fail to resolve service names.
- Checks:
- Ensure CoreDNS pods are running:
kubectl get pods -n kube-system -l k8s-app=kube-dns
. - Review CoreDNS logs:
kubectl logs -n kube-system -l k8s-app=kube-dns
. - Test resolution with
nslookup
ordig
. - Test internal DNS:
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup <service-name>.<namespace>.svc.cluster.local
. - Test external DNS:
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup google.com
.
- Ensure CoreDNS pods are running:
- Fixes:
- Validate CoreDNS ConfigMap for syntax errors.
- Ensure correct upstream servers are specified in the forward plugin.
High Latency in DNS Queries
- Symptoms: Applications experience delays in resolving hostnames.
- Checks:
- Look for high CPU or memory usage in CoreDNS pods.
- Monitor DNS query time with
kubectl exec
to test queries.
- Fixes:
- Enable caching in the CoreDNS ConfigMap.
- Scale CoreDNS replicas for better load distribution.
NodeLocalDNS Troubleshooting
Local Cache Not Responding
- Symptoms: DNS queries are slow, bypassing NodeLocalDNS.
- Checks:
- Verify NodeLocalDNS DaemonSet is running:
kubectl get pods -n kube-system -l k8s-app=node-local-dns
. - Test DNS resolution directly via NodeLocalDNS IP using a pod:
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup kubernetes.default.svc.cluster.local <NodeLocalDNS-IP>
.
- Verify NodeLocalDNS DaemonSet is running:
- Fixes:
- Restart NodeLocalDNS pods to reset cache.
- Verify node-level IP and configuration.
Overlay Test
As part of Rancher’s overlay test, which can be found here, you can deploy it to the Rancher environment by running the following command:
kubectl apply -f https://raw.githubusercontent.com/rancherlabs/support-tools/master/swiss-army-knife/overlaytest.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: overlaytest
spec:
selector:
matchLabels:
name: overlaytest
template:
metadata:
labels:
name: overlaytest
spec:
tolerations:
- operator: Exists
containers:
- image: rancherlabs/swiss-army-knife
imagePullPolicy: IfNotPresent
name: overlaytest
command: ["sh", "-c", "tail -f /dev/null"]
terminationMessagePath: /dev/termination-log
This will deploy a DaemonSet that will run on all nodes in the cluster. These pods will be running tail -f /dev/null
, which will do nothing but keep the pod running.
To run the overlay test script, execute the following command:
curl -sfL https://raw.githubusercontent.com/rancherlabs/support-tools/master/swiss-army-knife/overlaytest.sh | bash
The overlay test script includes the option to enable DNS checks using the --dns-check
flag. Here’s an example script:
#!/bin/bash
DNS_CHECK=false
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--dns-check)
DNS_CHECK=true
shift
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
echo "=> Start network overlay, DNS, and API test"
kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeName}{" "}{@.status.podIP}{"\n"}{end}' | sort -k2 |
while read spod shost sip
do
echo "Testing pod $spod on node $shost with IP $sip"
# Overlay network test
echo " => Testing overlay network connectivity"
kubectl get pods -l name=overlaytest -o jsonpath='{range .items[*]}{@.status.podIP}{" "}{@.spec.nodeName}{"\n"}{end}' | sort -k2 |
while read tip thost
do
if [[ ! $shost == $thost ]]; then
kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "ping -c2 $tip > /dev/null 2>&1"
RC=$?
if [ $RC -ne 0 ]; then
echo " FAIL: $spod on $shost cannot reach pod IP $tip on $thost"
else
echo " PASS: $spod on $shost can reach pod IP $tip on $thost"
fi
fi
done
if $DNS_CHECK; then
# Internal DNS test
echo " => Testing internal DNS"
kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "nslookup kubernetes.default > /dev/null 2>&1"
RC=$?
if [ $RC -ne 0 ]; then
echo " FAIL: $spod cannot resolve internal DNS for 'kubernetes.default'"
else
echo " PASS: $spod can resolve internal DNS for 'kubernetes.default'"
fi
# External DNS test
echo " => Testing external DNS"
kubectl --request-timeout='10s' exec $spod -c overlaytest -- /bin/sh -c "nslookup rancher.com > /dev/null 2>&1"
RC=$?
if [ $RC -ne 0 ]; then
echo " FAIL: $spod cannot resolve external DNS for 'rancher.com'"
else
echo " PASS: $spod can resolve external DNS for 'rancher.com'"
fi
else
echo " => DNS checks are skipped. Use --dns-check to enable."
fi
done
echo "=> End network overlay, DNS, and API test"
This script allows for comprehensive testing of overlay network connectivity, internal DNS, and external DNS. The --dns-check
flag enables DNS-specific tests, ensuring the cluster’s DNS functionality is verified.
Monitoring CoreDNS and NodeLocalDNS
Using Metrics and Logs
CoreDNS Metrics
CoreDNS exports Prometheus metrics that provide insights into:
- DNS query rates (
coredns_dns_request_count_total
). - Errors in query processing (
coredns_dns_request_error_count_total
). - Cache hits and misses.
NodeLocalDNS Metrics
Monitor node-level metrics for latency and cache performance. In RKE2, metrics can be scraped from NodeLocalDNS DaemonSet pods by integrating Prometheus.
Visualization with Grafana
Dashboards:
- Import a Kubernetes DNS monitoring dashboard in Grafana (e.g., Dashboard ID: 14981).
- Display metrics like query rates, error rates, and latency.
Alerting:
- Configure alerts for high error rates or unusual latency.
Conclusion
CoreDNS and NodeLocalDNS are critical to Kubernetes cluster operations, especially in RKE2. Understanding their configurations, troubleshooting effectively, and setting up robust monitoring can ensure your cluster’s DNS subsystem operates smoothly. By leveraging tools like Prometheus and Grafana, you can proactively address issues and optimize DNS performance.