In the world of web development and server management, unexpected conflicts between different software components can lead to challenging troubleshooting scenarios. This blog post details one such instance where the installation of k3s (Lightweight Kubernetes) interfered with an existing HTTPS web application served by Nginx, and how we methodically resolved the issue.
Background
Our journey began when https://digitaldwellings.tech, an existing web application served via Nginx and secured with SSL, suddenly became inaccessible. Initial checks of Nginx configurations and SSL certificates didn't immediately reveal the cause of the problem. The application was supposed to be straightforward: Nginx acting as a reverse proxy, directing HTTPS traffic to a Gunicorn-served Flask application. However, the reality proved to be more complex upon further investigation.
Discovery and Initial Troubleshooting
The first step in our troubleshooting process involved confirming that Nginx was correctly configured to handle HTTPS traffic and proxy it to the application. We reviewed the Nginx server blocks, confirmed that they listened on ports 80 and 443, and ensured the SSL certificates were correctly specified. Although this configuration was correct, the site remained inaccessible.
The Breakthrough with OpenSSL
The turning point in our investigation came when we ran the openssl s_client -connect digitaldwellings.tech:443 command. This command, commonly used to diagnose SSL/TLS connection and certificate issues, revealed that the server was presenting a self-signed certificate named "TRAEFIK DEFAULT CERT" rather than the expected Let's Encrypt certificate. This was our first clue that something else on the server was intercepting HTTPS traffic before it reached Nginx.
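The same check can be scripted. The following is a minimal sketch using only Python's standard library, not the exact procedure we followed: it attempts a verified TLS handshake and reports either the certificate names or the reason verification failed. The function names probe_certificate and flatten_name are hypothetical helpers introduced here for illustration.

```python
import socket
import ssl


def flatten_name(name):
    """Turn the nested tuples returned by getpeercert() into a plain dict."""
    return {key: value for rdn in name for key, value in rdn}


def probe_certificate(host, port=443, timeout=5.0):
    """Describe what the TLS endpoint at host:port presents to clients.

    Network errors (refused connection, DNS failure) are not caught here
    and propagate to the caller.
    """
    # The default context verifies the chain against the system CA store
    # and checks that the certificate matches the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # An untrusted certificate, such as a self-signed default one,
        # is rejected during the handshake and ends up here.
        return f"verification failed: {err.verify_message}"
    subject = flatten_name(cert["subject"])
    issuer = flatten_name(cert["issuer"])
    return (
        f"subject CN={subject.get('commonName')}, "
        f"issuer O={issuer.get('organizationName')}"
    )
```

Calling probe_certificate("digitaldwellings.tech") against a server in the state described above would be expected to take the verification-failed branch, since a self-signed certificate is not trusted by the system CA store. Once the expected certificate is served again, the function should instead return the subject and issuer names.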
Identifying the Culprit: k3s and Traefik
Knowing that Nginx was not the source of the self-signed certificate, we explored other services on the server that could introduce such a certificate. Our attention turned to k3s, a lightweight Kubernetes distribution that had been recently installed on the server. k3s bundles Traefik as its default ingress controller, which automatically manages incoming traffic, including HTTPS connections. Traefik, not Nginx, was presenting the self-signed certificate to clients, thereby interfering with our web application.
Resolution: Disabling Traefik in k3s
With the cause identified, our next step was to disable Traefik within k3s to allow Nginx to once again handle HTTPS traffic directly. This process involved several steps:
- Removing Traefik Components: We used kubectl commands to delete the Traefik deployment, service, and associated resources from the kube-system namespace within k3s.
- Preventing Traefik Re-Deployment: We modified the k3s service configuration to include the --disable traefik flag, ensuring that Traefik would not automatically redeploy on k3s restarts.
- Restarting k3s: After updating the configuration, we restarted the k3s service to apply our changes.
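As a rough illustration of these steps, the sketch below assembles the cleanup commands and prints them, executing them only when explicitly asked to. It rests on several assumptions that are not stated above: that the Traefik deployment and service are both named traefik, that k3s runs as a systemd unit called k3s, and that the --disable traefik flag has already been added to the service configuration by hand. CLEANUP_PLAN and apply_plan are hypothetical names.

```python
import shlex
import subprocess

# Assumed resource names: check what actually exists first with
# `kubectl -n kube-system get deployments,services` before deleting anything.
CLEANUP_PLAN = [
    ["kubectl", "-n", "kube-system", "delete", "deployment", "traefik",
     "--ignore-not-found"],
    ["kubectl", "-n", "kube-system", "delete", "service", "traefik",
     "--ignore-not-found"],
    # Assumes a systemd unit named k3s, and that the --disable traefik flag
    # is already in the k3s service configuration so Traefik stays removed.
    ["systemctl", "restart", "k3s"],
]


def apply_plan(plan, dry_run=True):
    """Print each command in order; also execute it when dry_run is False."""
    rendered = []
    for argv in plan:
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops the plan at the first failing command.
            subprocess.run(argv, check=True)
    return rendered
```

Running apply_plan(CLEANUP_PLAN) only prints the three commands, which makes it easy to review them before calling apply_plan(CLEANUP_PLAN, dry_run=False) with sufficient privileges.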
Conclusion and Lessons Learned
Following the removal of Traefik and the restoration of Nginx as the primary handler of HTTPS traffic, https://digitaldwellings.tech became accessible once again. This experience highlighted several key lessons:
- Always Check for Interfering Services: When troubleshooting web server issues, consider all services that could potentially handle incoming traffic, especially on servers running multiple applications or platforms like Kubernetes.
- The Value of Diagnostic Tools: Tools like openssl s_client can provide invaluable insights into SSL/TLS configurations and pinpoint unexpected behaviors.
- Document and Understand Changes: Installing new platforms or services on a server can have wide-reaching effects. It's crucial to document such changes and understand their potential impact on existing applications.
This troubleshooting journey, from initial confusion to final resolution, underscores the importance of a methodical approach and the use of diagnostic tools in managing complex web server environments.