The History and Motivation Behind Kubernetes
In today’s tech landscape, Kubernetes is a staple for deploying and managing applications in the cloud. But to understand why Kubernetes exists and how it has revolutionized cloud infrastructure, it’s important to look back at the history of deploying applications and how the industry has evolved over the last two decades.
The 2000s: Bare Metal and Monolithic Applications
In the early 2000s, cloud technology was non-existent. Deploying applications to the internet required physical infrastructure, typically managed within a data center. Companies either owned and operated their servers on-premises or used shared facilities where providers handled the power, cooling, and basic infrastructure while the companies managed their servers within the location.
In this setup, dedicated system administrators were responsible for provisioning and maintaining fleets of servers. This environment, known as “bare metal,” meant running applications directly on hardware without virtualization, which made it challenging to scale and manage. Due to the complexities involved, most applications were monolithic, with all functionality packed into a single codebase. While this limited flexibility, it simplified the operational overhead of managing applications on physical hardware.
Infrastructure tooling was rudimentary, relying heavily on custom-built scripts, manual configurations, and a variety of hacks to handle monitoring and maintenance. Each application deployment was a customized operation, and scalability was limited.
The 2010s: The Rise of the Cloud and Configuration Management
As we moved into the 2010s, cloud technology began to reshape infrastructure. Cloud providers offered virtual machines (VMs) that could be created or destroyed within minutes, eliminating the need for physical hardware. This shift introduced a new operational model, where companies could scale up or down quickly, spinning up new VMs to meet demand.
With cloud infrastructure came the rise of configuration management tools like Puppet and Chef, which allowed administrators to programmatically set up and maintain servers. This enabled more efficient configuration of fleets of servers, although certain tasks, such as fitting multiple applications onto a single VM, still required manual effort.
Cloud infrastructure made microservices architectures feasible. Instead of running monolithic applications, teams could split functionality across different services, each with its own deployment lifecycle. However, managing large numbers of cloud resources was still challenging, with cloud teams manually handling much of the workload placement and scaling.
Recent Years: Containers and the Orchestration Era
In the last few years, containerization became the industry standard for deploying applications, and Kubernetes emerged as the de facto orchestrator for managing these containers at scale. With Kubernetes, administrators no longer had to view each server individually but could treat clusters of servers as a single resource pool.
Kubernetes and similar orchestrators address common challenges in managing applications (a short example manifest illustrating several of these follows the list), such as:
- Automatic and Efficient Scheduling: Instead of manually deciding where each application runs, Kubernetes handles this based on specified CPU and memory requirements.
- Health Checks: Kubernetes can monitor applications and automatically restart or replace them if they enter an unhealthy state.
- Service Discovery: Built-in service discovery allows applications to locate and communicate with each other, streamlining inter-service communication.
- Configuration Management: Kubernetes shifts configuration from the host level to the orchestrator layer, letting teams define application needs as part of their deployment config.
- Automatic Scaling: Kubernetes allows applications to scale up or down based on demand.
- Persistent Storage Management: The platform makes it easy to manage storage alongside applications.
- Networking Across Applications: Kubernetes provides a standardized way to handle networking within a cluster.
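To make several of these concrete, here is a minimal sketch of a Deployment and Service. The names, image, and port are illustrative rather than taken from any particular project: the resource requests drive scheduling, the liveness probe drives health checks, and the Service provides discovery for other workloads in the cluster.

```bash
# Minimal Deployment + Service illustrating scheduling, health checks, and
# service discovery. All names and the image are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2                  # Kubernetes keeps two copies running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
        resources:
          requests:            # the scheduler places the pod based on these
            cpu: 100m
            memory: 128Mi
        livenessProbe:         # failing probes trigger an automatic restart
          httpGet:
            path: /
            port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web                    # other pods can reach this at http://web
spec:
  selector:
    app: web
  ports:
  - port: 80
EOF
```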
The Birth of Kubernetes: From Google’s Borg to Open Source Powerhouse
Kubernetes grew out of Google’s internal workload orchestrator, Borg, which Google had refined over many years to manage its vast infrastructure. Google saw an opportunity to open-source a new, more general system inspired by Borg, both to expand its cloud footprint and to differentiate itself in the cloud industry. The Kubernetes project started as a collaboration between Google and other major players in the tech industry, eventually becoming a cornerstone of modern cloud infrastructure.
Kubernetes Technology Overview: Key Components and Architecture
We’ll break down the roles and functions of Kubernetes’ main components and how they enable seamless scaling, scheduling, and deployment.
Key Terms in Kubernetes Architecture
To begin, here are four foundational terms to understand in Kubernetes:
- Cluster: A set of resources that collectively form the Kubernetes system.
- Node: Each server within a cluster, which can be either a virtual machine or a physical (bare metal) machine.
- Control Plane: Where Kubernetes system components run to manage the cluster.
- Data Plane: Where user-deployed applications, also called workloads, run on worker nodes.
In a simple setup, the control plane and data plane may run on a single node, but most production setups have separate nodes dedicated to each plane.
Kubernetes System Components
The Kubernetes architecture consists of several core components that run within the control plane and data plane. Let’s walk through each component and its function; a short inspection example follows the list.
- Cloud Controller Manager: This component acts as the interface between Kubernetes and the cloud provider, handling tasks such as provisioning resources like load balancers through API calls to the cloud.
- Controller Manager: Kubernetes operates on a control loop model, continuously reconciling the cluster’s actual state with its desired state. The controller manager runs the built-in controllers that implement these loops, ensuring that workloads converge on their specified requirements.
- API Server: The Kubernetes API server is the entry point for interacting with the cluster. Users communicate with the cluster through this API, which coordinates actions across other components as needed.
- etcd: Kubernetes uses etcd, a distributed, highly available key-value store, as its database for all cluster state, giving the rest of the system a consistent source of truth.
- Scheduler: The scheduler’s role is to assign workloads (pods) to nodes based on available resources, such as CPU and memory, to optimize application performance across the cluster.
- kubelet: This component runs on each worker node in the data plane, responsible for starting and managing the lifecycle of application workloads. It also performs health checks and relays this information to the API server.
- kube-proxy: kube-proxy manages networking within the cluster, ensuring that workloads on different nodes can communicate. It typically uses tools like iptables to maintain network configurations. Some networking plugins bypass kube-proxy, using alternative approaches like eBPF for networking at the kernel level.
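If you already have access to a cluster, you can see most of these components for yourself. The commands below are a quick inspection sketch; the exact pod names vary by distribution, and managed services hide the control plane entirely:

```bash
# On a self-managed cluster the control plane runs as pods in kube-system:
# kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kube-proxy, etc.
kubectl get pods --namespace kube-system -o wide

# The kubelet runs as a host process on each node rather than as a pod,
# so it is reflected in node status instead
kubectl get nodes -o wide
```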
Managed Kubernetes Clusters
In many cases, cloud providers offer managed Kubernetes clusters, abstracting away much of the complexity involved in setting up and maintaining the control plane components. With managed clusters, developers interact with the Kubernetes API to deploy applications without needing to handle cluster internals directly.
Kubernetes’ Modular Interfaces
Kubernetes employs several standardized interfaces to enable flexibility and modularity, allowing organizations to tailor specific aspects of the stack as needed (commands for checking which implementations a cluster uses follow the list):
- CRI (Container Runtime Interface): This interface standardizes how containers are executed. Popular CRI implementations include containerd and CRI-O. Docker Engine was previously supported through a built-in shim (dockershim), which was removed in Kubernetes 1.24 because Docker does not implement CRI directly.
- CNI (Container Network Interface): CNI enables Kubernetes to support a variety of networking solutions. Options include cloud-specific plugins, such as the Amazon VPC CNI, as well as popular third-party options like Calico, Flannel, and Cilium. Some advanced CNI implementations, like Cilium, use eBPF to optimize networking.
- CSI (Container Storage Interface): CSI allows Kubernetes to manage storage by interfacing with persistent storage solutions. For example, the Amazon EBS CSI driver enables seamless storage provisioning with AWS resources. CSI also supports specialized uses, such as dynamically mounting TLS certificates or exposing secrets with the Secrets Store CSI Driver.
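As a quick way to see which implementations a given cluster has plugged into these interfaces, the following commands are a reasonable sketch (the output naturally depends on the distribution and installed plugins):

```bash
# CRI: the container runtime appears in the CONTAINER-RUNTIME column per node
kubectl get nodes -o wide

# CSI: installed drivers and the storage classes built on top of them
kubectl get csidrivers
kubectl get storageclasses

# CNI: the network plugin usually runs as a DaemonSet in kube-system
# (for example calico-node, cilium, or aws-node)
kubectl get daemonsets --namespace kube-system
```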
Exploring the Cloud Native Ecosystem
The Kubernetes ecosystem, managed by the Cloud Native Computing Foundation (CNCF), includes numerous projects across various stages of maturity, from established production-ready tools to emerging incubating projects. The CNCF landscape is vast, providing tools that can extend Kubernetes with capabilities such as observability, security, and storage solutions.
A Comprehensive Guide to Setting Up Kubernetes: Local and Cloud-Based Cluster Installation
Setting up Kubernetes clusters is a crucial step in your Kubernetes journey, whether for development or production. This guide covers setting up both local and cloud-based Kubernetes environments, providing you with flexibility to use the configuration that best suits your needs.
Tools and Dependencies
To start, we’ll install several tools and dependencies. These include:
- Docker Desktop: Required for container management.
- Devbox: A wrapper around Nix, which simplifies dependency installation and ensures version compatibility.
1. Setting Up Docker Desktop
To install Docker Desktop:
- Visit the Docker documentation for specific installation instructions based on your operating system.
- Follow the steps to install and verify Docker on your machine; a quick smoke test is sketched below.
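Once installed, a quick smoke test confirms that both the Docker CLI and the engine are working:

```bash
# Check that the client can talk to the Docker engine
docker version

# Run a throwaway container as an end-to-end test
docker run --rm hello-world
```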
2. Installing Devbox
Devbox provides a simplified setup for dependencies by creating isolated environments. Here’s how to install Devbox:
- Go to Devbox’s documentation and locate the quick start guide.
- Follow instructions to install Devbox, which involves running a shell script that sets up your environment.
- After installation, activate Devbox within the GitHub repository’s directory by running devbox shell (a sketch of the underlying devbox.json follows).
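For reference, a Devbox environment is defined by a devbox.json file in the project root. The repository ships its own; the file below is only an assumption about what a Kubernetes-focused setup might pin, shown to illustrate the shape of the format:

```bash
# In a scratch directory, a hypothetical devbox.json pinning CLI tools might look like this
cat > devbox.json <<'EOF'
{
  "packages": [
    "kubectl@latest",
    "kind@latest",
    "go-task@latest"
  ]
}
EOF

# Enter the isolated environment; the pinned tools are now on your PATH
devbox shell
```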
3. Local Kubernetes Cluster Setup with KIND
Kubernetes in Docker (KIND) enables you to run a Kubernetes cluster locally. Here’s how:
- Generate a KIND configuration file with customized paths for persistent data by running task kind01_generate_config.
- Set up your cluster by running task kind02_create_cluster, which creates a KIND cluster with one control plane node and two worker nodes (roughly equivalent to the manual KIND workflow sketched after this list).
- Verify the setup by running:
- kubectl get nodes: Lists all nodes in the cluster.
- kubectl get pods --all-namespaces: Displays system pods running within your cluster.
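For context, those task targets wrap the standard KIND workflow. Done by hand, and leaving aside the repository’s customized persistent-data paths, it looks roughly like this:

```bash
# Describe a three-node cluster: one control plane and two workers
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF

# Create the cluster; kind also adds a "kind-demo" context to your kubeconfig
kind create cluster --name demo --config kind-config.yaml

# Verify the nodes and system pods
kubectl get nodes
kubectl get pods --all-namespaces
```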
This local cluster setup is ideal for experimenting with Kubernetes concepts without needing cloud resources.
4. Setting Up Cloud-Based Clusters
Cloud-based clusters simulate a production environment and provide advanced networking and storage options. Below are examples using two cloud providers: Civo and Google Cloud Platform (GCP).
A. Civo Cloud Cluster
Civo offers a straightforward way to set up cloud-based Kubernetes clusters. Here’s how:
- Authenticate: Use Civo’s CLI to authenticate with an API key from your account.
- Create Network and Firewall: Set up an isolated network and firewall rules to allow traffic on specific ports.
- Deploy the Cluster: Run the Civo command to create a cluster with two nodes. Civo sets up a basic configuration without an ingress controller, which you can later add as needed.
This setup provides a load balancer with a public DNS name as well as persistent storage options.
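As a rough sketch of what that CLI session can look like (the resource names are illustrative, and the exact flags are assumptions that may differ between CLI versions, so check civo --help):

```bash
# Store an API key generated in the Civo dashboard (the key value is a placeholder)
civo apikey save my-key YOUR_API_KEY

# Isolated network and a firewall scoped to it (names are made up)
civo network create demo-network
civo firewall create demo-firewall --network demo-network

# A two-node cluster attached to that network
civo kubernetes create demo-cluster --nodes 2 --network demo-network \
  --wait --save --merge
```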
B. Google Kubernetes Engine (GKE) Cluster on Google Cloud
Google Kubernetes Engine (GKE) is one of the most popular managed Kubernetes services, offering both Standard and Autopilot modes. Here’s a summary of the setup process for GKE’s Standard mode:
- Initialize the GCP CLI: Set up the Google Cloud CLI and enable relevant APIs for Kubernetes management.
- Create VPC and Subnet: Define a virtual private cloud (VPC) and subnet for network isolation.
- Create the Cluster: Use the gcloud command to create a Kubernetes cluster within your specified VPC and subnet. GKE’s standard mode allows you to manage worker nodes directly.
- Authenticate with GKE: The cluster configuration is automatically added to your kubectl config, allowing seamless interaction.
With GKE, you get access to a full suite of Kubernetes features, including Google’s load balancing, persistent storage options, and monitoring tools.
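Concretely, the Standard-mode flow above maps to gcloud commands along these lines; the project, region, names, and IP range are placeholders you would substitute for your own:

```bash
# One-time CLI setup and API enablement
gcloud init
gcloud services enable container.googleapis.com

# Custom-mode VPC plus a subnet for the cluster (CIDR is only an example)
gcloud compute networks create demo-vpc --subnet-mode=custom
gcloud compute networks subnets create demo-subnet \
  --network=demo-vpc --region=us-central1 --range=10.0.0.0/20

# A small Standard-mode cluster in that subnet
# (--num-nodes is per zone when --region is used)
gcloud container clusters create demo-cluster \
  --region=us-central1 --network=demo-vpc --subnetwork=demo-subnet \
  --num-nodes=1

# Fetch credentials; gcloud merges a context into your kubectl config
gcloud container clusters get-credentials demo-cluster --region=us-central1
```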
Cleaning Up Cloud Resources
To avoid incurring unnecessary costs, make sure to delete any resources you no longer need. For example:
- Use Civo’s and GCP’s respective CLI commands to delete clusters and network configurations (see the sketch after this list).
- Verify that all resources are removed, as cloud providers may charge for leftover resources even if clusters are deleted.
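Assuming the illustrative names from the sketches above, teardown looks roughly like this:

```bash
# Local KIND cluster
kind delete cluster --name demo

# Civo: delete the cluster first, then the network it was attached to
civo kubernetes delete demo-cluster
civo network delete demo-network

# GKE: delete the cluster, then the subnet and the VPC
gcloud container clusters delete demo-cluster --region=us-central1
gcloud compute networks subnets delete demo-subnet --region=us-central1
gcloud compute networks delete demo-vpc
```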
Additional Cluster Management Tips
- Kubernetes Context Switching: Use kubectl config get-contexts to view the clusters in your kubeconfig and kubectl config use-context to switch between them, which is especially handy when you have both local and cloud-based clusters configured (see the example after this list).
- Task Automation: Use tools like Task Runner (e.g., Go Task) for automating repetitive tasks, which is particularly helpful when working with Kubernetes configurations.
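For example, hopping between the local KIND cluster and a cloud cluster is a single context switch (the context names below are illustrative):

```bash
# List every cluster in your kubeconfig; the current context is marked with *
kubectl config get-contexts

# Point kubectl at a specific cluster
kubectl config use-context kind-demo
kubectl config use-context gke_my-project_us-central1_demo-cluster
```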
By setting up both local (KIND) and cloud-based clusters (Civo and GKE), you gain a flexible development environment suitable for any scenario. Whether you’re working in a local setup for testing or a cloud environment to simulate production, Kubernetes’ modular design and the support of cloud providers allow you to practice and deploy with ease.