08. Source-Level Pod Deep Dive (Part 4): Pod Readiness and Container Probes

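Preface

As discussed in the earlier article "05. Source-Level Pod Deep Dive (Part 2): Pod Lifecycle", Pods are usually not deployed directly; they are scheduled and managed by higher-level workloads. Kubernetes therefore provides a mechanism for checking whether a Pod has been deployed successfully: Pod readiness.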

Pod readiness

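In essence, Pod readiness is a way to inject additional signals into PodStatus. We can set readinessGates in the PodSpec, and the Pod is judged ready by checking whether it satisfies the conditions listed there.

The final readiness value is determined by the conditions in PodStatus (PodCondition was covered in "05. Source-Level Pod Deep Dive (Part 2): Pod Lifecycle"). If a condition is missing or is not True, it is treated as False. A Pod is considered Ready only when both of the following hold:

All containers in the Pod are ready
All conditions listed in readinessGates are True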

Below is the source of PodReadinessGate: [code]
// PodReadinessGate contains the reference to a pod condition
type PodReadinessGate struct {
	// ConditionType refers to a condition in the pod's condition list with matching type.
	ConditionType PodConditionType
}

// PodConditionType defines the condition of pod
type PodConditionType string

// These are valid conditions of pod.
const (
	// PodScheduled represents status of the scheduling process for this pod.
	PodScheduled PodConditionType = "PodScheduled"
	// PodReady means the pod is able to service requests and should be added to the
	// load balancing pools of all matching services.
	PodReady PodConditionType = "Ready"
	// PodInitialized means that all init containers in the pod have started successfully.
	PodInitialized PodConditionType = "Initialized"
	// PodReasonUnschedulable reason in PodScheduled PodCondition means that the scheduler
	// can't schedule the pod right now, for example due to insufficient resources in the cluster.
	PodReasonUnschedulable = "Unschedulable"
	// ContainersReady indicates whether all containers in the pod are ready.
	ContainersReady PodConditionType = "ContainersReady"
)

[/code]

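If the Pod's condition of type Ready has the value True, the Pod is ready. Readiness can also be checked with probes: a user can define a readiness probe and set the probing conditions that decide whether the Pod is ready.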
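The two rules for a Ready Pod can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the types are trimmed-down stand-ins for the API types quoted above, the helper names `conditionTrue` and `podReady` are made up for this example, and the gate name `example.com/load-balancer-attached` is hypothetical. It is not the kubelet's actual implementation.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes API types (assumption: only the
// fields needed for this illustration are kept).
type PodConditionType string

type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"

	ContainersReady PodConditionType = "ContainersReady"
)

type PodCondition struct {
	Type   PodConditionType
	Status ConditionStatus
}

type PodReadinessGate struct {
	ConditionType PodConditionType
}

// conditionTrue reports whether a condition of the given type exists and is
// True. A missing condition counts as False.
func conditionTrue(conds []PodCondition, t PodConditionType) bool {
	for _, c := range conds {
		if c.Type == t {
			return c.Status == ConditionTrue
		}
	}
	return false
}

// podReady applies the two rules: all containers are ready, and every
// condition named by a readiness gate is True.
func podReady(conds []PodCondition, gates []PodReadinessGate) bool {
	if !conditionTrue(conds, ContainersReady) {
		return false
	}
	for _, g := range gates {
		if !conditionTrue(conds, g.ConditionType) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical custom condition used as a readiness gate.
	gates := []PodReadinessGate{{ConditionType: "example.com/load-balancer-attached"}}
	conds := []PodCondition{{Type: ContainersReady, Status: ConditionTrue}}
	fmt.Println(podReady(conds, gates)) // gate condition is missing, so not ready

	conds = append(conds, PodCondition{Type: "example.com/load-balancer-attached", Status: ConditionTrue})
	fmt.Println(podReady(conds, gates)) // both rules are satisfied
}
```

A missing gate condition is deliberately treated the same as an explicit False, which matches the rule stated earlier in this section.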

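Ready means that the Pod is running successfully and can accept incoming traffic.

Note that status is not meant to be edited by hand: kubectl patch does not support patching an object's status, and the status field is maintained by the system (the kubelet and controllers). A custom condition used by a readiness gate therefore has to be written through the API by a controller or operator.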

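Pod Network Readiness (v1.25) [alpha]

Kubernetes v1.25 introduced a new feature, Pod network readiness. It is still at an early stage (alpha), so we only take a brief look at it.

After a Pod is scheduled onto a node, it has to be admitted by the kubelet and have any required storage volumes mounted. Once these steps are complete, the kubelet uses the CRI to create the Pod's runtime sandbox and configure its networking. (A sandbox is a restricted environment: the programs a Pod runs are not necessarily safe, so the sandbox isolates the risk, and the effect of any malicious code stays inside the sandbox instead of reaching the rest of the system.)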

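When the kubelet detects that a Pod does not have a runtime sandbox with networking configured, the PodReadyToStartContainers condition is set to False. This happens in the following cases:

Early in the Pod's lifecycle (just after creation), when the kubelet has not yet set up the Pod's sandbox
Later in the Pod's lifecycle, when the sandbox has been destroyed for some reason. For example, the node rebooted without the Pod being evicted; or the container runtime uses virtual machines for isolation and the sandbox virtual machine rebooted, which requires a new sandbox and fresh network configuration.

Once the kubelet has created the sandbox and configured networking successfully, the PodReadyToStartContainers condition is set to True. After that, the kubelet can start pulling images and creating containers. For a Pod with init containers, the kubelet sets the Initialized condition to True only after those containers have completed successfully; for a Pod without init containers, the kubelet sets Initialized to True before sandbox creation and network configuration start.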

Pod scheduling readiness (v1.27) [beta]

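Pod scheduling readiness reached beta in v1.27 (it first appeared as alpha in v1.26). Interested readers can consult the official documentation; we will not go into it here.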

Container Probe

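A probe is a diagnostic that the kubelet runs periodically against a container. It is implemented in one of two ways: either a network check, or a command executed inside the container. A probe is typically used to check whether the service in a container is healthy, or whether it has started successfully and can accept traffic.

A probe has three possible results: Success, Failure, and Unknown.

Three kinds of probes

Depending on their purpose and on when they run, probes are divided into the following three kinds:

livenessProbe: checks whether the container is alive. If the probe fails, the kubelet kills the container, and the container is then restarted according to its restart policy. Use this probe when the process in the container cannot crash on its own after hitting a problem or becoming unhealthy, so that the container is restarted as soon as the service goes wrong.

readinessProbe: checks whether the container is ready, and therefore whether it can accept traffic. If the probe fails, the endpoints controller removes the Pod's IP address from the endpoints of every Service that matches the Pod, so no traffic reaches it. Use this probe when traffic should be sent only after the Pod is confirmed to be working, rather than as soon as it starts. A readinessProbe and a livenessProbe can coexist, but traffic is admitted only after the readinessProbe succeeds.

startupProbe: similar to livenessProbe. It checks whether the service in the container has started. If the probe fails, the kubelet kills the container, and the container is then restarted according to its restart policy. While a startupProbe is configured, the liveness and readiness probes do not run until it succeeds, so it is the usual choice for services that take a long time to start.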
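The different consequences of a failed probe can be summarized as a small lookup. The sketch below is purely illustrative: `ProbeKind` and `onFailure` are names invented for this example, and the returned strings paraphrase the descriptions above rather than anything in the Kubernetes source.

```go
package main

import "fmt"

// ProbeKind names the three probe types (illustrative type, not a Kubernetes API type).
type ProbeKind string

const (
	Liveness  ProbeKind = "livenessProbe"
	Readiness ProbeKind = "readinessProbe"
	Startup   ProbeKind = "startupProbe"
)

// onFailure describes what happens when a probe of the given kind fails.
func onFailure(kind ProbeKind) string {
	switch kind {
	case Liveness, Startup:
		return "kill container, then apply restartPolicy"
	case Readiness:
		return "remove Pod IP from Service endpoints"
	default:
		return "unknown probe kind"
	}
}

func main() {
	for _, k := range []ProbeKind{Liveness, Readiness, Startup} {
		fmt.Printf("%s failed: %s\n", k, onFailure(k))
	}
}
```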

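Four mechanisms

By the way a probe actually performs its check, probes fall into the following four types:

exec: runs a command inside the container. The probe succeeds if the command exits with status code 0.

grpc: enabled by default since v1.24 (beta). It performs a remote procedure call using the gRPC health checking protocol and succeeds when the returned status is SERVING.

httpGet: sends an HTTP GET request to the Pod's IP address on the configured port and path. The probe succeeds if the response status code is at least 200 and less than 400.

tcpSocket: performs a TCP check against the Pod's IP address on the configured port. The probe succeeds if the port is open.

Since there are essentially only two modes, running a command and making a network request, and since the topic above is Pod readiness, the following sections walk through readiness probes of type exec and httpGet. tcpSocket and grpc are used less often; interested readers can try them on their own, as they need only small changes.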
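To make the success rules for httpGet and tcpSocket concrete, here is a minimal Go sketch using only the standard library. The function names (`httpStatusOK`, `httpCheck`, `tcpCheck`, `checkLocalServer`) are invented for this example, the target is a throwaway local test server rather than a Pod, and redirects are not followed for simplicity. It is a sketch of the idea, not the kubelet's prober.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// httpStatusOK applies the httpGet rule: 200 <= code < 400 is a success.
func httpStatusOK(code int) bool {
	return code >= 200 && code < 400
}

// httpCheck sends one GET request and reports whether it counts as a success.
func httpCheck(url string, timeout time.Duration) bool {
	client := &http.Client{
		Timeout: timeout,
		// Simplification: judge the first response instead of following redirects.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return httpStatusOK(resp.StatusCode)
}

// tcpCheck reports whether a TCP connection to addr can be opened within timeout.
func tcpCheck(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// checkLocalServer starts a throwaway local server that always answers with
// the given status code, then runs both checks against it.
func checkLocalServer(status int) (httpOK, tcpOK bool) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()
	httpOK = httpCheck(srv.URL+"/index.html", time.Second)
	tcpOK = tcpCheck(srv.Listener.Addr().String(), time.Second)
	return httpOK, tcpOK
}

func main() {
	for _, status := range []int{200, 404, 500} {
		h, t := checkLocalServer(status)
		fmt.Printf("status=%d httpGet=%v tcpSocket=%v\n", status, h, t)
	}
}
```

The TCP check passes for every status code because it only verifies that the port accepts connections, which is why tcpSocket says nothing about whether the application behind the port is answering correctly.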

httpGet Probe

Below is a very simple Pod that uses httpGet. Note that a probe is defined at the container level, so it belongs under the containers field. [code]
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-demo
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /index.html
        port: 80
      # start probing 5 seconds after the container starts
      initialDelaySeconds: 5
      # probe every 5 seconds
      periodSeconds: 5
[/code]

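We can list the container names with the command kubectl get pods podname -o jsonpath='{.spec.containers[*].name}'.

This Pod has only one container, so its name is obvious. When a Pod has several containers, the command above lists all of them at once.

Then view the logs with the following command:

kubectl logs nginx-pod-demo -c nginx-container

The logs show that the request to http://10.244.0.1:80/index.html returned 200, which lies in the 200 to 400 range. The probe therefore succeeds and the Pod becomes Ready.

The IP address in the URL above is the Pod's IP address, which is also the container's IP address, because the containers in a Pod share its network namespace.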

exec Probe

[code]
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-demo2
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80
    readinessProbe:
      exec:
        command:
        - mkdir
        - -p
        - /test/hello
[/code]

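First create the Pod, then check whether it becomes Ready.

kubectl exec -it nginx-pod-demo2 -c nginx-container -- /bin/sh

Inside the container, the directory has indeed been created, which confirms that the probe command ran.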

Container initialization

ENTRYPOINT and CMD both specify the command a container runs when it starts, but they follow different rules.

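ENTRYPOINT treats the arguments of docker run as its own arguments and runs as the container's main process when the container starts. ENTRYPOINT usually defines the container's default behavior. [code]
# Dockerfile
ENTRYPOINT ["echo", "hello"]

# Appending an argument to docker run, like this:
docker run mycontainer World
# appends it to the ENTRYPOINT, so the container prints: hello World

[/code]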

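CMD can supply default arguments to ENTRYPOINT, or it can be used on its own. CMD can be overridden when the container is run. [code]
# Dockerfile
CMD ["echo", "Hello World"]

# Appending a command to docker run, like this:
docker run mycontainer echo Another World
# replaces the whole CMD, so the container prints: Another World

[/code]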
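The way the final command line is assembled can be sketched as a small function. This is a simplified model assuming exec-form ENTRYPOINT and CMD only (shell form and the --entrypoint flag are ignored), and `finalCommand` is a name invented for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// finalCommand sketches how exec-form ENTRYPOINT and CMD are combined with
// the arguments given after the image name on `docker run`.
func finalCommand(entrypoint, cmd, runArgs []string) []string {
	args := cmd
	if len(runArgs) > 0 {
		args = runArgs // arguments on the command line replace CMD
	}
	out := append([]string{}, entrypoint...)
	return append(out, args...)
}

func main() {
	// ENTRYPOINT ["echo", "hello"], run with the extra argument "World".
	fmt.Println(strings.Join(finalCommand([]string{"echo", "hello"}, nil, []string{"World"}), " "))

	// CMD ["echo", "Hello World"], run without extra arguments: CMD is used as is.
	fmt.Println(strings.Join(finalCommand(nil, []string{"echo", "Hello World"}, nil), " "))

	// CMD ["echo", "Hello World"], run with "echo Bye": CMD is replaced entirely.
	fmt.Println(strings.Join(finalCommand(nil, []string{"echo", "Hello World"}, []string{"echo", "Bye"}), " "))
}
```

In a Kubernetes container spec the same two slots appear as Command (overrides the image's ENTRYPOINT) and Args (overrides the image's CMD).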

Reading the Probe source

[code]
// Container represents a single container that is expected to be run on the host.
type Container struct {
	// Required: This must be a DNS_LABEL.  Each container in a pod must
	// have a unique name.
	Name string
	// Required.
	Image string
	// Optional: The docker image's entrypoint is used if this is not provided; cannot be updated.
	// Variable references $(VAR_NAME) are expanded using the container's environment.  If a variable
	// cannot be resolved, the reference in the input string will be unchanged.  The $(VAR_NAME) syntax
	// can be escaped with a double $$, ie: $$(VAR_NAME).  Escaped references will never be expanded,
	// regardless of whether the variable exists or not.
	// +optional
	Command []string
	// Optional: The docker image's cmd is used if this is not provided; cannot be updated.
	// Variable references $(VAR_NAME) are expanded using the container's environment.  If a variable
	// cannot be resolved, the reference in the input string will be unchanged.  The $(VAR_NAME) syntax
	// can be escaped with a double $$, ie: $$(VAR_NAME).  Escaped references will never be expanded,
	// regardless of whether the variable exists or not.
	// +optional
	Args []string
	// Optional: Defaults to Docker's default.
	// +optional
	WorkingDir string
	// +optional
	Ports []ContainerPort
	// List of sources to populate environment variables in the container.
	// The keys defined within a source must be a C_IDENTIFIER. All invalid keys
	// will be reported as an event when the container is starting. When a key exists in multiple
	// sources, the value associated with the last source will take precedence.
	// Values defined by an Env with a duplicate key will take precedence.
	// Cannot be updated.
	// +optional
	EnvFrom []EnvFromSource
	// +optional
	Env []EnvVar
	// Compute resource requirements.
	// +optional
	Resources ResourceRequirements
	// +optional
	VolumeMounts []VolumeMount
	// volumeDevices is the list of block devices to be used by the container.
	// +optional
	VolumeDevices []VolumeDevice
	// +optional
	LivenessProbe *Probe
	// +optional
	ReadinessProbe *Probe
	// +optional
	StartupProbe *Probe
	// +optional
	Lifecycle *Lifecycle
	// Required.
	// +optional
	TerminationMessagePath string
	// +optional
	TerminationMessagePolicy TerminationMessagePolicy
	// Required: Policy for pulling images for this container
	ImagePullPolicy PullPolicy
	// Optional: SecurityContext defines the security options the container should be run with.
	// If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext.
	// +optional
	SecurityContext *SecurityContext

	// Variables for interactive containers, these have very specialized use-cases (e.g. debugging)
	// and shouldn't be used for general purpose containers.
	// +optional
	Stdin bool
	// +optional
	StdinOnce bool
	// +optional
	TTY bool
}

[/code]

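The code above is the Kubernetes definition of Container. Each Container can define the three probes described earlier, and it also has a Lifecycle field, which relates to container startup and termination; we covered it in detail in an earlier article on lifecycle.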

Now let's look at the definition of Probe: [code]
type Probe struct {
	// The action taken to determine the health of a container
	Handler
	// Length of time before health checking is activated.  In seconds.
	// +optional
	InitialDelaySeconds int32
	// Length of time before health checking times out.  In seconds.
	// +optional
	TimeoutSeconds int32
	// How often (in seconds) to perform the probe.
	// +optional
	PeriodSeconds int32
	// Minimum consecutive successes for the probe to be considered successful after having failed.
	// Must be 1 for liveness and startup.
	// +optional
	SuccessThreshold int32
	// Minimum consecutive failures for the probe to be considered failed after having succeeded.
	// +optional
	FailureThreshold int32
}
[/code]

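Handler: defines how the health check is performed. There are three possible types: httpGet, tcpSocket, and exec. The grpc type became available by default in v1.24; the code shown here is from v1.20, so it contains only these three types.
InitialDelaySeconds: optional. How long to wait after the container starts before the first probe, in seconds (default 0).
TimeoutSeconds: optional. The probe timeout in seconds (default 1); a probe that times out counts as a failure.
PeriodSeconds: optional. How often the probe runs, in seconds (default 10).
SuccessThreshold: optional. The minimum number of consecutive successes required, after a failure, for the probe to be considered successful (default 1). For liveness and startup probes this value must be 1; for readiness probes it can be customized. This is logical: a single success shows that the container is alive, so repeated checks are unnecessary, whereas before admitting traffic it is safer to require several successes in a row.
FailureThreshold: optional. The minimum number of consecutive failures required, after a success, for the probe to be considered failed (default 3). Its purpose is likewise to tolerate transient errors.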
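How the two thresholds smooth out individual probe results can be shown with a small state machine. This is a hypothetical sketch: `thresholdTracker` and `observe` are names made up for this example, and the logic is a simplified model of consecutive-result counting rather than the kubelet's worker code.

```go
package main

import "fmt"

// thresholdTracker is a hypothetical helper that turns raw probe results into
// a stable ready/not-ready state using the two thresholds.
type thresholdTracker struct {
	SuccessThreshold int32
	FailureThreshold int32
	ready            bool  // current reported state, starts as not ready
	lastOK           bool  // result of the previous probe
	streak           int32 // number of consecutive identical results
}

// observe records one probe result and returns the resulting state.
func (t *thresholdTracker) observe(ok bool) bool {
	if t.streak > 0 && ok == t.lastOK {
		t.streak++
	} else {
		t.lastOK, t.streak = ok, 1
	}
	if ok && t.streak >= t.SuccessThreshold {
		t.ready = true
	}
	if !ok && t.streak >= t.FailureThreshold {
		t.ready = false
	}
	return t.ready
}

func main() {
	t := &thresholdTracker{SuccessThreshold: 2, FailureThreshold: 3}
	for _, ok := range []bool{true, true, false, false, true, false, false, false} {
		fmt.Printf("probe ok=%-5v ready=%v\n", ok, t.observe(ok))
	}
}
```

With SuccessThreshold 2 and FailureThreshold 3, the state flips to ready only on the second consecutive success, survives two isolated failures, and flips back only on the third consecutive failure.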

Next, let's see what Handler itself contains: [code]
type Handler struct {
	// One and only one of the following should be specified.
	// Exec specifies the action to take.
	// +optional
	Exec *ExecAction
	// HTTPGet specifies the http request to perform.
	// +optional
	HTTPGet *HTTPGetAction
	// TCPSocket specifies an action involving a TCP port.
	// TODO: implement a realistic TCP lifecycle hook
	// +optional
	TCPSocket *TCPSocketAction
}

type ExecAction struct {
	// Command is the command line to execute inside the container, the working directory for the
	// command  is root ('/') in the container's filesystem.  The command is simply exec'd, it is
	// not run inside a shell, so traditional shell instructions ('|', etc) won't work.  To use
	// a shell, you need to explicitly call out to that shell.
	// +optional
	Command []string
}


type HTTPGetAction struct {
	// Optional: Path to access on the HTTP server.
	// +optional
	Path string
	// Required: Name or number of the port to access on the container.
	// +optional
	Port intstr.IntOrString
	// Optional: Host name to connect to, defaults to the pod IP. You
	// probably want to set "Host" in httpHeaders instead.
	// +optional
	Host string
	// Optional: Scheme to use for connecting to the host, defaults to HTTP.
	// +optional
	Scheme URIScheme
	// Optional: Custom headers to set in the request. HTTP allows repeated headers.
	// +optional
	HTTPHeaders []HTTPHeader
}

type TCPSocketAction struct {
	// Required: Port to connect to.
	// +optional
	Port intstr.IntOrString
	// Optional: Host name to connect to, defaults to the pod IP.
	// +optional
	Host string
}

[/code]

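These probe action types are wrapped by the Handler struct. Handler is a generic struct that wraps the three probing mechanisms.

Each type is explained below:

ExecAction:

* `ExecAction` defines a probe that runs a command. The command runs inside the container and is given in the `Command` field. It is executed directly in the container, not through a `shell`.

HTTPGetAction:

* `HTTPGetAction` defines a probe that sends an `HTTP GET` request. You can specify the path (`Path`), the port (`Port`), the host name (`Host`), and the scheme (`Scheme`). Custom HTTP headers (`HTTPHeaders`) can also be added.

TCPSocketAction:

* `TCPSocketAction` defines a probe that opens a `TCP Socket` connection. The port to connect to (`Port`) is required; the host name (`Host`) is optional and defaults to the Pod IP.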
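The comment "One and only one of the following should be specified" in Handler can be illustrated with a small check. The sketch below uses trimmed-down copies of the structs (the port is a plain int here instead of intstr.IntOrString, which is an assumption made to keep the example self-contained), and `describe` is a hypothetical helper, not a function from the Kubernetes code base.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed-down copies of the structs above; only what the check needs.
type ExecAction struct{ Command []string }

type HTTPGetAction struct {
	Path string
	Port int // simplified: the real field is intstr.IntOrString
}

type TCPSocketAction struct{ Port int }

type Handler struct {
	Exec      *ExecAction
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
}

// describe returns a short description of the configured action, or an error
// unless exactly one action is set.
func describe(h Handler) (string, error) {
	var found []string
	if h.Exec != nil {
		found = append(found, fmt.Sprintf("exec %v", h.Exec.Command))
	}
	if h.HTTPGet != nil {
		found = append(found, fmt.Sprintf("httpGet port %d path %s", h.HTTPGet.Port, h.HTTPGet.Path))
	}
	if h.TCPSocket != nil {
		found = append(found, fmt.Sprintf("tcpSocket port %d", h.TCPSocket.Port))
	}
	if len(found) != 1 {
		return "", errors.New("exactly one of Exec, HTTPGet, TCPSocket must be set")
	}
	return found[0], nil
}

func main() {
	desc, err := describe(Handler{HTTPGet: &HTTPGetAction{Path: "/index.html", Port: 80}})
	fmt.Println(desc, err)

	_, err = describe(Handler{})
	fmt.Println(err)
}
```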

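Conclusion

This post introduced the concepts behind Pod readiness and container probes, moving step by step from what probes are for to how they are used and how they are implemented.

This concludes the eighth article in the series "Ten Minutes a Day: An Easy Introduction to K8S", "08. Source-Level Pod Deep Dive (Part 4): Pod Readiness and Container Probes". The next few articles will stay with Pods and keep exploring this core Kubernetes concept at the source level. If you are interested, likes, comments, bookmarks, and subscriptions are welcome; your support is my greatest motivation.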

https://juejin.cn/post/7295565904406511657

https://juejin.cn/post/7292323577210404915