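Before Kubernetes 1.8, the best way to collect monitoring data about containers, pods, and Kubernetes events in a k8s cluster was the officially provided Heapster. From 1.8 onward, however, the community moved away from Heapster (it was formally deprecated in Kubernetes 1.11) in favor of a newly incubated project, metrics-server. This post covers the basic functionality of metrics-server and how the community positions the service.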

metrics-server overview

In short, here is the official description:

Metrics Server is a cluster-wide aggregator of resource usage data. Starting from Kubernetes 1.8 it’s deployed by default in clusters created by kube-up.sh script as a Deployment object. If you use a different Kubernetes setup mechanism you can deploy it using the provided deployment yamls. It’s supported in Kubernetes 1.7+ (see details below).

Metric server collects metrics from the Summary API, exposed by Kubelet on each node.

Metrics Server registered in the main API server through Kubernetes aggregator, which was introduced in Kubernetes 1.7.
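Because of this aggregator registration, the metrics appear as an ordinary API group, metrics.k8s.io, on the main API server. For example, kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes returns a NodeMetricsList. The Go sketch below decodes such a response using only the standard library. The struct is a hand-written subset of the real object (not the k8s.io/metrics types), and the sample body is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeMetrics is a hand-written subset of a metrics.k8s.io/v1beta1 NodeMetrics
// object; resource quantities are kept as raw strings such as "250m".
type nodeMetrics struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Timestamp string            `json:"timestamp"`
	Window    string            `json:"window"`
	Usage     map[string]string `json:"usage"`
}

type nodeMetricsList struct {
	Kind       string        `json:"kind"`
	APIVersion string        `json:"apiVersion"`
	Items      []nodeMetrics `json:"items"`
}

// parseNodeMetrics decodes the body of GET /apis/metrics.k8s.io/v1beta1/nodes.
func parseNodeMetrics(body []byte) (nodeMetricsList, error) {
	var list nodeMetricsList
	err := json.Unmarshal(body, &list)
	return list, err
}

func main() {
	// Hypothetical, hand-shortened response body.
	body := []byte(`{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {"metadata": {"name": "node-1"}, "timestamp": "2018-06-01T08:00:00Z",
     "window": "30s", "usage": {"cpu": "250m", "memory": "1024Mi"}}
  ]
}`)
	list, err := parseNodeMetrics(body)
	if err != nil {
		panic(err)
	}
	for _, n := range list.Items {
		// Prints: node-1 cpu=250m memory=1024Mi
		fmt.Printf("%s cpu=%s memory=%s\n", n.Metadata.Name, n.Usage["cpu"], n.Usage["memory"])
	}
}
```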

metrics-server implementation

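The Metrics Server implementation borrows heavily from kube-apiserver and Heapster: it reuses kube-apiserver's server startup framework and Heapster's code for fetching CPU and memory usage from each node. The implementation works as follows.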

The service is driven by a Manager struct:

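type Manager struct {
	source     sources.MetricSource // collects node and pod metrics
	sink       sink.MetricSink      // stores the most recently collected metrics
	resolution time.Duration        // interval between collections

	healthMu      sync.RWMutex // guards the two health fields below
	lastTickStart time.Time    // when the most recent collection started
	lastOk        bool         // whether the most recent collection succeeded
}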

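The source in the Manager struct collects CPU and memory metrics for every node in the cluster and for the pods running on it. It does this by calling the kubelet Summary API, e.g. http://localhost:8001/api/v1/proxy/nodes/<node-name>:10255/stats/summary (this form goes through kubectl proxy on its default port 8001 to the kubelet's read-only port 10255), and organizes the results into the following data structures: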

// MetricsBatch is a single batch of pod, container, and node metrics from some source.
type MetricsBatch struct {
	Nodes []NodeMetricsPoint
	Pods  []PodMetricsPoint
}

// NodeMetricsPoint contains the metrics for some node at some point in time.
type NodeMetricsPoint struct {
	Name string
	MetricsPoint
}

// PodMetricsPoint contains the metrics for some pod's containers.
type PodMetricsPoint struct {
	Name      string
	Namespace string

	Containers []ContainerMetricsPoint
}

// ContainerMetricsPoint contains the metrics for some container at some point in time.
type ContainerMetricsPoint struct {
	Name string
	MetricsPoint
}

// MetricsPoint represents a set of specific metrics at some point in time.
type MetricsPoint struct {
	Timestamp time.Time
	// CpuUsage is the CPU usage rate, in cores
	CpuUsage resource.Quantity
	// MemoryUsage is the working set size, in bytes.
	MemoryUsage resource.Quantity
}
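To show how a Summary API response could map onto these structures, here is a standard-library-only sketch. It decodes a subset of the kubelet's stats/summary JSON (node.cpu.usageNanoCores, node.memory.workingSetBytes, and the same fields per container) into a simplified MetricsBatch in which plain integers stand in for resource.Quantity. Skipping containers with incomplete stats is this sketch's choice, not necessarily what metrics-server does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Subset of the kubelet stats/summary response used by this sketch.
type cpuStats struct {
	Time           time.Time `json:"time"`
	UsageNanoCores *uint64   `json:"usageNanoCores"`
}
type memoryStats struct {
	WorkingSetBytes *uint64 `json:"workingSetBytes"`
}
type summary struct {
	Node struct {
		NodeName string       `json:"nodeName"`
		CPU      *cpuStats    `json:"cpu"`
		Memory   *memoryStats `json:"memory"`
	} `json:"node"`
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		Containers []struct {
			Name   string       `json:"name"`
			CPU    *cpuStats    `json:"cpu"`
			Memory *memoryStats `json:"memory"`
		} `json:"containers"`
	} `json:"pods"`
}

// Simplified MetricsBatch: plain integers stand in for resource.Quantity.
type MetricsPoint struct {
	Timestamp    time.Time // taken from the CPU stats
	CPUNanoCores uint64    // CPU usage rate in nanocores
	MemoryBytes  uint64    // working set size in bytes
}
type NodeMetricsPoint struct {
	Name string
	MetricsPoint
}
type ContainerMetricsPoint struct {
	Name string
	MetricsPoint
}
type PodMetricsPoint struct {
	Name, Namespace string
	Containers      []ContainerMetricsPoint
}
type MetricsBatch struct {
	Nodes []NodeMetricsPoint
	Pods  []PodMetricsPoint
}

// toPoint returns false when CPU or memory stats are missing.
func toPoint(c *cpuStats, m *memoryStats) (MetricsPoint, bool) {
	if c == nil || m == nil || c.UsageNanoCores == nil || m.WorkingSetBytes == nil {
		return MetricsPoint{}, false
	}
	return MetricsPoint{Timestamp: c.Time, CPUNanoCores: *c.UsageNanoCores, MemoryBytes: *m.WorkingSetBytes}, true
}

// decodeSummary turns one node's summary JSON into a MetricsBatch,
// skipping the node or any container whose stats are incomplete.
func decodeSummary(body []byte) (MetricsBatch, error) {
	var s summary
	if err := json.Unmarshal(body, &s); err != nil {
		return MetricsBatch{}, err
	}
	var batch MetricsBatch
	if p, ok := toPoint(s.Node.CPU, s.Node.Memory); ok {
		batch.Nodes = append(batch.Nodes, NodeMetricsPoint{Name: s.Node.NodeName, MetricsPoint: p})
	}
	for _, pod := range s.Pods {
		pm := PodMetricsPoint{Name: pod.PodRef.Name, Namespace: pod.PodRef.Namespace}
		for _, c := range pod.Containers {
			if p, ok := toPoint(c.CPU, c.Memory); ok {
				pm.Containers = append(pm.Containers, ContainerMetricsPoint{Name: c.Name, MetricsPoint: p})
			}
		}
		batch.Pods = append(batch.Pods, pm)
	}
	return batch, nil
}

func main() {
	// Hypothetical, heavily shortened stats/summary body.
	body := []byte(`{"node": {"nodeName": "node-1",
  "cpu": {"time": "2018-06-01T08:00:00Z", "usageNanoCores": 250000000},
  "memory": {"workingSetBytes": 1073741824}},
 "pods": [{"podRef": {"name": "web-1", "namespace": "default"},
  "containers": [{"name": "app", "cpu": {"usageNanoCores": 5000000},
   "memory": {"workingSetBytes": 52428800}}]}]}`)
	batch, err := decodeSummary(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", batch)
}
```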

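The sink in the Manager struct keeps the data collected by the source in memory. Clients then read it back through the metrics.k8s.io API that metrics-server registers with the aggregator, for example /apis/metrics.k8s.io/v1beta1/nodes and /apis/metrics.k8s.io/v1beta1/pods.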
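A minimal sketch of such an in-memory sink, assuming simplified batch types (Point, PodPoint, and Batch are hypothetical): each Receive swaps in the newest batch wholesale and lookups take a read lock, so only the latest scrape is ever kept and nothing is persisted.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical, trimmed-down batch types.
type Point struct {
	Timestamp   time.Time
	CPUMilli    int64
	MemoryBytes int64
}

type PodPoint struct {
	Namespace, Name string
	Containers      map[string]Point // keyed by container name
}

type Batch struct {
	Nodes map[string]Point // keyed by node name
	Pods  []PodPoint
}

// memorySink holds only the most recent batch.
type memorySink struct {
	mu    sync.RWMutex
	nodes map[string]Point
	pods  map[string]PodPoint // keyed by "namespace/name"
}

// Receive replaces all stored data with the new batch (it keeps b.Nodes as-is).
func (s *memorySink) Receive(b Batch) {
	pods := make(map[string]PodPoint, len(b.Pods))
	for _, p := range b.Pods {
		pods[p.Namespace+"/"+p.Name] = p
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes = b.Nodes
	s.pods = pods
}

// NodeMetrics returns the latest point for a node, if any.
func (s *memorySink) NodeMetrics(name string) (Point, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	p, ok := s.nodes[name]
	return p, ok
}

// PodMetrics returns the latest container points for a pod, if any.
func (s *memorySink) PodMetrics(namespace, name string) (PodPoint, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	p, ok := s.pods[namespace+"/"+name]
	return p, ok
}

func main() {
	sink := &memorySink{}
	sink.Receive(Batch{
		Nodes: map[string]Point{"node-1": {Timestamp: time.Now(), CPUMilli: 250, MemoryBytes: 1 << 30}},
		Pods: []PodPoint{{Namespace: "default", Name: "web-1",
			Containers: map[string]Point{"app": {CPUMilli: 5, MemoryBytes: 50 << 20}}}},
	})
	if p, ok := sink.NodeMetrics("node-1"); ok {
		fmt.Printf("node-1: %dm CPU, %d bytes\n", p.CPUMilli, p.MemoryBytes)
	}
	if p, ok := sink.PodMetrics("default", "web-1"); ok {
		fmt.Println("web-1 containers:", len(p.Containers))
	}
}
```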

metrics-server deployment

Deploy it with the official manifests (the paths below are relative to the metrics-server repository):

# Kubernetes 1.7
$ kubectl create -f deploy/1.7/

# Kubernetes >= 1.8
$ kubectl create -f deploy/1.8+/

After deployment, however, you may run into the following error:

Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "system:anonymous" cannot list nodes.metrics.k8s.io at the cluster scope.

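This means the system:anonymous user is not allowed to read the metrics API. Grant it read access with a ClusterRole and a ClusterRoleBinding: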

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: view-metrics
rules:
  - apiGroups:
      - metrics.k8s.io
    resources:
      - pods
      - nodes
    verbs:
      - get
      - list
      - watch

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: view-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view-metrics
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:anonymous
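Why this ClusterRole resolves the error can be seen with a simplified model of RBAC rule matching: RBAC is purely additive, and a request is allowed when some rule lists its API group, resource, and verb (or "*"). This sketch ignores subjects, bindings, subresources, resourceNames, and non-resource URLs, all of which the real authorizer also considers. PolicyRule here is a hand-written stand-in for the rbac/v1 type.

```go
package main

import "fmt"

// PolicyRule mirrors the fields of the rule in the view-metrics ClusterRole.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in the API group.
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	viewMetrics := []PolicyRule{{
		APIGroups: []string{"metrics.k8s.io"},
		Resources: []string{"pods", "nodes"},
		Verbs:     []string{"get", "list", "watch"},
	}}
	// "list nodes.metrics.k8s.io" from the error message is now allowed.
	fmt.Println(allows(viewMetrics, "metrics.k8s.io", "nodes", "list")) // true
	// Core-group nodes ("" API group) are not covered by this rule.
	fmt.Println(allows(viewMetrics, "", "nodes", "list")) // false
}
```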

Finally, check that the service works with kubectl top (for example kubectl top node and kubectl top pod).
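kubectl top reads the same metrics.k8s.io API and prints CPU in millicores and memory in mebibytes; for nodes it also shows percentages of the node's allocatable resources. The sketch below shows that unit conversion from raw nanocore and byte values. It truncates, and kubectl's exact rounding may differ; the node numbers in main are made up.

```go
package main

import "fmt"

// formatCPU converts a usage value in nanocores to millicores, e.g. 250000000 -> "250m".
func formatCPU(nanoCores int64) string {
	return fmt.Sprintf("%dm", nanoCores/1000000)
}

// formatMemory converts bytes to mebibytes, e.g. 1073741824 -> "1024Mi".
func formatMemory(bytes int64) string {
	return fmt.Sprintf("%dMi", bytes/(1024*1024))
}

// percent returns usage as an integer percentage of allocatable (0 if allocatable is 0).
func percent(usage, allocatable int64) int64 {
	if allocatable == 0 {
		return 0
	}
	return usage * 100 / allocatable
}

func main() {
	// Hypothetical node: 250m CPU used of 4 allocatable cores, 1Gi used of 8Gi.
	cpuNano, memBytes := int64(250000000), int64(1<<30)
	allocMilli, allocBytes := int64(4000), int64(8<<30)
	fmt.Printf("%s %d%% %s %d%%\n",
		formatCPU(cpuNano), percent(cpuNano/1000000, allocMilli),
		formatMemory(memBytes), percent(memBytes, allocBytes))
	// Output: 250m 6% 1024Mi 12%
}
```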

Additional notes

1. With the core system metrics provided by metrics-server and custom metrics exposed from Prometheus (through an adapter such as k8s-prometheus-adapter), it is straightforward to implement HPA (Horizontal Pod Autoscaler).
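The HPA controller scales on the ratio between the observed and target metric values, as documented for Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance (0.1 by default). A minimal Go sketch of that rule follows. For CPU, the metric is the pods' average utilization as a percentage of their requests, read from metrics-server. Min/max replica bounds and the controller's stabilization logic are omitted.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies desired = ceil(current * currentUtil / targetUtil),
// keeping the current count when the ratio is within tolerance of 1.0.
func desiredReplicas(current int, currentUtil, targetUtil, tolerance float64) int {
	ratio := currentUtil / targetUtil
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	return int(math.Ceil(float64(current) * ratio))
}

func main() {
	// 3 pods averaging 90% of requested CPU, target 60%: ratio 1.5, ceil(4.5) = 5.
	fmt.Println(desiredReplicas(3, 90, 60, 0.1)) // 5
	// 4 pods at 63% vs a 60% target: ratio 1.05 is within the 0.1 tolerance.
	fmt.Println(desiredReplicas(4, 63, 60, 0.1)) // 4
}
```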

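2. Besides basic metrics such as CPU and memory, Heapster could also collect Kubernetes events. For that, the metrics-server project currently recommends eventrouter, but eventrouter only supports sinks such as s3, kafka, and stdout, so I added support for an influxdb sink.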