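Result
Deploy one SkyWalking instance on k8s, shared across multiple namespaces, with services attached in sidecar mode. This gives full-link tracing with little effort.
Pull the image into the internal network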
docker pull apache/skywalking-oap-server:9.5.0
# Tag and push
docker tag bb81e785d6b7 registry.cn-hangzhou.aliyuncs.com/earic/skywalking-oap-server:9.5.0
docker push registry.cn-hangzhou.aliyuncs.com/earic/skywalking-oap-server:9.5.0
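The tag command above refers to the image by its local ID (bb81e785d6b7), which differs from machine to machine. A minimal sketch of the same pull, tag, and push steps keyed on the image name instead; the function names `mirror_ref` and `mirror_image` are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: mirror an upstream image into a private registry by name.
# mirror_ref and mirror_image are hypothetical helper names.

# Build the destination reference: registry prefix + last path component of the source.
mirror_ref() {
  local src="$1" registry="$2"
  printf '%s/%s\n' "$registry" "${src##*/}"
}

# Pull the source image, retag it for the private registry, and push it.
mirror_image() {
  local src="$1" registry="$2" dst
  dst="$(mirror_ref "$src" "$registry")"
  docker pull "$src"
  docker tag "$src" "$dst"
  docker push "$dst"
}
```

Usage: `mirror_image apache/skywalking-oap-server:9.5.0 registry.cn-hangzhou.aliyuncs.com/earic`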
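Deploying under a custom path
Starting with 9.0, SkyWalking rebuilt its front-end UI as "Booster UI"; the earlier "Rocketbot UI" is deprecated.
You can download the released source directly, e.g. 9.5.0.
Code changes
- vite.config.ts
In vite.config.ts, add: base: "./", // similar to publicPath; './' avoids a blank page after the build is deployed. Without it the production site will not load either.
- src/router/index.ts
In src/router/index.ts, change both occurrences of createWebHistory to createWebHashHistory.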
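The router change is a plain rename, so it can be scripted. A minimal sketch, assuming it is run from the root of the unpacked skywalking-booster-ui source; `use_hash_history` is a made-up helper name, and the `base: "./"` line in vite.config.ts is still easier to add by hand:

```shell
#!/usr/bin/env bash
# Sketch: switch the router from HTML5 history mode to hash mode.
# use_hash_history is a hypothetical helper name.

use_hash_history() {
  local file="$1"
  # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
  # Re-running is harmless: "createWebHashHistory" does not contain "createWebHistory".
  sed -i.bak 's/createWebHistory/createWebHashHistory/g' "$file"
}
```

Usage: `use_hash_history src/router/index.ts`, then check the result with `grep -n createWebHashHistory src/router/index.ts`.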
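Build the static files
Installing cypress
Fix for the install hanging at "Unzipping Cypress 0% 0s": this depends on the Node.js version. The version downloaded here is
v18.17.0
https://nodejs.org/download/release/latest-v18.x/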
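Because the cypress unzip problem depends on the Node.js version, it can help to check the version before installing dependencies. A minimal sketch; the helper names are made up, and treating "major version 18" as the requirement is an assumption based only on v18.17.0 having worked here:

```shell
#!/usr/bin/env bash
# Sketch: verify the Node.js major version before running npm.
# node_major and check_node are hypothetical helper names.

# Extract the major version from a string such as "v18.17.0".
node_major() {
  local v="${1#v}"
  printf '%s\n' "${v%%.*}"
}

# Fail with a message when the installed Node.js major version differs from the wanted one.
check_node() {
  local want="$1" have
  have="$(node_major "$(node --version)")"
  if [ "$have" != "$want" ]; then
    echo "Node.js major version is $have, expected $want" >&2
    return 1
  fi
}
```

Usage: `check_node 18 && npm i`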
Run the commands
npm i
npm run build-only
Build the image
mkdir -p /mnt/d/publish/skw && cd /mnt/d/publish/skw
# Copy the dist directory built above into the current directory
cp -r /mnt/d/tmp/skywalking-booster-ui-9.5.0/dist/ .
cat <<EOF > passwd
cyk:\$apr1\$anOYsKSJ\$P2RT/hf0OHzuEyWciCsdZ1
EOF
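The passwd file holds one `user:hash` line in the apr1 (Apache MD5) format that nginx basic auth accepts. A minimal sketch of generating such a line with `openssl passwd -apr1`; `htpasswd_line` is a made-up helper name and the password shown in the usage line is a placeholder:

```shell
#!/usr/bin/env bash
# Sketch: produce an htpasswd-style line for nginx auth_basic_user_file.
# htpasswd_line is a hypothetical helper name.

htpasswd_line() {
  local user="$1" password="$2"
  # openssl picks a random salt, so the hash differs on every run.
  printf '%s:%s\n' "$user" "$(openssl passwd -apr1 "$password")"
}
```

Usage: `htpasswd_line cyk 'example-password' > passwd`. Writing the file this way avoids the `\$` escaping that the unquoted heredoc needs. The password is visible on the command line, so this is better suited to a private build machine.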
cat <<EOF > web.conf
server {
listen 80;
server_name _;
error_log /usr/local/openresty/nginx/logs/skw_error.log crit;
access_log /usr/local/openresty/nginx/logs/skw_access.log;
#Add the following two lines
auth_basic "Please input password"; #Prompt shown during authentication
auth_basic_user_file /usr/local/src/nginx/passwd;
index index.html index.htm;
location / {
alias /usr/local/openresty/nginx/html/skw/;
proxy_set_header X-Real-IP \$remote_addr;
proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
index index.html index.htm;
try_files \$uri \$uri/ /index.html;
}
# Disable directory listing but allow access to files
location /css/{
root /usr/local/openresty/nginx/html/;
autoindex off;
proxy_store on;
}
location /img/{
root /usr/local/openresty/nginx/html/;
autoindex off;
proxy_store on;
}
location /js/{
root /usr/local/openresty/nginx/html/;
autoindex off;
proxy_store on;
}
error_page 500 502 503 504 /50x.html;
}
EOF
dos2unix web.conf
cat <<EOF > Dockerfile
FROM registry.cn-hangzhou.aliyuncs.com/earic/openresty:1.21.4.1-alpine
LABEL maintainer="wwj"
COPY dist/ /usr/local/openresty/nginx/html/skw
COPY passwd /usr/local/src/nginx/passwd
#COPY web.conf /usr/local/openresty/nginx/conf/nginx.conf
COPY web.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
EOF
## Fixes the error: mediaType in manifest should be ‘application/vnd.docker.distribution.man
export BUILDAH_FORMAT=docker
podman build --no-cache -t registry.cn-hangzhou.aliyuncs.com/earic/skw:$(date +"%Y-%m-%d_%H-%M-%S") .
podman push registry.cn-hangzhou.aliyuncs.com/earic/skw:2023-08-02_10-18-55
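The build command generates the tag from the current time, while the push command has that tag typed in by hand. A minimal sketch that computes the tag once and reuses it for both steps; `make_tag` and `build_and_push` are made-up helper names:

```shell
#!/usr/bin/env bash
# Sketch: build and push with one shared timestamp tag.
# make_tag and build_and_push are hypothetical helper names.

IMAGE_REPO="registry.cn-hangzhou.aliyuncs.com/earic/skw"

# Timestamp tag in the same format as the build command above.
make_tag() {
  date +"%Y-%m-%d_%H-%M-%S"
}

build_and_push() {
  local tag image
  tag="$(make_tag)"
  image="${IMAGE_REPO}:${tag}"
  export BUILDAH_FORMAT=docker
  podman build --no-cache -t "$image" .
  podman push "$image"
  echo "pushed $image"
}
```

Usage: run `build_and_push` from the directory that holds the Dockerfile.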
Deploying with Rancher
ConfigMaps
- skw-alarm
apiVersion: v1
kind: ConfigMap
metadata:
  name: skw-alarm
  annotations:
    {}
    # key: string
  labels:
    {}
    # key: string
  namespace: cyk-uat
data:
  alarm-settings.yml: |-
    rules:
      # Rule unique name, must be ended with `_rule`.
      service_resp_time_rule:
        metrics-name: service_resp_time
        op: ">"
        threshold: 1000
        period: 10
        count: 3
        silence-period: 5
        message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
      service_sla_rule:
        # Metrics value need to be long, double or int
        metrics-name: service_sla
        op: "<"
        threshold: 8000
        # The length of time to evaluate the metrics
        period: 10
        # How many times after the metrics match the condition, will trigger alarm
        count: 2
        # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
        silence-period: 3
        message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
      service_resp_time_percentile_rule:
        # Metrics value need to be long, double or int
        metrics-name: service_percentile
        op: ">"
        threshold: 1000,1000,1000,1000,1000
        period: 10
        count: 3
        silence-period: 5
        message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
      service_instance_resp_time_rule:
        metrics-name: service_instance_resp_time
        op: ">"
        threshold: 1000
        period: 10
        count: 2
        silence-period: 5
        message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
      database_access_resp_time_rule:
        metrics-name: database_access_resp_time
        threshold: 1000
        op: ">"
        period: 10
        count: 2
        message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
      endpoint_relation_resp_time_rule:
        metrics-name: endpoint_relation_resp_time
        threshold: 1000
        op: ">"
        period: 10
        count: 2
        message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
    dingtalkHooks:
      textTemplate: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
        - url: https://oapi.dingtalk.com/robot/send?access_token=5258e3472e467162a944d92d306423c5a78ff10b0c03b0b092465fdde47ad119
__clone: true
- skw-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: skw-config
  annotations: {}
  labels: {}
  namespace: cyk-uat
data:
  SW_STORAGE: elasticsearch
  SW_NAMESPACE: skywalking-index
  SW_STORAGE_ES_CLUSTER_NODES: 10.128.159.50:9200
  SW_ES_USER: elastic
  SW_STORAGE_DAY_STEP: '1'
  SW_STORAGE_ES_INDEX_REPLICAS_NUMBER: '0'
  SW_ES_PASSWORD: 5NXWhVgg3ulraED1TnXu
  SW_HEALTH_CHECKER: default
  SW_TELEMETRY: none
  SW_TELEMETRY_PROMETHEUS_HOST: 0.0.0.0
  SW_TELEMETRY_PROMETHEUS_PORT: '1234'
  SW_PROMETHEUS_FETCHER_ACTIVE: 'true'
  TZ: Asia/Shanghai
  JAVA_OPTS: '-Xms1g -Xmx1g -Duser.timezone'
__clone: true
After the upgrade, the SW_STORAGE value elasticsearch7 must be changed to elasticsearch; otherwise the OAP server fails to start.
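The rules in the skw-alarm ConfigMap share one pattern: look at the last `period` minutes and fire when at least `count` of those per-minute values cross `threshold`. A simplified model of that check for the `op: ">"` case, written only to illustrate the rule fields; `should_alarm` is a made-up name and this is not the OAP server's implementation:

```shell
#!/usr/bin/env bash
# Sketch: simplified model of an alarm rule with op ">".
# should_alarm is a hypothetical helper name.
# Arguments: THRESHOLD COUNT followed by the per-minute values in the period window.
# Returns success (0) when at least COUNT values exceed THRESHOLD.

should_alarm() {
  local threshold="$1" count="$2" hits=0 v
  shift 2
  for v in "$@"; do
    if [ "$v" -gt "$threshold" ]; then
      hits=$((hits + 1))
    fi
  done
  [ "$hits" -ge "$count" ]
}
```

Usage: `should_alarm 1000 3 900 1200 1500 800 1100 700 600 500 400 300 && echo alarm` models service_resp_time_rule over a 10-minute window. For service_sla_rule the comparison is reversed (`op: "<"`), and the threshold 8000 corresponds to the 80% named in its message.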
Converting a CRT certificate to JKS
Skip this if your Elasticsearch does not use HTTPS.
# Convert crt to p12
openssl pkcs12 -export -in ca.crt -inkey ca.key -out keystore.p12 -name "alias"
# p12 to jks
keytool -importkeystore -srckeystore keystore.p12 -srcstoretype PKCS12 -destkeystore keystore.jks -deststoretype JKS
#jks to p12
keytool -importkeystore -srckeystore keystore.jks -srcstoretype JKS -deststoretype PKCS12 -destkeystore keystore.p12
If keytool is not found
sudo yum install java-1.8.0-openjdk-devel
Importing a PEM certificate into a keystore
keytool -import -v -trustcacerts -file ca.pem -keystore es_keystore.jks -keypass changeit -storepass changeit
Specifying the certificate for HTTPS
SW_STORAGE_ES_SSL_JKS_PATH=/nfs/keystore.jks
In practice I did not use an HTTPS Elasticsearch; I set up a separate HTTP Elasticsearch instead.