| Inspection item ID | Inspection item description | Inspection item group | Category | Business tree path |
|---|---|---|---|---|
| check_ntp_hwclock | Check whether the hwclock hardware clock offset exceeds 10 seconds | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/Zookeeper/instance TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/TDSQL/Instance TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| drms_tce-tcenter-hdfs_check_enable_topology_4f970b7a | Check whether rack awareness is enabled | Disaster recovery check | Disaster recovery check | TCENTER/HDFS/namenode |
| drms_tce-tcenter-k8s_check_apiserver_39c40585 | Check the rdata configuration of the apiserver | Disaster recovery check (single node) | Disaster recovery check | TCENTER/K8S/Master |
| drms_tce-tcenter-k8s_check_cluster_global_dnsnameservers_178906da | Check dnsNameservers in the cluster global configuration | Disaster recovery check (single node) | Disaster recovery check | TCENTER/K8S/Master |
| drms_tce-tcenter-k8s_check_coredns_c728ad1d | Check the health check configuration of coredns | Disaster recovery check (single node) | Disaster recovery check | TCENTER/K8S/Master |
| drms_tce-tcenter-k8s_check_kube_dns_d5dbbfbf | Check kube DNS | Disaster recovery check (single node) | Disaster recovery check | TCENTER/K8S/Master |
| drms_tce-tcenter-k8s_check_registry_1bd407c1 | Check tcs-internal-registry | Disaster recovery check (single node) | Disaster recovery check | TCENTER/K8S/Master |
| system_cpu_usage | Check whether CPU usage is above 90% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_disk_mount | Check that disks are mounted correctly | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_io_usage | Check that I/O usage is below 90% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_load_average | Check whether the 1-minute, 5-minute, and 15-minute system load averages are below the number of CPU cores | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_mem_usage | Check that memory usage is below 80% | System parameter check | Routine O&M, Deep inspection | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_partition_usage | Check that usage of every mounted partition is below 80% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| system_sync_data | Check that the time difference between the host and the inspection server is less than 1 second | System parameter check | Routine O&M, Deep inspection | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
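| tx_barad_es_ping | Check that Elasticsearch node ping latency is below 500 ms | Dependency service check | Routine O&M | TCENTER/Elasticsearch/etcd |
| tx_barad_zk_config_state | Check the Zookeeper cluster node configuration | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_barad_zk_node_count | Check that the number of kafka directories in the Zookeeper cluster is 3 | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_barad_zk_node_exist | Check that the storm110, kafka, and kfkSpout directories exist under / in the Zookeeper cluster | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_barad_zk_partition_usage | Check whether disk usage of / and /data on the Zookeeper cluster is below the threshold | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_barad_zk_snapshot_state | Check the Zookeeper cluster snapshot size | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_barad_zk_state | Check the Zookeeper cluster node status | Dependency service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_check_mysql_status_cam_auth | TCenter cam auth component database check | Dirty data check | Routine O&M | TCENTER/k8s-master-ip |
| tx_check_mysql_status_cam_grant | TCenter cam grant component database check | Dirty data check | Routine O&M | TCENTER/k8s-master-ip |
| tx_check_mysql_status_cam_list | TCenter cam list component database check | Dirty data check | Routine O&M | TCENTER/k8s-master-ip |
| tx_check_mysql_status_cam_sts | TCenter cam sts component database check | Dirty data check | Routine O&M | TCENTER/k8s-master-ip |
| tx_check_mysql_status_platform_account | TCenter platform account component database check | Dirty data check | Routine O&M | TCENTER/k8s-master-ip |
| tx_get_svc-ignore-sync-ip | Check whether infra.tce.io and ignore-sync-ip are configured in the svc | Security check | Routine O&M | TCENTER/K8S/Master |
| tx_kafka_check_replicationfactor | Check that the replication factor of Kafka on physical machines is not 1 | Service status check | Routine O&M | TCENTER/Kafka/kafka |
| tx_pass_tmp_safe | Check that the root and tunnel_user passwords are stored in the database, and check for password risks | Security check | Routine O&M | TCENTER/K8S/Master |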
| tx_redis_base_cc_sql_interface_balance_status | Check for uneven interface distribution in the newcc database | Dirty data check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_base_cc_sql_status_interface_procs_capacity | Check whether interface usage in the newcc database is above 80% | Dirty data check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_base_cc_sql_status_redis_procs_capacity | Check whether Redis® process usage in the newcc database is above 80% | Dirty data check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_base_cc_sql_status_task_state | Check that event tasks in the newcc database are normal | Dirty data check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_base_interface_status_cli_num | [Collect] interface connection count is above 5000 | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_latency | [Collect] interface latency is below 200 ms | Data center network check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_maxfd | [Alert] interface maximum fd limit is configured above 10000 | System parameter check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_mem | [Collect] interface memory is above 5 GB | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_rafficlimit | [Alert] interface rate-limiting (limit) log entries appear | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_slowlog | [Alert] interface has more than 50 slow queries within 1 hour | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_stats_avg_latency_6 | [Alert] interface per-process request latency exceeds 500 ms | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_stats_clirates | [Alert] interface per-process connection utilization exceeds 80% | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_stats_input_limit_rate | [Alert] interface per-process input bandwidth utilization exceeds 80% | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_stats_output_limit_rate | [Alert] interface per-process output bandwidth utilization exceeds 80% | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_interface_status_stats_qps | [Alert] interface per-process QPS is above 50000 | Capacity check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_base_server_status_check_client_biggest_input_buf_status | [Collect] Redis® largest client input buffer is above 100 Mb | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_client_longest_output_list_status | [Collect] Redis® output buffer object count is above 1000 | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_connected_clients_ratio_status | [Alert] Redis® client connections are above 50% of the limit | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_instantaneous_input_kbps_status | [Collect] Redis® input bps is above 500 Mb | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_instantaneous_ops_per_sec_status | [Collect] Redis® request QPS is above 60,000 | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_instantaneous_output_kbps_status | [Collect] Redis® output bps is above 500 Mb | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_latest_fork_usec_status | [Collect] Redis® fork latency is above 1 s | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_logdiskbusy_status | [Alert] Redis® disk persistence is blocked (busy) | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_master_aof_enable | [Alert] AOF is enabled on a Redis® master instance, which affects performance | Service configuration check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_redis_limit_maxfd | [Alert] Redis® fd limit is configured above 10000 | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_repl_offset_status | [Alert] Redis® master-slave offset is above 10 Mb | Service configuration check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_slowlog_status | [Collect] Redis® slow query count is above 20 | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_base_server_status_check_used_memory_status | [Collect] Redis® memory usage is above 30 GB | Capacity check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_cc_agent_check_cc_agent_timer | Redis® cc dependency agent process liveness check: cc_agent_timer | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_cc_monitor_py | Redis® cc dependency agent process liveness check: cc_monitor_py | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_defunct_proc | Redis® cc: cc_monitor_py process is hung | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_influxdb | Redis® cc dependency agent process liveness check: influxd | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_mul_server | Redis® cc dependency agent process liveness check: mul_server | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_qds_center | Redis® cc dependency agent process liveness check: qds_center | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_check_web | Redis® cc dependency agent process liveness check: web | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_agent_version_statics_cc_monitor_py | [Collect] Get the cc_monitor_py version of the cc control plane | Process check | Routine O&M | TCENTER/Product/product-tcenter-support-credis/oss_nodes |
| tx_redis_cc_sql_status_app_nums | Check whether dirty instance data exists in the newcc database | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_app_state | Check that instance status in the newcc database is normal | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_interface_machine_status | Check whether any interface machine status in the newcc database is not 1 | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_interface_proc_status | Check whether any interface process status in the newcc database is not 1 | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_machine_salerate | Check whether the machine sale rate in the newcc database is above 80% | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_master_port | Check that the master process port configuration in the newcc database is normal | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_master_slave | Check for processes with master-slave inconsistency in redis_procs_t in the newcc database | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_redis_machine_status | Check whether any Redis® machine status in the newcc database is not 1 | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_redis_proc_status | Check whether any Redis® process status in the newcc database is not 1 | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_replication_check | Check replica count consistency in the newcc database | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_cc_sql_status_scale_check | Check shard count consistency in the newcc database | Database check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_check_agent_status | Check whether any agent has not been started | Service status check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_check_cluster_fail_nodes_status | Check whether the cluster has nodes in fail state that were not successfully forgotten | Service status check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_interface_version_statics | [Collect] interface version information | Service status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_hwcheck_disk | Check whether disk error entries exist in the logs from the last 30 minutes | Hardware status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_hwcheck_eth | Check whether NIC error entries exist in the logs from the last 30 minutes | Hardware status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_hwcheck_mem | Check whether memory error entries exist in the logs from the last 30 minutes | Hardware status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_hwcheck_nvme | Check whether nvme error entries exist in the logs from the last 30 minutes | Hardware status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_hwcheck_raid | Check whether RAID card error entries exist in the logs from the last 30 minutes | Hardware status check | Routine O&M | TCENTER/CRedis/interface |
| tx_redis_machine_survival_status | Check machine liveness status | Service status check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_master_slave_consistency | Check whether master-slave consistency dirty data exists | Service status check | Routine O&M | TCENTER/CRedis/oss |
| tx_redis_server_status_cluster_fail_nodes | [Alert] Redis® cluster has nodes in fail state | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_server_status_cluster_slots_ok | [Alert] Redis® cluster slot status is abnormal | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_server_status_cluster_state | [Alert] Redis® cluster info state is abnormal | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_server_status_memused_rate | [Alert] Redis® memory usage is above 80% | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_server_status_stand_proc | [Alert] Redis® has single-point processes | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_redis_server_version_statics | [Collect] Redis® cache process version collection | Service status check | Routine O&M | TCENTER/CRedis/cache |
| tx_tcenter_ckv_check_port | Check that the ckv port exists | Dependency service check | Routine O&M | TCENTER/CRedis/cache |
| tx_tcenter_ckv_check_time | Check that the ckv time is normal | Dependency service check | Routine O&M | TCENTER/CRedis/cache |
| tx_tcenter_csp_mgmt_pod_nginx | pod csp-pod-mgmt: check that the nginx process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mgmt_pod_pm2 | pod csp-pod-mgmt: check that the PM2 process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_ceph_mgr | Check that the ceph-mgr process is running normally | Service status check | Routine O&M | TCENTER/CSP/monitor_hosts |
| tx_tcenter_csp_mon_ceph_mon | Check that the ceph-mon process is running normally | Service status check | Routine O&M | TCENTER/CSP/monitor_hosts |
| tx_tcenter_csp_mon_moira_server | Check that the moira server process is running normally | Service status check | Routine O&M | TCENTER/CSP/monitor_hosts |
| tx_tcenter_csp_mon_moira_server_api | Check that the moira server API service is normal | Service status check | Routine O&M | TCENTER/CSP/monitor_hosts |
| tx_tcenter_csp_mon_pod_ceph_mgr | pod csp-pod-mon: check that the ceph-mgr process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_pod_ceph_mon | pod csp-pod-mon: check that the ceph-mon process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_pod_moira_server | pod csp-pod-mon: check that the moira server process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_pod_moira_server_api | pod csp-pod-mon: check that the moira server API service is normal | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_pod_node_exporter | pod csp-pod-mon: check that the node_exporter process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_mon_pod_prometheus | pod csp-pod-mon: check that the prometheus process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_rgw_moira_agent | Check that the moira agent is running normally | Service status check | Routine O&M | TCENTER/CSP/gw_hosts |
| tx_tcenter_csp_rgw_nginx | Check that csp_nginx is running normally | Service status check | Routine O&M | TCENTER/CSP/gw_hosts |
| tx_tcenter_csp_rgw_pod_moira_agent | pod csp-pod-rgw: check that the moira agent is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_rgw_pod_nginx | pod csp-pod-rgw: check that csp_nginx is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_rgw_pod_node_exporter | pod csp-pod-rgw: check that the node_exporter process is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_rgw_pod_rgw_api | pod csp-pod-rgw: check that the rgw API service is normal | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_rgw_radosgw | Check that the rgw process is running normally | Service status check | Routine O&M | TCENTER/CSP/gw_hosts |
| tx_tcenter_csp_rgw_rgw_api | Check that the rgw API service is normal | Service status check | Routine O&M | TCENTER/CSP/gw_hosts |
| tx_tcenter_csp_store_ambari_agent | Check that the ambari-agent service is running normally | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_ambari_agent_crontab | Check that an ambari-agent scheduled task is configured in crontab | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_ceph_down | Check that no ceph osd is down | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_ceph_health | Check that the cluster status is healthy | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_ceph_size | Check the replica count of the csp cluster | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_ceph_usage | Check the usage of the csp cluster | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_crond | Check that crond is running normally | Service status check | Routine O&M | TCENTER/CSP/ips |
| tx_tcenter_csp_store_pod_ambari_agent | pod csp-pod-osd: check that the ambari-agent service is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_store_pod_ambari_agent_crontab | pod csp-pod-osd: check that an ambari-agent scheduled task is configured in crontab | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_store_pod_ceph_down | pod csp-pod-osd: check that no ceph osd is down | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_store_pod_ceph_health | pod csp-pod-osd: check that the cluster status is healthy | Container check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_csp_store_pod_crond | pod csp-pod-osd: check that crond is running normally | Container check | Routine O&M | TCENTER/k8s-master-ip |
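| tx_tcenter_es_allocation_oss | Check that Elasticsearch cluster disk usage does not exceed the threshold | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_breaker_oss | Check whether the circuit breaker (breaker) parameter setting is < 80% | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_check_agent | The agent process is running normally | Process check | Routine O&M, Deep inspection | TCENTER/Elasticsearch/data |
| tx_tcenter_es_check_api | The API is accessible | API check | Routine O&M, Deep inspection | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_check_ces_node_info | Check whether the .ces_node_info file has not been updated for 3 days | Service configuration check | Routine O&M | TCENTER/Elasticsearch/data |
| tx_tcenter_es_check_etcd_cluster_health | Check etcd cluster health | Container check | Routine O&M, Deep inspection | TCENTER/K8S/Master |
| tx_tcenter_es_check_etcd_cluster_health | Check etcd cluster health | Service status check | Routine O&M, Deep inspection | TCENTER/Elasticsearch/etcd |
| tx_tcenter_es_check_master | The master process is running normally | Process check | Routine O&M, Deep inspection | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_check_master_cron | The master scheduled task exists | Service configuration check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_check_node_status | Every node is normal | Service status check | Routine O&M | TCENTER/Elasticsearch/etcd |
| tx_tcenter_es_cluster_each_shards_oss | Check that Elasticsearch cluster disk usage does not exceed the threshold | Service status check | Routine O&M, Deep inspection | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_cluster_total_shards_oss | Check that Elasticsearch cluster disk usage does not exceed the threshold | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_health_oss | Check that the Elasticsearch cluster health status is green | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_jvm_mem_oss | Check that Elasticsearch cluster JVM memory usage does not exceed the threshold | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_node_oss | Check that the Elasticsearch cluster health status is green | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_es_shards_store_oss | Check that each shard on the supporting ES nodes is smaller than 50 GB | Service status check | Routine O&M | TCENTER/Elasticsearch/oss |
| tx_tcenter_etcd_dbsize | Check that dbSize is below 1.6 GB | Service status check | Routine O&M | TCENTER/K8S/Master TCENTER/K8S/Etcd |
| tx_tcenter_etcd_member_status | Check that member status is normal | Service status check | Routine O&M | TCENTER/K8S/Etcd |
| tx_tcenter_hdfs_check_datanode_cluster_state | Test that the hdfs datanodes cluster is normal | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_dfs_usage | Check that hdfs cluster DFS usage is below 80% | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_kinit_state | Check whether kinit executed successfully for hdfs | Service status check | Routine O&M | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_missing_blocks | Check that the hdfs cluster has no Missing Blocks | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_namenode_cluster_state | Test that the hdfs namenodes cluster is normal | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_read_state | Test that hdfs reads work normally | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_safe_mode | Check whether hdfs has entered safe mode | System parameter check | Routine O&M, Deep inspection | TCENTER/HDFS/namenode TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_check_write_state | Test that hdfs writes work normally | Service status check | Routine O&M, Deep inspection | TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_crontab_check_hdfs_journalnode | Check the hdfs-jn scheduled task | Service configuration check | Routine O&M | TCENTER/HDFS/journalnode |
| tx_tcenter_hdfs_crontab_check_kinit | Check whether a kinit scheduled task is configured for hdfs | Service status check | Routine O&M | TCENTER/HDFS/journalnode |
| tx_tcenter_hdfs_kerberos | kerberos process check | Service status check | Routine O&M | TCENTER/HDFS/namenode |
| tx_tcenter_hdfs_keytable | keytab check | Service status check | Routine O&M | TCENTER/HDFS/namenode |
| tx_tcenter_hdfs_keytable_check_node | Check the keytab files on all nodes | Service status check | Routine O&M | TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode |
| tx_tcenter_hdfs_process_check_journalnode | Check that the hdfs-jn process is normal | Process check | Routine O&M | TCENTER/HDFS/journalnode |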
| tx_tcenter_k8s_apiserver-etcd-client_expire | The apiserver-etcd-client certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_apiserver-kubelet-client_expire | The apiserver-kubelet-client certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_apiserver-loopback-client_expire | The apiserver-loopback-client certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_apisever_expire | The apiserver certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_ca_expire | The ca certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_cert-manager_expire | The cert-manager certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_k8s_etcd-ca_expire | The etcd-ca certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Etcd |
| tx_tcenter_k8s_etcd-peer_expire | The etcd-peer certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Etcd |
| tx_tcenter_k8s_etcd-server_expire | The etcd-server certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Etcd |
| tx_tcenter_k8s_kubelet-ca_expire | The kubelet-ca certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_kubelet-client_expire | The kubelet-client certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_k8s_kubelet_expire | The kubelet certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_k8s_mtu_check | The MTU poses no risk | Service configuration check | Routine O&M, Deep inspection | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_k8s_ocloud-tcenter-base-vpc-dns_analysis | ocloud-tcenter-base-vpc-dns can resolve domain names | Service configuration check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_k8s_registry-ca_expire | The registry-ca certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_k8s_registry_expire | The registry certificate will expire within 45 days | Certificate and license check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_kafka_check_data_size | kafka check | Dirty data check | Routine O&M | TCENTER/Kafka/kafka |
| tx_tcenter_kafka_check_port | kafka check | Port check | Routine O&M | TCENTER/Kafka/kafka |
| tx_tcenter_kafka_check_topic | kafka check | Dirty data check | Routine O&M | TCENTER/Kafka/kafka |
| tx_tcenter_kafka_partition_leader | The kafka partition leader is not -1 | Service status check | Routine O&M | TCENTER/Kafka/kafka |
| tx_tcenter_kubelet_check_docker | Check the docker process status | Process check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_kubelet_check_kubelet_status | Check the kubelet process status | Process check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_kubelet_tcs_cgroup_memory_nokmem | Check that all tcs nodes have the cgroup.memory=nokmem configuration applied | Service configuration check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcenter_mongo_cluster_config_state | mongo cluster check | Dirty data check | Routine O&M | TCENTER/MongoDB/instance |
| tx_tcenter_mongo_cluster_read_state | mongo cluster check | Dirty data check | Routine O&M | TCENTER/MongoDB/instance |
| tx_tcenter_mongo_cluster_shard_state | mongo cluster check | Dirty data check | Routine O&M | TCENTER/MongoDB/instance |
| tx_tcenter_mongo_cluster_write_state | mongo cluster check | Dirty data check | Routine O&M | TCENTER/MongoDB/instance |
| tx_tcenter_mq_check_cluster_state | The cluster status is normal | Service status check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_check_cron | Check the mq scheduled tasks | Service status check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_check_cron | Check the mq scheduled tasks | Service configuration check | Routine O&M | TCENTER/Product/product-tcenter-support-mq/mq_nodes |
| tx_tcenter_mq_check_list_queues_state | The mq list_queues status is normal | Service status check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_cluster_brainsplit | The mq cluster has no split-brain | Service status check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_report_state | mq node check | Service status check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_report_state | mq node check | Dirty data check | Routine O&M | TCENTER/Product/product-tcenter-support-mq/mq_nodes |
| tx_tcenter_mq_state | mq node check | Service status check | Routine O&M, Deep inspection | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_mq_state | mq node check | Dirty data check | Routine O&M, Deep inspection | TCENTER/Product/product-tcenter-support-mq/mq_nodes |
| tx_tcenter_mq_version_check_3100 | Version check for the cloud platform's physical supporting rabbitmq | Service configuration check | Routine O&M | TCENTER/RabbitMQ/RabbitMQ |
| tx_tcenter_ntp_check_crontab_ntpdate | Check that no ntpdate scheduled task is configured in crontab | Time service check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/oss TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_ntp_check_ntpd_is_enabled | Check that the ntpd autostart script is normal | Time service check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/oss TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_ntp_check_ntpd_offset | Check that the last ntpd sync offset is within 100 ms | Time service check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/oss TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_ntp_check_ntpd_when | Check that the last ntpd sync was within 1024 s | Time service check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/oss TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_ntp_check_process_ntpd_and_chrony | Check whether ntpd or chrony is in use | Time service check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/journalnode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/oss TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_pod_workload_node_check | Check whether workloads in the k8s cluster have overlapping nodes | Container check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_system_check_defunct_process | Check server: zombie process count is below 50 | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_cpu_idle | Check server: CPU idle is above 30% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_disk_read_write | Check that server disks are readable and writable | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_fd_usage | Check server: fd usage is below 80% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_inode_usage | Check server: inode usage is below 80% | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_netdev_recv_pkg | Check server: NIC inbound packet volume is below 5G | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_netdev_send_pkg | Check server: NIC outbound packet volume is below 5G | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_network_sockets | Check server: network socket connections are below 200,000 | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_tcenter_io_await | Check server: ioawait is below 1 s | System parameter check | Routine O&M | TCENTER/Zookeeper/zookeeper TCENTER/TDSQL/scheduler TCENTER/TDSQL/proxy TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/db TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Node TCENTER/K8S/Master TCENTER/ImgCache/imgcache TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/Elasticsearch/oss TCENTER/Elasticsearch/etcd TCENTER/Elasticsearch/data TCENTER/CSP/web_hosts TCENTER/CSP/monitor_hosts TCENTER/CSP/ips TCENTER/CSP/gw_hosts TCENTER/CRedis/interface TCENTER/CRedis/cache |
| tx_tcenter_system_tcenter_io_util | 检查服务器-ioutil小于60% | 系统参数检查 | 日常运维 | TCENTER/TDSQL/scheduler TCENTER/TDSQL/oss TCENTER/TDSQL/monitor TCENTER/TDSQL/chitu TCENTER/RabbitMQ/RabbitMQ TCENTER/Product/product-tcenter-support-mq/mq_nodes TCENTER/Kafka/kafka TCENTER/K8S/Master TCENTER/HDFS/namenode TCENTER/HDFS/datanode TCENTER/CSP/monitor_hosts TCENTER/CSP/gw_hosts |
| tx_tcenter_tcs_cgroup_memory_nokmem_status | Check that cgroup.memory=nokmem has taken effect on all tcs nodes | Service configuration check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_tcs_check_node_mem_cpu | Check whether node CPU/memory allocation rate is above 90% | Container check | Routine O&M | TCENTER/K8S/Master |
| tx_tcenter_tcs_check_node_mem_cpu | Check whether node CPU/memory allocation rate is above 90% | Service status check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_tcs_check_node_status | Check whether nodes are Ready | Container check | Routine O&M, Deep inspection | TCENTER/K8S/Master |
| tx_tcenter_tcs_check_node_status | Check whether nodes are Ready | Service status check | Routine O&M, Deep inspection | TCENTER/k8s-master-ip |
| tx_tcenter_tcs_master_disk_usage | Check whether disk usage is below 60% | Process check | Routine O&M | TCENTER/K8S/Master |
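| tx_tcenter_tdsql_check_proc | Check the ewp process and scheduled task configuration | Process check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_ioutil | Check whether disk util exceeds 60% | Capacity check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_load_average | Check whether the 1-minute/5-minute/15-minute system load average is below the number of CPU cores | Capacity check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_login | Check whether the user tdsql exists and whether tdsql can log in | System parameter check | Routine O&M | TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_mem_swap_usage | Check whether swap usage exceeds 60% | Capacity check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_mem_usage | Check whether memory usage exceeds the threshold | Capacity check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_partition_usage | Check whether usage of the / and /data mounted partitions is below the threshold | Capacity check | Routine O&M | TCENTER/TDSQL/oss TCENTER/TDSQL/Instance |
| tx_tcenter_tdsql_rc_local | Check whether rc.local has execute permission | System parameter check | Routine O&M | TCENTER/TDSQL/Instance |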
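| tx_tcenter_tke_check_cls_fail_by_k8s_api | Check for tke clusters that failed to create or have been stuck in creation for a long time | Service status check | Routine O&M | TCENTER/k8s-master-ip |
| tx_tcenter_vpcdns_dig_pod | Check base-vpc-dns domain name resolution | Service status check | Routine O&M | TCENTER/KubeResource/ocloud-tcenter-base-vpc-dns |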
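| tx_tcenter_zk_check_conn_num | Check whether the zk connection count is below 500 | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_check_conn_num | Check whether the zk connection count is below 500 | Service status check | Routine O&M | TCENTER/Zookeeper/zookeeper |
| tx_tcenter_zk_check_status | Check zk status | Dependent service check | Routine O&M, Deep inspection | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_check_zk_outstanding_requests | Check whether the number of zk outstanding (queued) requests exceeds 10 | Dependent service check | Routine O&M, Deep inspection | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_check_zk_outstanding_requests | Check whether the number of zk outstanding (queued) requests exceeds 10 | Service status check | Routine O&M, Deep inspection | TCENTER/Zookeeper/zookeeper |
| tx_tcenter_zk_check_znode_num | Check whether the znode count exceeds 100,000 | Dependent service check | Routine O&M, Deep inspection | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_check_znode_num | Check whether the znode count exceeds 100,000 | Service status check | Routine O&M, Deep inspection | TCENTER/Zookeeper/zookeeper |
| tx_tcenter_zk_config | Check Zookeeper cluster node configuration | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_multi_version | Check whether the Zookeeper cluster runs a mix of multiple versions | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_no_sync_data | Check whether zk unsynced data exceeds 10 | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_no_sync_data | Check whether zk unsynced data exceeds 10 | Service status check | Routine O&M | TCENTER/Zookeeper/zookeeper |
| tx_tcenter_zk_partition_usage | Check whether usage of / and /data on Zookeeper cluster disks is below the threshold | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_snapshot | Check Zookeeper cluster snapshot size | Dependent service check | Routine O&M | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_snapshot_size | Check whether the zk snapshot size exceeds 1G | Dependent service check | Routine O&M, Deep inspection | TCENTER/Zookeeper/instance |
| tx_tcenter_zk_snapshot_size | Check whether the zk snapshot size exceeds 1G | Service status check | Routine O&M, Deep inspection | TCENTER/Zookeeper/zookeeper |
| tx_tcenter_zk_state | Check Zookeeper cluster node status | Dependent service check | Routine O&M, Deep inspection | TCENTER/Zookeeper/instance |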
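| tx_tcs_check_pajero | Check pajero running status | Service configuration check | Routine O&M, Deep inspection | TCENTER/K8S/Master |
| tx_tcs_check_pvc_usage | Check whether any pvc usage exceeds 85% | Service configuration check | Routine O&M | TCENTER/K8S/Node TCENTER/K8S/Master |
| tx_tcs_check_tunl_vip | Check tcs tunnel integrity | Service configuration check | Routine O&M, Deep inspection | TCENTER/K8S/Node |
| tx_tcs_dig_underlay | Check tcs underlay domain name resolution | Process check | Routine O&M | TCENTER/K8S/Master |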