Common Monitoring Metrics

Last updated: 2026-03-13 09:03:00

| Metric/Event Name | Meaning | Type | Unit | Threshold | Response Plan |
| --- | --- | --- | --- | --- | --- |
| count_over_time({tcs_product="barad",tcs_type="tcs_barad",event_name="FlinkJobStatus"}[1m]) | Flink job status check | - | - | 0 | Restart the Flink job |
| barad_access_message_delay_histogram_bucket | Reporting delay | histogram | dtdurations | - | - |
| barad_access_http_request_count_total_increase | Request count | gauge | count | - | - |
| barad_access_http_request_duration_seconds_p95 | Request duration (P95) | histogram | dtdurationms | - | - |
| barad_access_http_request_size_bytes_p95 | Request body size (P95) | histogram | bytes | - | - |
| metric_writer_fields_count_increase | Fields written | counter | count | - | - |
| metric_writer_parse_fail_count_increase | Parse failures | counter | count | - | - |
| metric_writer_parse_metric_count_increase | Metrics parsed | counter | count | - | - |
| metric_writer_recv_data_delay_p95 | Receive delay (P95) | histogram | dtdurationms | - | - |
| amp_alert_process_convergence_req_total_increase | Total convergence requests | counter | count | - | - |
| amp_alert_process_duration_ms_bucket_p95 | Request duration (P95) | histogram | dtdurationms | - | - |
| amp_alert_process_process_duration_ms_bucket_p95 | Processing duration (P95) | histogram | dtdurationms | - | - |
| amp_alert_process_req_total_increase | Request volume | counter | count | - | - |
| amp_alert_process_richError_req_total_increase | Enrichment errors | counter | count | - | - |
| amp_alert_process_send_success_req_total_increase | Successful sends | counter | count | - | - |
| amp_subscribe_process_duration_ms_bucket_p95 | Subscription processing duration (P95) | histogram | dtdurationms | - | - |
| amp_subscribe_process_req_hit_total_increase | Request hits | counter | count | - | - |
| amp_subscribe_process_yehe_notify_send_increase | Notifications sent to the message center | counter | count | - | - |
| flink_taskmanager_Status_JVM_CPU_Time_increase | CPU time used | counter | dtdurations | - | - |
| flink_taskmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Count_increase | CMS GC count | counter | count | - | - |
| flink_taskmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Time_increase | CMS GC time | counter | dtdurationms | - | - |
| flink_taskmanager_Status_JVM_Memory_Heap_Used | Memory usage | gauge | bytes | - | - |
| barad_access_message_delay_histogram_p95 | Metric delay (P95) | histogram | dtdurations | - | - |
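
Several rows above report a P95 value alongside a histogram `_bucket` series. The sketch below illustrates one common way to estimate a quantile from cumulative histogram buckets, using linear interpolation within the bucket that contains the target rank (the approach Prometheus-style `histogram_quantile` takes). Whether the `_p95` series in this table are produced this way is an assumption; the function name `estimate_quantile` and the sample bucket boundaries and counts are hypothetical and exist only for illustration.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    Each bucket is (upper_bound, cumulative_count). The largest bucket
    must have an upper bound of +Inf, as in Prometheus-style histograms.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("the last bucket must have an upper bound of +Inf")

    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations, so no quantile can be estimated

    rank = q * total  # number of observations at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: fall back to the
                # highest finite upper bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket that holds the rank.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample: delay buckets in seconds with cumulative counts.
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95_delay = estimate_quantile(0.95, sample_buckets)
```

With the hypothetical sample above, 95 of the 100 observations must fall at or below the result, which lands halfway through the 0.5 to 1.0 bucket, so the estimate is roughly 0.75 seconds. The accuracy of such an estimate depends on how finely the bucket boundaries are chosen around the value of interest.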