Metrics
Some DPF services expose custom metrics that provide insight into their operation and performance and help identify potential issues. Each service exposes these metrics at its /metrics endpoint, where Prometheus can scrape them. You can visualize them with tools such as Grafana, for example in dashboards that show the system's performance in real time.
For a brief overview of the custom metrics for each service, see Custom Service Metrics below.
For more information about the overall health of your server, refer to Default Metrics below.
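To illustrate what a scrape of the /metrics endpoint returns, the following minimal sketch parses a response body in the Prometheus text exposition format into a Python dictionary. The sample body, the label name `wu`, and all values are hypothetical and chosen only for illustration; the actual labels used by the DPF services may differ.

```python
import re

# One sample line: metric name, optional {labels}, value (a trailing timestamp is ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition format into {(name, labels): value}.

    labels is a sorted tuple of (key, value) pairs so it can serve as a dict key.
    Escape sequences inside label values are left as-is in this sketch.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


# Hypothetical response body; the label name "wu" and the values are made up.
SAMPLE_BODY = """\
# HELP dpf_wu_calls_total Total number of WU calls per WU.
# TYPE dpf_wu_calls_total counter
dpf_wu_calls_total{wu="convert"} 12
dpf_wu_calls_total{wu="print"} 3
# HELP up 1 = up, 0 = not up
# TYPE up gauge
up 1
"""

metrics = parse_metrics(SAMPLE_BODY)
print(metrics[("up", ())])
```

In a real setup, Prometheus performs the scraping itself. A parser like this is mainly useful for ad hoc checks, where the body could be fetched from a service's /metrics URL, for example with `urllib.request.urlopen`.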
Custom Service Metrics
WU Services
All WU services (such as seal-dpf-wu-manager-rest, seal-dpf-wusystemcall, seal-dpf-wuwait, seal-dpf-wusendback, and seal-dpf-wujavacall4) monitor working unit (WU) calls. They expose the following custom metrics:
- dpf_wu_calls_total: Total number of calls per WU.
- dpf_wu_duration_seconds: Total duration of calls per WU, in seconds.
- dpf_wu_waiting_seconds: Total waiting time per WU, in seconds.
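The totals above can be combined into an average duration per call. The following sketch assumes that dpf_wu_duration_seconds is a cumulative sum of seconds per WU, as its description suggests; the WU names and values are hypothetical.

```python
def average_seconds_per_call(calls_total, seconds_total):
    """Return {wu: average seconds per call} from two per-WU totals.

    WUs without any calls, or without a matching seconds total, are skipped
    to avoid dividing by zero or reporting misleading values.
    """
    averages = {}
    for wu, calls in calls_total.items():
        if calls > 0 and wu in seconds_total:
            averages[wu] = seconds_total[wu] / calls
    return averages


# Hypothetical values as they might be read from dpf_wu_calls_total
# and dpf_wu_duration_seconds; the WU names are made up.
calls = {"convert": 10, "print": 4, "idle": 0}
durations = {"convert": 25.0, "print": 1.0, "idle": 0.0}

averages = average_seconds_per_call(calls, durations)
print(averages)
```

The same function could be applied to dpf_wu_waiting_seconds to estimate the average waiting time per call.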
seal-dpf-process-manager
The seal-dpf-process-manager service monitors workflow (WF) calls. It exposes the following custom metrics:
- dpf_wf_calls_total: Total number of calls per WF.
- dpf_wf_duration_seconds: Total duration of calls per WF, in seconds.
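Because dpf_wf_calls_total is a running total, a call rate has to be derived from the difference between two scrapes. The following sketch shows one way to do this, including a simple treatment of counter resets after a service restart. The WF names, values, and scrape interval are hypothetical.

```python
def counter_increase(previous, current):
    """Increase of a counter between two scrapes.

    If the current value is lower than the previous one, the counter is
    assumed to have been reset to zero (for example by a restart), so the
    current value itself is taken as the increase.
    """
    if current < previous:
        return current
    return current - previous


def calls_per_minute(previous, current, interval_seconds):
    """Return {wf: calls per minute} from two snapshots of a per-WF counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        wf: counter_increase(previous.get(wf, 0.0), value) * 60.0 / interval_seconds
        for wf, value in current.items()
    }


# Hypothetical snapshots of dpf_wf_calls_total taken 30 seconds apart.
before = {"invoice": 100.0, "archive": 40.0}
after = {"invoice": 130.0, "archive": 5.0, "report": 2.0}

rates = calls_per_minute(before, after, interval_seconds=30)
print(rates)
```

In Prometheus itself, the built-in `rate()` function covers this use case; the sketch only makes the underlying arithmetic explicit.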
Default Metrics
In addition to the custom metrics provided by the services, you can use the standard metrics provided by the Node.js prom-client library to assess the overall health of your server.
[
  {
    "name": "process_cpu_user_seconds_total",
    "help": "Total user CPU time spent in seconds.",
    "type": "counter"
  },
  {
    "name": "process_cpu_system_seconds_total",
    "help": "Total system CPU time spent in seconds.",
    "type": "counter"
  },
  {
    "name": "process_cpu_seconds_total",
    "help": "Total user and system CPU time spent in seconds.",
    "type": "counter"
  },
  {
    "name": "process_start_time_seconds",
    "help": "Start time of the process since unix epoch in seconds.",
    "type": "gauge"
  },
  {
    "name": "process_resident_memory_bytes",
    "help": "Resident memory size in bytes.",
    "type": "gauge"
  },
  {
    "name": "process_virtual_memory_bytes",
    "help": "Virtual memory size in bytes.",
    "type": "gauge"
  },
  {
    "name": "process_heap_bytes",
    "help": "Process heap size in bytes.",
    "type": "gauge"
  },
  {
    "name": "process_open_fds",
    "help": "Number of open file descriptors.",
    "type": "gauge"
  },
  {
    "name": "process_max_fds",
    "help": "Maximum number of open file descriptors.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_seconds",
    "help": "Lag of event loop in seconds.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_min_seconds",
    "help": "The minimum recorded event loop delay.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_max_seconds",
    "help": "The maximum recorded event loop delay.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_mean_seconds",
    "help": "The mean of the recorded event loop delays.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_stddev_seconds",
    "help": "The standard deviation of the recorded event loop delays.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_p50_seconds",
    "help": "The 50th percentile of the recorded event loop delays.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_p90_seconds",
    "help": "The 90th percentile of the recorded event loop delays.",
    "type": "gauge"
  },
  {
    "name": "nodejs_eventloop_lag_p99_seconds",
    "help": "The 99th percentile of the recorded event loop delays.",
    "type": "gauge"
  },
  {
    "name": "nodejs_active_resources",
    "help": "Number of active resources that are currently keeping the event loop alive, grouped by async resource type.",
    "type": "gauge",
    "labels": [
      "type"
    ]
  },
  {
    "name": "nodejs_active_resources_total",
    "help": "Total number of active resources.",
    "type": "gauge"
  },
  {
    "name": "nodejs_active_handles",
    "help": "Number of active libuv handles grouped by handle type. Every handle type is C++ class name.",
    "type": "gauge",
    "labels": [
      "type"
    ]
  },
  {
    "name": "nodejs_active_handles_total",
    "help": "Total number of active handles.",
    "type": "gauge"
  },
  {
    "name": "nodejs_active_requests",
    "help": "Number of active libuv requests grouped by request type. Every request type is C++ class name.",
    "type": "gauge",
    "labels": [
      "type"
    ]
  },
  {
    "name": "nodejs_active_requests_total",
    "help": "Total number of active requests.",
    "type": "gauge"
  },
  {
    "name": "nodejs_heap_size_total_bytes",
    "help": "Process heap size from Node.js in bytes.",
    "type": "gauge"
  },
  {
    "name": "nodejs_heap_size_used_bytes",
    "help": "Process heap size used from Node.js in bytes.",
    "type": "gauge"
  },
  {
    "name": "nodejs_external_memory_bytes",
    "help": "Nodejs external memory size in bytes.",
    "type": "gauge"
  },
  {
    "name": "nodejs_heap_space_size_total_bytes",
    "help": "Process heap space size total from Node.js in bytes.",
    "type": "gauge",
    "labels": [
      "space"
    ]
  },
  {
    "name": "nodejs_heap_space_size_used_bytes",
    "help": "Process heap space size used from Node.js in bytes.",
    "type": "gauge",
    "labels": [
      "space"
    ]
  },
  {
    "name": "nodejs_heap_space_size_available_bytes",
    "help": "Process heap space size available from Node.js in bytes.",
    "type": "gauge",
    "labels": [
      "space"
    ]
  },
  {
    "name": "nodejs_version_info",
    "help": "Node.js version info.",
    "type": "gauge",
    "labels": [
      "version",
      "major",
      "minor",
      "patch",
      "release",
      "lts"
    ]
  },
  {
    "name": "nodejs_gc_duration_seconds",
    "help": "Garbage collection duration by kind, one of major, minor, incremental or weakcb.",
    "type": "histogram",
    "labels": [
      "kind"
    ]
  },
  {
    "name": "http_request_duration_seconds",
    "help": "duration histogram of http responses labeled with: status_code, method, path",
    "type": "histogram",
    "labels": [
      "status_code",
      "method",
      "path"
    ]
  },
  {
    "name": "up",
    "help": "1 = up, 0 = not up",
    "type": "gauge"
  }
]
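As an illustration of how some of these metrics could feed a basic health check, the following sketch evaluates one snapshot of gauge values for a single service. The thresholds (80 % file descriptor usage, 0.1 seconds event loop lag at the 99th percentile) and the sample values are assumptions made for this example, not recommendations from DPF.

```python
def failed_checks(values, max_fd_ratio=0.8, max_lag_p99_seconds=0.1):
    """Return the names of the metrics whose check failed for one service.

    values maps metric names to their current gauge values.
    Both thresholds are illustrative defaults, not official limits.
    """
    failed = []
    # "up" is 1 when the service is up and 0 otherwise.
    if values.get("up", 1.0) != 1.0:
        failed.append("up")
    # Share of the allowed file descriptors that is currently in use.
    fd_ratio = values["process_open_fds"] / values["process_max_fds"]
    if fd_ratio > max_fd_ratio:
        failed.append("process_open_fds")
    # High event loop lag means the Node.js process responds slowly.
    if values["nodejs_eventloop_lag_p99_seconds"] > max_lag_p99_seconds:
        failed.append("nodejs_eventloop_lag_p99_seconds")
    return failed


# Hypothetical snapshot of one service.
snapshot = {
    "up": 1.0,
    "process_open_fds": 900.0,
    "process_max_fds": 1024.0,
    "nodejs_eventloop_lag_p99_seconds": 0.02,
}

failed = failed_checks(snapshot)
print(failed)
```

In practice, checks like these are usually expressed as Prometheus alerting rules or Grafana alerts rather than in a separate script.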