
Operate - metrics

Overview

Every Golem service exposes Prometheus metrics on its HTTP interface. The metrics are available at the /metrics endpoint for scraping.
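
As an illustration of what a scraper does with this endpoint, the following minimal sketch fetches `/metrics` and parses the Prometheus text exposition format into tuples. The helper names (`fetch_metrics`, `parse_metrics`) and any base URL you pass in are assumptions made for this example, not part of Golem. The parser is deliberately simplified: it ignores timestamps and does not unescape label values.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value"
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:  # not a sample line this simplified parser understands
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url):
    """Fetch the raw metrics text from <base_url>/metrics (base_url is hypothetical)."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")
```

Calling `parse_metrics(fetch_metrics(base_url))` with the address of a running service would return one tuple per exposed sample. In production, a Prometheus server configured to scrape each service is the usual choice; the sketch is only meant for ad-hoc inspection.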

Besides the service-specific metrics, each service exports a single counter called version_info whose labels hold service-specific information. The currently available labels are:

| Label | Description |
| --- | --- |
| version | The version of the service |
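
To show how the `version` label can be read back, here is a small sketch that pulls it out of scraped metrics text. The function name `service_version` and the version string in the usage note are hypothetical; the exact sample line a service emits may differ (for example, some client libraries append a suffix to counter names), which is why the match is on the name prefix.

```python
import re


def service_version(metrics_text):
    """Return the `version` label of the first version_info sample, or None."""
    for line in metrics_text.splitlines():
        # Comment lines start with '#', so they never match this prefix.
        if not line.startswith("version_info"):
            continue
        # Require '{' or ',' before the name so e.g. foo_version="..." is not matched.
        match = re.search(r'[{,]\s*version="([^"]*)"', line)
        if match:
            return match.group(1)
    return None
```

For example, given the text `version_info{version="1.2.3"} 1`, the function returns the string `1.2.3`.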

Service-specific metrics

Component Service

The component service exposes the following Prometheus metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| poem_request_count | counter | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Counts every request on the HTTP interface |
| poem_errors_count | counter | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Counts failed requests on the HTTP interface |
| poem_request_duration_ms | histogram | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Measures the duration of handling requests on the HTTP interface |
| api_success_seconds | histogram | api, api_type | Measures the duration of successfully serving API requests (both HTTP and gRPC) |
| api_failure_seconds | histogram | api, api_type | Measures the duration of failed API requests (both HTTP and gRPC) |
| grpc_api_active_streams | gauge | | Number of open incoming gRPC streams |
| http_api_active_streams | gauge | | Number of open incoming HTTP streams |
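
Counters such as poem_request_count and poem_errors_count are cumulative, so a useful figure like the HTTP error ratio comes from the difference between two scrapes rather than from the raw values. The sketch below shows that arithmetic under the assumption that you already hold the summed counter values from an earlier and a later scrape; the function names are hypothetical.

```python
def counter_increase(previous, current):
    """Increase of a cumulative counter between two scrapes.

    A counter that went down is treated as reset (for example after a
    service restart), in which case the current value is the whole increase.
    """
    return current if current < previous else current - previous


def error_ratio(prev_requests, curr_requests, prev_errors, curr_errors):
    """Fraction of requests that failed between two scrapes (0.0 if idle)."""
    requests = counter_increase(prev_requests, curr_requests)
    errors = counter_increase(prev_errors, curr_errors)
    return errors / requests if requests > 0 else 0.0
```

With 100 requests and 5 errors at the first scrape and 300 requests and 15 errors at the second, the interval saw 200 requests and 10 errors, so the ratio is 0.05. In a Prometheus deployment the same calculation is normally expressed as a query over the two counters instead of hand-written code.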

Component Compilation Service

The component compilation service exposes the following Prometheus metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| cache_size | gauge | cache | Current maximal capacity of the cache |
| cache_hit_total | counter | cache | Number of cache hits |
| cache_miss_total | counter | cache | Number of cache misses |
| cache_eviction_total | counter | cache, trigger | Number of cache evictions |
| component_compilation_queue_length | gauge | | Number of compilation requests enqueued |
| compilation_time_seconds | histogram | | Time to compile a WASM component to native code |
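
The cache counters are labelled by `cache`, so a hit ratio can be derived per cache as hits / (hits + misses). The following sketch assumes the values of cache_hit_total and cache_miss_total have already been collected into dictionaries keyed by the `cache` label; the function name and the cache names in the usage note are made up for illustration.

```python
def cache_hit_ratios(hits, misses):
    """Map each cache name to hits / (hits + misses).

    `hits` and `misses` map the value of the `cache` label to the
    corresponding counter value. A cache with no lookups yet maps to None.
    """
    ratios = {}
    for cache in sorted(set(hits) | set(misses)):
        hit_count = hits.get(cache, 0.0)
        miss_count = misses.get(cache, 0.0)
        total = hit_count + miss_count
        ratios[cache] = hit_count / total if total > 0 else None
    return ratios
```

For instance, `cache_hit_ratios({"a": 90.0}, {"a": 10.0, "b": 0.0})` yields a ratio of 0.9 for the hypothetical cache `a` and `None` for `b`, which has seen no lookups.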

Shard Manager

The shard manager exposes the following Prometheus metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| api_success_seconds | histogram | api, api_type | Measures the duration of successfully serving API requests (both HTTP and gRPC) |
| api_failure_seconds | histogram | api, api_type | Measures the duration of failed API requests (both HTTP and gRPC) |
| grpc_api_active_streams | gauge | | Number of open incoming gRPC streams |
| http_api_active_streams | gauge | | Number of open incoming HTTP streams |
| external_call_success_seconds | histogram | target, op | Duration of successful outgoing calls |
| external_call_response_size_bytes | histogram | target, op | Size of the response of outgoing calls |
| external_call_retry_total | counter | target, op | Number of failed outgoing calls that got retried |
| external_call_failure_total | counter | target, op | Number of failed outgoing calls not to be retried |
| redis_success_seconds | histogram | svc, api, cmd | Duration of successful Redis calls |
| redis_failure_total | counter | svc, api, cmd | Number of failed Redis calls |
| redis_serialized_size_bytes | histogram | svc, entity | Size of serialized Redis entities |
| redis_deserialized_size_bytes | histogram | svc, entity | Size of deserialized Redis entities |

Worker Executor

Worker executors expose the following Prometheus metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| api_success_seconds | histogram | api, api_type | Measures the duration of successfully serving API requests (both HTTP and gRPC) |
| api_failure_seconds | histogram | api, api_type | Measures the duration of failed API requests (both HTTP and gRPC) |
| grpc_api_active_streams | gauge | | Number of open incoming gRPC streams |
| http_api_active_streams | gauge | | Number of open incoming HTTP streams |
| external_call_success_seconds | histogram | target, op | Duration of successful outgoing calls |
| external_call_response_size_bytes | histogram | target, op | Size of the response of outgoing calls |
| external_call_retry_total | counter | target, op | Number of failed outgoing calls that got retried |
| external_call_failure_total | counter | target, op | Number of failed outgoing calls not to be retried |
| redis_success_seconds | histogram | svc, api, cmd | Duration of successful Redis calls |
| redis_failure_total | counter | svc, api, cmd | Number of failed Redis calls |
| redis_serialized_size_bytes | histogram | svc, entity | Size of serialized Redis entities |
| redis_deserialized_size_bytes | histogram | svc, entity | Size of deserialized Redis entities |
| cache_size | gauge | cache | Current maximal capacity of the cache |
| cache_hit_total | counter | cache | Number of cache hits |
| cache_miss_total | counter | cache | Number of cache misses |
| cache_eviction_total | counter | cache, trigger | Number of cache evictions |
| compilation_time_seconds | histogram | | Time to compile a WASM component to native code |
| event_total | counter | event | Number of events produced by workers |
| event_broadcast_total | counter | event | Number of events broadcast by the executor |
| worker_executor_call_total | counter | api | Number of calls to the worker layer |
| promises_count_total | counter | | Number of promises created |
| promises_scheduled_complete_total | counter | | Number of scheduled promise completions |
| assigned_shard_count | gauge | | Number of assigned shards |
| create_worker_seconds | histogram | | Time to create a new worker |
| create_worker_failure_total | counter | error | Number of failed worker creations |
| invocation_total | counter | mode, outcome | Number of invocations |
| invocation_consumption_total | histogram | | Amount of fuel consumed by an invocation |
| allocated_memory_bytes | histogram | | Amount of memory allocated by a single memory.grow instruction |
| host_function_call_total | counter | interface, name | Number of calls to specific host functions |
| resume_worker_seconds | histogram | | Time taken to resume a worker |
| replayed_functions_count | histogram | | Number of functions replayed per worker resumption |
| oplog_svc_call_total | counter | api | Number of calls to the oplog layer |
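
Many of the metrics above are histograms (for example create_worker_seconds or resume_worker_seconds). Prometheus histograms are exposed as cumulative buckets, each identified by an upper bound in the `le` label, and percentiles are estimated from those buckets. The sketch below illustrates one way to do that with linear interpolation inside the matching bucket, similar in spirit to what PromQL's `histogram_quantile` does. The function name and the bucket boundaries in the usage note are assumptions for illustration; the actual boundaries used by Golem services are not stated here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs; the
    largest bound must be +Inf, as in the Prometheus exposition format.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Overflow bucket (or empty first bucket): no interpolation
                # is possible, so report the last known finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With hypothetical buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]`, the median falls in the 0.1 to 0.5 bucket and is estimated at about 0.42 seconds. The estimate is only as precise as the bucket layout allows.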

Worker Service

The worker service exposes the following Prometheus metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| poem_request_count | counter | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Counts every request on the HTTP interface |
| poem_errors_count | counter | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Counts failed requests on the HTTP interface |
| poem_request_duration_ms | histogram | http.request.method, url.full, [http.path_pattern], http.response.status_code, exception.message | Measures the duration of handling requests on the HTTP interface |
| api_success_seconds | histogram | api, api_type | Measures the duration of successfully serving API requests (both HTTP and gRPC) |
| api_failure_seconds | histogram | api, api_type | Measures the duration of failed API requests (both HTTP and gRPC) |
| grpc_api_active_streams | gauge | | Number of open incoming gRPC streams |
| http_api_active_streams | gauge | | Number of open incoming HTTP streams |
| external_call_success_seconds | histogram | target, op | Duration of successful outgoing calls |
| external_call_response_size_bytes | histogram | target, op | Size of the response of outgoing calls |
| external_call_retry_total | counter | target, op | Number of failed outgoing calls that got retried |
| external_call_failure_total | counter | target, op | Number of failed outgoing calls not to be retried |
| cache_size | gauge | cache | Current maximal capacity of the cache |
| cache_hit_total | counter | cache | Number of cache hits |
| cache_miss_total | counter | cache | Number of cache misses |
| cache_eviction_total | counter | cache, trigger | Number of cache evictions |