A/R results

Availability Reliability

By generating status timelines for each level of the topology, the ARGO Monitoring service can draw useful conclusions about a monitored item. One of the most useful is whether the item is available for use and whether it can be considered reliable.
To determine this, availability and reliability rates are calculated over the timelines.

Availability

Availability: Service Availability is the fraction of time a service was in the UP Period during the known interval in a given period.

The availability rate shows how much of the time the monitored item was available for use. Availability is calculated as follows:

Availability = UP period / KNOWN period 

where

KNOWN period = Total period – UNKNOWN period

and

  • UP period : The duration the status of the monitored item is OK or WARNING
  • UNKNOWN period: The duration the status of the monitored item is UNKNOWN
  • Total period : The duration of the monitoring (usually a day)
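The availability formula can be sketched in a few lines of Python. This is only an illustration, not ARGO's implementation: the function name `availability` and its parameters are hypothetical, KNOWN is taken as the total period minus the UNKNOWN period (as in the worked example later on this page), and raising an error when the KNOWN period is zero is an assumption, since the text does not say how that case is handled.

```python
def availability(up_seconds, unknown_seconds, total_seconds=86400):
    """Return UP period / KNOWN period, where KNOWN = Total - UNKNOWN."""
    known_seconds = total_seconds - unknown_seconds
    if known_seconds <= 0:
        # Assumption: with no known period the rate is undefined, so we raise.
        raise ValueError("KNOWN period is zero; availability is undefined")
    return up_seconds / known_seconds


# Illustrative values only: 20 h UP and 2 h UNKNOWN in a 24 h day.
print(round(availability(20 * 3600, 2 * 3600), 4))  # 0.9091
```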

Reliability

Reliability: Service Reliability is the ratio of the time interval a service was UP over the time interval it was supposed (scheduled) to be UP in the given period.

The reliability rate indicates the quality of the monitored item during the monitoring period, that is, whether it can be relied on for use outside its scheduled downtimes.

Reliability is calculated as follows:

Reliability = UP period / (KNOWN period – Scheduled Downtime) 

where

  • UP period : The duration the status of the monitored item is OK or WARNING
  • UNKNOWN period: The duration the status of the monitored item is UNKNOWN
  • Scheduled Downtime period: The duration the status of the monitored item is defined to be DOWNTIME
  • Total period : The duration of the monitoring (usually a day)
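Reliability differs from availability only in the denominator, which also excludes scheduled downtime. A minimal sketch follows, using a hypothetical function name `reliability` and the same assumptions as before: KNOWN is the total period minus the UNKNOWN period, and an empty denominator is treated as an error because the text does not define that case.

```python
def reliability(up_seconds, unknown_seconds, downtime_seconds, total_seconds=86400):
    """Return UP period / (KNOWN period - Scheduled Downtime)."""
    known_seconds = total_seconds - unknown_seconds
    scheduled_seconds = known_seconds - downtime_seconds
    if scheduled_seconds <= 0:
        # Assumption: no scheduled-up time means the rate is undefined.
        raise ValueError("no scheduled-up time; reliability is undefined")
    return up_seconds / scheduled_seconds


# Illustrative values only: 20 h UP, 2 h UNKNOWN, 2 h scheduled downtime.
print(round(reliability(20 * 3600, 2 * 3600, 2 * 3600), 4))  # 1.0
```

In this illustration the item was UP for every second it was scheduled to be UP, so its reliability is 1.0 even though its availability is lower.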

Example

Let’s assume we have a timeline for 15-09-2021 with the following timestamps and statuses:

timestamp               status
2021-09-15T00:00:00Z    OK
2021-09-15T03:00:00Z    CRITICAL
2021-09-15T12:00:00Z    UNKNOWN
2021-09-15T18:00:00Z    DOWNTIME
2021-09-15T22:00:00Z    OK

Based on these timestamps, each status covers the following intervals:

  • OK Status: From 00:00 to 03:00 and from 22:00 till the end of the day
  • CRITICAL Status: From 03:00 to 12:00
  • UNKNOWN Status: From 12:00 to 18:00

The service owner has informed the monitoring service of a scheduled downtime from 18:00 to 22:00, so that interval is recorded as DOWNTIME.
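As a rough sketch, the per-status durations can be derived from such a timeline by letting each status last until the next timestamp, and the final one until the end of the day. The helper `status_durations` is hypothetical, and the sketch assumes the timeline is sorted, starts at midnight, and covers exactly one UTC day.

```python
from datetime import datetime, timedelta, timezone


def status_durations(timeline, day_start):
    """Sum the seconds spent in each status over one day.

    timeline: list of (ISO timestamp, status) pairs, sorted by time.
    """
    day_end = day_start + timedelta(days=1)
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc), status)
        for ts, status in timeline
    ]
    totals = {}
    for i, (start, status) in enumerate(parsed):
        # Each status lasts until the next entry, or until the end of the day.
        end = parsed[i + 1][0] if i + 1 < len(parsed) else day_end
        totals[status] = totals.get(status, 0) + int((end - start).total_seconds())
    return totals


timeline = [
    ("2021-09-15T00:00:00Z", "OK"),
    ("2021-09-15T03:00:00Z", "CRITICAL"),
    ("2021-09-15T12:00:00Z", "UNKNOWN"),
    ("2021-09-15T18:00:00Z", "DOWNTIME"),
    ("2021-09-15T22:00:00Z", "OK"),
]
durations = status_durations(timeline, datetime(2021, 9, 15, tzinfo=timezone.utc))
print(durations)
# {'OK': 18000, 'CRITICAL': 32400, 'UNKNOWN': 21600, 'DOWNTIME': 14400}
```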

The duration of each status, in seconds, is:

Status                    Duration
Total seconds of a day    86400 seconds / 24h
OK                        18000 seconds / 5h
CRITICAL                  32400 seconds / 9h
UNKNOWN                   21600 seconds / 6h
DOWNTIME                  14400 seconds / 4h

Availability = UP period / KNOWN period = UP period / (Total period – UNKNOWN period)

where:

  • UP period = OK period + WARNING period = 18000 seconds
  • UNKNOWN period = UNKNOWN period + MISSING period = 21600 seconds
  • Total period = 86400 seconds
  • Availability = 18000 / (86400 – 21600) = 0.27778, which on a scale of 100 gives an availability of 27.778

Reliability = UP period / (KNOWN period – Scheduled Downtime) = UP period / (Total period – UNKNOWN period – Scheduled Downtime)

where:

  • UP period = OK period + WARNING period = 18000 seconds
  • UNKNOWN period = UNKNOWN period + MISSING period = 21600 seconds
  • Scheduled Downtime period = 14400 seconds
  • Total period = 86400 seconds
  • Reliability = 18000 / (86400 – 21600 – 14400) = 0.35714, which on a scale of 100 gives a reliability of 35.714

The ARGO Monitoring service also reports the UP, UNKNOWN and DOWNTIME periods as fractions of the total period:

  • UP = UP period / Total period. In numbers: 18000 / 86400 = 0.20833
  • UNKNOWN = (UNKNOWN period + MISSING period) / Total period. In numbers: 21600 / 86400 = 0.25
  • DOWNTIME = DOWNTIME period / Total period. In numbers: 14400 / 86400 = 0.16667
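These three fractions are plain ratios against the total period and can be reproduced with a short sketch. The function name `period_fractions` and the dictionary keys are hypothetical, chosen here for illustration only.

```python
def period_fractions(up_seconds, unknown_seconds, downtime_seconds, total_seconds=86400):
    """Express the UP, UNKNOWN and DOWNTIME periods as fractions of the total period."""
    return {
        "up": up_seconds / total_seconds,
        "unknown": unknown_seconds / total_seconds,
        "downtime": downtime_seconds / total_seconds,
    }


# Values from the example: 18000 s UP, 21600 s UNKNOWN, 14400 s DOWNTIME.
fractions = period_fractions(18000, 21600, 14400)
print({name: round(value, 5) for name, value in fractions.items()})
# {'up': 0.20833, 'unknown': 0.25, 'downtime': 0.16667}
```

Unlike availability and reliability, these ratios always use the full monitoring period as the denominator, so they do not sum to 1 when the item also spent time in CRITICAL status.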