Payment systems are complex and typically involve multiple parties, such as the issuing bank, the network, the payment orchestrator, the payment aggregator, the payment gateway, and the acquiring bank. An outage in any one of these systems causes payments to fail. Many of these systems are still maturing, and outages are frequent.
Detecting an outage is not easy given the number of parties involved, and few of these systems provide a ping API to check their health. We tried different models and adopted outage detection based on payment failures.
The algorithm continuously monitors payment failures for each payment method and learns from them. The outage system maintains a score in the range 0 to 1 for each payment method and has two stages: FLUCTUATE and DOWN.
When a payment method sees a high number of back-to-back failures and its score crosses the FLUCTUATE threshold, the outage API starts reporting that payment method as FLUCTUATE. When the score crosses the DOWN threshold, the outage API reports it as DOWN.
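The section does not say how the score is computed. As one possibility, the sketch below assumes an exponentially weighted moving average (EWMA) of failures: each failure pulls the score toward 1 and each success pulls it toward 0, so only a run of back-to-back failures pushes it across the thresholds. The `OutageScore` class, the smoothing factor `alpha`, the threshold values 0.5 and 0.8, and the `UP` name for the healthy state are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    UP = "UP"  # hypothetical name for the healthy state
    FLUCTUATE = "FLUCTUATE"
    DOWN = "DOWN"


class OutageScore:
    """Hypothetical per-payment-method failure score kept in the range [0, 1]."""

    def __init__(self, alpha=0.3, fluctuate_threshold=0.5, down_threshold=0.8):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        if not 0 <= fluctuate_threshold < down_threshold <= 1:
            raise ValueError("expected 0 <= fluctuate < down <= 1")
        self.alpha = alpha
        self.fluctuate_threshold = fluctuate_threshold
        self.down_threshold = down_threshold
        self.score = 0.0

    def record(self, failed: bool) -> Status:
        # EWMA update (an assumption, not the documented algorithm):
        # a failure moves the score toward 1, a success moves it toward 0.
        observation = 1.0 if failed else 0.0
        self.score = (1 - self.alpha) * self.score + self.alpha * observation
        return self.status()

    def status(self) -> Status:
        # Check the stricter DOWN threshold first.
        if self.score >= self.down_threshold:
            return Status.DOWN
        if self.score >= self.fluctuate_threshold:
            return Status.FLUCTUATE
        return Status.UP


tracker = OutageScore()
for _ in range(5):
    print(tracker.record(failed=True).value)
# UP, FLUCTUATE, FLUCTUATE, FLUCTUATE, DOWN (one per line)
```

With these example settings, the second consecutive failure raises the score to 0.51 (FLUCTUATE) and the fifth raises it to about 0.83 (DOWN). A larger `alpha` reacts faster to a burst of failures but is also more sensitive to isolated ones.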
The outage system also keeps scores at a global level (across merchants), so that a high number of failures across all merchants can drive outage decisions. This helps when a single merchant does not have enough transactions to support an outage decision on its own.
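The merchant-level and global scores could be combined as shown in the minimal sketch below. Every payment updates both a per-merchant score and a global (cross-merchant) score for its payment method. When a merchant has fewer than a minimum number of recent transactions, the sketch falls back to the global score. For brevity, it uses a simple sliding-window failure rate rather than any particular scoring model. The `OutageTracker` and `WindowScore` classes, `min_merchant_txns`, the window size, and the sample identifiers are all hypothetical.

```python
from collections import deque


class WindowScore:
    """Hypothetical score: fraction of failures among the last `window` payments."""

    def __init__(self, window=20):
        self.results = deque(maxlen=window)  # oldest results drop off automatically

    def record(self, failed):
        self.results.append(failed)

    @property
    def count(self):
        return len(self.results)

    @property
    def score(self):
        # 0.0 when nothing has been seen yet; otherwise a value in [0, 1].
        return sum(self.results) / len(self.results) if self.results else 0.0


class OutageTracker:
    def __init__(self, min_merchant_txns=10, window=20):
        self.min_merchant_txns = min_merchant_txns
        self.window = window
        self.merchant_scores = {}  # (merchant_id, payment_method) -> WindowScore
        self.global_scores = {}    # payment_method -> WindowScore

    def record(self, merchant_id, payment_method, failed):
        # Each payment feeds both the merchant-level and the global score.
        key = (merchant_id, payment_method)
        self.merchant_scores.setdefault(key, WindowScore(self.window)).record(failed)
        self.global_scores.setdefault(payment_method, WindowScore(self.window)).record(failed)

    def score(self, merchant_id, payment_method):
        merchant = self.merchant_scores.get((merchant_id, payment_method))
        if merchant is not None and merchant.count >= self.min_merchant_txns:
            return merchant.score
        # Too few merchant transactions: fall back to the cross-merchant score.
        global_score = self.global_scores.get(payment_method)
        return global_score.score if global_score is not None else 0.0


tracker = OutageTracker()
for _ in range(18):
    tracker.record("merchant_big", "UPI", failed=True)
for _ in range(2):
    tracker.record("merchant_small", "UPI", failed=False)

print(tracker.score("merchant_big", "UPI"))    # 1.0 (enough merchant data)
print(tracker.score("merchant_small", "UPI"))  # 0.9 (falls back to global)
```

The small merchant saw only successes, yet its score reflects the failures observed across all merchants. This is the benefit the global level is meant to provide.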
Global information - across merchants
Granular - detects bank-wise outages
Two-stage outage - FLUCTUATE, DOWN
Near real-time - detects outages accurately, as soon as they happen
Configurable - you can define the FLUCTUATE and DOWN thresholds based on your business needs
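The configurable and bank-wise (granular) properties can be pictured with the minimal sketch below. It assumes thresholds are configured as a default plus optional overrides keyed by payment method and bank, and it maps a 0-to-1 score to a stage. The `Thresholds` class, the `classify` function, the override keys, the `UP` state name, and all threshold values are hypothetical. They are not the outage API's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    fluctuate: float
    down: float

    def __post_init__(self):
        # Both thresholds share the 0-to-1 score scale; DOWN must be the stricter one.
        if not 0.0 <= self.fluctuate < self.down <= 1.0:
            raise ValueError("expected 0 <= fluctuate < down <= 1")


# Hypothetical configuration: a default plus per-(payment method, bank) overrides.
DEFAULT = Thresholds(fluctuate=0.5, down=0.8)
OVERRIDES = {
    ("NETBANKING", "BANK_A"): Thresholds(fluctuate=0.4, down=0.7),
}


def classify(score, payment_method, bank, overrides=OVERRIDES, default=DEFAULT):
    """Map a 0-to-1 failure score for one payment method and bank to a stage."""
    t = overrides.get((payment_method, bank), default)
    if score >= t.down:
        return "DOWN"
    if score >= t.fluctuate:
        return "FLUCTUATE"
    return "UP"


print(classify(0.75, "NETBANKING", "BANK_A"))  # DOWN (override: down=0.7)
print(classify(0.75, "NETBANKING", "BANK_B"))  # FLUCTUATE (default: down=0.8)
```

Validating the pair when it is created prevents a misconfiguration in which DOWN would trigger before FLUCTUATE.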