Introduced an exponential backoff mechanism with a configurable base, max delay, and jitter factor to handle retries for target failures. This improves resilience by reducing the load during repeated failures and avoids synchronized retry storms. Additionally, stale target cleanup logic has been implemented to prevent unnecessary state retention.
Replaced all inline error messages with the standardized `api_error` helper for consistent error response formatting. This improves clarity, maintainability, and ensures uniform error structures across the application. Updated logging for collector failures to include error class and switched to warning level for target unreachable scenarios.
Introduced `_failure_state` to track consecutive failures per target and `_failure_log_interval_seconds` to control logging frequency. Added logs for recovery when a target recovers and throttled error logs to reduce noise for recurring errors.
Introduced logic to check the existence of `pg_stat_checkpointer` and fetch corresponding statistics when available. This ensures compatibility with newer PostgreSQL versions while maintaining backward support using `pg_stat_bgwriter`.
This commit introduces a `use_pg_stat_statements` flag for targets, allowing users to enable or disable the use of `pg_stat_statements` for query insights. It includes database schema changes, backend logic, and UI updates to manage this setting in both creation and editing workflows.
This commit introduces alert management capabilities, including creating, updating, listing, and removing custom SQL-based alerts in the backend. It adds the necessary database migrations, API endpoints, and frontend pages to manage alerts, enabling users to define thresholds and monitor system health effectively.