NX-10x: Reliability, error handling, runtime UX hardening, and migration safety gate (NX-101, NX-102, NX-103, NX-104) #32

Merged
nessi merged 7 commits from development into main 2026-02-14 15:28:44 +00:00

7 Commits

Author SHA1 Message Date
6de3100615 [NX-104 Issue] Filter out restrict/unrestrict lines in schema comparison.
All checks were successful
Migration Safety / Alembic upgrade/downgrade safety (pull_request) Successful in 22s
PostgreSQL Compatibility Matrix / PG14 smoke (pull_request) Successful in 7s
PostgreSQL Compatibility Matrix / PG15 smoke (pull_request) Successful in 7s
PostgreSQL Compatibility Matrix / PG16 smoke (pull_request) Successful in 8s
PostgreSQL Compatibility Matrix / PG17 smoke (pull_request) Successful in 7s
PostgreSQL Compatibility Matrix / PG18 smoke (pull_request) Successful in 7s
Updated the pg_dump commands in the migration-safety workflow to use `sed` for removing restrict/unrestrict lines. This ensures consistent schema comparison by ignoring irrelevant metadata.
2026-02-14 16:23:05 +01:00
cbe1cf26fa [NX-104 Issue] Add migration safety CI workflow
Some checks failed
Migration Safety / Alembic upgrade/downgrade safety (pull_request) Failing after 30s
PostgreSQL Compatibility Matrix / PG14 smoke (pull_request) Successful in 9s
PostgreSQL Compatibility Matrix / PG15 smoke (pull_request) Successful in 7s
PostgreSQL Compatibility Matrix / PG16 smoke (pull_request) Successful in 7s
PostgreSQL Compatibility Matrix / PG17 smoke (pull_request) Successful in 8s
PostgreSQL Compatibility Matrix / PG18 smoke (pull_request) Successful in 7s
Introduces a GitHub Actions workflow to ensure Alembic migrations are safe and reversible. The workflow validates schema consistency by testing upgrade and downgrade operations and comparing schemas before and after the roundtrip.
2026-02-14 16:07:36 +01:00
5c566cd90d [NX-103 Issue] Add offline state handling for unreachable targets
Introduced a mechanism to detect and handle when a target is unreachable, including a detailed offline state message with host and port information. Updated the UI to display a card notifying users of the target's offline status and styled the card accordingly in CSS.
2026-02-14 15:58:22 +01:00
1ad237d750 Optimize collector loop to account for actual execution time.
Previously, the loop did not consider the time spent on `collect_once`, potentially causing delays. By adjusting the sleep duration dynamically, the poll interval remains consistent as intended.
2026-02-14 15:50:31 +01:00
d9dfde1c87 [NX-102 Issue] Add exponential backoff with jitter for retry logic
Introduced an exponential backoff mechanism with a configurable base, max delay, and jitter factor to handle retries for target failures. This improves resilience by reducing the load during repeated failures and avoids synchronized retry storms. Additionally, stale target cleanup logic has been implemented to prevent unnecessary state retention.
2026-02-14 11:44:49 +01:00
117710cc0a [NX-101 Issue] Refactor error handling to use consistent API error format
Replaced all inline error messages with the standardized `api_error` helper for consistent error response formatting. This improves clarity, maintainability, and ensures uniform error structures across the application. Updated logging for collector failures to include error class and switched to warning level for target unreachable scenarios.
2026-02-14 11:30:56 +01:00
9aecbea68b Add consistent API error handling and documentation
Introduced standardized error response formats for API errors, including middleware for consistent request IDs and exception handlers. Updated the frontend to parse and process these error responses, and documented the error format in the README for reference.
2026-02-13 17:30:05 +01:00