⭐ [Enhancement]: Improve Health Endpoint #2366

Open
JerryNixon opened this issue Sep 6, 2024 · 2 comments

JerryNixon commented Sep 6, 2024

What is it?

  • Add configuration information to health endpoint
  • Add endpoint basics to health endpoint
  • Add thresholds to health endpoint

Health as a standard

There is no official industry standard for the health endpoint. /health or variations like /_health are common by convention. ASP.NET Core uses Microsoft.Extensions.Diagnostics.HealthChecks.

Useful for automation

For example, Azure App Service & Azure Kubernetes Service (AKS) support health probes to monitor the health of your application. If a service fails health checks, Azure can automatically restart it or redirect traffic to healthy instances.

Similarly, if Data API builder fails its health checks past a threshold the customer defines, the customer can recycle the container or send an alert to engineers.

Term              Description
Health Endpoint   The URL (e.g., /health) exposed as JSON.
Check             A specific diagnostic test (e.g., database, API).
Status            The result of a check.
Status.Healthy    The system is functioning correctly.
Status.Unhealthy  The system has a critical failure or issue.
Status.Degraded   The system is functioning, but with issues.

More on degraded

We might opt not to support Degraded. If we do, "Degraded" means the system is operational but not performing optimally. For a database, for example, the query duration might exceed a defined threshold.

if (QueryDuration > DurationThreshold) {
    Check.Status = "Degraded"; // Query taking too long, degrading performance
}
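
The threshold comparison above could be fleshed out roughly as follows. This is a minimal sketch, not DAB's implementation: the Status enum, the ThresholdCheck class, and the probe delegate are all hypothetical names introduced for illustration, and treating any exception as Unhealthy is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical enum mirroring the three statuses in the table above.
public enum Status { Healthy, Degraded, Unhealthy }

public static class ThresholdCheck
{
    // Pure classification: slower than the threshold means Degraded.
    public static Status Classify(TimeSpan elapsed, TimeSpan threshold)
        => elapsed > threshold ? Status.Degraded : Status.Healthy;

    // Times an arbitrary probe (for example, a test query) and classifies it.
    public static async Task<Status> RunAsync(
        Func<CancellationToken, Task> probe,
        TimeSpan threshold,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await probe(cancellationToken);
        }
        catch (Exception)
        {
            // Assumption: any failure, including cancellation, counts as Unhealthy.
            return Status.Unhealthy;
        }
        return Classify(stopwatch.Elapsed, threshold);
    }
}
```

Keeping Classify separate from the timing code makes the threshold rule easy to unit test without running a real query.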

Overall health calculation

Healthy  Unhealthy  Degraded  Global Status
-        0          0         Healthy
-        ≥ 1        -         Unhealthy
-        0          ≥ 1       Degraded

This logic shows how the global health status is determined:

  • Healthy: every check is healthy.
  • Unhealthy: at least one check is unhealthy.
  • Degraded: no check is unhealthy, but at least one is degraded.
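
The roll-up rules above can be sketched as a small function. The Status enum and HealthRollup class are hypothetical names used only for this illustration; returning Healthy for an empty set of checks is an assumption that follows the first table row (zero unhealthy, zero degraded).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum mirroring the three statuses described above.
public enum Status { Healthy, Degraded, Unhealthy }

public static class HealthRollup
{
    // Worst status wins: Unhealthy beats Degraded, Degraded beats Healthy.
    public static Status Overall(IEnumerable<Status> checks)
    {
        var results = checks.ToList();
        if (results.Contains(Status.Unhealthy)) return Status.Unhealthy;
        if (results.Contains(Status.Degraded)) return Status.Degraded;
        return Status.Healthy; // also covers the empty case (assumption)
    }
}
```

For example, HealthRollup.Overall(new[] { Status.Healthy, Status.Degraded }) evaluates to Status.Degraded, and adding a single Status.Unhealthy entry changes the result to Status.Unhealthy.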

Output standard schema

Health check responses follow a common convention rather than a strict standard. The typical pattern involves a "checks" property for individual components' statuses (e.g., database, memory), with each status rolling up to an overall "status" at the top level.

Basic format

{
  "status": "Healthy",
  "checks": {
    "check-name": { "status": "Healthy" },
    "check-name": { "status": "Healthy" }
  }
}

Example

{
  "status": "Healthy",
  "checks": {
    "database": { "status": "Healthy" },
    "memory": { "status": "Healthy" }
  }
}

Other common fields

Fields like description, tags, data, and exception provide additional metadata.

1. Description:

A textual explanation of what the health check is doing or testing.

{
 "status": "Healthy",
 "description": "Checks database connection and query speed."
}

2. Tags:

Labels or categories that group or identify related health checks.

{
 "status": "Healthy",
 "tags": ["database", "critical"]
}

3. Data:

Any additional information collected during the health check, often technical metrics or diagnostics.

{
 "status": "Degraded",
 "data": {
   "responseTime": "250ms",
   "maxAllowedResponseTime": "100ms"
 }
}

4. Exception:

Information about any error or failure encountered during the health check.

{
 "status": "Unhealthy",
 "exception": "TimeoutException: Database query timed out."
}

Overall example

{
  "status": "Unhealthy",
  "created": "12/12/2000 12:00:00 UTC",
  "cache-ttl": 5,
  "checks": {
    "database": {
      "status": "Unhealthy",
      "description": "Checks if the database is responding within an acceptable timeframe.",
      "tags": ["database", "critical"],
      "data": {
        "responseTime": "500ms",
        "maxAllowedResponseTime": "100ms"
      },
      "exception": "TimeoutException: Database query timed out."
    }
  }
}

These fields help provide a more granular view of the health status, making it easier to understand why a particular check is failing or succeeding.
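
One way to produce a response in this shape is with System.Text.Json. The sketch below is an assumption-laden illustration rather than a proposed implementation: CheckResult, HealthResponse, and HealthJson are hypothetical types, and the "created" field is left out because its timestamp format is not settled here.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model of a single check; optional fields are nullable.
public sealed class CheckResult
{
    [JsonPropertyName("status")] public string Status { get; init; } = "Healthy";
    [JsonPropertyName("description")] public string? Description { get; init; }
    [JsonPropertyName("tags")] public string[]? Tags { get; init; }
    [JsonPropertyName("data")] public Dictionary<string, string>? Data { get; init; }
    [JsonPropertyName("exception")] public string? Exception { get; init; }
}

// Hypothetical model of the top-level response.
public sealed class HealthResponse
{
    [JsonPropertyName("status")] public string Status { get; init; } = "Healthy";
    [JsonPropertyName("cache-ttl")] public int CacheTtl { get; init; }
    [JsonPropertyName("checks")] public Dictionary<string, CheckResult> Checks { get; init; } = new();
}

public static class HealthJson
{
    // Null properties are skipped so a healthy check emits only "status".
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true,
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    public static string Serialize(HealthResponse response)
        => JsonSerializer.Serialize(response, Options);
}
```

Because null properties are ignored, a check that sets only Status serializes without "description", "tags", "data", or "exception", which keeps the basic format and the detailed format on one model.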

(Additive) Data API builder config

The convention allows additional fields, so we could include DAB configuration data.

{
    "status": "Healthy",
    "version": "1.2.10",
    "dab-configuration": {
      "http": true,
      "https": true,
      "rest": true,
      "graphql": true,
      "telemetry": true,
      "caching": true,
      "dab-configs": [
        "/App/dab-config.json (mssql)"
      ],
      "dab-schemas": [
        "/App/schema.json (mssql)"
      ]
    },
    ...
}

Simple implementation

There is no formal guidance on check complexity; however, checks should not make the health endpoint unusable, and checks should implement a cancellation token to support timeouts.

using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

var healthChecks = builder.Services.AddHealthChecks();
healthChecks.AddCheck<CustomHealthCheck>("CustomCheck");

var app = builder.Build();
app.UseHttpsRedirection();
app.MapGet("/date", () => DateTime.Now.ToString());

app.UseHealthChecks("/health");

app.Run();

public class CustomHealthCheck : IHealthCheck
{
    // Always reports Healthy; a real check would do work and honor the token.
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
        => Task.FromResult(HealthCheckResult.Healthy());
}

In addition to one class for each check, we can reuse checks by leveraging the factory syntax:

string[] endpoints = ["/api1", "/api2"];
foreach (var endpoint in endpoints)
{
    healthChecks.Add(new HealthCheckRegistration(
        name: endpoint,
        factory: _ => new EndpointHealthCheck(endpoint),
        failureStatus: HealthStatus.Unhealthy,
        tags: ["endpoint"],
        timeout: TimeSpan.FromSeconds(10)));
}

Example API check

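// The primary constructor (C# 12) captures the endpoint passed by the factory above.
public class EndpointHealthCheck(string endpoint) : IHealthCheck
{
    // Shared client: creating an HttpClient per check can exhaust sockets.
    private static readonly HttpClient Client = new();

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var url = $"https://localhost:7128/{endpoint.Trim('/')}?$top=1";
        using var response = await Client.GetAsync(url, cancellationToken);

        // Any 2xx response counts as healthy; everything else is reported with its code.
        return response.IsSuccessStatusCode
            ? HealthCheckResult.Healthy()
            : HealthCheckResult.Unhealthy($"Invalid HTTP response: {(int)response.StatusCode}.");
    }
}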

Configuration changes

Because we have the configuration, we know if this is a stored procedure or table/view endpoint. We might want to allow the developer to influence how the checks work against the endpoint/entity.

{
  "runtime": {
    "health": {
      "enabled": true,                  (default: true)
      "cache-ttl": 5,                   (optional, default: 5)
      "roles": ["sean", "jerry", "*"]   (optional, default: *)
    }
  }
}
{
  "data-source": {
    "health": {
      "moniker": "sqlserver",           (optional, default: GUID)
      "enabled": true,                  (default: true)
      "query": "SELECT TOP 1 1",        (optional)
      "threshold-ms": 100               (optional, default: 10000)
    }
  }
}
{
  "entities": {
    "<entity-name>": {
      "health": {
        "enabled": true,                (default: true)
        "filter": "Id eq 1",            (optional, default: null)
        "first": 1,                     (optional, default: 1)
        "threshold-ms": 100             (optional, default: 10000)
      },
      ...
    }
  }
}
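
To show how the defaults listed above might be applied when a property is omitted, here is a minimal sketch using System.Text.Json. RuntimeHealthOptions, EntityHealthOptions, and HealthConfig are hypothetical names, not existing DAB types; the data-source section would follow the same pattern.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model of runtime.health; initializers carry the stated defaults.
public sealed class RuntimeHealthOptions
{
    [JsonPropertyName("enabled")] public bool Enabled { get; init; } = true;
    [JsonPropertyName("cache-ttl")] public int CacheTtl { get; init; } = 5;
    [JsonPropertyName("roles")] public string[] Roles { get; init; } = new[] { "*" };
}

// Hypothetical model of an entity's health section.
public sealed class EntityHealthOptions
{
    [JsonPropertyName("enabled")] public bool Enabled { get; init; } = true;
    [JsonPropertyName("filter")] public string? Filter { get; init; }
    [JsonPropertyName("first")] public int First { get; init; } = 1;
    [JsonPropertyName("threshold-ms")] public int ThresholdMs { get; init; } = 10000;
}

public static class HealthConfig
{
    // Properties missing from the JSON keep their initializer values.
    public static T Parse<T>(string json) where T : class, new()
        => JsonSerializer.Deserialize<T>(json) ?? new T();
}
```

For example, HealthConfig.Parse<EntityHealthOptions>("{ \"threshold-ms\": 100 }") yields ThresholdMs of 100 while Enabled, Filter, and First keep their defaults of true, null, and 1.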

Output sample

{
  "status": "Healthy",
  "version": "1.2.3.4",
  "created": "12/12/2000 12:00:00 UTC",
  "dab-configuration": {
    "http": true,
    "https": true,
    "rest": true,
    "graphql": true,
    "telemetry": true,
    "caching": true,
    "health-cache-ttl": 5,
    "dab-configs": [
      "/App/dab-config.json (mssql)"
    ],
    "dab-schemas": [
      "/App/schema.json"
    ]
  },
  "checks": {
    "database": {
      "status": "Healthy"
    },
    "<entity-name>": {
      "status": "Healthy"
    },
    "<entity-name>": {
      "status": "Healthy"
    }
  }
}

Questions

  1. What to show in development versus production?
     • Not an issue; use the global enabled setting.
  2. Should we introduce a formatted version? Like https://localhost/health/ui.
     • Not in the first effort.
  3. Should we create a DAB health JSON schema?
     • Yes!

Related issues to close

@JerryNixon JerryNixon added the enhancement New feature or request label Sep 6, 2024
@JerryNixon JerryNixon self-assigned this Sep 6, 2024
@seantleonard seantleonard added this to the October2024-March2025 milestone Sep 11, 2024
@JerryNixon JerryNixon pinned this issue Sep 26, 2024
@seantleonard

Another healthcheck example: DabHealthCheck.cs

internal class DabHealthCheck : IHealthCheck

@JerryNixon JerryNixon unpinned this issue Oct 3, 2024
@aaronpowell

It looks like https://github.com/Xabaril/AspNetCore.Diagnostics.HealthChecks has support across the four supported data sources for DAB. Would it be easier to add those internally to surface the health checks, or at least treat them as additive to DAB-specific ones?

Once there is some native health check info surfaced by DAB, I'd love to get it integrated in the .NET Aspire Community Toolkit integration (tracking via CommunityToolkit/Aspire#190).
