Observability and profiling middleware#799
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
8039edc to
e7f5ad5
Compare
|
e7f5ad5 to
4ce271c
Compare
|
This commit introduces `ProfileRequestMiddleware` to provide performance insights for API requests, migrating and generalizing functionality from AWX's `TimingMiddleware` and `AWXProfiler`. This makes the functionality available to any consumer of `django-ansible-base`. The middleware always adds an `X-API-Time` header to the response, indicating the total time taken to process the request. To help locate profiling data, it also adds an `X-API-Node` header with the cluster host ID if one is not already present in the response. When the `ANSIBLE_BASE_CPROFILE_REQUESTS` setting is enabled, the middleware will also perform a cProfile analysis for each request. The resulting `.prof` file is saved to a temporary directory on the node that served the request, and its path is returned in the `X-API-CProfile-File` response header. For the most accurate and reliable timing, it is recommended to add this middleware to the top of the `MIDDLEWARE` list in your settings. The core profiling logic is encapsulated in the `DABProfiler` class, which can be imported and used directly for profiling non-HTTP contexts, such as background tasks. This commit also includes a `README.md` file with documentation for the new middleware and profiler, including usage examples and tests in the test_app.
This commit refactors the SQL query profiling logic out of the AWX controller's API view and into a reusable middleware in the django-ansible-base library. This makes the feature available to all AAP components, such as the gateway and EDA. SQLProfilingMiddleware adds `X-API-Query-Count` and `X-API-Query-Time` headers to API responses. Changes include: - A new `SQLProfilingMiddleware` class in the profiling module. - Configuration is controlled by a new, namespaced setting, `ANSIBLE_BASE_SQL_PROFILING`, with a fallback to the legacy `SQL_DEBUG` setting for backward compatibility. - The middleware logs a warning if it is enabled while Django's `DEBUG` setting is `False`, as query logging is disabled in that state. - Unit tests have been added to ensure correct functionality and prevent regressions. - The profiling documentation has been updated to include usage instructions for the new middleware.
This enhances the `SQLProfilingMiddleware` to provide deeper observability by integrating it with a new request tracing context system. It introduces `TraceContextMiddleware`, which establishes a unique `trace_id` for each request using `contextvars`. This trace ID is then leveraged by the `SQLProfilingMiddleware` to inject a detailed comment, including the trace ID, route, and origin, into every SQL query. This allows for direct correlation between a web request and its database activity. Key changes include: - A new `TraceContextMiddleware` that sets and safely resets the request context. - The `SQLProfilingMiddleware` is updated to inject contextual SQL comments. - A new, isolated test suite for the `TraceContextMiddleware` to ensure it correctly handles the `X-Request-ID` header and prevents context bleeding between requests. - Existing tests were made more robust, and a state pollution issue was fixed by ensuring proper context cleanup in the test suite.
Refactors the existing `trace_context` manager to align its behavior with the `TraceContextMiddleware`, ensuring consistent and safe observability for non-web tasks. This commit addresses these issues by: - Modifying `trace_context` to only accept `origin` and `trace_id`. - Adding validation to ensure any provided `trace_id` is a valid UUID, matching the middleware's logic. - Removing the ability to create arbitrary context variables via `**kwargs`. Additionally, this change introduces: - Tests for the `trace_context` manager, covering its functionality, validation, thread-safety, and use as a decorator. - Documentation in the `README.md` with usage instructions and an example for background tasks.
675cb06 to
3ba596d
Compare
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
DVCS PR Check Results: Could not find JIRA key(s) in PR title, branch name, or commit messages |




What is being changed?
This commit introduces a ObservabilityMiddleware that provides request tracing, performance profiling, and secure SQL query commenting. This consists of three components (ProfileRequestMiddleware, SQLProfilingMiddleware, TraceContextMiddleware) to simplify testing and maintenance, with a single public middleware for apps to consume.
Why is this change needed?
The goal is to provide a consistent way for any django-ansible-base consumer to gain performance and debugging insights. Many of these features were previously implemented in AWX, but not available in other apps.
How does this change address the issue?
It creates a single middleware (ObservabilityMiddleware) that developers can add to the top of their MIDDLEWARE list. This ensures the correct ordering of tracing, profiling, and SQL commenting.
Type of Change
Self-Review Checklist
Testing Instructions
Prerequisites
1 MIDDLEWARE = [
2 'ansible_base.lib.middleware.observability.ObservabilityMiddleware',
3 ...
4 ]
ANSIBLE_BASE_PROFILE_TIMING=True
ANSIBLE_BASE_PROFILE_NODE=True
ANSIBLE_BASE_CPROFILE_REQUESTS=True
ANSIBLE_BASE_SQL_PROFILING=True
Note: ANSIBLE_BASE_CPROFILE_REQUESTS has significant performance implications and is intended for live debugging, not for permanent use in production.
To verify the SQL comments, you must have access to the database logs (e.g., by setting log_statement=all or enabling slow query logging in postgresql.conf).
The SQL commenting feature is most effective when combined with slow query logging, which can be configured to log only a percentage of requests to manage overhead.
Steps to Test
Setup aap-dev
https://github.com/ansible/aap-dev/pull/440
git clone git@github.com:kdelee/aap-dev.git cd aap-dev git checkout enable_observabilityConfigure Sources
export AAP_VERSION=2.7 make configure-sourcesWhen prompted, select:
kdelee/django-ansible-base@shared_middleware_with_trace_sqlkdelee/awx@add_dab_middleware_develkdelee/aap-gateway@add_dab_middlewarekdelee/eda-server@add_dab_middlewarehttps://github.com/ansible-automation-platform/aap-gateway/pull/1254
ansible/awx#16285
ansible/eda-server#1476
Test Profiling Disabled (Default)
Expected: All profiling headers absent.
Test Profiling Enabled
Expected: All profiling headers present, cProfile files collected, SQL queries tagged with trace_id.
Test Profiling Disabled Again
Expected: All profiling headers absent.
Additional Commands
# PostgreSQL commands make postgres-enable-logging make postgres-disable-logging make postgres-enable-pg-stat-statements make postgres-disable-pg-stat-statements make postgres-enable-observability make postgres-disable-observabilityAdditional Context
This work is a generic implementation of the profiling method originally used by @chrismeyersfsu for diagnosing https://issues.redhat.com/browse/AAP-50642.
Both Controller and Dab had SOME of this, but this unifies the approach so all the components act the same.
Required Actions
Requires documentation updates
Requires downstream repository changes
kdelee/awx@add_dab_middleware_develkdelee/aap-gateway@add_dab_middlewarekdelee/eda-server@add_dab_middleware- would need similar changes to hub etc
Requires infrastructure/deployment changes
Requires coordination with other teams
Blocked by PR/MR: #XXX
replaces #794