Accelerate Database Performance Troubleshooting with the Grafana Assistant Integration

Overview

When your database slows down, seconds matter. Grafana Cloud Database Observability already provides deep visibility into your SQL queries through RED metrics (Rate, Errors, Duration), execution samples, wait event breakdowns, table schemas, and visual explain plans. But visibility alone doesn't tell you what to do next: a spike in a query's P99 latency, or a cryptic wait event such as wait/synch/mutex/innodb, can leave you puzzled.

The new Grafana Assistant Integration bridges that gap by combining AI with the observability data already in Grafana Cloud. Instead of copying and pasting SQL into a separate AI tool, you ask the assistant, which queries your actual Prometheus and Loki data sources for the time window you are viewing and works from your live table schemas, indexes, and execution plans. Each tab in the query detail view offers purpose-built analysis actions designed by database engineers, so recommendations are grounded in your real data. Your query text and schema metadata are used only for the current analysis and are never stored or used for model training.


Prerequisites

Before you start troubleshooting with the Grafana Assistant, ensure you have the following in place:

Grafana Cloud Account – An active Grafana Cloud stack with access to the Database Observability application.
Database Observability Enabled – Your database (e.g., MySQL or PostgreSQL) must be instrumented and connected to Grafana Cloud via the appropriate exporter or agent.
Prometheus and Loki Data Sources – Metrics and logs from your database must be flowing into Prometheus and Loki, respectively.
Appropriate Permissions – Your user role must grant access to the Assistant feature and permission to query the underlying data sources.
Web Browser – A modern browser (Chrome, Firefox, Edge) with JavaScript enabled.

Step-by-Step Instructions

1. Identify the Offending Query

Navigate to the Database Observability dashboard in Grafana Cloud. Look for queries with high duration spikes or error rate increases. These are often listed in the overview or in a dedicated “Top Queries” panel. Click on the query to drill into its time-series performance data.
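To make "high duration spikes or error rate increases" concrete, here is a minimal Python sketch that flags candidate queries from per-query RED summaries. It assumes you already have, for each query, its current P99 latency, a baseline P99 from an earlier normal period, and its error rate; the `QueryStats` type, the `top_offenders` function, and the 2x spike and 1% error thresholds are hypothetical illustrations, not part of Grafana Cloud.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query RED summary; field names are illustrative only."""
    query: str
    calls_per_sec: float      # Rate
    error_rate: float         # Errors: fraction of calls that failed
    p99_ms: float             # Duration: current P99 latency
    baseline_p99_ms: float    # P99 latency over an earlier, normal window


def duration_spike(q: QueryStats) -> float:
    """How many times slower the current P99 is than the baseline P99."""
    if q.baseline_p99_ms <= 0:
        return float("inf") if q.p99_ms > 0 else 1.0
    return q.p99_ms / q.baseline_p99_ms


def top_offenders(stats, n=3, min_spike=2.0, max_error_rate=0.01):
    """Return up to n queries whose P99 spiked or whose error rate is too high,
    sorted by spike size, worst first."""
    flagged = [
        q for q in stats
        if duration_spike(q) >= min_spike or q.error_rate > max_error_rate
    ]
    return sorted(flagged, key=duration_spike, reverse=True)[:n]


# Made-up example data.
stats = [
    QueryStats(query="SELECT * FROM orders WHERE status = ?", calls_per_sec=120.0,
               error_rate=0.0, p99_ms=480.0, baseline_p99_ms=40.0),
    QueryStats(query="UPDATE users SET last_seen = ? WHERE id = ?", calls_per_sec=300.0,
               error_rate=0.0, p99_ms=5.0, baseline_p99_ms=4.0),
    QueryStats(query="INSERT INTO audit_log (event) VALUES (?)", calls_per_sec=50.0,
               error_rate=0.05, p99_ms=3.0, baseline_p99_ms=3.0),
]
for q in top_offenders(stats):
    print(f"{duration_spike(q):5.1f}x  {q.query}")
```

Sorting by spike ratio rather than raw latency keeps a normally slow reporting query from drowning out a normally fast query that has suddenly degraded.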

2. Open the Grafana Assistant

Inside the query detail view, locate the Assistant panel, which typically appears as a sidebar or a floating icon, and click it to open the assistant. You’ll see a chat interface with several predefined analysis actions, shown as buttons.

3. Use a Predefined Prompt

Instead of typing a generic question, click the “Why is this query slow?” button. The assistant immediately runs a series of background queries against your Prometheus and Loki data sources, scoped to the time window you are viewing. It then synthesizes signals such as the ratio of rows examined to rows returned, P99 versus median latency, CPU usage, and the percentage of execution time spent in wait events.
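The signals listed above are simple ratios. The Python sketch below computes them from a list of execution samples so you can see what each number means. It is not the assistant's implementation: the `ExecutionSample` fields, the `diagnose` helper, and the nearest-rank percentile method are assumptions for illustration, and the sample data is made up.

```python
import math
from dataclasses import dataclass


@dataclass
class ExecutionSample:
    """One execution of a query (hypothetical fields, for illustration)."""
    duration_ms: float
    cpu_ms: float
    wait_ms: float
    rows_examined: int
    rows_returned: int


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100], of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def diagnose(samples):
    """Compute the diagnostic ratios from a non-empty list of samples."""
    durations = [s.duration_ms for s in samples]
    total_ms = sum(durations)
    examined = sum(s.rows_examined for s in samples)
    returned = sum(s.rows_returned for s in samples)
    return {
        # Rows scanned per row sent back: high values mean wasted filtering work.
        "rows_examined_ratio": examined / max(returned, 1),
        # Tail vs. typical latency: high values point to intermittent slowness.
        "p99_to_median": percentile(durations, 99) / percentile(durations, 50),
        # Share of total execution time spent waiting rather than on CPU.
        "wait_share": sum(s.wait_ms for s in samples) / total_ms,
        "cpu_share": sum(s.cpu_ms for s in samples) / total_ms,
    }


# Made-up data: 97 fast executions and 3 slow ones, each scanning 500 rows to return 10.
samples = [ExecutionSample(10, 6, 4, 500, 10) for _ in range(97)]
samples += [ExecutionSample(120, 72, 48, 500, 10) for _ in range(3)]
for name, value in diagnose(samples).items():
    print(f"{name}: {value:.2f}")
```

With this data the ratios come out at 50 rows examined per row returned, a P99 twelve times the median, and 40% of time spent waiting, the same shape as the example assessment in the next step.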

4. Interpret the Results

The assistant returns a concise health assessment. For example, it might report that “Duration is spiking because the number of rows examined is 50 times the number of rows returned – most work is wasted on filtering. P99 is 12x the median, indicating an intermittent problem. CPU time is healthy, but wait events consume 40% of execution time.” The assistant also translates cryptic wait event names into plain language, e.g., wait/synch/mutex/innodb becomes “InnoDB mutex contention – multiple threads are waiting for a shared resource.”
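A minimal sketch of this kind of translation, assuming MySQL Performance Schema instrument names (PostgreSQL reports wait events differently). The prefix table is a small hand-written illustration whose wording paraphrases the example above; it is not the assistant's actual vocabulary, and `describe_wait_event` is a hypothetical helper that picks the longest matching prefix so that more specific entries win.

```python
# Hypothetical, hand-written table: MySQL Performance Schema instrument-name
# prefixes mapped to plain-language descriptions (illustrative wording only).
WAIT_EVENT_DESCRIPTIONS = {
    "wait/synch/mutex/innodb": "InnoDB mutex contention: multiple threads are waiting for a shared resource",
    "wait/synch/rwlock/innodb": "InnoDB read/write lock contention",
    "wait/io/file/innodb": "InnoDB file I/O: reading or writing data or log files on disk",
    "wait/io/table/sql/handler": "Table I/O: time spent reading or writing table rows",
    "wait/lock/table/sql/handler": "Table lock wait: blocked by another session holding a table lock",
    "wait/io/socket/sql": "Network socket I/O between the server and its clients",
}


def describe_wait_event(name: str) -> str:
    """Describe a wait event using the longest matching prefix in the table."""
    matches = [
        prefix for prefix in WAIT_EVENT_DESCRIPTIONS
        # Match whole path segments only, so "innodb" does not match "innodbX".
        if name == prefix or name.startswith(prefix + "/")
    ]
    if not matches:
        return f"Unrecognized wait event: {name}"
    return WAIT_EVENT_DESCRIPTIONS[max(matches, key=len)]


print(describe_wait_event("wait/synch/mutex/innodb/trx_sys_mutex"))
print(describe_wait_event("wait/io/file/innodb/innodb_data_file"))
```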

5. Take Action Based on Recommendations

Based on the analysis, the assistant may suggest specific actions:

Add an index to reduce the number of rows examined.
Rewrite the query to avoid full table scans.
Increase the buffer pool size to reduce I/O waits.
Review the application pattern that causes intermittent load.

You can also ask follow-up questions directly in the chat, such as “Show me the top wait events over the last hour” or “What indexes are available on this table?”
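How diagnostic signals lead to actions like those above can be sketched as a few independent rules. This is a hypothetical, simplified rule set in Python, not the assistant's logic. The signal names reuse the ratios from step 3, and the thresholds (10 rows examined per row returned, a P99 five times the median, 30% of time in waits) are placeholders you would tune for your workload.

```python
def suggest_actions(signals, examined_limit=10.0, tail_limit=5.0, wait_limit=0.3):
    """Map diagnostic signals to candidate actions using simple, hypothetical rules.

    `signals` is a dict with "rows_examined_ratio", "p99_to_median" and "wait_share".
    """
    actions = []
    if signals["rows_examined_ratio"] >= examined_limit:
        # Far more rows scanned than returned: the query filters too late.
        actions.append("Add an index or rewrite the query to avoid scanning discarded rows")
    if signals["wait_share"] >= wait_limit:
        # Time lost waiting, not computing: look at contention or I/O (e.g. buffer pool size).
        actions.append("Investigate wait events for contention or I/O before tuning CPU")
    if signals["p99_to_median"] >= tail_limit:
        # Slow tail with a fast typical case: the problem is intermittent.
        actions.append("Review the application pattern causing intermittent load")
    return actions


signals = {"rows_examined_ratio": 50.0, "p99_to_median": 12.0, "wait_share": 0.40}
for action in suggest_actions(signals):
    print("-", action)
```

Because the rules are independent, one query can receive several suggestions at once, as in the step 4 example where rows examined, tail latency, and waits are all elevated.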

6. Validate Changes

After making changes, return to the same query detail view and compare the performance metrics before and after the change, making sure your time window covers both periods. You can also rerun the same analysis in the assistant to confirm that the problem is resolved.
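Comparing before and after comes down to the relative change of each metric. A minimal Python sketch, assuming you have noted each metric's value for a window before and a window after the change; the metric names, the `compare` helper, and the numbers are hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("the 'before' value must be non-zero")
    return (after - before) / before * 100.0


def compare(before: dict, after: dict) -> dict:
    """Percent change for every metric recorded in both windows."""
    return {name: percent_change(before[name], after[name])
            for name in before if name in after}


# Hypothetical readings for the same query before and after adding an index.
before = {"p99_ms": 480.0, "median_ms": 40.0, "rows_examined_ratio": 50.0}
after = {"p99_ms": 60.0, "median_ms": 30.0, "rows_examined_ratio": 1.5}
for name, change in compare(before, after).items():
    # For latency and rows examined, a negative change is an improvement.
    print(f"{name}: {change:+.1f}%")
```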

Common Mistakes

Ignoring Wait Events – Many users focus only on execution time and ignore wait events. The assistant automatically highlights them, but skipping the analysis can lead to misdiagnosis.
Not Using Predefined Prompts – Typing free-form questions may yield generic answers. The predefined prompts are crafted by database engineers and target the most common issues.
Misinterpreting P99 vs. Median – A P99 far above the median means that only a small fraction of executions are slow, which points to an intermittent problem. Some users mistakenly treat it as a constant issue.
Assuming CPU is Always the Bottleneck – Healthy CPU with high wait events points to contention or I/O, not compute. The assistant clarifies this, but users sometimes ignore the distinction.
Forgetting to Refresh Data – If you change the time window, click the assistant’s refresh button to get a new analysis based on the updated window.

Summary

The Grafana Assistant Integration for Database Observability turns raw metrics and logs into actionable insights. By combining AI with live observability data, it reduces guesswork and speeds up root cause analysis. Use the predefined prompts to quickly diagnose slow queries, understand cryptic wait events, and receive targeted recommendations – all without leaving your Grafana Cloud dashboard.