[META] Query profiler support in Query Insights Dashboards #104
Comments
There are also several other visualizations we could use for the profiler output besides the flame graph. Let me explain them in the comment threads so we can discuss further from here! Note: I'll use the fake OpenSearch profiler output below to create the mock visualizations :)
One possibility is to use a Gantt chart to visualize the profiler results. Example Gantt chart: https://shybovycha.github.io/2020/08/02/gantt-chart-part2.html. The motivation is that a flame graph is fine for showing nested execution but weak for time-based analysis, which is what we actually care about when profiling query performance: we want to know how the search flows across different shards. If we can attach a timestamp to each phase of the profiler output, we can structure the phases as a timeline, in which case a Gantt chart makes far more sense than a flame graph. The benefits are:
Here’s a rough idea of how it could look: In the above mock chart:
As for the implementation, we can use D3.js for rendering the Gantt chart. One potential challenge is we also need to enhance the profiler API to give us certain execution timestamps.
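As a rough illustration of the data shaping this would need, here is a minimal Python sketch that turns timestamped phase records into Gantt bars. The `start_time_nanos` field is hypothetical: it stands in for the execution timestamps the profiler API would have to be enhanced to return. The record shape and the `GanttBar` and `to_gantt_bars` names are assumptions for illustration only, not an existing API.

```python
from dataclasses import dataclass


@dataclass
class GanttBar:
    shard: str
    phase: str
    start_ms: float  # offset from the earliest phase start
    end_ms: float


def to_gantt_bars(phases):
    """Convert timestamped phase records into bars relative to the earliest start."""
    if not phases:
        return []
    # Use the earliest start as the origin so the chart begins at 0 ms.
    origin = min(p["start_time_nanos"] for p in phases)
    bars = [
        GanttBar(
            shard=p["shard"],
            phase=p["phase"],
            start_ms=(p["start_time_nanos"] - origin) / 1e6,
            end_ms=(p["start_time_nanos"] - origin + p["time_in_nanos"]) / 1e6,
        )
        for p in phases
    ]
    # One row per shard, bars ordered by start time within the row.
    return sorted(bars, key=lambda b: (b.shard, b.start_ms))


# Made-up records; "start_time_nanos" is the hypothetical new field.
phases = [
    {"shard": "shard-0", "phase": "query", "start_time_nanos": 1_000_000, "time_in_nanos": 4_000_000},
    {"shard": "shard-0", "phase": "fetch", "start_time_nanos": 5_000_000, "time_in_nanos": 2_000_000},
    {"shard": "shard-1", "phase": "query", "start_time_nanos": 1_500_000, "time_in_nanos": 6_000_000},
]
bars = to_gantt_bars(phases)
```

Each bar maps to one rectangle in the chart; the rendering layer (D3.js or otherwise) would only need the shard row and the start/end offsets.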
If everyone’s on board, we can work on a quick POC using real profiler data and run user studies with it. I really think this will improve how we analyze profiler results and make it easier to debug slow queries.
Heatmap with treemap: If we want to focus on analyzing and comparing resource usage (e.g. CPU usage, memory usage) across shards and potentially identify hotspots in a profiled query, we can also use a heatmap with drill-down supported by treemaps. With other visualizations we can see execution times, as in the flame graph (or the proposed Gantt chart), but we don’t have a good way to compare resource consumption across shards. A heatmap would make it easy to spot shards that are consuming excessive resources. Here’s a mock example of what it could look like for the profiler (with CPU as the metric):
To drill down into a shard, we can support click-to-zoom on each cell, opening a treemap like the one below:
The benefits of using a heatmap and treemap include:
Again, to decide whether this is the ideal visualization, we need to do some user studies with a POC.
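To make the heatmap and drill-down idea a little more concrete, here is a minimal Python sketch of the two data shapes involved. It assumes per-shard and per-operation CPU figures are already available, which the profiler does not necessarily expose today; the input dictionaries and the `heatmap_cells` and `treemap_node` names are hypothetical.

```python
def heatmap_cells(cpu_nanos_by_shard):
    """Normalize per-shard CPU time to [0, 1] so it can drive a color scale."""
    if not cpu_nanos_by_shard:
        return []
    peak = max(cpu_nanos_by_shard.values())
    return [
        {
            "shard": shard,
            "cpu_nanos": value,
            # Guard against an all-zero input to avoid dividing by zero.
            "intensity": value / peak if peak else 0.0,
        }
        for shard, value in sorted(cpu_nanos_by_shard.items())
    ]


def treemap_node(shard, cpu_nanos_by_operation):
    """Build a nested node for drill-down: one shard, children sized by CPU time."""
    children = [{"name": op, "value": v} for op, v in cpu_nanos_by_operation.items()]
    children.sort(key=lambda c: c["value"], reverse=True)
    return {
        "name": shard,
        "value": sum(c["value"] for c in children),
        "children": children,
    }


# Made-up numbers, for illustration only.
cells = heatmap_cells({"shard-0": 200, "shard-1": 800, "shard-2": 400})
node = treemap_node("shard-1", {"fetch": 300, "query": 500})
```

The hottest shard gets intensity 1.0, and clicking its cell would swap in the nested node, whose children are already ordered from largest to smallest.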
@ansjcy We need to make an assumption that some of the operations happen in sequence (e.g. can_match followed by query followed by fetch), while some operations ... like at a shard level happen in parallel. This might not be true if a search request was run with a custom value for
We should probably look at which fields are relevant to graphing; the rest can be embedded in a hover tooltip or some other mechanism that surfaces the details on demand.
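The sequencing assumption above can be written down explicitly. The Python sketch below lays phases out back to back within each shard and starts every shard at offset 0, which is exactly the simplification being questioned here, so treat it as a statement of the assumption rather than a description of how OpenSearch actually schedules work. The input shape and the `assumed_timeline` name are hypothetical.

```python
# Assumed fixed phase order within a single shard.
PHASE_ORDER = ["can_match", "query", "fetch"]


def assumed_timeline(durations_by_shard):
    """Derive (phase, start, end) spans per shard without real timestamps.

    Assumes phases run sequentially within a shard and that all shards
    start in parallel at offset 0.
    """
    timeline = {}
    for shard, durations in durations_by_shard.items():
        offset = 0
        spans = []
        for phase in PHASE_ORDER:
            if phase not in durations:
                continue  # a shard may skip a phase entirely
            spans.append((phase, offset, offset + durations[phase]))
            offset += durations[phase]
        timeline[shard] = spans
    return timeline


# Made-up durations in milliseconds.
timeline = assumed_timeline({
    "shard-0": {"query": 40, "can_match": 5, "fetch": 10},
    "shard-1": {"query": 60},
})
```

If the profiler later returns real timestamps, this derivation can be dropped and the measured start times used directly.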
Is your feature request related to a problem?
Currently, OpenSearch users lack native UI tooling to analyze query performance bottlenecks, as outlined in OpenSearch-Dashboards Issue #571. This gap forces developers to rely on the profiling API and manual log analysis when debugging slow queries. In addition, the absence of integrated profiling capabilities within Query Insights (metadata, similar problematic queries from history, recommendations, etc.) prevents users from connecting Top N queries with more granular, real-time execution details and with guidance on how to improve the profiled queries.
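For a sense of what that manual analysis involves, here is a minimal Python sketch that walks a profile response and lists the slowest query components. It relies on the `profile.shards[].searches[].query[]` layout with `type`, `time_in_nanos`, and `children` fields; the sample response is hand-written and heavily trimmed for illustration, and the helper names are hypothetical.

```python
def flatten_query_nodes(nodes, shard_id, depth=0):
    """Yield one flat row per query node, recursing into children."""
    for node in nodes:
        yield {
            "shard": shard_id,
            "depth": depth,
            "type": node["type"],
            "time_in_nanos": node["time_in_nanos"],
        }
        yield from flatten_query_nodes(node.get("children", []), shard_id, depth + 1)


def slowest_components(response, top=3):
    """Return the `top` query nodes with the largest time_in_nanos."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            rows.extend(flatten_query_nodes(search["query"], shard["id"]))
    return sorted(rows, key=lambda r: r["time_in_nanos"], reverse=True)[:top]


# Hand-written, trimmed sample; real responses carry many more fields.
response = {
    "profile": {
        "shards": [
            {
                "id": "[node1][idx][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "time_in_nanos": 900,
                                "children": [
                                    {"type": "TermQuery", "time_in_nanos": 600},
                                    {"type": "TermQuery", "time_in_nanos": 200},
                                ],
                            }
                        ]
                    }
                ],
            }
        ]
    }
}
slowest = slowest_components(response, top=2)
```

A parent node's time generally includes the time of its children, so a ranking like this tends to surface parents first; the `depth` field helps tell the two apart.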
What solution would you like?
We propose a Profiling UI integrated with the Query Insights dashboard to create an end-to-end performance analysis experience. We also want to support displaying historical similar rogue queries in the profiler utilizing the Top N Queries data. Furthermore, we intend to bridge the gap between understanding an issue and knowing what to do to resolve it in the profiler - the profiler should surface actionable recommendations to users on how to resolve and improve problematic queries, such as rewriting queries, adjusting underlying index configurations, or enabling specific OpenSearch features.
More specifically, with the below mock profiler page,
we want to enable users to:
Deep Inspection of Any Query
Context-Aware Navigation from the Top N Queries Page
One-click profiling from Top Queries list to the profiling page for further root-cause analysis.
Correlate profiled queries with similar queries in the historical Top N queries (as shown under "Similar Queries In Top N Historical Queries" in the mock screen above). Arguably, we should also provide visualizations for these comparisons, such as a time-series heatmap or a parallel coordinates plot for closely comparing different phases and dimensions, similar to the chart shown below.
Intelligent Recommendations
Subtasks
What alternatives have you considered?
External Profiling Tools
While third-party APM solutions exist, they require complex instrumentation and lack native integration with OpenSearch query metadata.
Do you have any additional context?