Editor's Note: FlameScope is Netflix's open source performance visualization tool for analyzing variance, perturbations, single-threaded execution, application startup, and other time-based issues. It can greatly simplify the workflow and make the work more intuitive. This article is translated from the original Medium article titled "Netflix FlameScope".
FlameScope visualizes data in the form of flame graphs. It first displays the input data as an interactive sub-second offset heat map, which brings out different patterns over time, and then generates a flame graph for whatever time range is selected. In other words, you can choose an arbitrary continuous time slice of the captured profile and view it as a flame graph.
Sub-second offset heat map
A flame graph normally shows an entire profile at once, for example one minute of data. This is convenient for analyzing steady workloads, but there are usually small perturbations within that minute, and finding them in the full profile is like looking for a needle in a haystack. FlameScope solves this by starting with a sub-second offset heat map that makes these small perturbations visible. The user can then select any of them, that is, any continuous time slice of the profile, and study it as a flame graph.
The sub-second offset heat map works as shown in the figure, a mock heat map with ten rows. The x-axis represents time, with each column representing one second. The y-axis is also time: it represents the sub-second offset within that second. Color represents the number of samples that fall in each time range: the more samples, the darker the color.
For example, an event with a timestamp of 11.25 seconds is drawn in column 11 on the x-axis, with a y coordinate a quarter of the way up from the bottom. The more events that occur around 11.25 seconds, the darker that cell will be.
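The mapping from a timestamp to a heat map cell can be sketched in Python as follows. This is a minimal illustration, not FlameScope's actual code: the function names `heatmap_cell` and `build_heatmap` are hypothetical, the ten rows follow the mock heat map described above, and rows are assumed to be counted from the bottom starting at zero.

```python
import math
from collections import Counter


def heatmap_cell(timestamp, rows=10):
    """Map a timestamp in seconds to a (column, row) heat map cell.

    The column is the whole second; the row is the sub-second offset
    scaled to the number of rows (row 0 is assumed to be the bottom).
    """
    column = math.floor(timestamp)
    offset = timestamp - column              # fraction of the second, 0 <= offset < 1
    row = min(int(offset * rows), rows - 1)  # clamp to guard against float rounding
    return column, row


def build_heatmap(timestamps, rows=10):
    """Count samples per cell; a higher count would be drawn in a darker color."""
    return Counter(heatmap_cell(t, rows) for t in timestamps)


# Hypothetical sample timestamps, in seconds since the start of the profile.
counts = build_heatmap([0.0, 11.25, 11.29, 11.9])
for cell, count in sorted(counts.items()):
    print(cell, count)
```

With ten rows each cell covers 100 milliseconds, so the events at 11.25 and 11.29 seconds land in the same cell and make it darker.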
Select range example
The following is an example of selecting a range.
This CPU profile has a lot going on. The CPU is busy between 0 and 5 seconds, which appears darker. It is also busy around 34 and 94 seconds (which looks like a 60-second periodic task), but for a shorter duration. There are also occasional bursts of activity lasting about 80 milliseconds, all of which appear as short dark red stripes.
Any of these details can be selected in FlameScope, and the selected range is then drawn as a flame graph. The following image shows the selection of one of the short red stripes.
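The step from a selected time range to flame graph input can be sketched roughly as below. This is an illustration under stated assumptions, not FlameScope's implementation: samples are assumed to already be parsed into `(timestamp, stack)` pairs, the function name `fold_range` and the sample data are hypothetical, and the output uses the semicolon-separated "folded" stack form that flame graph tools commonly accept.

```python
from collections import Counter


def fold_range(samples, start, end):
    """Count identical stacks among samples with start <= timestamp < end.

    Each sample is a (timestamp_seconds, stack) pair, where stack is a
    sequence of frame names ordered from root to leaf.
    """
    folded = Counter()
    for timestamp, stack in samples:
        if start <= timestamp < end:
            folded[";".join(stack)] += 1
    return folded


# Hypothetical samples: a short burst around 11.25 seconds.
samples = [
    (0.50, ("main", "work")),
    (11.25, ("main", "burst")),
    (11.27, ("main", "burst")),
    (11.40, ("main", "idle")),
]

# Select only the burst and print one "stack count" line per unique stack.
for stack, count in fold_range(samples, 11.2, 11.3).items():
    print(stack, count)
```

Only the samples inside the selected slice contribute to the counts, which is why a short burst is not drowned out by the rest of the profile.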
FlameScope can read Linux perf profiles. I have been going through profiles collected in earlier investigations, and they show variation that I had not known about.
Of course, not all profiles are interesting. Some look like TV static: a steady workload with random request arrivals and latency.
FlameScope was created by the Netflix Cloud Performance Team. The main contributors so far have been Vadim Filanovsky, myself, Martin Spier, and our manager, Ed Hunter, who has supported the project throughout. The project began with a misbehaving microservice that suffered spikes every 15 minutes or so for no known reason. Vadim found that the spikes were accompanied by an increase in CPU utilization that lasted only a few seconds. He tried to collect a CPU flame graph to explain it, but could not reliably capture the problem in a one-minute flame graph because the workload kept fluctuating. Capturing two or three minutes did not help either, since the problem was drowned out by the normal workload profile. So he turned to me for help.
Since I had a two-minute profile, I cut it into 10-second ranges and created a flame graph for each range. This approach looked useful because it revealed the variation, so I cut further, down to one-second windows. Browsing these short windows to find the problem was tedious, so I wanted a faster method. The sub-second offset heat map was a tool I had invented many years ago that had not seen much use since. I realized it could be a good way to browse a profile, showing variation not only from one second to the next but also within fractions of a second.
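The manual approach of cutting a profile into fixed windows can be sketched as follows. This is a hypothetical illustration of the idea rather than the scripts actually used: the function name `fold_by_window`, the `(timestamp, stack)` sample format, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def fold_by_window(samples, window_seconds=10.0):
    """Group samples into fixed-size time windows and count stacks per window.

    Returns a dict mapping window index (0 covers 0-10 s, 1 covers 10-20 s,
    and so on for the default size) to a Counter of folded stacks.
    """
    windows = defaultdict(Counter)
    for timestamp, stack in samples:
        index = int(timestamp // window_seconds)
        windows[index][";".join(stack)] += 1
    return dict(windows)


# Hypothetical samples spread over the first 30 seconds of a profile.
samples = [
    (3.0, ("main", "startup")),
    (9.9, ("main", "startup")),
    (10.0, ("main", "serve")),
    (25.5, ("main", "serve")),
]

for index, folded in sorted(fold_by_window(samples).items()):
    print(index, dict(folded))
```

Each window would then need its own flame graph, which shows why this gets tedious as the windows shrink: a two-minute profile cut into one-second windows yields 120 graphs to inspect.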
I immediately built a quick prototype to show that the idea worked, and discussed with Martin how to turn it into a real tool. The annotated heat map in this article shows Vadim's original profile. The issue is the CPU activity in the first few seconds. Martin did most of the architectural design and coding work for FlameScope, which includes his newer d3-based flame graph implementation, d3-flame-graph, and a d3-based heat map, d3-heatmap2.
We plan to add more features, and we will file tickets on GitHub to make it easier for anyone willing to help. In the future, the project will cover more interactive features such as palette selection and data transformations. There will also be a button to save the final flame graph as a standalone SVG, and support for other profile sources, not only Linux perf. We are also working on a way to show the difference between the selected range and the baseline (the entire profile).