File access optimization discovery and visualization
Fast and Lean
Optimization Classification
The classification process proceeds as described earlier in the "Optimization Possibilities" section. Each function pair receives a classification category, as well as an additional information string explaining the reasons for the categorization.
The first check is whether any other file access call occurs between the checked function pair. If the time-filtered, reduced dataframe described earlier is empty, no call falls between the pair. In this case, the optimization receives category 1, meaning it can be easily optimized, with the additional information that no calls fall between the pair.
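A minimal Python sketch of this first check might look like the following; the dataframe argument and the return values are assumptions for illustration, not the script's actual code:

import pandas as pd

def classify_pair(calls_between: pd.DataFrame):
    # calls_between holds all calls that fall between the two calls
    # of the repeated function pair (already filtered by time).
    if calls_between.empty:
        # Nothing happens between the repeated calls: easy to optimize.
        return 1, "no calls between the function pair"
    # Otherwise the calls in between must be inspected further,
    # as described in the following checks.
    return None, "further checks required"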
If calls are found between the function pair, further checks have to be made. To check each case efficiently, the script extracts relevant information from the consecutive repeated calls. First, the script retrieves the name of the repeated function, which is the same for both calls in the pair. The only two possible function types here are write and read.
Next, the script determines the byte range inside the file that both calls affect, using the offset information added earlier after each function call. The start of the affected byte range is the offset before the first function call, which is obtained by subtracting the first function's read or write byte count from the offset after it. In turn, the end of the affected byte range is one byte before the offset after the second function call.
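Expressed as a small Python sketch with assumed column names (offset_after and byte_count are illustrative, not the script's real identifiers), the byte range of a pair could be computed like this:

def pair_byte_range(first_call, second_call):
    # Offset before the first call: offset after it minus the bytes
    # it read or wrote.
    start = first_call["offset_after"] - first_call["byte_count"]
    # The affected range ends one byte before the offset after the
    # second call of the pair.
    end = second_call["offset_after"] - 1
    return start, end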
Lists defined in the script group all function names, both POSIX and MPI, that exhibit the same behavior, so the script treats them equally.
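Such lists could be defined as in the following sketch; the entries shown are only a few representative POSIX and MPI names, not the complete set the script uses:

# Function names that behave like a read, a write, or a seek.
READ_FUNCTIONS = ["read", "pread", "fread", "MPI_File_read", "MPI_File_read_at"]
WRITE_FUNCTIONS = ["write", "pwrite", "fwrite", "MPI_File_write", "MPI_File_write_at"]
SEEK_FUNCTIONS = ["lseek", "fseek", "MPI_File_seek"]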
To check how a function in between interferes with the function pair, the script needs the repeated function, as well as the start and end of the affected byte range for the function pair.
The script iterates over the previously retrieved list of calls between the pair, checking each call for its effect on the optimization potential and classifying the pair accordingly. If one of the calls prevents an optimization, the loop breaks and the optimization point receives a category 0 or 3 classification, with information on why the optimization is not possible or cannot be handled.
For each check, the script extracts the relevant details of the current call, including the called function and the range of the affected byte area. Called functions are of type write, read, or seek.
As before, the start of the byte range is the offset after the function call minus the bytes written or read by the function. In the case of a seek function, the start of the byte range is simply the offset after the function call.
The end of the byte range of the call in between is one byte before the offset after the function call. For a seek function, the byte range is only a single point, because a single seek does not affect a range of bytes; hence, the start and end of the byte range are the same in such cases.
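A sketch of this extraction, reusing the name lists from above and the same assumed column names, could look like this:

def call_byte_range(call):
    if call["function"] in SEEK_FUNCTIONS:
        # A seek only moves the cursor, so the range is a single point.
        start = call["offset_after"]
        end = start
    else:
        # Reads and writes affect the bytes from the offset before the
        # call up to one byte before the offset after the call.
        start = call["offset_after"] - call["byte_count"]
        end = call["offset_after"] - 1
    return start, end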
The script can now compare this information with the details of the repeated functions and perform the checks efficiently.
If the repeated function is a read type, various cases can arise depending on the call in between. An optimization is possible in the following cases: when the call in between is another read or a seek, regardless of the byte range, and when the call in between is a write whose byte range ends before or starts after the read byte range.
Therefore, optimizing repeated read functions is impossible only when the call in between is a write that interferes with the byte range affected by the read functions. On the other hand, if the repeated functions are of a write type, optimizations are possible when the call in between is a read or seek that occurs before or after the byte range affected by the writes.
In all other cases, optimizations are impossible because the write functions alter the content in their byte range. Any function in between that overlaps with the written byte area prevents merging the repeated writes, because merging would lead to inconsistencies.
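Put into code, the decision about whether a single call in between blocks the merge could be sketched roughly as follows, again building on the helpers and name lists from the earlier sketches:

def call_blocks_merge(repeated_function, pair_start, pair_end, call):
    call_start, call_end = call_byte_range(call)
    overlaps = not (call_end < pair_start or call_start > pair_end)
    if repeated_function in READ_FUNCTIONS:
        # Only a write into the read byte range prevents the merge.
        return call["function"] in WRITE_FUNCTIONS and overlaps
    if repeated_function in WRITE_FUNCTIONS:
        # Only a read or seek outside the written byte range is harmless.
        harmless = (call["function"] in READ_FUNCTIONS + SEEK_FUNCTIONS
                    and not overlaps)
        return not harmless
    # Unknown repeated function: treat it as blocking to stay safe.
    return True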
The mergeable cases are classified as category 2, along with information explaining why the call in between does not interfere with merging the repeated functions.
If the script encounters a function it cannot handle, it assigns category 3, along with the name of the unknown function for debugging purposes. This situation is made visible later in the visualization, so you can see that it is unclear whether an optimization is possible in these cases.
In any other case, the optimization point receives the default category 0, with information on the problem that prevents a merge.
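Taken together, classifying the calls between a pair could then look roughly like this sketch:

def classify_calls_between(repeated_function, pair_start, pair_end, calls_between):
    known = READ_FUNCTIONS + WRITE_FUNCTIONS + SEEK_FUNCTIONS
    for _, call in calls_between.iterrows():
        if call["function"] not in known:
            # Unknown function: unclear whether an optimization is possible.
            return 3, "unknown function " + call["function"]
        if call_blocks_merge(repeated_function, pair_start, pair_end, call):
            return 0, call["function"] + " interferes with the affected byte range"
    # No call in between prevents merging the repeated functions.
    return 2, "calls between do not interfere with merging"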
Either way, the script adds the result as an entry to the dataframe described earlier, which contains the analyzed optimizations.
After the script analyzes all potential optimizations and the results are in the new dataframe, it writes the data into the InfluxDB instance and sets the category number as the field value of the created points, with all other information added as tags. The timestamp of each point is the mean timestamp of the analyzed function pair.
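With the official influxdb-client package for Python, writing one such classified point could look roughly like this; the bucket, measurement, and tag names are assumptions for illustration:

from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Category as the field value, all other information as tags, and the
# mean timestamp of the function pair as the point's timestamp.
pair_mean_timestamp = datetime.now(timezone.utc)  # placeholder value
point = (
    Point("analyzed_optimizations")
    .tag("job", "example_job")
    .tag("info", "no calls between the function pair")
    .field("category", 1)
    .time(pair_mean_timestamp)
)
write_api.write(bucket="libiotrace", record=point)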
In the end, the analyzed optimization data is available in the InfluxDB instance, in addition to the offset-enriched file access data, which enables the dashboard to display clearly and effectively when and how optimizations are possible.
Running the Script
When tracing a program with libiotrace, the created data has a corresponding job name. The script only requires this job name to identify the data it needs to enrich and analyze. Additionally, it requires a name for the new measurement in which the data will be stored in the InfluxDB instance.
Adjusting this information before running the script ensures that the desired data is analyzed for optimizations. The script performs the described procedure for each file access within the selected job, processing each file sequentially and writing all data about cursor position changes and the analyzed optimizations into the InfluxDB instance.
Apart from debugging and other helpful output, the script also prints a URL at the end. This URL takes the user to the Grafana dashboard with all settings preselected according to the analyzed data: it sets the dashboard variables for the InfluxDB bucket, the measurement containing the data, and the analyzed job name, and it also sets the time frame, which makes the dashboard easy to use in combination with the script.
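Grafana accepts dashboard variables and the time range as URL query parameters (var-<name>, from, and to), so the generated link could be assembled along the following lines; the dashboard UID, slug, and variable names are assumptions for illustration:

from urllib.parse import urlencode

params = {
    "var-bucket": "libiotrace",         # InfluxDB bucket variable
    "var-measurement": "analyzed_run",  # measurement containing the data
    "var-job": "example_job",           # job name of the traced program
    "from": 1700000000000,              # time range in epoch milliseconds
    "to": 1700000600000,
}
url = "http://localhost:3000/d/abc123/io-optimizations?" + urlencode(params)
print(url)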
Visualization with Grafana
The dashboard comprises two sections in different panels. Its main goal is to display the traced program's possible optimizations in an intuitive and easy-to-understand way, reducing the information shown so that you gain insights without feeling overwhelmed. Because the script-generated URL sets all the variables and the time frame, it immediately presents the desired information.
Grafana's dashboard settings at the top can be changed in drop-down boxes, but you do not have to set them manually; keeping manual adjustment optional was an important design decision to simplify the dashboard's usage. The dashboard is still interactive, so you can adjust the variables yourself. Some panels are clickable, which automatically sets specific variables to show the desired information, as described in the following sections.
Flux queries dynamically retrieve the available values for each variable:
import "influxdata/influxdb/schema"
schema.tagValues(
  bucket: "${bucket}",
  tag: "_measurement",
  predicate: (r) => true,
  start: -1000d
)

This way, the variables automatically include all possible options of the current context without requiring manual updates. Also, some variables and their available options depend on other, sometimes higher level, variables.
The panels display data according to the variable values selected, and as mentioned earlier, you can modify the variables interactively in the panels.
