Monitor and optimize Fibre Channel SAN performance

Tune Up

Watch Out for Multipathing

Standard operating system settings often lead to an imbalance in data traffic, so pay attention to the Fibre Channel multipathing of your servers, where frequently only one of several connections is actively used. This imbalance then extends to the SAN and ultimately to the storage array, and performance bottlenecks occur far more frequently in such configurations. Modern storage systems work in active-active mode over all available controllers and ports, and you will want to leverage these capabilities for the benefit of your environment.
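As a rough illustration of how such an imbalance could be spotted, the following Python sketch compares each path's share of a LUN's throughput with its fair share. The path names, the sample values, and the 25 percent tolerance are assumptions made for this example; in practice, the per-path figures would come from your multipathing driver or monitoring tool.

```python
def path_shares(path_mbps):
    """Return each path's share of the total throughput (0.0 to 1.0)."""
    total = sum(path_mbps.values())
    if total == 0:
        return {path: 0.0 for path in path_mbps}
    return {path: mbps / total for path, mbps in path_mbps.items()}


def is_imbalanced(path_mbps, tolerance=0.25):
    """Flag a LUN whose busiest path exceeds its fair share by more than `tolerance`."""
    if len(path_mbps) < 2:
        return False
    shares = path_shares(path_mbps)
    fair_share = 1.0 / len(path_mbps)
    return max(shares.values()) > fair_share + tolerance


# Hypothetical samples in MBps: one path carries everything vs. an even spread
active_passive = {"hba0:ctrlA": 400.0, "hba0:ctrlB": 0.0,
                  "hba1:ctrlA": 0.0, "hba1:ctrlB": 0.0}
round_robin = {"hba0:ctrlA": 105.0, "hba0:ctrlB": 98.0,
               "hba1:ctrlA": 101.0, "hba1:ctrlB": 96.0}

print(is_imbalanced(active_passive))
print(is_imbalanced(round_robin))
```

The first sample is flagged because a single path carries all of the traffic, whereas the evenly spread sample is not.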

Sometimes the use of vendor-specific multipathing drivers can be expedient. These drivers are typically slightly better matched to the capabilities of the storage array, offer more specific options, and are often better suited for monitoring than standard operating system drivers. On the other hand, if you want to keep your servers regularly patched, such third-party software brings a certain overhead for version maintenance and compatibility checks.

Optimizing Data Streams with QoS

Service providers who simultaneously support many different customers with many performance-hungry applications in their storage environments need to ensure that mission-critical applications are assigned the required storage performance in a stable manner at all times. An advantage for one application can be a disadvantage for another. A consistent quality of service (QoS) strategy allows for better planning of data streams and means that critical servers and applications can be prioritized from a performance perspective.

Vendors of storage systems, SAN components, and HBAs take different technical approaches to this problem, but these approaches are not related to one another. At no point can the data flow be centrally controlled and regulated across all components. Moreover, most solutions do not make a clear distinction between normal operation and failure mode. For example, if performance problems occur within the SAN, the storage system stoically retains its prioritized settings, because it knows nothing about the problem.

Although initial approaches have been made for communication between HBAs and SAN components to act across the board, they only work with newer models and are only available for a few performance metrics. Special HBAs and their drivers support prioritization at the LUN level on the server. The drawback is that you have to set up each individual server, which can be a mammoth task with hundreds of physical servers – not to mention the effort of large-scale server virtualization.

Various options also exist for prioritizing I/Os in SAN components. Basically, the data stream could be directed through the SAN with the use of virtual fabrics or virtual SANs (e.g., to separate test and production systems or individual customers logically from each other). However, this method is not well suited for a more granular distribution of important applications, because the administrative overhead and technical limitations speak against it. For this purpose, you can instead route the data flow of servers through specially prioritized zones in the SAN. In this way, the frames of high-priority zones receive the right of way and are preferred in the event of a bottleneck.

On the storage systems themselves, QoS functionalities have been established for some time and are therefore the most developed. Depending on the manufacturer or model, data throughput can be limited in terms of megabytes or I/O operations per second for individual LUNs, pools, or servers – or, conversely, prioritized at the same level. Such functions require permanent performance monitoring, which is usually available under a free license with modern storage systems. Depending on the setting options, less prioritized data is then either throttled permanently or only sent to the back of the queue when a bottleneck is looming.
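The difference between permanent throttling and throttling only in the event of a bottleneck can be sketched in a few lines of Python. This is a simplified model, not the algorithm of any particular storage array: The LUN names, the IOPS figures, and the strict priority order are assumptions made for the example.

```python
def apply_fixed_limits(demand, limits):
    """Permanent throttling: cap each LUN at its configured IOPS limit, if any."""
    return {lun: min(iops, limits.get(lun, iops)) for lun, iops in demand.items()}


def allocate_iops(demand, priority, capacity):
    """Throttle only under contention, serving higher priority LUNs first.

    If the total demand fits within `capacity`, nothing is throttled.
    A lower priority number means a higher priority.
    """
    if sum(demand.values()) <= capacity:
        return dict(demand)
    granted = {}
    remaining = capacity
    for lun in sorted(demand, key=lambda name: priority[name]):
        granted[lun] = min(demand[lun], remaining)
        remaining -= granted[lun]
    return granted


# Hypothetical demand in IOPS per LUN
demand = {"erp_db": 30000, "mail": 15000, "test": 20000}
priority = {"erp_db": 1, "mail": 2, "test": 3}

print(apply_fixed_limits(demand, {"test": 10000}))
print(allocate_iops(demand, priority, capacity=80000))
print(allocate_iops(demand, priority, capacity=50000))
```

With a fixed limit, the test LUN is always capped at 10,000 IOPS, even if the array is idle. With contention-based allocation, it only loses performance once the total demand exceeds what the array can deliver.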

However, be aware that applications in a dynamic IT landscape lose priority during their life cycle and that you will have to adjust the settings associated with them time and time again. Whether you're prioritizing storage, SAN, or servers, you should always choose only one of these three levels at which you control the data stream; otherwise, you could easily lose track in the event of a performance problem.

Determining Critical Performance KPIs

The basis for the efficient provision of SAN capacities is good, permanent monitoring of all important SAN performance indicators. You should know your key performance indicators (KPIs) and document these values over a long period of time. Whether you work with vendor performance tools or with higher level central monitoring software that queries the available interfaces (e.g., SNMP, SMI-S, or REST API), defining KPIs for servers, SAN, and storage is decisive. On the server side, the response times or I/O wait times of the LUNs or disks are certainly an important factor, but the data throughput (MBps) for the connected HBAs also can be helpful.
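One possible use of documented long-term values is a baseline against which current measurements are compared. The following Python sketch assumes that you have collected LUN response times in milliseconds. The sample values and the factor of three standard deviations are illustrative assumptions, not recommendations.

```python
import statistics


def baseline(samples_ms):
    """Summarize historical response times as mean and standard deviation."""
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.pstdev(samples_ms),
    }


def deviates(current_ms, base, factor=3.0):
    """True if the current value lies more than `factor` deviations above the mean."""
    return current_ms > base["mean"] + factor * base["stdev"]


# Hypothetical history of LUN response times in milliseconds
history = [2.0, 2.0, 4.0, 4.0]
base = baseline(history)

print(base)
print(deviates(5.0, base))
print(deviates(7.0, base))
```

The longer the documented history, the more meaningful such a comparison becomes, because daily and weekly load patterns are then included in the baseline.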

Within the SAN, you need to pay special attention to all ISL connections, because this is where a bottleneck in data throughput often occurs or, as described, buffer credits are missing. You can also set alerts for all SAN ports that trigger when 80 or 90 percent of the maximum data throughput rate is reached, which lets you monitor all HBAs and storage ports for this metric. However, you should be a little more conservative with the monitoring parameters and feel your way forward slowly. Experience has shown that approaching bottlenecks are often overlooked if too many alerts are regularly received and have to be checked manually.
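A threshold alert of this kind could look like the following Python sketch. To keep the number of alerts manageable, it only reports a port when the threshold is exceeded for several consecutive samples. This damping rule, the assumed port maximum of 1,600MBps, and the sample values are assumptions for the example.

```python
def sustained_breach(samples_mbps, max_mbps, threshold=0.8, min_consecutive=3):
    """True if utilization stays at or above `threshold` for
    `min_consecutive` samples in a row."""
    run = 0
    for sample in samples_mbps:
        if sample / max_mbps >= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


PORT_MAX_MBPS = 1600.0  # assumed maximum throughput of the port

# Hypothetical throughput samples in MBps, one per polling interval
steady_load = [900.0, 1300.0, 1350.0, 1400.0, 1000.0]
short_spikes = [1300.0, 900.0, 1350.0, 1000.0, 1400.0]

print(sustained_breach(steady_load, PORT_MAX_MBPS))
print(sustained_breach(short_spikes, PORT_MAX_MBPS))
```

The first series triggers an alert because three samples in a row lie above 80 percent of the port maximum. The isolated spikes in the second series do not.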
