Photo Johannes Plenio on Unsplash

Storage protocols for block, file, and object storage

Evolutionary Theory

Article from ADMIN 63/2021
The future of flexible, performant, and highly available storage.

Current developments such as computational storage or storage class memory are receiving a great deal of attention as future high-performance storage, but you still need to understand whether, and to what extent, block and file storage, storage area networks (SANs), network-attached storage (NAS), object storage, or global clustered filesystems will continue to provide the basis for new technologies, especially if you want to assess the implications for your own IT environment correctly.

In the storage sector in particular, experts and manufacturers bandy about abbreviations and technical terms, and the pace of development seems undiminished. Above all, the speed at which innovations enter the market is surprising. On the other hand, the storage protocols that guarantee data access at the block or file level are extremely old, although they still provide the technological basis for using data storage sensibly at all. At the same time, a number of new questions are emerging: Will there be a radical break in the transport layer at some point? Will we have to deal with more and more technology options that exist in parallel? To find an answer, it helps to assess how storage protocols have developed and where they are heading.

Block Storage as SAN Foundation

Classic block-level storage protocols are used for data storage on storage networks – typically Fibre Channel (FC) storage area network (SAN) – or cloud-based storage environments – typically Internet SCSI (iSCSI). Nothing works in the data center without block storage, but how does block storage itself work?

Storage data is divided into blocks that are assigned specific identifiers as separate units. The storage network then deposits the data blocks where it is most efficient for the respective application. When users subsequently request their data from a block storage system, the underlying storage system reassembles the blocks and presents them to the user and the application.
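The split, place, and reassemble cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-byte block size, the round-robin placement, and the function names (split_into_blocks, place_blocks, reassemble) are hypothetical and do not describe how any particular storage array works.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # bytes; deliberately tiny, real systems use far larger blocks


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Cut the data into fixed-size blocks, each keyed by its own identifier."""
    return {
        offset // block_size: data[offset:offset + block_size]
        for offset in range(0, len(data), block_size)
    }


def place_blocks(blocks: Dict[int, bytes], backends: List[Dict[int, bytes]]) -> Dict[int, int]:
    """Store each block on a backend (round-robin here, as an assumption)
    and return a map of block identifier to backend index."""
    placement = {}
    for block_id, payload in blocks.items():
        target = block_id % len(backends)
        backends[target][block_id] = payload
        placement[block_id] = target
    return placement


def reassemble(placement: Dict[int, int], backends: List[Dict[int, bytes]]) -> bytes:
    """Fetch the blocks in identifier order and join them back together."""
    return b"".join(backends[placement[bid]][bid] for bid in sorted(placement))


data = b"storage data split into blocks"
backends = [{}, {}, {}]  # three hypothetical storage systems
placement = place_blocks(split_into_blocks(data), backends)
restored = reassemble(placement, backends)
```

The placement map is the piece that stands in for the storage system's own bookkeeping: the application only ever sees the reassembled data, not where the individual blocks live.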

The blocks can be stored on different systems, and each block can be configured to work with different operating systems (e.g., Linux, Windows). Also, one block can be formatted for NFS and one for SMB. Block storage thus decouples data from user environments, adding a layer of abstraction. Because the information can be distributed flexibly across multiple environments with this method, multiple data access paths exist that allow users to retrieve data more quickly. In principle, however, this procedure is more complex than a relatively easy-to-configure NAS system.

Direct-attached storage (DAS) has some advantages but also limitations, depending on the application profile. Depending on the implementation, the advantages are reduced latency through block-level access, uncomplicated operation, and relatively low costs because of limited management overhead. The disadvantages are limited scaling of capacity and performance, as well as limited application availability.

To improve on these limits, a second host must be connected. The data availability on the JBOD (Just a Bunch of Disks) or array level can be improved by RAID. In addition to SCSI, SATA and serial-attached SCSI (SAS) are usually found as common protocols in the DAS environment. With DAS, the server always controls access to storage. Server-based storage is a growing trend, which you can observe in combination with non-volatile memory express (NVMe) flash, big data apps, NoSQL databases, in-memory computing, artificial intelligence (AI) applications, and software-defined storage.
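How RAID can improve data availability at the array level is easiest to see with XOR parity, the mechanism behind parity-based RAID levels. The article does not name a specific RAID level, so the following Python sketch is an assumption-laden illustration: three equally sized data chunks stand in for disks, and the helper name xor_parity is hypothetical.

```python
from functools import reduce
from typing import List


def xor_parity(chunks: List[bytes]) -> bytes:
    """XOR equally sized chunks byte by byte.

    Applied to the data chunks, this yields the parity chunk; applied to
    the surviving chunks plus the parity, it yields the missing chunk.
    """
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Three hypothetical data disks, each holding one chunk of a stripe
disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(disks)

# Simulate the loss of the second disk and rebuild it from the rest
survivors = [disks[0], disks[2], parity]
rebuilt = xor_parity(survivors)
```

Because XOR is its own inverse, a single missing chunk can always be recomputed from the remaining chunks and the parity, which is why one failed disk does not cost any data in such a layout.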

The SAN, unlike DAS, is a specialized high-speed network that provides block-level access to connected storage devices and their data. Current SAN implementations comprise servers and hosts, intelligent switches, and storage elements interconnected by specialized protocols such as Fibre Channel or iSCSI. SANs can span multiple sites to improve business continuity for critical application environments.

A SAN uses virtualization to present storage to the connected server systems as if it were connected locally. A SAN array provides a consolidated storage resource pool, typically based on virtual LUNs, which are shared by multiple hosts in cluster environments.

SANs are mostly still based on the Fibre Channel protocol. However, Fibre Channel over Ethernet (FCoE) and convergence of storage and IP protocols over one connection are also options. With SANs, you can use gateways to move application data between different storage network technologies as needed.

Evergreen iSCSI, Newcomer NVMe

iSCSI executes the SCSI storage protocol over an Ethernet network connection with TCP. Mostly, iSCSI is used locally or in the private cloud environment for secondary block storage applications that are not very business critical. Really critical applications typically use robust and low-latency FC SANs that are consistently separated from the application network – or they already use NVMe for particularly performance-intensive I/O workload profiles, but more on this later.

High data integrity, low-latency transmission performance, and features such as buffer flow control enable Fibre Channel to support business-critical objectives and consistently meet defined quality of service levels. The protocol is also suitable as an NVMe transport layer, supporting both SCSI and NVMe traffic on a fabric simultaneously. According to the Fibre Channel Industry Association (FCIA), existing Gen5 (16Gbps) and Gen6 (32Gbps) FC SANs can run FC NVMe over existing SAN fabrics with little change, because NVMe meets all specifications.

The situation is different in the hyperscale data centers of large cloud providers, which for cost reasons alone (standardization, capacities, etc.) still rely on iSCSI block storage and Ethernet protocols at 25, 50, or 100Gbps, although NVMe is also becoming more attractive there for higher performance and new service offerings. In the context of software-defined infrastructures, Ethernet will remain the first choice across the broad base of installations for the foreseeable future for reasons of standardization and cost.

In the highly specialized HPC environment, on the other hand, InfiniBand is often used on premises; it is significantly more powerful in terms of latency times and scalable throughput, but also costs more. Additionally, support for hypervisors and operating systems, as well as drivers and firmware, is limited. iSCSI as block-level storage runs most frequently over Ethernet with TCP but can also be set up over InfiniBand.

iSCSI runs on standard network cards or special host bus adapters, either with iSCSI extensions for remote direct memory access (iSER) or with the help of a TCP offload engine that implements not only the IP protocol but also parts of the SCSI protocol stack to accelerate data transfer. To improve I/O performance, network adapters have been extended with iSCSI (hardware) offload, TCP offload, or both engines. In the first case, the host bus adapter offloads all iSCSI initiator functions from the host CPU. In the second case, the adapter offloads TCP processing from the server kernel and CPU. The most important advantage of iSCSI in practice is that all common operating systems, hypervisor implementations, and storage systems support it, which is not yet the case for NVMe over fabrics (NVMeOF).

NVMeOF and Block-Level Storage

As I/O protocols, NVMe and NVMeOF are significantly leaner than SCSI or iSCSI in terms of overhead and are therefore also faster. If significantly more performance in the form of the lowest I/O latencies is required, NVMe is the optimized protocol for connecting servers to native PCIe flash storage as DAS.

NVMeOF as a scalable network variant enables data to be transferred between hosts and flash storage over a storage network based on Ethernet (the corresponding protocols are called RoCE and iWARP), Fibre Channel, or InfiniBand (Table 1). Currently, as with iSER, NVMeOF Ethernet remote direct memory access (RDMA) end nodes can only interoperate with other NVMeOF Ethernet end nodes that support the same Ethernet RDMA transport. NVMeOF end nodes are not able to interoperate with iSCSI or iSER end nodes.
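The interoperability rule in this paragraph can be written down as a small check. The following Python sketch is a deliberate simplification: the EndNode class, its field values, and the can_interoperate function are hypothetical names, and the rule "same protocol and same transport" is an assumption that covers only the cases the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndNode:
    name: str
    protocol: str   # e.g., "nvmeof", "iscsi", "iser" (illustrative labels)
    transport: str  # e.g., "roce", "iwarp", "fc", "infiniband", "tcp"


def can_interoperate(a: EndNode, b: EndNode) -> bool:
    """Simplified rule: both end nodes must speak the same storage
    protocol over the same transport."""
    return a.protocol == b.protocol and a.transport == b.transport


host_roce = EndNode("host-a", "nvmeof", "roce")
array_roce = EndNode("array-a", "nvmeof", "roce")
array_iwarp = EndNode("array-b", "nvmeof", "iwarp")
array_iscsi = EndNode("array-c", "iscsi", "tcp")

same_transport = can_interoperate(host_roce, array_roce)       # True
mixed_rdma = can_interoperate(host_roce, array_iwarp)          # False
mixed_protocol = can_interoperate(host_roce, array_iscsi)      # False
```

In practice, this means the RDMA flavor has to be chosen once for both hosts and storage systems; a RoCE host cannot simply be pointed at an iWARP target.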

Table 1

Storage Protocol Performance Criteria

Protocol        Latency    Scalability   Performance   Distribution
Fibre Channel   Low        Yes           High          Common
RoCEv2*         Very low   Yes           High          Insignificant
iWARP           Medium     Yes           Medium        Insignificant
TCP             High       Yes           Medium        Sometimes (with iSCSI)
InfiniBand      Very low   Restricted    High          Rare
*RoCE, remote direct memory access over converged Ethernet.
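If you want to filter the criteria in Table 1 programmatically, for example to shortlist transports for a latency-sensitive workload, the table can be captured as plain data. The values in this Python sketch are copied from Table 1; the structure, the ordering of the latency labels, and the function name transports_with_latency_at_most are assumptions made for illustration.

```python
from typing import List

# Rows copied from Table 1
TABLE_1 = [
    {"protocol": "Fibre Channel", "latency": "Low", "scalability": "Yes",
     "performance": "High", "distribution": "Common"},
    {"protocol": "RoCEv2", "latency": "Very low", "scalability": "Yes",
     "performance": "High", "distribution": "Insignificant"},
    {"protocol": "iWARP", "latency": "Medium", "scalability": "Yes",
     "performance": "Medium", "distribution": "Insignificant"},
    {"protocol": "TCP", "latency": "High", "scalability": "Yes",
     "performance": "Medium", "distribution": "Sometimes (with iSCSI)"},
    {"protocol": "InfiniBand", "latency": "Very low", "scalability": "Restricted",
     "performance": "High", "distribution": "Rare"},
]

# Assumed ranking of the latency labels, best first
LATENCY_ORDER = ["Very low", "Low", "Medium", "High"]


def transports_with_latency_at_most(ceiling: str) -> List[str]:
    """Return the protocols whose latency rating is no worse than the ceiling."""
    limit = LATENCY_ORDER.index(ceiling)
    return [row["protocol"] for row in TABLE_1
            if LATENCY_ORDER.index(row["latency"]) <= limit]


low_latency = transports_with_latency_at_most("Low")
```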

NVMe(OF) eliminates SCSI as a protocol and has lower latencies than iSCSI. Although hard disk and SSD arrays often still use the common SCSI protocol, performance is dramatically improved without the SCSI overhead. For example, command queuing in SCSI supports only one queue for I/O commands, whereas NVMe allows up to 64,000. Each queue, in turn, can hold up to 64,000 commands simultaneously. Additionally, NVMe simplifies command handling with a streamlined command set of only 13 required commands, designed to meet the unique requirements of NVM devices.
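The scale of the queuing difference becomes clearer with a quick calculation. The following Python snippet uses only the rounded figures quoted above (64,000 queues with 64,000 commands each, versus a single SCSI queue); the variable names are illustrative, and the result is a theoretical upper bound, not a value any real device is expected to reach.

```python
# Rounded figures as quoted in the text
NVME_MAX_QUEUES = 64_000
NVME_MAX_COMMANDS_PER_QUEUE = 64_000
SCSI_QUEUES = 1

# Theoretical upper bound of commands outstanding at the same time
nvme_max_outstanding = NVME_MAX_QUEUES * NVME_MAX_COMMANDS_PER_QUEUE

# How many more I/O queues NVMe allows than SCSI
queue_ratio = NVME_MAX_QUEUES // SCSI_QUEUES

print(f"NVMe: up to {nvme_max_outstanding:,} outstanding commands")  # 4,096,000,000
print(f"NVMe offers {queue_ratio:,}x the number of I/O queues")      # 64,000x
```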

NVMe latency was already about 200µs lower than that of 12Gb SAS when the technology was introduced. Additionally, the more efficient instruction set made it possible to reduce CPU load by more than 50 percent compared with SCSI. The situation is similar for sequential reads and writes: Because of the high bandwidth, I/O performance is usually six to eight times higher than with SATA SSDs.

Block storage based on NVMeOF can be implemented over Ethernet TCP/IP, Fibre Channel, Ethernet RDMA, or InfiniBand fabrics. The RDMA option provides the fastest performance, but all versions of NVMeOF are already faster than iSCSI, which is why flash storage vendors are increasingly starting to move to NVMeOF. Ultimately, it remains to be seen which technology options gain widespread acceptance over time. The NVMeOF transports still under development are iWARP, InfiniBand, NVMe/TCP, and RoCEv2 ("Rocky").
