How Persistent Memory Will Change Computing

Persistent Memory for Filesystems

You can also treat PM as storage [2], using it either raw or with a filesystem on top. Using PM for storage is basically the same idea as using a RAM disk, with one very important difference: When the power is turned off, the data is not lost. To the operating system, PM looks like conventional block storage, and a filesystem can be built on that block storage. The two most obvious ways to deploy a filesystem on PM in servers are to use it as local scratch space (a local filesystem) or as part of a distributed filesystem (parallel or not).

In both cases, realize that you are taking very fast storage (almost as fast as DRAM) and layering software on top of it, which adds latency and reduces bandwidth into and out of the PM. I like to say that you are "sullying the hardware with software," but to create a usable storage system, you are forced to do so. Ideally, you want a filesystem that is very lightweight and has little effect on performance.

An important point to think about in this regard is POSIX. If it takes a fairly large amount of software to let a POSIX filesystem use PM, resulting in much larger latencies, is it worth it? POSIX gives you compatibility and a common interface, so you can easily move applications from system to system. The alternative, a specific library linked into your application, lets you take advantage of the amazing performance, but you lose easy compatibility and are forced to rewrite the I/O portions of your application. Is that worth it? The ultimate answer is that it depends on your applications; however, think about the price you would be willing to pay in performance to gain POSIX compatibility.
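
To make the trade-off concrete, here is a minimal sketch (in Python, for brevity) of the two access styles: portable POSIX read/write calls on one hand, and memory-mapping the file and storing to it directly on the other, which is the model PM-aware libraries such as Intel's libpmem expose. The filename is arbitrary; on a real PM system the file would sit on a DAX-mounted filesystem so that the mapped stores reach the media without going through the page cache.

```python
# Sketch: two ways to persist data. Assumes "pm_demo.dat" sits on a
# DAX-mounted PM filesystem; on ordinary storage the mmap path still
# works, but goes through the page cache instead of straight to media.
import mmap
import os

FILE, LEN = "pm_demo.dat", 4096
msg = b"hello, persistent memory"

# Style 1: portable POSIX I/O -- every access is a system call.
fd = os.open(FILE, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
os.truncate(fd, LEN)
os.write(fd, msg)

# Style 2: map the file and use plain loads/stores. On a DAX
# filesystem these land on the PM media directly; flush() (or a PM
# library's flush primitive) makes the stores durable.
buf = mmap.mmap(fd, LEN)
buf[2048:2048 + len(msg)] = msg
buf.flush()

head = bytes(buf[:len(msg)])           # written via POSIX write()
tail = bytes(buf[2048:2048 + len(msg)])  # written via mapped store
print(head.decode(), "/", tail.decode())

buf.close()
os.close(fd)
os.remove(FILE)
```

Style 1 keeps your application portable to any POSIX filesystem; style 2 is where the big latency win comes from, because loads and stores to the mapping avoid the system-call overhead entirely.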

Using PM for local storage could be a big win for applications that write out local temporary files. Examples include out-of-core solvers that need to write temporary data to storage, such as finite element method (FEM) codes, and databases. Using PM as really fast local storage lets you take existing applications and improve their performance quickly. However, you also have the option of using PM as extra memory and skipping the out-of-core algorithm entirely if the problem fits into total memory, including PM. Of course, some problems won't fit into memory, requiring some sort of local storage.

You can also use the PM in servers to create a distributed filesystem. Immediately, parallel filesystems like Lustre, OrangeFS, or GPFS spring to mind. However, you need to consider carefully what you are about to do. If you use PM as part of a distributed filesystem and a server loses power, you don't lose any data, but you could lose access to it: Unless the filesystem employs RAID or replication, you won't be able to access the data on that server.

Additionally, you have to consider moving data from the network interface to PM via the filesystem. With PM, you have extremely fast storage, and the I/O bottleneck might simply move to the network interface or somewhere else in the system. Moving the bottleneck is inevitable, but you need to be ready for it to appear wherever it lands.

Performance

The performance of PM is always under discussion, particularly for 3D XPoint, because it is so close to release. So far, its performance has only been discussed in general terms:

  • 1,000 times the performance of NAND flash
  • 1,000 times the endurance of NAND flash
  • 10 times the density of DRAM
  • A price between flash and DRAM

In addition to the DIMM form factor, Intel is going to release 3D XPoint under the Optane brand in the form of SSDs. Recently, Intel demonstrated these SSDs at Oracle's OpenWorld conference.

The results were summarized in an article online [3].

Brian Krzanich, CEO of Intel, talked about Optane and finally gave the world some performance numbers, although they are for the Optane SSD and not the Optane DIMMs. Krzanich was demoing on what looked like a 1U two-socket Oracle server with two Intel Xeon E5 v3 "Haswell" processors. The system was equipped with an Intel P3700 SSD and a prototype Optane SSD, both connected to the system over NVMe [4] links to improve performance. The capacity of the P3700 SSD was not given, but a comparable model listed on Newegg [5] has the following details:

  • Capacity: 400GB
  • Price: $909 (as of the writing of this article)
  • Up to 450,000 4K random read IOPS
  • Up to 75,000 4K random write IOPS
  • Maximum sequential read: up to 2,700MBps
  • Maximum sequential write: up to 1,080MBps

Two benchmarks or tests were shown, although the details of the tests were not given. Both tests compared the IOPS and latency using the P3700 SSD versus the Optane SSD. The results are summarized in Table 2.

Table 2: Drive Benchmarks

          P3700 IOPS   Optane IOPS      P3700 latency   Optane latency
  Test 1  15,900       70,300 (4.42x)   58µs            9µs (6.44x)
  Test 2  13,400       95,600 (7.13x)   73µs            9µs (8.11x)
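
The speedup factors in parentheses are just the ratios of the two drives' numbers, with latency inverted because lower is better. A quick check, using only the values from Table 2:

```python
# Recompute the speedup factors quoted in Table 2.
results = {
    "Test 1": {"iops": (15_900, 70_300), "latency_us": (58, 9)},
    "Test 2": {"iops": (13_400, 95_600), "latency_us": (73, 9)},
}

for test, r in results.items():
    iops_gain = r["iops"][1] / r["iops"][0]             # higher is better
    lat_gain = r["latency_us"][0] / r["latency_us"][1]  # lower is better
    print(f"{test}: {iops_gain:.2f}x IOPS, {lat_gain:.2f}x lower latency")
    # -> Test 1: 4.42x IOPS, 6.44x lower latency
    # -> Test 2: 7.13x IOPS, 8.11x lower latency
```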

Because the data path to/from the drives is the same (NVMe), the source of the differences is mostly in the drives themselves. The Optane drive is clearly much faster than the current P3700 SSD drive. Based on these results, I can't wait for the DIMM performance numbers to come out!