Combining Directories on a Single Mountpoint with SSHFS-MUX

With some simple tuning, SSHFS performance is comparable to NFS almost across the board. In an effort to get even more performance from SSHFS, we examine SSHFS-MUX, which allows you to combine directories from multiple servers into a single mountpoint.

Hadoop for All

Hadoop 2.x and its associated tools promise to deliver big data solutions not just to the IT-heavy big players, but to anyone with unstructured data and the need for multidimensional data analysis.

SSHFS – Installation and Performance

Sharing data saves space, reduces data skew, and improves data management. We look at the SSHFS shared filesystem, put it through some performance tests, and show you how to tune it.

A Library for Many Jobs

The Joblib Python library handles frequent problems – like parallelization, memoization, and saving and loading objects – in almost no time, giving programmers more freedom to push on with their core tasks.

Pandas: Data Analysis with Python

The Python Data Analysis Library, or Pandas, is built on top of the fast math library NumPy and makes analysis of large volumes of data an easy and efficient experience.

Monitoring HPC Systems: Process, Network, and Disk Metrics

In the continuing story of monitoring HPC systems, we look at code that measures process, network, and disk metrics.

Monitoring HPC Systems: Processor and Memory Metrics

One goal of HPC administration is effective monitoring of clusters. In this article, we talk about writing code that measures processor and memory metrics on each node.

The Lua Scripting Language

Is this powerful but simple scripting language big enough for Big Data?

Monitoring HPC Systems: What Should You Monitor?

In rapidly growing HPC installations, you need to understand what is happening within the system to make improvements or simply to justify the purchase.

Understanding I/O Patterns with strace, Part III

In the third article of this three-part series, we look at simple write examples in Python and trace them with strace to see how the way the code writes data affects I/O patterns and performance.