Moving your data – It's not always pretty

Moving Day

8. bbFTP

Although bbFTP [23] sounds like it's related to BBCP, it's really not. BBCP was developed at SLAC [24], and bbFTP was developed at IN2P3 [25]. bbFTP is something like FTP, but it uses its own transfer protocol optimized for large files (greater than 2 GB). Like BBCP, it works with multiple streams and has compression and some security features. One version is even firewall and NAT friendly [26].

However, bbFTP does not appear to have a way to retain file attributes, including ownership, mode, timestamps, and xattr data.
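If you still want to use a tool that drops attributes, one workaround is to reapply them in a second pass once the data has landed. The following is only a minimal Python sketch of that idea, not something bbFTP provides: the function name copy_attributes is my own, it assumes regular files that are visible from the same host (e.g., both filesystems mounted on a data mover), and the xattr calls are available only on Linux.

```python
import os
import stat


def copy_attributes(src: str, dst: str) -> None:
    """Reapply mode, ownership, xattrs, and timestamps from src to dst.

    Hypothetical helper for illustration; assumes regular files.
    """
    st = os.stat(src)

    # Permission bits (mode).
    os.chmod(dst, stat.S_IMODE(st.st_mode))

    # Ownership; changing the owner normally requires root.
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        print(f"warning: could not set ownership on {dst}")

    # Extended attributes (Linux only; the filesystem must support them).
    if hasattr(os, "listxattr"):
        try:
            names = os.listxattr(src)
        except OSError:
            names = []
        for name in names:
            try:
                os.setxattr(dst, name, os.getxattr(src, name))
            except OSError as err:
                print(f"warning: could not copy xattr {name}: {err}")

    # Access and modification times, with nanosecond precision.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Running something like this over every file means walking the tree a second time, which is exactly the extra work a migration tool that preserves attributes would save you.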

It is simply an FTP-like tool designed for high file transfer rates (and that's not a bad thing – it just might not be the best option for data migration).

9. GridFTP

Probably the most popular FTP toolkit for transferring data files between hosts is GridFTP [27], which is part of the Globus Toolkit and is designed for transferring data over a WAN. Recall that Globus Toolkit is designed for computing grids, which can comprise systems at distances from one another.

GridFTP was designed to be a standard way to move data across grids. Additionally, GridFTP has some unique features that work well for moving data:

  • Security – Uses GSI to provide security and authentication.
  • Parallel and striped transfer – Improves performance by using multiple simultaneous TCP streams to transfer data.
  • Partial file transfer – Allows transfer of an arbitrary portion of a file rather than the whole file; standard FTP can only restart from a given offset and fetch the remainder.
  • Fault tolerance and restart – Allows interrupted data flows to be restarted, even automatically.
  • Automatic TCP optimization – Adjusts the network window and buffer sizes to improve performance, reliability, or both.
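
To make the parallel and partial transfer ideas a bit more concrete, here is a small Python sketch that splits a file into byte ranges and copies each range in its own thread. It is a local analogy only, not GridFTP's protocol or API: the function names are my own, the threads stand in for TCP streams, and os.pread and os.pwrite are available only on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, streams: int) -> list:
    """Split [0, size) into at most `streams` (offset, length) ranges."""
    if size <= 0:
        return []
    chunk = -(-size // streams)  # ceiling division
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def copy_range(src_fd: int, dst_fd: int, offset: int, length: int,
               block: int = 1 << 20) -> None:
    """Copy one byte range; on its own, this is a partial transfer."""
    end = offset + length
    while offset < end:
        data = os.pread(src_fd, min(block, end - offset), offset)
        if not data:
            raise IOError("source is shorter than expected")
        # pwrite may write fewer bytes than requested, so advance by
        # the amount actually written and re-read from there.
        offset += os.pwrite(dst_fd, data, offset)


def parallel_copy(src: str, dst: str, streams: int = 4) -> None:
    """Copy src to dst using one worker thread per byte range."""
    size = os.path.getsize(src)
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(dst_fd, size)  # size the destination up front
        with ThreadPoolExecutor(max_workers=streams) as pool:
            futures = [pool.submit(copy_range, src_fd, dst_fd, off, length)
                       for off, length in split_ranges(size, streams)]
            for future in futures:
                future.result()  # re-raise any worker error
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

Because every range is addressed by its offset, a failed range can be retried by itself without starting the whole file over, which is the same property that makes restart after an interruption cheap.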

Using GridFTP can be challenging because you have to use a number of pieces of Globus on both the old storage and the new storage. However, a version of GridFTP, GridFTP-Lite, replaces GSI with SSH.

This makes things a little easier because the full GSI security features might not be needed for data migration within the data center, just for transfers that go outside it.

I have not tested GridFTP, so I'm not sure how well it would work for data migration. One concern I have is that if migration of attributes is important, including ownership, timestamps, and xattr data, then GridFTP might not be the best tool.
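
One way to find out is to migrate a small test directory and compare the attributes on both sides. The Python sketch below shows the kind of check I have in mind. The function names are hypothetical, it assumes the source and destination files are both reachable from one host, and the xattr comparison works only on Linux (elsewhere it quietly compares empty sets).

```python
import os
import stat


def read_xattrs(path: str) -> dict:
    """Return {name: value} for a file's xattrs, or {} where unsupported."""
    if not hasattr(os, "listxattr"):  # xattr calls are Linux-only in Python
        return {}
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}


def attribute_diffs(src: str, dst: str) -> list:
    """List which attribute groups differ between src and dst."""
    a, b = os.stat(src), os.stat(dst)
    diffs = []
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        diffs.append("mode")
    if (a.st_uid, a.st_gid) != (b.st_uid, b.st_gid):
        diffs.append("ownership")
    if a.st_mtime_ns != b.st_mtime_ns:
        diffs.append("mtime")
    if read_xattrs(src) != read_xattrs(dst):
        diffs.append("xattr")
    return diffs
```

An empty list for every file in the test directory would suggest the tool keeps the attributes you care about; anything else tells you what you would have to repair after the fact.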

10. Aspera

Up to this point, I've focused on open source tools for data migration, but I think one commercial tool is worthy of mention. Aspera [28] has some very powerful software that many people have talked about using to transfer data. It uses a protocol called fasp, which runs on top of UDP rather than TCP, to transfer data over standard IP networks. I've talked to people who said they can transfer data faster than wire speeds, probably because of data compression. Overall, these people are very impressed with Aspera's performance.

Aspera offers synchronization [29] with their tools. According to the website, the tools can "preserve file attributes such as permissions, access times, ownership, etc." I don't know if this means xattr data as well, but at least they can deal with POSIX attributes.
