Facebook Moves Data Center: How to Migrate 30 PB of Stored Data

When Facebook moved their servers to a new data center, they had to migrate 30 PB of data.

In spring 2011, Facebook ran out of power and space in its data center, and it became clear the company had to move to a new one. The stored data had grown from 20 PB to 30 PB in about a year; that is 30 million gigabytes, or 3,000 times the size of the Library of Congress, according to a post on Facebook's engineering blog. The data is stored in HDFS, the distributed filesystem of the Apache Hadoop project.

Moving the physical nodes from one data center to the other was not an option, because it would have meant significant downtime that Facebook could not afford. Instead, they opted for a strategy of continuously replicating the data to the new data center: first they copied all of the data over, then used a custom replication tool to keep the copy up to date. For the bulk copy they took advantage of Hadoop's own DistCp tool, which uses MapReduce to parallelize the copy as much as possible.
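The blog post doesn't give the exact invocation, but the bulk-copy step amounts to pointing DistCp at the source and destination clusters and letting it fan the work out over map tasks. The following is a minimal sketch of that step; the namenode addresses, paths, and map count are placeholders, not values from the actual migration.

```python
import subprocess

# Bulk-copy a directory tree from the old HDFS cluster to the new one
# using Hadoop's DistCp, which parallelizes the copy across map tasks.
# Cluster addresses, path, and map count below are illustrative only.
def bulk_copy(src="hdfs://old-namenode:8020/warehouse",
              dst="hdfs://new-namenode:8020/warehouse",
              max_maps=200):
    cmd = [
        "hadoop", "distcp",
        "-m", str(max_maps),   # cap on simultaneous copy (map) tasks
        "-update",             # skip files already present and unchanged
        src, dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    bulk_copy()
```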

Facebook's own replication tool used a plugin for the Hive software to monitor the distributed filesystem for changes and record them in a logfile. Based on that logfile, the replication software could synchronize the changes to the new data center. At the time of the switchover, the Facebook engineers shut down Hadoop's JobTracker so the old filesystem could no longer be modified, then changed the DNS entries and started the JobTracker in the new data center.
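Facebook has not released the replication tool itself, so the format of the change log and the synchronizer are not public. As a rough illustration of the pattern the post describes (record changed paths in a logfile, then replay them against the new cluster), a hypothetical catch-up loop might look like the sketch below; the log path, cluster addresses, and one-path-per-line log format are all assumptions.

```python
import subprocess
import time

CHANGE_LOG = "/var/log/hive-changes.log"   # hypothetical path to the recorded change log
SRC = "hdfs://old-namenode:8020"           # old cluster (assumed address)
DST = "hdfs://new-namenode:8020"           # new cluster (assumed address)

def replay_changes(log_path=CHANGE_LOG):
    """Tail the change log written by the Hive plugin and re-copy each
    modified path to the new cluster. Purely illustrative; a production
    tool would batch paths instead of launching one copy job per line."""
    with open(log_path) as log:
        log.seek(0, 2)                     # start at the end: replay only new changes
        while True:
            line = log.readline()
            if not line:
                time.sleep(5)              # wait for the plugin to log more changes
                continue
            changed_path = line.strip()    # assumed format: one changed HDFS path per line
            subprocess.run(
                ["hadoop", "distcp", "-update",
                 SRC + changed_path, DST + changed_path],
                check=True,
            )

if __name__ == "__main__":
    replay_changes()
```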

More details on the migration can be found on Facebook's engineering blog.

08/02/2011

Related content

  • The New Hadoop

    Hadoop version 2 expands Hadoop beyond MapReduce and opens the door to MPI applications operating on large parallel data stores.

  • Hadoop for Small-to-Medium-Sized Businesses

    Hadoop 2.x and its associated tools promise to deliver big data solutions not just to the IT-heavy big players, but to anyone with unstructured data and the need for multidimensional data analysis.

  • Is Hadoop the New HPC?

    Hadoop has been growing clusters in data centers at a rapid pace. Is Hadoop the new corporate HPC?
