Google Announces TensorStore for High-Performance Array Storage


TensorStore is an open source software library for storage and manipulation of large datasets. 

Google has announced TensorStore, an open source C++ and Python library designed for reading and writing large multi-dimensional arrays.

Many contemporary computer science applications manipulate huge, multi-dimensional datasets, says Google. “In these settings, even a single dataset may require terabytes or petabytes of data storage. Such datasets are also challenging to work with as users may read and write data at irregular intervals and varying scales, and are often interested in performing analyses using numerous machines working in parallel,” the company explains.

According to the project website, TensorStore:

  • Provides a uniform API for reading and writing multiple array formats.
  • Natively supports multiple storage drivers, including Google Cloud Storage, local and network filesystems, and in-memory storage.
  • Automatically takes advantage of multiple cores for encoding/decoding and performs multiple concurrent I/O operations to saturate network bandwidth.
  • Enables high-throughput access even to high-latency remote storage.
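
To see why concurrent I/O matters for high-latency storage, consider the following minimal Python sketch. It does not use TensorStore or reproduce its implementation; the chunked in-memory store, the simulated latency, and the names fetch_chunk and read_range are all hypothetical and exist only for illustration. It simply shows the general idea of splitting an array read into chunk requests and issuing them in parallel, so that the total wait is closer to one round trip than to the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4          # elements per chunk (illustrative value)
LATENCY_S = 0.05   # simulated latency of one remote request

# Hypothetical chunked store: chunk index -> values held in that chunk.
store = {i: list(range(i * CHUNK, (i + 1) * CHUNK)) for i in range(8)}


def fetch_chunk(index):
    """Simulate a single high-latency read from remote storage."""
    time.sleep(LATENCY_S)
    return store[index]


def read_range(start, stop, max_workers=8):
    """Read elements [start, stop) by fetching the needed chunks concurrently."""
    first, last = start // CHUNK, (stop - 1) // CHUNK
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves request order, so chunks come back in sequence.
        chunks = list(pool.map(fetch_chunk, range(first, last + 1)))
    flat = [value for chunk in chunks for value in chunk]
    offset = first * CHUNK
    return flat[start - offset:stop - offset]


if __name__ == "__main__":
    began = time.perf_counter()
    values = read_range(3, 13)  # spans chunks 0 through 3
    elapsed = time.perf_counter() - began
    print(values, f"{elapsed:.2f}s")
```

With four chunks requested at once, the read should take roughly one simulated round trip rather than four. A real library also has to handle decoding, caching, and write consistency, which this sketch leaves out.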

“Processing and analyzing large numerical datasets requires significant computational resources,” says Google. This “is typically achieved through parallelization across numerous CPU or accelerator cores spread across many machines.” Thus, according to Google, “a fundamental goal of TensorStore has been to enable parallel processing of individual datasets that is both safe (i.e., avoids corruption or inconsistencies arising from parallel access patterns) and high performance (i.e., reading and writing to TensorStore is not a bottleneck during computation).”

