Darshan I/O analysis for Deep Learning frameworks

Looking and Seeing

Summary

Little work has been done in the past to characterize or understand the I/O patterns of deep learning (DL) frameworks. In this article, I used Darshan, a widely accepted I/O characterization tool rooted in the HPC and MPI world, to examine the I/O pattern of TensorFlow training a simple model on the CIFAR-10 dataset.

Deep learning frameworks that train models from Python open a large number of files during Python and TensorFlow startup. Currently, Darshan can track only 1,024 files, so the Python directory had to be excluded from the analysis. This exclusion could be a good thing, because it lets Darshan focus on the training itself; however, it also means that Darshan can't capture all of the I/O performed when running the training script.
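For a non-MPI Python script, Darshan is normally injected with `LD_PRELOAD`, and directories can be skipped with the `DARSHAN_EXCLUDE_DIRS` environment variable (a comma-separated list of path prefixes). The minimal sketch below builds such an environment and excludes the interpreter's installation tree. The library path `/path/to/libdarshan.so` is a placeholder, the helper names are hypothetical, and the `DARSHAN_ENABLE_NONMPI` opt-in reflects recent Darshan releases; check all of these against the darshan-runtime documentation for your installed version.

```python
import os
import subprocess
import sys
import sysconfig

# Placeholder path: point this at the libdarshan.so from your darshan-runtime install.
LIBDARSHAN = "/path/to/libdarshan.so"


def python_dirs_to_exclude():
    """Directories holding the interpreter, its standard library, and site-packages.

    Python and TensorFlow open many files from these trees at startup.
    """
    paths = {sys.prefix, sys.base_prefix}
    for key in ("stdlib", "platstdlib", "purelib", "platlib"):
        path = sysconfig.get_paths().get(key)
        if path:
            paths.add(path)
    return sorted(paths)


def darshan_env(libdarshan, exclude_dirs, base_env=None):
    """Copy of the environment with Darshan preloaded and some directories excluded."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = libdarshan
    # Non-MPI programs must opt in to instrumentation in recent Darshan releases
    # (assumption: verify against your version's documentation).
    env["DARSHAN_ENABLE_NONMPI"] = "1"
    # Comma-separated path prefixes that Darshan will not instrument.
    env["DARSHAN_EXCLUDE_DIRS"] = ",".join(exclude_dirs)
    return env


def run_under_darshan(script, libdarshan=LIBDARSHAN):
    """Run a Python training script under Darshan, skipping the Python install tree."""
    env = darshan_env(libdarshan, python_dirs_to_exclude())
    subprocess.run([sys.executable, script], env=env, check=True)
```

A call such as `run_under_darshan("train_cifar10.py")` would launch the (hypothetical) training script. Excluding the whole prefix is deliberately coarse: it also hides any dataset or checkpoint files stored under that prefix, so those are best kept elsewhere.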

With the simple CIFAR-10 training script, little I/O took place overall. The dataset is small enough to fit in GPU memory, and the overall runtime was dominated by compute time. Almost all of the small amount of I/O that was performed consisted of write operations, probably from writing a checkpoint after every epoch.
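A quick calculation backs up the statement that CIFAR-10 fits in GPU memory. The dataset consists of 60,000 32x32 color images (50,000 for training and 10,000 for testing). The sketch assumes every image is held as a dense array, first as raw 8-bit pixels and then as 32-bit floats, the type commonly fed to a model during training; it ignores labels and any framework overhead.

```python
# Back-of-the-envelope size of CIFAR-10 once decoded into memory.
NUM_IMAGES = 50_000 + 10_000          # training + test images
HEIGHT, WIDTH, CHANNELS = 32, 32, 3   # 32x32 RGB images


def dataset_bytes(num_images, bytes_per_value):
    """Bytes needed to hold every image as a dense array."""
    return num_images * HEIGHT * WIDTH * CHANNELS * bytes_per_value


uint8_bytes = dataset_bytes(NUM_IMAGES, 1)    # raw 8-bit pixels
float32_bytes = dataset_bytes(NUM_IMAGES, 4)  # after conversion to float32

print(f"uint8:   {uint8_bytes / 1e6:.1f} MB")    # uint8:   184.3 MB
print(f"float32: {float32_bytes / 1e6:.1f} MB")  # float32: 737.3 MB
```

Even as float32, the full dataset needs well under 1GB, a small fraction of the memory on GPUs commonly used for training.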

I tried larger problems, but reading their data, even when it fit into GPU memory, exceeded the current 1,024-file limit. Nonetheless, the current version of Darshan has shown that it can be used to characterize the I/O of DL frameworks, albeit only for small problems.
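Whether a run hits the limit depends largely on how the dataset is laid out on disk: a dataset stored as one file per sample can exceed 1,024 files long before it exceeds GPU memory. The hypothetical helper below counts the files under a dataset directory before a Darshan run. It assumes each opened file uses one Darshan record and leaves room, through the `other_files` argument, for anything else the job opens (checkpoints, configuration files); it is a rough pre-check, not a reproduction of how Darshan itself counts records.

```python
import os

# File limit as described in this article; newer Darshan versions may differ.
DARSHAN_FILE_LIMIT = 1024


def count_files(root):
    """Count non-directory entries under root (files a training job might open)."""
    return sum(len(filenames) for _, _, filenames in os.walk(root))


def fits_darshan_limit(root, other_files=0, limit=DARSHAN_FILE_LIMIT):
    """Rough check: dataset files plus other files the run opens stay within the limit."""
    return count_files(root) + other_files <= limit
```

Checking a dataset directory with `fits_darshan_limit(...)` before launching gives an early warning instead of a truncated Darshan log after a long training run.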

The Darshan developers are working on updates to remove the 1,024-file limit. Python postprocessing already exists, and the developers are rapidly updating that capability. Both developments will greatly help the DL community use Darshan.
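As an illustration of the kind of Python postprocessing the summary refers to, the sketch below totals bytes read and written across files and reports the write share, which is how one might check the observation that the CIFAR-10 run was dominated by writes. It takes plain per-file dictionaries keyed by Darshan's POSIX counter names `POSIX_BYTES_READ` and `POSIX_BYTES_WRITTEN`; the step that extracts those counters from a log (for example, with PyDarshan) is deliberately left out because, as noted above, that interface is still changing. The function name `io_summary` is hypothetical.

```python
def io_summary(records):
    """Total POSIX bytes read/written and the fraction of traffic that was writes.

    `records` is an iterable of per-file counter mappings that include
    Darshan's POSIX_BYTES_READ and POSIX_BYTES_WRITTEN counters.
    """
    records = list(records)  # allow generators, since we iterate twice
    read = sum(r.get("POSIX_BYTES_READ", 0) for r in records)
    written = sum(r.get("POSIX_BYTES_WRITTEN", 0) for r in records)
    total = read + written
    return {
        "bytes_read": read,
        "bytes_written": written,
        # Avoid division by zero when a log recorded no POSIX traffic.
        "write_fraction": written / total if total else 0.0,
    }
```

A write fraction close to 1.0 would match the checkpoint-dominated pattern described for the CIFAR-10 run.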
