Best practices for building and running Docker and Singularity containers.

Building and Running Containers

Containers are a great solution for the problems of reproducibility, repeatability, portability, resource isolation, and archiving. They also free users from being tied to a specific distribution, so they can build and use the one they know. In the previous article, I discussed general best practices by focusing on “high-level” techniques for containers. In this article, I dive deeper into specific best practices for building and running Docker and Singularity containers.

Building Containers

Building containers is not too difficult, especially if you use HPC Container Maker (HPCCM), which I discussed in the last article. However, as with many things, the devil is in the details.

Docker

As mentioned previously, Docker was not originally designed for HPC containers, but rather for people developing tools, applications, and libraries to share with each other and to run on their laptops. Those users were typically root on their own machines, so access was a given: If you are root, you can do anything.

Docker is still important to HPC because it was the first widely used container platform. It is used heavily in artificial intelligence, deep learning, and machine learning, and even for classic HPC applications.

To begin, I assume you have Docker or Docker Desktop installed on the system on which you want to build your container, which also assumes that you have root access to the system. Next, Docker requires a docker group on the system, and if you want to build or run containers, you need to be a member of that group. Adding an existing user to a group is not difficult:

$ sudo usermod -a -G docker layton
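To confirm the change, a small check such as the following can help. It is a sketch assuming bash and the standard id, tr, and grep tools; in_group is a hypothetical helper name. Note that id -nG USER reads the group database, so it shows the new group right away, whereas a shell that was already open only gains the group after you log in again (or run newgrp docker).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if user $1 is listed as a member of group $2.
# "id -nG USER" prints USER's group names (from the group database),
# separated by spaces.
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example: warn if the current user is not yet in the docker group.
if ! in_group "$(id -un)" docker; then
  echo "$(id -un) is not in the docker group yet" >&2
fi
```
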

Chris Hoffman wrote an article with more details on how to add an existing user to a group.

Next, create your Dockerfile, which defines the image you want to create (again, using HPCCM). By default, docker build looks for a file named Dockerfile, so I recommend creating a directory for each container you want to build and putting that file and the HPCCM recipe in that directory.
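A minimal sketch of that one-directory-per-container layout follows. It assumes bash and, optionally, the hpccm command-line tool that ships with HPCCM; make_build_dir and the default recipe file name recipe.py are hypothetical. If hpccm is installed and the recipe exists, the Dockerfile is generated from it; otherwise, only the directory is created.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_build_dir DIR [RECIPE]
# Creates one build directory per container and, if HPCCM is available,
# generates DIR/Dockerfile from the recipe stored in that directory.
make_build_dir() {
  local dir=$1 recipe=${2:-recipe.py}   # recipe.py is an assumed file name
  mkdir -p "$dir" || return 1
  if command -v hpccm >/dev/null 2>&1 && [[ -f "$dir/$recipe" ]]; then
    # hpccm writes the generated Dockerfile to standard output.
    hpccm --recipe "$dir/$recipe" --format docker > "$dir/Dockerfile"
  fi
}
```

For example, make_build_dir centos-7.6-tensorflow-2.0 creates the directory; you would then put the recipe there, rerun the helper, cd into the directory, and run docker build from inside it.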

The next step is to build the image with the minimal command:

$ sudo docker build -t name:tag .

The first part of the command, docker build, tells Docker to create the image from the Dockerfile in the build context. The -t option lets you add a “tag” to Docker images to help identify them. These tags are extremely important because they label the image. Think of it as a file name: A good file name prevents you from searching every file to find the data you want. A viewable label for an image comprises a base name followed by a colon followed by a tag.
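The colon is what separates the two parts. As an illustration, this bash sketch (split_ref is a hypothetical name) splits a reference at its last colon. It assumes no digest (name@sha256:...) is present, and it treats a colon followed later by a slash as part of a registry host:port, in which case Docker's default tag, latest, applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_ref NAME[:TAG] prints "NAME TAG".
# Assumption: no digest; "host:port/..." colons are not tag separators.
split_ref() {
  local ref=$1 name tag
  tag=${ref##*:}                 # text after the last colon
  if [[ $ref != *:* || $tag == */* ]]; then
    name=$ref                    # no tag given: Docker uses "latest"
    tag=latest
  else
    name=${ref%:*}               # text before the last colon
  fi
  printf '%s %s\n' "$name" "$tag"
}

split_ref centos-7.6:tensorflow-2.0   # prints: centos-7.6 tensorflow-2.0
split_ref ubuntu-19.10                # prints: ubuntu-19.10 latest
```
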

The base name should be something very familiar to anyone using your container. For example, it could be centos-7.5, ubuntu-19.10, ubuntu-19.10-tensorflow-2.0, or almost anything you want (but Docker requires the name to be lowercase, and watch special characters and blanks). It doesn’t have to contain the complete version, but it should be recognizable and describe the base OS.

The tag allows you to be much more descriptive or to call out the name of the application(s) in the image. For example, the tag can be used for defining the version (e.g., 2.0) or a “purpose” for the container (e.g., data-science). Putting a date in the tag (e.g., data-science-02122020) is also a good practice. Combining all of these practices can create a long name:tag (e.g., centos-7.6:tensorflow-2.0-02122020-Layton), but such labels can be very useful. The previous example tells you that the distribution in the image is CentOS 7.6 and that the image has TensorFlow 2.0, was created on February 12, 2020, and was created by user Layton. Overall, the important point is to make the tag useful so someone could look at it and have a good idea what the image is about.
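A hypothetical helper like the one below can keep such labels consistent. It assumes bash and a MMDDYYYY date stamp produced by date +%m%d%Y (the article's example encodes February 12, 2020); make_image_ref is an illustrative name, and you can pass the date explicitly to reproduce an older label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_image_ref BASE APP CREATOR [MMDDYYYY]
# Builds BASE:APP-DATE-CREATOR; the date defaults to today.
make_image_ref() {
  local base=$1 app=$2 creator=$3 stamp=${4:-$(date +%m%d%Y)}
  printf '%s:%s-%s-%s\n' "$base" "$app" "$stamp" "$creator"
}

make_image_ref centos-7.6 tensorflow-2.0 Layton 02122020
# prints: centos-7.6:tensorflow-2.0-02122020-Layton
```

In practice, the helper would be used inline, for example sudo docker build -t "$(make_image_ref centos-7.6 tensorflow-2.0 Layton)" . so every image you build follows the same pattern.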

Although the name:tag combination seems long, it contains a fair amount of information, and you could even add more information if you like, such as moving the main application to the name and using the tag for other information (e.g., other applications, build time, the origin of the container, etc.).

On the other hand, don’t go crazy. Here are a few best practices to follow for the name:tag:

  • Don’t make it impossibly long.
  • Don’t make it cryptic so no one else understands it.
  • If you add special characters or phrases indicating the image has been “certified” in some way, be sure to inform the users.
  • Don’t make it too short, or it will not be descriptive enough and is therefore useless.
  • Personally, I like putting dates and creator initials in the name:tag, but not everyone does.
  • Be consistent when creating your name:tag.
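Some of these rules can be checked mechanically. The bash function below is a rough sketch based on Docker's image reference format as I understand it: lowercase name components joined by “.”, “_”, “__”, or dashes, and a tag of at most 128 characters drawn from letters, digits, “_”, “.”, and “-” that does not start with “.” or “-”. valid_ref is a hypothetical name; it deliberately requires an explicit tag and does not handle registry host:port prefixes or digests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: valid_ref NAME:TAG succeeds if both parts look valid.
# Not handled: registry host:port prefixes and digests (name@sha256:...).
valid_ref() {
  local ref=$1 name tag
  # One name component: lowercase alphanumerics joined by ., _, __, or dashes.
  local comp='[a-z0-9]+(([._]|__|-+)[a-z0-9]+)*'
  local name_re="^${comp}(/${comp})*\$"
  # Tag: first char [A-Za-z0-9_], then up to 127 more of [A-Za-z0-9_.-].
  local tag_re='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
  [[ $ref == *:* ]] || return 1   # require an explicit tag
  name=${ref%:*}
  tag=${ref##*:}
  [[ $name =~ $name_re && $tag =~ $tag_re ]]
}

for ref in centos-7.6:tensorflow-2.0-02122020-Layton "ubuntu 19.10:latest"; do
  if valid_ref "$ref"; then echo "ok:  $ref"; else echo "bad: $ref"; fi
done
```
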

Take a look around at image names in your own Docker Hub account, on the public Docker Hub, or in a registry at work. See what people have (or haven’t) done with name:tag. The keys are to make it simple yet descriptive and to make it consistent.

The docker build command has many options you can use to control building the image, but I don’t use too many of them, and I don’t know of others who do, either. The only two I’ve really used are:

  • -f – allows you to use a file with a name other than Dockerfile
  • --squash – allows you to squash newly built layers into a new single layer

The --squash option is still marked as experimental, although it has been experimental since it was introduced in Docker 1.13. I’ll talk about it later in the article.
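To put the two options in context, here is a hypothetical bash wrapper, build_image, that assembles a docker build command with -t, an optional -f, and, when SQUASH=1 is set, --squash. Setting DRY_RUN=1 prints the command instead of running it, so the sketch can be tried without Docker. Keep in mind that --squash requires experimental features to be enabled on the Docker daemon.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build_image NAME:TAG [DOCKERFILE] [CONTEXT]
#   DOCKERFILE  optional; passed with -f only when given
#   CONTEXT     build context directory, defaults to "."
# Environment: SQUASH=1 adds --squash (needs an experimental daemon);
#              DRY_RUN=1 prints the command instead of running it.
build_image() {
  local ref=$1 dockerfile=${2:-} context=${3:-.}
  local -a cmd=(docker build -t "$ref")
  if [[ -n $dockerfile ]]; then
    cmd+=(-f "$dockerfile")
  fi
  if [[ ${SQUASH:-0} == 1 ]]; then
    cmd+=(--squash)
  fi
  cmd+=("$context")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 build_image centos-7.6:tensorflow-2.0 Dockerfile.centos
# prints: docker build -t centos-7.6:tensorflow-2.0 -f Dockerfile.centos .
```

Building the command as a bash array keeps each argument intact even if a path contains spaces, and the dry-run mode makes it easy to check the exact command before committing to a long build.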

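The very last argument on the example docker build command line is a dot (.), which sets the build context to the current directory: Docker sends the contents of that directory to the daemon and, by default, looks there for the Dockerfile. The finished image is not written to the current directory; it goes into Docker’s local image store, where docker images lists it. Building from the current directory is a good practice when you are first learning Docker or when you first build an image.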