I went to a summer camp for people whose Python environments went pear-shaped. Here is my class report.

(Re)Installing Python

This article is a first for me. Although I’ve written a few “how-to” articles, this is really my first “my summer at camp” article. My Python installation had recently stopped working, and some of my code would not run. I tried re-installing the packages that didn’t seem to be working (no change in behavior). I then tried erasing those packages and re-installing them. That didn’t work either, so the situation seemed to warrant a complete erasure and re-install.

My initial install came from Anaconda and went into ~/anaconda3. I did use pip to install a few packages for which Anaconda had no equivalent. To start, I erased the ~/anaconda3 and ~/.conda directories,

rm -rf ~/anaconda3
rm -rf ~/.conda

and edited my .bashrc file to remove the Anaconda bits that I had allowed the Anaconda installer to add.
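
For reference, the part the installer manages is the block in .bashrc between the “conda initialize” markers. On my system it looked roughly like the following (the /home/USER path is a placeholder for your own home directory); deleting the whole block is enough:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/USER/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/USER/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/USER/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/USER/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<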

At this point I was ready to start re-installation. I wanted to stay with Anaconda as much as possible. I’ve used it for many years and I'm comfortable with it.

Around 2012, the Anaconda Python distribution was created as an easy-to-use distribution for business and data science users, a field that was up and coming at the time. As part of this distribution, the Python package manager conda was created. It quickly became popular in the Python community, although it is not the only package manager, and I am not getting into Python package management discussions (wars). Use what you like. I've been using Anaconda and conda for some years, so that is what I used when re-installing Python and my packages.

I’ve used Python virtual environments off and on. I find them somewhat useful, but I’m the only person using the workstation, so I know exactly what is installed and how it was installed. From that perspective, I don’t see the usefulness of virtual environments. However, they are very useful when installing new packages if there’s a chance they might corrupt the current installation – which is probably what happened to me.
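
If you do want that kind of insurance, a minimal sketch of the idea is to try a risky package in a disposable conda environment first and only add it to the main install once it behaves. The environment name and package name below are just placeholders:

# create a throwaway environment for testing a risky package
conda create -n scratch -y python=3.10
conda activate scratch

# install the package you are unsure about (placeholder name)
pip install some-risky-package

# ... run your own tests here ...

# throw the environment away when you are done
conda deactivate
conda env remove -n scratch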

For re-installing Python as my summer camp project, I chose to install almost everything I could into the “base” environment. The primary reason is that many of the packages are used in multiple projects of mine. I don’t want to have to search the list of Python environments to find the one I want for a particular project. This arrangement can be even more interesting because I use tabs on my terminal window, so a specific tab might not have activated the Python environment I want. Although people will disagree with me, I wanted to keep all my packages of choice in the base environment.
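
A quick way to see which environment a particular terminal tab actually has active is to ask conda itself; for example:

# conda sets this variable when it activates an environment
echo $CONDA_DEFAULT_ENV

# or list all environments; the active one is marked with an asterisk
conda env list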

My Required Python Packages

To get started, I made a list of all packages I needed (Table 1). This list, although long, is not everything I wanted to install over time, but it captures more than 90%. These packages will serve as my base Python environment.

Table 1: Packages to Install

   
scipy                                                    tabulate
blas                                                     pyfiglet
matplotlib                                               termcolor
pymp                                                     mpi4py
cudatoolkit (for other packages)                         pandas
numba                                                    pytorch (with GPUs)
cupy                                                     torchvision, pytorch-lightning, torchdata, torchmetrics, torchaudio
tensorflow (with GPUs)                                   swin-transformer-pytorch
tensorflow-addons, tensorflow-io, tensorflow-datasets    scikit-learn
jupyterlab                                               scikit-image
h5py

I Arrive at Summer Camp

My first try was just to use Anaconda to install everything. To be sure that TensorFlow, PyTorch, and CuPy all used GPUs properly, I used simple check commands:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import cupy as cp; x_gpu = cp.array([1, 2, 3]); print(x_gpu.device)"

If TensorFlow lists at least one GPU, PyTorch prints True, and CuPy reports a CUDA device, you know that all three are using a GPU or GPUs.
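
For reference, on a working single-GPU system the three checks print something roughly like the following (device names and numbering will vary by system):

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True
<CUDA Device 0>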

I wrote a simple Bash script that installed each of the packages, after which I ran the tests. None of the tests passed. Sigh. I moved the packages around, thinking the order of installation might matter. I also checked Anaconda to make sure I was using the latest version and to see which channel I should use with conda. None of these changes helped; the tests still failed.
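
The script itself was nothing fancy. A minimal sketch of the install-then-test pattern (not my exact script; the package list is abbreviated) looks something like this:

#!/bin/bash
# install-and-check.sh -- sketch of the install-then-test loop

# install the GPU-related packages from conda (one of the orderings I tried)
conda install -c conda-forge -y cudatoolkit numba cupy
conda install -c conda-forge -y tensorflow pytorch

# GPU checks: a non-empty device list or True means the package sees a GPU
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import cupy as cp; print(cp.array([1, 2, 3]).device)"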

I narrowed the problem down to installing – you guessed it – TensorFlow, PyTorch, and CuPy. Some of the literature describing the “best” methods for installing TensorFlow and PyTorch involved pip. I am not an expert on Python installation tools and methods, but I have read that conda from Anaconda and pip are not mutually exclusive. Below are some highlights for both pip and conda:

Pip

  • Is a Python package manager.
  • Installs Python software packages as wheels or source distributions.
  • Might need compilers and libraries to build and install packages, and has a much larger repository than conda.
  • Relies on other packages for virtual environments (virtualenv).

Conda

  • Is a system package manager.
  • Installs, runs, and updates packages and their dependencies and can work for any language.
  • Doesn’t require compilers.
  • Can create virtual environments.
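
One practical consequence of mixing the two is that it helps to know which tool installed a given package. Assuming the behavior on my system is typical, conda lists pip-installed packages with “pypi” in the channel column, so a quick check looks like this:

# packages conda knows about; pip-installed ones show the channel "pypi"
conda list | grep -iE 'tensorflow|torch|cupy'

# what pip itself has installed
pip list | grep -iE 'tensorflow|torch|cupy'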

I Survived My First Month

After realizing I needed pip to install TensorFlow, PyTorch, and CuPy, I modified the script so that it looked like Listing 1. I removed my previous Python installation, re-installed Anaconda, and ran the script. Everything seemed to go fine, but the PyTorch GPU test reported that GPUs were not supported (the output was False). I re-ran the PyTorch section of the script, and the GPU check still returned False, meaning no GPU support.

Listing 1: Installation Script with conda

conda install -c conda-forge -y cudatoolkit=11.8.0
 
# Tensorflow:
conda install -c conda-forge -y cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
pip install tensorflow-addons
conda install -c conda-forge -y tensorflow-datasets
pip install tensorflow-io
 
# PyTorch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python3 -c "import torch; print(torch.cuda.is_available())"
conda install -c conda-forge -y pytorch-lightning
conda install -c conda-forge -y torchdata
conda install -c conda-forge -y torchmetrics
 
# Cupy:
pip install cupy
python3 -c "import cupy as cp; x_gpu = cp.array([1, 2, 3]); print(x_gpu.device)"

Next, I decided to try something different. Instead of using conda to install the extra TensorFlow and PyTorch packages, I decided to use pip. The script now looked like Listing 2.

Listing 2: Installation Script with pip

conda install -c conda-forge -y cudatoolkit=11.8.0
 
# Tensorflow:
conda install -c conda-forge -y cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
pip install tensorflow-addons
pip install tensorflow-datasets
pip install tensorflow-io
 
# PyTorch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pytorch-lightning
pip install torchdata
pip install torchmetrics
 
# Cupy:
pip install cupy
 
# Final tests
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import cupy as cp; x_gpu = cp.array([1, 2, 3]); print(x_gpu.device)"

After removing the Python installation, reinstalling it, and rerunning the script, it worked! Evidently, when I used conda to install some of the add-on packages, it broke PyTorch. I didn’t chase down exactly which package was responsible; I just stuck with pip for installing them.

The final script I created is shown in Listing 3. The installation worked perfectly as far as my testing could tell. TensorFlow, PyTorch, and CuPy ran my few test scripts correctly and used the GPUs. I also ran a larger piece of code that uses the other packages, and it ran correctly.

Listing 3: Installation Script with pip and conda

conda install -c conda-forge -y cudatoolkit=11.8.0
 
# Tensorflow:
conda install -c conda-forge -y cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
pip install tensorflow-addons
pip install tensorflow-datasets
pip install tensorflow-io
 
# PyTorch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pytorch-lightning
pip install torchdata
pip install torchmetrics
 
# Cupy:
pip install cupy
 
# PyMP
pip install pymp
 
# Other packages using conda
conda install -c anaconda -y scipy
conda install -c conda-forge -y matplotlib
conda install -c conda-forge -y time
conda install -c conda-forge -y tabulate
conda install -c conda-forge -y pyfiglet
conda install -c conda-forge -y termcolor
conda install -c conda-forge -y numba
conda install -c conda-forge -y mpi4py
conda install -c conda-forge -y jupyterlab
conda install -c anaconda -y h5py
conda install -c conda-forge -y python-graphviz
conda install -c anaconda -y pandas
conda install -c anaconda -y dask
conda install -c conda-forge -y markdown
conda install -c anaconda -y ncurses
conda install -c anaconda -y scikit-learn
conda install -c anaconda -y scikit-image
 
# Final tests
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
python3 -c "import torch; print(torch.cuda.is_available())"
python3 -c "import cupy as cp; x_gpu = cp.array([1, 2, 3]); print(x_gpu.device)"

Summary and Moral of the Story

I’ve used Python for quite a bit of my coding since 2008, and it has worked well for me. As I changed and updated Python environments, I was able to run all the code in my projects. I was a happy camper. Then, either through my carelessness or some package conflict, my Python environment became corrupted. This led to my summer camp adventure, which has truly been an experience.

During camp, I learned the following:

  • Python installations can be fraught with danger. Document. Test (lots).
  • If you install a major package with pip or conda, stay with that same tool for all of its add-ons.
  • When you're walking on eggshells, don't hop.

Following these steps has allowed me to get to the point where the Python environment can run all my code correctly. However, I'm not finished with summer camp. One package I tried adding to the list was Horovod. I really need this package for some tangential things I’m doing, but trying to install it has been an utter and complete disaster.

I tried pip to install Horovod in a virtual environment, but I always got an error about being unable to build the wheel for Horovod. I then tried as pure an Anaconda environment as I could, and it too failed with either the same or a slightly different error message. I then followed the Horovod instructions for building it in an Anaconda environment. That failed miserably as well, with the same error.

I appealed for help on the Horovod community site with no real results. (They are nice people, though.) I also tried the Anaconda community site with the same level of help. (People are very nice there, too.) At this point, I have utterly and completely given up on installing Horovod. The documentation is sorely out of date, and I can’t find any updated articles or blogs about how to build it. The time has come to find a better tool for extending TensorFlow and PyTorch code to multiple GPUs, I hope one based on the Message Passing Interface (MPI).

As I finish writing my summer-at-camp report, I have concluded that the Python packaging ecosystem is terribly broken. None of the tools for creating and managing environments (e.g., Anaconda, pip, Poetry) seems capable of addressing my needs. From reading Twitter, I can tell I’m not the only one who feels this way. It is just a hot mess with little hope of being fixed in the near term. I’m extremely sad about this situation because I believe it’s going to start pushing people away from Python to something else, creating more confusion while diluting Python. It is what it is, as my camp counselor told me.