
Kick-start your AI projects with Kubeflow
Quick Start
Artificial intelligence (AI) and machine learning have been on everyone's lips for some time now. As the technology becomes more accessible, more developers will want to build the algorithms that are working their way into everyday life and making it easier. However, interested developers first face a challenge that has nothing to do with AI and language models: providing the technical infrastructure.
Kubeflow [1] is an open source machine learning platform that promises to offer all the tools you need for AI development. That promise comes at a price: the platform comprises well over 70 components, which need to be rolled out and operated with the right configurations and in a correct, coordinated sequence. The enormity of this undertaking explains why more than a few developers feel they have been taken for a ride. Anyone who has never looked at Kubeflow will need roughly as much time to familiarize themselves with it as they would to assemble the required components themselves … and I've not even mentioned Kubernetes itself (see the "Taming the AI Infrastructure" box). Any administrator who has ever rolled out K8s knows that success is by no means guaranteed.
Taming the AI Infrastructure
Tools for building and training AI-based language models have been available for a long time. They often originate from the open source environment, which benefits from its proximity to academic circles and science when it comes to AI. A large number of the language models available today use open source scripting languages and libraries and are themselves published under free licenses.
In the context of machine-based learning, providing the technical infrastructure is no mean task. In fact, every AI model behaves like a complete program, to which components such as the material the model needs for training are added. The AI community has long since established methods and rules for developing compliant models and making them available to others.
The processes are similar to those in traditional software development. Continuous integration and continuous delivery (CI/CD) play just as important a role as do APIs, which expose models to the outside world over standardized paths. This in turn implies the use of pipelines (i.e., defined processes comprising many steps) with which a model can be developed and trained from the first evolutionary stage to completion. Git also plays a crucial role in version management. If AI models are available in program form, they can be managed, edited, updated, and ultimately executed like any other program, which means a huge amount of prep work for developers who just want to experiment with and research AI.
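The pipeline idea described above can be sketched in a few lines of plain Python. This is a toy illustration only, with invented step names and stand-in logic; a real Kubeflow pipeline would express each step as a containerized component, but the principle is the same: an explicit, ordered chain of stages, each consuming the previous stage's output.

```python
# Toy illustration of the pipeline idea: a model moves through
# defined stages, each consuming the previous stage's output.
# Plain Python only; step names and logic are invented stand-ins.

def ingest():
    # stand-in for fetching training material
    return [1.0, 2.0, 3.0, 4.0]

def preprocess(samples):
    # stand-in for cleaning/normalizing the data
    peak = max(samples)
    return [s / peak for s in samples]

def train(samples):
    # stand-in for fitting a model; here just an average
    return sum(samples) / len(samples)

def run_pipeline():
    # the "pipeline": an explicit, ordered chain of steps
    return train(preprocess(ingest()))

print(run_pipeline())  # 0.625
```

In Kubeflow, each of these functions would become a versioned, reusable component, and the chain itself would be a declarative artifact that can be stored in Git and re-run on demand.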
As ever, the open source community knows exactly what to do. The fact that so many students have shifted their focus from traditional programming topics to state-of-the-art tasks such as AI and language models has prompted people to think about environments designed to make it easier to get started in AI development. Jupyter is an excellent example, as are solutions such as TensorFlow (Figure 1).
In this article, I provide a hands-on guide on how to get Kubeflow up and running quickly with a freshly created AWS account. The installation uses Cognito for user management but does not connect to external components such as Amazon Simple Storage Service (S3), instead using the components already available in Kubeflow.
Kubeflow
Kubeflow is surfing the popular wave surrounding Linux containers (Figure 2). As the name suggests, the platform is based on Kubernetes, and the "flow" in the name at least indicates that it is about providing workflows. Kubeflow is particularly interesting from an AI perspective, in that the workflows and tools provided by the environment almost exclusively relate to AI workloads and the training of language models.

Kubeflow provides developers with all the tools and integration components they need to start working on AI models right away without having to worry about the underlying infrastructure. There is a catch, of course: Kubeflow is a monster. The Kubeflow Pipelines (KFP) platform comprises a number of components that need to be rolled out in Kubernetes, just like a bona fide microarchitecture application. Besides basic services such as etcd and Istio, you need tools such as a dedicated DNS service or a registry for container images.
Hyperscalers to the Rescue
Getting Kubernetes and Kubeflow to work together on your own hardware turns out to be a laborious undertaking, not least because Kubeflow sees itself more as a software collection that can be rolled out and used in a variety of ways, but that end users should never encounter in its plain vanilla form.
Instead, the idea is for distributors to prepare Kubeflow, coordinate the individual components, and then distribute a ready-to-run package (e.g., with the Helm package manager). Some ready-made implementations are on the market, including deployKF (Figure 3), but an attempt to get it running on a local K8s cluster quickly shows that even if you do manage to get it to work, you will likely be too exhausted by the process to do anything with the final results.

With the enormity of this task in mind, hyperscalers have turned their attention to Kubeflow and are actively supporting it. Azure, AWS, and Google have the required hardware in place and ready-made distributions of Kubernetes in their portfolios that are perfectly tailored to their own platform. Because these organizations have enough manpower to prepare Kubeflow, it should come as little surprise that the Kubeflow source code contains a number of commits from AWS, Azure, and Google that contain tweaks here and there to get Kubeflow up and running quickly on the respective platforms.
If you have no previous experience with Kubeflow, though, you are still likely to fail. Ready-made distributions hide most of the Kubeflow complexity from developers, but some work at the interface between Kubeflow and your choice of platform still remains (e.g., user administration). Although Kubeflow offers its own user administration, a kind of plugin principle means it can easily be replaced by identity and access management (IAM) in AWS; Kubeflow users can then be managed directly by Amazon Cognito. If you are unfamiliar with the cornucopia of services on the target platform, you will probably lose your way in the maze of options and possibilities.
Moreover, Kubeflow offers several starting points for the connection of external services instead of integrated solutions. For example, Kubeflow uses MinIO as S3-style storage by default. For deployment on AWS, though, native AWS S3 would make more sense. Possibilities upon possibilities. Developers who just want to use Kubeflow will end up not seeing the wood for the trees.
A word of warning about AWS at this point: The final setup on AWS as described in this article will generate costs of around EUR60 per day. The price will vary depending on the region you use for the AWS setup, your currency, and the resource types. For example, if you want to go whole hog here and use expensive GPU instances for your AI workloads, you can easily exceed this price-per-day barrier.
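To see where a daily figure in this ballpark comes from, it helps to do the arithmetic once. The hourly rates below are assumed example values for illustration, not current AWS list prices, and the instance mix is invented; check the AWS pricing pages for your region before budgeting.

```python
# Back-of-envelope daily cost for a Kubeflow test cluster.
# All hourly rates are ASSUMED example values, not current AWS
# list prices; the instance mix is invented for illustration.

HOURS_PER_DAY = 24

assumed_hourly_rates = {
    "EKS control plane": 0.10,
    "5x m5.2xlarge workers": 5 * 0.46,
    "load balancer, storage, traffic": 0.10,
}

daily_cost = HOURS_PER_DAY * sum(assumed_hourly_rates.values())
print(f"~{daily_cost:.2f} per day")
```

Swap in a couple of GPU instances at several dollars per hour each and you can see how quickly the daily total climbs past this barrier.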
If you want to experiment with Kubeflow, you can do so on AWS, and Kubeflow will also run perfectly well in production on AWS. However, I would strongly recommend not running any test setups you do not need, because the AWS dollar meter will just keep ticking along. I would also strongly recommend defining a budget limit in the AWS billing tool so that you will at least see an email warning if you exceed certain amounts.
Preparations
Before you can even get started with Kubeflow, you need to make a few preparations in AWS [2]. In the following discussion, I assume you have a brand new AWS account with two-factor authentication in place and a stored payment method. Although the free trial allocation at AWS can also be used with Kubeflow, certain services cannot be used without a credit card, and Kubeflow requires some of these services: no credit card, no Kubeflow.
Once you are all warm and cosy in your new AWS account, it's time for the first real task. To access the Kubeflow instance you are about to build, Kubeflow needs to be reachable over the network. The deployment tools for Kubeflow on AWS assume that a separate domain is available for this purpose in Amazon Route 53 (i.e., the AWS DNS manager with an integrated load balancer); only then can the Kubeflow setup tools later create a virtual load balancer on AWS with a public IP address that is reachable under a defined hostname. In theory, a DNS-delegated subdomain of an existing domain is all you need at this point.
However, because .cloud domains in particular are quite cheap to obtain, my experience is that it makes more sense to dedicate a separate domain to your Kubeflow adventures and place it completely under the auspices of AWS. Kubeflow then creates a subdomain for this domain in Route 53 during the installation, which also sets up direct access to the load balancer. Importantly, whether you registered a separate domain in Route 53 or delegated a subdomain of an existing domain, you will need the ID of the domain in Route 53, which you can discover by opening the hosted domain there and clicking Hosted zones | View details . The required information can be found in the Hosted zone ID field (Figure 4).
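If you prefer the command line to the console, the same ID can be pulled out of the JSON that `aws route53 list-hosted-zones` returns. The sample response below is hand-made to mimic the shape of that output; the zone ID and domain name are invented placeholders.

```python
import json

# Hand-made sample mimicking the shape of the JSON returned by
# `aws route53 list-hosted-zones`; the zone ID and domain name
# here are invented placeholders, not real values.
sample_response = """
{
  "HostedZones": [
    {
      "Id": "/hostedzone/Z0123456789EXAMPLE",
      "Name": "my-kubeflow.cloud.",
      "Config": {"PrivateZone": false}
    }
  ]
}
"""

def hosted_zone_id(response_json, domain):
    # Route 53 stores zone names with a trailing dot
    wanted = domain.rstrip(".") + "."
    for zone in json.loads(response_json)["HostedZones"]:
        if zone["Name"] == wanted:
            # strip the "/hostedzone/" prefix to get the bare ID
            return zone["Id"].rsplit("/", 1)[-1]
    return None

print(hosted_zone_id(sample_response, "my-kubeflow.cloud"))
# Z0123456789EXAMPLE
```

Note that the `Id` field carries a `/hostedzone/` prefix that the Kubeflow setup tools do not want; only the bare ID after the last slash goes into the configuration.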