Operating large language models in-house

At Home

Article from ADMIN 88/2025
An internal AI server is an interesting way to retain data sovereignty. We show you how to set up an in-house AI server on your hardware and use it in parallel with AI services such as ChatGPT in the cloud.

Operating your own artificial intelligence (AI) server in your data center offers a number of advantages over cloud services. One decisive factor is retaining complete control over sensitive company data: It always stays on your network, which improves data security and helps you comply with strict data protection requirements, especially in highly regulated industries. Moreover, an in-house AI server delivers consistent performance without depending on an Internet connection or external providers. Data processing latency is reduced, which is particularly beneficial for computationally intensive tasks such as image or speech analysis.

Another advantage is the ability to customize your hardware and software environments. You can scale and configure your servers individually to meet the specific requirements of your AI applications, without being restricted by the standardized services of cloud providers. In the long term, an in-house server can also prove more cost-efficient, because recurring cloud service bills are eliminated and the infrastructure can be fully amortized. Independence from price adjustments or service conditions imposed by external providers also gives you financial and operational peace of mind.
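Running local and cloud models in parallel is straightforward in practice, because popular local model servers such as Ollama and vLLM expose an OpenAI-compatible API. The following Python sketch assumes an Ollama instance listening on its default port and uses illustrative model names; it routes a sensitive prompt to the in-house server and a generic question to the cloud. Treat it as a sketch under those assumptions, not a drop-in solution.

from openai import OpenAI

# Cloud client: reads OPENAI_API_KEY from the environment
cloud = OpenAI()

# Local client: assumes an OpenAI-compatible server (e.g., Ollama)
# on its default port; an API key string is required but not checked
local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def ask(client, model, prompt):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Sensitive data stays in-house; generic questions can go to the cloud
print(ask(local, "llama3", "Summarize this internal audit report: ..."))
print(ask(cloud, "gpt-4o-mini", "Explain CUDA tensor cores briefly."))

Because both endpoints speak the same API, switching a workload between the in-house server and the cloud is a matter of changing the client object and model name, which keeps hybrid setups easy to maintain.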

Hardware Requirements

The equipment for your large language model (LLM) environment depends on your requirements and the number of users, but the choice of graphics processing unit (GPU) is crucial for AI workloads: GPUs such as the NVIDIA A100 or the newer H100 lead the market because they are specifically optimized for deep learning and machine learning. These GPUs provide tensor cores, which specialize in the matrix operations at the heart of neural networks and deliver a massive speed boost for both training and inference.
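Before sizing a deployment, it is worth verifying from software which capabilities a card actually exposes. The following Python sketch uses PyTorch to query the compute capability of the first CUDA device; tensor cores are available from compute capability 7.0 (Volta) onward, and bfloat16 support is a reasonable proxy for an Ampere- or Hopper-class card. Device index 0 is an assumption; adjust it on multi-GPU servers.

import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {name} (compute capability {major}.{minor})")
    # Tensor cores: compute capability >= 7.0 (Volta and later)
    print("Tensor cores:", "yes" if major >= 7 else "no")
    # bfloat16 indicates an Ampere (A100) or Hopper (H100) class GPU
    print("bfloat16 supported:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA-capable GPU found")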

The H100 is based on the Hopper architecture and offers significant performance gains at lower power consumption.

...