Chatbots put to the scripting test
Beginner's Mistake
A large language model (LLM) uses what is known as a transformer architecture, hence the name "generative pretrained transformer" (GPT). Put simply, an LLM represents each word in a sentence as a series of mathematical vectors. The trained model comprises several layers of transformer encoders and decoders that place the word vectors of a sentence or text block in a suitable context.
As the model learns, its encoders pick up connections between the vectors of related terms (e.g., "ship" and "water" or "helicopter" and "flying"). During the learning phase, the LLM tries to build correct sentences and compares them with the correct answers given by the trainer. The difference between the LLM's response and the correct response is fed back into the transformer layers and used to optimize the model. A generic LLM therefore becomes proficient in the languages with which it was trained and has knowledge of the data used in the training.
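How closely two of these word vectors are related is commonly measured with the cosine similarity – a general formula from linear algebra, not a detail of any particular model:

$$ \mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} $$

Values close to 1 indicate terms the model treats as closely related, such as "ship" and "water."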
For example, a model such as ChatGPT 3.5 was trained with 175 billion parameters – its state of knowledge dates back to September 2021. GPT-4 already has more than a trillion parameters. After completing the basic training, an LLM can be improved with further levels of knowledge. These difference models are known as low-rank adaptations of large language models (LoRAs). For example, a LoRA can help a generic language model learn how Python programming works without having to retrain the entire LLM.
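The idea behind a LoRA, as described in the original LoRA paper (summarized here in general terms, independent of any specific product): Instead of retraining a full weight matrix W, the adaptation trains only two small matrices whose product is added to the frozen original,

$$ W' = W + B A, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k) $$

Because the rank r is small, B and A contain only a tiny fraction of the parameters of W, which is why a LoRA can be trained quickly and shipped as a compact difference model.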
This mode of operation also reveals the weaknesses of LLMs: They are not creative and do not generate new information. They only take existing knowledge and reformulate it to match the question. Of course, an LLM has a massive amount of information at its disposal that no single person has, but it is still limited to the information with which it was trained.
Model Weaknesses
The limitations of the training data are the first concrete weakness of an LLM. ChatGPT 3.5, for example, was trained with a database from 2021 and cannot answer questions about later events. If you use ChatGPT 3.5 to create Python code, you will be given snippets that are compatible with Python 3.8; the LLM does not take into account the changes in Python 3.11 from October 2022. Although this might not be a major issue with Python, it has a far more serious effect on languages such as Ansible, which underwent many changes between versions 2.9 and 2.15.
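One example of the kind of change that trips up a model with outdated training data: Since Ansible 2.10, modules should be addressed by their fully qualified collection names (FQCNs). The tasks below are my own illustration of the difference, not code from the tests described in this article:

# Ansible 2.9 style, as an LLM trained on old data tends to write it:
- name: Install the Apache web server
  yum:
    name: httpd
    state: present

# Current style with a fully qualified collection name (FQCN):
- name: Install the Apache web server
  ansible.builtin.dnf:
    name: httpd
    state: present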
Another problem regarding the future is that vast quantities of new text are currently being created by LLMs with their outdated knowledge. However, this text is not flagged as LLM-based and derivative of others' work but is being sold as original content. Some of this content contains outright lies, because it is derived from hoaxes, such as the fake World of Warcraft feature perpetuated by a few fans [1]. As amusing as this story is, it also means that this false information will end up in the training data of future LLMs. After all, no one can manually sift through and qualify a billion parameters before approving them for use in training. Although the quantity of training data is growing rapidly, its quality is continuing to decline, so future generic LLMs will likely be worse rather than better. The real future of LLMs instead lies with small models that are trained with a well-qualified and largely private database.
An often overlooked but serious weak point of LLM-based tools is that the vector mathematics would always return exactly the same answer to a given question – namely, the one the transformers rate as most probable. Imagine if image-generating models such as DALL-E, Midjourney, or Stable Diffusion always provided the user exactly the same image when asked for an astronaut riding a donkey. The technology would be too predictable and therefore boring. Therefore, all image and text models add a seed to the user's question. A seed is nothing more than a large random number mixed into the request, so ChatGPT effectively processes each request with an added roll of the dice in the transformers. However, when generating program code, you would want the mathematically most probable result, not one that is co-determined by a random number generator.
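The effect of a seed is easy to demonstrate with Bash's built-in random number generator – a loose analogy only; real LLMs seed their samplers differently:

# Assigning a value to RANDOM seeds Bash's generator:
RANDOM=42
echo "$RANDOM $RANDOM $RANDOM"   # the same three numbers on every run

# A varying seed (here, the process ID) changes the output per run:
RANDOM=$$
echo "$RANDOM"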
Generating Code with LLMs
A number of LLMs are available to generate code free of charge. I start with ChatGPT [2], followed by IBM watsonx Code Assistant for Red Hat Ansible [3] and the self-hosted OoobaBooga [4]. Watsonx Code Assistant is currently available free of charge only for Ansible (as Ansible Lightspeed) and only until March 19, 2024, but the other two options deliver program code in PowerShell, Python, Bash, or Perl.
Because the free version of ChatGPT was used for this article, I was restricted to version 3.5. Anyone working with ChatGPT must always bear in mind that OpenAI stores all user input and uses it to train future LLM generations, so make sure you don't feed the chatbot confidential information.
In the first test, ChatGPT created a Bash script that listed all RPM packages installed on a Linux distribution for which updates were available. This task was quite simple, but I wanted output that listed the package name, the currently installed version, and the available version. A human-programmed script for this task might look like Listing 1.
Listing 1
List RPM Packages
#!/bin/bash

printf "%-40s %-20s %-20s\n" "Package Name" "Update" "Installed"

dnf list updates | while read pn pv pc; do
  # Skip the two header lines of the dnf output
  if [ "$pn" = "Last" ] || [ "$pn" = "Available" ]; then
    continue
  fi
  # Query the currently installed version of the package
  pv2=$(rpm -q --qf '%{VERSION}\n' "$pn")
  # Split off the distribution tag from the available version
  IFS='-' read -ra pv3 <<< "$pv"
  printf "%-40s %-20s %-20s\n" "$pn" "${pv3[0]}" "$pv2"
done
In the script, dnf list updates returns the available updates and shows the name of the package (pn), the available package version (pv), and the repository (pc, which is not needed). Because dnf does not reveal the currently installed package version, the script uses rpm -q, followed by a cosmetic cleanup of the available version (pv3) that removes the name of the distribution from the update version number. The first two output lines of dnf list updates, which contain no package names, are filtered out by the if statement.
Because hardly any administrator needs this detailed version number function, no demo scripts exist for this special case on the Internet, and therefore no ready-made training data for a model. When asked to "Write a bash script that lists RPM packets that can be updated; the list must contain the name of the packet, the currently installed version number, and the version number of the available update", ChatGPT did not provide a working script: The bot suggested the use of updates=$(dnf check-update) to determine the packages that could be updated. So far, that suggestion is correct, but ChatGPT claimed that the command output would display the package name and the installed and available version in three columns.

This statement is not true: The second column contains the available version and the third the name of the repository. In response to the complaint ("This does not work. $(echo "$line" | awk '{print $3}') returns the name of the packet repository, but not the available version"), ChatGPT apologized ("I apologize for the oversight. You're correct; the 'dnf check-update' output includes repository information. Here's an updated script that correctly extracts the available version") and then returned another script that still didn't work, because it still split up the output line in the wrong way (Figure 1).
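For reference, extracting the package name and the available version from dnf check-update only takes a one-liner. This is my own sketch, not ChatGPT's output, and the filter relies on the assumption that package names always contain a dot (from the name.arch notation), which the blank and header lines lack:

dnf check-update | awk 'NF == 3 && $1 ~ /\./ { printf "%-40s %-20s\n", $1, $2 }'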
In the next step, ChatGPT received the working, human-generated script for analysis and provided a very precise explanation, so I asked ChatGPT whether it could improve this script, to which the bot responded: "The script you provided is functional, but it can be simplified and improved in terms of readability and efficiency. Here's a revised version of the script." The script dispensed with the while loop and instead suggested a somewhat more complex awk call. In principle, this solution is a good idea, but the script did not work, because it did not implement the awk syntax correctly and proposed some fairly dubious variable filters.
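For comparison, a working awk-based variant along the lines ChatGPT attempted might look like the following sketch. Again, this is my own code rather than the bot's output, and it assumes the dnf output format described above:

printf "%-40s %-20s %-20s\n" "Package Name" "Update" "Installed"
dnf list updates | awk '$1 ~ /\./ {
  split($2, v, "-")                   # strip the distribution tag
  cmd = "rpm -q --qf %{VERSION} " $1  # query the installed version
  cmd | getline inst
  close(cmd)
  printf "%-40s %-20s %-20s\n", $1, v[1], inst
}'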
In the next attempt, I wanted the chatbot to create the desired functionality in Ansible instead of Bash. This attempt also failed – quite badly. The very first task of the playbook it offered was fatally flawed:
tasks:
  - name: Update package cache
    package:
      name: "*"
      state: latest
    become: yes
If the ansible.builtin.package module starts with name: "*" and state: latest, it does not update the package cache as claimed but instead directly updates all packages for which updates are available. The rest of the playbook is then useless: If Ansible executes command: dnf list updates after this unintentional full update, there is no output, because all of the installed packages are already up to date.
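A task that really only refreshes the package metadata would instead use the dnf module's update_cache option – a minimal sketch, assuming a dnf-based target system:

tasks:
  - name: Update the package cache only
    ansible.builtin.dnf:
      update_cache: true
    become: true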
The other ChatGPT suggestions were not helpful either. Other tasks in the playbook used modules such as command: or set_fact:, which experienced Ansible programmers tend to avoid wherever possible. Unfortunately, the Internet is full of bad examples of Ansible programming, which means the suggestions of an Internet-trained LLM are unlikely to be particularly good code.
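Instead of shelling out with command:, the dnf module itself can report pending updates. The following is my own sketch of the more idiomatic approach, not code from the tests:

- name: Query available updates with the dnf module
  ansible.builtin.dnf:
    list: updates
  register: pending_updates

- name: Show the pending updates
  ansible.builtin.debug:
    var: pending_updates.results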
As a last attempt, I asked ChatGPT to generate some PowerShell code: "Create a PowerShell script for Windows that queries a username and password in a graphical dialog using forms. With that information, the script will create a Windows user." You can find a whole series of examples of this kind of query online. Unsurprisingly, ChatGPT came up with a working script for this task that looked more or less like the demo scripts from various websites. However, any administrator could have found this code with a search engine, without resorting to an AI bot.
The bottom line on ChatGPT: If you want to program something that other users have created and published before September 2021, feel free to ask ChatGPT. However, you could also use a search engine to find suitable sources, and you would be better able to judge the trustworthiness of those sources. Be more cautious with special requests and, above all, check the code suggested by the chatbot in detail.
Code with Ansible Lightspeed
The aptly named IBM watsonx Code Assistant, a suite of generative AI products, provides several purpose-built models that, unlike a generic LLM, have been trained for specific tasks. Thankfully, the Code Assistant name makes IBM's intent clear: The tool supports programmers rather than generating code with the aim of replacing them. A paid version is intended to help programmers rewrite outdated COBOL programs on mainframes in Java, among other things.
The watsonx Code Assistant for Ansible is better known as Ansible Lightspeed and is currently available to all users free of charge until March 19, 2024; however, the manufacturer has not yet decided whether and for how long this will remain the case. The tool is part of the Ansible plugin for the Visual Studio Code editor; you only have to link a GitHub account with the plugin to use it. As with ChatGPT, users of the free version have to agree that their code can be used by the manufacturer to improve the model, so only use Lightspeed for code that does not contain personal or confidential information.
Lightspeed only supports Ansible as a language but, unlike ChatGPT 3.5, it has up-to-date knowledge of Ansible 2.14. You do not interact with Lightspeed by sending chat requests. Instead, as an Ansible developer, you write your code, and Lightspeed suggests code blocks on the basis of the code already in place and the names of the tasks. The results are fairly mediocre, especially at the beginning of an Ansible playbook, because Lightspeed has to guess what you want to do from the name of the first task. The more code you write, the better the Code Assistant's suggestions become; it then also inserts previously declared variables in the correct places.
Lightspeed did not provide any working code for my example off the cuff, but if you then continue to write a significant part of the automation yourself in your playbook, you will start to receive correct suggestions for further tasks. These suggestions then match your existing code and follow the syntax of the current Ansible version.
As an assistant, Lightspeed supports users in creating longer playbooks or roles by suggesting tasks that match the existing playbook code and the variables used there. However, do not expect watsonx to conjure up complete playbooks out of a hat in the way that an LLM in the style of ChatGPT tries to do.