Efficiently planning and expanding the capacities of a cloud

Cloud Nine

Working with Percentiles

Percentiles are much more helpful in capacity planning than the arithmetic mean. For example, the 99th percentile of the time required to process an API request is much more meaningful than an average value, because it tells you that 99 percent of all requests were processed at least that quickly. Monitoring solutions such as Prometheus and InfluxDB offer functions to calculate the percentiles of a dataset.

Strictly speaking, the median is also a percentile: the 50th percentile of the total frequency distribution. Percentiles work like this: First, you create a table of all values for a specific event. For example, if you are interested in the time required to process an API request and you have 100 measured values for this parameter, you would create a table with all 100 values in ascending order. If you want the 90th percentile, you take the first 90 percent of the entries and, of those, the highest value. That's the 90th percentile. The same procedure works for every other percentile. Once you know the value, you can tune the platform to improve it.
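The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration of the sort-and-pick approach (often called the nearest-rank method), not the algorithm Prometheus or InfluxDB use internally; the function name and the sample data are assumptions made for the example.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by sorting and picking a rank."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)                      # table in ascending order
    rank = math.ceil(p * len(ordered) / 100)      # 1-based position in the table
    return ordered[rank - 1]                      # highest value of the first p percent

# Hypothetical sample: 100 API response times in milliseconds (1, 2, ..., 100).
response_times_ms = list(range(1, 101))

p90 = percentile_nearest_rank(response_times_ms, 90)   # 90
median = percentile_nearest_rank(response_times_ms, 50)  # 50
```

With 100 measured values, the 90th percentile is simply the 90th entry of the sorted table, and the median is the 50th.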

By the way, Grafana [3] is recommended for displaying the values from Prometheus and InfluxDB; Grafana can run queries directly on the database and thus also use the built-in percentile function (Figure 2).

Figure 2: Grafana is the tool of choice for visualizing data, including percentiles.

Indirectly, percentiles also play a role in capacity planning. On the basis of the percentiles for certain factors (e.g., the bandwidth available to a VM, the time taken to process API calls, or available RAM), you can see the extent to which the cloud is already running at full capacity. You can define your own minimum and maximum values; however, it has proved advantageous to keep at least 25 percent of CPU and RAM capacity in reserve to serve major customers.
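A reserve rule of this kind is easy to check automatically. The following sketch assumes you already have a figure for used and total capacity per resource (for example, a high percentile of utilization taken from your monitoring system); the function names and the numbers are hypothetical.

```python
def reserve_fraction(used, total):
    """Return the share of capacity that is still free (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - used) / total

def has_enough_reserve(used, total, min_reserve=0.25):
    """True if at least min_reserve (default 25 percent) of capacity is free."""
    return reserve_fraction(used, total) >= min_reserve

# Hypothetical figures: vCPUs in use and RAM in use (GB) across the platform.
cpu_ok = has_enough_reserve(used=700, total=1000)    # 30 percent free
ram_ok = has_enough_reserve(used=3400, total=4096)   # about 17 percent free
```

In this example, the CPU reserve is sufficient, whereas the RAM reserve has dropped below the 25 percent mark and would signal that more hardware is needed.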

Achieving Hyperscalability

The third and final aspect of capacity planning is how well a data center can scale. From the administrator's point of view, it does not help if you have recognized in good time that you need more hardware but are then unable to make the hardware available within an acceptable period of time.

Adding resources to a cloud is usually a multiphase process. Phase 0 is the formulated need (i.e., determining that more capacity is necessary and to what extent). Next, you determine the hardware: On the basis of the questions that play a role in capacity management, you define which servers you need and in what configurations before ordering them. Even here you can expect considerable delays, especially in larger companies, because when it comes to hundreds of thousands or millions of dollars, pounds, or euros, long approval chains usually have to be followed.

As the admin, you only get back into the driver's seat after the desired hardware has arrived. At this point, it must be clear which server is to be installed where and in which rack so that the responsible people at the data center can start installing the servers and the corresponding cabling immediately.

Of course, topics such as redundancy and the availability of electricity also play an important role. Such a setup can be planned and designed particularly efficiently if you have a tool at your disposal that combines Data Center Infrastructure Management (DCIM) and IP Address Management (IPAM): From this central source of knowledge, data center personnel can extract all the information they need when they need it. NetBox [4] is an example of a solution that fits this requirement very well. In fact, it was developed for exactly this purpose (Figure 3).

Figure 3: NetBox can be the link between hardware and automation in scale-out setups as DCIM and IPAM components.
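To illustrate what "which server goes where and in which rack" means as data, here is a minimal sketch of a rack plan with a check for double-booked rack units. It is plain Python and deliberately does not use the NetBox data model or API; the class, field, and server names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    server: str
    rack: str
    bottom_unit: int   # lowest rack unit the server occupies
    height: int = 1    # number of rack units the server occupies

def find_conflicts(placements):
    """Return (rack, unit, first_server, second_server) for each double-booked unit."""
    occupied = defaultdict(dict)   # rack -> {unit: server}
    conflicts = []
    for p in placements:
        for unit in range(p.bottom_unit, p.bottom_unit + p.height):
            other = occupied[p.rack].get(unit)
            if other is not None:
                conflicts.append((p.rack, unit, other, p.server))
            else:
                occupied[p.rack][unit] = p.server
    return conflicts

# Hypothetical plan: storage-01 overlaps with compute-02 in unit 13.
plan = [
    Placement("compute-01", "rack-a1", bottom_unit=10, height=2),
    Placement("compute-02", "rack-a1", bottom_unit=12, height=2),
    Placement("storage-01", "rack-a1", bottom_unit=13, height=4),
]
conflicts = find_conflicts(plan)
```

A DCIM tool performs this kind of validation for you and adds power, cabling, and IP address information on top; the sketch only shows why a single, consistent source of placement data helps the people doing the installation.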

Essential Automation

After the installation of additional hardware in the rack, it's time to install the operating system and cloud solution on the new systems. This process shows how clearly a conventional setup differs from a modern cloud: In the old world, the operating system was usually installed manually, because the effort was nothing compared with the effort that developing a completely automated installation routine would have meant.

In the cloud, however, this approach obviously no longer works. Extending a platform can easily mean 300 or 400 new servers at the same time, so a manual installation would be pointless and extremely tedious. In clouds, therefore, the goal is to press an On button, and all subsequent work is completely automated. Ultimately, the requirement is that the new server automatically becomes a member of the existing cloud network, and the cloud solution you use automatically takes care of supplying it with new VMs for customers.
