# Arithmetic Artist

The free R programming language and software environment for statistical computing and graphics is well supported, has great flexibility, and is easily automated.

The R statistical programming language offers machine learning methods, dashboards, descriptive analyses, t-tests, cluster analyses, various regression methods, interactive graphs, and more. If you have ever wondered whether it would be worthwhile to immerse yourself in R, the following easy-to-grasp taster seeks to clarify who is likely to benefit.

#### Yesterday and Today

R [1] is based on the S programming language developed at Bell Laboratories in 1975-1976, which then split in the late 1980s and early 1990s into a commercial version named S-PLUS and the GNU project R. The name, which admittedly takes some getting used to, can be traced back to the first names of the developers, Ross Ihaka and Robert Gentleman, and alludes to the previous project name, as well.

R is often referred to as a programming environment, a term that emphasizes the open package concept and indicates that R differs from common monolithic statistical software. The basic functions are provided by eight packages included in the R source code. Many thousands of additional packages offer extensions.

In its early years, R was more of a niche player and was mainly used by statisticians and biometricians at universities. Since then, however, it has gained a firm place in the corporate world as data science has made its way into many companies.

#### Syntax

An overview of the most important properties of its syntax makes it easier to get started with R. The R syntax is characterized by expressions and is case sensitive: An object named `modelFit` cannot be called as `modelfit`, for example. The assignment operator `<-` points to the object that receives the value of the expression on its right; if the object does not exist yet, it is created.

To assign the numbers from 1 to 5 to the vector `numbers`, use the following expression:

`> numbers <- c(1, 2, 3, 4, 5)`

The `c()` function – the `c` stands for "concatenate" – combines the individual elements listed in parentheses. An equals sign can be used as an alternative for assignments, in line with the conventions of other programming languages. However, this practice is controversial in the R community. The assignment operator and equals sign also are not fully equivalent: The equals sign acts as an assignment only at the top level or inside braces, whereas within a function call it matches a named argument instead.
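The difference between the two operators is easiest to see inside a function call. The following is a minimal sketch; the object names `m1`, `m2`, and `vals` are made up for illustration:

```r
# Inside a call, `=` only matches the argument named x of mean();
# it does not create an object called x in the workspace.
m1 <- mean(x = c(2, 4, 6))

# `<-` inside a call assigns first and then passes the value on,
# so vals exists afterward as a side effect.
m2 <- mean(vals <- c(2, 4, 6))

m1      # 4
m2      # 4
vals    # 2 4 6
```

Assigning inside a call, as in the second case, works but is rarely good style; it is shown here only to make the contrast visible.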

Another member in the assignment operator group, `<<-`, is an extension of the assignment operator. Known as the superassignment operator, it can be used inside a function to overwrite a variable that already exists in an enclosing environment, such as the global environment. If no such variable is found, it is created in the global environment.
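The following sketch illustrates superassignment in two typical situations. All names (`total`, `add_to_total`, `make_counter`, `count`, `counter`) are hypothetical. Note that `<<-` first looks for an existing variable in the enclosing environments and only falls back to the global environment if it finds none:

```r
total <- 0

add_to_total <- function(value) {
  total <<- total + value   # overwrites total outside the function
  invisible(total)
}

add_to_total(5)
total                        # 5

make_counter <- function() {
  count <- 0                 # lives in the environment of make_counter()
  function() {
    count <<- count + 1      # modifies count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()                    # 1
counter()                    # 2
```

In the second case, `count` never appears in the global environment; the counter keeps its state privately between calls.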

A special feature compared with many common programming languages is indexing, which starts at 1 rather than 0. To retrieve the first element of the `numbers` vector, you would write `numbers[1]`. The output shows the element at the first position of `numbers`, preceded by `[1]`, which is the index of the first element printed on that output line.
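A few more indexing variants, as a short sketch using the `numbers` vector from above (the result names are made up for illustration):

```r
numbers <- c(1, 2, 3, 4, 5)

first  <- numbers[1]            # R counts from 1, so this is the first element
middle <- numbers[2:4]          # elements 2 through 4
rest   <- numbers[-1]           # negative index: everything except element 1
large  <- numbers[numbers > 3]  # logical index: elements greater than 3

first     # 1
middle    # 2 3 4
rest      # 2 3 4 5
large     # 4 5
```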

As you will see from the code examples, no semicolon or the like is required to complete an expression. R uses line breaks for this purpose.

The interpreter helps out here: If the end of a line clearly does not complete the expression (e.g., because a closing parenthesis is missing or the line ends with a comma), the interpreter assumes that the expression continues in the next line and prompts you for the completion with a plus sign:

```
> rep(numbers,
+ each = 3)
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
```

Comments in code are indicated by a hash and can occupy a line of their own or be added to the end of a line:

```
# This is a comment
> names <- c("Anna", "Rudolf", "Edith",
+            "Jason", "Maria")
```

#### Data Structures

The most important data structures in R are vectors, matrices, lists, and data frames. Vectors are one-dimensional data structures and the smallest possible building block, because R has no scalars; these basic vectors are also known as atomic vectors. For example, an object containing only one string or one number is treated by R as a vector with a length of 1. The five atomic vector types in common use are as follows (a sixth, `raw`, stores bytes):

• `logical`
• `integer`
• `double` (`numeric`)
• `character`
• `complex`
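The types listed above can be checked with `typeof()`. The following sketch uses arbitrary example values and also shows that a single value is simply a vector of length 1:

```r
typeof(TRUE)      # "logical"
typeof(5L)        # "integer" (the L suffix marks an integer literal)
typeof(5)         # "double"  (plain numbers are doubles by default)
typeof("five")    # "character"
typeof(5i)        # "complex"

single <- 5
length(single)    # 1
is.vector(single) # TRUE
```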

A characteristic of vectors in R is that their elements all have to be of the same type. If you try to combine elements of different types (e.g., strings and numbers), you will not see an error message. Instead, R automatically converts all elements to the same type in a process known as coercion. The following example tries to create a vector from a number, a logical constant, and a character string. R automatically converts all elements to characters:

```
> misc <- c(43, TRUE, "Hello")
> misc
[1] "43"    "TRUE"  "Hello"
> class(misc)
[1] "character"
```
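Coercion always moves toward the more general type, roughly in the order logical, integer, double, character. A short sketch with made-up object names:

```r
mixed_int <- c(TRUE, 2L)     # logical is promoted to integer
typeof(mixed_int)            # "integer"

mixed_dbl <- c(2L, 3.5)      # integer is promoted to double
typeof(mixed_dbl)            # "double"

mixed_chr <- c(3.5, "a")     # everything becomes character
mixed_chr                    # "3.5" "a"

# Coercion also happens implicitly in arithmetic: TRUE counts as 1
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true                       # 2

# Explicit conversion in the other direction
as.numeric("43")             # 43
```

The implicit conversion of logical values to numbers is what makes idioms such as counting matches with `sum()` work.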

Data frames are table-like data structures in R and are used in almost every data analysis. Each column contains a vector; all columns must have the same length, but they can be of different types (Listing 1).

Listing 1

Data Frame

```
> data.frame(numbers, names)
  numbers  names
1       1   Anna
2       2 Rudolf
3       3  Edith
4       4  Jason
5       5  Maria
```
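Once a data frame is stored in an object, its columns and rows can be addressed individually. The following sketch assumes the two vectors from above; the name `people` is hypothetical, and `stringsAsFactors = FALSE` is set explicitly so that the `names` column stays a character vector in older R versions, too:

```r
numbers <- c(1, 2, 3, 4, 5)
names <- c("Anna", "Rudolf", "Edith", "Jason", "Maria")

people <- data.frame(numbers, names, stringsAsFactors = FALSE)

str(people)          # compact overview of columns and their types
dim(people)          # 5 rows, 2 columns

people$names         # a single column as a vector
people[2, ]          # the second row as a one-row data frame

# Rows selected by a condition, restricted to one column
late <- people[people$numbers > 3, "names"]
late                 # "Jason" "Maria"
```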

If you add an additional element to the `names` vector and again try to create a data frame from `numbers` and `names`, an error occurs because `numbers` has a length of 5 and `names` has a length of 6 (Listing 2).

Listing 2

Faulty Frame

```
> names[6] <- "Henry"
> data.frame(numbers, names)
Error in data.frame(numbers, names) :
  arguments imply differing number of rows: 5, 6
```
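One way to deal with vectors of unequal length is to check the lengths first and, if appropriate for the data, pad the shorter vector with missing values. This is only a sketch under that assumption; the names `result` and `fixed` are made up, and whether padding with `NA` makes sense depends on the analysis:

```r
numbers <- c(1, 2, 3, 4, 5)
names <- c("Anna", "Rudolf", "Edith", "Jason", "Maria", "Henry")

length(numbers) == length(names)   # FALSE

# Catch the error instead of stopping the script
result <- tryCatch(
  data.frame(numbers, names),
  error = function(e) conditionMessage(e)
)
result                             # the error message as a string

# Extending a vector with length<- fills the new positions with NA
length(numbers) <- length(names)
fixed <- data.frame(numbers, names)
fixed
```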

Strictly speaking, lists are also vectors, but they are recursive vectors. Any conceivable object can be a component of a list, even another list. The data structure `list` in R can be compared with a dictionary (`dict`) in Python or a structure (`struct`) in C. Lists in Python, on the other hand, are more similar to vectors in R, except that Python lists can contain different data types, whereas R vectors cannot.
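A brief sketch of a list with named components of different types, including a nested list. The object `person` and its contents are invented for illustration:

```r
person <- list(
  name    = "Anna",
  scores  = c(90, 85, 77),
  address = list(city = "Vienna", zip = "1010")   # a list inside a list
)

person$name               # access by name, similar to a dict key
person[["scores"]][2]     # second element of the scores vector: 85
person$address$city       # component of the nested list

length(person)            # 3 top-level components
is.vector(person)         # TRUE: lists are vectors
is.atomic(person)         # FALSE: but not atomic ones
str(person)               # compact overview of the structure
```

Double brackets `[[ ]]` extract a single component, whereas single brackets `[ ]` applied to a list return a shorter list.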
