In R, vectors are fundamental data structures that play a central role in organizing and storing information effectively. They are primarily categorized into two types: atomic vectors and generic vectors (lists). In this chapter, we focus on atomic vectors, which consist of a single sequence of elements of the same data type—logical, integer, double, or character—and explore their properties and operations.
7.1 Introduction to vectors in R
The vector is one of the most central concepts in R. Vectors are broadly categorized into two types: atomic vectors and generic vectors (lists) (FIGURE 7.1).
FIGURE 7.1 Data structures in R. Atomic and generic vectors.
Atomic vectors must consist of elements of the same basic data type (e.g., numeric, characters). In contrast, lists can contain elements of varying data types (e.g., some elements may be numeric, while others may be characters).
The R language supports various data structures for organizing and storing information. In the following chapters, we will explore more complex structures, such as matrices, arrays, and data frames. Each of these structures serves a specific purpose and can differ in the type of data it holds and its level of complexity. These data structures are schematically illustrated in FIGURE 7.1.
7.2 Atomic vectors in R
The most fundamental data structure in R is the atomic vector, which consists of a single sequence of elements of the same data type. Each element within the vector is uniquely identified by its position within this sequence.
Types of atomic vectors
There are four primary types of atomic vectors (also known as “atomic” classes):
logical
integer
double
character (which may contain strings)
Integer and double vectors are collectively known as numeric vectors. There are also two rare types, complex and raw, which we will not cover in this textbook.
Let’s begin by understanding one-element vectors, the simplest form of atomic vectors in R. After that, we will explore longer atomic vectors to gain insight into their properties and practical applications.
7.2.1 One-element atomic vectors
Individual logical values, numbers (also known as scalars), or characters are atomic vectors of length one. Therefore, a one-element vector (oev) represents a single value that can be used as the building block to construct more complex objects (longer vectors). The following examples demonstrate one-element vectors for each of the four primary data types, arranged from the most specific to the most general.
7.2.1.1 Logical one-element vector
Logical values, also known as Boolean values, are represented as TRUE or FALSE. While they can be abbreviated to T or F, this practice is generally not recommended. Examples of logical one-element vectors (oev) are as follows:
oev_a <- TRUE  # assign the logical TRUE to an object named oev_a
oev_a  # call the object with its name
[1] TRUE
oev_b <- FALSE  # assign the logical FALSE to an object named oev_b
oev_b  # call the object with its name
[1] FALSE
7.2.1.2 Integer one-element vector
By default, R stores a number typed as 1 or 2 as a double, even though the console prints it without decimals. To explicitly specify integer values in R, we must append an “L” suffix, as demonstrated in the following examples:
oev_c <- 3L
oev_c
[1] 3
oev_d <- 100L
oev_d
[1] 100
7.2.1.3 Double one-element vector
Doubles, which represent real numbers, can be expressed either in decimal form (e.g., 0.000017) or in e-notation (e.g., 1.7e-05).
oev_decimal <- 0.000017
oev_decimal
[1] 1.7e-05
oev_scientific <- 1.7e-05
oev_scientific
[1] 1.7e-05
7.2.1.4 Character one-element vector
One-element vectors can also be character values, that is, single characters or entire strings of text. In R, strings can be defined using either single ('') or double ("") quotation marks. Either way, R prints strings with double quotes, even if they were originally created with single quotes.
oev_e <- "a"  # a character enclosed in double quotation marks
oev_e
[1] "a"
oev_f <- 'I love data analysis'  # a string of text enclosed in single quotation marks
oev_f
[1] "I love data analysis"
It is important to understand that R treats numeric and character vectors differently. For example, while basic arithmetic operations can be performed on numeric vectors, they are not valid for character vectors. Attempting to apply numeric operations, such as addition, to character vectors will result in an error, as shown below:
h <- "1"  # "1" is stored as a character vector
k <- "2"  # "2" is stored as a character vector
h + k
Error in h + k : non-numeric argument to binary operator
The error message indicates that we are attempting to apply numeric operations to character objects, “1” and “2”, which is not valid. To resolve this, the characters need to be converted to numeric values before any operations can be applied.
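As a minimal sketch of that fix, the two character values can be converted with as.numeric() before they are added. The object name total is introduced here only for illustration.

```r
h <- "1"  # stored as a character vector
k <- "2"  # stored as a character vector

# Convert both values to numeric before adding them
total <- as.numeric(h) + as.numeric(k)
total
# [1] 3
```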
Single values (one-element vectors) are rarely the focus of an R session. Next, we discuss “longer” atomic vectors.
7.2.2 Longer atomic vectors
Atomic vectors typically contain more than one element. The elements of a vector are ordered and must all be of the same data type. Common examples of “long” atomic vectors include numeric (whole numbers and fractions), logical (e.g., TRUE or FALSE), and character (e.g., letters or words). Let’s explore how to create “long” atomic vectors and highlight key properties through examples.
7.2.2.1 The colon operator :
The colon operator : generates a sequence of consecutive numbers increasing or decreasing by 1. For example:
1:5
[1] 1 2 3 4 5
Here, the colon operator : takes two integers, 1 and 5, and returns an atomic vector of integers starting at 1 and ending at 5, incremented by 1.
We can assign the atomic vector to an object named x_seq as follows:
x_seq <- 1:5
To access the vector, simply refer to its name:
x_seq
[1] 1 2 3 4 5
We can determine the type of a vector using the typeof() function:
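A short sketch of this check, applied to the x_seq vector created above. The comparison with c(1, 5) is added here only to illustrate the contrast with doubles.

```r
x_seq <- 1:5

# The colon operator applied to whole numbers returns an integer vector
typeof(x_seq)
# [1] "integer"

# Plain numbers typed without the L suffix are stored as doubles
typeof(c(1, 5))
# [1] "double"
```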
The colon operator also works with negative integers:
-3:4  # sequence from negative to positive integers
[1] -3 -2 -1 0 1 2 3 4
7.2.2.2 The function seq()
We have already explored the seq() function in Chapter @ref(rfunctions), where “seq” stands for sequence. By default, it generates vectors of consecutive numeric values. For example:
seq(1, 5)  # generates a sequence from 1 to 5 with a default step of 1
[1] 1 2 3 4 5
7.2.2.3 The c() function
We can create atomic vectors manually using the c() function (short for concatenate), which combines values into a single vector and is one of the most commonly used functions in R. For example, to create a numeric vector with the values 2, 4.5, and -1, we type:
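The call described above could look as follows. The object name x_num is an assumption made for this sketch.

```r
# Combine three numeric values into a single atomic vector
x_num <- c(2, 4.5, -1)
x_num
# [1]  2.0  4.5 -1.0

length(x_num)  # number of elements
# [1] 3
```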
Note that an atomic vector is an object and can be an element within another vector. For example:
y_seq <- 3:7
c(y_seq, 2, 4.5, -1)  # y_seq object is an element of a vector
[1] 3.0 4.0 5.0 6.0 7.0 2.0 4.5 -1.0
7.2.2.4 Repeating vectors
The rep() function in R provides a convenient way to repeat either an entire vector or individual elements within it. Below are some examples:
A. Repeating the entire vector
rep(1:4, times = 5)  # repeat the entire vector 5 times
rep(c(0, 4, 7), times = 3)  # repeat the entire vector 3 times
rep(c("a", "b", "c"), times = 2)  # repeat the entire vector 2 times
B. Repeating each element of the vector
rep(1:4, each = 5)  # repeat each element 5 times
rep(c(0, 4, 7), each = 3)  # repeat each element 3 times
rep(c("a", "b", "c"), each = 2)  # repeat each element 2 times
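To make the difference between the two arguments concrete, the sketch below prints the results for the character vector used above. The object names by_times and by_each are introduced here for illustration.

```r
abc <- c("a", "b", "c")

# times: the whole vector is repeated as a block
by_times <- rep(abc, times = 2)
by_times
# [1] "a" "b" "c" "a" "b" "c"

# each: every element is repeated before moving to the next one
by_each <- rep(abc, each = 2)
by_each
# [1] "a" "a" "b" "b" "c" "c"
```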
R also provides built-in vectors, such as month.name, and we will use some of them in the examples that follow.
7.3 Mixing things in a vector - Coercion
7.3.1 Implicit coercion
Implicit coercion in R refers to the automatic conversion of data from one type to another when required for an operation or function. This feature enables R to handle mixed data types flexibly.
For example, R assumes that all elements in an atomic vector are of the same data type – such as all numbers, all characters, or all logical elements. Let’s create a “mixed” vector:
Since the vector contains a mix of numeric, character, and logical values, R coerces all elements to a common data type. In this case, all elements are converted to characters. As a result, my_vector contains 1, 4, hello and TRUE as character elements.
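The mixed vector discussed above can be sketched as follows, assuming my_vector was created with c() from the four values mentioned in the text.

```r
# A numeric, a numeric, a character, and a logical value in one call
my_vector <- c(1, 4, "hello", TRUE)
my_vector
# [1] "1"     "4"     "hello" "TRUE"

# All elements were coerced to the same type
typeof(my_vector)
# [1] "character"
```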
When a numeric value is added to a logical vector, R automatically coerces all the elements in the vector to numeric (double) values. Logical values are converted to numbers as follows: FALSE becomes 0 and TRUE becomes 1.
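A minimal sketch of this rule. The vectors lgl and mixed are hypothetical examples, not taken from the original text.

```r
lgl <- c(TRUE, FALSE, TRUE)  # hypothetical logical vector

# Appending a number coerces the logical values to doubles
mixed <- c(lgl, 5)
mixed
# [1] 1 0 1 5
typeof(mixed)
# [1] "double"

# The same coercion lets us count the TRUE values with sum()
sum(lgl)
# [1] 2
```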
7.3.2 Explicit coercion
Explicit coercion refers to the process of intentionally converting data from one type to another using specific conversion functions provided by R. For instance, we can convert numeric values into characters using the as.character() function. Let’s create a numeric vector f, containing the numbers 1 through 5, and then convert it into a character vector g:
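The conversion described above can be sketched as follows, assuming f is created with the colon operator.

```r
f <- 1:5               # numeric (integer) vector
g <- as.character(f)   # explicit conversion to character
g
# [1] "1" "2" "3" "4" "5"

typeof(g)
# [1] "character"
```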
The reverse conversion, from character to numeric with as.numeric(), is particularly useful in practice, as many datasets (especially those from CSV files, web scraping, or spreadsheets) may contain numeric data in character format. This often occurs when numbers are enclosed in quotes or contain commas (e.g., “1,000” instead of 1000). To perform calculations, data analysis, or any operation requiring numeric data types, we need to convert these character representations into actual numeric data.
Now, suppose we have a vector named q containing the values “O”, “1”, “2”, “3”, “d”, “5” as characters and we want to convert them to numbers using the as.numeric() function:
When we apply as.numeric() to the character vector, R successfully converts the characters "1", "2", "3" and "5" into their corresponding numeric values 1, 2, 3 and 5. However, R encounters an issue with the characters "O" and "d". As a result, we receive a warning that NAs were introduced by coercion, indicating that the character elements "O" and "d" were converted to missing values (NA).
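The behaviour described above can be reproduced with the sketch below. The object name q_num is an assumption made for illustration.

```r
# Note that the first element is the letter "O", not the digit zero
q <- c("O", "1", "2", "3", "d", "5")

# R emits a warning here: NAs introduced by coercion
q_num <- as.numeric(q)
q_num
# [1] NA  1  2  3 NA  5
```

The warning is worth checking in practice, since a mistyped character such as "O" silently becomes a missing value.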
Moreover, when coercion is not possible or meaningful, R typically displays a warning and converts all elements to NAs. For example:
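As an illustration, the hypothetical vector words below contains no element that can be read as a number, so every element becomes NA.

```r
words <- c("a", "b", "c")  # hypothetical character vector

# R emits a warning here: NAs introduced by coercion
words_num <- as.numeric(words)
words_num
# [1] NA NA NA
```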
R supports basic arithmetic operations between vectors using standard operators such as +, -, *, /, ^.
7.4.1.1 Arithmetic operations between a scalar and a long vector
When a scalar (a single numeric value) interacts with a vector, the operation is applied between the scalar and each element of the vector, a process known as vectorization¹.
¹ Vectorization refers to the process of applying operations or functions to entire vectors directly, without the need for explicit loops or iterations. This allows element-wise operations on vectors, which results in code that is more concise, readable, and often faster compared to traditional iterative approaches.
For example, when we add a scalar and a vector, the scalar is added to each element of the vector, as follows:
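A minimal sketch of this addition. The values of v are inferred from the multiplication output shown in this section, and the scalar 2 is an assumption chosen for illustration.

```r
v <- c(1, 2, 3)  # inferred from the output of 3 * v shown in this section

# The scalar 2 is added to every element of v
v + 2
# [1] 3 4 5
```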
Similarly, when a vector is multiplied by a scalar, each element of the vector is individually multiplied by the scalar, as shown below:
3*v
[1] 3 6 9
7.4.1.2 Arithmetic operations between two long vectors
Arithmetic operations in R can be performed element-wise between corresponding elements of two vectors. This means that each element in one vector interacts with the corresponding element in the other vector. Consider two vectors, denoted as v and t, each containing a series of numerical values. Now, let’s apply some arithmetic operators to these vectors.
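The sketch below applies each operator element by element. The values of t are hypothetical, chosen only for illustration, and v is inferred from the scalar example in the previous subsection.

```r
v <- c(1, 2, 3)
t <- c(4, 5, 6)  # hypothetical values

v + t  # element-wise addition
# [1] 5 7 9
v - t  # element-wise subtraction
# [1] -3 -3 -3
v * t  # element-wise multiplication
# [1]  4 10 18
v / t  # element-wise division
# [1] 0.25 0.40 0.50
v ^ t  # element-wise exponentiation
# [1]   1  32 729
```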
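The dot product² is a mathematical operation between two numeric vectors of equal length, $\mathbf{v}$ and $\mathbf{t}$, that results in a scalar quantity. This operation is commonly represented with a dot placed between the vectors: $\mathbf{v} \cdot \mathbf{t}$.
² The dot product of two vectors is an important operation in multiplication of matrices (see Chapter @ref(rmatrices)).
It is computed by multiplying corresponding elements of the vectors and summing the results. Given two vectors, $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{t} = (t_1, t_2, \ldots, t_n)$, we have:
$$\mathbf{v} \cdot \mathbf{t} = \sum_{i=1}^{n} v_i t_i = v_1 t_1 + v_2 t_2 + \cdots + v_n t_n$$
where the symbol $\sum$ denotes the summation over all elements from $i = 1$ to $n$.
In our example, the dot product is obtained by applying this formula to the elements of v and t.
In R, the dot product of two vectors can be computed with the matrix multiplication operator %*%, which returns the result as a 1 × 1 matrix.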
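As a hedged illustration, the sketch below computes the dot product in two ways. The values of t are hypothetical, and the object names dot_sum and dot_mat are introduced here only for this example.

```r
v <- c(1, 2, 3)
t <- c(4, 5, 6)  # hypothetical values

# Multiply corresponding elements and sum: 1*4 + 2*5 + 3*6
dot_sum <- sum(v * t)
dot_sum
# [1] 32

# The %*% operator returns the same value as a 1 x 1 matrix
dot_mat <- v %*% t
dot_mat
#      [,1]
# [1,]   32

# drop() removes the matrix dimensions and leaves a plain number
drop(dot_mat)
# [1] 32
```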
Let’s go through some examples of basic comparison operators (>, <, ==, <=, >=, !=) applied between two vectors.
7.4.3.1 Comparison between a long vector and a scalar
When comparison operators are applied between a long vector and a scalar, each element of the vector is compared to the scalar. The result of each comparison is a Boolean value (TRUE or FALSE).
In the case of two long vectors, each element of the first vector is compared with the corresponding element of the second vector, a process known as element-wise (or element-by-element) comparison. The result of each comparison is a Boolean value (TRUE or FALSE).
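Both kinds of comparison can be sketched as follows. The vectors v and w and the scalar 2 are hypothetical values chosen for illustration.

```r
v <- c(1, 2, 3)
w <- c(3, 2, 1)  # hypothetical second vector

# Vector versus scalar: every element of v is compared with 2
v > 2
# [1] FALSE FALSE  TRUE
v == 2
# [1] FALSE  TRUE FALSE

# Vector versus vector: elements are compared position by position
v < w
# [1]  TRUE FALSE FALSE
v == w
# [1] FALSE  TRUE FALSE
v != w
# [1]  TRUE FALSE  TRUE
```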
Logical operators are applicable to logical and/or numeric vectors and are applied in an element-wise way. The result of each comparison is a logical (Boolean) value.
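The examples in this section use two logical vectors, s and u. The sketch below defines them with values reconstructed from the outputs printed in this section; how they were originally created is not shown here, so the direct definition is an assumption. The last call is an added illustration of logical operators applied to numeric vectors.

```r
# Values reconstructed from the printed results of !s and !u
s <- c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
u <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

# With numeric vectors, 0 is treated as FALSE and any other number as TRUE
c(0, 2, -1) & c(1, 1, 0)
# [1] FALSE  TRUE FALSE
```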
7.4.4.1 AND operators (&, &&)
The & operator performs an element-wise comparison, evaluating each pair of elements from the first and second vectors. It returns TRUE if both elements are TRUE; otherwise, it returns FALSE.
s&u
[1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE
The && operator compares two one-element vectors and returns TRUE only if both elements are TRUE. For example:
s[1]&&u[1]
[1] TRUE
Note that in R version 4.3.0 and later, using the && operator on vectors longer than one results in an error. For example:
s&&u
Error in s && u : ‘length = 7’ in coercion to ‘logical(1)’
7.4.4.2 OR operators (|, ||)
The | operator performs an element-wise comparison, evaluating each pair of elements from the first and second vectors. It returns TRUE if at least one element of the pair is TRUE; otherwise, it returns FALSE.
s|u
[1] TRUE FALSE TRUE TRUE TRUE TRUE FALSE
The || operator compares two one-element vectors and returns TRUE if at least one of the elements is TRUE. For example:
s[1]||u[1]
[1] TRUE
Note that in R version 4.3.0 and later, using the || operator on vectors longer than one results in an error. For example:
s||u
Error in s || u : ‘length = 7’ in coercion to ‘logical(1)’
7.4.4.3 NOT operator (!)
The ! operator inverts each element of the vector, returning the opposite logical value. This is known as the negation operation.
!s
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE
!u
[1] FALSE TRUE FALSE FALSE FALSE TRUE TRUE
7.5 Statistical functions applied to vectors
Statistical functions in R, such as sum() and mean(), take the elements of a numeric vector as input and return a single numeric value:
This demonstrates that if a numeric vector contains missing values, the mean cannot be computed and will return NA. In such cases, we can ignore the NAs by setting the na.rm argument to TRUE, which calculates the mean of the remaining values in the vector:
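A short sketch of both situations. The vectors x and y are hypothetical examples, not the ones used in the original text.

```r
x <- c(4, 8, 15, 16, 23, 42)  # hypothetical numeric vector
sum(x)
# [1] 108
mean(x)
# [1] 18

y <- c(2, 4, NA, 6)  # hypothetical vector with one missing value
mean(y)               # the NA propagates to the result
# [1] NA
mean(y, na.rm = TRUE) # NA is removed before averaging 2, 4 and 6
# [1] 4
```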
We can select elements from a vector using the subsetting operator, denoted by single square brackets [ ], which is also known as the extraction operator. The index within these brackets can be specified as a numeric vector, a logical vector, or a character vector, providing flexibility in element selection.
In the following examples, we demonstrate this concept using the built-in month.name vector, which contains the names of all twelve months. January is the first element, February is the second, and so on.
We can select specific elements or subsets from a vector using square brackets [ ] and specifying the indices of the desired elements. For example:
month.name[3]  # select the 3rd month
[1] "March"
month.name[3:5]  # select the 3rd, 4th, and 5th months
[1] "March" "April" "May"
In the second example, the expression 3:5 generates the sequence of indices 3, 4, 5, which is then passed to the subsetting operator [ ]. This returns a new vector containing only the months March, April, and May.
Note that we can select the same elements of a vector multiple times, and they will be returned in the order specified by the indices. For example:
month.name[c(3, 2, 1, 3, 4)]  # the 3rd element is selected twice
[1] "March" "February" "January" "March" "April"
INFO
In R, the first element of a vector is at index 1. In many other programming languages (e.g., C, Python, and Java), the first element in a sequence is at index 0.
Next, let’s apply the range 10:15 to the month.name vector:
month.name[10:15]
[1] "October" "November" "December" NA NA NA
When selecting elements from indices 10 to 15, R returns NAs for any indices that are beyond the length of the vector (e.g., for indices 13 to 15, since there are no corresponding months).
7.6.1.2 Skip specific elements of vectors
A negative index skips the element at the specified index position. For example:
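A minimal sketch with the built-in month.name vector. The object names no_march and inner_months are introduced here for illustration.

```r
# Drop the 3rd element (March) and keep the other eleven months
no_march <- month.name[-3]
length(no_march)
# [1] 11
"March" %in% no_march
# [1] FALSE

# Several negative indices can be combined with c()
inner_months <- month.name[-c(1, 12)]  # drop January and December
length(inner_months)
# [1] 10
```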
Next, let’s examine a common error that occurs when attempting to skip specific parts of a vector. For example, if we want to skip the first five elements of the month.name vector, we may try the following:
month.name[-1:5]
This results in an error:
Error in month.name[-1:5] : only 0's may be mixed with negative subscripts
The issue arises because the colon operator : in R generates the sequence -1, 0, 1, 2, 3, 4, 5, which is not valid for indexing as it mixes zero, negative, and positive indices.
One way to resolve this issue is by wrapping the sequence in parentheses, to ensure that the “-” arithmetic operator is applied to all elements of the sequence:
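The parenthesized form can be sketched as follows. The object name later_months is an assumption made for illustration.

```r
# -(1:5) evaluates to -1, -2, -3, -4, -5, so the first five months are skipped
later_months <- month.name[-(1:5)]

later_months[1]       # the first remaining month
# [1] "June"
length(later_months)  # June through December
# [1] 7
```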
7.6.2 Selecting elements using boolean indices (TRUE/FALSE)
We can also use a logical vector with the [ ] operator, where TRUE or T selects the corresponding elements, and FALSE or F excludes them. For example, let’s say that we want to select only the first four months of the year:
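One way to write this selection is sketched below, assuming the result is stored in the fourmonths object used later in this chapter. The index vector keep is a name introduced here for illustration.

```r
# One logical value per month: TRUE keeps the month, FALSE drops it
keep <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
          FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

fourmonths <- month.name[keep]
fourmonths
# [1] "January"  "February" "March"    "April"
```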
In R, a named vector is a vector where each element is associated with a name or label. This allows us to access elements using their names instead of numeric indices. Here’s an example:
# Define a vector of month names
nm <- c("month_1", "month_2", "month_3", "month_4")
# Assign names to the elements of the 'fourmonths' vector using setNames()
fourmonths2 <- setNames(fourmonths, nm)
# Select elements with names "month_1", "month_2", and "month_4"
fourmonths2[c("month_1", "month_2", "month_4")]
In the code above, we first create a vector nm containing labels such as “month_1”, “month_2”, “month_3”, and “month_4”. Next, the setNames() function is used to assign these names to the elements of the fourmonths vector, resulting in a named vector called fourmonths2. Finally, we select specific elements from fourmonths2 by referencing their names.
7.7 Vector recycling
What happens if we supply a logical vector that is shorter than the vector we’re selecting the elements from? For example:
fourmonths  # call the "fourmonths" vector
[1] "January" "February" "March" "April"
fourmonths[c(TRUE, FALSE)]  # we provide a vector with only two elements
[1] "January" "March"
This illustrates the concept of vector recycling. R automatically “recycles” the values of the shorter logical vector c(TRUE, FALSE) to match the length of the fourmonths vector, repeating the pattern as needed, as shown below:
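The recycled index can be written out explicitly with rep(), as in the sketch below. The object name idx is an assumption, and fourmonths is recreated here so that the example is self-contained.

```r
fourmonths <- month.name[1:4]

# Repeat the pattern TRUE, FALSE until it is as long as fourmonths
idx <- rep(c(TRUE, FALSE), length.out = length(fourmonths))
idx
# [1]  TRUE FALSE  TRUE FALSE

# The explicit index selects the same elements as the short one
fourmonths[idx]
# [1] "January" "March"
```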
Let’s consider another example with two numeric vectors of different lengths: c(3, 2, 7) and c(6, 4, 0, 5, 8, 6). How will R perform arithmetic operations, such as “addition”, in this case?
When we sum these two vectors, R automatically “recycles” the shorter vector, repeating it until it matches the length of the longer vector. This process is shown below:
c(3, 2, 7, 3, 2, 7)
  |  |  |  |  |  |
c(6, 4, 0, 5, 8, 6)
As a result, the element-wise addition is performed and is equivalent to the following:
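A sketch that checks this equivalence. The object names recycled and explicit are introduced here for illustration.

```r
# R recycles the shorter vector automatically
recycled <- c(3, 2, 7) + c(6, 4, 0, 5, 8, 6)
recycled
# [1]  9  6  7  8 10 13

# Writing the repeated vector out by hand gives the same result
explicit <- c(3, 2, 7, 3, 2, 7) + c(6, 4, 0, 5, 8, 6)
identical(recycled, explicit)
# [1] TRUE
```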
If the length of the longer vector is not a multiple of the shorter vector’s length, R will still perform the calculation, but it will display a relevant warning message. For example:
c(3, 2, 7) + c(6, 4, 0, 5, 8)
Warning in c(3, 2, 7) + c(6, 4, 0, 5, 8): longer object length is not a multiple of
shorter object length
[1] 9 6 7 8 10
7.8 Subassignment
The subsetting operator [ ] can be combined with the assignment operator <- to modify specific values in a vector, a process known as subassignment. For example:
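A minimal sketch of subassignment with numeric and logical indices. The vector x and the replacement values are hypothetical.

```r
x <- c(10, 20, 30, 40, 50)  # hypothetical vector

x[2] <- 99              # replace the 2nd element
x[c(4, 5)] <- c(0, 1)   # replace the 4th and 5th elements
x[x > 50] <- 50         # replace every value above 50 with 50

x  # the vector now holds 10, 50, 30, 0, 1
```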