7  Atomic vectors

When we have finished this chapter, we should be able to:

Learning objectives
  • Know common R data structures.
  • Create 1-dimensional vectors.
  • Understand the concepts of coercion and vector recycling.
  • Extract elements from an 1-dimensional vector.

7.1 Introduction to vectors in R

The most fundamental concept in R are the vectors. Vectors come in two broad types: atomic vectors and generic vectors (lists) . The atomic vectors must have all elements of the same basic type (e.g., numeric, characters). On the contrary, in the lists different elements can have different basic types (e.g., some elements may be numeric and some characters).

The R language supports many types of data structures that we can use to organize and store information. We will see that complex structures such as matrices, arrays, and data frames can be created. Each data structure type serves a specific purpose and might differ in terms of the type of data it can hold and its structural complexity. These data structures are schematically illustrated in Figure 7.1.

Figure 7.1: Data structures in R.

7.2 Atomic vectors

Atomic vectors in R are one-dimensional data structures. This means that they consist of a single sequence of elements of the same data type arranged along a single dimension. Each element within the vector is uniquely identified by its position within this sequence.

There are four primary types of atomic vectors (also known as “atomic” classes):

  • logical
  • integer
  • double
  • character (which may contain strings)

As a group integer and double vectors are considered numeric vectors.

There are also two rare types: complex and raw but we won’t discuss them further because they are not used in this textbook.

Let’s start by understanding one-element vectors, the most basic form of atomic vectors in R. Afterward, we’ll explore longer vectors to gain insight into their properties and practical uses.

7.2.1 One-element atomic vectors

Individual numbers or strings are one dimensional (1-D) vectors of length one and in some instances we call them scalars. Therefore, an one-element vector (oev) is just a single value like a number and they can be used to construct more complex objects (longer vectors). We present some examples of one-element vectors for each of the four primary types (in order from least to most general type).

Logical one-element vector

Logical values are boolean values of TRUE or FALSE which can be abbreviated, when we type them as T or F (not recommended). Examples of logical one-element vectors (oev) follows:

oev_a <- TRUE     # assign the logical TRUE to an object named oev_a
oev_a             # call the object with its name
[1] TRUE
oev_b <- FALSE    # assign the logical FALSE to an object named oev_b
oev_b             # call the object with its name
oev_c <- T        
[1] TRUE
oev_d <- F        

Integer one-element vector

Even when we observe numbers like 1 or 2 in the console, internally, R may store them as 1.00 or 2.00. To explicitly specify integer numbers,in R, we need to append an “L” suffix, as demonstrated in the following examples:

oev_e <- 3L          
[1] 3
oev_f <- 100L        
[1] 100

Double one-element vector

Doubles, which represent real numbers, can be expressed either in decimal format (e.g., 0.000017) or in scientific notation (e.g., 1.7e-5).

oev_g <- 0.000017   
[1] 1.7e-05
oev_scientific <- 1.7e-5      
[1] 1.7e-05

Character one-element vector

One-element vectors can also be characters (also known as strings). In R, we denote characters using single '' or double "" quotation marks. Internally R stores every string within double quotes, even we have created them with single quotation marks. Here, we present some examples of character one-element vectors:

oev_h <- "hello"      # double quotation marks
[1] "hello"
oev_i <- 'Covid-19'   # single quotation marks
[1] "Covid-19"
oev_j <- "I love data analysis"
[1] "I love data analysis"

R treats numeric and character vectors differently. For example, while we can do basic arithmetic operations on numeric vectors – they won’t work on character vectors. If we try to perform numeric operations such as addition on character vector, we’ll get an error like the following:

h <- "1"
k <- "2"
h + k

Error in h + k : non-numeric argument to binary operator

The error message indicates that we’re trying to apply numeric operations to character objects that’s wrong.

It’s very rare that single values (one-element vectors) will be the center of an R session. Next, we are going to discuss about “longer” atomic vectors.

7.2.2 Longer atomic vectors

Atomic vectors can consisted of more than one element. In this case, the vector elements are ordered, and they must all be of the same type of data. Common example types of “long” atomic vectors are numeric (whole numbers and fractions), logical (e.g., TRUE or FALSE), and character (e.g., letters or words). Let’s see how we can create “long” atomic vectors and some usefull vector properties through examples.

The colon operator :

This operator : generates sequences of consecutive values. For example:

[1] 1 2 3 4 5

In this example, the colon operator : takes two integers 1 and 5 as arguments, and returns an atomic vector of integer numbers from the starting point 1 to the ending point 5 by steps 1.

We can assign (or name) the atomic vector to an object named x_seq:

x_seq <- 1:5

and call it with its name:

[1] 1 2 3 4 5

We can determine the type of a vector with typeof().

[1] "integer"

The elements of the x_seq vector are integers. We can also find how many elements a vector contains applying the length() function:

[1] 5

Other examples:

[1] 5 4 3 2 1
[1] 2.5 3.5 4.5 5.5 6.5 7.5 8.5
[1] -3 -2 -1  0  1  2  3  4

The function seq()

We have already explore in Chapter 4 the seq() function which creates vectors of consecutive values (seq stands for sequence):

seq(1, 5)    # increment by 1
[1] 1 2 3 4 5

The c() function

We can also create atomic vectors “by hand” using the c() function (or concatenate command) which combines values into a vector. Let’s create a vector of values 2, 4.5, and 1:

c(2, 4.5, -1)
[1]  2.0  4.5 -1.0

Of course, we can have an atomic vector with logical elements as the following example:


or equivalently

c(T, F, T, F)

and an atomic vector with character elements:

c("male", "female", "female", "male")
[1] "male"   "female" "female" "male"  

An atomic vector can be element of another vector:

y_seq <- 3:7
c(y_seq, 2, 4.5, -1)  # y_seq is an element of a vector
[1]  3.0  4.0  5.0  6.0  7.0  2.0  4.5 -1.0

Repeating vectors

The rep() function in R provides a convenient way to repeat either the complete vector or individual elements of a vector. Let’s see some examples:

A. Repeating the complete vector

rep(1:4, times = 5)               # 5 times to repeat the complete vector
rep(c(0, 4, 7), times = 3)        # 3 times to repeat the complete vector
rep(c("a", "b", "c"), times = 2)  # 2 times to repeat the complete vector
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
[1] 0 4 7 0 4 7 0 4 7
[1] "a" "b" "c" "a" "b" "c"

B. Repeating each element of the vector

rep(1:4, each = 5)               # each element is repeated 5 times
rep(c(0, 4, 7), each = 3)        # each element is repeated 3 times
rep(c("a", "b", "c"), each = 2)  # each element is repeated 2 times
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
[1] 0 0 0 4 4 4 7 7 7
[1] "a" "a" "b" "b" "c" "c"

Default vectors

R comes with a few built-in vectors providing useful data for various tasks.

# upper-case letters

# lower-case letters

# months

# three-letter months
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

We will use some of these built-in vectors in the examples that follow.

7.3 Mixing things in a vector - Coercion

7.3.1 Implicit coercion

Implicit coercion in R refers to the automatic conversion of data from one type to another when necessary for an operation or function. In general, it is an attempt by R to be flexible with data types.

For example, R assumes that everything in our atomic vector is of the same data type – that is, all numbers or all characters or all logical elements. Let’s create a “mixed” vector:

my_vector <- c(1, 4, "hello", TRUE)
[1] "1"     "4"     "hello" "TRUE" 

Since the vector contains a mix of numeric, character, and logical values, R converted all elements into a common data type, in this case all characters. So my_vector contains 1, 4, hello and TRUE as characters.

The hierarchy for coercion is:

logical < integer < numeric < character


1. numeric Vs character

a <- c(10.5 , 3.2, "I am a character string")
[1] "10.5"                    "3.2"                    
[3] "I am a character string"
[1] "character"

Adding a character string to a numeric vector converts all the elements in the vector to character values.

2. logical Vs character

b <- c(TRUE, FALSE, "Hello")
[1] "TRUE"  "FALSE" "Hello"
[1] "character"

Adding a character string to a logical vector converts all the elements in the vector to character values.

3. logical Vs numeric

d <- c(FALSE, TRUE, 2)
[1] 0 1 2
[1] "double"

Adding a numeric value to a logical vector converts all the elements in the vector to double (numeric) values. Logical values are converted to numbers as following: TRUE is converted to 1 and FALSE to 0. Therefore, the logical values behave like numbers (FALSE = 0, TRUE = 1) in mathematical functions. For example:

num_TF <- c(4, FALSE, TRUE, 2, -1, TRUE, FALSE, 0)
[1]  4  0  1  2 -1  1  0  0
[1] 7
[1] 0.875

7.3.2 Explicit coercion

Explicit coercion refers to the process of intentionally converting data from one data type to another using specific conversion functions provided by R. For example, we can convert numbers into characters applying the as.character() function. Let’s create a numeric vector f, with numbers 1 through 5, and convert it to a character vector g:

f <- 1:5

g <- as.character(f)
[1] "1" "2" "3" "4" "5"

We can turn the characters back to numbers using the as.numeric() function which converts characters or other data types into numeric:

[1] 1 2 3 4 5

This function in R is particularly valuable in practice, because many public datasets that include numbers, include them in a form that makes them appear to be character strings.

Next, suppose the object q of character strings “1”, “2”, “3”, “d”, “5” and we want to convert them to numbers using the as.numeric() function:

q <- c("1", "2", "3", "d", "5")

Warning: NAs introduced by coercion
[1]  1  2  3 NA  5

We observe that R successfully converted the strings "1", "2", "3"and "5" into their corresponding numeric values 1, 2, 3 and 5, but it encountered difficulty with the string "d". As a result, if we apply as.numeric() on this vector, we get a warning that NAs were introduced by coercion (the element “d” was converted to a missing value NA).

Moreover, when the coercion does not really make sense, we will usually get a warning and R turns all the elements into NAs. For example:

x_abcde <- c("a", "b", "c", "d", "e")
Warning: NAs introduced by coercion

7.4 Operators applied between two vectors

7.4.1 Arithmetic Operators

Let’s go through some examples of basic arithmetic operators (+, -, *, /, ^) applied between two vectors.

Comparison between a long vector and a scalar

When we add a scalar to a vector, the scalar is added to each element of the vector (vectorization):

v <- c(1, 2, 3)

v + 3
[1] 4 5 6

Multiplying a vector by a scalar results in each element of the vector being multiplied by the scalar (vectorization):

v * 3
[1] 3 6 9

Comparison between two long vectors

The arithmetic operations can be performed element-wise between corresponding elements of the vectors (vectorization). In simple terms, each element of one vector interacts with the corresponding element of the other vector. Let’s consider two vectors, denoted as v and t, each holding a series of numerical values, and let’s apply some arithmetic operators.

v <- c(1, 2, 3)
t <- c(8, 3, 2)
t + v           # addition
[1] 9 5 5
t * v           # multiplication
[1] 8 6 6
t^v             # exponent
[1] 8 9 8
t + 3 * v / 2   # remember the order of operations in R
[1] 9.5 6.0 6.5
INFO: Vectorization

Vectorization refers to the process of efficiently applying operations or functions to entire vectors without the need for explicit looping or iteration. This involves element-wise operations on vectors, which results in code that is more concise, readable, and often faster compared to traditional iterative approaches.

For example, mathematical operations applied to all the elements of a numeric vector:

(1:5) * 2
[1]  2  4  6  8 10
[1]  2  4  8 16 32

The same rule is applied to the elements of the vectors using mathematical functions:

z_seq <- 3:9      
sqrt(z_seq)    # calculate the square root of all the elements of z_seq
[1] 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427 3.000000

We can also round the results using the round() function and set the argument digits = 2, as following:

round(sqrt(z_seq), digits = 2)
[1] 1.73 2.00 2.24 2.45 2.65 2.83 3.00

7.4.2 Dot (inner) product operator

The dot product, also known as the inner product, is a mathematical operation between two numeric vectors, v and t. This operation is commonly represented using a dot placed between the vectors: \(\nu \cdot t\).

Particularly, we multiply the corresponding elements of the two vectors and then sum up those products to obtain a single scalar value. Given the two vectors, \(\nu = (\nu_1, \nu_2, ..., \nu_n)\) and \(t = (t_1, t_2, ..., t_n)\), we have: \[\nu \cdot t = \nu_1 * t_1 + \nu_2 * t_2 + ... + \nu_n * t_n\] In our example, \(\nu = (1, 2, 3)\) and \(t = (8, 3, 2)\), so the inner product is: \(\nu \cdot t = 1 * 8 + 2 * 3 + 3 * 2 = 8 + 6 + 6 = 20\)

In R, the inner product operator is denoted as %*%, so we obtain:

v %*% t
[1,]   20

The inner product of two vectors is an important operation in multiplication of matrices (see Chapter 8).

7.4.3 Relational Operators

Let’s go through some examples of basic relational operators (>, <, ==, <=, >=, !=) applied between two vectors.

Comparison between a long vector and a scalar

For relational operators applied between a long vector and a scalar, each element of the vector is compared with a defined value (scalar). The result of each comparison is a Boolean value (TRUE or FALSE).


m <- c(4, 2, 3, 8)
m > 3
m >= 3
m == 3
m != 3

Comparison between two long vectors

In the case of two long vectors, each element of the first vector is compared with the corresponding element of the second vector (element-wise comparison). The result of each comparison is a Boolean value (TRUE or FALSE).


w <- c(2, 5.5, 6, 9)
z <- c(8, 2.5, 14, 9)
w > z
w == z
w >= z
w != z

7.4.4 Logical Operators are applied to vectors

The logical (Boolean) operators are:

  • &, & (AND)
  • |, || (OR)
  • ! (NOT)

Logical operators are applicable to logical and/or numeric vectors and are applied in an element-wise way. The result of each comparison is a logical (Boolean) value.

Suppose we have the following vectors:

s <- c(1, 0, - 1, 0, TRUE, TRUE, FALSE)
[1]  1  0 -1  0  1  1  0
u <- c(2, 0, - 2, 2, TRUE, FALSE, FALSE)
[1]  2  0 -2  2  1  0  0

How R will compute, for example, s & u?

All non-zero values in the vectors are considered as logical value TRUE and all zeros are considered as FALSE.


[1]  1  0 -1  0  1  1  0


[1]  2  0 -2  2  1  0  0


AND Operators (&, &&)

The & operator combines each element of the first vector with the corresponding element of the second vector (element-wise comparison) and gives an output TRUE if both elements are TRUE.

s & u

The && operator works with one-element vectors and gives an output TRUE if both elements are TRUE. For example:

s[1] && u[1]
[1] TRUE
Operator &&

In R 4.3.0 version and later, calling the && in operations with vectors of length greater than one gives an error. For example:

s && u

Error in s && u : ‘length = 7’ in coercion to ‘logical(1)’

OR operators (|, ||)

The | operator combines each element of the first vector with the corresponding element of the second vector (element-wise comparison) and gives an output TRUE if at least one of the elements is TRUE.

s | u

The || operator works with one-element vectors and gives an output TRUE if at least one of the elements elements is TRUE. For example:

s[1] || u[1]
[1] TRUE
Operator ||

In R 4.3.0 version and later, calling the || in operations with vectors of length greater than one gives an error. For example:

s || u

Error in s || u : ‘length = 7’ in coercion to ‘logical(1)’

NOT operator (!)

The ! operator takes each element of the vector and gives the opposite logical value.

! s
! u

7.5 Statistical functions applied to vectors

Statistical functions in R such as sum() and mean() take as input the values of a numeric vector and return a single numeric value:

v_seq <- 5:10   
[1]  5  6  7  8  9 10
sum(v_seq)     # adds all the elements of a vector
[1] 45
mean(v_seq)    # calculate the arithmetic mean
[1] 7.5
median(v_seq)  # calculate the median
[1] 7.5
sd(v_seq)      # calculate the standard deviation
[1] 1.870829
range(v_seq)   # returns the minimum and maximum values
[1]  5 10

Next, we add a missing value NA in the v_seq vector:

v_seq2 <- c(v_seq, NA)
[1] "integer"

We can see that the v_seq2 vector is of integer type.

However, if we try to calculate the mean of the v_seq2, R returns a NA value:

[1] NA

Therefore, if some of the values in a numeric vector are missing, then the mean of the vector is unknown (NA). In this case, it makes sense to remove the NA and calculate the mean of the other values in the vector setting the na.rm argument equals to TRUE:

mean(v_seq2, na.rm = TRUE)
[1] 7.5

7.6 Subsetting vectors

We can select elements from a vector using single square brackets [ ], which is also referred to as the extraction operator. The index within these brackets can be specified as a numeric vector, a logical vector, or a character vector, offering flexibility in element selection. In the examples that follow, we demonstrate this concept using the built-in month.name vector that contains the names of all twelve months, with January being the first element, February the second, and so on.

 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 

7.6.1 Selecting elements by indexing position

Select specific elements of a vector

We can select parts of a vector with square brackets [ ] using the indices of the elements. For example:

month.name[3]           # select the 3rd month
[1] "March"
month.name[3:5]         # select the 3rd, 4th, and 5th months
[1] "March" "April" "May"  

In the second code example, the vector 3:5 generates the sequence of indices 3, 4, 5, which is then passed to the extraction operator [ ]. Consequently, the command returns a new subset vector containing only the months March, April, and May.

Figure 7.2: Select elements by indexing position

We can also get the previous result using the vector c(3, 4, 5):

month.name[c(3, 4, 5)]
[1] "March" "April" "May"  

In R, the first element of a vector starts at index of 1. In many other programming languages (e.g., C, Python, and Java), the first element in a sequence has an index of 0.

Note that the values are returned in the order that we specify with the indices. For example:

month.name[5:3]       # select the 5th, 4th, 3rd elements
[1] "May"   "April" "March"

We can also select the same elements of a vector multiple times:

month.name[c(1, 2, 3, 3, 4)]     # the 3rd element is selected twice
[1] "January"  "February" "March"    "March"    "April"   


Out-of-Range Indexing

If we try to extract elements outside of the vector, R returns missing values NAs:

[1] "October"  "November" "December" NA         NA         NA        

Skip specific elements of vectors

A negative index skip the element at the specified index position. For example:

month.name[-3]             # skip the 3rd month
 [1] "January"   "February"  "April"     "May"       "June"      "July"     
 [7] "August"    "September" "October"   "November"  "December" 

We can also skip multiple elements:

month.name[c(-3, -7)]      # skip the 3rd and 7th elements
 [1] "January"   "February"  "April"     "May"       "June"      "August"   
 [7] "September" "October"   "November"  "December" 

which is equivalent to:

month.name[-c(3, 7)]       # skip the 3rd and 7th elements
 [1] "January"   "February"  "April"     "May"       "June"      "August"   
 [7] "September" "October"   "November"  "December" 

A common error occurs when trying to skip certain parts of a vector. For example, suppose we want to skip the first five elements form the month.name vector. First, we may try the following:


This gives an error:
Error in month.name [-1:5]: only 0’s may be mixed with negative subscripts

Remember that the colon : is an operator in R; in this example it generates the sequence -1, 0, 1, 2, 3, 4, 5.

A way of solving the problem is to wrap the sequence in parentheses, so that the “-” arithmetic operator will be applied to all elements of the sequence:

[1] -1 -2 -3 -4 -5
month.name[-(1:5)]            # skip the 1st to 5th element
[1] "June"      "July"      "August"    "September" "October"   "November" 
[7] "December" 

7.6.2 Selecting elements using boolean indices (T/F)

We can also pass a logical vector to the [ ] operator indicating with TRUE or T the indices we want to select and FALSE or F all others. For example, let’s say that we want to select only the first four months of the year:

fourmonths <- month.name[c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
                           FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)]
[1] "January"  "February" "March"    "April"   
Figure 7.3: Select elements by using TRUE or FALSE indices

Furthermore, if we want to exclude “March” from the fourmonths vector we should code:

fourmonths[c(TRUE, TRUE, FALSE, TRUE)]
[1] "January"  "February" "April"   

7.6.3 Selecting elements by indexing names

In R, a named vector is a vector where each element is associated with a name or label. This association enables us to access elements using their names instead of their numeric indices. Let’s illustrate this with an example:

# Define a vector of month names
nm <- c("month_1", "month_2", "month_3", "month_4")

# Assign names to the elements of the 'fourmonths' vector using setNames()
fourmonths2 <- setNames(fourmonths, nm)

# Select elements with names "month_1", "month_2", and "month_4"
fourmonths2[c("month_1", "month_2", "month_4")]
   month_1    month_2    month_4 
 "January" "February"    "April" 

Initially, a vector named nm is created, containing month names such as “month_1”, “month_2”, “month_3”, and “month_4”. Then, the setNames() function assigns these names to the elements of the vector fourmonths, transforming it into a named vector called fourmonths2. Subsequently, specific elements are selected from fourmonths2 based on their assigned names.

Figure 7.4: Select elements by indexing names

7.7 Vector recycling

What happens if we supply a logical vector that is shorter than the vector we’re extracting the elements from?

For example:

fourmonths          # call the "fourmonths" vector
[1] "January"  "February" "March"    "April"   
fourmonths[c(TRUE, FALSE)]    # we provide a vector with only two elements
[1] "January" "March"  

This illustrates the idea of vector recycling. The [ ] extraction operator silently “recycled” the values of the shorter vector c(TRUE, FALSE) in order to make the length compatible to the fourmonths vector:

[1] "January" "March"  


Let’s look at another example. Suppose we have two numeric vectors with different length. In this case, how R will perform arithmetic operations such as “addition”?

c(3, 2, 7) ?  ?  ?  
  |  |  |  |  |  |   
c(6, 4, 0, 5, 8, 6) 

The sum of the two vectors is:

c(3, 2, 7) + c(6, 4, 0, 5, 8, 6)
[1]  9  6  7  8 10 13

So, what happened here?


If we sum these two vectors then R automatically recycles the shorter vector, by replicating it until it matches the length of the longer vector as follows:

c(3, 2, 7, 3, 2, 7) 
  |  |  |  |  |  |   
c(6, 4, 0, 5, 8, 6) 

So, the element-wise addition is feasible and equivalent to the following:

c(3, 2, 7, 3, 2, 7) + c(6, 4, 0, 5, 8, 6) 
[1]  9  6  7  8 10 13

If the longer vector length isn’t a multiple of the shorter vector length, then R performs the calculation and prints out a pertinent warning message. For example:

c(3, 2, 7) + c(6, 4, 0, 5, 8)
Warning in c(3, 2, 7) + c(6, 4, 0, 5, 8): longer object length is not a
multiple of shorter object length
[1]  9  6  7  8 10