13  Basic concepts of Probability

Learning objectives

When we have finished this chapter, we should be able to:

  • Understand and use the terminology of probability

13.1 Packages we need

We need to load the following packages:

# dice graphs
library(tidydice)
library(ggplot2)

# venn diagrams
library(ggvenn)
library(venn)
library(TeachingDemos)

13.2 Sample Space and Random Events

Everyday life is driven by both deterministic and stochastic (random) phenomena.

  • A deterministic phenomenon (process or experiment) always produces the same outcome each time it is repeated under the same conditions.

  • A random phenomenon (process or experiment) is characterized by conditions under which the result cannot be determined with certainty before it occurs; that is, one of several possible outcomes is observed each time the process or experiment is repeated. For example, when a coin is tossed, the outcome is either heads H or tails T, but it is unknown before the coin is tossed.

The sample space Ω is defined as the set of all possible outcomes of a random experiment. For example, if we roll a 6-sided die, the sample space is the set of the six possible outcomes, Ω ={1, 2, 3, 4, 5, 6} (Figure 13.1).

force_dice(1:6) |> 
  plot_dice(detailed = TRUE, fill_success = "white") + 
  theme(plot.title = element_blank())
Figure 13.1: Sample space for rolling a 6-sided die.

Different random experiments have different sample spaces that can be denoted in an equivalent way (flipping a coin: Ω ={H, T}, flipping two coins: Ω ={HH, HT, TH, TT}, testing for possible genotypes of a bi-allelic gene A: Ω ={AA, Aa, aa}).
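As a quick illustration, we can enumerate such sample spaces in R. The following is a minimal sketch (the object names are our own); expand.grid() simply generates all ordered combinations:

# sample space for flipping one coin
omega_coin <- c("H", "T")

# sample space for flipping two coins: all ordered pairs of H and T
omega_two_coins <- expand.grid(first = c("H", "T"), second = c("H", "T"))
paste0(omega_two_coins$first, omega_two_coins$second)
[1] "HH" "TH" "HT" "TT"

# sample space for the genotypes of a bi-allelic gene A
omega_genotypes <- c("AA", "Aa", "aa")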

A random event (henceforth called event) is denoted by a capital letter such as A, B, or C and is a sub-set of sample space Ω, including a number of possible outcomes of the experiment. For the example of the rolling die, the event “even number” may be represented by A = {2, 4, 6} which is a sub-set of Ω (A ⊂ Ω), and the event “odd number” by B = {1, 3, 5} which is also a sub-set of Ω (B ⊂ Ω). In the case of flipping two coins, an event could be that exactly one of the coins lands Heads, A = {HT, TH} or the event could be that at least one of the coins lands heads, B = {HH, HT, TH}.

If an event consists of a single outcome from the sample space, it is termed a simple event. For example, the event of getting the number 1 on rolling a die, denoted as A = {1}. If an event consists of more than a single outcome from the sample space, it is called a compound event such as rolling a die and getting an even number, A = {2, 4, 6}.
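Events can be represented as vectors in R and checked against the sample space; the sketch below (with object names of our own choosing) uses the die example:

omega <- 1:6            # sample space of a six-sided die
A <- c(2, 4, 6)         # compound event: "even number"
S <- 1                  # simple event: "the number 1"

# every outcome of an event belongs to the sample space (A ⊂ Ω, S ⊂ Ω)
all(A %in% omega)
[1] TRUE
all(S %in% omega)
[1] TRUE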

Important

For each experiment, two events always exist:

  1. the sample space, Ω, which comprises all possible outcomes.

  2. the empty set, ∅, which contains no outcomes and is called the impossible event.

13.3 Operations of events using set theory and Venn diagrams

13.3.1 Union of Events: A∪B

The union of the events A and B, denoted by A∪B, is the collection of all outcomes that are in A, in B, or in both, and it is also an event. It occurs if either A or B occurs (the symbol ∪ corresponds to the OR operator).

  Example

In the experiment of rolling a die, let’s consider the events A = “the number rolled is even” and B = “the number rolled is less than three”.

A <- c(2, 4, 6)      # A = {2, 4, 6}
B <- c(1, 2)         # B = {1, 2}

union(A, B)          # A∪B = {2, 4, 6, 1} 
[1] 2 4 6 1
venn("A + B", zcolor = "#7F7FFF", opacity = 1, ggplot = TRUE) +
  annotate("text", x = 35, y = 950, label = "Ω") +
  annotate("text", x = 500, y = 750, label = "AUB")
Figure 13.2: The union of the events A and B as represented in a Venn diagram.

13.3.2 Intersection of Events: A∩B

The intersection of A and B, denoted by A∩B, consists of all outcomes that are in both A and B (the symbol ∩ corresponds to the AND operator). That is, the events A and B must occur simultaneously.

  Example

# A = {2, 4, 6}
# B = {1, 2}

intersect(A, B)
[1] 2
venn("A B", zcolor = "#7F7FFF", opacity = 1,  ggplot = TRUE) +
  annotate("text", x = 35, y = 950, label = "Ω") +
  annotate("text", x = 500, y = 530, label = "A∩B")
Figure 13.3: The intersection of the events A and B as represented in a Venn diagram.

13.3.3 Complement Events

The complement of an event A, denoted by \(A^c\) (sometimes written \(A'\)), is the event consisting of all outcomes of the sample space Ω that do not belong to A. For example, the complement of the union of A and B, denoted by \((A∪B)^c\) (sometimes denoted as \((A∪B)'\)), is also an event and consists of all outcomes of Ω that don’t belong to A∪B.
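For instance, the complement of the event A = “the number rolled is even” can be obtained in R with setdiff() (a minimal sketch):

sample_space <- c(1, 2, 3, 4, 5, 6)
A <- c(2, 4, 6)                 # A = {2, 4, 6}

setdiff(sample_space, A)        # A' = {1, 3, 5}
[1] 1 3 5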

  Example

# A = {2, 4, 6}
# B = {1, 2}

AUB <- union(A, B)                     # A∪B = {2, 4, 6, 1} 


sample_space <- c(1, 2, 3, 4, 5, 6)    # sample_space = {1, 2, 3, 4, 5, 6}

setdiff(sample_space, AUB)
[1] 3 5
venn("A + B", zcolor = "white", opacity = 1, box = FALSE, ggplot = TRUE) +
theme(panel.background = element_rect(fill = "#7F7FFF")) +
  annotate("text", x = 5, y = 1000, label = "Ω") +
  annotate("text", x = 500, y = 950, label = "(AUB)^{C}", parse =TRUE)
Figure 13.4: The complement of the union of A and B as represented in a Venn diagram.

13.3.4 Mutually exclusive events

Let’s consider the events A = “the number rolled is even” and C = “the number rolled is odd”.

The events A and C are mutually exclusive (also known as incompatible or disjoint) if they cannot occur simultaneously. This means that they do not share any outcomes and A∩C =∅.

  Example

A <- c(2, 4, 6)      
C <- c(1, 3, 5)         

intersect(A, C)         
numeric(0)
# List of items
x2 <- list("A" = A, "C" = C)

ggvenn(x2, fill_alpha = 1, auto_scale = TRUE, show_percentage = FALSE) +
  labs(title = "Ω") +
  scale_fill_manual(values = c("#7F7FFF", "#7F7FFF")) +
  theme(plot.title = element_text(size = 20),
        plot.background = element_rect(fill = "white"))
Figure 13.5: Venn Diagram of two mutually exclusive events.

13.4 Probability

The concept of probability is used in everyday life to express the likelihood that a random event will or will not occur. The first step towards determining the probability of an event is to establish a number of basic rules that capture the meaning of probability. The probability of an event should fulfill the three axioms defined by Kolmogorov:

The Kolmogorov Axioms
  1. The probability of an event A is a non-negative number, P(A) ≥ 0
  2. The probability of all possible outcomes, or sample space Ω, equals to one, P(Ω) = 1
  3. If A and B are two mutually exclusive events (also known as disjoint events), then P(A ∪ B) = P(A) + P(B) and P(A ∩ B) = 0.

13.4.1 Definition of Probability

A. Theoretical probability (theoretical approach)

Theoretical probability describes the behavior we expect to see if we can give a precise description of the experiment, without actually conducting it. We list all the equally likely outcomes of the experiment and determine how many of them are favourable for the event A to occur. The probability of the event A is then defined as:

\[P(A) = \frac{\textrm{Number of outcomes favourable to the event A}}{\textrm{Total number of possible outcomes}} \tag{13.1}\]

Note that the Equation 13.1 only works for experiments that are considered “fair”; this means that there must be no bias involved so that all outcomes are equally likely to occur.

  Example 1

What is the theoretical probability of rolling the number “5” when we roll a six-sided fair die once?

The theoretical probability is:

\[P(\textrm{rolling 5}) = \frac{\textrm{1 outcome favourable to the event}}{\textrm{6 possible outcomes}} = \frac{1}{6} \approx 0.167\]

This is because only one outcome (the die showing 5) is favourable out of the six equally likely outcomes (the die showing 1, 2, 3, 4, 5, or 6).

dice_tbl <- force_dice(1:6, success = 5)
dice_tbl
# A tibble: 6 × 5
  experiment round    nr result success
       <int> <int> <int>  <int> <lgl>  
1          1     1     1      1 FALSE  
2          1     1     2      2 FALSE  
3          1     1     3      3 FALSE  
4          1     1     4      4 FALSE  
5          1     1     5      5 TRUE   
6          1     1     6      6 FALSE  
sum(dice_tbl$success)/length(dice_tbl$result)
[1] 0.1666667

  Example 2

What is the probability of rolling either a “5” or a “6” when we roll a six-sided fair die once?

The theoretical probability is:

\[P(\textrm{rolling 5 OR 6}) = \frac{\textrm{2 outcomes favourable to the event}}{\textrm{6 possible outcomes}} = \frac{2}{6} = \frac{1}{3}\approx 0.33\]

dice_tbl <- force_dice(1:6, success = c(5, 6))
dice_tbl
# A tibble: 6 × 5
  experiment round    nr result success
       <int> <int> <int>  <int> <lgl>  
1          1     1     1      1 FALSE  
2          1     1     2      2 FALSE  
3          1     1     3      3 FALSE  
4          1     1     4      4 FALSE  
5          1     1     5      5 TRUE   
6          1     1     6      6 TRUE   
sum(dice_tbl$success)/length(dice_tbl$result)
[1] 0.3333333

This is because two outcomes (the die showing 5 or 6) are favourable out of the six equally likely outcomes.

We can also use the axioms of probability. The probability of rolling a 5 is 1/6 and the probability of rolling a 6 is also 1/6. We cannot roll a 5 and a 6 at the same time (these events are mutually exclusive), so:

\[\textrm{P(rolling a 5 OR 6) = P(rolling a 5) + P(rolling a 6) = 1/6 + 1/6 = 2/6 = 1/3}\]

 

B. Experimental probability (frequentist approach)

The experimental probability is based on data from repetitions of the same experiment. According to this approach, the probability of an event A, denoted by P(A), is the relative frequency of occurrence of the event over a total number of experiments:

\[ P(A) \approx \frac{\textrm{number of times A occurred}}{\textrm{total number of experiments}} \tag{13.2}\]

Yet, this definition seems less clear, as it does not specify the exact interpretation of “repetitions of the same experiment” (Finetti et al. 2008).

  Example

We rolled a six-sided die 100 times and we recorded how often each outcome occurred (Figure 13.6). What is the experimental probability of getting the number “5”?

set.seed(348)
roll_dice(times = 100) |>  
  ggplot(aes(x = result)) +
  geom_bar(fill = "#0071BF", width = 0.65) +
  geom_text(aes(label=after_stat(count)), stat = "count", 
            vjust = 1.5, colour = "white") +
  scale_x_continuous(breaks = c(1:6), labels = factor(1:6)) +
  theme_minimal(base_size = 14)
Figure 13.6: Bar plot shows the counts for each outcome.

The experimental probability is:

\[ P(\textrm{rolling a 5}) =\frac{\textrm{20 times the number “5” occurred}}{\textrm{100 experiments}}= \frac{20}{100} = 0.20\ or\ 20\%\]

In 20% of the cases we got a 5, which is greater than the theoretically expected relative frequency of \(1/6 \approx 16.67\%\).

However, if the die is rolled numerous times, for example 10000 times, the experimental probability should approximate the theoretical probability of that outcome (Law of Large Numbers), as shown in Figure 13.7:

set.seed(128)
roll_dice(times = 10000) |>  
  ggplot(aes(x = result)) +
  geom_bar(fill = "#0071BF", width = 0.65) +
  geom_text(aes(label=after_stat(count)), stat = "count", 
            vjust = 1.5, colour = "white") +
  scale_x_continuous(breaks = c(1:6), labels = factor(1:6)) +
  theme_minimal(base_size = 14)
Figure 13.7: Bar plot shows the counts for each outcome.

Now, the experimental probability is:

\[ P(\textrm{rolling a 5}) =\frac{\textrm{1684 times the number “5” occurred}}{\textrm{10000 experiments}}= \frac{1684}{10000} = 0.1684\ or\ 16.84\%\]

that is very close to the theoretical probability 16.67%.

Law of Large Numbers

The more times the experiment is performed, the closer the experimental probability approaches the theoretical probability.
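This convergence can be visualised by plotting the running proportion of 5s against the number of rolls; the following is a minimal sketch using roll_dice() (the seed and colours are arbitrary choices), in which the curve settles around the dashed line at 1/6:

set.seed(348)
rolls <- roll_dice(times = 10000)

# running proportion of 5s after 1, 2, ..., 10000 rolls
running_prop <- cumsum(rolls$result == 5) / seq_along(rolls$result)

ggplot(data.frame(n = seq_along(running_prop), prop = running_prop),
       aes(x = n, y = prop)) +
  geom_line(colour = "#0071BF") +
  geom_hline(yintercept = 1/6, linetype = "dashed") +
  labs(x = "Number of rolls", y = "Proportion of 5s") +
  theme_minimal(base_size = 14)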

 

C. “Subjective” probability (Bayesian approach)

The probability assigned to an event represents the degree of belief that the event will occur in a given try of the experiment, and it implies an element of subjectivity.

  Example

In the die roll experiment, the determination of the subjective probability for the events 1, 2, 3, 4, 5, and 6 relies on the belief that the die is unbiased, and therefore it must be true that P(1) = P(2) = P(3) = P(4) = P(5) = P(6). With this information, we can then simply use the Kolmogorov axioms to state that P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1, and therefore obtain the intuitive result that P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

Nonetheless, if some individuals were aware that the die was biased, such as favoring the number six, their viewpoint would undergo a substantial change.

The following properties are useful to assign and manipulate event probabilities.

Fundamental Properties of Probability
  1. The probability of the impossible (empty) event is zero, P(∅) = 0.

  2. The probability of the complement event A satisfies the property:

\[P(A') = 1 − P(A) \tag{13.3}\]

  3. The probability of the union of two events satisfies the general property known as the Addition Rule of Probability (see the sketch after this list):

\[P(A ∪ B) = P(A) + P(B) − P(A ∩ B) \tag{13.4}\]
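As a quick numerical check of properties 2 and 3, we can reuse the die events from the earlier examples (a minimal sketch; the event definitions repeat those used above):

sample_space <- c(1, 2, 3, 4, 5, 6)
A <- c(2, 4, 6)      # "the number rolled is even"
B <- c(1, 2)         # "the number rolled is less than three"

# property 2: P(A') = 1 - P(A)
1 - length(A) / length(sample_space)
[1] 0.5
length(setdiff(sample_space, A)) / length(sample_space)
[1] 0.5

# property 3 (Addition Rule): P(A∪B) = P(A) + P(B) - P(A∩B)
length(union(A, B)) / length(sample_space)
[1] 0.6666667
(length(A) + length(B) - length(intersect(A, B))) / length(sample_space)
[1] 0.6666667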

13.4.2 The Conditional Probability

The conditional probability of event A given event B, denoted by P(A|B) and read “A given B”, is the probability that A occurs given that B has occurred. The following formula defines the conditional probability:

\[P(A ∩ B) = P(A|B) · P(B) \tag{13.5}\]

or

\[ P(A|B)= \frac{P(A ∩ B)}{P(B)} \tag{13.6}\]

  Example

Suppose we roll two fair six-sided dice. What is the probability that the first roll is a 3, given that the sum of two rolls is 8?

The sample space of the experiment consists of all ordered pairs of numbers from 1 to 6. That is, Ω = {(1, 1), (1, 2),… , (1, 6), (2, 1),… , (6, 6)}.

It is useful to define the following two events:

  • A = {The first roll shows 3, and the second any number}.

  • B = {The sum of two rolls is 8}.

We are interested in finding the conditional probability: \[P(A|B) = \frac{P(A ∩ B)}{P(B)}\]


  • Event A (the first roll shows 3, and the second any number) is given by outcomes A = {(3,1), (3,2), (3,3), (3,4), (3,5), (3, 6)}.

Therefore, the probability of event A is:

\[P(A) = \frac{\textrm{6}}{\textrm{36}} =\frac{\textrm{1}}{\textrm{6}} \]

  • Event B (the sum of two rolls is 8) is given by the outcomes B = {(2,6), (3,5), (4,4), (5,3), (6,2)}.

Therefore, the probability of event B is:

\[P(B) = \frac{\textrm{5}}{\textrm{36}} \]


  • Also, the event A∩B occurs if the first roll shows 3 and the sum is 8, which can clearly happen only if the sequence (3, 5) occurs:

| 1st roll | 2 | \(\color{red}{3}\) | 4 | 5 | 6 |
|----------|---|---|---|---|---|
| 2nd roll | 6 | \(\color{red}{5}\) | 4 | 3 | 2 |
| Sum      | 8 | \(\color{red}{8}\) | 8 | 8 | 8 |

Thus, the probability of intersection of the two events is P(A∩B) = 1/36.


  • Finally, according to the definition of conditional probability (Equation 13.6), the probability of interest is:

\[P(A|B) = \frac{P(A ∩ B)}{P(B)} = \frac{\frac{1}{36}}{\frac{5}{36}} = \frac{1}{5}\]

Therefore, the “knowledge” that the sum of two rolls is 8 has updated the probability of A from P(A) = 1/6 = 0.167 to P(A|B) = 1/5 = 0.2.
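We can verify this result in R by enumerating the 36 equally likely outcomes of the two rolls (a minimal sketch; the object names are our own):

# all 36 ordered pairs (first roll, second roll)
omega <- expand.grid(first = 1:6, second = 1:6)

A <- omega$first == 3                   # the first roll shows 3
B <- omega$first + omega$second == 8    # the sum of the two rolls is 8

p_B  <- mean(B)        # P(B)   = 5/36
p_AB <- mean(A & B)    # P(A∩B) = 1/36

p_AB / p_B             # P(A|B) = 1/5
[1] 0.2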

13.4.3 Bayes’ theorem

Bayes’ theorem is based on this concept of revising a probability when new information becomes available.

The Equation 13.5 states that \(P(A ∩ B) = P(A|B) · P(B)\). Note that the \(P(A ∩ B)\) is the probability of both events A and B occurring, so we can also state that \(P(A ∩ B) = P(B|A) · P(A)\).

Now, replacing the P(A ∩ B) with P(B|A) · P(A) in the Equation 13.6 we get the Bayes’ theorem:

\[P(A|B) = \frac{P(B|A)· P(A)}{P(B)} \tag{13.7}\]

where \(P(B)\neq 0\).

  Example

We are interested in calculating the probability of developing lung cancer if a person smokes tobacco for a long time, P(Cancer|Smoker).

Suppose that 8% of the population has lung cancer, P(Cancer) = 0.08, and 30% of the population are chronic smokers, P(Smoker) = 0.30. Also, suppose that we know that 60% of all people who have lung cancer are smokers, P(Smoker|Cancer) = 0.6.

Using the Bayes’ theorem we have:

\[ \textrm{P(Cancer|Smoker) = }\frac{\textrm{P(Smoker|Cancer)· P(Cancer)}}{\textrm{P(Smoker)}} = \frac{0.6 \times 0.08}{0.3}= \frac{0.048}{0.3}= 0.16\]  
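The same calculation can be written in a few lines of R (a minimal sketch; the variable names are our own):

p_cancer <- 0.08                 # P(Cancer)
p_smoker <- 0.30                 # P(Smoker)
p_smoker_given_cancer <- 0.60    # P(Smoker|Cancer)

# Bayes' theorem: P(Cancer|Smoker) = P(Smoker|Cancer) * P(Cancer) / P(Smoker)
p_smoker_given_cancer * p_cancer / p_smoker
[1] 0.16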

13.4.4 Independence of events

If the knowledge of the occurrence of one event does not influence the occurrence of another event, the two events are called independent. In fact, if A and B are independent, then P(A|B) = P(A), i.e. the occurrence of B has no influence on the occurrence of A, and P(B|A) = P(B), i.e. the occurrence of A has no influence on the occurrence of B. Consider, for example, rolling two dice consecutively: the outcome of the first die is independent of the outcome of the second.

Two events A and B are said to be independent if:

\[P(A ∩ B) = P(A) · P(B) \tag{13.8}\]

This is known as Multiplication Rule of Probability and follows directly from Equation 13.5 because P(A|B) = P(A).

  Example

Determine the probability of obtaining two 3s when rolling two six-sided fair dice consecutively. This event can be decomposed into two events:

  • A = {die 1 shows 3, and die 2 shows any number}.

\[P(A) = \frac{\textrm{6}}{\textrm{36}} = \frac{\textrm{1 }}{\textrm{6}}\]

  • B = {die 1 shows any number, and die 2 shows 3}.

\[P(B) = \frac{\textrm{6}}{\textrm{36}} = \frac{\textrm{1}}{\textrm{6}}\]

We can state that the two events A and B are independent by nature, since each event involves a different die, which has no knowledge of the outcome of the other one. The event of interest is A ∩ B, and the definition of probability of two independent events leads to:

\[ {\textrm{P(A ∩ B) = P(A) · P(B) =} \frac{1}{6} · \frac{1}{6} = \frac{1}{36}}\]

This result can be verified by a direct count of all possible outcomes in the roll of two dice, and the fact that there is only one combination out of 36 that gives rise to two consecutive 3s.
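This direct count can be reproduced in R by enumerating all 36 outcomes (a minimal sketch; the object names are our own):

omega <- expand.grid(die1 = 1:6, die2 = 1:6)   # all 36 outcomes

A <- omega$die1 == 3     # die 1 shows 3
B <- omega$die2 == 3     # die 2 shows 3

mean(A & B)              # P(A∩B) = 1/36
[1] 0.02777778
mean(A) * mean(B)        # P(A) · P(B) = 1/6 · 1/6 = 1/36
[1] 0.02777778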