## Descriptive Statistics

### Measures of Central Tendency

**Mean ($\mu$) (population)**

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where:

- $N$ = total number of data points
- $x_i$ = each data point

**Mean ($\bar{x}$) (sample)**

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where:

- $n$ = number of data points in the sample

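**Median**

With the data sorted in ascending order, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$:

- If $n$ is odd: $\text{Median} = x_{\left(\frac{n+1}{2}\right)}$

- If $n$ is even: $\text{Median} = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$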

**Mode**

- Value that appears most frequently in the dataset.
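As a quick check of the three measures above, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for illustration and do not come from this cheat sheet.

```python
import statistics

# Illustrative data (an assumption, chosen so the results are easy to verify by hand)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # n is even: average of the two middle values, (3 + 5) / 2 = 4
mode = statistics.mode(data)      # 3 is the most frequent value

print(mean, median, mode)
```

Note that `statistics.median` sorts the data internally, so the input does not need to be pre-sorted.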

### Measures of Dispersion

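**Variance ($\sigma^2$) (population)**

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

**Standard Deviation ($\sigma$) (population)**

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

**Variance ($s^2$) (sample)**

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

**Standard Deviation ($s$) (sample)**

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

**Range**

$$\text{Range} = x_{\max} - x_{\min}$$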
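The difference between the population and sample versions is only the divisor ($N$ versus $n - 1$). The sketch below shows this with the standard `statistics` module; the data values are an illustrative assumption.

```python
import statistics

# Illustrative data with mean 5 and squared deviations summing to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)  # divides by N:     32 / 8 = 4
pop_sd = statistics.pstdev(data)      # sqrt(4) = 2
samp_var = statistics.variance(data)  # divides by n - 1: 32 / 7
samp_sd = statistics.stdev(data)      # sqrt(32 / 7)
data_range = max(data) - min(data)    # 9 - 2 = 7

print(pop_var, pop_sd, samp_var, samp_sd, data_range)
```

The sample variance is always slightly larger than the population variance computed from the same values, because it divides by a smaller number.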

## Probabilities

### Basic Concepts

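**Probability of an event**

$$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$$

**Complementary Probability**

$$P(A^c) = 1 - P(A)$$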

### Probability Rules

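**Addition Rule (for mutually exclusive events)**

$$P(A \cup B) = P(A) + P(B)$$

**Addition Rule (for non-mutually exclusive events)**

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

**Multiplication Rule (for independent events)**

$$P(A \cap B) = P(A) \cdot P(B)$$

**Multiplication Rule (for dependent events)**

$$P(A \cap B) = P(A) \cdot P(B \mid A)$$

where $P(B \mid A)$ is the conditional probability of $B$ given that $A$ has occurred.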
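The addition and multiplication rules can be verified by direct counting on a small sample space. This is a minimal sketch, assuming a fair six-sided die; the events `A`, `B`, `C`, `D` and the helper `prob` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space of a fair six-sided die (assumption)

def prob(event):
    """P(event) = favorable outcomes / total outcomes, for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3 (overlaps with A)
C = {1}        # mutually exclusive with A
D = {1, 2}     # independent of A: P(A and D) = P(A) * P(D)

p_complement = prob(omega - A)           # 1 - P(A)
p_a_or_c = prob(A | C)                   # P(A) + P(C)
p_a_or_b = prob(A | B)                   # P(A) + P(B) - P(A and B)
p_a_and_d = prob(A & D)                  # P(A) * P(D)
p_b_given_a = prob(A & B) / prob(A)      # conditional probability P(B | A)

print(p_complement, p_a_or_c, p_a_or_b, p_a_and_d, p_b_given_a)
```

Using `Fraction` keeps the arithmetic exact, so each rule can be compared with `==` rather than with a floating-point tolerance.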

## Law of Large Numbers

As the size of a sample increases, the sample mean will approach the expected mean of the population.

**Weak Form**

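For a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \dots, X_n$ with a mean $\mu$ and sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$:

$$\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0$$

for any $\varepsilon > 0$.

This means that the sample mean converges in probability to the population mean: for large $n$ it is very likely to be close to $\mu$, but there is no guarantee that this will happen in every case.

**Strong Form**

States that the sample mean converges almost surely (with probability 1) to the population mean as the sample size approaches infinity:

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

This means that, with probability 1, the sample mean converges to the population mean as **the number of trials grows without bound**.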

**Example**

We roll a die 1000 times and calculate the mean of the results. As the number of rolls ($n$) increases, the mean of the results will approach the expected value ($E[X] = \frac{1+2+3+4+5+6}{6} = 3.5$ for a fair six-sided die).

If we roll the die once, we can get any number from 1 to 6, but as we increase the number of rolls, the mean of those rolls will approach 3.5.
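The die example can be simulated directly. The following is a minimal sketch, assuming a fair die and a fixed random seed for reproducibility; the function name `running_mean_of_rolls` is a hypothetical helper.

```python
import random

def running_mean_of_rolls(n_rolls, seed=0):
    """Return the running mean after each of n_rolls simulated fair die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # one roll, uniform on 1..6
        means.append(total / i)     # sample mean of the first i rolls
    return means

means = running_mean_of_rolls(100_000)
for n in (1, 10, 1_000, 100_000):
    print(n, means[n - 1])
```

The early values can be far from 3.5, while the later ones cluster tightly around it, which is the behavior the law describes.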

## Combinatorics

**Fundamental Counting Principle**

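If one task can be performed in $m$ ways and a second task can be performed in $n$ ways, then the two tasks can be performed in sequence in $m \times n$ ways.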

### Permutations

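**Permutations of $n$ elements**

$$P(n) = n!$$

**Permutations of $n$ elements taken $r$ at a time**

$$P(n, r) = \frac{n!}{(n-r)!}$$

### Combinations

**Combinations of $n$ elements taken $r$ at a time**

$$C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

**Combinations with repetition**

$$C'(n, r) = \binom{n+r-1}{r} = \frac{(n+r-1)!}{r!\,(n-1)!}$$

### Binomial Theorem

**Binomial expansion**

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}$$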
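The counting formulas in this section map onto functions in Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later). This is a small sketch with illustrative values $n = 5$, $r = 3$; `binomial_expand` is a hypothetical helper that evaluates the binomial sum term by term.

```python
import math

n, r = 5, 3  # illustrative values

perms_all = math.factorial(n)        # permutations of n elements: 5! = 120
perms_r = math.perm(n, r)            # n! / (n - r)! = 60
combs = math.comb(n, r)              # n! / (r! (n - r)!) = 10
combs_rep = math.comb(n + r - 1, r)  # combinations with repetition: C(7, 3) = 35

def binomial_expand(a, b, n):
    """Evaluate (a + b)^n as the sum of C(n, k) * a^(n-k) * b^k for k = 0..n."""
    return sum(math.comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

print(perms_all, perms_r, combs, combs_rep)
print(binomial_expand(2, 3, 4), (2 + 3) ** 4)  # both sides of the binomial theorem
```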

## Bernoulli Trials

A Bernoulli trial is a random experiment that has the following characteristics:

- Discrete Outcomes: It has only two possible outcomes, typically called success (usually represented by 1) and failure (represented by 0).
- Constant Probability: The probability of success $p$ is constant in each trial. Consequently, the probability of failure is $1 - p$.
- Independence: The trials are independent, meaning the outcome of one trial does not affect the outcome of another.

**Examples**

- Tossing a coin, where “heads” can be considered a success and “tails” a failure
- Taking an exam, where “passing” is considered a success and “failing” a failure
- Measuring the effectiveness of a medical treatment in which the outcome can be “effective” (success) or “ineffective” (failure)

## Probability Distributions

### Discrete Distribution

Discrete distributions describe the probabilities of variables that can only take specific, countable values, such as integers.

**Binomial Distribution**

Models the number of successes in a fixed number of independent trials, each with the same probability of success.

$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$

where:

- $n$ = number of trials
- $k$ = number of successes
- $p$ = probability of success in a single trial
- $\binom{n}{k}$ = binomial coefficient
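A direct translation of the binomial probability mass function, $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, is sketched below. The function name `binomial_pmf` and the coin-toss values are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 fair coin tosses: C(5, 3) / 2^5 = 10 / 32
p3 = binomial_pmf(3, 5, 0.5)

# The probabilities over k = 0..n should sum to 1
total = sum(binomial_pmf(k, 5, 0.5) for k in range(6))

print(p3, total)
```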

**Negative Binomial Distribution**

Models the number of failures before achieving a fixed number of successes in Bernoulli trials.

$$P(X = k) = \binom{k+r-1}{k} p^{r} (1-p)^{k}$$

where:

- $r$ is the number of successes required
- $p$ is the probability of success
- $k$ is the number of failures

**Geometric Distribution**

Models the number of trials until the first success in a sequence of Bernoulli trials.

$$P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \dots$$

where:

- $p$ is the probability of success
- $k$ is the number of trials until the first success
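The geometric distribution is the special case of the negative binomial with one required success. The sketch below assumes the conventions described in this section: the negative binomial counts failures before the $r$-th success, $P(X = k) = \binom{k+r-1}{k} p^r (1-p)^k$, and the geometric counts trials up to and including the first success, $P(X = k) = (1-p)^{k-1} p$. The function names are hypothetical.

```python
import math

def neg_binomial_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return math.comb(k + r - 1, k) * p ** r * (1 - p) ** k

def geometric_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

# First success on the 3rd toss of a fair coin: (1/2)^2 * (1/2) = 0.125
print(geometric_pmf(3, 0.5))

# Same event expressed as "2 failures before the 1st success"
print(neg_binomial_pmf(2, 1, 0.5))
```

Because one distribution counts trials and the other counts failures, the two agree when the geometric argument is shifted by one: `geometric_pmf(k, p)` equals `neg_binomial_pmf(k - 1, 1, p)`.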

**Hypergeometric Distribution**

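Models the number of successes in a fixed-size sample drawn without replacement from a finite population.

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

where:

- $N$ is the population size
- $K$ is the number of successes in the population
- $n$ is the sample size
- $k$ is the number of successes in the sample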
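As a worked case of sampling without replacement, the sketch below evaluates the hypergeometric probability $\binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n}$ for a card-drawing example. The example (aces in a 5-card hand) and the name `hypergeom_pmf` are illustrative assumptions.

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(k successes in a sample of n drawn without replacement
    from a population of N that contains K successes)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Exactly 1 ace in a 5-card hand from a standard 52-card deck (4 aces)
p_one_ace = hypergeom_pmf(1, 52, 4, 5)

# Probabilities over all possible ace counts (0..4) should sum to 1
total = sum(hypergeom_pmf(k, 52, 4, 5) for k in range(5))

print(p_one_ace, total)
```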

**Poisson Distribution**

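Models the number of events that occur in a fixed interval of time or space when events occur with a constant average rate.

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where:

- $\lambda$ is the average rate of occurrence of the events (the expected number of events per interval)
- $k$ is the number of events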
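The Poisson probability $\lambda^k e^{-\lambda} / k!$ can be evaluated in a few lines. In this sketch the rate of 2 events per interval is an illustrative assumption, and `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.0  # illustrative: 2 events per interval on average

p_zero = poisson_pmf(0, lam)  # e^(-2), probability of no events
p_two = poisson_pmf(2, lam)   # 2^2 * e^(-2) / 2! = 2 * e^(-2)

# The support is infinite, so a truncated sum only approximates 1
partial_total = sum(poisson_pmf(k, lam) for k in range(50))

print(p_zero, p_two, partial_total)
```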

### Continuous Distribution

**Normal (Gaussian) Distribution**

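Models many variables in nature and society. It is symmetric and bell-shaped.

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where:

- $\mu$ is the mean
- $\sigma^2$ is the variance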

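**Cumulative Distribution Function of the Normal (CDF)**

$$F(x) = \frac{1}{2} \left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$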
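The normal density and its CDF can both be computed with the standard `math` module, since the CDF has the closed form $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$ in terms of the error function. The function names below are hypothetical, and the defaults assume the standard normal ($\mu = 0$, $\sigma = 1$).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), expressed through the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(0.0))   # peak of the standard normal density
print(normal_cdf(0.0))   # by symmetry, half the mass lies below the mean
print(normal_cdf(1.96))  # close to 0.975, the familiar 95% two-sided cutoff
```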

**Exponential Distribution**

Models the time between events in a Poisson process. It is used in reliability theory and to model waiting times.

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

where:

- $\lambda$ is the rate of occurrence of the events.
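A short sketch of the exponential density $\lambda e^{-\lambda x}$ and its CDF $1 - e^{-\lambda x}$, together with a simulation showing that the average waiting time is close to $1/\lambda$. The rate $\lambda = 2$, the seed, and the function names are illustrative assumptions; `random.expovariate` is the standard-library sampler.

```python
import math
import random

def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for negative x."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0                # illustrative rate: 2 events per unit time
rng = random.Random(0)   # fixed seed for reproducibility
samples = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be near 1 / lam = 0.5

print(exponential_cdf(1.0, lam), sample_mean)
```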

**Uniform Distribution**

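Models a variable that has the same probability of taking any value within a defined interval.

$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

where:

- $a$ and $b$ are the limits of the interval.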

**Gamma Distribution**

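Generalizes the exponential distribution. Models the time until $\alpha$ events occur in a Poisson process with rate $\beta$ (for integer $\alpha$).

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x > 0$$

where:

- $\alpha$ is a shape parameter
- $\beta$ is a rate parameter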

**Beta Distribution**

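Primarily used in Bayesian statistics to model uncertainty about proportions or probabilities. It is defined on the interval $[0, 1]$.

$$f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1$$

where:

- $\alpha$ and $\beta$ are shape parameters
- $B(\alpha, \beta)$ is the beta function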

**Cauchy Distribution**

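A heavy-tailed distribution whose mean and variance are undefined.

$$f(x) = \frac{1}{\pi\gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$

where:

- $x_0$ is the location parameter
- $\gamma$ is the scale parameter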

**Student’s t-distribution**

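Used to estimate the mean of a normally distributed population when the sample size is small and the variance is unknown.

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where:

- $\nu$ are the degrees of freedom, which determine the shape of the distribution. As $\nu$ increases, the t-distribution converges to a standard normal distribution.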
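The convergence to the standard normal can be seen numerically by evaluating the t density at 0 for increasing degrees of freedom. This sketch assumes the usual density $\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$ and uses `math.lgamma` to avoid overflow for large $\nu$; the function names are hypothetical.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    # Work with log-gamma so the ratio of gamma functions stays finite for large nu
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# The peak of the t density rises toward the normal peak as nu grows
for nu in (1, 5, 30, 200):
    print(nu, t_pdf(0.0, nu))
print("normal", std_normal_pdf(0.0))
```

With $\nu = 1$ the density at 0 equals $1/\pi$, which is the standard Cauchy density at its center.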

## Correlation and Regression

### Correlation

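**Pearson Correlation Coefficient ($r$)**

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$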

### Linear Regression

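**Regression Line Equation**

$$y = mx + b$$

where:

- $m$ = slope
- $b$ = intercept

**Slope**

$$m = r \frac{s_y}{s_x}$$

where:

$r$ is the Pearson correlation coefficient, and $s_x$ and $s_y$ are the standard deviations of $x$ and $y$, respectively.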
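The correlation coefficient and the slope formula $m = r \, s_y / s_x$ can be checked on a small dataset. The sketch below is written by hand rather than with a library; the data and function names are illustrative assumptions, and the intercept $b = \bar{y} - m\bar{x}$ is the standard least-squares result, which this section does not state explicitly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Return (slope, intercept) using slope = r * s_y / s_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))  # sample std of x
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))  # sample std of y
    slope = pearson_r(xs, ys) * s_y / s_x
    intercept = mean_y - slope * mean_x  # assumption: line passes through the point of means
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # illustrative data lying exactly on y = 2x + 1

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
print(r, slope, intercept)
```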

## Statistical Inference

### Parameter Estimation

**Point Estimation (sample mean)**

$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

### Confidence Intervals

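**Confidence Interval for the Mean**

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the critical value of the standard normal distribution for the chosen confidence level $1 - \alpha$, $\sigma$ is the population standard deviation, and $n$ is the sample size.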
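A minimal sketch of a confidence interval for the mean, assuming the z-based form $\bar{x} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$ with a known population standard deviation. It uses `statistics.NormalDist` (Python 3.8 or later) for the critical value; the function name and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based confidence interval for the mean, assuming sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)        # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Illustrative numbers: sample mean 50, sigma 10, n = 100
low, high = mean_confidence_interval(50.0, 10.0, 100)
print(low, high)
```

When $\sigma$ is unknown and the sample is small, the critical value would come from Student's t-distribution instead, which the standard library does not provide.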

### Hypothesis Testing

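**Hypothesis Test (p-value)**

- If $H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis:

$$p\text{-value} = P(\text{a result at least as extreme as the one observed} \mid H_0 \text{ is true})$$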

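**Test Statistic for the Mean**

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\mu_0$ is the population mean under the null hypothesis.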
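The test statistic and its p-value can be computed together. This is a sketch under stated assumptions: a z-test with known population standard deviation, $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$, and a two-sided alternative. The function name `z_test` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: population mean equals mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z| in either tail
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Illustrative numbers: sample mean 52, hypothesized mean 50, sigma 10, n = 100
z, p = z_test(52.0, 50.0, 10.0, 100)
print(z, p)
```

A small p-value (for example, below a chosen significance level such as 0.05) is usually taken as evidence against the null hypothesis.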
