Introduction to Normal Distribution

Understanding the Normal Distribution

Definition:

The Normal Distribution is a continuous probability distribution that is symmetric around its mean: the probability of observing a value above the mean equals the probability of observing a value below it (each is 0.5). The density reaches its maximum at the mean, so the mean, median, and mode coincide, which produces the familiar bell-shaped curve.
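The symmetry described above can be checked numerically. The following is a minimal sketch using `scipy.stats.norm`; the parameter values `mu`, `sigma`, and the offset `d` are arbitrary choices made only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 10.0, 2.5

# Half of the probability mass lies on each side of the mean.
p_below = norm.cdf(mu, loc=mu, scale=sigma)  # P(X < mean)
p_above = norm.sf(mu, loc=mu, scale=sigma)   # P(X > mean)

# Points equally far from the mean have the same density.
d = 1.7
density_left = norm.pdf(mu - d, loc=mu, scale=sigma)
density_right = norm.pdf(mu + d, loc=mu, scale=sigma)

# On a grid around the mean, the density is largest at the mean itself.
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 801)
peak_location = grid[np.argmax(norm.pdf(grid, loc=mu, scale=sigma))]

print("P(X < mean):", p_below, " P(X > mean):", p_above)
print("density at mean - d:", density_left, " at mean + d:", density_right)
print("grid point with the highest density:", peak_location)
```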

Probability Density Function

The defining characteristic of the Normal Distribution is its probability density function, which takes the form of a symmetric, bell-shaped curve. This function mathematically describes the relative likelihood of observing different values within the distribution. The shape of the curve is determined by two key parameters: the mean, which represents the central tendency of the data, and the standard deviation, which quantifies the spread or variability of the values around the mean.

Normal Distribution formula: f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²)), where μ is the mean and σ is the standard deviation.
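To make the roles of the two parameters concrete, the standard Normal density formula can be written out by hand and compared with `scipy.stats.norm.pdf`. This is a sketch; the helper name `normal_pdf` is an assumption introduced here and is not part of any library.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of Normal(mu, sigma) at x; sigma must be positive.

    Hypothetical helper written for illustration only.
    """
    z = (np.asarray(x, dtype=float) - mu) / sigma  # standardized distance from the mean
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-4, 4, 9)
manual = normal_pdf(x, mu=0.0, sigma=1.0)
reference = norm.pdf(x, loc=0, scale=1)

# The hand-written formula and SciPy agree to floating-point precision.
print("matches scipy:", np.allclose(manual, reference))

# The peak height at the mean is 1 / (sigma * sqrt(2 * pi)).
peak = normal_pdf(0.0)
print("density at the mean:", peak)
```

A larger `sigma` lowers and widens the curve, while changing `mu` only shifts it along the x-axis.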

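# Imports used in this article
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import math

# Draw N samples from the standard Normal distribution (mean 0, standard deviation 1)
np.random.seed(1234)
N = 200000
data_normal_distr = np.random.normal(0, 1, N)

# Histogram of the samples on a density scale, with a kernel density estimate overlaid
sns.histplot(data_normal_distr, color='blue', alpha=0.48, stat='density', bins=18)
sns.kdeplot(data_normal_distr, color='red')
plt.title("Normal Distribution")
plt.show()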

from scipy import stats
from scipy.stats import norm

loc = 0
scale = 1

# norm.stats returns mean, variance, skewness, and excess kurtosis in a single call
_, _, skewness, kurtosis = norm.stats(loc=loc, scale=scale, moments='mvsk')

print(f'The mean of the Normal({loc},{scale}) Distribution is: ', np.round(norm.mean(loc=loc, scale=scale), 4))
print(f'The median of the Normal({loc},{scale}) Distribution is: ', np.round(norm.median(loc=loc, scale=scale), 4))
print(f'The variance of the Normal({loc},{scale}) Distribution is: ', np.round(norm.var(loc=loc, scale=scale), 4))
print(f'The standard deviation of the Normal({loc},{scale}) Distribution is: ', np.round(norm.std(loc=loc, scale=scale), 4))
print(f'The skewness of the Normal({loc},{scale}) Distribution is: ', np.round(skewness, 4))
# SciPy reports excess kurtosis, which is 0 for any Normal distribution
print(f'The excess kurtosis of the Normal({loc},{scale}) Distribution is: ', np.round(kurtosis, 4))

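X = 1.5
# cdf gives the left-tail probability P(X < x); sf (survival function) gives the right tail P(X > x)
print(f'The left probability of X = {X} in the Standard Normal Distribution is: ', norm.cdf(X))
print(f'The right probability of X = {X} in the Standard Normal Distribution is: ', norm.sf(X))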
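As a cross-check on the two tail probabilities, the exact values can be compared with the fraction of simulated samples that fall on each side of the threshold. This is a sketch; the seed and the sample size are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1234)  # arbitrary seed, for reproducibility only
samples = rng.normal(loc=0, scale=1, size=200_000)

x = 1.5
left_exact = norm.cdf(x)   # P(X < 1.5), about 0.933
right_exact = norm.sf(x)   # P(X > 1.5), the complement of the left tail

# Empirical counterparts: the share of simulated values on each side of x.
left_simulated = np.mean(samples < x)
right_simulated = np.mean(samples > x)

print("left tail  exact:", left_exact, " simulated:", left_simulated)
print("right tail exact:", right_exact, " simulated:", right_simulated)
print("tails sum to one:", np.isclose(left_exact + right_exact, 1.0))
```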

fig, ax = plt.subplots()

#Normal Distribution curve
x= np.arange(-4, 4, 0.001)

ax.plot(x, norm.pdf(x, loc=0, scale=1))
ax.set_title("Normal Dist. with mean=0, std_dv=1")
ax.set_xlabel('X - Values')
ax.set_ylabel('PDF(X)')

#Fill_color_between: shade from the left edge up to X so the area matches P(X < x)
px=np.arange(-4, X, 0.001)
ax.set_ylim(0, 0.5)
ax.fill_between(px, norm.pdf(px, loc=0, scale=1), color='yellow')

prob = norm.cdf(X)
ax.text(-0.8, 0.02, f'P(X<{np.round(X, 3)})\n {np.round(prob, 2)}', fontsize=18);

fig, ax = plt.subplots()

#Normal Distribution curve
x= np.arange(-4, 4, 0.001)
ax.plot(x, norm.pdf(x, loc=0, scale=1))
ax.set_title("Normal Dist. with mean=0, std_dv=1")
ax.set_xlabel('X - Values')
ax.set_ylabel('PDF(X)')

#Fill_color_between: shade from X to the right edge so the area matches P(X > x)
px=np.arange(X, 4, 0.001)
ax.set_ylim(0, 0.5)
ax.fill_between(px, norm.pdf(px, loc=0, scale=1), color='yellow')

prob = norm.sf(X)
ax.text(-0.8, 0.02, f'P(X>{np.round(X, 3)})\n {np.round(prob, 2)}', fontsize=18);

# Area Under Curve [-1,1] =~ 68%
X = -1
Y = 1
print(f'The probability between ({X}, {Y}) in the Standard Normal Distribution is: ', norm.cdf(Y) - norm.cdf(X))

fig, ax = plt.subplots()

#Distribution curve
x= np.arange(-4, 4, 0.001)
ax.plot(x, norm.pdf(x, loc=0, scale=1))
ax.set_title("Normal Dist. with mean=0, std_dv=1")
ax.set_xlabel('X-Values')
ax.set_ylabel('PDF(X)')

#Fill_between
px=np.arange(-1,1, 0.001)
ax.set_ylim(0, 0.5)
ax.fill_between(px, norm.pdf(px, loc=0, scale=1), color='yellow')

prob = norm.cdf(Y) - norm.cdf(X)
ax.text(-0.8, 0.02, f'P({np.round(X, 3)}<X<{np.round(Y, 3)})\n {np.round(prob, 2)}', fontsize=18);

Normal Distribution - Area under curve empirical [-1,1] = 68%

# Area Under Curve [-2,2] =~ 95%
X = -2
Y = 2
print(f'The probability between ({X}, {Y}) in the Standard Normal Distribution is: ', norm.cdf(Y) - norm.cdf(X))

fig, ax = plt.subplots()

#Distribution curve
x= np.arange(-4, 4, 0.001)
ax.plot(x, norm.pdf(x, loc=0, scale=1))
ax.set_title("Normal Dist. with mean=0, std_dv=1")
ax.set_xlabel('X-Values')
ax.set_ylabel('PDF(X)')

#Fill_between
px=np.arange(-2,2, 0.001)
ax.set_ylim(0, 0.5)
ax.fill_between(px, norm.pdf(px, loc=0, scale=1), color='yellow')

prob = norm.cdf(Y) - norm.cdf(X)
ax.text(-0.8, 0.02, f'P({np.round(X, 3)}<X<{np.round(Y, 3)})\n {np.round(prob, 2)}', fontsize=18);

Normal Distribution - Area under curve empirical [-2,2] = 95%

# Area Under Curve [-3,3] =~ 99.7%
X = -3
Y = 3
print(f'The probability between ({X}, {Y}) in the Standard Normal Distribution is: ', norm.cdf(Y) - norm.cdf(X))

fig, ax = plt.subplots()

#Distribution curve
x= np.arange(-4, 4, 0.001)
ax.plot(x, norm.pdf(x, loc=0, scale=1))
ax.set_title("Normal Dist. with mean=0, std_dv=1")
ax.set_xlabel('X-Values')
ax.set_ylabel('PDF(X)')

#Fill_between
px=np.arange(-3,3, 0.001)
ax.set_ylim(0, 0.5)
ax.fill_between(px, norm.pdf(px, loc=0, scale=1), color='yellow')

prob = norm.cdf(Y) - norm.cdf(X)
ax.text(-0.8, 0.02, f'P({np.round(X, 5)}<X<{np.round(Y, 5)})\n {np.round(prob, 5)}', fontsize=18);

Normal Distribution - Area under curve empirical [-3,3] = 99.7%

Normal Distribution - empirical
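The three areas computed above are often summarized as the 68-95-99.7 rule. They can be collected in a single loop and compared with simulated data. This is a sketch; the seed, the sample size, and the dictionary names are assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1234)  # arbitrary seed
samples = rng.normal(loc=0, scale=1, size=200_000)

exact = {}
simulated = {}
for k in (1, 2, 3):
    # Exact area within k standard deviations of the mean.
    exact[k] = norm.cdf(k) - norm.cdf(-k)
    # Share of simulated values within the same interval.
    simulated[k] = np.mean(np.abs(samples) < k)
    print(f"within {k} sd: exact = {exact[k]:.4f}, simulated = {simulated[k]:.4f}")
```

The exact values are approximately 0.6827, 0.9545, and 0.9973. For any other Normal distribution the same proportions hold for the interval from mean − k·σ to mean + k·σ.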

Advantages of the Normal Distribution:

1) Universality: The Normal Distribution is widely applicable across various fields because many natural phenomena and human behaviors are approximately Normal.
2) Ease of Use: Its mathematical properties make it convenient for analytical purposes and statistical inference.
3) Central Limit Theorem: The Normal Distribution is the limiting distribution in the Central Limit Theorem, which states that the distribution of sample means approaches a Normal Distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has finite variance.
4) Statistical Inference: Many statistical methods, such as hypothesis testing and confidence intervals, rely on the assumption of Normality for accurate results.
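The Central Limit Theorem mentioned above can be illustrated with a small simulation. The sketch below draws sample means from an exponential population, which is strongly right-skewed; the choice of population, the sample sizes, the number of replications, and the seed are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)  # arbitrary seed

# Exponential(scale=1) population: mean 1, standard deviation 1, skewness 2.
population_std = 1.0
replications = 10_000

results = {}
for n in (2, 20, 200):
    # Each row is one sample of size n; take the mean of every row.
    sample_means = rng.exponential(scale=1.0, size=(replications, n)).mean(axis=1)
    results[n] = {
        "mean": sample_means.mean(),
        "std": sample_means.std(ddof=1),
        "skewness": skew(sample_means),
    }
    print(f"n = {n:3d}: mean = {results[n]['mean']:.3f}, "
          f"std = {results[n]['std']:.3f} "
          f"(theory {population_std / np.sqrt(n):.3f}), "
          f"skewness = {results[n]['skewness']:.3f}")
```

As n grows, the skewness of the sample means shrinks toward 0 (the value for a Normal distribution) and their standard deviation tracks σ/√n.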

Limitations of the Normal Distribution:

1) Sensitivity to Outliers: The usual estimates of the Normal Distribution's parameters, the sample mean and standard deviation, are sensitive to outliers, so a few extreme values can significantly distort the fitted curve.
2) Assumption of Normality: While the Central Limit Theorem justifies a Normal approximation for sample means in large samples, real-world data may not follow a Normal Distribution.
3) Restricted Applicability: In cases where data is skewed or exhibits non-Normal behavior, alternative distributions may provide better fits and more accurate results.
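The sensitivity to outliers noted above is easy to see on a small example. The sketch below uses synthetic data and a single made-up extreme value (for instance, a data-entry error); all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# 100 synthetic measurements from Normal(50, 5).
clean = rng.normal(loc=50, scale=5, size=100)

# The same data with one hypothetical extreme value appended.
contaminated = np.append(clean, 500.0)

for label, values in (("clean", clean), ("with outlier", contaminated)):
    print(f"{label:>12}: mean = {values.mean():.2f}, "
          f"std = {values.std(ddof=1):.2f}, "
          f"median = {np.median(values):.2f}")
```

One extreme value out of 101 observations visibly shifts the mean and inflates the standard deviation several times over, while the median barely moves. A Normal curve fitted to the contaminated data would therefore be much wider than the bulk of the data suggests.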

Importance of the Normal Distribution:

The Normal Distribution is important in statistics and in many applied fields for several reasons:
Predictive Modeling: Many quantities, such as heights, test scores, and product demand, are often approximately Normally distributed. When that approximation holds, the distribution supports predictions and models of these quantities.
Quality Control: In manufacturing and production processes, deviations from a desired standard often follow an approximately Normal Distribution. By analyzing process data using Normal Distribution principles, organizations can identify and rectify issues efficiently.
Risk Assessment: Financial models often assume returns to be Normally distributed, which enables risk assessment and portfolio optimization strategies.
Biological and Psychological Measurements: Measurements such as blood pressure and IQ scores tend to be approximately Normally distributed, and reaction times are often modeled as Normal, sometimes after a transformation, which makes the Normal Distribution valuable in medical and psychological research.
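As an illustration of the quality control use case, the sketch below derives three-sigma limits from baseline measurements and flags new parts that fall outside them. The process target, the measurements, and all variable names are hypothetical, and the approach assumes the in-control process is approximately Normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)  # arbitrary seed

# Hypothetical baseline from an in-control process: target 20.0 mm, std 0.05 mm.
baseline = rng.normal(loc=20.0, scale=0.05, size=500)

# Estimate the center and spread, then set limits at three standard deviations.
center = baseline.mean()
sigma_hat = baseline.std(ddof=1)
lower, upper = center - 3 * sigma_hat, center + 3 * sigma_hat

# Hypothetical new measurements to screen.
new_parts = np.array([20.02, 19.97, 20.21, 20.00, 19.80])
flagged = new_parts[(new_parts < lower) | (new_parts > upper)]

# Under Normality, an in-control part falls outside the limits about 0.27% of the time.
false_alarm_rate = 2 * norm.sf(3)

print(f"control limits: [{lower:.3f}, {upper:.3f}]")
print("flagged parts:", flagged)
print(f"expected false-alarm rate: {false_alarm_rate:.4%}")
```

The three-sigma choice ties back to the 99.7% area within three standard deviations: a part outside the limits is unlikely to come from the in-control process.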

Applications in Sales and Marketing

Example 1: Consumer Behavior Analysis. Consider a marketing firm analyzing the purchasing behavior of consumers. If purchase amounts are approximately Normally distributed, the firm can segment its customer base by how far each customer's spending lies from the mean and tailor marketing strategies to specific consumer segments.
Example 2: Sales Forecasting. In retail, forecast errors and fluctuations in sales around a trend are often modeled as Normally distributed. By applying statistical methods that build on this assumption, such as time series analysis and forecasting models, retailers can predict future sales volumes, optimize inventory management, and plan marketing campaigns accordingly.
Example 3: Market Research. Market researchers often collect data on consumer preferences, satisfaction levels, and brand perceptions. When these metrics are approximately Normally distributed within the population, researchers can apply standard inferential methods to derive insights, make informed business decisions, and formulate marketing strategies that resonate with target audiences.
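Because each example above rests on an assumption of Normality, it is worth checking that assumption before relying on it. The sketch below uses synthetic, deliberately right-skewed "purchase amounts" and tests them with `scipy.stats.skew` and `scipy.stats.normaltest`; the data, the parameters, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed

# Synthetic purchase amounts drawn from a lognormal distribution (right-skewed).
purchases = rng.lognormal(mean=3.5, sigma=0.6, size=2_000)

# A log transform is one common way to bring such data closer to Normal.
log_purchases = np.log(purchases)

checks = {}
for label, values in (("raw", purchases), ("log-transformed", log_purchases)):
    statistic, p_value = stats.normaltest(values)  # D'Agostino-Pearson test
    checks[label] = {"skewness": stats.skew(values), "p_value": p_value}
    print(f"{label:>15}: skewness = {checks[label]['skewness']:.3f}, "
          f"normality test p-value = {p_value:.4g}")
```

A very small p-value or a skewness far from 0 suggests that Normal-based methods should be applied to a transformed variable, or replaced with a distribution that fits the data better.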
