# Statistics for Data Science: The P-value in Hypothesis Testing

I had been looking for a simple, intuitive explanation of the p-value for a long time, and I finally found one after listening to the Super Data Science podcast by Kirill Eremenko: https://www.superdatascience.com/podcast.

The ideas here are a reproduction of what I learnt from that podcast, and I hope I can make them as clear and simple as the podcast itself.

The p-value is a term in statistics denoting the probability of observing a result at least as extreme as the one we actually observed, purely by random chance, given that our null hypothesis is true.

This is best illustrated with an example:

Example: We have a coin and we want to find out whether it is biased or unbiased. If it is unbiased, then P(H) = P(T) = 1/2, where P(H) and P(T) denote the probabilities of getting a head or a tail on a random toss.

We formulate the experiment by proposing our null hypothesis

H0: the coin is unbiased, i.e. P(H) = P(T)

and alternative hypothesis

H1: the coin is biased, i.e. P(H) != P(T).

We start by assuming that our null hypothesis is true and then carry out the experiment below to check whether there is any evidence to reject the null hypothesis:
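This "assume H0, then ask how surprising the data is" logic can be sketched as a quick simulation. The snippet below is my own illustration (not from the podcast): under a fair coin, it estimates how often 5 tosses in a row all come up heads.

```python
import random

random.seed(0)  # fixed seed for reproducibility

TRIALS = 100_000  # number of simulated 5-toss experiments
TOSSES = 5

# Under H0 (a fair coin), count the experiments where every toss lands heads.
all_heads = sum(
    all(random.random() < 0.5 for _ in range(TOSSES))
    for _ in range(TRIALS)
)

frequency = all_heads / TRIALS
print(frequency)  # should land close to 1/32 = 0.03125
```

If the event we actually observed almost never happens in this simulated "fair coin" world, that is exactly the kind of evidence that would lead us to doubt H0.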

Suppose we toss the coin 5 times in succession and observe a head on every single toss.

A head on the first toss is perfectly normal, since the probability of obtaining a head (or a tail) on one coin toss is 1/2, or 50%.

The probability of 2 heads occurring in 2 consecutive tosses is 1/4, or 25%, since HH is one event out of 4 equally likely possibilities (HH, TT, HT, TH).
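The same arithmetic extends to any number of tosses: the chance of n consecutive heads with a fair coin is (1/2)^n, and an exact two-sided p-value can be computed by summing the binomial probabilities of every outcome at least as unlikely as the one observed. A minimal sketch (my own illustration, using only the standard library):

```python
from math import comb

def prob_k_heads(k: int, n: int) -> float:
    """Probability of exactly k heads in n fair coin tosses."""
    return comb(n, k) * 0.5 ** n

def two_sided_p_value(k_observed: int, n: int) -> float:
    """Exact two-sided binomial test for a fair coin (p = 1/2):
    sum the probabilities of all outcomes no more likely than
    the observed one."""
    p_obs = prob_k_heads(k_observed, n)
    return sum(
        prob_k_heads(k, n)
        for k in range(n + 1)
        if prob_k_heads(k, n) <= p_obs
    )

print(prob_k_heads(2, 2))       # 0.25    -> HH is 1 of 4 outcomes
print(0.5 ** 5)                 # 0.03125 -> 5 heads in a row
print(two_sided_p_value(5, 5))  # 0.0625  -> "all heads" or "all tails"
```

Note that the two-sided p-value for 5 heads out of 5 (0.0625) is double the raw probability of 5 heads (0.03125), because under H1 (P(H) != P(T)) a run of 5 tails would be just as surprising.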