Statistics for Data Science: P-value in Hypothesis Testing

Prakhar S
Oct 24, 2021

I had been looking for a simple, intuitive explanation of the p-value for a long time, and I found one while listening to a Super Data Science podcast by Kirill Eremenko: https://www.superdatascience.com/podcast.

The ideas here reproduce what I learnt from that podcast, and I hope I can make them as clear and simple as the podcast itself.

The p-value is a term in statistics for the probability of observing, purely by random chance, a result at least as extreme as the one we actually observed, assuming that our null hypothesis is true.

This is best illustrated with an example:

Example: We have a coin and we want to find out whether it is unbiased or biased. If it is unbiased, then P(H) = P(T) = 1/2, where P(H) and P(T) denote the probabilities of getting heads and tails, respectively, on a random toss.

We formulate the experiment by proposing our null hypothesis

H0: the coin is unbiased, i.e. P(H) = P(T)

and our alternative hypothesis

H1: the coin is biased, i.e. P(H) != P(T).

We start by assuming that our null hypothesis is true, and then carry out the experiment below to check whether there is enough evidence to reject it:

Suppose we toss the coin 5 times in succession, and here is what we observe:

I Toss: Heads.

This is perfectly normal, as the probability of obtaining heads (or tails) in a single coin toss is 1/2, or 50%.

II Toss: Heads.

The probability of 2 heads occurring in 2 consecutive tosses is 1/4, or 25%, as HH is one outcome out of 4 equally likely possibilities (HH, HT, TH, TT).
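The count above can be checked by brute force. Here is a minimal sketch in Python (the function name `prob_all_heads` is only for illustration, not something from the podcast) that lists every equally likely outcome of n fair tosses and counts the ones that are all heads:

```python
from fractions import Fraction
from itertools import product

def prob_all_heads(n):
    """Probability of n heads in n tosses of a fair coin, by enumeration."""
    # Under a fair coin, every H/T sequence of length n is equally likely.
    outcomes = list(product("HT", repeat=n))
    # Keep only the sequences in which every toss came up heads.
    favourable = [o for o in outcomes if all(side == "H" for side in o)]
    return Fraction(len(favourable), len(outcomes))

print(prob_all_heads(2))  # 1/4
```

The same function gives the probability for longer streaks, since the number of possible outcomes simply doubles with every extra toss.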

III Toss: Heads.

The probability of 3 heads occurring in 3 consecutive tosses is 1/8, or 12.5% (the number of possible outcomes here is 2³ = 8). Now we might start to have a slight doubt about whether our coin is biased towards heads, but 12.5% is still a fairly high probability for something to happen by chance.

IV Toss: Heads.

The probability of 4 heads occurring in 4 consecutive tosses is 1/2⁴, or 1/16, i.e. 6.25%. This is still above the commonly used 5% threshold, so we still cannot say that we have enough evidence to reject our null hypothesis.

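V Toss: Heads.

The probability of 5 heads occurring in 5 consecutive tosses is 1/2⁵, or 1/32, i.e. about 3.1%. This is quite low: if the coin were fair, we would rarely see such a streak by chance, so it provides evidence for rejecting our null hypothesis. (Strictly speaking, because H1 above is two-sided, five tails in a row would count as equally extreme, which doubles the figure to 2/32, or 6.25%. The one-sided figure is used here to keep the illustration simple.)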
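The running probabilities from the five tosses can be tabulated in a few lines of Python. This is only a sketch: `ALPHA` and `streak_probability` are names I am introducing for illustration, and the 5% cut-off is the commonly used convention, not a fixed rule.

```python
ALPHA = 0.05  # assumed significance level (the commonly used 5%)

def streak_probability(n):
    """Probability of n heads in a row if the coin is fair."""
    return 0.5 ** n

# Print the probability after each toss and compare it with the threshold.
for n in range(1, 6):
    p = streak_probability(n)
    verdict = "below 5%" if p < ALPHA else "not below 5%"
    print(f"{n} heads in a row: {p:.2%} ({verdict})")
```

Only the fifth consecutive head pushes the probability under the 5% mark, which matches the walk-through above.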

So our p-value is the probability of seeing a result at least as extreme as the one we observed, given that our null hypothesis is true. If the p-value is below a chosen significance level (most commonly 5%), we say that we have enough evidence to reject our null hypothesis in favour of our alternative hypothesis.
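For results that are not a perfect streak (say, 4 heads out of 5), the same idea can be written as an exact binomial calculation. The sketch below uses only the standard library. The function name `binomial_p_value` and its `two_sided` flag are my own, and the two-sided rule used here (add up every outcome that is no more likely than the observed one) is one common convention rather than the only one.

```python
from math import comb

def binomial_p_value(heads, tosses, two_sided=False):
    """Exact p-value for the null hypothesis of a fair coin, P(H) = 1/2."""
    # Probability of each possible number of heads if the coin is fair.
    pmf = [comb(tosses, k) / 2 ** tosses for k in range(tosses + 1)]
    if two_sided:
        # Count every outcome that is no more likely than the observed one,
        # so that streaks of tails are treated as equally extreme.
        return sum(p for p in pmf if p <= pmf[heads])
    # One-sided: the chance of seeing this many heads or more.
    return sum(pmf[heads:])

print(binomial_p_value(5, 5))                  # 0.03125
print(binomial_p_value(5, 5, two_sided=True))  # 0.0625
```

Note that the one-sided value for 5 heads in 5 tosses is about 3.1%, while the two-sided value is 6.25%, so whether the result falls under 5% depends on which alternative hypothesis is being tested.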
