STATISTICS-Notes
Maths - Notes
Mean of Grouped Data
- Direct Method:
This is the most straightforward approach and is suitable when class marks and frequencies are small and manageable.
If
- \(x_i\) = class mark of the \(i^{th}\) class
- \(f_i\) = frequency of the \(i^{th}\) class
then the mean \(\overline{x}\) is calculated as \[ \boxed{\;\boldsymbol{\overline{x} = \dfrac{\sum{f_i x_i}}{\sum{f_i}}}\;} \] This method clearly shows how each class contributes to the overall average, but it may involve lengthy calculations when values are large.
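As a rough illustration, the direct method can be sketched in Python. The function name `grouped_mean_direct` is an assumption made for this sketch; the data are the class marks and frequencies of the grouped table in Example-1 of these notes.

```python
def grouped_mean_direct(class_marks, frequencies):
    """Direct method: mean = sum(f_i * x_i) / sum(f_i)."""
    if len(class_marks) != len(frequencies):
        raise ValueError("class_marks and frequencies must have equal length")
    total_frequency = sum(frequencies)
    if total_frequency <= 0:
        raise ValueError("total frequency must be positive")
    # Each class contributes its class mark weighted by its frequency
    weighted_sum = sum(f * x for x, f in zip(class_marks, frequencies))
    return weighted_sum / total_frequency


# Class marks and frequencies of the grouped table in Example-1
# (the last class mark is (85 + 100) / 2 = 92.5)
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
frequencies = [2, 3, 7, 6, 6, 6]
print(grouped_mean_direct(class_marks, frequencies))  # 62.0
```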
- Assumed Mean Method:
To simplify calculations, an assumed mean \(a\) is chosen, usually one of the class marks near the centre of the distribution. Deviations of class marks from this assumed mean are then calculated.
This method reduces the size of numbers involved and is particularly useful when class marks are large but evenly spaced.
Let \[d_i=x_i-a\] then the mean is obtained using \[ \boxed{\;\boldsymbol{\overline{x} = a +\dfrac{\sum{f_i d_i}}{\sum{f_i}}}\;} \]
- Step Deviation Method:
When class intervals are equal and deviations are still large, the step deviation method provides further simplification by scaling down deviations.
If
- \(h\) = common width of the class intervals
- \(u_i=\frac{x_i-a}{h}\)
then the mean is calculated as \[ \boxed{\;\boldsymbol{\overline{x} = a + h \left(\dfrac{\sum{f_i u_i}}{\sum{f_i}}\right)}\;} \] This method is computationally efficient and widely used in examinations, especially when dealing with large data sets.
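The two shortcut methods can be sketched in Python as follows. This is a minimal sketch: the function names are assumptions, the deviations are \(d_i=x_i-a\) and \(u_i=\frac{x_i-a}{h}\), and the data are taken from the grouped table in Example-1 with \(a=47.5\) and \(h=15\). Both functions should return the same value as the direct method.

```python
def grouped_mean_assumed(class_marks, frequencies, a):
    """Assumed mean method: mean = a + sum(f_i * d_i) / sum(f_i), with d_i = x_i - a."""
    total_frequency = sum(frequencies)
    sum_fd = sum(f * (x - a) for x, f in zip(class_marks, frequencies))
    return a + sum_fd / total_frequency


def grouped_mean_step(class_marks, frequencies, a, h):
    """Step deviation method: mean = a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h."""
    total_frequency = sum(frequencies)
    sum_fu = sum(f * ((x - a) / h) for x, f in zip(class_marks, frequencies))
    return a + h * sum_fu / total_frequency


# Grouped table of Example-1, assumed mean a = 47.5, class width h = 15
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
frequencies = [2, 3, 7, 6, 6, 6]
print(grouped_mean_assumed(class_marks, frequencies, a=47.5))   # 62.0
print(grouped_mean_step(class_marks, frequencies, a=47.5, h=15))  # 62.0
```

The choice of \(a\) does not change the result; it only changes how large the intermediate numbers are.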
Class mark
It is assumed that the frequency of each class interval is centred around its mid-point. So the mid-point (or class mark) of each class can be chosen to represent the observations falling in the class.
\[\small\boxed{\boldsymbol{x=\dfrac{\text{Upper class limit + Lower class limit}}{2}}}\]
Mode of Grouped Data
In statistics, the mode represents the value that occurs most frequently in a data set. When data is presented in a grouped form, individual values are not explicitly available. Hence, the mode cannot be identified by simple inspection. Instead, a systematic method is used to estimate the mode of grouped data, which indicates the class interval containing the highest concentration of observations.
Modal Class
In a grouped frequency distribution, the modal class is the class interval with the maximum frequency. This class gives a rough idea of where the mode lies, but a formula is required to obtain a more precise value.
Formula for Mode of Grouped Data
Let
- \(l\) = lower limit of the modal class
- \(h\) = size (width) of the class interval
- \(f_1\) = frequency of the modal class
- \(f_0\) = frequency of the class preceding the modal class
- \(f_2\) = frequency of the class succeeding the modal class
Then, the mode is given by the formula \[\boxed{\;\boldsymbol{\text{Mode}=l+\left(\dfrac{f_1-f_0}{2f_1-f_0-f_2}\right)h}\;}\] This formula estimates the value around which data is most densely clustered within the modal class.
Special Cases
- If two or more classes have the same highest frequency, the data is said to be bi-modal or multi-modal, and the mode may not be uniquely defined.
- If frequencies rise steadily and then fall, the mode is well-defined and meaningful.
- If the distribution is irregular, the mode may only give an approximate indication of concentration.
Uses of Mode
The mode is particularly useful when:
- Data contains extreme values that may distort the mean.
- The most common or typical value is required rather than an average.
- Data is qualitative or grouped into categories such as shoe sizes, grades, or income groups.
Median of Grouped Data
The median is a measure of central tendency that represents the middle value of a data set when the observations are arranged in order. In many real-life situations, data is organised into class intervals rather than listed individually. Such data is called grouped data, and in this case, the median cannot be identified by direct observation. Instead, it is determined using a structured approach based on cumulative frequencies.
Meaning of Median in Grouped Data
For grouped data, the median divides the entire frequency distribution into two equal parts. This means that half of the observations lie below the median value and the remaining half lie above it. Since individual observations are unknown, the median obtained is an approximate value that lies within a specific class interval known as the median class.
Cumulative Frequency
To find the median, the concept of cumulative frequency is essential. The cumulative frequency of a class is the total number of observations up to and including that class. Cumulative frequencies help in locating the position of the median within the distribution.
Steps to Find the Median of Grouped Data
- Arrange the data into a frequency table with class intervals and corresponding frequencies.
- Compute the cumulative frequencies for all classes.
- Find the value of \(\frac{N}{2}\), where \(N\) is the sum of all frequencies.
- Identify the class whose cumulative frequency is just greater than \(\frac{N}{2}\). This class is called the median class.
- Apply the median formula to calculate the required value.
Formula for Median of Grouped Data
Let
- \(l\) = lower limit of the median class
- \(h\) = class width
- \(f\) = frequency of the median class
- \(cf\) = cumulative frequency of the class preceding the median class
- \(N\) = total frequency
Then, the median is given by \[\boxed{\;\boldsymbol{\text{Median}=l+\left(\dfrac{\frac{N}{2}-cf}{f}\right)h}\;}\]
Interpretation of the Formula
The term \(\frac{N}{2}-cf\) represents how many observations lie within the median class before reaching the middle position. Dividing by \(f\) gives the fractional position within the median class, and multiplying by \(h\) scales this position according to the class width. Adding this to \(l\) gives the estimated position of the median within the class interval, assuming the observations are spread evenly across the class.
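The steps and the formula can be put together in a short Python sketch. The function name `grouped_median` is an assumption, and the sketch assumes contiguous classes of equal width \(h\). It is checked here against Example-4 of these notes, using the values \(x=9\) and \(y=15\) obtained there, for which the median is stated to be 525.

```python
def grouped_median(lower_limits, frequencies, h):
    """Median = l + (N/2 - cf) / f * h for contiguous equal-width classes."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cf + f > half:  # first class whose cumulative frequency exceeds N/2
            return lower + (half - cf) / f * h
        cf += f
    raise ValueError("median class not found")


# Example-4 with x = 9 and y = 15: classes 0-100, 100-200, ..., 900-1000
lower_limits = list(range(0, 1000, 100))
frequencies = [2, 5, 9, 12, 17, 20, 15, 9, 7, 4]
print(grouped_median(lower_limits, frequencies, h=100))  # 525.0
```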
Key Characteristics of the Median
- The median is not affected by extreme values or outliers.
- It is particularly useful for skewed distributions.
- For grouped data, the median is an estimated value based on class intervals.
Relation with Other Measures of Central Tendency
In a moderately symmetrical distribution, the median is related to the mean and mode by the empirical relation: \[\boxed{\;\boldsymbol{\text{Mode}=3\,\text{(Median)}-2\,\text{(Mean)}}\;}\] This relation helps in cross-checking calculations and understanding the overall nature of the distribution.
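For cross-checking, the same relation can be rearranged to estimate whichever measure is missing. These are only algebraic rearrangements of an approximate relation, so the values they give are estimates: \[ \text{Median}=\dfrac{\text{Mode}+2\,\text{(Mean)}}{3} \qquad \text{Mean}=\dfrac{3\,\text{(Median)}-\text{Mode}}{2} \]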
Example-1
The marks obtained by 30 students of Class X of a certain school in a Mathematics paper consisting of 100 marks are presented in the table below. Find the mean of the marks obtained by the students. \[\scriptsize \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline \text{Marks Obtained } x_i&10&20&36&40&50&56&60&70&72&80&88&92&95\\\hline \text{Number of Students }f_i&1&1&3&4&3&2&4&4&1&1&2&3&1\\\hline \end{array} \]
Solution
The mean can be found by \[ \overline{x}=\dfrac{\sum(f_ix_i)}{\sum(f_i)} \]
Let us put \(x_i\), \(f_i\) and \(f_ix_i\) in a table
| Marks Obtained \(x_i\) | Number of Students \(f_i\) | \(f_ix_i\) |
|---|---|---|
| 10 | 1 | 10 |
| 20 | 1 | 20 |
| 36 | 3 | 108 |
| 40 | 4 | 160 |
| 50 | 3 | 150 |
| 56 | 2 | 112 |
| 60 | 4 | 240 |
| 70 | 4 | 280 |
| 72 | 1 | 72 |
| 80 | 1 | 80 |
| 88 | 2 | 176 |
| 92 | 3 | 276 |
| 95 | 1 | 95 |
| Total | \(\sum(f_i)=30\) | \(\sum(f_ix_i)=1779\) |
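Substituting the two totals from the table into the formula completes the calculation:
\[ \overline{x}=\dfrac{\sum(f_ix_i)}{\sum(f_i)}=\dfrac{1779}{30}=59.3 \]
So the mean of the marks obtained is 59.3.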
Let us convert this ungrouped data into grouped data by forming class intervals of width, say, 15.
While allocating frequencies to each class interval, students whose marks equal an upper class limit are counted in the next class, e.g., the 4 students who obtained 40 marks are counted in the class interval 40-55 and not in 25-40.
\[ \begin{array}{|c|c|c|c|c|c|c|} \hline \text{Class Interval }&10-25&25-40&40-55&55-70&70-85&85-100\\\hline \text{Number of Students }&2&3&7&6&6&6\\\hline \end{array} \]
For the first class interval, \[ \begin{aligned} \text{Class Mark }&=\dfrac{\text{Upper Class Limit + Lower Class Limit}}{2}\\\\ &=\dfrac{10+25}{2}\\\\ &=17.5 \end{aligned} \]
Similarly, we can find the class marks of the other class intervals. Let us put all the data in a table
| Class Interval | Number of Students \(f_i\) | Class Mark \(x_i\) \(\left(x_i=\frac{UCL+LCL}{2}\right)\) | \(f_ix_i\) |
|---|---|---|---|
| 10-25 | 2 | 17.5 | 35.0 |
| 25-40 | 3 | 32.5 | 97.5 |
| 40-55 | 7 | 47.5 | 332.5 |
| 55-70 | 6 | 62.5 | 375.0 |
| 70-85 | 6 | 77.5 | 465.0 |
| 85-100 | 6 | 92.5 | 555.0 |
| Total | \(\sum{f_i}=30\) | | \(\sum{f_ix_i}=1860.0\) |
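Using the totals of this table in the direct-method formula, the mean of the grouped data works out as:
\[ \overline{x}=\dfrac{\sum{f_ix_i}}{\sum{f_i}}=\dfrac{1860}{30}=62 \]
This differs from the exact value \(\frac{1779}{30}=59.3\) given by the ungrouped data, because grouping treats every observation in a class as if it were equal to the class mark.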
Let us solve the same question by the assumed mean method.
The assumed mean is usually taken as a class mark near the middle of the \(x_i\) values; in this example we can take either 47.5 or 62.5.
Let the assumed mean \(a\) = 47.5
| Class Interval | Number of Students \(f_i\) | Class Mark \(x_i\) \(\left(x_i=\frac{UCL+LCL}{2}\right)\) | Deviation \(d_i = x_i-47.5\) | \(f_id_i\) |
|---|---|---|---|---|
| 10-25 | 2 | 17.5 | -30 | -60 |
| 25-40 | 3 | 32.5 | -15 | -45 |
| 40-55 | 7 | 47.5 | 0 | 0 |
| 55-70 | 6 | 62.5 | 15 | 90 |
| 70-85 | 6 | 77.5 | 30 | 180 |
| 85-100 | 6 | 92.5 | 45 | 270 |
| Total | \(\sum{f_i}=30\) | | | \(\sum{f_id_i}=435\) |
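With the total frequency 30 and \(\sum{f_id_i}=435\) from this table, the assumed mean method gives:
\[ \overline{x}=a+\dfrac{\sum{f_id_i}}{\sum{f_i}}=47.5+\dfrac{435}{30}=47.5+14.5=62 \]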
Applying Step Deviation Method
Here every deviation \(d_i\) (column 4) is a multiple of the class size \(h=15\), so we divide each deviation by \(h\). Let \[ u_i=\dfrac{x_i-a}{h} \]
Let us form the table with this additional value \(u_i\)
| Class Interval | Number of Students \(f_i\) | Class Mark \(x_i\) \(\left(x_i=\frac{UCL+LCL}{2}\right)\) | \(d_i = x_i-47.5\) | \(u_i=\dfrac{x_i-a}{h}\) | \(f_iu_i\) |
|---|---|---|---|---|---|
| 10-25 | 2 | 17.5 | -30 | -2 | -4 |
| 25-40 | 3 | 32.5 | -15 | -1 | -3 |
| 40-55 | 7 | 47.5 | 0 | 0 | 0 |
| 55-70 | 6 | 62.5 | 15 | 1 | 6 |
| 70-85 | 6 | 77.5 | 30 | 2 | 12 |
| 85-100 | 6 | 92.5 | 45 | 3 | 18 |
| Total | \(\sum{f_i}=30\) | | | | \(\sum{f_iu_i}=29\) |
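With \(a=47.5\), \(h=15\), total frequency 30 and \(\sum{f_iu_i}=29\), the step deviation method gives:
\[ \overline{x}=a+h\left(\dfrac{\sum{f_iu_i}}{\sum{f_i}}\right)=47.5+15\times\dfrac{29}{30}=47.5+14.5=62 \]
This is the same value as \(\frac{1860}{30}\) obtained from the grouped table by the direct method, as expected.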
Example-2
A survey conducted on 20 households in a locality by a group of students resulted in the following frequency table for the number of family members in a household:
\[ \begin{array}{|c|c|c|c|c|c|} \hline \text{Family Size}&1-3&3-5&5-7&7-9&9-11\\\hline \text{Number of Families}&7&8&2&2&1\\\hline \end{array} \]
Find the mode of this data.
Solution
Here, the maximum class frequency is 8, so the modal class corresponding to this frequency is 3-5.
- Modal class = 3-5
- Lower limit of the modal class \(l\) = 3
- Class size \(h\) = 2
- Frequency \(f_1\) of the modal class = 8
- Frequency \(f_0\) of the class preceding the modal class = 7
- Frequency \(f_2\) of the class succeeding the modal class = 2
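Substituting these values into the mode formula gives:
\[ \begin{aligned} \text{Mode}&=l+\left(\dfrac{f_1-f_0}{2f_1-f_0-f_2}\right)h\\ &=3+\left(\dfrac{8-7}{2\times 8-7-2}\right)\times 2\\ &=3+\dfrac{2}{7}\\ &\approx 3.286 \end{aligned} \]
So the mode of the data is approximately 3.286.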
Example-3
A survey regarding the heights (in cm) of 51 girls of Class X of a school was conducted and the data below was obtained. Find the median height.
| Height (in cm) | Number of Girls |
|---|---|
| Less than 140 | 4 |
| Less than 145 | 11 |
| Less than 150 | 29 |
| Less than 155 | 40 |
| Less than 160 | 46 |
| Less than 165 | 51 |
Solution
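To calculate the median height, we need the class intervals and their corresponding frequencies. The given table is cumulative, so the frequency of each class is the difference between consecutive cumulative frequencies, e.g., \(11-4=7\) for the class 140-145.
| Class Interval | Frequency | Cumulative Frequency |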
|---|---|---|
| Less than 140 | 4 | 4 |
| 140-145 | 7 | 11 |
| 145-150 | 18 | 29 |
| 150-155 | 11 | 40 |
| 155-160 | 6 | 46 |
| 160-165 | 5 | 51 |
\(N=51\), therefore \(\frac{N}{2}=25.5\). This observation lies in the class 145-150, which is the median class.
- \(l\) (lower limit) = 145
- \(cf\) (the cumulative frequency of the class preceding 145-150) = 11
- \(f\) (the frequency of the median class 145-150) = 18
- \(h\) (class size) = 5
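Substituting these values into the median formula gives:
\[ \begin{aligned} \text{Median}&=l+\left(\dfrac{\frac{N}{2}-cf}{f}\right)h\\ &=145+\left(\dfrac{25.5-11}{18}\right)\times 5\\ &=145+\dfrac{72.5}{18}\\ &\approx 149.03 \end{aligned} \]
So the median height is about 149.03 cm: roughly half of the girls are shorter than this and half are taller.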
Example-4
The median of the following data is 525. Find the values of x and y, if the total frequency is 100.
| Class Interval | Frequency |
|---|---|
| 0-100 | 2 |
| 100-200 | 5 |
| 200-300 | \(x\) |
| 300-400 | 12 |
| 400-500 | 17 |
| 500-600 | 20 |
| 600-700 | \(y\) |
| 700-800 | 9 |
| 800-900 | 7 |
| 900-1000 | 4 |
Solution
| Class Interval | Frequency | Cumulative Frequency |
|---|---|---|
| 0-100 | 2 | \(2\) |
| 100-200 | 5 | \(7\) |
| 200-300 | \(x\) | \(7+x\) |
| 300-400 | 12 | \(19+x\) |
| 400-500 | 17 | \(36+x\) |
| 500-600 | 20 | \(56+x\) |
| 600-700 | \(y\) | \(56+x+y\) |
| 700-800 | 9 | \(65+x+y\) |
| 800-900 | 7 | \(72+x+y\) |
| 900-1000 | 4 | \(76+x+y\) |
It is given that \(N=100\)
\[ \begin{align} 76+x+y&=100\\ x+y&=100-76\\ x+y&=24\tag{1} \end{align} \]
Since the median is 525, the median class is 500-600, so
- \(l=500\)
- \(N=100\)
- \(cf=36+x\)
- \(f=20\)
- \(h=100\)
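Substituting these values and the given median 525 into the median formula gives the value of \(x\):
\[ \begin{aligned} 525&=500+\left(\dfrac{\frac{100}{2}-(36+x)}{20}\right)\times 100\\ 25&=(14-x)\times 5\\ 5&=14-x\\ x&=9 \end{aligned} \]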
Substituting \(x=9\) in equation-(1)
\[ \begin{aligned} x+y&=24\\ 9+y&=24\\ y&=24-9\\ &=15 \end{aligned} \]