How is the median calculated? How did the concept of mean come about? The meaning of the word median

The MEDIAN function in Excel analyzes a range of numeric values and returns the number in the middle of the set (the median). In other words, the function conditionally divides the set into two subsets: the first contains the numbers smaller than the median, the second the larger ones. The median is one of several measures of the central tendency of a range.

Examples of using the MEDIAN function in Excel

When studying the age groups of students, data from a randomly selected group of students in the university were used. The task is to determine the median age of students.

Initial data:

Formula for calculation:


Argument description:

  • B3:B15 - the range of the studied ages.

Result:

That is, the median age is 21: the group contains students both younger and older than this value.



Comparing the MEDIAN and AVERAGE functions to calculate the average value

During the evening round in the hospital, each patient's body temperature was measured. Demonstrate the feasibility of using the median parameter instead of the mean value to explore a series of obtained values.

Initial data:

Formula for finding the average value:

Formula for finding the median:

The mean suggests that the patients' temperature is, on average, above normal, but this is misleading. The median shows that at least half of the patients have a normal body temperature, not exceeding 36.6.

Attention! Another measure of central tendency is the mode (the most common value in the range under study). To find the mode in Excel, use the MODE function. Note that in this example the median and the mode are the same:

That is, the median, which divides the set into subsets of smaller and larger values, is also the most frequently occurring value in the set. As you can see, most patients have a temperature of 36.6.

An example of calculating the median in statistical analysis in Excel

Example 3. There are 3 salespeople working in a store. Based on the results of the last 10 days, it is necessary to determine the employee to whom the bonus will be issued. When choosing the best worker, the degree of efficiency of his work is taken into account, and not the number of goods sold.

Source data table:


To characterize the efficiency, we will use three indicators at once: the mean, the median, and the mode. Let's calculate them for each employee using the AVERAGE, MEDIAN and MODE functions, respectively:


To assess the scatter of the data, we use the sum of the absolute differences between the mean and the median and between the mean and the mode, that is, the coefficient x = |av - med| + |av - mod|, where:

  • av is the mean value;
  • med is the median;
  • mod is the mode.

Calculate the value of the coefficient x for the first seller:

Similarly, we will carry out calculations for other sellers. Results:


Let's define the seller to whom the bonus will be given:

Note: The SMALL function returns the first minimum value from the considered range of x-factor values.


The coefficient x is a quantitative characteristic of the stability of the sellers' work, introduced by the store economist. With its help, it was possible to find the range with the smallest deviations of values. This method demonstrates how three measures of central tendency can be used together to obtain more reliable results.
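The calculation of the coefficient x can be sketched outside Excel as well. The following Python snippet is only an illustration: the sales figures and seller names are hypothetical (the source table is not reproduced in the text), and `stability` is an assumed helper name.

```python
from statistics import mean, median, mode

# Hypothetical units sold per day over 10 days (NOT the data from the source table).
sales = {
    "Seller 1": [10, 12, 12, 11, 12, 13, 12, 11, 12, 12],
    "Seller 2": [5, 20, 7, 18, 7, 25, 7, 9, 30, 7],
    "Seller 3": [14, 9, 14, 16, 8, 14, 15, 10, 14, 13],
}

def stability(values):
    """Coefficient x = |av - med| + |av - mod| as defined in the text."""
    av, med, mod = mean(values), median(values), mode(values)
    return abs(av - med) + abs(av - mod)

# Smaller x means the three measures of central tendency agree more closely.
scores = {name: stability(values) for name, values in sales.items()}

# Counterpart of picking the smallest x with SMALL(range; 1) in Excel.
best = min(scores, key=scores.get)
print(scores, best)
```

With these made-up figures the seller whose daily results fluctuate least gets the smallest coefficient, which is the behaviour the economist's rule relies on.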

Features of using the MEDIAN function in Excel

The function has the following syntax:

MEDIAN(number1, [number2],...)

Description of arguments:

  • number1 is a required argument that specifies the first numeric value of the range under study;
  • [number2] is an optional argument (as are the subsequent ones, up to 255 arguments in total) that specifies the second and subsequent values of the range under study.

Notes 1:

  1. When calculating, it is more convenient to pass the entire range of values at once instead of entering the arguments one by one.
  2. Arguments can be numbers, names containing numbers, references, and arrays (for example, =MEDIAN({1;2;3;5;7;10})).
  3. Logical values TRUE and FALSE entered directly in the argument list are taken into account and interpreted as the numeric values 1 and 0, respectively. For example, the result of the function with the arguments (TRUE; FALSE) is the same as with the arguments (1; 0) and equals 0.5. In a range or array passed as an argument, empty cells, logical values and text are ignored, while cells containing zero are included.
  4. If one or more arguments are text values that cannot be converted to numbers, or contain error codes, the function returns the #VALUE! error code.
  5. Other Excel functions can also be used to determine the median of a sample: PERCENTILE.INC, QUARTILE.INC, LARGE. Examples of use:
  • =PERCENTILE.INC(A1:A10,0.5) because by definition the median is the 50th percentile.
  • =QUARTILE.INC(A1:A10,2) because the median is the 2nd quartile.
  • =LARGE(A1:A9;COUNT(A1:A9)/2), but only if the number of numbers in the range is an odd number.
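The equivalence between the median, the 50th percentile and the 2nd quartile can be checked with a small sketch. It uses Python's standard `statistics` module rather than Excel; the assumption here is that `method="inclusive"` interpolates in the same way as the Excel *.INC functions, which should be verified for any real workbook.

```python
from statistics import median, quantiles

# Values taken from the array example in the notes above.
data = [1, 2, 3, 5, 7, 10]

# Splitting into 4 groups yields three cut points: Q1, Q2 (the median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Splitting into 100 groups yields 99 percentiles; index 49 is the 50th.
p50 = quantiles(data, n=100, method="inclusive")[49]

print(median(data), q2, p50)
```

All three printed values coincide for this sample, which is the property the PERCENTILE.INC and QUARTILE.INC formulas rely on.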

Notes 2:

  1. If all numbers in the range are distributed symmetrically about the mean, the arithmetic mean and the median of the range are equal.
  2. When the range contains large deviations (a wide "scatter" of values), the median reflects the central tendency of the distribution better than the arithmetic mean. A good example is using the median to determine the real level of salaries in a state where officials earn an order of magnitude more than ordinary citizens.
  3. The range of values may contain:
  • An odd number of numbers. In this case, the median is a single number that divides the range into two subsets of larger and smaller values, respectively;
  • An even number of numbers. Then the median is calculated as the arithmetic mean of the two middle values that divide the set into the two subsets indicated above.
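The odd and even cases described above can be written out explicitly. This is a minimal Python sketch of the rule, not the way Excel implements MEDIAN internally; `median_of` is a name chosen for illustration.

```python
def median_of(values):
    """Return the median using the odd/even rule described in the notes."""
    ordered = sorted(values)          # the rule applies to the ordered set
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 5]))       # odd number of values
print(median_of([1, 3, 5, 7]))    # even number of values
```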


Median of a series of numbers


"MEDIAN OF THE ORDERED SERIES"



Summary of the lesson of algebra in grade 7

Theme of the lesson: "MEDIAN OF THE ORDERED SERIES".

teacher of the Lake School branch of MKOU Burkovskaya secondary school Eremenko Tatyana Alekseevna
Goals:
to introduce the concept of the median as a statistical characteristic of an ordered series; to develop the ability to find the median of ordered series with an even and an odd number of members; to develop the ability to interpret the value of the median depending on the practical situation; to consolidate the concept of the arithmetic mean of a set of numbers. To develop independent work skills and build an interest in mathematics.
Lesson procedure

I. Oral work.
The following series are given: 1) 4; 1; 8; 5; 1; 2) ; 9; 3; 0.5; ; 3) 6; 0.2; ; 4; 6; 7.3; 6. Find: a) the largest and smallest values of each series; b) the range of each series; c) the mode of each series.
II. Explanation of new material.
Work with the textbook. 1. Consider the problem from paragraph 10 of the textbook. What does an ordered series mean? I emphasize that the data series must always be sorted before finding the median. 2. On the board, we get acquainted with the rules for finding the median of series with an even and an odd number of members:
The median of an ordered series of numbers with an odd number of members is the number written in the middle, and the median of an ordered series of numbers with an even number of members is the arithmetic mean of the two numbers written in the middle.
The median of an arbitrary series is the median of the corresponding ordered series.
I note that the arithmetic mean, mode and median characterize the data obtained from observations in different ways.

III. Formation of skills and abilities.
Group 1. Exercises on applying the rules for finding the median of an ordered and an unordered series.
1. № 186.
Solution: a) number of members of the series n = 9; median Me = 41; b) n = 7, the series is ordered, Me = 207; c) n = 6, the series is ordered, Me = (20 + 22) : 2 = 21; d) n = 8, the series is ordered, Me = (2.6 + 3.2) : 2 = 2.9. Answer: a) 41; b) 207; c) 21; d) 2.9. Students comment on how the median is found.
2. Find the arithmetic mean and median of a series of numbers: a) 27, 29, 23, 31, 21, 34; b) 56, 58, 64, 66, 62, 74; c) 1/3; 1/6; 2/3; 5/6; 1.
Solution: To find the median, each series must first be sorted:
a) 21, 23, 27, 29, 31, 34. n = 6; X = (21 + 23 + 27 + 29 + 31 + 34) : 6 = 165 : 6 = 27.5; Me = (27 + 29) : 2 = 28;
b) 56, 58, 62, 64, 66, 74. n = 6; X = (56 + 58 + 62 + 64 + 66 + 74) : 6 = 380 : 6 ≈ 63.3; Me = (62 + 64) : 2 = 63;
c) 1/6; 1/3; 2/3; 5/6; 1. n = 5; X = (1/6 + 1/3 + 2/3 + 5/6 + 1) : 5 = 3 : 5 = 0.6; Me = 2/3.
3. № 188 (orally). Answer: a) yes; b) no; c) no; d) yes.
4. Knowing that an ordered series contains t numbers, where t is an odd number, indicate the number of the term that is the median if t is equal to: a) 5; b) 17; c) 47; d) 201. Answer: a) 3; b) 9; c) 24; d) 101.
Group 2. Practical tasks on finding the median of the corresponding series and interpreting the result.
1. № 189.
Solution: Number of members of the series n = 12. To find the median, the series must be ordered: 136, 149, 156, 158, 168, 174, 178, 179, 185, 185, 185, 194. Median of the series Me = (x6 + x7) : 2 = (174 + 178) : 2 = 176. Monthly output was more than the median for the following members of the artel: 1) Kvitko; 2) Baranov; 3) Antonov; 4) Bobkov; 5) Rylov; 6) Astafiev. Answer: 176.
2. № 192.
Solution: Let's order the data series: 30, 31, 32, 32, 32, 32, 32, 32, 33, 35, 35, 36, 36, 36, 38, 38, 38, 40, 40, 42; number of members of the series n = 20. Range A = x max - x min = 42 - 30 = 12. Mode Mo = 32 (this value occurs 6 times, more often than the others). Median Me = (x10 + x11) : 2 = (35 + 35) : 2 = 35. In this case, the range shows the greatest spread in the time needed to process the part; the mode shows the most typical processing time; the median is the processing time that half of the turners did not exceed. Answer: 12; 32; 35.
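The answer to № 192 can be double-checked with a few lines of Python. This is only a verification sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import median, mode

# Part-processing times from exercise 192 (the order does not matter here).
times = [30, 31, 32, 32, 32, 32, 32, 32, 33, 35,
         35, 36, 36, 36, 38, 38, 38, 40, 40, 42]

data_range = max(times) - min(times)   # greatest spread of processing time
most_common = mode(times)              # most typical processing time
middle = median(times)                 # mean of the 10th and 11th ordered values

print(data_range, most_common, middle)
```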
IV. Summary of the lesson.
What is the median of a series of numbers? Can the median of a series of numbers differ from every number in the series? Which number is the median of an ordered series containing 2n numbers? 2n - 1 numbers? How do you find the median of an unordered series?
Homework:
№ 187, № 190, № 191, № 254.


Mode and median

The mean values ​​also include the mode and the median.

The median and mode are often used as an average characteristic in those populations where the calculation of the average (arithmetic, harmonic, etc.) is impossible or impractical.

For example, a sample survey of 12 commercial currency exchange offices in the city of Omsk recorded different selling prices for the dollar (data as of October 10, 1995, when the dollar exchange rate was 4493 rubles).

Because the researcher has no data on the volume of sales at each exchange office, calculating the arithmetic mean to determine the average price per dollar is inappropriate. However, it is possible to determine the value of the attribute called the median (Me). The median lies in the middle of the ranked series and divides it in half.

The calculation of the median for ungrouped data is done as follows:

a) arrange the individual values ​​of the feature in ascending order:

4500 4500 4535 4540 4550 4560 4560 4560 4560 4570 4570 4570

b) determine the serial number of the median by the formula:

$$N_{Me} = \frac{n + 1}{2} = \frac{12 + 1}{2} = 6.5$$

In our example, this means that the median lies between the sixth and seventh values of the attribute in the ranked series, since the series has an even number of individual values. Thus, Me is equal to the arithmetic mean of the two neighboring values, 4560 and 4560, that is, Me = 4560.

c) consider the procedure for calculating the median in the case of an odd number of individual values.

Suppose we observe not 12 but 11 currency exchange offices; then the ranked series looks like this (we discard the 12th office):

4500 4500 4535 4540 4550 4560 4560 4560 4560 4570 4570

Median number: $N_{Me} = \frac{11 + 1}{2} = 6$;

the sixth place is occupied by the value 4560, which is the median: Me = 4560. The same number of offices lies on either side of it.

The mode is the value of the attribute that occurs most often among the units of the population. It corresponds to a specific value of the attribute.

In our case, the modal price per dollar can be called 4560 rubles: this value is repeated 4 times, more often than all others.
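The steps for ungrouped data can be traced in a short Python sketch. It assumes the usual rule for the serial number of the median, (n + 1) / 2, and uses the ranked prices exactly as listed above; `median_position` is a hypothetical helper name. Note that in the listed series the sixth and seventh values are both 4560.

```python
from statistics import median, mode

# Ranked selling prices from the 12 exchange offices listed in the text.
prices = [4500, 4500, 4535, 4540, 4550, 4560,
          4560, 4560, 4560, 4570, 4570, 4570]

def median_position(n):
    """Serial number of the median, assuming the rule (n + 1) / 2."""
    return (n + 1) / 2

pos_12 = median_position(len(prices))      # fractional: between two neighbours
me_12 = median(prices)                     # mean of the 6th and 7th values
pos_11 = median_position(len(prices) - 1)  # whole number: a single middle value
me_11 = median(prices[:-1])                # the 12th office is discarded
mo = mode(prices)                          # most frequent price

print(pos_12, me_12, pos_11, me_11, mo)
```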

In practice, the mode and median are usually found from grouped data. As a result of the grouping, a series of distribution of banks according to the amount of profit received for the year was obtained (Table 3.6.).

Table 3.6.

Grouping of banks by the amount of profit received for the year

To determine the median, it is necessary to calculate the cumulative frequencies. The accumulation continues until the cumulative sum of frequencies exceeds half the sum of all frequencies. In our example, the cumulative frequency (12) exceeds half of all values (20 : 2 = 10). This value corresponds to the median interval (5.5 - 6.4), which contains the median. Let's determine its value by the formula:

$$Me = x_0 + i \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$$

where $x_0$ is the initial value (lower bound) of the interval containing the median;

$i$ is the width of the median interval;

$\sum f$ is the sum of the frequencies of the series;

$S_{Me-1}$ is the sum of the cumulative frequencies preceding the median interval;

$f_{Me}$ is the frequency of the median interval.

Thus, 50% of banks have a profit of up to 6.1 million rubles, and 50% of banks have more than 6.1 million rubles.
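The interpolation for the median of grouped data can be sketched as follows. The formula used is the standard one, Me = x0 + i * (sum(f)/2 - S_prev) / f_Me. The frequencies below are an assumption, since Table 3.6 is not reproduced here: they are chosen only so that they agree with the totals mentioned in the text (20 banks, cumulative frequency 12 at the interval 5.5 - 6.4).

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of an interval series with equal interval widths."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("empty frequency table")
    half = total / 2
    cumulative = 0                      # frequencies accumulated so far
    for x0, f in zip(lower_bounds, freqs):
        if cumulative + f > half:       # cumulative sum exceeds half: median interval
            return x0 + width * (half - cumulative) / f
        cumulative += f
    return lower_bounds[-1] + width     # not reached when total > 0

# Interval lower bounds from the text; frequencies are HYPOTHETICAL.
bounds = [3.7, 4.6, 5.5, 6.4, 7.3]
freqs = [2, 4, 6, 5, 3]

me = grouped_median(bounds, 0.9, freqs)
print(round(me, 2))
```

With these assumed frequencies the result agrees with the value of 6.1 million rubles stated in the text.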

The highest frequency also corresponds to the interval 5.5 - 6.4, i.e. the mode must lie in this interval. Its value is determined by the formula:

$$Mo = x_0 + i \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_0$ is the initial value (lower bound) of the interval containing the mode;

$i$ is the width of the modal interval;

$f_{Mo}$ is the frequency of the modal interval;

$f_{Mo-1}$ is the frequency of the interval preceding the modal one;

$f_{Mo+1}$ is the frequency of the interval following the modal one.

The given mode formula can be used in variation series with equal intervals.

Thus, in this population, the most common profit is 6.10 million rubles.
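The mode of an interval series with equal intervals can be sketched in the same way. The formula used is the standard one, Mo = x0 + i * (f_Mo - f_prev) / ((f_Mo - f_prev) + (f_Mo - f_next)). As before, the frequencies are an assumption standing in for Table 3.6, which is not reproduced in the text.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of an interval series with equal interval widths."""
    k = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal interval
    f_mo = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0                # interval before the modal one
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0   # interval after the modal one
    denominator = (f_mo - f_prev) + (f_mo - f_next)
    if denominator == 0:
        raise ValueError("the mode is not defined for a flat distribution")
    return lower_bounds[k] + width * (f_mo - f_prev) / denominator

# Interval lower bounds from the text; frequencies are HYPOTHETICAL.
bounds = [3.7, 4.6, 5.5, 6.4, 7.3]
freqs = [2, 4, 6, 5, 3]

mo = grouped_mode(bounds, 0.9, freqs)
print(round(mo, 2))
```

With these assumed frequencies the result agrees with the 6.10 million rubles given in the text.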

The median and the mode can also be determined graphically. The median is determined from the cumulate (Fig. 3.1). To construct it, the cumulative frequencies (or relative frequencies) must be calculated. Cumulative frequencies show how many units of the population have attribute values no greater than the value considered; they are found by successive summation of the interval frequencies. When constructing the cumulate of an interval distribution series, the lower boundary of the first interval corresponds to a frequency of zero, and its upper boundary corresponds to the entire frequency of that interval. The upper boundary of the second interval corresponds to the cumulative frequency equal to the sum of the frequencies of the first two intervals, and so on.

Let's build the cumulative curve from the data of Table 3.6 on the distribution of banks by profit.

Fig. 3.1. The cumulative distribution of banks by profit: x is the amount of profit, million rubles (intervals 3.7-4.6, 4.6-5.5, 5.5-6.4, 6.4-7.3, 7.3-8.2); S is the accumulated frequencies.

To determine the median, the height of the largest ordinate, which corresponds to the total population, is divided in half. A straight line is drawn through the obtained point, parallel to the abscissa axis, until it intersects with the cumulate. The abscissa of the intersection point is the median.

The mode is determined from the histogram of the distribution. The histogram is constructed as follows:

equal segments are plotted on the abscissa axis which, on the chosen scale, correspond to the width of the intervals of the variation series. Rectangles are built on the segments, with areas proportional to the frequencies (or relative frequencies) of the intervals.

Fig. 3.2 shows a histogram of the distribution of banks by profit (according to Table 3.6).

Fig. 3.2. Distribution of commercial banks by profit: x is the amount of profit, million rubles (intervals 3.7-4.6, 4.6-5.5, 5.5-6.4, 6.4-7.3, 7.3-8.2); f is the number of banks.

To determine the mode, we connect the right vertex of the modal rectangle with the upper right corner of the previous rectangle, and the left vertex of the modal rectangle with the upper left corner of the next rectangle. The abscissa of the point of intersection of these lines is the mode of the distribution.

Median (statistic)

Median (statistic), in mathematical statistics, a number that characterizes a sample (for example, a set of numbers). If all the elements in the sample are different, then the median is the number of the sample such that exactly half of the elements in the sample are greater than it and the other half are less than it. In a more general case, the median can be found by ordering the elements of the sample in ascending or descending order and taking the middle element. For example, the sample (11, 9, 3, 5, 5) after ordering turns into (3, 5, 5, 9, 11) and its median is the number 5. If the sample has an even number of elements, the median may not be uniquely determined: for numerical data, the half-sum of two adjacent values ​​is most often used (that is, the median of the set (1, 3, 5, 7) is taken equal to 4).

In other words, the median in statistics is the value that divides the series in half in such a way that on both sides of it (up or down) the same number of units of the given population is located.


Because of this property, this indicator has several other names: the 50th percentile or the 0.5 quantile.


The median is used instead of the arithmetic mean when the extreme variants of the ranked series (smallest and largest) in comparison with the rest turn out to be excessively large or excessively small.

The MEDIAN function measures central tendency, that is, the center of a set of numbers in a statistical distribution. There are three most common measures of central tendency:

  • Mean: the arithmetic mean, calculated by adding a set of numbers and dividing the resulting sum by their count.
    For example, the mean of the numbers 2, 3, 3, 5, 7, and 10 is 5, which is the result of dividing their sum, 30, by their count, 6.
  • Median: the number in the middle of a set of numbers; half of the numbers are greater than the median and half are smaller.
    For example, the median of the numbers 2, 3, 3, 5, 7, and 10 is 4.
  • Mode: the number that occurs most frequently in the given set of numbers.
    For example, the mode of the numbers 2, 3, 3, 5, 7, and 10 is 3.
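The three examples above can be reproduced with Python's standard `statistics` module. This is just a cross-check of the arithmetic, not a description of how Excel computes these values.

```python
from statistics import mean, median, mode

numbers = [2, 3, 3, 5, 7, 10]

average = mean(numbers)    # sum 30 divided by count 6
middle = median(numbers)   # mean of the two middle values, 3 and 5
common = mode(numbers)     # the value that occurs most often

print(average, middle, common)
```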

Algebra lesson in 7th grade.

Topic "Median as a statistical characteristic".

Teacher Egorova N.I.

The purpose of the lesson: to form students' understanding of the median of a set of numbers and the ability to calculate it for simple numerical sets, and to consolidate the concept of the arithmetic mean of a set of numbers.

Lesson type: explanation of new material.

Lesson procedure

1. Organizational moment.

Inform the topic of the lesson and formulate its objectives.

2. Actualization of previous knowledge.

Questions for students:

What is the arithmetic mean of a set of numbers?

Where is the arithmetic mean located within a set of numbers?

What characterizes the arithmetic mean of a set of numbers?

Where is the arithmetic mean of a set of numbers often used?

Oral tasks:

Find the arithmetic mean of a set of numbers:

Checking homework.

Textbook: No. 169, No. 172.

3. Learning new material.

In the previous lesson, we got acquainted with such a statistical characteristic as the arithmetic mean of a set of numbers. Today we will devote a lesson to another statistical characteristic - the median.

Not only the arithmetic mean shows where on the number line the numbers of any set are located and where their center is. Another indicator is the median.

The median of a set of numbers is the number that divides the set into two equal parts. Instead of "median" one could say "middle".

First, using examples, we will analyze how to find the median, and then we will give a strict definition.

Consider the following verbal example using a projector

At the end of the school year, 11 students of the 7th grade passed the standard for running 100 meters. The following results were recorded:

After the guys ran the distance, Petya approached the teacher and asked what his result was.

“The most average result: 16.9 seconds,” the teacher replied.

“Why?” Petya was surprised. “After all, the arithmetic mean of all the results is about 18.3 seconds, and I ran more than a second better. And in general, Katya's result (18.4) is much closer to the average than mine.”

“Your result is average because five people ran better than you and five ran worse. So you are right in the middle,” the teacher said.

Write an algorithm for finding the median of a set of numbers:

Order the numerical set (compose a ranked series).

Cross out the largest and the smallest numbers of the set at the same time, and repeat until one or two numbers remain.

If there is only one number, then it is the median.

If there are two numbers left, then the median will be the arithmetic mean of the two remaining numbers.
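The crossing-out algorithm above translates almost line by line into code. The following Python function is a minimal sketch for illustration; `median_by_crossing_out` is a name invented here.

```python
def median_by_crossing_out(numbers):
    """Find the median by repeatedly removing the smallest and largest values."""
    remaining = sorted(numbers)          # step 1: build the ranked series
    if not remaining:
        raise ValueError("the median of an empty set is undefined")
    while len(remaining) > 2:            # step 2: cross out both extremes at once
        remaining = remaining[1:-1]
    if len(remaining) == 1:              # step 3: a single number is the median
        return remaining[0]
    # step 4: two numbers left, take their arithmetic mean
    return (remaining[0] + remaining[1]) / 2

print(median_by_crossing_out([2, 4, 8, 9]))
print(median_by_crossing_out([25, 25, 27, 28, 29, 40, 50]))
```

The two calls use sets from the exercises later in the lesson, so the printed values can be compared with the answers worked out by hand.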

Invite students to independently formulate the definition of the median of a set of numbers, then read the definition of the median in the textbook (p. 40), then solve No. 186 (a, b), No. 187 (a) of the textbook (p. 41).

Comment:

Draw students' attention to an important circumstance: the median is practically insensitive to significant deviations of individual extreme values ​​of sets of numbers. In statistics, this property is called stability. The stability of a statistical indicator is a very important property, it insures us against random errors and individual unreliable data.

4. Consolidation of the studied material.

Problem solving.

Denote x-arithmetic mean, Me-median.

Set of numbers: 1,3,5,7,9.

x=(1+3+5+7+9):5=25:5=5,

Set of numbers: 1,3,5,7,14.

x=(1+3+5+7+14):5=30:5=6.

a) Set of numbers: 3,4,11,17,21

b) Set of numbers: 17,18,19,25,28

c) Set of numbers: 25, 25, 27, 28, 29, 40, 50

Conclusion: the median of a set of numbers consisting of an odd number of members is equal to the number in the middle.

a) A set of numbers: 2, 4, 8, 9.

Me = (4+8):2=12:2=6

b) A set of numbers: 1,3,5,7,8,9.

Me = (5+7):2=12:2=6

The median of a set of numbers containing an even number of members is half the sum of the two numbers in the middle.

The student received the following grades in algebra during the quarter:

5, 4, 2, 5, 5, 4, 4, 5, 5, 5.

Find the mean score and median of this set.

Let's find the average score, that is, the arithmetic mean:

x= (5+4+2+5+5+4+4+5+5+5): 10=44:10 = 4.4

Find the median of this set of numbers:

Let's order a set of numbers: 2,4,4,4,5,5,5,5,5,5

There are 10 numbers in total, so to find the median we take the two middle numbers and find half of their sum.

Me = (5+5):2 = 5

Question to students: If you were a teacher, what grade would you give this student for a quarter? Justify the answer.

The president of a company receives a salary of 300,000 rubles, three of his deputies receive 150,000 rubles each, forty employees receive 50,000 rubles each, and the cleaner's salary is 10,000 rubles. Find the arithmetic mean and the median of salaries in the company. Which of these characteristics is more advantageous for the president to use for advertising purposes?

x = (300000 + 3 · 150000 + 40 · 50000 + 10000) : (1 + 3 + 40 + 1) = 2760000 : 45 ≈ 61333.33 (rubles)
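The salary problem is easy to check numerically. The sketch below builds the full list of 45 salaries from the problem statement and computes both characteristics with Python's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# 1 president, 3 deputies, 40 employees, 1 cleaner (45 salaries in total).
salaries = [300_000] + [150_000] * 3 + [50_000] * 40 + [10_000]

average = mean(salaries)    # pulled upward by the few large salaries
middle = median(salaries)   # the 23rd of the 45 ordered salaries

print(round(average, 2), middle)
```

The mean comes out noticeably higher than the median, because 40 of the 45 people earn 50,000 rubles and the median ignores how large the top salaries are.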

No. 6. Orally.

A) How many numbers are in the set if its median is its ninth member?

B) How many numbers are in the set if its median is the arithmetic mean of the 7th and 8th terms?

C) In a set of seven numbers, the largest number was increased by 14. Will this change both the arithmetic mean and the median?

D) Each of the numbers in the set has been increased by 3. What will happen to the arithmetic mean and median?

Sweets in the store are sold by weight. To find out how many sweets are contained in one kilogram, Masha decided to find the weight of one candy. She weighed several candies and got the following results:

12, 13, 14, 12, 15, 16, 14, 13, 11.

Both characteristics are suitable for estimating the weight of one candy, since they are not very different from each other.

So, the arithmetic mean and the median are used to characterize statistical information. In many cases, one of these characteristics may carry no meaningful interpretation (for example, given data on the times of road traffic accidents, it hardly makes sense to talk about the arithmetic mean of these data).

Homework: paragraph 10, No. 186 (c, d), No. 190.

5. The results of the lesson. Reflection.


TEST

On the topic: "Mode. Median. Methods for calculating them"


Introduction

Mean values ​​and related indicators of variation play a very important role in statistics, which is due to the subject of its study. Therefore, this topic is one of the central in the course.

The average is a very common generalizing indicator in statistics. This is explained by the fact that only with the help of the average it is possible to characterize the population according to a quantitatively varying attribute. An average value in statistics is a generalizing characteristic of a set of phenomena of the same type according to some quantitatively varying attribute. The average shows the level of this attribute, related to the unit of the population.

Studying social phenomena and seeking to identify their characteristic, typical features in specific conditions of place and time, statisticians make extensive use of average values. With the help of averages, different populations can be compared with each other according to varying characteristics.

Averages used in statistics belong to the class of power means. Of the power means, the arithmetic mean is used most often and the harmonic mean less often; the geometric mean is used only when calculating average rates of dynamics, and the quadratic mean only when calculating indicators of variation.

The arithmetic mean is the quotient of dividing the sum of the variants by their number. It is used in cases where the volume of a varying attribute for the entire population is formed as the sum of the attribute values of its individual units. The arithmetic mean is the most common type of average, since it corresponds to the nature of social phenomena, where the volume of a varying attribute in the population is most often formed precisely as the sum of the attribute values of the individual units.

According to its defining property, the harmonic mean should be used when the total volume of the attribute is formed as the sum of the reciprocal values of the variants. It is used when, given the material available, the weights have to be divided by the variants rather than multiplied by them or, which is the same, multiplied by their reciprocals. In these cases the harmonic mean is the reciprocal of the arithmetic mean of the reciprocal values of the attribute.

The harmonic mean should be used in those cases when not the units of the population - the carriers of the attribute, but the products of these units and the value of the attribute are used as weights.
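For reference, the two forms of the harmonic mean described above can be written out as follows. The notation is an assumption introduced here for illustration: $x_i$ are the variants, $n$ is their number, and $w_i = x_i f_i$ are weights equal to the product of the number of units and the attribute value.

```latex
% Simple harmonic mean: reciprocal of the arithmetic mean of the reciprocals
\bar{x}_{harm} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}

% Weighted harmonic mean, with weights w_i = x_i f_i
\bar{x}_{harm} = \frac{\sum_{i} w_i}{\sum_{i} \frac{w_i}{x_i}}
```

Substituting $w_i = x_i f_i$ into the second formula gives $\sum x_i f_i / \sum f_i$, the weighted arithmetic mean, which is why the two averages agree when the same data are available in either form.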


1. Definition of mode and median in statistics

The arithmetic and harmonic means are the generalizing characteristics of the population according to one or another varying attribute. Auxiliary descriptive characteristics of the distribution of a variable attribute are the mode and the median.

In statistics, the mode is the value of an attribute (variant) that occurs most often in a given population. In a variation series, this is the variant with the highest frequency.

The median in statistics is the variant located in the middle of the variation series. The median divides the series in half: on both sides of it (above and below) there is the same number of population units.

The mode and the median, in contrast to power means, are specific characteristics: their value is a particular variant of the variation series.

The mode is used when it is necessary to characterize the most frequently occurring value of an attribute. If it is necessary, for example, to find out the most common wage at an enterprise, the market price at which the largest number of goods were sold, the shoe size most in demand among consumers, and so on, the mode is used.

The median is of interest because it shows the quantitative limit of the value of the varying attribute that has been reached by half of the members of the population. Suppose the average salary of bank employees is 650,000 rubles per month. This characteristic can be supplemented by saying that half of the employees received a salary of 700,000 rubles or more, that is, by giving the median. The mode and the median are typical characteristics when the populations are homogeneous and large.


2. Finding the Mode and Median in a Discrete Variation Series

Finding the mode and median in a variation series whose attribute values are given as specific numbers is not difficult. Consider Table 1, which shows the distribution of families by number of children.

Table 1. Distribution of families by number of children

Obviously, in this example the mode is a family with two children, since this variant corresponds to the largest number of families. There may be distributions in which all variants are equally frequent; in that case there is no mode or, in other words, all variants can be said to be equally modal. In other cases, two variants rather than one may share the highest frequency. Then there are two modes and the distribution is bimodal. Bimodal distributions may indicate qualitative heterogeneity of the population with respect to the attribute under study.
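The cases above (one mode, two modes, all variants equally frequent) can be sketched in Python as follows. The frequency table is hypothetical: it is only chosen so that two children is the most frequent variant and the total is 185 families; `modes_from_frequencies` is a helper name introduced for this illustration.

```python
def modes_from_frequencies(frequencies):
    """Return every variant that shares the highest frequency.

    One element means a unimodal series, two mean a bimodal one;
    if all variants are equally frequent, all of them are returned.
    """
    if not frequencies:
        raise ValueError("the frequency table is empty")
    top = max(frequencies.values())
    return [variant for variant, f in frequencies.items() if f == top]


# Hypothetical distribution: number of children -> number of families.
families = {0: 10, 1: 30, 2: 75, 3: 45, 4: 20, 5: 5}
print(modes_from_frequencies(families))

# Hypothetical bimodal distribution: two variants share the top frequency.
bimodal = {36: 50, 37: 50, 38: 20}
print(modes_from_frequencies(bimodal))
```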

To find the median in a discrete variation series, divide the sum of frequencies in half and add ½ to the result. So, in the distribution of 185 families by number of children, the median position is 185/2 + ½ = 93, i.e. the 93rd variant, which divides the ordered series in half. What is the value of the 93rd variant? To find out, accumulate the frequencies, starting from the smallest variants. The sum of the frequencies of the 1st and 2nd variants is 40, so the 93rd variant is not among them. If we add the frequency of the 3rd variant to 40, we get 40 + 75 = 115. Therefore, the 93rd variant corresponds to the third value of the attribute, and the median is a family with two children.

The mode and median coincide in this example. If the sum of frequencies were even (for example, 184), the formula above would give a median position of 184/2 + ½ = 92.5. Since there are no fractional positions, the result indicates that the median lies midway between the 92nd and 93rd variants.
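The procedure just described (compute the position as half the sum of frequencies plus ½, then accumulate frequencies until that position is reached) can be sketched in Python. The individual frequencies are hypothetical; they are chosen only to agree with the figures quoted in the text (the first two variants sum to 40, the third is 75, the total is 185). The function name is introduced here for illustration.

```python
def median_from_frequencies(variants, frequencies):
    """Median of a discrete variation series sorted by variant."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the series is empty")

    position = total / 2 + 0.5      # e.g. 185/2 + 1/2 = 93
    lower_rank = int(position)      # 93 for 93.0, 92 for 92.5
    upper_rank = lower_rank if position == lower_rank else lower_rank + 1

    def variant_at(rank):
        # Accumulate frequencies until the requested rank is covered.
        cumulative = 0
        for variant, f in zip(variants, frequencies):
            cumulative += f
            if cumulative >= rank:
                return variant

    # For a fractional position the median lies midway between neighbours.
    return (variant_at(lower_rank) + variant_at(upper_rank)) / 2


children = [0, 1, 2, 3, 4, 5]
families = [10, 30, 75, 45, 20, 5]  # hypothetical frequencies, total 185
print(median_from_frequencies(children, families))
```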

3. Calculation of the mode and median in the interval variation series

The descriptive nature of the mode and median is due to the fact that they do not compensate for individual deviations. They always correspond to a certain variant. Therefore, the mode and median do not require calculations to find them if all the values ​​of the attribute are known. However, in the interval variation series, calculations are used to find the approximate value of the mode and median within a certain interval.

To calculate the value of the mode within an interval, the following formula is used:

$$M_o = X_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $X_{Mo}$ is the lower limit of the modal interval;

$i_{Mo}$ is the width of the modal interval;

$f_{Mo}$ is the frequency of the modal interval;

$f_{Mo-1}$ is the frequency of the interval preceding the modal one;

$f_{Mo+1}$ is the frequency of the interval following the modal one.
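As a hedged sketch, the formula can be written as a small Python function. The function and parameter names are introduced here for illustration; the numbers in the call are the ones substituted in the worked example in the text (modal interval starting at 100, width 5, frequencies 104, 12 and 98).

```python
def interval_mode(lower, width, f_modal, f_prev, f_next):
    """Approximate mode inside the modal interval of an interval series."""
    d_prev = f_modal - f_prev   # excess over the preceding interval
    d_next = f_modal - f_next   # excess over the following interval
    if d_prev + d_next == 0:
        raise ValueError("the modal frequency must exceed a neighbouring one")
    return lower + width * d_prev / (d_prev + d_next)


mode = interval_mode(lower=100, width=5, f_modal=104, f_prev=12, f_next=98)
print(round(mode, 2))  # 100 + 5 * 92 / 98, about 104.69
```

Because the fraction always lies between 0 and 1, the result cannot leave the modal interval.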

We will show the calculation of the mode using the example given in Table 2.


Table 2. Distribution of workers of the enterprise according to the implementation of production standards

To find the mode, we first determine the modal interval of the given series. It can be seen from the example that the highest frequency corresponds to the interval where the variant lies in the range from 100 to 105. This is the modal interval. The value of the modal interval is 5.

Substituting the numerical values from Table 2 into the above formula, we get:

$$M_o = 100 + 5 \cdot \frac{104 - 12}{(104 - 12) + (104 - 98)} = 100 + 5 \cdot \frac{92}{98} \approx 104.7$$

The meaning of this formula is as follows: the part of the modal interval that must be added to its lower limit is determined by the frequencies of the preceding and following intervals. In this case we add about 4.7 to 100, i.e. more than half of the interval, because the frequency of the preceding interval is less than the frequency of the following interval.

Let's calculate the median now. To find the median in the interval variation series, we first determine the interval in which it is located (the median interval). Such an interval will be one whose cumulative frequency is equal to or greater than half the sum of the frequencies. Cumulative frequencies are formed by gradual summation of frequencies, starting from the interval with the smallest feature value. Half the sum of the frequencies we have is 250 (500:2). Therefore, according to table 3. the median interval will be the interval with the value of wages from 350,000 rubles. up to 400,000 rubles.

Table 3. Calculation of the median in the interval variation series

Before this interval, the sum of the accumulated frequencies was 160. Therefore, in order to obtain the value of the median, it is necessary to add another 90 units (250 - 160).

In 1906, the great scientist and renowned eugenicist Francis Galton visited an annual livestock and poultry exhibition in the west of England, where, quite by chance, he carried out an interesting experiment.

According to James Surowiecki, author of The Wisdom of Crowds, the fair held a competition in which people had to guess the weight of a slaughtered bull. Whoever came closest to the true figure was declared the winner.

Galton was known for his contempt for the intellectual abilities of ordinary people. He believed that only real experts would be able to make accurate statements about the bull's weight. And 787 participants of the competition were not experts.

The scientist set out to prove the incompetence of the crowd by calculating the average of the participants' answers. To his surprise, the result corresponded almost exactly to the real weight of the bull!

Average value - late invention

Of course, the accuracy of the answer amazed the researcher. But even more remarkable is the fact that Galton thought of using the average at all.

In today's world, averages, and so-called medians, are found everywhere: the average temperature in New York in April is 52 degrees Fahrenheit; Stephen Curry averages 30 points per game; The median household income in the US is $51,939/year.

However, the idea that many different outcomes can be represented by a single number is quite new. Until the 17th century, averages were not generally used.

How did the concept of averages and medians come about and develop? And how did it manage to become the main measuring technique in our time?

The predominance of means over medians had far-reaching consequences for our understanding of information. And often it led people astray.

Mean and median values

Imagine that you are telling a story about four people who dined with you last night at a restaurant. You would guess that one of them is 20 years old, another 30, the third 40, and the fourth 50. What would you say about their ages in your story?

Most likely, you would give their average age.

The mean is often used to convey information about something, as well as to describe a set of measurements. Technically, the average is what mathematicians call the "arithmetic mean" - the sum of all measurements divided by the number of measurements.

Although the word "average" is often used as a synonym for "mean", the latter more generally refers to the middle of something. The word comes from the Latin "medianus", which means "middle".

The mean in Ancient Greece

The history of the mean goes back to the teachings of the ancient Greek mathematician Pythagoras. For Pythagoras and his school, the mean had a clear definition that was very different from how we understand the average today. It was used only in mathematics, not in data analysis.

In the Pythagorean school, the mean was the middle number in a three-term sequence, standing in an "equal" relation to its neighbors. An "equal" relation could mean an equal distance, as with the number 4 in the sequence 2, 4, 6. It could also express a geometric progression, as with 10 in the sequence 1, 10, 100.

The statistician Churchill Eisenhart explains that in ancient Greece the mean was not used as a representative of or substitute for a set of numbers. It simply denoted the middle and was often used in mathematical proofs.

Eisenhart spent ten years studying the mean and median. Initially, he tried to find the mean used as a representative value in early science. Instead, he found that most early physicists and astronomers relied on single, skillfully made measurements and had no methodology for choosing the best result among many observations.

Modern researchers base their conclusions on the collection of large amounts of data, as, for example, biologists studying the human genome. Ancient scientists, on the other hand, could take several measurements, but chose only the best for building their theories.

As the historian of astronomy Otto Neugebauer wrote, "this is consistent with the conscious desire of ancient people to minimize the amount of empirical data in science, because they did not believe in the accuracy of direct observations."

For example, the Greek mathematician and astronomer Ptolemy estimated the angular diameter of the Moon using observation and the theory of the motion of the Earth. His estimate was 31'20". Today we know that the angular diameter of the Moon ranges from 29'20" to 34'6", depending on its distance from the Earth. Ptolemy used little data in his calculations, but he had every reason to believe that they were accurate.

Eisenhart writes: “It must be borne in mind that the relationship between observation and theory in antiquity was different than it is today. The results of observations were understood not as facts to which the theory must be adjusted, but as concrete cases that can be useful only as illustrative examples of the truth of the theory.”

Eventually, scientists would turn to representative measures of their data, but initially neither means nor medians were used in this role. From antiquity to the present day, another mathematical concept has served as such a representative: the half-sum of the extreme values.

Half sum of extreme values

New scientific tools almost always arise from the need to solve a certain problem in some discipline. The need to find the best value among many measurements arose from the need to accurately determine the geographic location.

The 11th-century intellectual giant Al-Biruni is known as one of the first people to use a methodology of representative values. Al-Biruni wrote that when he had many measurements at his disposal and wanted to find the best among them, he used the following "rule": find the number midway between the two extreme values. When calculating the half-sum of the extreme values, all numbers between the maximum and minimum are ignored, and only the average of these two numbers is found.

Al-Biruni applied this method in various fields, including to calculate the longitude of the city of Ghazni, which is located on the territory of modern Afghanistan, as well as in his studies of the properties of metals.

However, in the last few centuries, the half-sum of the extremes has been used less and less. In fact, in modern science, it is not relevant at all. The median value replaced the half-sum.

Transition to Averages

By the early 19th century, the use of the mean had become a common method for finding the most representative value from a group of data. Carl Friedrich Gauss, an outstanding mathematician of his time, wrote in 1809: “It was believed that if a certain number was determined by several direct observations made under the same conditions, then the arithmetic mean is the most true value. If it is not quite strict, then at least it is close to reality, and therefore one can always rely on it.”

Why has there been such a shift in methodology?

This question is rather difficult to answer. In his research, Churchill Eisenhart suggests that the method of finding the arithmetic mean could have originated in the measurement of magnetic declination, that is, the difference between the direction in which a compass needle points and true north. This measurement was extremely important during the Age of Discovery.

Eisenhart found that until the end of the 16th century, most scientists who measured magnetic declination used an ad hoc method (from the Latin for "to this, for this occasion, for this purpose") to choose the most accurate measurement.

But in 1580, the scientist William Borough approached the problem differently. He took eight different measurements of declination, compared them, and concluded that the most accurate reading was between 11 ¼ and 11 ⅓ degrees. He probably calculated the arithmetic mean, which fell in this range. However, Borough himself did not openly call his approach a new method.

Before 1635, there were no unambiguous cases of the average being used as a representative number at all. In that year, however, the English astronomer Henry Gellibrand took two different measurements of magnetic declination. One was made in the morning (11 degrees) and the other in the afternoon (11 degrees and 32 minutes). Calculating the most true value, he wrote:

“If we find the arithmetic mean, we can say with high probability that the result of an accurate measurement should be about 11 degrees 16 minutes.”

It is likely that this was the first time that the average was used as the closest to the true!

The word "average" was used in English in the early 16th century to refer to financial losses from damage that a ship or its cargo suffered during a voyage. For the next hundred years it denoted precisely these losses, which were calculated as the arithmetic mean. For example, if a ship was damaged during a voyage and the crew had to throw some goods overboard to lighten the ship, the investors suffered a financial loss equivalent to the amount of their investment, and these losses were calculated in the same way as the arithmetic mean. So the meanings of "average" and "arithmetic mean" gradually converged.

Median value

Today, the average or arithmetic mean is used as the main way to select a representative value of a set of measurements. How did it happen? Why was this role not assigned to the median value?

Francis Galton was a champion of the median

The term "median" (the middle term in a series of numbers, dividing the series in half) appeared at about the same time as the arithmetic mean. In 1599, the mathematician Edward Wright, who was working on the problem of compass variation, first suggested using the median.

“... Let's say a lot of archers shoot at some target. The target is subsequently removed. How can you find out where the target was? You need to find the middle place between all the arrows. Likewise, among the set of results of observations, the closest to the truth will be the one in the middle.”

The median was widely used in the nineteenth century, becoming an indispensable part of any data analysis at that time. It was also used by Francis Galton, the eminent nineteenth-century analyst. In the bull weighing story at the beginning of this article, Galton originally used the median as representing the opinion of the crowd.

Many analysts, including Galton, preferred the median because it is easier to calculate for smaller datasets.

However, the median has never been more popular than the mean. Most likely, this happened due to the special statistical properties inherent in the mean value, as well as its relationship to the normal distribution.

Relation between mean and normal distribution

When we take many measurements, the results are often, as statisticians say, "normally distributed." This means that if the data are plotted on a graph, the points form something similar to a bell. If you connect them, you get a "bell-shaped" curve. Many statistics fit the normal distribution, such as the height of people, IQ, and the highest annual temperature.

When the data is normally distributed, the mean will be very close to the highest point on the bell curve, and a very large number of measurements will be close to the mean. There is even a formula that predicts how many measurements will be some distance from the average.
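One such formula, for an exactly normal distribution, gives the share of measurements lying within k standard deviations of the mean as erf(k/√2). The short Python sketch below evaluates it; the function name `share_within` is introduced here for illustration, and real data only approximate these shares.

```python
import math


def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% of measurements, respectively.
    print(k, round(share_within(k), 4))
```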

Thus, calculating the mean gives researchers a lot of additional information.

The relationship of the mean to the standard deviation gives it a great advantage, because the median has no such relationship. This connection is an important part of the analysis of experimental data and statistical processing of information. That is why the average has become the core of statistics and all sciences that rely on multiple data for their conclusions.

The advantage of the mean is also due to the fact that it is easily calculated by computers. Although the median value for a small group of data is fairly easy to calculate on your own, it is much easier to write a computer program that would find the average value. If you use Microsoft Excel, you probably know that the median function is not as easy to calculate as the mean value function.

As a result, due to its great scientific value and ease of use, the average value has become the main representative value. However, this option is not always the best.

Advantages of the median value

In many cases where we want to calculate the center of a distribution, the median is the best measure. This is because the average value is largely determined by the extreme measurements.

Many analysts believe that the thoughtless use of the average negatively affects our understanding of quantitative information. People look at the average and think it's "normal". But in fact it can be defined by some one term that stands out strongly from the homogeneous series.

Imagine an analyst who wants to know a representative value for the price of five houses. Four houses are worth $100,000 each and the fifth is worth $900,000. The mean would then be $260,000 and the median would be $100,000. In this, as in many other cases, the median gives a better understanding of what can be called "standard".
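The house example can be checked with Python's standard `statistics` module. This is only an illustration of the arithmetic; the variable names are chosen here.

```python
from statistics import mean, median

# Four houses at $100,000 and one at $900,000, as in the example.
house_prices = [100_000, 100_000, 100_000, 100_000, 900_000]

average_price = mean(house_prices)    # pulled upward by the single expensive house
typical_price = median(house_prices)  # the middle value of the sorted prices

print(average_price)
print(typical_price)
```

The sum of the prices is $1,300,000, so the mean is $260,000, while the third of the five sorted values is $100,000.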

Because extreme values can strongly affect the average, the median is used to report changes in US household income.

The median is also less sensitive to the "dirty" data that analysts deal with today. Many statisticians and analysts collect information by interviewing people on the Internet. If the user accidentally adds an extra zero to the answer, which turns 100 into 1000, then this error will affect the mean much more than the median.

Mean or median?

The choice between the median and the mean has far-reaching implications, from our understanding of the effects of medicines on health to our knowledge of what a standard family budget is.

As the collection and analysis of data increasingly determine how we understand the world, the choice of the quantities we use matters more and more. In an ideal world, analysts would use both the mean and the median to describe the data.

But we live in conditions of limited time and attention. Because of these limitations, we often need to choose just one. And in many cases, the median value is preferable.

PRACTICE #4

Calculation of the structural characteristics of the variational distribution series.

The student must:

know:

- scope and methodology for calculating structural averages;

be able to:

- calculate structural averages;

- formulate a conclusion based on the results obtained.

Guidelines

In statistics, the mode and median are classed as structural averages, because their values depend on the structure of the statistical population.

Mode calculation

The mode is the value of an attribute (variant) that occurs most often in the population under study. In a discrete distribution series, the mode is the variant with the highest frequency.

For example: The distribution of women's shoes sold by size is characterized as follows:

Shoe size

Number of pairs sold

In this distribution series, the mode is size 37, i.e. Mo=37 size.

For an interval distribution series, the mode is determined by the formula:

$$Mo = X_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $X_{Mo}$ is the lower limit of the modal interval;

$h_{Mo}$ is the width of the modal interval;

$f_{Mo}$ is the frequency of the modal interval;

$f_{Mo-1}$ and $f_{Mo+1}$ are the frequencies of the intervals preceding and following the modal one, respectively.

For example: The distribution of workers by length of service is characterized by the following data.

Work experience, years

up to 2

8-10

10 or more

Number of workers, pers.

Determine the mode of the interval series of the distribution.

The mode of the interval series is

The mode is always somewhat indeterminate, since it depends on the size of the groups and the exact position of the group boundaries. The mode is widely used in commercial practice, for example when studying consumer demand or recording prices.

Median calculation

The median in statistics is the variant located in the middle of an ordered data series. It divides the statistical population into two equal parts, so that one half of the values is less than the median and the other half is greater. To determine the median, it is necessary to build a ranked series, i.e. a series of the individual attribute values in ascending or descending order.

In a discrete ordered series with an odd number of members, the median will be the variant located in the center of the series.

For example: The experience of five workers was 2, 4, 7, 9 and 10 years. In this series, the median is 7 years, i.e. Me=7 years

If a discrete ordered series consists of an even number of members, the median is the arithmetic mean of the two adjacent variants in the center of the series.

For example: The work experience of six workers was 1, 3, 4, 5, 10 and 11 years. Two variants stand in the center of this series: 4 and 5. The arithmetic mean of these values is the median of the series, i.e. Me = (4 + 5) / 2 = 4.5 years.
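The odd and even cases can be sketched in a few lines of Python. The function below is a minimal illustration written for this purpose (the standard library's `statistics.median` behaves the same way); the two calls use the work-experience series from the examples.

```python
def median(values):
    """Median of raw (ungrouped) data."""
    ranked = sorted(values)          # build the ranked series first
    n = len(ranked)
    if n == 0:
        raise ValueError("the series is empty")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]           # odd count: the central variant
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: mean of the two central variants


print(median([2, 4, 7, 9, 10]))      # five workers
print(median([1, 3, 4, 5, 10, 11]))  # six workers
```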

To determine the median for grouped data, the cumulative frequencies must be calculated.

For example: Based on the available data, we determine the median shoe size.

Shoe size | Number of pairs sold | Sum of cumulative frequencies
… | 8 | 8
… | 19 | 8+19=27
… | 34 | 27+34=61
37 | 108 | 61+108=169
… | … |
Total | 300 |

To determine the median, it is necessary to calculate the sum of the accumulated frequencies of the series. The accumulation of the total continues until the accumulated sum of frequencies exceeds half the sum of the frequencies of the series. In our example, the sum of frequencies was 300, its half - 150. The accumulated sum of frequencies turned out to be 169. The variant corresponding to this sum, i.e. 37 is the median of the series.

If the sum of the accumulated frequencies against one of the variants is exactly half the sum of the frequencies of the series, the median is defined as the arithmetic mean of this variant and the next one.

For example: Based on the available data, we determine the median wage of workers.

Monthly salary, thousand rubles | Number of workers, pers. | Sum of cumulative frequencies
14.0 | 2 | 2
14.2 | 6 | 2+6=8
16.0 | 12 | 8+12=20
16.8 | … |
18.0 | … |
Total: | … |

The median will be: Me = (16.0 + 16.8) / 2 = 16.4 thousand rubles.

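The median of the interval variation series of the distribution is determined by the formula:

$$Me = x_{Me} + h_{Me} \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}},$$

where $x_{Me}$ is the lower limit of the median interval;

$h_{Me}$ is the width of the median interval;

$\sum f$ is the sum of the frequencies of the series;

$f_{Me}$ is the frequency of the median interval;

$S_{Me-1}$ is the sum of the accumulated frequencies of the intervals preceding the median interval.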
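A hedged Python sketch of this calculation follows. It assumes the standard interpolation formula Me = x_Me + h_Me · (Σf/2 − S_Me−1) / f_Me, where S_Me−1 is the cumulative frequency before the median interval. The function name is introduced here, and the frequencies of the last three intervals are hypothetical, chosen only so that the median falls in the 400-500 interval.

```python
def interval_median(bounds, freqs):
    """Approximate median of an interval series.

    bounds: list of (lower, upper) limits in ascending order
    freqs:  frequency of each interval
    """
    total = sum(freqs)
    if total == 0:
        raise ValueError("the series is empty")
    half = total / 2
    cumulative = 0                   # accumulated frequency before the current interval
    for (lower, upper), f in zip(bounds, freqs):
        if f > 0 and cumulative + f >= half:
            # The median interval is found: interpolate inside it.
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f


bounds = [(100, 200), (200, 300), (300, 400), (400, 500),
          (500, 600), (600, 700), (700, 800)]
freqs = [1, 3, 7, 30, 19, 15, 5]     # last three values are hypothetical
print(round(interval_median(bounds, freqs), 1))
```

With these figures half of the total is 40 and the accumulated frequency before the 400-500 interval is 11, so the median is 400 + 100 · (40 − 11) / 30. When the accumulated frequency reaches exactly half at the end of an interval, the function returns that interval's upper limit.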

For example: Based on the available data on the distribution of enterprises by the number of industrial and production personnel, calculate the median in the interval variation series.

Groups of enterprises by number of industrial and production personnel, pers. | Number of enterprises | Sum of cumulative frequencies
100-200 | 1 | 1
200-300 | 3 | 1+3=4
300-400 | 7 | 4+7=11
400-500 | 30 | 11+30=41
500-600 | … |
600-700 | … |
700-800 | … |
Total: | … |

Let us first determine the median interval. In this example, the first accumulated frequency that exceeds half the sum of all frequencies of the series corresponds to the interval 400-500. This is the median interval, i.e. the interval containing the median of the series. Let us determine its value:

If the sum of the accumulated frequencies against one of the intervals is exactly half the sum of the frequencies of the series, then the median is determined by the formula:

where n is the number of units in the population.

For example: Based on the available data on the distribution of enterprises by the number of industrial and production personnel, calculate the median in the interval variation series.

Groups of enterprises by number of industrial and production personnel, pers. | Number of enterprises | Sum of cumulative frequencies
100-200 | 1 | 1
200-300 | 3 | 1+3=4
300-400 | 6 | 4+6=10
400-500 | 30 | 10+30=40
500-600 | 20 | 40+20=60
600-700 | … |
700-800 | … |
Total: | … |

people

The mode and median can also be determined graphically:

the mode in a discrete series from the distribution polygon, the mode in an interval series from the distribution histogram, and the median from the cumulative frequency curve (cumulate).

The mode of an interval distribution series is found from the histogram as follows. Select the tallest rectangle, which is the modal one. Connect the upper right vertex of the modal rectangle with the upper right corner of the preceding rectangle, and the upper left vertex of the modal rectangle with the upper left corner of the following rectangle. From the point where these two lines intersect, drop a perpendicular to the abscissa axis. The abscissa of the intersection point is the mode of the distribution.

The median is found from the cumulate. From the point on the scale of accumulated frequencies (or relative frequencies) corresponding to 50%, draw a straight line parallel to the abscissa axis until it intersects the cumulate. Then, from this intersection point, drop a perpendicular to the abscissa axis. The abscissa of the intersection point is the median.

In addition to the mode and median, other structural characteristics, the quantiles, can be determined in a variation series. Quantiles are intended for a deeper study of the structure of the distribution series.

A quantile is the value of an attribute that occupies a certain place in the population ordered by this attribute. There are the following types of quantiles:

- quartiles are the attribute values dividing the ordered set into four equal parts;

- deciles are the attribute values dividing the ordered set into ten equal parts;

- percentiles are the attribute values dividing the ordered set into one hundred equal parts.
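As an illustration, the three kinds of quantiles listed above can be computed with `statistics.quantiles` from Python's standard library (Python 3.8 or later). The data set is hypothetical. Different packages use slightly different interpolation rules, so the exact cut points may differ from those obtained by hand or in other software.

```python
from statistics import median, quantiles

# Hypothetical ranked observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

quartiles = quantiles(data, n=4)      # 3 cut points -> four equal parts
deciles = quantiles(data, n=10)       # 9 cut points -> ten equal parts
percentiles = quantiles(data, n=100)  # 99 cut points -> one hundred equal parts

print(quartiles)
print(len(deciles), len(percentiles))
print(quartiles[1] == median(data))   # the second quartile is the median
```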

Thus, three indicators can be used to characterize the position of the center of a distribution series: the mean value of the attribute, the mode, and the median. When choosing the type and form of a specific indicator of the distribution center, the following recommendations apply:

- for stable socio-economic processes, the arithmetic mean is used as the indicator of the center. Such processes are characterized by symmetrical distributions, in which the arithmetic mean, the median, and the mode coincide;

- for unstable processes, the position of the distribution center is characterized by Mo or Me. For asymmetric processes, the preferred characteristic of the distribution center is the median, since it occupies a position between the arithmetic mean and the mode.
