Dear listmembers,
I have some comments re Dennis Greene's discussion below
based on what I learned about statistics terminology fifty years ago.

There are four concepts which need to be considered:
1. The definition of a measurement, modeled mathematically as a random
   variable X.
2. The probability distribution of X.
3. Measures of central tendency of the probability distribution of X.
4. The populations from which the measurements were made.
The terms "measurement" and "observation" are used interchangeably.

The median of X is defined as the 50 percent point of the probability
distribution. It is estimated from a finite set of data by finding a number
for which half the data are larger and half the data are smaller.
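
As a minimal sketch of this estimate (the function name sample_median and
the data values are illustrative assumptions, not taken from any posting),
the data can be sorted and the middle value taken. For an even number of
points, a common convention is to use the midpoint of the two middle values.

```python
def sample_median(data):
    """Estimate the median of a finite data set by sorting it."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: midpoint of the two middle values (a common convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical measurements, for illustration only.
measurements = [7, 1, 9, 3, 5]
m = sample_median(measurements)
larger = sum(1 for x in measurements if x > m)
smaller = sum(1 for x in measurements if x < m)
print(m, larger, smaller)
```

Here as many measurements lie above the estimate as below it.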

The term average is a general term referring to a measure of central
tendency of the probability distribution of X. Usually, the term average
refers specifically to the arithmetic mean of a set of measurements of X,
but it could refer to the median, to half the sum of the largest and
smallest measurements (the midrange), to the square root of the product of
the largest and smallest measurements (a geometric average), to the midpoint
between the 75 percent point and the 25 percent point, to the most probable
value of the probability distribution, and so on.
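
To make the variety of "averages" concrete, here is a small sketch that
computes several measures of central tendency for one data set. The data
values and the function name averages are hypothetical. The measures built
from the extremes and from the 25 and 75 percent points are taken here to be
the midrange and the midpoint of the quartiles. The geometric average
assumes positive measurements.

```python
import math
import statistics

# Hypothetical measurements, chosen only to keep the arithmetic simple.
data = [1, 2, 2, 3, 4, 7, 16]


def averages(values):
    """Return several measures of central tendency for the same data."""
    ordered = sorted(values)
    lo, hi = ordered[0], ordered[-1]
    # 25, 50 and 75 percent points, using the module's default convention.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "arithmetic mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "midrange": (lo + hi) / 2,
        "geometric average of extremes": math.sqrt(lo * hi),
        "midpoint of quartiles": (q1 + q3) / 2,
        "most frequent value": statistics.mode(ordered),
    }


for name, value in averages(data).items():
    print(f"{name}: {value}")
```

For this lopsided data set, the six numbers differ considerably from one
another. Note that several conventions exist for estimating percent points
from a finite sample, so the quartile-based figure can vary slightly with
the method used.
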
Depending upon the shape of the underlying probability distribution, these
averages can be virtually the same or quite different. Thus for a Gaussian
distribution, the mean, median, and most-probable value are the same because
it is a well-behaved symmetric distribution, whereas for a nonsymmetric
distribution they will generally differ, sometimes substantially. There are
also symmetric distributions that are not well behaved, such as the Cauchy
distribution, for which the arithmetic mean is virtually useless, whereas
the median or most-probable value contains more useful information about
the data.
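
The effect of shape can be sketched numerically. The two small data sets
below are hypothetical, and the Cauchy values are simulated by transforming
uniform random numbers. The seed, sample size, and the name cauchy_sample
are all assumptions made for illustration.

```python
import math
import random
import statistics

symmetric = [1, 2, 3, 4, 5]   # hypothetical symmetric data
skewed = [1, 2, 3, 4, 50]     # same data with one value far out on the right

# Symmetric data: mean and median agree.
print(statistics.mean(symmetric), statistics.median(symmetric))
# Skewed data: the mean is pulled toward the long tail, the median is not.
print(statistics.mean(skewed), statistics.median(skewed))


def cauchy_sample(n, rng):
    """Draw n standard Cauchy values from uniform random numbers."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
cauchy_medians = []
cauchy_means = []
for _ in range(5):
    sample = cauchy_sample(10001, rng)
    cauchy_medians.append(statistics.median(sample))
    cauchy_means.append(statistics.fmean(sample))

print("medians:", cauchy_medians)
print("means:  ", cauchy_means)
```

The sample medians should cluster near zero, the center of the Cauchy
distribution, while the sample means typically wander from one sample to
the next no matter how many points are drawn.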

Finally, the usual basic assumption about the data is that the observations
were made from the same population. The model is one of a set of
observations made under similar conditions. If, however, the observations
are based on sampling from more than one population, then it is possible
that the probability distribution will not have a single peak and therefore
may not have a unique most-probable value. How the observations are defined
determines the statistical properties of the random variable X.
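
A small sketch of the two-population case follows. The onset ages are
invented for illustration and do not come from any study.

```python
import statistics

# Hypothetical onset ages drawn from two distinct populations.
population_a = [38, 40, 40, 40, 42]
population_b = [60, 62, 62, 62, 64]
pooled = population_a + population_b

# multimode returns every value tied for most frequent.
modes = statistics.multimode(pooled)
print("most frequent values:", modes)
print("mean:", statistics.mean(pooled))
print("median:", statistics.median(pooled))
```

The pooled data have two equally frequent values, so there is no unique
most-probable value, and both the mean and the median fall in the gap
between the two groups where no observation was actually made.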

Now to Dennis's discussion below.
Par. 1: Based on the above, a median is an average, but an average is not
      necessarily a median. However, by definition, the median does split
      the observed population in two.
Par. 2: Since for nonsymmetric distributions the arithmetic mean and most
      likely value are not the same, sentence one is incorrect, since by
      "average" Dennis is referring to the arithmetic mean. The last
      sentence touches on the concept of population. One population could
      be the population of PWPs. Another could be the population of ages at
      which these PWPs experienced onset. But Dennis confuses this
      distinction.
Par. 3: This confusion is continued here. Furthermore, the definition of
      median used here is incorrect. The calculation of a median is based
      on the complete data set, not just on the minimum and maximum values.
      For example, given 101 data points, the median is calculated by
      ordering the numbers. The median is the number such that 50 numbers
      are larger and 50 numbers are smaller.
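
The 101-point example can be sketched as follows. The data values are
hypothetical and the helper name midpoint_of_extremes is an assumption. The
quantity computed from only the smallest and largest values is the midrange,
which is what the two cases in Dennis's message actually calculate.

```python
import statistics

# 101 hypothetical, distinct data points, deliberately given out of order.
data = [1000, 500] + list(range(99, 0, -1))

ordered = sorted(data)
median = ordered[len(ordered) // 2]      # the 51st of the 101 ordered values
larger = sum(1 for x in data if x > median)
smaller = sum(1 for x in data if x < median)
midrange = (min(data) + max(data)) / 2   # uses only two of the 101 points

print(median, larger, smaller, midrange)


def midpoint_of_extremes(youngest, oldest):
    """Midrange of two values: the calculation described in the message."""
    return (youngest + oldest) / 2


# Both cases quoted in the message give 57, whatever lies in between.
print(midpoint_of_extremes(50, 64), midpoint_of_extremes(20, 94))
```

The median uses the ordering of every data point, whereas the midrange is
unchanged by anything that happens between the two extremes.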
-------------------------------------------------------
Date:    Sat, 12 Sep 1998 08:09:58 +0800
From:    Dennis Greene <[log in to unmask]>
Subject: Median is not average

Several recent postings have mentioned that 57 is considered the median age
for the onset of PD.  As far as I can see most have assumed that "median"
and "average" mean the same thing and have concluded from this that half of
the PD population are under the age of 57.  However median and average are
not the same thing, nor can you use either of them to conclude that half the
PD population are under 57.

 An  average age for something to happen tells you at what age the event is
most likely to happen. In our case it would be calculated by adding up the
age at onset of every PWP, and then dividing the total by the number of PWP
(several million bits of information processed).  Even if 57 were the
average age, it would mean that 57 was the age at which most people
experienced onset and consequently most PWP are older than 57.

Median refers to the age itself, not to the numbers afflicted.   It is
derived by finding the middle point between the age of the youngest  and
oldest age of onset (two bits of information processed). To have any meaning
at all, a median value would need to be quoted in its context.  By itself
the statement "the median age of onset is 57" tells us very little.  It
would be true in both the following cases:

1.    Youngest age of onset = 50
       Oldest age of onset  = 64
       Median age of onset = 57

2.    Youngest age of onset = 20
       Oldest age of onset =  94
       Median age of onset = 57


Dennis

+++++++++++++++++++++++++++
Dennis Greene 48/11

"It is better to be a crystal and be broken,
Than to remain perfect like a tile upon the housetop."

http://members.networx.net.au/~dennisg/
+++++++++++++++++++++++++++