QB64.org Forum

Samples Gallery & Reference => Utilities => Topic started by: The Librarian on June 28, 2018, 06:41:36 am

Title: Descriptive Statistics by Bruno Schaefer
Post by: The Librarian on June 28, 2018, 06:41:36 am
Descriptive Statistics

Author: @BSpinoza Bruno Schaefer, Losheim am See, Germany
Author contact: bup.schaefer (.at.) web.de
Source: Submission
Version: 2018-06-16
Tags: [maths] [statistics]

Description:
This program calculates basic descriptive statistics of univariate data:
         n, Std.error, sum, standard error, mean, geometrical mean, variance,
         standard deviation, coefficient of variation, minimum, 1st quartile, median,
         2rd quartile, maximum,skewness, kurtosis, and excess kurtosis.
A dataset must have at least 4 values.
 
Remarks to kurtosis and skewness:
    For kurtosis and skewness the same equation as SPSS, PAST and Excel is used.
    Slightly different results may occur using other programs, especially for
    small sample sizes.
    kurtosis: peak shape  > 3 (excess > 0) leptokurtic: distribution with tapered peak and fat tails
                                    = 3 (excess = 0) mesokurtic: similar to normal bell-curved distribution
                                    < 3 (excess < 0) platykurtic: flat distribution with thin tails
     skewness: symmetry    > 0 skewed right: its right tail is longer and most of the distribution is at the left.
                                         = 0 symmetrical (not skewed)
                                         < 0 skewed left: the left tail is longer and most of the distribution is at the right


Note that this program includes extended ASCII characters and may not copy/paste correctly. If the interface does not draw correctly, use the attached source listing.

Source code:
Code: QB64: [Select]
  1. 'PROGRAM: descriptiveStatistics.bas
  2. '================= Descriptive Statistics  ================
  3. '        written by Bruno Schaefer, Losheim am See, Germany
  4. '                                       created: 15.12.2016
  5. '                                   last review: 16.06.2018
  6. '============================================================================================================
  7. ' This programm calculates basic descriptive statistics of univariate data:
  8. ' n, Std.error, sum, standard error, mean, geometrical mean, variance,
  9. ' standard deviation, coefficient of variation, minimum, 1st quartile, median,
  10. ' 2rd quartile, maximum,skewness, kurtosis, and excess kurtosis.
  11. ' A dataset must have at least 4 values.
  12. ' For kurtosis and skewness the same equation as SPSS, PAST and Excel is used.
  13. ' Slightly different results may occur using other programs, especially for
  14. ' small sample sizes.
  15. ' kurtosis: peak shape  > 3 (excess > 0) leptokurtic: distribution with tapered peak and fat tails
  16. '                       = 3 (excess = 0) mesokurtic: similar to normal bell-curved distribution
  17. '                       < 3 (excess < 0) platykurtic: flat distribution with thin tails
  18. ' skewness: symmetry    > 0 skewed right: its right tail is longer and most of the distribution is at the left.
  19. '                       = 0 symmetrical (not skewed)
  20. '                       < 0 skewed left: the left tail is longer and most of the distribution is at the right
  21. '===============================================================================================================
  22. _TITLE "descriptive statistics"
  23. SCREEN _NEWIMAGE(680, 520, 256)
  24. WEITER$ = "y" 'loop variable
  25. _CLIPBOARD$ = "" 'clears the clipboard
  26.     _LIMIT 30
  27.     DO
  28.         CLS , 14
  29.         COLOR 0, 14
  30.         PRINT " ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»   "
  31.         PRINT " º  DESCRIPTIVE STATISTICS OF UNIVARIATE DATA  º   "
  32.         PRINT " ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ   "
  33.         PRINT "  number of values (n>3): ";
  34.         COLOR 9, 14
  35.         INPUT "", n 'input of the number of values
  36.     LOOP UNTIL n > 3
  37.     REDIM SHARED sample(n)
  38.     FOR I = 1 TO n
  39.         COLOR 0, 14
  40.         PRINT "  value no. " + STR$(I) + ": ";
  41.         COLOR 12, 14
  42.         INPUT "", Wert#
  43.         sample(I) = Wert# '               fills the data array with values
  44.     NEXT I
  45.     ' ----- SORT of the values ----------
  46.     DO
  47.         ic = 0
  48.         FOR I = 1 TO n - 1
  49.             IF sample(I) > sample(I + 1) THEN
  50.                 h = sample(I)
  51.                 sample(I) = sample(I + 1)
  52.                 sample(I + 1) = h
  53.                 ic = 1
  54.             END IF
  55.         NEXT I
  56.     LOOP UNTIL ic = 0
  57.     ' -----------  calculations and output of the results ------------
  58.     CLS
  59.     COLOR 0, 14
  60.     PRINT
  61.     PRINT " =========================== RESULTS =================================="
  62.     COLOR 2, 14
  63.     PRINT "  n (number of values):          "; n
  64.     PRINT "  sum (sum of values):           "; sum#(sample())
  65.     PRINT "  standard error:                "; StdDev.s#(sample()) / SQR(n) ' stderr#(sample())
  66.     PRINT "  range (xmax - xmin):           "; sample(UBOUND(sample)) - sample(LBOUND(sample))
  67.     COLOR 12, 14
  68.     PRINT "  mean:                          "; mean#(sample())
  69.     PRINT "  geometrical mean:              "; geomean#(sample())
  70.     PRINT "  root mean square RMS:          "; rms#(sample())
  71.     PRINT "  variance (sample):             "; variance.s#(sample())
  72.     PRINT "  std.dev. (sample):             "; StdDev.s#(sample()); " = "; _ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
  73.     PRINT "  coeff. of variation:           "; 100 * StdDev.s#(sample()) / mean#(sample())
  74.     COLOR 9, 14
  75.     PRINT "  variance (population):         "; variance.p#(sample())
  76.     PRINT "  std.dev. (population):         "; StdDev.p#(sample()); " = "; _ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100; " %"
  77.     PRINT "  coefficient of variation:      "; 100 * StdDev.p#(sample()) / mean#(sample())
  78.     COLOR 6, 14
  79.     PRINT "  minimum:                       "; sample(LBOUND(sample))
  80.     PRINT "  1st quartile (percentile 25%): "; quantile#(sample(), 0.25)
  81.     PRINT "  median (percentile 50%):       "; quantile#(sample(), 0.50)
  82.     PRINT "  standard error of the median:  "; variance.p#(sample()) / SQR(n)
  83.     PRINT "  3rd quartile (percentile 75%): "; quantile#(sample(), 0.75)
  84.     PRINT "  maximum:                       "; sample(UBOUND(sample))
  85.     PRINT "  interquartile range:           "; quantile#(sample(), 0.75) - quantile#(sample(), 0.25)
  86.     COLOR 9, 14
  87.     PRINT "  skewness (sample):             "; _ROUND(skew#(sample()) * 100000) / 100000
  88.     PRINT "  kurtosis (sample):             "; _ROUND(kurt#(sample()) * 100000) / 100000
  89.     PRINT "  excess kurtosis(sample):       "; _ROUND(kurt#(sample()) * 100000) / 100000 - 3
  90.     PRINT "  skewness (population):         "; _ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000
  91.     PRINT "  kurtosis (population):         "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000
  92.     PRINT "  excess kurtosis (population):  "; _ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 - 3
  93.     COLOR 0, 14
  94.     PRINT " ======================================================================"
  95.     DIM CrLf AS STRING * 2
  96.     CrLf = CHR$(13) + CHR$(10)
  97.     _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
  98.     _CLIPBOARD$ = _CLIPBOARD$ + " DESCRIPTIVE STATISTICS OF UNIVARIATE DATA      " + CrLf
  99.     _CLIPBOARD$ = _CLIPBOARD$ + " ========================================= " + CrLf
  100.     _CLIPBOARD$ = _CLIPBOARD$ + " sorted data:" + CrLf
  101.     FOR I = 1 TO n
  102.         _CLIPBOARD$ = _CLIPBOARD$ + "    " + STR$(sample(I)) + CrLf
  103.     NEXT I
  104.     _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
  105.     _CLIPBOARD$ = _CLIPBOARD$ + " n (number of values):                  " + STR$(n) + CrLf
  106.     _CLIPBOARD$ = _CLIPBOARD$ + " sum (sum of values):                   " + STR$(sum#(sample())) + CrLf
  107.     _CLIPBOARD$ = _CLIPBOARD$ + " standard error:                        " + STR$(StdDev.s#(sample()) / SQR(n)) + CrLf
  108.     _CLIPBOARD$ = _CLIPBOARD$ + " range (xmax - xmin):                   " + STR$(sample(UBOUND(sample)) - sample(LBOUND(sample))) + CrLf
  109.     _CLIPBOARD$ = _CLIPBOARD$ + " mean:                                  " + STR$(mean#(sample())) + CrLf
  110.     _CLIPBOARD$ = _CLIPBOARD$ + " geometrical mean                       " + STR$(geomean#(sample())) + CrLf
  111.     _CLIPBOARD$ = _CLIPBOARD$ + " root mean square RMS:                  " + STR$(rms#(sample())) + CrLf
  112.     _CLIPBOARD$ = _CLIPBOARD$ + " variance (sample):                     " + STR$(variance.s#(sample())) + CrLf
  113.     _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample):           " + STR$(StdDev.s#(sample())) + CrLf
  114.     _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (sample) %:         " + STR$(_ROUND((StdDev.s#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
  115.     _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (sample):     " + STR$(100 * StdDev.s#(sample()) / mean#(sample())) + CrLf
  116.     _CLIPBOARD$ = _CLIPBOARD$ + " variance (population):                 " + STR$(variance.p#(sample())) + CrLf
  117.     _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation(population):        " + STR$(StdDev.p#(sample())) + CrLf
  118.     _CLIPBOARD$ = _CLIPBOARD$ + " standard deviation (population) %:     " + STR$(_ROUND((StdDev.p#(sample()) * 100 / mean#(sample())) * 100) / 100) + " %" + CrLf
  119.     _CLIPBOARD$ = _CLIPBOARD$ + " coefficient of variation (population): " + STR$(100 * StdDev.p#(sample()) / mean#(sample())) + CrLf
  120.     _CLIPBOARD$ = _CLIPBOARD$ + " minimum:                               " + STR$(sample(LBOUND(sample))) + CrLf
  121.     _CLIPBOARD$ = _CLIPBOARD$ + " 1st quartile (25% percentile):         " + STR$(quantile#(sample(), 0.25)) + CrLf
  122.     _CLIPBOARD$ = _CLIPBOARD$ + " median: 2nd quartile (50% percentile): " + STR$(quantile#(sample(), 0.50)) + CrLf
  123.     _CLIPBOARD$ = _CLIPBOARD$ + " standard error of the median:          " + STR$(variance.p#(sample()) / SQR(n)) + CrLf
  124.     _CLIPBOARD$ = _CLIPBOARD$ + " 3rd quartile (75%) :                   " + STR$(quantile#(sample(), 0.75)) + CrLf
  125.     _CLIPBOARD$ = _CLIPBOARD$ + " maximum:                               " + STR$(sample(UBOUND(sample))) + CrLf
  126.     _CLIPBOARD$ = _CLIPBOARD$ + " interquartile range:                   " + STR$(quantile#(sample(), 0.75) - quantile#(sample(), 0.25)) + CrLf
  127.     _CLIPBOARD$ = _CLIPBOARD$ + " skewness (sample):                     " + STR$(_ROUND(skew#(sample()) * 100000) / 100000) + CrLf
  128.     _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (sample):                     " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000) + CrLf
  129.     _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (sample):              " + STR$(_ROUND(kurt#(sample()) * 100000) / 100000 - 3) + CrLf
  130.     _CLIPBOARD$ = _CLIPBOARD$ + " skewness (population):                 " + STR$(_ROUND(skew#(sample()) * (n - 2) / SQR(n * (n - 1)) * 100000) / 100000) + CrLf
  131.     _CLIPBOARD$ = _CLIPBOARD$ + " kurtosis (population):                 " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000) + CrLf
  132.     _CLIPBOARD$ = _CLIPBOARD$ + " excess kurtosis (population):          " + STR$(_ROUND((kurt#(sample()) * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1) * 100000) / 100000 - 3) + CrLf
  133.     _CLIPBOARD$ = _CLIPBOARD$ + " ---------------------------------------------------------" + CrLf
  134.     PRINT
  135.     PRINT " All results are stored in the clipboard!"
  136.     PRINT " Do you want to start a new statistical evaluation  [y/n]? ";
  137.     SLEEP
  138.     WEITER$ = INKEY$
  139. LOOP WHILE (WEITER$ = "y") OR (WEITER$ = "Y")
  140. COLOR 12, 14
  141. LOCATE 10, 25: PRINT " E N D   O F   P R O G R A M "
  142. LOCATE 12, 25: PRINT "         - - - -"
  143. LOCATE 14, 25: PRINT "      Press any key ": PRINT
  144. 'FUNCTIONS
  145. '============= sum =========="
  146. FUNCTION sum# (x())
  147.     s# = 0
  148.     FOR i = 1 TO n
  149.         s# = s# + x(i)
  150.     NEXT i
  151.     sum# = s#
  152. '============= mean =========="
  153. FUNCTION mean# (x())
  154.     mean# = sum#(x()) / n
  155. '========= variance (sample) =========="
  156. FUNCTION variance.s# (x())
  157.     m# = mean#(x())
  158.     s# = 0
  159.     FOR i = 1 TO n
  160.         s# = s# + (x(i) - mean#(x())) ^ 2
  161.     NEXT i
  162.     variance.s# = s# / (n - 1)
  163. '========= variance population) =========="
  164. FUNCTION variance.p# (x())
  165.     m# = mean#(x())
  166.     s = 0
  167.     FOR i = 1 TO n
  168.         s# = s# + (x(i) - mean#(x())) ^ 2
  169.     NEXT i
  170.     variance.p# = s# / n
  171. '======= standard deviation (sample) ========"
  172. FUNCTION StdDev.s# (x())
  173.     StdDev.s# = SQR(variance.s#(x()))
  174. '======= standard deviation (population) ========"
  175. FUNCTION StdDev.p# (x())
  176.     StdDev.p# = SQR(variance.p#(x()))
  177. '============== median ====================="
  178. FUNCTION median# (x())
  179.     IF (n / 2) = INT(n / 2) THEN
  180.         'even
  181.         median# = (sample(n / 2) + sample((n / 2) + 1)) / 2
  182.     ELSE
  183.         'odd
  184.         median# = sample((n + 1) / 2)
  185.     END IF
  186. '============================ quantile ========================
  187. FUNCTION quantile# (x(), a)
  188.     rang# = a * (n - 1) + 1
  189.     index% = INT(rang#)
  190.     gewicht# = rang# - index%
  191.     quantile# = x(index%) + gewicht# * (x(index% + 1) - x(index%))
  192. '============================ skewness ========================
  193. FUNCTION skew# (x())
  194.     m# = mean#(x())
  195.     s# = StdDev.s#(x())
  196.     sk# = 0
  197.     FOR J = 1 TO n
  198.         sk# = sk# + ((x(J) - m#) / s#) ^ 3
  199.     NEXT J
  200.     IF s# <> 0 THEN
  201.         skew# = sk# * (n / ((n - 1) * (n - 2)))
  202.     ELSE
  203.         skew# = 0
  204.     END IF
  205. '============================ kurtosis ========================
  206. FUNCTION kurt# (x())
  207.     m# = mean#(x())
  208.     s# = StdDev.s#(x())
  209.     krt# = 0
  210.     FOR j = 1 TO n
  211.         krt# = krt# + ((x(j) - m#) / s#) ^ 4
  212.     NEXT j
  213.     IF s# <> 0 THEN
  214.         kurt# = ((krt# * (n + 1) * n) / ((n - 1) * (n - 2) * (n - 3))) - ((3 * (n - 1) ^ 2) / ((n - 2) * (n - 3)))
  215.     ELSE
  216.         kurt# = 0
  217.     END IF
  218. '====================== geometrical mean ========================
  219. FUNCTION geomean# (x())
  220.     gm# = 1
  221.     FOR j = 1 TO n
  222.         gm# = gm# * x(j)
  223.     NEXT j
  224.     geomean# = gm# ^ (1 / n)
  225.  
  226. '============ mean square error ===================
  227. FUNCTION rms# (x())
  228.     ms# = 0
  229.     FOR j = 1 TO n
  230.         ms# = ms# + x(j) ^ 2
  231.     NEXT j
  232.     rms# = SQR(ms# / n)
  233.