Additional statistics methods for arrays | SuperCollider 3.13.0 Help

MathLib provides additional methods for Collection and SequenceableCollection.

Here's a pseudo-normal distribution which we'll analyse in the following:

    a = {45.0.sum3rand + 65}.dup(1000);

Various measures to characterise the distribution

Measures such as variance and skewness are commonly calculated on the assumption that the data is a sample from the true (larger) population which we wish to know about. The above measures apply in this case. The following alternatives are used when the data represents the entire population:

    a.variancePop
    a.stdDevPop // Standard deviation
    a.skewPop
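To make the sample/population distinction concrete, here is a minimal sketch in plain sclang that computes both variances by hand for a small made-up data set. It assumes the usual textbook definitions (divide by N for a population, by N - 1 for a sample) and is not taken from MathLib's source; the variable names are illustrative only.

```supercollider
(
~x = [2, 4, 4, 4, 5, 5, 7, 9];          // made-up data, 8 values
~mean = ~x.sum / ~x.size;                // 40 / 8 = 5
~sumSq = (~x - ~mean).squared.sum;       // sum of squared deviations = 32
~popVar = ~sumSq / ~x.size;              // population variance: 32 / 8 = 4
~sampleVar = ~sumSq / (~x.size - 1);     // sample variance: 32 / 7, about 4.57
[~popVar, ~sampleVar].postln;
)
```

If MathLib follows these definitions, ~x.variancePop should agree with ~popVar.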

percentile

Finds the requested percentile(s) of the distribution, specified as float values from 0 to 1, e.g. for the 90th percentile use 0.9.

    a.percentile // By default you get the 25th, 50th and 75th percentiles
    a.percentile(0.25) // Just the 25th percentile
    a.percentile([0.1, 0.9]) // 10th and 90th percentiles

Histogramming

histo partitions the distribution into a set of equal-width bins (default 100 bins):

    a.histo.plot
    a.histo(10).plot
    a.histo(1000).plot

histoBands gives you the corresponding bin positions. It takes the same arguments as histo; the argument 'center' determines whether you get the center of each bin (default 0.5), its left edge (0.0), its right edge (1.0), or anything in between. This can be useful for creating an annotated plot.

    a.histoBands

The weighted mean and variance functions can be used to estimate the mean and variance if all you have is histogram-like data:

    [2, 4, 6].wmean([1,1,2]) // [2, 4, 6] are the bin positions, [1, 1, 2] the heights of the bins
    [2, 4, 6].wmean([1,1,4])
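The arithmetic behind a weighted mean can be sketched by hand in plain sclang, using the same bin positions and heights as the first example above. The weighted variance shown here uses one common normalisation (dividing by the total weight); that choice is an assumption and may differ from what MathLib does.

```supercollider
(
~pos = [2, 4, 6];                                // bin positions
~heights = [1, 1, 2];                            // bin heights, used as weights
~wm = (~pos * ~heights).sum / ~heights.sum;      // (2 + 4 + 12) / 4 = 4.5
// weighted variance about the weighted mean, normalised by the total weight
~wv = ((~pos - ~wm).squared * ~heights).sum / ~heights.sum;   // 11 / 4 = 2.75
[~wm, ~wv].postln;
)
```

Raising the height of the last bin pulls the weighted mean towards 6, which is why the second call above gives a larger value than the first.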

Statistical measures of association

Pearson correlation

Kendall's W statistic

A non-parametric correlation test between separate raters' rankings of a common set of objects.

The input should be an array of arrays, each of the same size and containing integer rankings. The output varies from 0 (no inter-rater agreement) to 1 (perfect inter-rater agreement). The rankings can range over (0 .. N-1) or (1 .. N); the choice doesn't affect the statistic. The example used in Kendall's original paper (W value should be around 0.16):

    [ [5,4,1,6,3,2], [2,3,1,5,6,4], [4,1,6,3,2,5] ].kendallW

    // If we generate data in which there's perfect agreement we should always get 1 for the W-value:
    (0..10).scramble.dup(5).postcs.kendallW
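As a cross-check on the figure quoted above, the statistic can be worked through by hand. The sketch below uses the standard tie-free formula W = 12 S / (m^2 (n^3 - n)), where m is the number of raters, n the number of objects and S the sum of squared deviations of the per-object rank totals from their mean. It is plain sclang and not MathLib's implementation; the variable names are illustrative.

```supercollider
(
~ranks = [ [5,4,1,6,3,2], [2,3,1,5,6,4], [4,1,6,3,2,5] ];
~m = ~ranks.size;                                   // 3 raters
~n = ~ranks[0].size;                                // 6 objects
~totals = ~ranks.flop.collect { |col| col.sum };    // rank total per object: [11, 8, 8, 14, 11, 11]
~meanTotal = ~totals.sum / ~n;                      // 63 / 6 = 10.5
~s = (~totals - ~meanTotal).squared.sum;            // 25.5
~w = (12 * ~s) / (~m.squared * (~n.cubed - ~n));    // 306 / 1890, about 0.162
~w.postln;
)
```

Because only deviations from the mean total enter S, shifting every ranking from (1 .. N) to (0 .. N-1) leaves W unchanged.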

Principal Component Analysis

The pc1 method finds the first principal component of a multidimensional data distribution. It doesn't calculate the full PCA, but finds the first PC via expectation-maximisation. The data must already be centred (mean removed) and any scaling issues dealt with appropriately. The termination threshold can be set via an argument to pc1.

    ~data = 10000.collect{ if(0.5.coin){[-1, -0.5]}{[1, 0.5]}.collect{|v| v + 0.95.sum3rand} };
    GNUPlot.new.scatter(~data)
    ~data.pc1
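Since the data has to be centred before it is handed to pc1, here is a minimal sketch of one way to remove the mean from an array of points in plain sclang. The data values and variable names are made up for illustration, and no rescaling is attempted.

```supercollider
(
~raw = [ [1, 2], [3, 6], [5, 10] ];               // made-up 2-D points
~mu = ~raw.sum / ~raw.size;                        // per-dimension mean: [3, 6]
~centred = ~raw.collect { |point| point - ~mu };   // [[-2, -4], [0, 0], [2, 4]]
~centred.postln;
)
```

The centred array can then be passed on, e.g. ~centred.pc1.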

Autocorrelation

Fitting

linearFit offers fitting via simple linear regression (least squares).

    b = a.deepCopy.sort; // sort
    ~aFit = a.linearFit; // fit
    ~bFit = b.linearFit;
    a.plot("a", minval: 20, maxval: 120);
    b.plot("b", minval: 20, maxval: 120);
    Array.series(1000, ~aFit.at(0), ~aFit.at(1)).plot("a fit", minval: 20, maxval: 120);
    Array.series(1000, ~bFit.at(0), ~bFit.at(1)).plot("b fit", minval: 20, maxval: 120)
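For reference, an ordinary least-squares line can be computed by hand as follows. This is a plain sclang sketch, not MathLib's code. It assumes the x-values are simply the array indices 0, 1, 2, ... and that the result is ordered as intercept then slope; both are inferred from the way the fit is passed to Array.series above and are not confirmed here.

```supercollider
(
~y = [1, 3, 5, 7, 9];                              // made-up data lying exactly on y = 1 + 2x
~x = Array.series(~y.size, 0, 1);                  // indices used as x-values: [0, 1, 2, 3, 4]
~mx = ~x.sum / ~x.size;                            // mean of x = 2
~my = ~y.sum / ~y.size;                            // mean of y = 5
// slope = covariance of x and y divided by the variance of x
~slope = ((~x - ~mx) * (~y - ~my)).sum / (~x - ~mx).squared.sum;   // 20 / 10 = 2
~intercept = ~my - (~slope * ~mx);                 // 5 - 4 = 1
[~intercept, ~slope].postln;
)
```

Note the parentheses around ~slope * ~mx: sclang evaluates binary operators strictly left to right, so they are needed to get the intended result.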

theilSenFit offers a robust linear regression by finding the median slope of all data pairs.

    b = a.deepCopy.sort; // sort
    ~aFit = a.theilSenFit; // fit
    ~bFit = b.theilSenFit;
    a.plot("a", minval: 20, maxval: 120);
    b.plot("b", minval: 20, maxval: 120);
    Array.series(1000, ~aFit.at(0), ~aFit.at(1)).plot("a fit", minval: 20, maxval: 120);
    Array.series(1000, ~bFit.at(0), ~bFit.at(1)).plot("b fit", minval: 20, maxval: 120)
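The "median slope of all data pairs" idea can be sketched by hand in plain sclang. The data below is made up, with one deliberate outlier, and the x-values are assumed to be the array indices. Only the slope is computed; how MathLib derives the intercept is not covered here.

```supercollider
(
~y = [1, 3, 5, 7, 50, 11];                  // y = 1 + 2x, except for an outlier at index 4
~slopes = List.new;
~y.do { |yi, i|
    ~y.do { |yj, j|
        // visit each pair of points once (i < j) and record its slope
        if(j > i) { ~slopes.add((yj - yi) / (j - i)) };
    };
};
~sorted = ~slopes.asArray.sort;             // 15 pairwise slopes in ascending order
~slope = ~sorted[~sorted.size.div(2)];      // middle element of an odd-length list = median
~slope.postln;                              // 2
)
```

Ten of the fifteen pairwise slopes equal 2, so the single outlier cannot move the median, whereas a least-squares fit to the same data would be pulled towards it.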
