% !TeX program = lualatex
% =====================================================================
%  statistics-probability.tex
%  Probability and statistics, vocabulary and figures: counting (binomial
%  C(n,k), arrangement A(n,k), factorial), the blackboard operators for
%  probability PP and expectation EE, variance/covariance, the common
%  distributions, and the distribution/density functions. The blackboard
%  letters are reached by the doubled PP and EE; the conditional bar is the
%  keyword "mid".
% =====================================================================
\documentclass[
  margins=24,
  font=Latin Modern Roman,
  size=12,
  linespread=1.4,
  lang=en
]{scholatex}
\begin{document}

let title = <Red b 18pt c>
let topic = <Navy b section ROMAN>
let p     = <tab>
let h1    = <Navy b section>
let note  = <Gray i>

<title>scholatex — probability

% =====================================================================
<h1>Counting
% =====================================================================

The binomial coefficient is $C(n, k)$, written with two arguments. With a
single argument, $C(t)$ stays an ordinary function, so the same letter serves
both uses. The arrangement is $A(n, k)$, and the factorial is $factorial(n)$,
or simply $n!$ in running maths.
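
<note>For reference, the usual factorial forms of the two counts, assuming $0 <= k <= n$.

$A(n, k) = factorial(n) / factorial(n-k)$

$C(n, k) = A(n, k) / factorial(k)$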

<note>The Pascal rule and the binomial theorem.

$C(n, k) = C(n-1, k-1) + C(n-1, k)$

$(a + b)^n = sum(k=0,n) C(n, k) a^k b^(n-k)$
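
<note>A worked instance of the Pascal rule, taking $n = 5$ and $k = 2$ for illustration.

$C(5, 2) = C(4, 1) + C(4, 2) = 4 + 6 = 10$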

% =====================================================================
<h1>Probability and expectation
% =====================================================================

A doubled capital is the blackboard letter: $PP$ is the blackboard P and
$EE$ the blackboard E, the same doubling rule as $NN$, $RR$ and $CC$. So
$PP(A)$ is a probability, $EE(X)$ an expectation. The conditional bar is the
keyword mid, so $PP(A mid B)$ reads with proper spacing.

<note>Total probability and the definition of the mean.

$PP(A) = PP(A mid B) PP(B) + PP(A mid bar(B)) PP(bar(B))$

$EE(X) = sum(k=1,n) k PP(X = k)$

% =====================================================================
<h1>Variance, deviation, covariance
% =====================================================================

The spread of a variable: $var(X)$ is the variance, $std(X)$ the standard
deviation, $cov(X, Y)$ the covariance of a pair.

<note>The König–Huygens identity.

$var(X) = EE(X^2) - EE(X)^2$

$cov(X, Y) = EE(X Y) - EE(X) EE(Y)$
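
<note>A worked instance of the variance identity, assuming $X$ is the score of a fair six-sided die.

$EE(X) = 7 / 2$ and $EE(X^2) = 91 / 6$

$var(X) = 91 / 6 - 49 / 4 = 35 / 12$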

% =====================================================================
<h1>Distributions
% =====================================================================

The usual laws name themselves: $normal(mu, sigma)$ is the normal law (the
second argument is the standard deviation, squared in the rendering),
$poisson(lambda)$ the Poisson law, $binomial(n, p)$ the binomial law.

<note>Reading a model.

$X$ follows $normal(0, 1)$, the standard normal.

$N$ follows $poisson(lambda)$, with $PP(N = k) = exp(-lambda) lambda^k / factorial(k)$.

$S$ follows $binomial(n, p)$, with $EE(S) = n p$ and $var(S) = n p (1 - p)$.
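
<note>A numerical instance, taking $n = 12$ and $p = 0.4$ for illustration.

$S$ follows $binomial(12, 0.4)$, with $EE(S) = 4.8$ and $var(S) = 2.88$.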

% =====================================================================
<h1>Distribution and density
% =====================================================================

The cumulative distribution function is $repart(X, x)$ and the density is
$densite(X, x)$. In both, the variable goes in subscript and the point in the
argument.

<note>Linking the two.

$repart(X, x) = PP(X <= x)$

For a continuous law, $repart(X, x) = int(t=-inf,x) densite(X, t)$.

% ---------------------------------------------------------------------
<topic>Weighted probability trees

<p>{ Two events. A node written with a single branch receives the
complementary branch automatically (!B, with the complementary probability),
and the product along each path is printed at its leaf: }

<tree products:on>{
	A 0.3 {
		B 0.6
	}
	!A 0.7 {
		B 0.1
	}
}
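
<p>{ Reading the tree by hand, and assuming the automatic complements are
!B 0.4 under A and !B 0.9 under !A, the four leaves carry 0.18, 0.12, 0.07
and 0.63, which sum to 1. The two paths ending in B give the total
probability: }

$PP(B) = PP(B mid A) PP(A) + PP(B mid bar(A)) PP(bar(A)) = 0.18 + 0.07 = 0.25$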

% ---------------------------------------------------------------------
<topic>Probability laws

<p>{ The standard normal law, with $PP(-1 <= X <= 1.5)$ shaded: }

<plot normal:{0, 1} area:{-1, 1.5}>
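
<p>{ As a hand check on the shaded area, using rounded standard normal table
values rather than anything computed by the figure:
$PP(-1 <= X <= 1.5) = repart(X, 1.5) - repart(X, -1)$, which is about
0.9332 - 0.1587 = 0.7745. }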

<p>{ The binomial law $binomial(12, 0.4)$, with the bars of
$PP(3 <= X <= 6)$ highlighted: }

<plot binomial:{12, 0.4} area:{3, 6}>
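
<p>{ As a hand check, the highlighted bars should add up to
$PP(3 <= X <= 6) = sum(k=3,6) C(12, k) (0.4)^k (0.6)^(12-k)$, which is about
0.758 (a rounded value worked out by hand, not read from the figure). }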

% ---------------------------------------------------------------------
<topic>Descriptive statistics

<p>{ The bar chart of a discrete series, a dictionary of frequencies: }

<stats kind:bars data:{1: 4 | 2: 7 | 3: 2 | 4: 5}>

<p>{ Data can also be a Lua table declared with let and passed by name, so
that one source feeds several figures. Map keys come out sorted; a list of
pairs keeps the order in which it was written: }

let notes = {
	Mathematiques = 15.5,
	Francais = 12.0,
	Histoire = 14.0,
	Physique = 16.5,
	Anglais = 18.0
}

<stats kind:bars data:notes>
<stats kind:pie data:notes>

<p>{ The same tag with categories: }

<stats kind:bars data:{Walk: 5 | Bus: 3 | Bike: 2}>

<p>{ The histogram of a grouped series. The classes are unequal, so the
height of each bar is the density (count over width), not the count: }

<stats kind:histogram bounds:{0, 5, 10, 20} counts:{3, 7, 2}>
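
<p>{ With these bounds the class widths are 5, 5 and 10. Assuming the density
is the plain count over width, as stated, the three bar heights are: }

$3 / 5 = 0.6$, $7 / 5 = 1.4$ and $2 / 10 = 0.2$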

<p>{ The pie chart, twelve o'clock start, clockwise: }

<stats kind:pie data:{Walk: 5 | Bus: 3 | Bike: 2}>

<p>{ The box plot, with quartiles taken by the French secondary-school
convention, where each quartile is a value of the series itself: }

<stats kind:boxplot data:{12, 15, 9, 21, 14, 15, 18, 11, 16}>
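
<p>{ For this series, sorted as 9, 11, 12, 14, 15, 15, 16, 18, 21, and
assuming the convention meant is the usual one (the first quartile is the
smallest value with at least a quarter of the data at or below it, and
likewise three quarters for the third), the box should run from 12 to 16
with the median at 15. If the whiskers extend to the extremes, they reach 9
and 21. }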

<p>{ The scatter plot with its least-squares line: }

<stats kind:scatter data:{(1, 2.1) (2, 2.6) (3, 3.4) (4, 3.9) (5, 4.2) (6, 5.1)} fit:on>
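
<p>{ A hand computation for these six points, assuming fit:on draws the
ordinary least-squares line of y on x: the means are 3.5 and 3.55, the slope
$cov(X, Y) / var(X)$ comes to 0.58 and the intercept to 1.52, so the fitted
line should be close to: }

$y = 0.58 x + 1.52$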

\end{document}
