Refresher on Probability and Matrix Operations
Outline
Matrix operations
Vectors and summation
Probability
Definitions
Calculating probabilities
Probability properties
Probability formulae
Random variables
Joint Probabilities
Definition
Independence
Conditional Probabilities
Matrix operations
Here are some of the important properties of matrix and vector
operations. You don't need to memorize them, but you should
be able to apply them to questions on your problem sets.
$(A + B)' = A' + B'$
$(AB)' = B'A'$
$(ABC)' = C'B'A'$
$A(B + C) = AB + AC$
$(L^{-1})^{-1} = L$
$(LM)^{-1} = M^{-1}L^{-1}$
$(LMN)^{-1} = N^{-1}M^{-1}L^{-1}$
What's different between the matrices A, B, and C versus L,
M, and N?
Since we are taking the inverses of the latter group, they must
be square and full rank. This is known as being non-singular.
Think of the matrix as a system of equations: each row of M is
a set of coefficients for the values of a vector of unknown
variables, say r, and solutions, s. If the matrix is invertible, you
can solve $Mr = s$ as $r = M^{-1}s$.
In the context of regression, the M matrix is our observed
covariates (usually called X), s is a vector of outcomes (y), and
r is the vector of coefficients that we are trying to find (β).
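As a quick illustration, here is a minimal Python sketch (assuming NumPy is installed; the matrix entries and data are made up) of solving $Mr = s$ and its regression analogue:

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 3.0]])   # square, full-rank (non-singular) matrix
    s = np.array([5.0, 10.0])    # vector of solutions

    r = np.linalg.solve(M, s)    # solves Mr = s, i.e., r = M^{-1}s
    print(r)                     # -> [1. 3.]

    # Regression analogue: beta solves the normal equations (X'X)beta = X'y.
    X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
    y = np.array([1.1, 1.9, 3.2, 3.8])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta)                  # -> intercept and slope estimates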
Vectors and summation
Define l as a conformable vector of ones.
We can express a sum in terms of vector multiplication:
$\sum_{i=1}^{n} x_i = l'x$
We can do the same for a sum of squares of x:
$\sum_{i=1}^{n} x_i^2 = x'x$
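A small Python check of both identities (assuming NumPy; the vector is made up):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    l = np.ones_like(x)          # conformable vector of ones

    print(l @ x)                 # l'x = 10.0, the sum of x
    print(x @ x)                 # x'x = 30.0, the sum of squares
    print(x.sum(), (x @ x) == (x**2).sum())  # same results directly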
Preliminary definitions
The sample space $\Omega$ is the set of all possible outcomes of our
"experiment."
Note that "experiment" has the meaning in probability theory of
any situation in which the final outcome is unknown, and is
distinct from the way that we will define an experiment in class.
An event is a collection of possible realizations from the
sample space.
Events are disjoint if they do not share a common element, i.e.,
$A \cap B = \emptyset$.
An elementary event is an event that only contains a single
realization from the sample space.
Lastly, though getting a bit ahead, two events are statistically
independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$.
Calculating probabilities
The classical definition of probability states that, for a sample
space $\Omega$ containing $N$ equally likely elementary events, the
probability of an event $A$ is the ratio of the number of
elementary events in $A$ to that of $\Omega$, i.e.,
$\Pr(A) = \frac{N_A}{N}$
The axiomatic definition of probability defines probability by
stating that
1. $\Pr(A) \ge 0$ for any event $A$
2. $\Pr(\Omega) = 1$
3. $\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i)$ for pairwise disjoint $A_1, \dots, A_n$
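Both definitions can be checked by brute-force enumeration. Here is a small Python sketch for one die roll (the events are made up for illustration):

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}          # sample space of one die roll
    A = {2, 4, 6}                       # event: roll is even
    B = {1, 3}                          # event disjoint from A

    pr = lambda E: Fraction(len(E), len(omega))   # classical definition
    print(pr(A))                        # -> 1/2
    print(pr(A | B) == pr(A) + pr(B))   # -> True: additivity for disjoint events
    print(pr(omega))                    # -> 1: the second axiom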
Probability properties
Let A and B be events in the sample space $\Omega$.
$\Pr(A^c) = 1 - \Pr(A)$
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
$\Pr(A \cup B) \le \Pr(A) + \Pr(B)$ (Boole's inequality)
$\Pr(A \cap B) \ge \Pr(A) + \Pr(B) - 1$ (Bonferroni's inequality)
Probability formulae
The conditional probability of an event A given that an event B
has occurred is
$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$, provided $\Pr(B) > 0$.
This equation reflects that you have received additional
information about the probability of A from knowing that B
occurred. $\Pr(B)$ is in the denominator because the sample space
has been reduced from the full space $\Omega$ to just that portion in
which B arises.
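A worked example in the same brute-force style (a Python sketch with made-up events): conditioning on "the roll is even" leaves three equally likely outcomes, so the chance of a 2 rises from 1/6 to 1/3.

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}
    A = {2}                      # event: roll a 2
    B = {2, 4, 6}                # event: roll is even

    pr = lambda E: Fraction(len(E), len(omega))
    print(pr(A & B) / pr(B))     # Pr(A | B) = (1/6) / (1/2) -> 1/3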
The law of total probability holds that, for a countable partition
$B_1, B_2, \dots$ of $\Omega$ (with $\Pr(B_i) > 0$ for each $i$), then
$\Pr(A) = \sum_{i} \Pr(A \mid B_i)\Pr(B_i)$
Bayes' rule states that
$\Pr(B_j \mid A) = \frac{\Pr(A \mid B_j)\Pr(B_j)}{\sum_{i} \Pr(A \mid B_i)\Pr(B_i)}$
for a partition as above, which reduces to
$\Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A \mid B)\Pr(B) + \Pr(A \mid B^c)\Pr(B^c)}$
for the two-event case.
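A numeric sketch of the two-event case in Python (the test characteristics here are hypothetical):

    p_B = 0.02                 # Pr(B): prior probability of a condition
    p_A_given_B = 0.95         # Pr(A | B): positive test given the condition
    p_A_given_notB = 0.10      # Pr(A | B^c): false-positive rate

    # Denominator via the law of total probability
    p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
    p_B_given_A = p_A_given_B * p_B / p_A      # Bayes' rule
    print(round(p_B_given_A, 3))               # -> 0.162

Even with an accurate test, the posterior stays modest because the condition is rare; the denominator is dominated by false positives.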
Random variables
A random variable is actually a function that maps every
outcome in the sample space $\Omega$ to the real line $\mathbb{R}$. Formally,
$X : \Omega \to \mathbb{R}$.
If we want to find the probability of some subset of $\mathbb{R}$, we can
induce a probability onto a random variable X. Let $X(\omega) = x$.
Then,
$\Pr(X = x) = \Pr(\{\omega \in \Omega : X(\omega) = x\})$
Define the cumulative distribution function (CDF), $F_X(x)$, as
$\Pr(X \le x)$. The CDF has three important properties:
1. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
2. $x_1 < x_2 \implies F_X(x_1) \le F_X(x_2)$ (i.e., the CDF is non-decreasing)
3. $F_X(x)$ is right-continuous
A continuous random variable has a sample space with an
uncountable number of outcomes. Here, the CDF is defined as
$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
For a discrete random variable, which has a countable number
of outcomes, the CDF is defined as
$F_X(x) = \sum_{x_i \le x} \Pr(X = x_i)$
We can define the probability density function (PDF) for a
continuous variable as
$f_X(x) = \frac{d}{dx} F_X(x)$
by the Fundamental Theorem of Calculus.
It can be defined for a discrete random variable as
$f_X(x) = \Pr(X = x)$
Note that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ for a continuous random variable and
$\sum_{x_i} f_X(x_i) = 1$ for a discrete one.
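A Python sketch for a fair die (a discrete example with an assumed uniform PMF) that exhibits the CDF properties above:

    from fractions import Fraction

    f = {x: Fraction(1, 6) for x in range(1, 7)}            # PMF of a fair die
    F = lambda x: sum(p for v, p in f.items() if v <= x)    # CDF: sum of PMF

    print(F(3))             # -> 1/2
    print(F(0), F(6))       # -> 0 and 1: the limit properties
    print(sum(f.values()))  # -> 1: the PMF sums to one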
Joint Probabilities
Previously, we considered the distribution of a lone
random
variable. Now we will consider the joint distribution of several
random variables. For simplicity, we will restrict ourselves to
the case of two random variables, but the provided results can
easily be extended to higher dimensions.
The joint cumulative distribution function (joint CDF),
$F_{X,Y}(x, y)$, of the random variables X and Y is defined by
$F_{X,Y}(x, y) = \Pr(X \le x, Y \le y)$
As with any CDF, $F_{X,Y}(x, y)$ must equal 1 as x and y go to
infinity.
The joint probability mass function (joint PMF), $f_{X,Y}(x, y)$, is
defined by
$f_{X,Y}(x, y) = \Pr(X = x, Y = y)$
The joint probability density function (joint PDF), $f_{X,Y}(x, y)$, is
defined by
$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$
The marginal cumulative distribution function (marginal CDF)
of X is
$F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y)$
The marginal PMF of X is
$f_X(x) = \sum_{y} f_{X,Y}(x, y)$
The marginal PDF of X is
$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
You are "integrating out" y from the joint PDF.
Note that, while a marginal PDF (PMF) can be found from a
joint PDF (PMF), the converse is not true: there are an infinite
number of joint PDFs (PMFs) that could give rise to a given
marginal PDF (PMF).
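A Python sketch of marginalizing a made-up joint PMF over two binary variables:

    from fractions import Fraction

    joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
             (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}   # hypothetical joint PMF

    f_X = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}
    f_Y = {y: sum(p for (_, yv), p in joint.items() if yv == y) for y in (0, 1)}
    print(f_X)   # marginal of X: 1/2 and 1/2
    print(f_Y)   # marginal of Y: 3/8 and 5/8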
Independence
If X and Y are independent, then
$F_{X,Y}(x, y) = F_X(x) F_Y(y)$
and
$f_{X,Y}(x, y) = f_X(x) f_Y(y)$
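For a concrete check (two fair coin flips, enumerated in Python), the joint PMF built from the four equally likely outcomes factors into its marginals:

    from fractions import Fraction
    from itertools import product

    outcomes = list(product([0, 1], repeat=2))                # four equally likely pairs
    joint = {o: Fraction(1, len(outcomes)) for o in outcomes}
    f_X = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}
    f_Y = {y: sum(p for (_, yv), p in joint.items() if yv == y) for y in (0, 1)}
    print(all(joint[(x, y)] == f_X[x] * f_Y[y] for (x, y) in joint))  # -> True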
Conditional Probabilities
The conditional PDF (PMF) of Y given $X = x$ is defined by
$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$, provided $f_X(x) > 0$.
As for any PDF (PMF), over the support of Y, the conditional
PDF (PMF) must integrate (sum) to 1. It must also be
non-negative for all real values.
For discrete random variables, we see that the conditional PMF
is
$f_{Y \mid X}(y \mid x) = \Pr(Y = y \mid X = x)$
Question: What is random in the conditional distribution of Y
given $X = x$?
If X and Y are independent, then
$f_{Y \mid X}(y \mid x) = f_Y(y)$
This implies that knowing X gives you no additional ability to
predict Y, an intuitive notion underlying independence.
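Returning to the made-up joint PMF from the marginal example, a Python sketch of the conditional PMF shows both the definition and this independence test in action:

    from fractions import Fraction

    joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
             (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}
    f_X = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}

    f_Y_given_X = {(x, y): joint[(x, y)] / f_X[x] for (x, y) in joint}
    print(f_Y_given_X[(0, 0)], f_Y_given_X[(0, 1)])  # -> 1/4, 3/4 given X = 0
    print(f_Y_given_X[(1, 0)], f_Y_given_X[(1, 1)])  # -> 1/2, 1/2 given X = 1

The conditional PMF changes with x, so X and Y are not independent here; knowing X does improve your prediction of Y.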