Logit regression
If one assumes that the probability $P(i \to j)$ that actor $i = 1..n$ chooses alternative $j = 1..k$ is proportional (within the set of alternative choices) to the exponent of a linear combination of $p = 1..p$ data values $X_{ijp}$ related to $i$ and $j$, one arrives at the logit model, or more formally:
Assume $P(i \to j) \sim w_{ij} \\ w_{ij} := \exp(v_{ij}) \\ v_{ij} := \sum\limits_{p} \beta_p X_{ijp} $
Thus $L(i \to j) := \log(P(i \to j)) \sim v_{ij}$.
Consequently, $w_{ij} > 0$ and $P(i \to j) := { w_{ij} \over \sum\limits_{j'}w_{ij'}}$, since $\sum\limits_{j}P_{ij}$ must be $1$.
Note that:
- $v_{ij}$ is a linear combination of $X_{ijp}$ with weights $β_p$ as logit model parameters.
- the odds ratio ${P(i \to j) \over P(i \to j')}$ of choice $j$ against alternative $j'$ is equal to ${w_{ij} \over w_{ij'}} = \exp( v_{ij} - v_{ij'} ) = \exp \sum\limits_{p} \beta_p \left( X_{ijp}- X_{ij'p} \right)$
- this formulation does not require a separate beta index (aka parameter space dimension) per alternative choice $j$ for each exogenous variable.
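As a minimal sketch of the model above (assuming numpy; the array shapes `X[i, j, p]` and `beta[p]` and all names are ad hoc illustrations), the probabilities follow from a row-wise softmax over the $w_{ij}$:

```python
import numpy as np

def choice_probabilities(X, beta):
    """P(i -> j) for the logit model: a softmax over the alternatives j.

    X    : (n, k, P) data values X_ijp
    beta : (P,)      parameters beta_p
    """
    v = X @ beta                             # v_ij = sum_p beta_p * X_ijp, shape (n, k)
    v -= v.max(axis=1, keepdims=True)        # shifting v_ij cancels in the ratio (see the odds-ratio note)
    w = np.exp(v)                            # w_ij > 0
    return w / w.sum(axis=1, keepdims=True)  # P_ij, each row sums to 1
```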
observed data
Observed choices $Y_{ij}$ are assumed to be drawn from a repeated multinomial experiment with probabilities $P(i \to j)$.
Thus $P(Y) = \prod\limits_{i} \left[ N_i! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right]$ with $N_i := \sum\limits_{j} Y_{ij}$.
Thus $L(Y) := \log(P(Y))$
$= \log \prod\limits_{i} \left[ N_i! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right]$
$= C + \sum\limits_{ij} \left( Y_{ij} \times \log(P_{ij}) \right)$
$= C + \sum\limits_{i} \left[{\sum\limits_{j}Y_{ij} \times L(i \to j)}\right]$
$= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times \left(v_{ij} - \log \sum\limits_{j'}w_{ij'}\right)}\right]$
$= C + \sum\limits_{i} \left[{ \left( \sum\limits_{j}Y_{ij} \times v_{ij} \right) - N_i \times \log \sum\limits_{j}w_{ij}}\right]$
with $C = \sum\limits_{i} C_i$ and $C_i := \log (N_i!) - \sum\limits_{j} \log (Y_{ij}!)$, which is independent of $P_{ij}$ and $\beta_p$. Note that $N_i = 1 \implies C_i = 0$.
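A sketch of the last form of $L(Y)$ above, dropping the constant $C$ since it does not depend on the parameters (same hypothetical array shapes as before; the row maximum is subtracted only to compute the log-sum-exp stably):

```python
import numpy as np

def log_likelihood(X, Y, beta):
    """L(Y) up to the constant C: sum_i [ sum_j Y_ij * v_ij - N_i * log sum_j w_ij ]."""
    v = X @ beta                                    # v_ij, shape (n, k)
    N = Y.sum(axis=1)                               # N_i
    m = v.max(axis=1)                               # numerical stabilizer
    lse = m + np.log(np.exp(v - m[:, None]).sum(axis=1))  # log sum_j w_ij
    return float((Y * v).sum() - (N * lse).sum())
```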
specification
The presented form $v_{ij} := \beta_p \times X_{ij}^p$ (using Einstein notation from here on) is more generic than known implementations of logistic regression (such as in SPSS and R), where a set of $q = 1..q$ data values $X_i^q$ is given for each $i$ (with $X_i^0$ set to $1$ to represent the intercept for each $j$) and $(k−1) \times (q+1)$ parameters are to be estimated, thus $v_{ij} := \beta_{jq} \times X_i^q$ for $j = 2..k$. This requires a different beta for each combination of alternative choice and data value, causing an unnecessarily large parameter space.
The latter specification can be reduced to the more generic form by:
- assigning a unique $p$ to each $jq$ combination, represented by $A_{jq}^p$.
- defining $X_{ij}^p := A_{jq}^p \times X_i^q$ for $j = 2..k$, thus creating redundant and zero data values (sketched below).
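As an illustrative sketch (the helper name `expand` is ad hoc), this reduction is a one-hot expansion in which each $(j, q)$ combination gets its own slot $p$:

```python
import numpy as np

def expand(Xi, k):
    """Build X_ij^p (shape (n, k, (k-1)*Q)) from X_i^q (shape (n, Q)) so that
    beta_p X_ij^p reproduces beta_jq X_i^q for j = 2..k, with j = 1 as the
    reference alternative (v_i1 = 0). The loop implements the indicator A_jq^p."""
    n, Q = Xi.shape
    X = np.zeros((n, k, (k - 1) * Q))
    for j in range(1, k):                   # 0-based: alternatives 2..k
        X[:, j, (j - 1) * Q : j * Q] = Xi   # redundant and zero values elsewhere
    return X
```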
However, a generic model cannot be reduced to a specification with different $\beta$'s for each alternative choice unless the latter parameter space can be restricted to contain no more dimensions than the generic form. With large $n$ and $k$, the data values $X_{ijp}$ can be huge. To mitigate the data size, the following tricks can be applied:
- limit the set of combinations of $i$ and $j$ to the most probable or nearby $j$'s for each $i$ and/or cluster the other $j$'s.
- use only a sample from the set of possible $i$’s.
- support specific forms of data:
# | form | reduction | description |
---|---|---|---|
0 | $\beta_p X_{ij}^p$ | | general form of $p$ factors specific for each $i$ and $j$ |
1 | $\beta_p A_{jq}^p X_i^q$ | $X_{ij}^p := A_{jq}^p X_i^q$ | $q$ factors that vary with $i$ but not with $j$ |
2 | $\beta_p X_i^p X_j^p$ | $X_{ij}^p := X_i^p X_j^p$ | $p$ specific factors in simple multiplicative form |
3 | $\beta_{jq} X_i^q$ | | $q$ factors that vary with $i$ but not with $j$, with a separate $\beta_{jq}$ per alternative (the classical specification) |
4 | $\beta_p X_j^p$ | $X_{ij}^p := X_j^p$ | state constants $D_j$ |
5 | $\beta_j$ | | state dependent intercept |
6 | $\beta_p (J_i^p == j)$ | | usage of a recorded preference $J_i^p$ |
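Some of these forms avoid materializing $X_{ij}^p$ altogether. For instance, a sketch of form 2 (ad hoc names, assuming per-actor factors `Xi` and per-alternative factors `Xj`):

```python
import numpy as np

def v_form2(Xi, Xj, beta):
    """Form 2: v_ij = sum_p beta_p * X_i^p * X_j^p, without the (n, k, P) tensor.

    Xi : (n, P) factors per actor i;  Xj : (k, P) factors per alternative j.
    """
    return (Xi * beta) @ Xj.T   # shape (n, k)
```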
regression
The $\beta_p$'s are found by maximizing the log-likelihood $L(Y | \beta)$, which is equivalent to finding the maximum of $\sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times v_{ij} - N_i \times \log \sum\limits_{j}w_{ij}}\right]$
First order conditions, for each $p$: $0 = { \partial L \over \partial\beta_p } = \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times { \partial v_{ij} \over \partial \beta_p } - N_i \times { \partial \log \sum\limits_{j}w_{ij} \over \partial \beta_p }} \right]$
Thus, for each $p$: $\sum\limits_{ij} Y_{ij} \times X_{ijp} = \sum\limits_{ij} N_i \times P_{ij} \times X_{ijp}$ as ${ \partial v_{ij} \over \partial \beta_p } = X^p_{ij}$ and
${\partial \log \sum\limits_{j}w_{ij} \over \partial \beta_p } = {\sum\limits_{j} {\partial w_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = {\sum\limits_{j} w_{ij} \times {\partial v_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = {\sum\limits_{j} w_{ij} \times X_{ijp} \over \sum\limits_{j}w_{ij} } = \sum\limits_{j} P_{ij} \times X_{ijp}$
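In code, the first-order conditions say the score (gradient) is the observed minus the expected sum of data values; a minimal sketch (plain gradient ascent with an ad hoc step size, shown only to illustrate the conditions; Newton or BFGS would converge much faster on this concave objective):

```python
import numpy as np

def score(X, Y, beta):
    """dL/dbeta_p = sum_ij (Y_ij - N_i * P_ij) * X_ijp (observed minus expected)."""
    v = X @ beta
    w = np.exp(v - v.max(axis=1, keepdims=True))
    P = w / w.sum(axis=1, keepdims=True)   # P_ij
    N = Y.sum(axis=1, keepdims=True)       # N_i
    return np.einsum('ij,ijp->p', Y - N * P, X)

def fit(X, Y, steps=5000, lr=1e-3):
    """Gradient ascent on the log-likelihood until the score is ~0."""
    beta = np.zeros(X.shape[2])
    for _ in range(steps):
        beta += lr * score(X, Y, beta)
    return beta
```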
example
[logit regression of rehousing](logit_regression_of_rehousing).