Pitfall: Non-Linearity of Marginal Effects in Logistic Regression


In the previous post on logistic regression, it was shown that the absolute coefficients of logistic regression models are difficult to interpret meaningfully because of their reference units. Another challenge in interpreting logistic regression weights has not yet been explicitly addressed: the effect of a one-unit increase in an independent variable on the dependent variable, known as the marginal effect, always depends on the specific value of that independent variable as well as on the values of all other independent variables in the model.
Estimation by Indirect Means
This is because the probability of occurrence of a given category of a categorical dependent variable is estimated in logistic regression in a non-linear manner, essentially through an intermediate step: in a first step, the value of an unobservable variable, a so-called latent variable, is modeled using an ordinary linear model. This latent variable reflects the "propensity" for the occurrence of the category of the dependent variable under consideration. (The occurrence of the category of interest is conventionally denoted as y = 1.)
$$y_{l} = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{n}x_{n} + \epsilon$$
The logistic regression model assumes that the observed categorical variable takes on the value of interest whenever the latent variable exceeds an arbitrarily chosen threshold of 0. To determine the probability of occurrence of the category of interest, the modeled values of the latent variable and the associated regression weights from the linear model must be transformed. This transformation requires knowledge of the distribution of the estimation errors. The logistic regression model assumes that the errors follow a logistic distribution. Since not only the functional form but also the exact distribution of the errors must be known, the non-estimable variance of the error distribution and its conditional expected value are fixed at $\sigma^{2} = \pi^{2}/3$ and $E(\epsilon|x) = 0$. This yields the fundamental equation of the logistic model:
$$P(y = 1 \mid x) = \frac{e^{\beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{n}x_{n}}}{1 + e^{\beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{n}x_{n}}} = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{e^{\mathrm{Logit}}}{1 + e^{\mathrm{Logit}}}$$
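As a minimal numerical sketch of this transformation (the coefficient values are hypothetical, chosen purely for illustration), the logistic function maps any value of the linear predictor onto a probability between 0 and 1:

```python
import numpy as np

# Hypothetical coefficients for a model with two predictors (illustration only)
beta = np.array([-1.0, 0.8, 0.5])  # beta_0 (intercept), beta_1, beta_2

def predicted_probability(x):
    """Map predictor values x onto P(y = 1 | x) via the logistic function."""
    logit = beta[0] + beta[1:] @ x  # the linear predictor from the latent model
    return np.exp(logit) / (1 + np.exp(logit))

print(predicted_probability(np.array([1.0, 2.0])))  # approx. 0.69
```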
Due to this estimation method and transformation, logistic regression coefficients represent the linear relationship between the independent variables and the latent variable, or equivalently the logits, i.e. the logarithmized odds, for the category of the dependent variable under consideration. However, the relationship between logits, odds ratios, and regression coefficients on the one hand and the probabilities of occurrence of the categories of the dependent variable on the other is not linear. This non-linearity is evident throughout the equations of the logistic regression model. In particular, when examining the exponentiated logistic regression coefficients, the odds ratios, it becomes immediately clear that the regression weights enter multiplicatively rather than additively: odds ratios indicate a multiplicative (factorial) change in the odds of occurrence, and the absolute change in probability this implies naturally depends on the "baseline probability."
Basic equation of the logistic model:
$$P(y = 1 \mid x) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{e^{\mathrm{Logit}}}{1 + e^{\mathrm{Logit}}}$$
...solved for the logit:
$$\mathrm{Logit} = \ln\frac{P}{1 - P} = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{n}x_{n}$$
...and additionally exponentiated:
$$\mathrm{Odds} := e^{\mathrm{Logit}} = e^{\ln\frac{P}{1 - P}} = e^{\beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{n}x_{n}} = e^{\beta_{0}} \times e^{\beta_{1}x_{1}} \times \ldots \times e^{\beta_{n}x_{n}}$$
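This baseline dependence can be made concrete in a few lines of Python. The coefficient of 0.7 below is a hypothetical value used only for illustration; the sketch shows that the same odds multiplier produces very different changes in probability depending on where the baseline probability lies:

```python
import numpy as np

# A hypothetical coefficient of 0.7 corresponds to an odds ratio of e^0.7 ≈ 2.01:
# each one-unit increase in x multiplies the odds by about 2.
odds_ratio = np.exp(0.7)

for p_baseline in (0.05, 0.50, 0.95):
    odds = p_baseline / (1 - p_baseline)   # probability -> odds
    new_odds = odds * odds_ratio           # multiplicative effect on the odds
    p_new = new_odds / (1 + new_odds)      # odds -> probability
    print(f"baseline {p_baseline:.2f} -> {p_new:.2f} "
          f"(difference {p_new - p_baseline:+.2f})")
```

The same odds ratio of roughly 2 shifts a baseline probability of 0.50 by about +0.17, but a baseline of 0.05 by only about +0.05 and a baseline of 0.95 by about +0.02.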
An Intuitive Illustration
Why the marginal effects of the independent variables depend on the exact values of all independent variables can be understood intuitively as follows: an increase in the (latent) propensity for the occurrence of the considered category by a certain amount has a negligible effect on the predicted probability of the observed category when the propensity is already very high or very low, that is, far above or below the threshold. If the (latent) propensity lies close to the threshold, however, an increase by the same amount is much more likely to be decisive for the predicted probability. And the existing propensity, in turn, depends on the exact values of all independent variables.
The non-linearity of the marginal effects becomes particularly evident in graphical representations: when the probability of occurrence is plotted against the values of an independent variable, the slope of the probability curve is not constant.
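For a model with a single predictor, this slope can be written analytically as $\partial P / \partial x = \beta_{1} P (1 - P)$, which is largest where $P = 0.5$, i.e. exactly at the threshold. A short sketch with hypothetical coefficients makes the non-constant slope visible:

```python
import numpy as np

# Hypothetical single-predictor model: beta_0 = -3, beta_1 = 1
beta0, beta1 = -3.0, 1.0

def prob(x):
    return 1 / (1 + np.exp(-(beta0 + beta1 * x)))

# Analytic marginal effect of x: dP/dx = beta_1 * P * (1 - P)
for x in (0.0, 3.0, 6.0):
    p = prob(x)
    print(f"x = {x:.0f}: P = {p:.3f}, marginal effect = {beta1 * p * (1 - p):.3f}")
```

At x = 3 (where P = 0.5) the marginal effect is 0.25, while at x = 0 and x = 6 it drops to about 0.045, even though the coefficient is the same everywhere.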

Solution: AME and MEM
There are various ways to deal with this non-linearity when interpreting logistic models. One approach is to calculate or plot (changes in) probabilities of occurrence for different combinations of values, both of the independent variable of interest and of the remaining independent variables, allowing for direct contrasts. If marginal effects are to be expressed in a compact, summarizing metric, however, AMEs or MEMs can be computed. The average marginal effect (AME) is the effect of a one-unit increase in the independent variable, averaged over all available observations. The marginal effect at the mean (MEM), by contrast, evaluates the effect of a one-unit increase at the mean values of all independent variables. It is important to note, however, that AMEs do not adequately capture the substantively meaningful non-linearity of the effects: crucial information about the effects of the independent variables is simply averaged away.
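In Python, both quantities can be obtained, for example, from statsmodels via `get_margeff`. The following sketch uses simulated data with made-up coefficients purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from a known logistic model (coefficients are made up)
rng = np.random.default_rng(42)
X = sm.add_constant(rng.normal(size=(500, 2)))
true_beta = np.array([-0.5, 1.0, -0.8])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

result = sm.Logit(y, X).fit(disp=0)

# AME: marginal effects averaged over all observations
print(result.get_margeff(at="overall").summary())

# MEM: marginal effects evaluated at the means of the covariates
print(result.get_margeff(at="mean").summary())
```

Because the probability curve is steepest near P = 0.5, the two summaries will generally differ; neither replaces an inspection of the effects across the full range of the data.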