0 \label{sel} % \end{align} % where $\nu$ and $\epsilon$ are potentially correlated. Assume that % $\nu$ and $\epsilon$ are independent of $x$ and $z$. We stated that % it is possible to identify this model without making any parametric % assumptions about the distribution of $\nu$ and $\epsilon$. The key % to showing identification is to assume that there is a set of values % of $z$, $Z^\infty$ that occur with positive probability such that % $P(g(z) > \nu|z \in Z^\infty) = 1$. In this set, there is no % selection problem since the fact that $z \in Z^\infty$ and $y$ is % observed tells us nothing about the value of $\nu$. This means that % $\mu(x)$ could be consistently estimated by standard methods using % just the observations with $z \in Z^\infty$. Given $\mu(x)$, the rest % of the parameters are easy to identify. % This sort of identification argument is often referred to % identification at infinity because it relies on pushing $z$ off to an % extreme value. It is a fairly common method of proving % identification. Unfortunately, at least for selection models, it is % quite fragile. If there is no $z$ with $P(g(z) >\nu) = 1$, then % identification completely breaks down and there is no finite bound on % $\mu(x)$ unless the support of $y$ is bounded. In practice this means % that estimating the model nonparametrically can be very sensitive to % the exact choice of method. % Manski (1989) pointed out that though we cannot bound the conditional % mean of $y$, we can always bound the conditional distribution. % We can write the conditional distribution of $y$ given $x$ as: % \[ % F(y|x) = F(y|x,z=1)P(z=1|x) + F(y|x,z=0)P(z=0|x) % \] % The only unobserved part of the right side of the equation is % $F(y|x,z=0)$. Without some assumptions, all we know about % $F(y|x,z=0)$ is that it is between $0$ and $1$. 
% This suggests the following bound on
% the conditional distribution of $y$ given $x$:
% \begin{align}
%   F(y|x,z=1)P(z=1|x) \leq F(y|x) \leq F(y|x,z=1) P(z=1|x) +
%   P(z=0|x) \label{distBound}
% \end{align}
% We can invert these bounds on the distribution function to obtain a
% bound on the conditional quantile function of
% $y$. I first heard about this idea in some lecture notes by Koenker
% that Victor showed. I don't think anyone has actually implemented
% it. The lower bound, $Q_0(\tau|x)$ solves:
% \begin{align}
%   \tau = & F(Q_0|x,z=1)P(z=1|x) + P(z=0|x) \notag \\
%   \frac{\tau - P(z=0|x)}{P(z=1|x)} = & F(Q_0|x,z=1) \notag \\
%   Q_0(\tau|x) = & \begin{cases} Q_{Y}\left( \frac{\tau -
%         P(z=0|x)}{P(z=1|x)} |x,z=1\right) & \text{ if } \tau \geq
%     P(z=0|x) \\
%     \underline{y} & \text{ otherwise}
%   \end{cases}
% \end{align}
% where $\underline{y}$ is the smallest possible value of $y$ (possibly
% $-\infty$). Similarly, the upper bound is
% \begin{align}
%   Q_1(\tau|x) = & \begin{cases} Q_{Y}\left( \frac{\tau}{P(z=1|x)} |x,z=1\right) & \text{ if } \tau \leq
%     P(z=1|x) \\
%     \overline{y} & \text{ otherwise}
%   \end{cases}
% \end{align}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\section{Quantile regression with endogeneity}

Just as OLS always estimates the best linear approximation to the conditional expectation function, quantile regression always estimates the (weighted) best linear approximation to the conditional quantile function. To talk meaningfully about endogeneity, we must have some model in mind where the function we want to estimate is not the conditional quantile function we observe in the data. This model could simply come from thinking about an ideal experiment and the causal effects it would reveal, or it could come from a more structured economic model.
In any case, suppose our model implies that
\begin{align*}
  y_i = & x_i'\beta(\tau) + z_{1i} \gamma(\tau) + u_i \\
  x_i = & z_i' \pi(\tau_r) + v_i
\end{align*}
where $x_i$ are some endogenous regressors, $z_i = (z_{1i},z_{2i})$ are exogenous, and we are primarily interested in estimating $\beta(\tau)$ and $\gamma(\tau)$. Recall that if we were doing mean regression, there would be three essentially equivalent ways to proceed.\footnote{This comparison of the three approaches to endogeneity is based largely on \cite{lee2007} and \cite{iwqr2007}.}
\begin{enumerate}
\item\label{2sls} 2SLS: regress $x_i$ on $z_i$, form $\hat{x}_i = z_i \hat{\pi}$, then regress $y_i$ on $\hat{x}_i$ and $z_{1i}$.
\item\label{cf} Control function: regress $x_i$ on $z_i$, estimate the residuals $\hat{v}_i$, and then regress $y_i$ on $x_i$, $z_{1i}$, and $\hat{v}_i$.
\item\label{iv} IV: form instruments $w_i = \Phi(z_i)$ and do GMM using the moment conditions $\Er[ (y_i - x_i'\beta - z_{1i} \gamma) w_i ] = 0$.
\end{enumerate}
Two-stage least squares and the control function approach produce identical estimates. The IV estimate generally differs, but all three approaches are consistent under the assumption that $\Er[u_i | z_i ] = 0$.\footnote{2SLS and control function just need $\Er[u_i z_i] = 0$. IV works with this assumption as well, but only for $\Phi(z_i) = z_i$.} If you do not remember this result, it would be a good review exercise to show it. There are quantile analogs of 2SLS, control functions, and IV, but interestingly, they each require different assumptions about $u_i$ and $v_i$.

\subsection{Two stage quantile regression}

Two stage quantile regression was first studied by \cite{amemiya1982} and \cite{powell1983}. These papers specifically focused on the median and called the estimator two-stage least absolute deviations, but the same sort of analysis applies to any quantile. In the first stage, we estimate $\pi(\tau_r)$ by quantile regression, which gives $\hat{\pi}(\tau_r)$.
Note that the quantile in the first stage need not equal the quantile in the second stage, but it is hard to imagine why they would differ. In the second stage, we perform a quantile regression of $y_i$ on $z_i \hat{\pi}(\tau_r)$ and $z_{1i}$, which can be motivated by writing
\[ y_i = (z_i \hat{\pi}(\tau_r) + \hat{v}_i ) \beta(\tau) + z_{1i} \gamma(\tau) + u_i. \]
Under the assumptions in section \ref{s:inference}, $\hat{\pi}(\tau_r) \inprob \pi(\tau_r)$ and $\hat{v}_i \inprob v_i$. Then, the second stage estimates will converge to the weighted best linear approximation to the $\tau$th conditional quantile of
\[ (z_i \pi(\tau_r) + v_i)' \beta(\tau) + z_{1i} \gamma(\tau) + u_i = (z_i \pi(\tau_r))'\beta(\tau) + z_{1i} \gamma(\tau) + u_i + v_i'\beta(\tau). \]
If the $\tau$th quantile of $u_i + v_i'\beta(\tau)$ conditional on $z$ does not depend on $z$, then it is clear that $\beta(\tau)$ and $\gamma(\tau)$ will be consistently estimated (except the intercept). In contrast to mean regression, this is an assumption about $u_i + v_i'\beta$ instead of $u_i$. Conditional expectations are linear, so with mean regression we have
\[ \Er[ u_i + v_i'\beta(\tau) | z] = \Er[u_i|z] + \Er[v_i|z]'\beta \]
and $\Er[v_i|z] = 0$ because of the properties of projections. Conditional quantiles do not have this linearity property, and $Q_v(\tau|z)$ need not equal zero unless $\tau = \tau_r$. The assumption required for 2SQR to be consistent, that $Q_{u+v'\beta}(\tau|z)$ does not depend on $z$, is difficult to interpret. Consequently, there have not been many applications of this approach.
% FIXME: combination of ols first stage, qr second stage

\subsection{Control function approach to quantile regression}

\cite{lee2007} analyzes the control function approach to quantile regression with endogeneity. \cite{blundellPowell2007} study the control function approach to quantile regression with endogeneity and censoring.
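
Before turning to the control function, a small worked case may help show why the two-stage condition above is restrictive. This is only an illustrative sketch under assumptions that are not part of the model above: $x_i$ is scalar, $\tau_r = 1/2$, and $v_i = \sigma(z_i) e_i$ for a hypothetical scale function $\sigma(\cdot) > 0$, where $(u_i, e_i)$ is bivariate standard normal with correlation $r$ and independent of $z_i$. Conditional on $z_i$, $u_i + v_i \beta(\tau)$ is then normal with mean zero and variance $1 + 2 r \beta(\tau) \sigma(z_i) + \beta(\tau)^2 \sigma(z_i)^2$, so
\[
  Q_{u + v\beta(\tau)}(\tau|z) = c_\tau \sqrt{1 + 2 r \beta(\tau) \sigma(z) + \beta(\tau)^2 \sigma(z)^2},
\]
where $c_\tau$ denotes the $\tau$th quantile of the standard normal distribution. In this sketch $Q_u(\tau|z) = c_\tau$ does not depend on $z$ and $Q_v(1/2|z) = 0$. At the median, $c_{1/2} = 0$ and the two-stage condition holds. For $\tau \neq 1/2$, however, $Q_{u+v\beta(\tau)}(\tau|z)$ generally varies with $z$ through $\sigma(z)$, so the condition fails even though $u$ itself is independent of $z$.
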
To motivate the control function, first observe that, as long as $x = f(z,v)$ for some function that is invertible in $v$, it will always be true that
\[ Q_u(\tau|x,z) = Q_u(\tau|v,z). \]
Now, suppose that $(v,u)$ is independent of $z$. This is the strongest sense in which $z$ could be assumed exogenous. Then,
\begin{align}
  Q_u(\tau|v,z) = Q_u(\tau|v). \label{e:qcfa}
\end{align}
In this case,
\[ Q_y(\tau|x,z,v) = x'\beta(\tau) + z_{1}'\gamma(\tau) + Q_u(\tau|v). \]
In fact, to get this equation, we just need to assume (\ref{e:qcfa}) instead of independence. This sort of model, where we want to estimate a function that is linear in some variables plus an unrestricted function of some other variables, is called a partially linear model. If we observed $v$, we could estimate it by doing a quantile regression of $y_i$ on $x_i$, $z_{1i}$, and a series or kernel in $v_i$. Although $v$ is not observed, we can estimate it. We begin by estimating $\hat{\pi}(\tau_r)$ using quantile regression. We then form $\hat{v}_i = x_i - z_i \hat{\pi}(\tau_r)$. Finally, we can estimate $\beta$, $\gamma$, and $Q_u(\tau|v)$ by performing a partially linear quantile regression with $\hat{v}_i$ in place of $v_i$. See \cite{lee2007} or \cite{blundellPowell2007} for details.

\subsection{IV quantile regression}

Instrumental variable quantile regression assumes that
\[ Q_u(\tau|z) = 0. \]
It is immediate that
\[ Q_{y-x'\beta(\tau) - z_1'\gamma(\tau)}(\tau|z) = 0. \]
In terms of a minimization problem, this means that
\[ 0 \in \argmin_{f \in \mathcal{F}} \Er\left[ \rho_\tau\left(y - x'\beta(\tau) - z_1'\gamma(\tau) - f(z) \right) \right]. \]
This suggests estimating $\beta$ and $\gamma$ as follows. Choose $w_i = \Phi(z_i)$. Define $\hat{\alpha}(\beta,\tau)$ and $\hat{\gamma}(\beta,\tau)$ as
\[ \left(\hat{\gamma}(\beta,\tau), \hat{\alpha}(\beta,\tau) \right) = \argmin_{\gamma,\alpha} \En\left[ \rho_\tau\left(y - x'\beta - z_1'\gamma - w'\alpha\right) \right].
\]
In other words, $\hat{\gamma}(\beta,\tau)$ and $\hat{\alpha}(\beta,\tau)$ are the coefficients from a quantile regression of $y-x'\beta$ on $z_1$ and $w$. At the true $\beta$, the coefficient on $w$ should be zero, $\alpha=0$. Therefore, estimate $\beta$ by
\begin{align}
  \hat{\beta}(\tau) = \argmin_\beta \norm{\hat{\alpha}(\beta,\tau)}, \notag
\end{align}
and set $\hat{\gamma}(\tau) = \hat{\gamma}(\hat{\beta}(\tau),\tau)$. \cite{ch2006} discuss inference for IV quantile regression.

\subsubsection{Quantile treatment effects}

\cite{ch2005} show how the assumption
\begin{align}
  Q_u(\tau|z) = 0 \label{e:qiv}
\end{align}
arises from a model of quantile treatment effects. As in the previous notes on treatment effects, suppose we have some treatment $d$. For each possible value of $d$ there is a potential outcome $Y_d$. We have some exogenous covariates $x$ and instruments $z$. $Y_d$ is given by
\[ Y_d = q(d,x,U_d) \]
where $U_d \sim U(0,1)$. $q$ is called the quantile treatment response, and we are interested in quantile treatment effects defined as
\[ q(d_1,x,\tau) - q(d_0,x,\tau). \]
These represent the change in the $\tau$th quantile of outcomes conditional on $x$ if everyone is switched from treatment $d_0$ to treatment $d_1$. Note that we can write the average treatment effect in terms of quantile treatment effects by integrating, $\int_0^1 \left[ q(d_1,x,\tau) - q(d_0,x,\tau) \right] d\tau$.

\cite{ch2005} show that the following assumptions imply (\ref{e:qiv}).
\begin{assumption}[Potential outcomes\label{qte1}]
  Conditional on $X = x$, for each $d$,
  \[ Y_d = q(d,x,U_d) \]
  where $U_d \sim U(0,1)$ and $q$ is strictly increasing in its third argument.
\end{assumption}
\begin{assumption}[Independence\label{qte2}]
  Conditional on $X=x$, $\{U_d\}$ are independent of $Z$.
\end{assumption}
\begin{assumption}[Selection\label{qte3}]
  \[ D = \delta(Z,X,V) \]
  for some unknown function $\delta$ and random vector $V$.
\end{assumption}
\begin{assumption}[Rank similarity\label{qte4}]
  Conditional on $X=x$, $Z=z$, and $V$, $\{U_d\}$ are identically distributed.
\end{assumption}
\begin{assumption}[Observed variables \label{qte5}]
  $Y=q(D,X,U_D)$, $D$, $X$, and $Z$ are observed.
\end{assumption}
\begin{theorem} \label{thm:qte}
  If assumptions \ref{qte1}-\ref{qte5} hold, then for all $\tau \in (0,1)$, a.s.
  \begin{align}
    \Pr \left( Y \leq q(D,X,\tau) | X,Z\right) = \tau \label{e:qte}
  \end{align}
  and $U_D \sim U(0,1)$ conditional on $Z$ and $X$.
\end{theorem}
\begin{proof}
  See \cite{ch2005}.
\end{proof}
Given the result of theorem \ref{thm:qte} and the definition of conditional quantiles, an immediate implication is that
\[ Q_{Y - q(D,X,\tau)} (\tau | X,Z) = 0. \]
If we additionally assume that $q(D,X,\tau) = D\beta(\tau) + X \gamma(\tau)$, we have exactly the same condition as in instrumental variables quantile regression.

Theorem \ref{thm:qte} gives us an estimating equation for $q(d,x,\tau)$, but it does not guarantee that $q(d,x,\tau)$ is the only function satisfying (\ref{e:qte}). We need an additional assumption to guarantee that the solution to (\ref{e:qte}) is unique. This assumption is the analog of the rank condition for mean IV regression. Suppose the treatment and the instrument are both binary. Conditional on $X=x$, $q(D,X,\tau)$ is determined by a two-dimensional vector $q = (q_0, q_1)$, where $q_0 \equiv q(0,X,\tau)$ and $q_1 \equiv q(1,X,\tau)$. Then for each candidate $\tilde{q}=(\tilde{q}_0,\tilde{q}_1)$, (\ref{e:qte}) can be written
\[ \Pi(\tilde{q}) \equiv
  \begin{pmatrix}
    \Pr(Y \leq \tilde{q}_0(1-D)+D \tilde{q}_1 | X,Z=0) - \tau \\
    \Pr(Y \leq \tilde{q}_0(1-D)+D \tilde{q}_1 | X,Z=1) - \tau
  \end{pmatrix} = 0.
\]
We want to know whether $\tilde{q}=q$ uniquely solves this equation. We know that this equation has a locally unique solution if its Jacobian has rank two.
The Jacobian can be written
\begin{align*}
  \Pi'(\tilde{q}) = &
  \begin{pmatrix}
    f_Y(\tilde{q}_0|X,D=0,Z=0) \Pr(D=0|X,Z=0) & f_Y(\tilde{q}_1|X,D=1,Z=0) \Pr(D=1|X,Z=0) \\
    f_Y(\tilde{q}_0|X,D=0,Z=1)\Pr(D=0|X,Z=1) & f_Y(\tilde{q}_1|X,D=1,Z=1)\Pr(D=1|X,Z=1)
  \end{pmatrix} \\
  = &
  \begin{pmatrix}
    f_{Y,D}(\tilde{q}_0,0|X,Z=0) & f_{Y,D}(\tilde{q}_1,1|X,Z=0) \\
    f_{Y,D}(\tilde{q}_0,0|X,Z=1) & f_{Y,D}(\tilde{q}_1,1|X,Z=1)
  \end{pmatrix}.
\end{align*}
For this to have full rank when $\tilde{q}=q$, its determinant must be nonzero. Taking the case where the determinant is positive, we need
\begin{align*}
  f_{Y,D}(q_0,0|X,Z=0) f_{Y,D}(q_1,1|X,Z=1) - f_{Y,D}(q_1,1|X,Z=0) f_{Y,D}(q_0,0|X,Z=1) & > 0 \\
  \frac{f_{Y,D}(q_1,1|X,Z=1)} {f_{Y,D}(q_0,0|X,Z=1)} & > \frac{f_{Y,D}(q_1,1|X,Z=0)} {f_{Y,D}(q_0,0|X,Z=0)},
\end{align*}
where the second line divides the first by $f_{Y,D}(q_0,0|X,Z=0) f_{Y,D}(q_0,0|X,Z=1)$, assumed to be positive. It would be a good exercise to try to think of a simpler assumption that implies this condition. See \cite{ch2005} for one such condition. Similar identification conditions can be stated when the treatment and/or instrument are continuous, see \cite{ch2005}.

Assumptions \ref{qte1}-\ref{qte5} are fairly natural if we have a randomized experiment, or if we think of the instrument as inducing a natural experiment. However, they also make sense if we think about some other models. \cite{ch2006} give two examples: a Roy model of education, and supply and demand. Homework 5 goes over the supply and demand example.
\begin{example}[Roy model of education]
  Let $d \in \mathcal{D} := \{0,1,\ldots,\bar{d}\}$ be possible levels of education. Suppose that potential earnings are
  \[ Y_d = q(d,x,U) \]
  where the rank variable, $U$, is determined by ability and other factors that do not vary with $d$. $d$ is chosen to maximize expected utility,
  \[ D = \argmax_{d \in \mathcal{D}} \Er[ W(Y_d,d,X) | X ,Z, \nu] \]
  where $W$ is an unobserved utility function, and $\nu$ includes unobserved information that is correlated with $U$, and other shocks that affect the education decision.
\end{example}

\subsubsection{Comparison to control function quantile regression}

Control function quantile regression requires that the first stage error, $v$, and the endogenous variable, $x$, have the same type\footnote{More formally, their supports need to have the same topology.} of support so that
\[ Q_u(\tau|x,z) = Q_u(\tau|v,z). \]
In particular, if $v$ is continuous, $x$ must also be continuous. This rules out the common case where the endogenous variable is binary and generated from a latent index model. Additionally, the assumption that $Q_u(\tau|v,z) = Q_u(\tau|v)$ required for the control function approach is not the same as the assumption that $Q_u(\tau|z) = 0$ required for IV quantile regression. Neither assumption implies the other. It would be interesting to come up with some examples where one assumption holds but not the other.

\subsubsection{Computation}

As described above, IV quantile regression can be computed by solving
\begin{align}
  \hat{\beta}(\tau) = \argmin_\beta \norm{\hat{\alpha}(\beta,\tau)}, \label{e:amin}
\end{align}
where
\[ \left(\hat{\gamma}(\beta,\tau), \hat{\alpha}(\beta,\tau) \right) = \argmin_{\gamma,\alpha} \En\left[ \rho_\tau\left(y - x'\beta - z_1'\gamma - w'\alpha\right) \right]. \]
Given $\beta$, $\hat{\gamma}(\beta,\tau)$ and $\hat{\alpha}(\beta,\tau)$ are given by a standard quantile regression. These are fast and easy to compute. However, $\norm{\hat{\alpha}(\beta,\tau)}$ need not be a well-behaved function of $\beta$. If the identification condition holds, then the population analog $\norm{\alpha(\beta,\tau)}$ has a unique minimum. However, $\norm{\hat{\alpha}(\beta,\tau)}$ need not have a unique minimum in finite samples. Additionally, $\norm{\hat{\alpha}(\beta,\tau)}$ tends to have many local minima. Figure \ref{fig:amin} shows two examples of $\norm{\hat{\alpha}(\beta,\tau)}$ as a function of $\beta$. These were generated using the data for homework 5.
The left panel uses mixed and stormy as instruments; the right panel uses mixed, stormy, and wind speed as instruments. Both have $\tau=0.5$. As shown, $\norm{\hat{\alpha}(\beta,\tau)}$ can be very badly behaved. When $\beta$ is only one or two dimensional, this can be overcome by doing an exhaustive grid search for the minimum. However, as the dimension of $\beta$ increases, the number of grid points needed to achieve a fixed precision increases exponentially.
\begin{figure}\caption{$\norm{\hat{\alpha}(\beta,\tau)}$ \label{fig:amin}}
  \begin{tabular}{cc}
    \includegraphics[width=0.48\linewidth]{../hw5/figures/b50sm} &
    \includegraphics[width=0.48\linewidth]{../hw5/figures/b50smw}
  \end{tabular}
\end{figure}

When the dimension of $\beta$ is high, the MCMC approach of \cite{chernozhukovHong2003} can be used to compute IV quantile regression. Let
\[ \hat{g}(\beta,\gamma) = \En\left[ (\tau - 1\{Y < X'\beta + Z_1'\gamma \} ) Z \right], \]
\[ W_n = \frac{1}{\tau(1-\tau)} \left(\En[ZZ']\right)^{-1}, \]
and
\[ \hat{Q}(\beta,\gamma) = \frac{n}{2} \hat{g}(\beta,\gamma)'W_n \hat{g}(\beta,\gamma). \]
Then \cite{chernozhukovHong2003} show that the quasi-posterior mean,
\[ (\hat{\beta}^{MCMC},\hat{\gamma}^{MCMC}) = \int (\beta,\gamma) \frac{e^{-\hat{Q}(\beta,\gamma)}} { \int e^{-\hat{Q}(\tilde{\beta},\tilde{\gamma})} d(\tilde{\beta},\tilde{\gamma}) } d (\beta,\gamma), \]
is consistent for the true $(\beta,\gamma)$. Furthermore, $\hat{\beta}^{MCMC},\hat{\gamma}^{MCMC}$ have the same asymptotic distribution as the estimates you would get from minimizing $\norm{\hat{\alpha}(\beta,\tau)}$ (when the norm is chosen appropriately). Finally, the quasi-posterior distribution,
\[ \frac{e^{-\hat{Q}(\beta,\gamma)}} { \int e^{-\hat{Q}(\tilde{\beta},\tilde{\gamma})} d(\tilde{\beta},\tilde{\gamma}) }, \]
can be used for inference. For example, the interval from the 2.5th percentile to the 97.5th percentile of this distribution is an asymptotically valid 95\% confidence interval for $\beta$.

Finding posterior distributions is central to Bayesian statistics. As a result, there are many methods for simulating from posterior distributions.
Collectively, these methods are called Markov Chain Monte Carlo or MCMC. They construct a Markov chain with transition densities that can be easily simulated and with stationary distribution equal to the desired posterior. One general purpose MCMC method is the Metropolis-Hastings algorithm. Let $\theta=(\beta,\gamma)$. Suppose we have some conditional density that can easily be sampled, $q(\theta'|\theta)$. In applications, $q(\theta'|\theta)$ is often a random walk, such as
\[ \theta'|\theta \sim N(\theta,\sigma^2 I) \]
or
\[ \theta'|\theta \sim U(\theta-\sigma,\theta+\sigma). \]
In the Metropolis-Hastings algorithm you then
\begin{enumerate}
\item Choose a starting value $\theta^{(0)}$ and set $j=0$.
\item Draw $\xi$ from $q(\xi|\theta^{(j)})$.
\item Set
  \[ \theta^{(j+1)} =
  \begin{cases}
    \xi & \text{ with probability } \rho(\theta^{(j)},\xi) \\
    \theta^{(j)} & \text{ with probability } 1 - \rho(\theta^{(j)},\xi)
  \end{cases}
  \]
  where
  \[ \rho(\theta,\xi) = \min\left\{ \frac{e^{-\hat{Q}(\xi)} q(\theta|\xi)} {e^{-\hat{Q}(\theta)} q(\xi|\theta) }, 1\right\}. \]
\item Increase $j$ by one and return to step 2.
\end{enumerate}
Note that this algorithm is more likely to accept $\xi$ when $e^{-\hat{Q}(\xi)}$ is relatively high. This ensures that the stationary distribution of the chain has density proportional to $e^{-\hat{Q}(\theta)}$. If we use either a normal or uniform random walk for $q(\theta'|\theta)$, then we face a tradeoff when choosing $\sigma$. If $\sigma$ is low, we will accept many draws of $\xi$, but the values of $\theta^{(j)}$ will be close together. We may then need many draws for $\theta^{(j)}$ to adequately explore the support of $\theta$. On the other hand, if $\sigma$ is high, accepted draws of $\theta^{(j)}$ will be further apart, but the probability of acceptance is lower. See the references in \cite{chernozhukovHong2003} for more information. It will take some time for the draws of $\theta^{(j)}$ to converge to their stationary distribution, so the first $B_0$ draws should be discarded.
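
For the two random-walk proposals above, the acceptance probability takes a simpler form. The following is a short worked step, assuming only that the proposal is symmetric, $q(\xi|\theta) = q(\theta|\xi)$, which holds for both the normal and the uniform random walk. The proposal densities then cancel, and
\[
  \rho(\theta,\xi) = \min\left\{ \frac{e^{-\hat{Q}(\xi)}}{e^{-\hat{Q}(\theta)}}, 1 \right\}
  = \min\left\{ \exp\left( \hat{Q}(\theta) - \hat{Q}(\xi) \right), 1 \right\}.
\]
A proposal that lowers $\hat{Q}$ is therefore always accepted, while a proposal that raises $\hat{Q}$ by $\Delta > 0$ is accepted with probability $e^{-\Delta}$. For instance, $\Delta = 1$ gives an acceptance probability of $e^{-1} \approx 0.37$. Only the difference $\hat{Q}(\theta) - \hat{Q}(\xi)$ is needed, so the normalizing integral of $e^{-\hat{Q}}$ never has to be computed.
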
After the burn-in draws are discarded, the average of the next $B_1$ draws can be used as an estimate of $\hat{\beta}^{MCMC}$, and confidence regions can be constructed from the quantiles of these draws.

\subsubsection{Local QTE}

\cite{aai2002} develop an approach to quantile treatment effects that is similar to LATE. As in LATE, suppose there is a binary treatment $d$ and a binary instrument $z$. Let $Y_d$ be the potential outcomes and $D_z$ be the potential treatments. As in LATE, assume the following.
\renewcommand{\theassumption}{LQTE--A\arabic{assumption}}
\begin{assumption}
  \begin{enumerate}
  \item \emph{Independence}: $(Y_1,Y_0,D_1,D_0) \indep Z | X$
  \item \emph{Nontrivial assignment}: $\Pr(Z=1|X) \in (0,1)$
  \item \emph{First-stage}: $\Er[D_1|X] \neq \Er[D_0 | X]$
  \item \emph{Monotonicity}: $\Pr(D_1\geq D_0 | X) = 1$.
  \end{enumerate}
\end{assumption}
As in LATE, with these assumptions, we can identify the quantile treatment effect for compliers ($D_1 > D_0$). For simplicity, assume linearity.
\begin{assumption}
  \[ Q_Y(\tau| X, D, D_1 > D_0) = \alpha(\tau) D + X'\beta(\tau) \]
\end{assumption}
It follows that
\[ (\alpha(\tau),\beta(\tau)) = \argmin_{\alpha,\beta} \Er\left[ \rho_\tau(Y - \alpha D - X'\beta) | D_1 > D_0 \right]. \]
The event $D_1 > D_0$ is unobserved. However, it can be shown that
\begin{align*}
  \Er\left[ \rho_\tau(Y - \alpha D - X'\beta) | D_1 > D_0 \right] \Pr(D_1>D_0) = & \Er\left[\rho_\tau(Y - \alpha D - X'\beta) \left(1 - \frac{D(1-Z)}{1-\Pr(Z=1|X)} - \frac{(1-D)Z}{\Pr(Z=1|X)} \right)\right] \\
  = & \Er\left[\rho_\tau(Y - \alpha D - X'\beta) \kappa \right]
\end{align*}
with $\kappa \equiv 1 - \frac{D(1-Z)}{1-\Pr(Z=1|X)} - \frac{(1-D)Z}{\Pr(Z=1|X)}$. Since $\kappa$ is an estimable function of observable variables,
\[ (\alpha(\tau),\beta(\tau)) = \argmin_{\alpha,\beta} \Er\left[ \rho_\tau(Y - \alpha D - X'\beta) \kappa \right] \]
can be used for estimation. Since the population objective function is a positive multiple, $\Pr(D_1>D_0)$, of the conditional expectation of the check function, it must be convex.
However, $\kappa$ takes both positive and negative values, so in finite samples the sample minimization problem,
\[ (\hat{\alpha}(\tau),\hat{\beta}(\tau)) = \argmin_{\alpha,\beta} \En\left[ \rho_\tau(Y - \alpha D - X'\beta) \kappa \right], \]
need not be convex. By iterated expectations, though,
\[ \Er\left[ \rho_\tau(Y - \alpha D - X'\beta) \kappa \right] = \Er\left[ \rho_\tau(Y - \alpha D - X'\beta) \Er[\kappa|Y,D,X] \right] \]
and as \cite{aai2002} show, $\Er[\kappa|Y,D,X] = \Pr(D_1>D_0 | Y,D,X) \geq 0$. This suggests estimating by solving
\[ (\hat{\alpha}(\tau),\hat{\beta}(\tau)) = \argmin_{\alpha,\beta} \En\left[ \rho_\tau(Y - \alpha D - X'\beta) \widehat{\Er[\kappa|Y,D,X]} \right], \]
where $\widehat{\Er[\kappa|Y,D,X]}$ is some non-negative consistent estimate of
\[ \Er[\kappa|Y,D,X] = 1 - \frac{D(1-\Er[Z|Y,D,X])}{1-\Pr(Z=1|X)} - \frac{(1-D)\Er[Z|Y,D,X]}{\Pr(Z=1|X)}. \]
\cite{aai2002} propose estimating $\Pr(Z=1|X)$ and $\Er[Z|Y,D,X]$ by series regression, and then setting
\[ (\hat{\alpha}(\tau),\hat{\beta}(\tau)) = \argmin_{\alpha,\beta} \En\left[ \rho_\tau(Y - \alpha D - X'\beta) \widehat{\Er[\kappa|Y,D,X]}1\{ \widehat{\Er[\kappa|Y,D,X]} \geq 0\} \right]. \]
The indicator function here is just to ensure convexity of the sample objective function. Asymptotically, $\Pr(\widehat{\Er[\kappa|Y,D,X]} \geq 0) \to 1$, so the indicator function does not affect the asymptotic behavior of the objective function. \cite{aai2002} give conditions for $\hat{\alpha}$ and $\hat{\beta}$ to be $\sqrt{n}$-consistent and asymptotically normal, and give the asymptotic variance.
% FIXME: comparison to IVQR.

\newpage
\section{Applications \label{s:apply}}

\subsection{Quantile regression}

\subsubsection{Subjective wine quality and physical wine characteristics}

Figure \ref{fig:redCoeff} shows quantile regression estimates from regressing wine quality (measured by the median of three reviewers' assessments on a scale from 1 to 10) on various chemical characteristics of the wine.
The data come from \cite{ccmr2009} and are available at the UCI machine learning repository, \url{http://archive.ics.uci.edu/ml/datasets/Wine+Quality}. Each of the covariates has been standardized to have mean zero and standard deviation one. The solid red line is the OLS estimate, and the dashed lines form a 95\% confidence interval. The dotted black line shows the quantile regression estimates as a function of $\tau$, and the gray region is a 95\% confidence interval. Figure \ref{fig:whiteCoeff} shows the same thing for white wine. Figure \ref{fig:wineQA} shows a scatter plot of quality as a function of alcohol, and the fitted quantile regression lines for the bivariate quantile regression of quality on alcohol.
\begin{figure}\caption{Quantile regression of red wine quality on physical characteristics \label{fig:redCoeff}}
  \includegraphics[width=\linewidth]{wine/redCoefs}
\end{figure}
\begin{figure}\caption{Quantile regression of white wine quality on physical characteristics \label{fig:whiteCoeff}}
  \includegraphics[width=\linewidth]{wine/whiteCoefs}
\end{figure}
\begin{figure}\caption{Wine quality and alcohol content \label{fig:wineQA}}
  \begin{tabular}{cc}
    Red & White \\
    \includegraphics[width=0.48\linewidth]{wine/redQA} &
    \includegraphics[width=0.48\linewidth]{wine/whiteQA}
  \end{tabular}
\end{figure}

\newpage
\subsubsection{Decomposition of distribution changes \label{s:decomp}}

One popular application of quantile regression has been the analysis of changes in inequality. During the 80s and 90s, income inequality increased in the US and much of the rest of the world. Labor economists have been interested in the mechanism through which this happened. One way of thinking about the increase in inequality is to try to break it into (1) changes in the observed distribution of characteristics, (2) changes in the prices of worker characteristics, and (3) residual changes.
\cite{juhnMurphyPierce1993} were the first to consider this sort of decomposition, but they had a somewhat ad-hoc method. \cite{dfl1996} proposed a more complicated method based on kernel re-weighting. Many others have used similar methods. \cite{melly2005} and \cite{machadoMata2005} use quantile regression to perform the decomposition. In a quantile regression,
\[ y_{it} = x_{it} \beta_t(\tau), \]
$x_{it}$ are the observed characteristics, $\beta_t$ are the prices, and $\tau$ captures residual changes. For each year, $t$, we can estimate $\beta_t(\tau)$. We can use these estimates to simulate what $y_{it}$ would have been had the $x$'s been distributed as in year $s$,
\[ \hat{y}_{it|s} = x_{is} \hat{\beta}_t(\tau). \]
We can separate out residual inequality by looking at the distribution of
\[ \hat{u}_{it|s} = \hat{y}_{it|s} - x_{is}\hat{\beta}_t(1/2) = x_{is}(\hat{\beta}_t(\tau) - \hat{\beta}_t(1/2)), \]
where the fitted conditional median, $x_{is}\hat{\beta}_t(1/2)$, serves as the measure of the center of the conditional distribution.

\cite{acf2006} is, in part, about how to interpret this sort of quantile regression if the true conditional quantile is not linear. A main result is that $x\beta(\tau)$ minimizes a weighted squared difference from the true conditional quantile function.

\subsection{IVQR and LQTE}

\subsubsection{JTPA}

The Job Training Partnership Act was a large publicly-funded job training program in the US. Individuals in the treatment group were randomly offered training immediately, while individuals in the control group were only offered training 18 months later. \cite{aai2002} applied their local QTE estimator to the JTPA. The outcome variable is the sum of earnings in the 30 months following treatment. The treatment is receiving training, and the instrument is being offered training. The LATE assumptions are quite natural here. In particular, the monotonicity assumption seems sensible. The covariates include race, education, marital status, and age. Table II from \cite{aai2002} shows OLS and quantile regression estimates of the effect of the training program.
Table III from \cite{aai2002} shows 2SLS and local QTE estimates.
\newpage
\includegraphics[width=\linewidth]{tabFig/aaitab2}
\newpage
\includegraphics[width=\linewidth]{tabFig/aait3}
\newpage

\cite{ch2008} also analyze the JTPA, but using IVQR instead of local QTE. Figure \ref{ch2008fig4} shows their estimates of the effect of training. The estimates are fairly similar to those of \cite{aai2002}, but not identical. The LQTE estimator of \cite{aai2002} and the IVQR estimator of \cite{ch2005} generally identify and estimate different quantities. The LQTE estimator identifies the quantile treatment effect conditional on being a complier. It only applies to binary treatments and instruments. The IVQR estimator identifies quantile treatment effects unconditional on being a complier, and applies to non-binary as well as binary treatments and instruments. However, the IVQR estimator requires the perhaps stronger assumption of rank similarity. Nonetheless, IVQR and LQTE can estimate the same thing if the assumptions of both models are satisfied (mainly rank similarity and monotonicity), and the compliers are representative of the population. If these conditions are not met, then the two estimators will in general have different probability limits. Therefore, a comparison of results based on the two models can provide evidence on the plausibility of their assumptions.
\begin{figure}\caption{Chernozhukov and Hansen (2008) figure 4\label{ch2008fig4}}
  \includegraphics[width=\linewidth]{tabFig/ch2008fig4}
\end{figure}

\newpage
\subsubsection{401(k) participation and wealth}

\cite{ch2004} estimate the effect of 401(k) participation on wealth. The outcome is wealth, the treatment is 401(k) participation, and the instrument is 401(k) eligibility. \cite{ch2004} discuss how rank similarity might not be a valid assumption in this context. They say, ``In the context of 401(k) participation, matching practices of employers could jeopardize the validity of the similarity assumption.
This is because individuals in firms with high match rates may be expected to have a higher rank in the asset distribution than workers in firms with less generous match rates. This suggests that the distribution of $U_d$ may be different across the treatment states.'' Still, they argue that rank similarity may hold. As evidence of this, they also estimate LQTE and find that the results are similar to the IVQR estimates.
\newpage
\includegraphics[width=\linewidth]{tabFig/ch2008f1}
\newpage
\includegraphics[width=\linewidth]{tabFig/ch2008f2}
\newpage

\subsection{Quantile regression with censoring}

Unlike mean regression, quantile regression can deal with censoring without making any distributional assumptions. Suppose $Y^*$ has conditional quantile function $Q_{Y^*}(\tau|X)$, but is not observed. Instead, we observe $Y_i = \max\{Y_i^*,C_i\}$. Then it is immediate that the conditional quantile function of $Y$ is
\[ Q_Y(\tau|X) = \max\{Q_{Y^*}(\tau|X), C\}. \]
If $Q_{Y^*}(\tau|X) = X'\beta(\tau)$, then we could estimate $\beta(\tau)$ by
\[ \hat{\beta}(\tau) = \argmin_\beta \En\left[ \rho_\tau\left( Y - \max\{C,X'\beta\}\right)\right]. \]
This estimator was first proposed by \cite{powell1986}. Due to the $\max$ inside the check function, this objective function is not necessarily convex. Nonetheless, a number of effective computational strategies exist for finding the minimum; see the references in \cite{koenker2005}, p.~253.

A limitation of the censored quantile regression estimator of \cite{powell1986} is that it requires that the censoring point, $C_i$, be known for all observations. For some applications, this is a fine assumption. For example, if $Y$ is spending on some good, then we know $C = 0$. However, in other situations, we do not know $C$.
For example, if $Y^*$ is how long someone survived after some medical treatment and $C$ is the time at which either the study ends or the person drops out of the study, then we cannot know when someone would have dropped out of the study if they had not died. \cite{portnoy2002} and \cite{hkp2002} describe estimators for such situations with random censoring. Both estimators are based on constructing an estimate of $Y^*$ for the censored observations or $C$ for the uncensored observations, and using this estimate to adjust the check function appropriately.

\subsubsection{Survival after hospitalization for ACF or MOSF}

Figure \ref{fig:supp1} shows censored quantile regression estimates using the estimator of \cite{portnoy2002}. The outcome is survival time after hospitalization for patients with acute renal failure or multiple organ system failure.\footnote{At least, I think that's what ACF/MOSF stands for.} About 62\% of these patients died within a year, 50\% within 6 months, and 32\% within one month. The median survival time of those with observed deaths is 28 days. The median censoring time of those with no observed death is 918 days. The data come from the SUPPORT\footnote{See the following reference for more information about the SUPPORT study (funded by the Robert Wood Johnson Foundation): Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191--203.} study \url{http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc}.
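
As a rough sketch of what known censoring times would buy us in this application, note that survival times are censored from above rather than from below, so we would observe $Y_i = \min\{Y_i^*, C_i\}$. Since $y \mapsto \min\{y, C\}$ is also nondecreasing, the same reasoning that applies to $\max$ above gives
\[ Q_Y(\tau|X) = \min\{Q_{Y^*}(\tau|X), C\}. \]
Under the additional assumption (which does not hold for these data) that $C_i$ were observed for every patient, including those who died, the analogue of the \cite{powell1986} estimator would be
\[ \hat{\beta}(\tau) = \argmin_\beta \En\left[ \rho_\tau\left( Y - \min\{C, X'\beta\} \right) \right], \]
with $\rho_\tau$ the check function. The random censoring estimators are meant for the case where this assumption fails.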
\begin{figure} \caption{Censored quantile regression of survival time \label{fig:supp1}}
  \begin{minipage}{\linewidth}
    \includegraphics[width=\linewidth]{support/support1}
    \footnotetext{\texttt{num.co} is the number of co-morbidities (other conditions), \texttt{meanbp} is blood pressure, \texttt{wblc} is white blood cell count, \texttt{hrt} is heart rate, \texttt{resp} is respiratory rate, \texttt{temp} is body temperature, \texttt{sod} is sodium level. Other variables should be self-explanatory. All variables were measured within three days of hospitalization. The solid blue lines are the coefficient estimates from randomly censored quantile regression. The shaded blue areas are 95\% pointwise confidence bands.}
  \end{minipage}
\end{figure}

\newpage
\subsubsection{Censored quantile regression with endogeneity}

\cite{blundellPowell2007} and \cite{cfk2011} estimate censored (with known censoring points) quantile regression with endogeneity using a control function approach. \cite{kowalski2009} uses this approach to estimate the price elasticity of medical care. She has detailed data on medical spending for families that face a piecewise linear annual cost schedule for medical care. These families have insurance plans with a deductible, followed by a constant coinsurance rate, with a stop-loss that caps total out-of-pocket spending. So, for the first $X$ dollars of medical care each year, the family pays the full cost. For the next $X'$ dollars, the family pays $c<1$ for each dollar of medical care. If the family has spent more than $X''$ dollars that year, then any additional medical care has zero marginal price. The outcome is medical care consumed by an individual. The endogenous variable is the marginal cost of medical care at the end of the year. The instrument is whether another family member was injured during the year.

\newpage
\includegraphics[width=\linewidth]{tabFig/kowTab}

\newpage
\bibliographystyle{econometrica}
\bibliography{../628}
\end{document}