1. “Estimation of spatial autoregressive models for origin-destination flows: A partial likelihood approach” (with Hanbat Jeong and Lung-Fei Lee) [Economics Letters, Volume 229, August 2023, 111202] [Link]

Abstract: We extend the spatial autoregressive model for origin–destination flows of LeSage and Pace (2008) by accommodating two-way fixed effects. Estimation uses a partial likelihood approach, in which an orthogonal transformation removes the fixed effects from the model. The quasi-maximum likelihood (QML) estimator based on the partial log-likelihood function is consistent and asymptotically normal and properly centered. Monte Carlo experiments verify these properties in finite samples. Using U.S. migration flows, we capture significant spatial influences with smaller magnitudes than those from the model without fixed effects.
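For readers unfamiliar with this model class, the sketch below illustrates how origin-based, destination-based, and origin-to-destination weights are commonly built as Kronecker products of a single n×n weights matrix. It is only an illustration under stated assumptions: the ring-shaped weights, the parameter values, the variable names, and the vectorization convention (which determines which Kronecker product is labelled "origin" or "destination") are hypothetical, and the fixed effects and the partial likelihood step of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical n-region weights: each region's neighbors are the two adjacent
# regions on a circle; rows are normalized to sum to one.
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 1.0
    W[i, (i + 1) % n] = 1.0
W = W / W.sum(axis=1, keepdims=True)

# Kronecker-product weights for the n*n vectorized flows. Which of the first
# two is "origin" and which is "destination" depends on how flows are stacked.
I_n = np.eye(n)
W_d = np.kron(I_n, W)  # dependence through neighbors on one side of the flow
W_o = np.kron(W, I_n)  # dependence through neighbors on the other side
W_w = np.kron(W, W)    # dependence through neighbors on both sides

# Illustrative parameter values (assumptions, not estimates from the paper).
rho_d, rho_o, rho_w = 0.3, 0.2, 0.1
beta = np.array([1.0, 0.5])

N = n * n
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
eps = rng.standard_normal(N)

# Reduced form: y = (I - rho_d W_d - rho_o W_o - rho_w W_w)^{-1} (X beta + eps)
A = np.eye(N) - rho_d * W_d - rho_o * W_o - rho_w * W_w
y = np.linalg.solve(A, X @ beta + eps)
```

Because each Kronecker product of row-normalized matrices is again row-normalized, the system is invertible here whenever the absolute values of the three spatial coefficients sum to less than one.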

Working Papers

1. “A spatial dynamic panel data model for flow variables with asymmetric origins and destinations” (with Yang Yang) [Job Market Paper]

Abstract: This paper introduces a higher-order spatial dynamic panel data (SDPD) model for a directed origin-destination (OD) flow outcome variable, estimated by the quasi-maximum likelihood (QML) method. The model captures both intra-temporal and inter-temporal spatial interactions among the origins, among the destinations, and from neighbors of the origins to neighbors of the destinations. We extend the symmetric bilateral design of traditional gravity models by allowing for an asymmetric structure of origins and destinations, which accommodates broader empirical needs such as studies of unilateral flows, of flows from one selected direction, and of net flows. A variety of fixed effects specifications are set forth to account for the three-dimensional nature and the unobserved heterogeneity of a flow variable in the panel data setting. We propose a direct approach that estimates the fixed effects and a data transformation approach that removes the time-variant effects. We establish the consistency and asymptotic distribution of the QML estimators, along with an analytic bias correction procedure. We then perform Monte Carlo experiments to investigate their finite sample performance. Moreover, by applying our model to U.S. state-to-state migration flows from 1991 to 2019, we detect three channels of contemporaneous interactions, three channels of diffusion effects, and moderate persistence in this type of OD flow.

2. “Addressing endogeneity issues in a spatial autoregressive model using copulas” (with Yichun Song) [Revise and Resubmit at Journal of Econometrics]

Abstract: We propose a new semiparametric copula method to tackle possible endogeneity issues in a spatial autoregressive (SAR) model, which might originate from an endogenous spatial weights matrix or endogenous regressors. Using a copula endogeneity correction technique, we derive three-stage estimation methods and establish their consistency and asymptotic normality. We then perform Monte Carlo experiments to investigate the finite sample performance of the proposed maximum likelihood (ML) estimator and the instrumental variable (IV) estimator. Moreover, we apply our methods to an empirical study of spatial spillovers in regional productivity, with endogenous spatial weights constructed from the proximity of a “meaningful” socioeconomic characteristic: years of education.

3. “An efficient residual-adjusted two-step estimator for a SARAR model” (with Lung-Fei Lee and Yang Yang) [Revise and Resubmit at Econometric Reviews]


Abstract: In this paper, we derive an efficient two-step estimator for a spatial autoregressive (SAR) model with SAR disturbances (SARAR). Given separate consistent estimates obtained in the first step, the second step employs a constructed residual-adjusted estimator motivated by Hatanaka (1974, 1976) and Dhrymes (1974). The proposed estimator is computationally simple yet can be asymptotically equivalent to the QML estimator (QMLE) or the best GMM estimator (BGMME) in Liu et al. (2010). Compared with the iterative BGMM and QML approaches, it also reduces computation time and performs exceptionally well in large samples, since the second-stage estimator avoids the numerical issues encountered in optimization iterations and resolves the associated computational complexities. Monte Carlo experiments and an empirical study of spatial spillovers in U.S. county-level homicide rates verify the advantages of our numerical procedure in small and relatively large samples.

4. “Copula joint estimation for spatial dynamic panel data models with endogeneity issues” (with Yichun Song)

Manuscript, Supplementary file

Abstract: Spatial dynamic panel data (SDPD) models might have endogeneity issues when the spatial weights are constructed from (time-varying) socioeconomic characteristics or when the regressors other than the spatial lag, dynamic spatial lag and dynamic time lag terms are correlated with the error term. This paper proposes a semiparametric copula endogeneity correction technique to handle the endogeneity issues in SDPD models. Unlike the control function approach, it does not require finding excluded instrumental variables or knowing the exact equation structures of the endogenous variables. We present the model specification and derive a three-stage estimation method, consisting of a first-stage nonparametric estimation, a second-stage ordinary least squares estimation and a third-stage maximum likelihood (ML) estimation. The consistency and asymptotic normality of the proposed third-stage ML estimator are rigorously established using asymptotic inference under spatial-time near-epoch dependence (NED). Monte Carlo experiments are carried out to investigate the finite sample performance of the proposed estimators. We then apply our method to an empirical study that re-evaluates the tax policy implications for gasoline consumption and carbon dioxide emissions in Davis and Kilian (2011), where gasoline prices may be endogenous. Our results confirm that people do respond to the tax by reducing their gasoline consumption.

5. “Specification and estimation of a spatial autoregressive model with nonlinear interaction effects”

Manuscript, Supplementary file

Abstract: This paper develops a nonlinear spatial autoregressive (SAR) equation system. Of particular interest are structural interaction models for spatial stochastic frontiers, and simultaneous equations derived from structural modelling, such as a production network with constant elasticity of substitution (CES) technologies and/or the “dual gravity” model. We show that the linear SAR model cannot fully characterize the effects brought by the nonlinear interaction structure. We then consider a quasi-maximum likelihood (QML) estimation method for this model and analyze the consistency and asymptotic normality of the QMLE based on the spatial near-epoch dependence (NED) concept. Monte Carlo experiments are designed to investigate the finite sample performance of the estimates and verify their theoretical properties. Using cross-sectional data on agricultural inputs and outputs for 1,940 counties in China, we adapt the proposed model to a spatial stochastic frontier analysis. We capture significant positive spatial coefficients, identify elasticities for agricultural inputs, estimate the absolute technical efficiencies, and uncover the total, direct and indirect relative efficiencies for all counties.

6. “Social networks with heterogeneity and group fixed effects: a likelihood approach” (with Yang Yang)

Manuscript, Supplementary file

Abstract: This paper considers social interaction models with group fixed effects and observed heterogeneity among agents. Both heterogeneous endogenous peer effects and exogenous contextual effects can be identified and estimated consistently using a likelihood approach. We establish the asymptotic properties of the quasi-maximum likelihood estimator (QMLE) and verify its finite sample performance in a Monte Carlo study. As an application, we analyze the China Education Panel Survey (CEPS) and focus on gender heterogeneity in the academic achievement of Grade 8 students in junior high school. We capture significant gender disparities in peer effects from gender subgroups in a classroom.

7. “QML estimation of a spatial autoregressive model with endogenous heterogeneity” (with Yichun Song and Yang Yang)

Manuscript, Supplementary file

Abstract: We develop a spatial autoregressive (SAR) model with endogenous heterogeneity. The scalar spatial coefficient is allowed to be a bounded function of a spatial unit’s own characteristics. Of particular interest is a structural interaction model with nonlinear and endogenous spillover effects. We propose a quasi-maximum likelihood estimator (QMLE) for this model and investigate its asymptotic properties; Monte Carlo simulations show that it performs well in finite samples. We apply our model to an empirical study of regional economic performance and find that regional productivity spillovers are heterogeneous and depend on an endogenous variable: years of education.

8. “WORSE THAN THE PROBLEM ITSELF”? The health and economic costs and benefits of COVID-19 policy responses (with Hanbat Jeong and Bruce Weinberg)

Manuscript, Supplementary file

Abstract: We model the co-evolution of COVID-19 cases, economic activity, and all-cause mortality for large U.S. counties in 2020. Estimates show that economic activity and cases are highly persistent, implying lasting effects of policies. Economic activity increases cases, which, in turn, reduce future economic activity. This dynamic relationship impedes efforts to promote economic activity and motivates efforts to reduce cases. Simulations show that the optimal long-run social distancing policy involves strict mask requirements. Optimal policies can involve a lockdown despite its costs because of the long-run benefits of controlling cases. The optimal strictness of lockdowns typically increases with population and network centrality. Simulations further show the staggering cost of suboptimal policies: 10% less economic activity and 6000 times as many cases in December 2020 compared to those under optimal policies.

9. “Copy to China? foreign venture capital, network spillover, and local innovation” (with Tingfan Gao and Shixun Wang)


Abstract: In this paper, we analyze both the direct and the network effects of foreign venture capital investment on the innovation of Chinese investee companies, based on data on 12,819 Chinese venture capital transactions matched with patent application data and commercial registration information of the investee companies. Foreign venture capital investment directly and significantly increases the patent applications of investee companies by an average of 1.86, and their invention patents by an average of 1.67. We further distinguish substantial innovation from strategic innovation and find empirically that the effect of foreign venture capital investment on the innovation of Chinese companies is mainly manifested in strategic innovation. A distinctive feature of venture capital is that it builds an innovation network among enterprises. Using an endogenous network model, we find strong evidence that the foreign venture capital network has a significant positive spillover effect.

10. “Unveiling the common prosperity gene within Chinese clan culture: a discussion on the Piketty dilemma” (with Zhongqi Deng, Tingfan Gao and Ziqi Lu)


Abstract: For millennia, Chinese clan culture has molded the distinct spiritual realm and thought patterns of the Chinese people, establishing a profound cultural bedrock for the Chinese path. Departing from the prevalent notion of relying on income redistribution to promote common prosperity, this study delves into the cultural gene of the Chinese people’s pursuit of common prosperity and its contemporary significance. By constructing a general equilibrium model that reflects the realities of China and integrates elements of Chinese clan culture, this study reveals that the collectivist values inherent in Chinese clan culture can concurrently enhance the level of prosperity and mitigate income inequality. Chinese clan culture plays a pivotal role in narrowing the disparities between the return rate on capital, the economic growth rate, and the growth rate of worker wages, thereby mitigating or even circumventing the Piketty Dilemma. Furthermore, this study corroborates the theoretical findings using data from China’s prefecture-level cities and spatial econometric models. The findings contribute Chinese wisdom to avoiding the Piketty Dilemma in the development of human society.

Work in Progress

1. “Estimating multi-level network effects with hierarchy data structures” (with Lung-Fei Lee and Yang Yang)

Brief introduction: In the social sciences, data structures are often hierarchical due to the inherent multi-level nature of organizations. We would have variables describing individual units that belong to groups as well as elements characterizing these groups. Also, individual units might affect each other through networks both inside each group and across groups. As an illustration, regarding educational outcomes, data collected by surveys usually contain the attributes of a student, the class he/she belongs to, and the features of a teacher who teaches a course for this class and multiple other classes. The academic outcome of this student may be affected directly by his/her classmates and indirectly by students affiliated with other classes taught by the same teacher. When it comes to aggregate-level data, it is also important to take the hierarchical structure and the effects of multi-level networks into account. To give an example, the housing price in a county can be affected not only by spillover effects from other counties within the same state, but also by counties outside the state through interactions among states. However, in the recent econometrics and statistics literature, there is little serious discussion of estimating multilevel network effects with a hierarchical data structure.

One common model for hierarchical data sets is the hierarchical linear model (HLM), also called multilevel modeling or mixed regression, introduced by Bryk and Raudenbush (1992). It is a random and heterogeneous coefficient linear regression model that allows the coefficients of individual-level independent variables and the intercept term to be determined by higher-level variables. The estimation technique is similar to that of other random coefficient models discussed in Hsiao (1975) and Saxonhouse (1976). Although the model is popular in sociology, psychology and some other social science subjects, there are only a few applications in empirical economics research, such as Beron et al. (1999) and Kayo and Kimura (2011). A significant drawback of this model is that it assumes individual units are nested in groups while ignoring network effects inside each group and across groups. Intuitively, this assumption is not appropriate since it rules out any possible interactions among individual units.
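As a point of reference, a textbook two-level HLM with one individual-level regressor and one group-level regressor can be written as follows. The notation is illustrative and not taken from the project itself: $y_{ij}$ is the outcome of individual $i$ in group $j$, $x_{ij}$ an individual-level variable, $z_j$ a group-level variable, and $e_{ij}$, $u_{0j}$, $u_{1j}$ are error terms.

```latex
\begin{align*}
y_{ij}     &= \beta_{0j} + \beta_{1j}\, x_{ij} + e_{ij}, \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\, z_j + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\, z_j + u_{1j}.
\end{align*}
```

Substituting the last two equations into the first gives $y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}$, which contains no term linking $y_{ij}$ to the outcomes of other individuals. This is the absence of network effects noted above.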

A favored model for the estimation of network effects is the spatial autoregressive (SAR) model. Ever since the extension from time series autocorrelation to the spatial literature by Cliff and Ord (1973), the SAR model has been widely used by empirical researchers in fields like regional science, urban economics, and public economics to measure the dependence among spatial units. Furthermore, Lee (2007) and Lin (2010) incorporate group fixed effects into the SAR model for the identification of peer effects within group assignments, which successfully solves the reflection problem pointed out by Manski (1993). Recently, some papers have considered more complex model settings, among which are the endogenous networks and random group-level effects in Johnsson and Moon (2021), Kuersteiner et al. (2023) and so on. Several geography and geostatistics studies, such as Dong and Harris (2015), try to extend the SAR model in order to account for hierarchical geographical data structures. Nonetheless, the proposed models are similar to previous SAR models for estimating peer effects: they have a single spatial weighting matrix for correlations among lower-level geographical units and add random or fixed effects for higher-level geographical units. The limitations of those existing models are also clear: indirect network effects across higher-order groups and cross effects between lower- and higher-level variables are ignored. Even if individual units/groups are randomly assigned, there might still exist interactions across them. Without controlling for these indirect networks at a higher level, we might have misspecification issues.
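To make the SAR specification concrete, the following minimal sketch simulates the standard first-order model y = λWy + Xβ + ε through its reduced form y = (I − λW)⁻¹(Xβ + ε). The ring-shaped weights matrix, the parameter values, and all variable names are illustrative assumptions rather than part of any paper listed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: each unit's neighbors are the two adjacent units on a
# circle; rows are normalized to sum to one.
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 1.0
    W[i, (i + 1) % n] = 1.0
W = W / W.sum(axis=1, keepdims=True)

lam = 0.4                      # spatial autoregressive coefficient (assumed)
beta = np.array([1.0, -0.5])   # regression coefficients (assumed)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
eps = rng.standard_normal(n)

# Reduced form: solve (I - lam * W) y = X beta + eps for y.
S = np.eye(n) - lam * W
y = np.linalg.solve(S, X @ beta + eps)

# Spillovers: entry (i, j) of (I - lam * W)^{-1} scales how a shock to unit j
# reaches unit i. With row-normalized W, every row sums to 1 / (1 - lam).
multiplier = np.linalg.inv(S)
row_sums = multiplier.sum(axis=1)
```

In this setup a unit-wide increase in the second regressor changes every outcome by β₂/(1 − λ) in total, which is larger in magnitude than the direct coefficient because of feedback through neighbors.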

In this paper, we aim to find a better solution for estimating network effects when the data structure is hierarchical with two levels: one level for the individual units and another for the groups. We employ the fundamental idea of the HLM, i.e., the effects of the individual-level independent variables can be correlated with the group-level independent variables. In addition, we introduce two types of network effects into the individual-level regression function: one is the direct network effect from members in the same group, which has the form of a SAR term, and the other is the indirect network effect from members outside the group, which is formed by group-level networks. Moreover, a multi-level network structure on the disturbance term at the individual level is included to capture unobserved common shocks and their network spillovers.

2. “Sieve estimation for Gaussian copula endogeneity corrections of a SAR model”

3. “Do human capital spillovers of China’s Big Tech promote the integration of data and reality?” (with Tingfan Gao and Shixun Wang)

4. “Entrepreneurs’ social networks and corporate performance” (with Tingfan Gao and Shixun Wang)

Research Awards

2023 Tom Kniesner and Debbie Freund Award, Department of Economics, The Ohio State University

2022 G.S. Maddala Prize in Econometrics, Department of Economics, The Ohio State University