publications

book chapter:

Chapter 9 in Non-Academic Careers for Quantitative Social Scientists (Springer, 2023).

peer-reviewed:

A note on real estate appraisal in Brazil. (with Rodrigo Peres and Leonardo Sales). Brazilian Review of Economics, 75(1), 29-36, 2021. Brazilian banks commonly use linear regression to appraise real estate: they regress price on features like area, location, etc., and use the resulting model to estimate the market value of the target property. But Brazilian banks do not test the predictive performance of those models, which for all we know are no better than random guesses. That introduces huge inefficiencies in the real estate market. Here we propose a machine learning approach to the problem. We scrape real estate data from 15 thousand online listings and use it to fit a boosted trees model. The resulting model has a median absolute error of 8.16%. We provide all data and source code. Data and code. It’s all here.

Automated Democracy Scores. Brazilian Review of Econometrics, 37(1), 31-43, 2017. In this paper I use natural language processing to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). I base the ADS on 42 million news articles from 6,043 different sources. The ADS cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable and have standard errors small enough to actually distinguish between cases. (I also wrote a related paper where I try a bunch of other methods: LSA, LDA, Random Forest.) Data and code. I created a web app that lets anyone tweak the training data and see how the results change, without having to write any code. If you do want to see the gory details, here’s what you need to know.

Deep Learning Anomaly Detection as Support Fraud Investigation in Brazilian Exports and Anti-Money Laundering (with Ebberth Paula, Marcelo Ladeira, and Rommel Carvalho). 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016. Here we use deep learning to detect fake Brazilian exports. Data and code. Sorry, it’s company-level data and therefore protected by Brazilian privacy laws (I only had access to it because a co-author works at Brazil’s tax authority).

A dimensão geográfica das eleições brasileiras (“The spatial dimension of Brazilian elections”). Opinião Pública (Public Opinion), 19(2), 270-290, 2013. Here I use spatial econometrics and the Brazilian election of 2010 to understand why neighboring counties tend to vote similarly. The preprint is in English. Data and code. I used a mix of Stata (here) and R (here) code. The dataset is here (it’s in Stata format; convert it to CSV format to run the R code). The list of missing observations is here. (To produce the plots I used GeoDa and ArcGIS, using the respective GUIs, so there’s no code for those.)

Lobby e protecionismo no Brasil contemporâneo (“Lobby and protectionism in Brazil”). Revista Brasileira de Economia (Brazilian Review of Economics), 62(3), 263-278, 2008. Here I regress tariffs on industry-level indicators of political power (economic concentration, number of workers, etc.). Data and code. I ran everything almost a decade ago and back then I used Excel spreadsheets to store data (I know, I know…) and I clicked buttons instead of writing code (I didn’t know any better), so I don’t have much to offer here. The spreadsheets are all in this zipped folder.

not peer-reviewed (yet):

Predictors of long-term resistance exercise adherence among beginners (2026). With Federica Conti, Andy Galpin, and Brad Schoenfeld. Here we mine Fitbod’s data to find the predictors of long-term strength training adherence.

Insider trading in Brazil’s stock market (2021). Here I estimate the probability of insider trading for each stock in Brazil’s stock market, for each quarter from 2019Q4 through 2021Q1.

Putting a price on tenure (2021). Here I estimate how much tenure is worth in $ to the employees who benefit from it.

Using SVM to pre-classify government expenditures (2015). Here I use support vector machines (SVM) to create an app that could reduce misclassification of government purchases in Brazil. The app suggests likely categories based on the description of the good being purchased.

Ideological bias in democracy measures (2012). Here I use Monte Carlo simulations to reassess some studies on the biases behind Freedom House, Polity IV, etc. I find that the evidence of bias is robust but that we can’t know which measures are biased or in what direction (e.g., for all we know Freedom House might just as well have a leftist bias, contrary to popular belief). Data and code. I used a mix of Stata (here and here) and R (here) code. Here’s the data in Stata format; here’s the same data in CSV format (for the R code).

Why is democracy declining in Latin America? (2011). Here I argue that Latin America’s “left turn” in the 2000s was accompanied by democratic erosion, as the new governments that came to power relied on constituencies that did not value democracy (which in turn reduced the electoral cost of suppressing press freedom, violating term limits, etc.). Data and code. Here’s the Stata do file and here’s the dta file.

newspaper articles:

O terceiro fracasso do Mercosul (“The third failure of Mercosur”). O Estado de São Paulo, 2/5/2011. Here I discuss why Mercosur failed to lock in the trade liberalization of the 1990s.

O preço de aceitar a Venezuela (“The price of accepting Venezuela”). O Estado de São Paulo, 5/28/2009. Here I discuss the policy consequences of Venezuela’s entry into Mercosur (a trade bloc comprising Brazil, Argentina, Paraguay, Uruguay, and Venezuela).

Thiago Marzagão
