measuring statism

When it comes to economic freedom, Brazil ranks a shameful 144th - behind former Soviet republics like Ukraine (134th) and Uzbekistan (114th). Down here the recently announced PlayStation 5 is going to sell for twice the price it is sold in the US, because decades ago some academics and bureaucrats decided that heavy taxes on videogames would make industrialists manufacture “useful” stuff instead (cars and whatnot). We even try to regulate the work of people who watch parked cars for a living, if you can believe that (I kid you not).

But how does that interventionist appetite vary across economic activities?

People try to answer that question in all sorts of ways. They look at how much money each industry spends on lobbying. They look at how often lobbists travel to the country’s capital. They look at how well-funded each regulatory agency is. They count the number of words in each industry’s regulations (RegData is a cool example; Letícia Valle is doing something similar for Brazil).

Here I’ll try something else: I’ll look at the companies that get mentioned in the Diário Oficial da União, which is the official gazette where all the acts of the Brazilian government are published. I’ll look up what each company does (mining? banking? retail? etc) so that I can aggregate company mentions by economic sector. That will let me know how much government attention each sector receives.

(Yes, there are lots of caveats. We’ll get to them in due time - chill out, reviewer #2.)

mining the Diário Oficial da União

Each company in Brazil has a unique identifier. It’s like a Social Security Number, but for organizations. It’s a 14-digit number; we call it CNPJ (Cadastro Nacional de Pessoas Jurídicas). Here is an example: 20.631.209/0001-60. When a company is mentioned in the Diário Oficial da União, that company’s CNPJ is there.

I had already downloaded the Diário Oficial da União, for something else. Well, not all of it: the Diário was launched in 1862, but the online version only goes back to 2002. So here we have about 18 years of government publications.

To get all CNPJs mentioned in the Diário I used this regular expression:


I didn’t mine the whole Diário. The Diário is divided into three sections: section 1, which has laws and regulations; section 2, which has personnel-related publications (promotions, appointments, etc); and section 3, which has procurement-related publications (invitations for bids, contracts, etc). I only mined section 1, as the other sections would merely add noise to our analysis. Just because your company won a contract to supply toilet paper to some government agency doesn’t mean your industry is regulated.

That regular expression resulted in 1.5 million matches (after dropping matches that were not valid CNPJs). In other words, section 1 of the Diário contains 1.5 million CNPJ mentions between 2002 and mid-2020 (I scraped the Diário back in July and I was too lazy to scrape the rest of it now).

The result was a table that looks like this:

date cnpj
2010-09-28 39302369000194
2010-09-28 39405063000163
2010-09-28 60960994000110
2010-09-29 31376361000160
2010-09-29 76507706000106
2010-09-29 08388911000140

That’s it for the Diário part. Now on to economic sectors.

from CNPJs to CNAEs

The Brazilian government has a big list of economic activities, and assigns to each of them a 5-digit numeric code. Hence 35140 is “distribution of electric energy”, 64212 is “commercial banks”, and so on. We call that taxonomy the CNAE (Classificação Nacional de Atividades Econômicas). There are some 700 CNAE codes in total.

(The CNAE is similar to the International Standard Industrial Classification of All Economic Activities - ISIC. You can find CNAE-ISIC correspondence tables here).

When you start a company in Brazil you’re required to inform the government what type of economic activity your company will be doing, and you do that by informing the appropriate CNAE code. That means each CNPJ is associated with a CNAE code. Fortunately, that data is publicly available.

I parsed that data to create a big CNPJ->CNAE table. If you want to do that yourself you need to download all the 20 zip files, unzip each of them, then run something like this:

import os
import pandas as pd

# path to the unzipped data files
path_to_files = '/Users/thiagomarzagao/Desktop/data/cnaes/'

# position of the CNPJ and CNAE fields
colspecs = [
    (4, 17), # CNPJ
    (376, 382), # CNAE

# fix indexing
colspecs = [(t[0] - 1, t[1]) for t in colspecs]

# set field names and types
names = {
    'CNPJ': str,
    'CNAE': str,

# load and merge datasets
df = pd.DataFrame([])
for fname in os.listdir(path_to_files):
    if ('.csv' in fname) or ('.zip' in fname):
    df_new = pd.read_fwf(
        path_to_files + fname, 
        skiprows = 1, 
        skipfooter = 1, 
        header = None, 
        colspecs = colspecs, 
        names = list(names.keys()), 
        dtype = names
    df = df.append(df_new)

# drop CNPJS that don't have a valid CNAE
df = df.dropna()
df = df[df['CNAE'] != '0000000']

# drop duplicates
df = df.drop_duplicates()

# save to csv
df.to_csv(path_to_files + 'cnpjs_to_cnaes.csv', index = False)

I joined the table produced by that script with the table I had created before (with dates and CNPJs - see above). The result was something like this:

date cnpj cnae
2010-09-28 39302369000194 85996
2010-09-28 39405063000163 46192
2010-09-28 60960994000110 58115
2010-09-29 31376361000160 80111
2010-09-29 76507706000106 32302
2010-09-29 08388911000140 80111

That’s all we need to finally learn which sectors get the most government attention!

But first a word from my inner reviewer #2.


Perhaps pet shops appear in the Diário Oficial da União a lot simply because there a lot of pet shops - and not because pet shops are heavily regulated. Also, the Diário Oficial da União only covers the federal government - which means that I am ignoring all state and municipal regulations/interventions. (Some nice folks are trying to standardize non-federal publications; they could use your help if you can spare the time.) Finally, each CNPJ can be associated with multiple CNAE codes; one of them has to be picked as the “primary” one, and that’s the one I’m using here, but it’s possible that using each CNPJ’s secondary CNAE codes might change the results.

This whole idea could be completely bonkers - please let me know what you think.

statism across economic sectors

Here are the 30 economic sectors whose companies most often show up in the Diário. Mouse over each bar to see the corresponding count.

(The description of some sectors is shortented/simplified to fit the chart. Sometimes the full description includes lots of footnotes and exceptions - “retail sale of X except of the A, B, and C types”, that sort of thing.)

Wow. Lots to unpack here.


The top result - “retail sale of pharmaceutical goods” - is a big surprise to me. I mean, yes, I know that selling drugs to people is a heavily regulated activity. But I thought it was the job of states or municipalities to authorize/inspect each individual drugstore. I thought the federal government only laid out general guidelines.

Well, I was wrong. Turns out if you want to open a drugstore in Brazil, you need the permission of the federal government. And that permission, if granted, is published in the Diário. Also, it falls to the federal government to punish your little pharmacy when you do something wrong - irregular advertising, improper storage of medicine, and a myriad of other offenses. Those punishments also go in the Diário.

Now, we need to keep in mind that there are a lot more drugstores than, say, nuclear power plants. I’m sure that nuclear plants are under super intense, minute regulation, but because they are rare they don’t show up in the Diário very often. So we can’t conclude that selling drugs to people is the most heavily regulated sector of the Brazilian economy.

We could normalize the counts by the number of CNPJs in each economic sector. But I think the raw counts tell an interesting story. The raw counts give us a sense of how “busy” the state gets with each economic sector. I’m sure that nuclear plants are more heavily regulated than drugstores, but I bet that a lot more bureaucrat-hours go into regulating drugstores than into regulating nuclear plants.

NGOs (sort of)

The second most frequent category - “social rights organizations” - corresponds to what most people call non-governmental organizations. Except that this is Brazil, where NGOs are not really NGOs: they receive a lot of funding from the state and they perform all kinds of activities on behalf of the state. Which explains why CNPJs belonging to NGOs (well, “NGOs”) have appeared over 90 thousand times in the Diário since 2002.

I looked into some of the cases. There are NGOs receiving state funding to provide healthcare in remote areas; to provide computer classes to kids in poor neighborhoods; to fight for the interests of disabled people; just about anything you can think of. Brazil has turned NGOs into government agencies. Our non-governmental organizations are independent from the Brazilian government in the same way that the Hitlerjugend was independent from the Reich.

Needless to say, NGOs are often at the center of corruption scandals. People come up with a pretty name and a CNPJ, apply for funding to do some social work, and then just pocket the money.


As if government-funded NGOs weren’t embarassing enough, a good chunk of the performing arts in Brazil is also government-funded. Hence the third and fifth most frequent categories here: “performing arts, concerts, dancing” and “movies, videos, and TV shows”.

You don’t have to be a good producer. As long as you are in the good graces of the government, you can apply for funding from the Ministry of Culture and use it to do your play/movie/concert. That’s why CNPJs belonging to those two categories have appeared in the Diário over 90 thousand times since 2002. I did the math and that’s about fifteen CNPJs a day receiving funding to have people dancing or acting or whatnot. Mind you, that’s just the federal funding.

And that’s just the official funding. Sometimes taxpayers end up funding “cultural” productions through less-than-transparent means. For instance, back in 2009 there was a biopic about Lula da Silva, who happened to be the president at the time. Well, turns out that 12 of the 17 companies that invested in the production of the movie had received hundreds of millions of dollars in government contracts. Neat, right?

Every now and then a good movie or play comes out of it. If you haven’t seen Tropa de Elite yet you should stop whatever you’re doing and go watch it (it’s on Netflix). But nearly all productions are flops. For every Tropa de Elite there are thirty Lula, Filho do Brasil.

If you want to have a taste of how bad most productions are, here is a teaser of Xuxa e os Duendes, which pocketed a few million bucks in taxpayers money. Trust me, you don’t need to understand Portuguese to assess the merits of the thing:

Meanwhile, we have over 40 thousand homicides a year, only a tiny fraction of which get solved. But what do I know, maybe Xuxa e os duendes is the sort of thing Albert Hirschman had in mind when he talked about backward and forward linkages.

I leave the analysis of the other categories as an exercise for the reader. If you want to see the full results, it’s here.

to do

It would be interesting to see how these counts by state or municipality; how these counts correlate with other measures of statism; how they change over the years; and so on.

That’s it. Remember, folks: these are the people in charge of public policy.