…and that’s how political scientists became redundant

From here:

Upload a data set, and the automatic statistician will attempt to describe the final column of your data in terms of the rest of the data. After constructing a model of your data, it will then attempt to falsify its claims to see if there is any aspect of the data that has not been well captured by its model.

enhancing R

From here:

I describe an approach to compiling common idioms in R code directly to native machine code and illustrate it with several examples. Not only can this yield significant performance gains, but it allows us to use new approaches to computing in R. Importantly, the compilation requires no changes to R itself, but is done entirely via R packages. This allows others to experiment with different compilation strategies and even to define new domain-specific languages within R. We use the Low-Level Virtual Machine (LLVM) compiler toolkit to create the native code and perform sophisticated optimizations on the code. By adopting this widely used software within R, we leverage its ability to generate code for different platforms such as CPUs and GPUs, and will continue to benefit from its ongoing development. This approach potentially allows us to develop high-level R code that is also fast, that can be compiled to work with different data representations and sources, and that could even be run outside of R. The approach aims to both provide a compiler for a limited subset of the R language and also to enable R programmers to write other compilers. This is another approach to help us write high-level descriptions of what we want to compute, not how.

competition

So, I’m no longer the only one using machine learning in order to measure democracy – I just found out about this initiative. It looks interesting and it made me regret not going to APSA this year. We differ in tools (they use SVM, I use Wordscores in one paper and a combination of LSA, LDA, and decision trees in another) and texts (they use human rights reports and Freedom House reports, I use 6,043 newspapers and magazines), but the spirit is the same: producing measures that are more transparent and reproducible (and eventually maybe real-time).

a comparison of programming languages in economics

Here.

Highlights:

1. C++ and Fortran are still considerably faster than any other alternative, although one needs to be careful with the choice of compiler.

2. C++ compilers have advanced enough that, contrary to the situation in the 1990s and some folk wisdom, C++ code runs slightly faster (5-7 percent) than Fortran code.

3. Julia, with its just-in-time compiler, delivers outstanding performance. Execution speed is only between 2.64 and 2.70 times the execution speed of the best C++ compiler.

4. Baseline Python was slow. Using the Pypy implementation, it runs around 44 times slower than in C++. Using the default CPython interpreter, the code runs between 155 and 269 times slower than in C++.

5. However, a relatively small rewriting of the code and the use of Numba (a just-in-time compiler for Python that uses decorators) dramatically improves Python’s performance: the decorated code runs only between 1.57 and 1.62 times slower than the best C++ executable.

6. Matlab is between 9 to 11 times slower than the best C++ executable. When combined with Mex files, though, the difference is only 1.24 to 1.64 times.

7. R runs between 500 to 700 times slower than C++. If the code is compiled, the code is between 240 to 340 times slower.

8. Mathematica can deliver excellent speed, about four times slower than C++, but only after a considerable rewriting of the code to take advantage of the peculiarities of the language. The baseline version of our algorithm in Mathematica is much slower, even after taking advantage of Mathematica compilation.

rise of the machines – part 3

In part 2 we saw how to use C++ and bytecodes to program LEGO Mindstorms EV3 bricks. Now, bytecodes don’t make for human-readable scripts. And C++ scripts take ~4 times longer to write than equivalent Python or Perl scripts. So, I’ve started writing a Python module that should make life easier – I’ve called it ev3py.

Here’s the GitHub repo. For now the module is still inchoate; it only covers three basic functions (starting motors, stopping motors, and reading data from sensors) and it only works on Macs, and only via Bluetooth. But it’s a start.

Let’s see a concrete example. Say you want to start the motor on port A with power 20. If you’re using bytecodes and C++ you need to write something like this:

#include <unistd.h>
#include <fcntl.h>
#include "ev3sources/lms2012/c_com/source/c_com.h"

int main()
{    
    unsigned const char start_motor[] {13, 0, 0, 0,
        DIRECT_COMMAND_NO_REPLY,
        0, 0,
        opOUTPUT_POWER, LC0(0), LC0(1), LC1(20),
        opOUTPUT_START, LC0(0), LC0(1)};

    int bt = open("/dev/tty.EV3-SerialPort", O_RDWR);
    write(bt, start_motor, 15);
}

With ev3py here’s how you do it:

from ev3py import ev3

mybrick = ev3()
mybrick.connect('bt')
mybrick.start_motor(port = 'a', power = 20)

So, with ev3py the code becomes human-readable and intuitive. It also becomes much faster to write. You no longer need to set message size, message counter, command type, etc.

Unlike other EV3 modules ev3py interacts with the EV3’s native firmware, so there’s no need to make the EV3 boot to a different operating system; just turn the brick on and you’re ready.

The goal is to eventually cover all EV3 capability and make ev3py work with USB and WiFi and also with Linux and Windows. I.e., something along the lines of the QUT-EV3 toolkit and the Microsoft EV3 API. If you’d like to contribute your help is much appreciated – just fork the GitHub repo and add capabilities, fix bugs, or suggest changes to the overall structure of the module.

rise of the machines – part 2

Here I show how to use C++ to communicate via Bluetooth with the LEGO Mindstorms EV3 brick (see previous post).

If you are on a Mac everything should work right away. If you are using Ubuntu or other Linux distro I think you’ll only need to change the Bluetooth part a bit (my Ubuntu laptop doesn’t have Bluetooth, so I can’t be sure). If somehow you are forced to use Windows I think you’ll need to change the Bluetooth part a lot. All the rest should be the same though.

So, you start by cloning the source code of the EV3 firmware: open up your terminal and run git clone https://github.com/mindboards/ev3sources.git (keep the folder name ev3sources, to make the examples below easier to run). Then open the ev3sources/lms2012/c_com/source/c_com.h file and change the line #include "lms2012.h" so that it provides the full path to the lms2012.h file (for instance: #include "/Users/YourUsername/MyLegoProject/ev3sources/lms2012/lms2012/source/lms2012.h").

That’s all the setup you need – you are now ready to write and send commands to the EV3. Turn on your EV3, enable Bluetooth, make it discoverable (see the EV3 user guide if necessary), plug a motor into port A, fire up Xcode or whatever IDE you use, and try running the following code snippet:

#include <unistd.h>
#include <fcntl.h>
#include "ev3sources/lms2012/c_com/source/c_com.h"

int main()
{
    
    // write command to start motor on port A with power 20
    unsigned const char start_motor[] {13, 0, 0, 0,
        DIRECT_COMMAND_NO_REPLY,
        0, 0,
        opOUTPUT_POWER, LC0(0), LC0(1), LC1(20),
        opOUTPUT_START, LC0(0), LC0(1)};

    // send command to EV3 via Bluetooth
    int bt = open("/dev/tty.EV3-SerialPort", O_RDWR);
    write(bt, start_motor, 15);

    // end connection with EV3
    close(bt);
}

If everything went well you should see the motor starting.

If instead you get an authentication-related error message, download and install the official LEGO app (if you haven’t already), launch it, use it to connect to the EV3 via Bluetooth, check that it really connected, then close it. Somehow that fixes the issue for good. (I know, it’s an ugly hack, but life is short).

Now let’s deconstruct our little script. There are two steps: writing the command and sending the command. Writing the command is the hard part. As you see, it’s not as simple as, say, EV3.start_motor(port = "A", power = 20). Instead of human-readable code what we have here is something called bytecodes. In this particular example every comma-separated piece of the expression inside the inner curly braces is a bytecode – except for the LC1(20) part, which is two bytecodes (more on this in a moment). The first and second bytecodes – 13 and 0 – tell the EV3 the message size (not counting the 13 and the 0 themselves). The third and fourth bytecodes – 0 and 0 – are the message counter.

The fifth bytecode – DIRECT_COMMAND_NO_REPLY – tells the EV3 two things. First, that the instruction is a direct command, as opposed to a system command. Direct commands let you interact with the EV3 and the motors and sensors. System commands let you do things like write to files, create directories, and update the firmware. Second, DIRECT_COMMAND_NO_REPLY tells the EV3 that this is a one-way communication: just start the motor, no need to send any data back. So, the three alternatives to DIRECT_COMMAND_NO_REPLY are SYSTEM_COMMAND_NO_REPLY, DIRECT_COMMAND_REPLY, and SYSTEM_COMMAND_REPLY.

The sixth and seventh bytecodes – 0 and 0 – are, respectively, the number of global and local variables you will need when receiving data from the EV3. Here we’re using a DIRECT_COMMAND_NO_REPLY type of command, so there is no response from the EV3 and hence both bytecodes are zero.

Now we get to the command itself. We actually have two commands here, one after the other. The first one, opOUTPUT_POWER, sets how much power to send to the motor. The second one, opOUTPUT_START, starts the motor. Each command is followed by a bunch of local constants (that’s what LC stands for), which contain the necessary arguments. For both commands the first LC0() is zero unless you have multiple EV3 bricks (you can join up to four EV3 bricks together; that’s called a “daisy chain”). Also for both commands, the second LC0() determines the EV3 port. Here we’re using port A, hence LC0(1). Use LC0(2) for port B, LC0(4) for port C, and LC0(8) for port D. Finally, opOUTPUT_POWER takes one additional argument: the desired power. The unit here is percentage: 20 means that we want the motor to run at 20% of its maximum capacity. Unlike the other local constants, this one is of type LC1, not LC0, so it takes up two bytes (see the bytecodes.h file for more on local constants); that is why the message size is 13 even though only 12 comma-separated elements follow the two size bytes.

(Don’t be a sloppy coder like me: instead of having these magic numbers, declare proper variables or constants and use these instead – LC0(port), LC1(power), etc.)
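To make the byte layout above concrete, here is a minimal Python sketch that assembles the same message from named constants instead of magic numbers. The numeric values (0x80, 0xA4, 0xA6, and the 0x81 prefix for LC1) are taken from the hexadecimal string shown later in this post; the helper names (lc0, lc1, start_motor_message) are made up for illustration, the lc0 helper only handles the small non-negative values used here, and the byte order of the message counter is an assumption (it is zero in every example, so it makes no difference here).

```python
import struct

# Byte values as they appear in the hexadecimal version of the message
DIRECT_COMMAND_NO_REPLY = 0x80
OP_OUTPUT_POWER = 0xA4
OP_OUTPUT_START = 0xA6
PORTS = {'a': 1, 'b': 2, 'c': 4, 'd': 8}


def lc0(value):
    """One-byte local constant (hypothetical helper; only 0..31 supported here)."""
    if not 0 <= value <= 31:
        raise ValueError('this sketch only handles 0..31')
    return bytes([value])


def lc1(value):
    """Two-byte local constant: a 0x81 prefix followed by the value."""
    return bytes([0x81, value & 0xFF])


def start_motor_message(port, power, brick=0, counter=0):
    """Build the 'set power, then start' direct command for one port."""
    body = (
        struct.pack('<H', counter)            # message counter (2 bytes, assumed little-endian)
        + bytes([DIRECT_COMMAND_NO_REPLY])    # command type
        + bytes([0, 0])                       # no global or local variables
        + bytes([OP_OUTPUT_POWER]) + lc0(brick) + lc0(PORTS[port]) + lc1(power)
        + bytes([OP_OUTPUT_START]) + lc0(brick) + lc0(PORTS[port])
    )
    # The size header counts everything after itself (13 bytes in this case)
    return struct.pack('<H', len(body)) + body


message = start_motor_message(port='a', power=20)
print(len(message))  # 15
```

Computing the size from the body means you never have to recount the bytes by hand when you add or remove a command.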

Now let’s send the command we just wrote. On a Mac the way we communicate with other devices via Bluetooth is by writing to (and reading from) tty files that live in the /dev folder (these are not actual files, but file-like objects). If you inspect that folder you will see one tty file for every Bluetooth device you have paired with your computer: your cell phone, your printer, etc. The EV3 file is called tty.EV3-SerialPort. (If you’re curious, here’s all the specs and intricacies of how Bluetooth is implemented on a Mac.)
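If you'd rather not browse /dev by hand, a few lines of Python can list the tty entries and pick out the EV3 one. This is only a sketch: the function names are made up, and it assumes the device name contains "EV3", as tty.EV3-SerialPort does. If the brick isn't paired (or you're not on a Mac) it simply prints None.

```python
import glob


def list_tty_devices(pattern='/dev/tty.*'):
    """Return the sorted paths that match the given glob pattern."""
    return sorted(glob.glob(pattern))


def find_ev3_device(paths, marker='EV3'):
    """Return the first path whose name contains the marker, or None."""
    for path in paths:
        if marker in path:
            return path
    return None


print(find_ev3_device(list_tty_devices()))
```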

So, to send the command we wrote before to the EV3 via Bluetooth we open the tty.EV3-SerialPort file (line 16), write the command to it (line 17), and close it (line 20).

That’s it, you can now use C++ to control the EV3 motors.

Just so you know, the compiler replaces the symbolic names and the LC() macros with plain byte values before anything is sent to the EV3 (hexadecimal is just a convenient way of writing those bytes down). In other words, your EV3 will not receive {13, 0, 0, 0, DIRECT_COMMAND_NO_REPLY, 0, 0, opOUTPUT_POWER, LC0(0), LC0(1), LC1(20), opOUTPUT_START, LC0(0), LC0(1)}. It will receive \x0D\x00\x00\x00\x80\x00\x00\xA4\x00\x01\x81\x14\xA6\x00\x01 instead. The mapping is provided in the bytecodes.h file. For instance, DIRECT_COMMAND_NO_REPLY is 0x80, opOUTPUT_POWER is 0xA4, and so on.
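A quick way to convince yourself that the decimal and hexadecimal versions are the same 15 bytes is to write the message with the numbers from the curly braces and print it in escaped form. This is a small Python sketch; as_escaped_hex is a made-up helper name, and the symbolic names are replaced by the values quoted above.

```python
# The message as written in the C++ snippet, with names replaced by their values
message = bytes([13, 0, 0, 0, 0x80, 0, 0, 0xA4, 0, 1, 0x81, 20, 0xA6, 0, 1])


def as_escaped_hex(data):
    """Format each byte as a two-digit, upper-case \\xNN escape."""
    return ''.join('\\x{:02X}'.format(byte) for byte in data)


print(as_escaped_hex(message))
# \x0D\x00\x00\x00\x80\x00\x00\xA4\x00\x01\x81\x14\xA6\x00\x01
```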

If you prefer you can hardcode the hexadecimals. This produces the exact same outcome:

#include <unistd.h>
#include <fcntl.h>
#include "ev3sources/lms2012/c_com/source/c_com.h"

int main()
{
    
    // write command to start motor on port A with power 20
    char start_motor[] = "\x0D\x00\x00\x00\x80\x00\x00\xA4\x00\x01\x81\x14\xA6\x00\x01";

    // send command to EV3 via Bluetooth
    int bt = open("/dev/tty.EV3-SerialPort", O_RDWR);
    write(bt, start_motor, 15);

    // end connection with EV3
    close(bt);
}

If you master the hexadecimals you can use any language to communicate with the EV3. For instance, in Python you can do this:

# write command to start motor on port A with power 20
# (a bytes literal, so the same code runs on Python 2 and Python 3)
start_motor = b'\x0D\x00\x00\x00\x80\x00\x00\xA4\x00\x01\x81\x14\xA6\x00\x01' + b'\n'

# send command to EV3 via Bluetooth (unbuffered I/O requires binary mode)
with open('/dev/tty.EV3-SerialPort', mode = 'w+b', buffering = 0) as bt:
    bt.write(start_motor)

All right then. Now, how do we get data back from the EV3? Well, it’s the reverse process: instead of writing to tty.EV3-SerialPort we read from it. The trick here is to find the sensor data amidst all the other stuff that the EV3 sends back to your computer, but we’ll get there (btw, I’m grateful to the good samaritan who showed me how to do this). To make matters clearer, plug a sensor into port 1 and try running this code:

#include <unistd.h>
#include <fcntl.h>
#include <cstdio>
#include "ev3sources/lms2012/c_com/source/c_com.h"

int main()
{
    
    // read sensor on port 1
    unsigned const char read_sensor[] {11, 0, 0, 0,
        DIRECT_COMMAND_REPLY,
        1, 0,
        opINPUT_READ, LC0(0), LC0(0), LC0(0), LC0(0), GV0(0)};

    // send command to EV3 via Bluetooth
    int bt = open("/dev/tty.EV3-SerialPort", O_RDWR);
    write(bt, read_sensor, 13);

    // receive data back from EV3 via Bluetooth
    unsigned char sensor_data[255] = {0};
    read(bt, sensor_data, 255);
    for(int i=0; i<255; i++) {
        printf("%x", sensor_data[i]);
    }
    
    // end connection with EV3
    close(bt);
}

The structure of the code is pretty similar to what we had before. The first change is that now our command type is no longer DIRECT_COMMAND_NO_REPLY but DIRECT_COMMAND_REPLY, as we now want to receive data from the EV3. The second change is the sixth bytecode, which is now 1. That means we are now requesting one global variable – we’ll need it to store the sensor data.

The third change is of course the command itself, which is now opINPUT_READ. Its arguments are, in order: the EV3 brick (usually 0, unless you have multiple bricks), the port number (minus 1), the sensor type (0 = don’t change the type), and the sensor mode (0 = don’t change the mode). GV0 is not an argument, but the global variable where the sensor data will be stored. Like the motor power, the data we will get back will be in percentage (alternatively, there is an opINPUT_READSI command that returns the data in SI units).

The fourth change is that we now have a new code block. Its first line declares sensor_data, a buffer of 255 bytes; it should be initialized to zero (e.g. with = {0}) so that the positions the EV3 doesn’t fill print as 0. The size is 255 because at this point we don’t know exactly how large the reply will be, so we want to be safe: the reply to a command that requests a single global variable is only a handful of bytes, so 255 is far more than we need. (Just as with the data we send, the first two bytes of the data we receive tell us how large the rest of the message is.) The second line receives the data and the for loop prints each byte to the screen in hexadecimal.

If everything went well you should see as output something like 400021F00000… Try it a couple more times, moving the sensor around in-between. You will notice that the first five digits or so don’t change, and neither do all the others after the sixth or seventh digit. For instance, your results will look like 400023D00000… or 400025B00000… Only two digits or so will change. That is your sensor data! In these three examples, for instance, your data points are 1F, 3D, and 5B. That’s hexadecimal format; in decimal format that means 31, 61, and 91 (here’s a conversion table). Now, once you’ve figured out what the relevant digits are you can get rid of that loop and print only them (say, printf("%x", sensor_data[5]);).
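Instead of eyeballing the digits you can also let the size header do the work. The sketch below, in Python, reads the two size bytes first, then exactly that many more bytes, and splits the result into fields. It assumes the reply layout implied by the sample output above (two size bytes, two counter bytes, one reply-type byte, then the data) and that the counter is little-endian like the size; the function names are made up, and an in-memory stream stands in for the Bluetooth tty so the snippet runs without a brick.

```python
import io
import struct


def read_exactly(stream, count):
    """Read exactly `count` bytes from a binary file-like object."""
    data = b''
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError('stream ended before the full reply arrived')
        data += chunk
    return data


def read_reply(stream):
    """Read one length-prefixed reply and return its fields as a dict."""
    (size,) = struct.unpack('<H', read_exactly(stream, 2))
    if size < 3:
        raise ValueError('reply too short to hold a counter and a type')
    body = read_exactly(stream, size)
    (counter,) = struct.unpack('<H', body[:2])
    return {
        'size': size,
        'counter': counter,
        'reply_type': body[2],   # 0x02 in the sample output above
        'payload': body[3:],     # the requested global variable(s)
    }


# Stand-in for the tty: the six bytes behind the sample output 400021F
fake_tty = io.BytesIO(bytes([0x04, 0x00, 0x00, 0x00, 0x02, 0x1F]))
reply = read_reply(fake_tty)
sensor_value = reply['payload'][0]
print(sensor_value)  # 31
```

With a real brick you would pass the opened tty (in binary mode) instead of fake_tty.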

That’s it! Now you can control the motors and read the sensors – that should help you get started. If you inspect the c_com.h files you will see lots of other commands, some of them with usage examples. The way forward is by exploring the firmware code and by trial and error.

Happy building!