The Holy Grail?

Abstract:

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y . This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different “causeeffect pairs” selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).

…and that’s how political scientists became redundant

From here:

Upload a data set, and the automatic statistician will attempt to describe the final column of your data in terms of the rest of the data. After constructing a model of your data, it will then attempt to falsify its claims to see if there is any aspect of the data that has not been well captured by its model.

enhancing R

From here:

I describe an approach to compiling common idioms in R code directly to native machine code and illustrate it with several examples. Not only can this yield significant performance gains, but it allows us to use new approaches to computing in R. Importantly, the compilation requires no changes to R itself, but is done entirely via R packages. This allows others to experiment with different compilation strategies and even to define new domain-specific languages within R. We use the Low-Level Virtual Machine (LLVM) compiler toolkit to create the native code and perform sophisticated optimizations on the code. By adopting this widely used software within R, we leverage its ability to generate code for different platforms such as CPUs and GPUs, and will continue to benefit from its ongoing development. This approach potentially allows us to develop high-level R code that is also fast, that can be compiled to work with different data representations and sources, and that could even be run outside of R. The approach aims to both provide a compiler for a limited subset of the R language and also to enable R programmers to write other compilers. This is another approach to help us write high-level descriptions of what we want to compute, not how.