

How CrunchDAO is Advancing Causal Discovery Research at ADIA Lab

Background

Machine learning excels at finding correlations, but discovering true causal relationships remains a challenge. The gap costs billions across healthcare, economics, and the social sciences, where mistaking correlation for causation leads to failed interventions and wasted resources.

Causal discovery is the task of uncovering, from observational data alone, the causal structure that governs relationships among variables. In practice this means constructing causal directed acyclic graphs (DAGs) that show how variables actually influence one another, moving beyond statistical associations to the mechanisms that drive the observed phenomena.

ADIA Lab, Abu Dhabi's premier AI research institute, recognized that advancing causal discovery at scale required diverse approaches. Its internal scientists had deep domain expertise, but developing robust causal inference algorithms is complex enough to benefit from fresh perspectives.

The Challenge

ADIA Lab partnered with CrunchDAO to launch their Causal Discovery Challenge, building on their previous collaboration.

"After the success of our initial competition last year, which saw more than 5000 submissions, we are excited to continue our partnership with Crunch Lab," said Dr. Horst Simon, Director of ADIA Lab. "This year we are focused on the crucial topic of Causal AI, which I'm sure will appeal to many of CrunchDAO's 5,000 data scientists and 600 PhDs."

The challenge gave participants 47,000 datasets and asked them to discover the causal DAGs representing the true cause-and-effect relationships between variables.

Each dataset contained a treatment variable (X) and an effect variable (Y), along with additional variables playing different causal roles, including:

  • Confounders, which cause both the treatment and the effect
  • Mediators, which lie on the causal pathway from X to Y
  • Colliders, which are caused by both X and Y
  • Independent variables, with no causal relationship to X or Y
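
To make the four roles concrete, the sketch below classifies a variable from the edges of a known causal DAG. It is a minimal illustration, assuming edges are stored as hypothetical (cause, effect) string pairs and looking only at direct edges to and from X and Y, which is a simplification; anything outside the four roles listed above comes back as "other". The function name `variable_role` is invented for this example.

```python
# Edges are (cause, effect) pairs; "X" is the treatment and "Y" the effect.
Edge = tuple[str, str]


def variable_role(v: str, edges: set[Edge], x: str = "X", y: str = "Y") -> str:
    """Classify v by its direct edges to the treatment x and the effect y."""
    causes_x = (v, x) in edges
    causes_y = (v, y) in edges
    caused_by_x = (x, v) in edges
    caused_by_y = (y, v) in edges

    if causes_x and causes_y:
        return "confounder"   # X <- v -> Y
    if caused_by_x and causes_y:
        return "mediator"     # X -> v -> Y
    if caused_by_x and caused_by_y:
        return "collider"     # X -> v <- Y
    if not (causes_x or causes_y or caused_by_x or caused_by_y):
        return "independent"  # no direct edge to or from X or Y
    return "other"            # e.g. a cause of X only


# A toy DAG with one variable in each role, plus the X -> Y edge itself.
dag = {("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y"),
       ("X", "C"), ("Y", "C"), ("X", "Y")}
for name in ["Z", "M", "C", "W"]:
    print(name, variable_role(name, dag))
```

Predicting a variable's role therefore reduces to predicting the handful of edges between it and X and Y, which is why the task can be framed as edge prediction.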

For each pair of variables in the test datasets, participants had to predict whether a causal relationship existed. Success was measured by how accurately these predictions identified each variable's role relative to the X→Y causal pathway.
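
The top entry below reports its score as balanced accuracy. Assuming the leaderboard used the same metric, the sketch shows why it suits this task: it averages recall per role class, so a model cannot score well by always predicting the most common role. The `balanced_accuracy` function and the toy labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Eight independent variables, one confounder, one collider;
# the "model" labels everything as independent.
truth = ["independent"] * 8 + ["confounder", "collider"]
pred = ["independent"] * 10
print(balanced_accuracy(truth, pred))  # about 0.333
```

In this toy case plain accuracy is 0.8, while balanced accuracy is about 0.33, because both the confounder and the collider are missed.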

Solution

The decentralized approach leveraged collective intelligence from our community to explore diverse causal discovery methodologies simultaneously. This generated breakthrough approaches that would be unlikely to emerge from any single research team within the same timeframe.

Some of the top contributions included:
  1. Hicham Hallak, an engineer from École Centrale, developed an end-to-end deep learning solution based on graph neural networks, in which edge features were used to classify nodes. The model achieved 76.7% balanced accuracy by processing scatter plot data through residual convolutional blocks, self-attention layers, and specialized classification heads.
  2. Mutian Hong from ShanghaiTech University and Guoqin Gu from Xiamen University achieved 74.06% accuracy through extensive feature engineering with innovative data augmentation. Their approach combined correlation-based features, information-theoretic features, causal discovery algorithms (ExactSearch, PC, FCI, GRaSP), regression coefficients, and structural equation modeling outputs.
  3. Alex KC achieved 72.88% accuracy through feature engineering on 923 features across multiple categories. The approach combined correlation statistics, conditional independence measures, causal discovery algorithms (GES, BES, DirectLiNGAM), and conditional mutual information features estimated with nearest-neighbor methods. A final LightGBM classifier integrated all features to predict causal relationships.
  4. Hoàng Thiên Nữ reached 70.37% accuracy using polynomial transformations, features based on additive noise models (ANMs), and AutoML optimization. The solution used SMOTE oversampling and 5-fold cross-validation, with AutoGluon outperforming manually tuned models.

The competition yielded production-ready algorithms with accuracy scores of up to 76.7%, outperforming some traditional methods.

Impact

The challenge engaged researchers from around the world, giving ADIA Lab access to a global talent pool while maximizing the return on its research investment: a $100,000 prize pool distributed across ten winners.

Winners developed methods that substantially advanced the state of the art in causal discovery while remaining computationally efficient enough for real-world applications. According to post-competition analysis, ensembling these different approaches could push accuracy above 80%.

Conclusion

The ADIA Lab Causal Discovery Challenge tackled a fundamental problem in machine learning through collective intelligence. It produced causal discovery algorithms reaching up to 76.7% accuracy, outperforming some traditional methods, without the massive internal teams and significant capital investment such research traditionally requires.

This success suggests that decentralized research offers a new model for advancing causal discovery.