DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

Ireland's National Centre for Artificial Intelligence (CeADAR)
University College Dublin
Accepted for poster, presentation, and proceedings at the SYNDAiTE@ECML-PKDD 2025 Workshop

*Indicates Equal Contribution

Main Findings:

We introduce the first Mixture of Experts framework specifically designed for real-time data streams with concept drift.

  • DriftMoE achieves competitive performance with 8x fewer computational resources than traditional ensembles.
  • It adapts in real time through a novel symbiotic router-expert co-training loop.

DriftMoE matches or outperforms state-of-the-art ensembles such as ARF and Leveraging Bagging while using only 12 experts instead of 100+ trees.


Architecture Overview


DriftMoE Architecture and Symbiotic Training Framework.

The architecture features a neural router co-trained with Hoeffding tree experts through a continuous feedback loop. The router dynamically selects experts, which then update incrementally, providing correctness feedback that refines the router's decision-making capabilities.
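To make the loop concrete, here is a minimal per-instance sketch of this architecture in Python. It is illustrative only, not the authors' implementation: it assumes the `river` library for the incremental Hoeffding tree experts, models the compact neural router as a single sigmoid-linear layer in NumPy, uses Top-K routing as in MoE-Data, and applies the router's binary cross-entropy update per instance rather than in mini-batches; all names and constants are illustrative.

    # Minimal sketch of one DriftMoE prediction/update step (illustrative, not the
    # authors' code). Assumes numeric features, `river` Hoeffding trees as experts,
    # and a single sigmoid-linear layer standing in for the compact neural router.
    import numpy as np
    from river import tree

    N_EXPERTS, TOP_K, N_FEATURES = 12, 2, 24   # N_FEATURES: assumed stream dimensionality

    class Router:
        """Single sigmoid-linear layer standing in for the compact neural router."""
        def __init__(self, n_features, n_experts, lr=0.05):
            self.W = np.zeros((n_experts, n_features))
            self.b = np.zeros(n_experts)
            self.lr = lr

        def scores(self, x):
            return 1.0 / (1.0 + np.exp(-(self.W @ x + self.b)))

        def update(self, x, mask):
            # One SGD step on binary cross-entropy; d(BCE)/d(logits) = sigmoid - mask.
            grad = self.scores(x) - mask
            self.W -= self.lr * np.outer(grad, x)
            self.b -= self.lr * grad

    experts = [tree.HoeffdingTreeClassifier() for _ in range(N_EXPERTS)]
    router = Router(N_FEATURES, N_EXPERTS)

    def driftmoe_step(x_dict, y):
        """Prequential test-then-train step: route, predict, update experts and router."""
        x = np.fromiter(x_dict.values(), dtype=float)

        # 1) Router selection: keep the Top-K highest-scoring experts.
        scores = router.scores(x)
        selected = np.argsort(scores)[-TOP_K:]

        # 2) Test-then-train: all experts predict, then the selected ones learn.
        preds = [e.predict_one(x_dict) for e in experts]
        votes = [preds[i] for i in selected if preds[i] is not None]
        y_pred = max(set(votes), key=votes.count) if votes else None
        for i in selected:
            experts[i].learn_one(x_dict, y)

        # 3) Multi-hot correctness mask: 1 for every expert that predicted correctly.
        mask = np.array([1.0 if p == y else 0.0 for p in preds])

        # 4) Router refinement against the mask (per-instance here; mini-batched in the paper).
        router.update(x, mask)
        return y_pred

Feeding the stream one example at a time through driftmoe_step implements the prequential (test-then-train) evaluation protocol used throughout the results below.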

Abstract

Learning from non-stationary data streams subject to concept drift requires models that can adapt on the fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple voting schemes that fail to optimally leverage specialized knowledge. This paper introduces DriftMoE, an online Mixture-of-Experts (MoE) architecture that addresses these limitations through a novel co-training framework.

DriftMoE features a compact neural router that is co-trained alongside a pool of incremental Hoeffding tree experts. The key innovation lies in a symbiotic learning loop that enables expert specialization: the router selects the most suitable expert for prediction, the relevant experts update incrementally with the true label, and the router refines its parameters using a multi-hot correctness mask that reinforces every accurate expert.

We evaluate DriftMoE's performance across nine state-of-the-art data stream learning benchmarks spanning abrupt, gradual, and real-world drifts. Our results demonstrate that DriftMoE achieves competitive results with state-of-the-art stream learning adaptive ensembles, offering a principled and efficient approach to concept drift adaptation.

Key Features

🎯 Symbiotic Training

Router and experts co-evolve through continuous feedback loops, enabling dynamic specialization without explicit drift detection.

⚡ Resource Efficient

Achieves competitive performance with significantly fewer base learners compared to traditional ensemble methods.

🔄 Two Variants

MoE-Data (multi-class experts) and MoE-Task (binary experts) configurations for different specialization strategies.
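As a rough sketch of how the two pools might be set up (the one-expert-per-class reading of MoE-Task and all names below are assumptions for illustration, not details taken from the paper):

    # Illustrative expert pools for the two variants (assumed setup, not the authors' code).
    from river import tree

    N_EXPERTS = 12
    CLASSES = [0, 1, 2]          # hypothetical three-class stream

    # MoE-Data: every expert is a full multi-class Hoeffding tree; the router
    # picks the Top-K experts for each incoming example.
    moe_data_experts = [tree.HoeffdingTreeClassifier() for _ in range(N_EXPERTS)]

    # MoE-Task: one binary (one-vs-rest) Hoeffding tree per class; every expert
    # sees every example, trained on the binarised label `y == c` for its class c.
    moe_task_experts = {c: tree.HoeffdingTreeClassifier() for c in CLASSES}

    def moe_task_learn_one(x, y):
        """Update each per-class expert with its binarised view of the true label."""
        for c, expert in moe_task_experts.items():
            expert.learn_one(x, y == c)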

Motivation

Existing ensemble methods for concept drift rely on coarse-grained adaptation and fail to optimally leverage specialized knowledge.

🔒 Challenges

(1) Active methods rely on explicit drift detection which can be delayed or generate false positives
(2) Passive methods use simple voting schemes that don't leverage expert specialization
(3) Most approaches lack principled mechanisms for experts to develop specialized knowledge

🔑 Solutions

(1) Symbiotic co-training framework without explicit drift detection
(2) Neural router that dynamically assigns data to most suitable experts
(3) Multi-hot correctness mask enables principled expert specialization


Prequential accuracy comparison across nine benchmark datasets.

DriftMoE variants achieve competitive performance with state-of-the-art adaptive ensembles across synthetic and real-world streams, demonstrating robust adaptation to different types of concept drift.

Concept Drift Adaptation

DriftMoE demonstrates fast recovery after concept drift points, matching the adaptation speed of larger ensembles.

Accuracy over time for LED gradual drift dataset.

The figure shows how DriftMoE recovers quickly from concept drift points (marked by vertical lines), achieving similar recovery speed to ADWIN-equipped ensembles while using significantly fewer base learners.

Comprehensive Evaluation

Performance evaluation across multiple metrics demonstrates DriftMoE's robustness and efficiency.

Performance across accuracy, Kappa-M, and Kappa-Temporal metrics.

Beyond accuracy, DriftMoE maintains competitive performance on Kappa metrics that account for class imbalance and temporal dependencies, confirming its robustness across different evaluation criteria.
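For reference, these Kappa variants are usually defined as follows in the stream-learning literature (standard definitions, not equations quoted from the paper), where p_0 is the classifier's prequential accuracy, p_M the accuracy of a majority-class baseline, and p_T the accuracy of a no-change (predict-the-previous-label) baseline:

    \kappa_M = \frac{p_0 - p_M}{1 - p_M},
    \qquad
    \kappa_T = \frac{p_0 - p_T}{1 - p_T}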

Experimental Results

Datasets Evaluated

  • Synthetic: LED (Abrupt/Gradual), SEA (Abrupt/Gradual), RBF (Moderate/Fast)
  • Real-world: Airlines, Electricity, CoverType

Baselines Compared

  • Adaptive Random Forest (ARF)
  • Leveraging Bagging (LevBag)
  • OzaBag, OzaBoost, SmoothBoost
  • Streaming Random Patches (SRP)

Key Findings

  • MoE-Data: Most robust across datasets; achieves 70.33 ± 0.18 prequential accuracy on Airlines (best among all compared methods)
  • MoE-Task: Excels in high-frequency drift scenarios (75.45 ± 0.11 on RBFf, 88.65 ± 0.07 on RBFm)
  • Efficiency: Competitive with SOTA methods using significantly fewer base learners (12 vs 100+ trees)
  • Fast Recovery: Matches ADWIN-equipped ensemble recovery speed after concept drift

Performance Highlights

Prequential accuracy (%) for the two DriftMoE variants:

    Dataset      MoE-Data       MoE-Task
    Airlines     70.33 ± 0.18   60.92 ± 0.01
    CovType      81.28 ± 0.75   58.28 ± 0.31
    Electricity  83.76 ± 0.45   68.73 ± 0.85
    LEDa         73.77 ± 0.18   71.11 ± 0.54
    LEDg         73.11 ± 0.11   70.82 ± 0.38
    RBFf         61.90 ± 0.20   75.45 ± 0.11
    RBFm         79.89 ± 0.48   88.65 ± 0.07
    SEAa         89.09 ± 0.05   88.04 ± 0.09
    SEAg         88.74 ± 0.05   87.76 ± 0.04

Symbiotic Training Process

1. Router Selection
   MoE-Data: the router selects the Top-K experts based on learned weights. MoE-Task: all experts are activated with binary labels.

2. Expert Updates
   The selected experts make predictions and update incrementally using the Hoeffding tree learning rules.

3. Correctness Mask
   A multi-hot mask is generated: m_{t,i} = 1 if expert i predicts correctly, 0 otherwise.

4. Router Update
   The router parameters are refined in mini-batches using a binary cross-entropy loss with correctness feedback.
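In symbols, writing g_i(x_t) for the router's sigmoid output for expert i on instance x_t and m_{t,i} for the correctness mask over N experts, the per-instance router loss is the standard binary cross-entropy (a sketch of the usual BCE form; the paper accumulates these losses over mini-batches):

    \mathcal{L}_t = -\sum_{i=1}^{N} \Big[ m_{t,i}\,\log g_i(x_t) + (1 - m_{t,i})\,\log\big(1 - g_i(x_t)\big) \Big]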

BibTeX

@article{aspis2025driftmoe,
  title={DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts},
  author={Aspis, Miguel and Cajas Ordoñez, Sebastián A. and Suárez-Cetrulo, Andrés L. and Simón Carbajo, Ricardo},
  journal={arXiv preprint},
  year={2025},
  note={Accepted at the SYNDAiTE@ECML-PKDD 2025 workshop. Ireland's National Centre for Artificial Intelligence (CeADAR), University College Dublin}
}