EWC (Elastic Weight Consolidation)
Overview
EWC (Elastic Weight Consolidation) is a regularization technique for continual learning in deep learning. When a neural network is trained on a new task, EWC keeps the weights that were important for earlier tasks close to their previous values, which mitigates catastrophic forgetting. Proposed by James Kirkpatrick et al. at DeepMind in 2017, the method estimates the importance of each weight with the diagonal of the Fisher Information Matrix and adds a quadratic penalty that discourages changes to important weights.
Main Content
Background: Catastrophic Forgetting Problem
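Neural networks tend to rapidly forget previously learned knowledge when trained on new data, because gradient descent on the new task freely overwrites weights that earlier tasks depend on. This is one side of the stability-plasticity dilemma: a learner must stay plastic enough to acquire new knowledge while remaining stable enough to retain old knowledge, a balance that biological brains manage far better than standard artificial neural networks. EWC addresses the problem by estimating how important each weight is and preventing important weights from changing significantly.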
Mathematical Principle
EWC approaches the problem from a Bayesian perspective. The posterior over the weights after learning task A is approximated by a Gaussian centered at the task-A optimum, and this approximation serves as the prior when learning a new task B. The resulting loss function is:
L(θ) = L_B(θ) + (λ/2) Σ_i F_i (θ_i - θ_A,i)^2
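Here, L_B is the loss for the new task, λ is the regularization strength that sets how much the old task matters relative to the new one, and F_i is the i-th diagonal element of the Fisher Information Matrix, representing the importance of weight i. θ_A,i is the value of weight i at the optimum found for task A. The Fisher information is the expected squared gradient of the log-likelihood with respect to a weight, so it measures how sensitive the model's predictions are to that weight. In the Gaussian approximation of the posterior, F_i plays the role of a precision (an inverse variance): a large F_i means the weight is tightly determined by task A and should not move much.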
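The Bayesian argument behind this loss can be written out as a short derivation. It is a sketch under two assumptions: the data sets D_A and D_B are conditionally independent given θ, and a Laplace approximation with a diagonal Fisher is adequate for the task-A posterior. The derivation gives the penalty with weight 1; λ then rescales it as a tunable hyperparameter, following the λ/2 convention of the original paper (the factor 1/2 can equally be absorbed into λ).

```latex
\begin{aligned}
\log p(\theta \mid \mathcal{D}_A, \mathcal{D}_B)
  &= \log p(\mathcal{D}_B \mid \theta) + \log p(\theta \mid \mathcal{D}_A) - \log p(\mathcal{D}_B \mid \mathcal{D}_A) \\
\log p(\theta \mid \mathcal{D}_A)
  &\approx -\tfrac{1}{2} \sum_i F_i \, (\theta_i - \theta_{A,i})^2 + \text{const} \\
\Rightarrow\quad
\arg\max_\theta \log p(\theta \mid \mathcal{D}_A, \mathcal{D}_B)
  &\approx \arg\min_\theta \Big[ L_B(\theta) + \tfrac{\lambda}{2} \sum_i F_i \, (\theta_i - \theta_{A,i})^2 \Big],
  \qquad L_B(\theta) = -\log p(\mathcal{D}_B \mid \theta), \ \lambda = 1
\end{aligned}
```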
Algorithm Procedure
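1. After learning task A, compute the diagonal Fisher information F_i for each weight i, typically estimated from gradients of the log-likelihood on task A data.
2. Store the optimal weights θ_A for task A.
3. When learning task B, add the Fisher-weighted quadratic penalty to the loss so that important weights stay close to θ_A.
4. For more than two tasks, repeat the procedure: after each task, store its optimal weights and Fisher information, and keep one penalty term per previous task (the sum of these quadratic penalties is itself a quadratic penalty).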
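The procedure can be sketched end to end in Python with NumPy. To keep the Fisher exact and the run deterministic, this is a minimal sketch assuming a linear model with unit-variance Gaussian noise (squared-error loss), for which the Fisher is E[x xᵀ] and its diagonal is simply the mean of x_i². The two tasks, their data, λ = 10, the learning rates, and all function names (`mse_loss`, `fisher_diagonal`, `train`) are hypothetical choices for illustration, not part of the original method description.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical task A: feature 0 varies a lot, feature 1 barely varies.
X_a = np.column_stack([rng.normal(0, 1.0, n), rng.normal(0, 0.1, n)])
y_a = X_a @ np.array([2.0, 1.0])

# Hypothetical task B: both features vary; its best weights conflict with task A.
X_b = rng.normal(0, 1.0, (n, 2))
y_b = X_b @ np.array([-1.0, 3.0])


def mse_loss(theta, X, y):
    """Negative log-likelihood of a unit-variance Gaussian model, up to a constant."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2)


def mse_grad(theta, X, y):
    """Gradient of mse_loss with respect to theta."""
    return X.T @ (X @ theta - y) / len(y)


def fisher_diagonal(X):
    """Diagonal Fisher of a linear-Gaussian model with unit noise variance.

    For this model the Fisher equals E[x x^T] (independent of theta), so its
    diagonal is the mean of x_i^2 over the task's inputs.
    """
    return np.mean(X ** 2, axis=0)


def train(theta, X, y, lam=0.0, fisher=None, theta_star=None, lr=0.05, steps=2000):
    """Gradient descent on L(theta) + (lam / 2) * sum_i F_i (theta_i - theta_star_i)^2."""
    theta = theta.copy()
    for _ in range(steps):
        g = mse_grad(theta, X, y)
        if lam > 0.0:
            g = g + lam * fisher * (theta - theta_star)  # gradient of the EWC penalty
        theta -= lr * g
    return theta


# Steps 1-2: learn task A, then store theta_A and its diagonal Fisher.
theta_a = train(np.zeros(2), X_a, y_a, lr=0.5, steps=5000)
F_a = fisher_diagonal(X_a)

# Step 3: learn task B starting from theta_A, without and with the EWC penalty.
theta_plain = train(theta_a, X_b, y_b)
theta_ewc = train(theta_a, X_b, y_b, lam=10.0, fisher=F_a, theta_star=theta_a)

for name, th in [("plain", theta_plain), ("ewc", theta_ewc)]:
    print(name, np.round(th, 2),
          "loss A =", round(mse_loss(th, X_a, y_a), 3),
          "loss B =", round(mse_loss(th, X_b, y_b), 3))
```

With these hypothetical tasks, plain fine-tuning reaches the task-B optimum but loses task A, while the EWC run keeps the heavily weighted first weight near its task-A value and lets the unimportant second weight move. The price is a higher task-B loss, and λ controls that trade-off.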
Advantages and Limitations
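- Advantages: Simple to implement, applicable to many network architectures, and requires no stored data from previous tasks (only the old weights and their Fisher information). In the original paper it substantially reduced forgetting on permuted MNIST and on sequentially learned Atari 2600 games.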
- Limitations: Computing the Fisher information adds cost after each task. With one penalty term per task, memory and computation grow with the number of tasks, and the accumulated constraints can leave too little plasticity for new tasks, degrading performance. In addition, the diagonal approximation ignores interactions between weights, which can make the importance estimates inaccurate.
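The diagonal-approximation limitation can be made concrete with a small NumPy sketch. It assumes, hypothetically, a linear-Gaussian model with unit noise variance (so the full Fisher is E[x xᵀ]) and task-A inputs whose two features are nearly identical. The helper `penalty` is an illustrative name that computes the quadratic EWC-style cost of a weight change with λ = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=1000)
# Hypothetical task-A inputs whose two features are almost identical (strongly correlated).
X = np.column_stack([z, z + 0.1 * rng.normal(size=1000)])

# For a linear-Gaussian model with unit noise variance, the full Fisher is E[x x^T].
F_full = X.T @ X / len(X)
F_diag = np.diag(np.diag(F_full))  # EWC keeps only this diagonal


def penalty(delta, F):
    """Quadratic penalty 0.5 * delta^T F delta for a weight change delta (lambda = 1)."""
    return 0.5 * delta @ F @ delta


same_dir = np.array([1.0, 1.0])    # both weights move together
opposite = np.array([1.0, -1.0])   # weights move in opposite directions
for name, d in [("same", same_dir), ("opposite", opposite)]:
    print(name, "full:", round(penalty(d, F_full), 3),
          "diagonal:", round(penalty(d, F_diag), 3))
```

Because the prediction change is roughly (d_0 + d_1) z, moving the weights in opposite directions barely affects task A, yet the diagonal penalty charges it as much as any other move, while it undercharges moving both weights together.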
Variants and Extensions
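- Online EWC: Instead of keeping a separate penalty per task, maintains a single running estimate of the Fisher information, decaying older contributions, so memory stays constant as tasks arrive sequentially.
- MAS (Memory Aware Synapses): Measures importance by the sensitivity of the network's output to each weight instead of by Fisher information, so it does not need labels.
- SI (Synaptic Intelligence): Computes importance online during training by accumulating each weight's contribution to the decrease in loss along its optimization path.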
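The three importance measures above can be sketched side by side in NumPy. These are minimal sketches under stated assumptions, not reference implementations: the online EWC update shows only the decayed running Fisher (the decay `gamma` is an assumed hyperparameter), MAS is written for a hypothetical linear model f(x) = W x, and SI takes per-step gradients and updates supplied by the caller. All function names are illustrative.

```python
import numpy as np


def online_ewc_update(F_running, F_new, gamma=0.9):
    """Online EWC (sketch): decay the running Fisher, then add the newest task's Fisher.

    Only this single Fisher vector and the most recent optimum (as the anchor)
    are stored, so memory does not grow with the number of tasks.
    gamma is an assumed decay factor.
    """
    return gamma * F_running + F_new


def mas_importance(X, W):
    """MAS (sketch) for a hypothetical linear model f(x) = W x.

    Importance is the mean absolute gradient of ||f(x)||^2 with respect to W,
    which equals 2 (W x) x^T for each sample; no labels are needed.
    """
    outputs = X @ W.T                                   # shape (n, out)
    grads = 2.0 * outputs[:, :, None] * X[:, None, :]   # shape (n, out, in)
    return np.mean(np.abs(grads), axis=0)               # shape (out, in)


def si_importance(grads, deltas, total_change, xi=1e-3):
    """SI (sketch): path integral of -g * dtheta over training steps, normalized per weight.

    grads[t] and deltas[t] are the loss gradient and the parameter update at step t;
    total_change is theta_end - theta_start for the task; xi is a damping term.
    """
    omega = -np.sum(np.asarray(grads) * np.asarray(deltas), axis=0)
    return omega / (total_change ** 2 + xi)


# Usage on tiny hand-made inputs (hypothetical values).
F_online = online_ewc_update(np.array([1.0, 2.0]), np.array([0.5, 0.5]), gamma=0.5)
mas = mas_importance(np.array([[1.0, 0.0]]), np.eye(2))
si = si_importance(grads=[[-1.0, 0.0]], deltas=[[0.5, 0.0]],
                   total_change=np.array([0.5, 0.0]))
print(F_online, mas, si)
```

In the SI example only the first weight moved while its gradient was nonzero, so only it receives nonzero importance; the second weight, which never moved, gets none.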
Recent Trends
As of 2024-2025, EWC remains an important baseline in continual learning. Recent work combines EWC with memory-based methods such as experience replay to improve performance, and applies it to transformer-based models such as GPT and BERT to preserve performance on previous tasks during fine-tuning. EWC has been reported to mitigate catastrophic forgetting in the continual learning of large language models (LLMs), but computing and storing Fisher information for models of that size is costly, which has motivated lighter-weight variants. In 2025, EWC-based meta-learning approaches that automatically adjust importance across multiple tasks have also drawn attention.
Related Topics
- [[Continual Learning]]
- [[Catastrophic Forgetting]]
- [[Fisher Information Matrix]]
---
AI-generated document · Improved by the community