Federated Learning

Federated learning is a machine learning approach that enables training models across distributed devices without centralizing data. Instead of moving data to the model, the model travels to the data.

Overview

In traditional machine learning, data is collected from various sources and aggregated in a central location for training. Federated learning inverts this paradigm: the model is distributed to edge devices, trained locally on private data, and only the model updates are sent back to aggregate into a global model.

This approach was pioneered by Google Research and first deployed at scale in Gboard, Google's mobile keyboard, to improve next-word prediction while keeping user typing data on-device [McMahan et al., 2017].

How It Works

The Federated Learning Cycle

  1. Initialization: A central server initializes a global model with random or pre-trained weights
  2. Selection: The server selects a subset of eligible devices to participate in a training round
  3. Distribution: Selected devices download the current global model
  4. Local Training: Each device trains the model on its local data for several epochs
  5. Upload: Devices compute and upload model updates (weight deltas or full weights)
  6. Aggregation: The server aggregates updates using an algorithm like FedAvg
  7. Update: The server updates the global model and the cycle repeats
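The control flow above can be summarized in a short server-side loop. The following is a minimal sketch, not EdgeML's actual API: select_cohort behavior is approximated with random sampling, device.local_train is a hypothetical helper that returns locally trained weights plus the device's sample count, and federated_average is the weighted aggregator sketched in the FedAvg section below.

```python
import random

def run_round(global_weights, devices, cohort_size, local_epochs):
    # 2. Selection: sample a cohort of eligible devices for this round
    cohort = random.sample(devices, cohort_size)

    updates, sample_counts = [], []
    for device in cohort:
        # 3. Distribution + 4. Local training: each device trains the current
        #    global model on its own data for a few epochs (hypothetical helper)
        local_weights, n_samples = device.local_train(global_weights, local_epochs)
        # 5. Upload: the device returns its locally trained weights and sample count
        updates.append(local_weights)
        sample_counts.append(n_samples)

    # 6. Aggregation + 7. Update: combine the updates into a new global model
    return federated_average(updates, sample_counts)
```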

Key Terminology

  • Round: One complete iteration of the federated learning cycle
  • Global Model: The aggregated model maintained by the server
  • Local Update: Weight changes computed by a device during local training
  • Cohort: The subset of devices selected for a particular round
  • Aggregation: The process of combining local updates into a global model update

Federated Averaging (FedAvg)

EdgeML uses Federated Averaging, the most widely adopted aggregation algorithm in federated learning.

Mathematical Foundation

Given K devices with local datasets D₁, D₂, ..., D_K and model weights w, FedAvg computes:

w_{t+1} = Σ_{k=1}^{K} (n_k / n) · w_k^t

where:
- w_{t+1} is the new global model at round t+1
- n_k is the number of samples on device k
- n = n_1 + n_2 + ... + n_K is the total number of samples across all participating devices
- w_k^t are the locally trained weights from device k at round t

This weighted average gives devices with more training data proportionally more influence on the global model, which empirically produces better convergence [McMahan et al., 2017].
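As a concrete illustration of the formula, here is a minimal NumPy sketch of the weighted average (an illustration, not EdgeML's actual implementation). `updates` holds each device's locally trained weight array and `sample_counts` the corresponding n_k values:

```python
import numpy as np

def federated_average(updates, sample_counts):
    """FedAvg: weight each local update by its share of the total samples."""
    total = sum(sample_counts)                  # n = sum of all n_k
    new_global = np.zeros_like(updates[0])
    for w_k, n_k in zip(updates, sample_counts):
        new_global += (n_k / total) * w_k       # (n_k / n) * w_k^t
    return new_global
```

For example, federated_average([w_a, w_b], [10, 10000]) gives device b roughly 1,000 times the influence of device a, matching the weighting described above.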

Why Weighted Averaging Matters

Simply averaging model weights (unweighted) treats all devices equally, regardless of how much data they have. A device with 10 samples would have the same influence as one with 10,000 samples, leading to:

  • Slower convergence: The model takes longer to learn meaningful patterns
  • Bias toward small datasets: Devices with little data can skew the global model
  • Poor generalization: The model may not represent the true data distribution

Weighted averaging ensures that devices contribute proportionally to their data volume, leading to faster convergence and better model quality.
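As a worked example, suppose device A has 10 samples and a locally trained weight of 0.9, while device B has 10,000 samples and a weight of 0.1. The unweighted average is 0.5, but the weighted average is (10/10,010) · 0.9 + (10,000/10,010) · 0.1 ≈ 0.101, so the global model tracks the device that actually holds most of the data.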

Why EdgeML Uses Federated Learning

Privacy by Design

Raw user data never leaves the device. Only model updates are transmitted, which:

  • Prevents central data breaches from exposing user information
  • Complies with privacy regulations (GDPR, CCPA, HIPAA)
  • Builds user trust through transparent data practices

Access to Distributed Data

Many valuable datasets cannot be centralized due to:

  • Privacy regulations: Healthcare data (HIPAA), financial data (PCI-DSS)
  • Legal restrictions: Cross-border data transfer limitations
  • Practical constraints: Network bandwidth, storage costs, data sovereignty

Federated learning enables training on this distributed data without moving it.

Real-World Learning

Models learn from actual usage patterns in production environments:

  • Natural distribution: Training data reflects real user behavior
  • Diverse contexts: Models see data from varied geographic, demographic, and temporal contexts
  • Continuous improvement: Models adapt to evolving usage patterns

Challenges and Tradeoffs

Communication Costs

Federated learning requires multiple rounds of model distribution and update collection. EdgeML optimizes this through:

  • Delta compression: Sending only weight changes, not full models
  • Update frequency control: Configurable rounds and local epochs
  • Model format optimization: ONNX, TFLite, and CoreML for efficient serialization
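The following is a rough sketch of the delta-compression idea; the function names and the sparsification threshold are illustrative assumptions, not EdgeML's API:

```python
import numpy as np

def compress_update(local_weights, global_weights, threshold=1e-3):
    """Send only the weight changes, zeroing near-zero deltas so they compress well."""
    delta = local_weights - global_weights      # weight changes, not full weights
    delta[np.abs(delta) < threshold] = 0.0      # drop negligible changes (sparsification)
    return delta

def apply_update(global_weights, delta):
    """Server side: reconstruct the device's weights from the compressed delta."""
    return global_weights + delta
```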

Heterogeneous Devices

Edge devices have varying computational capabilities, battery levels, and network conditions. EdgeML handles this through:

  • Device selection: Only devices meeting eligibility criteria participate
  • Asynchronous rounds: Devices don't need to synchronize perfectly
  • Graceful degradation: Partial round participation is acceptable
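A device-selection gate might look like the sketch below; the specific thresholds and the DeviceStatus fields are illustrative assumptions, not EdgeML's actual eligibility criteria:

```python
from dataclasses import dataclass

@dataclass
class DeviceStatus:
    battery_level: float         # 0.0 - 1.0
    is_charging: bool
    on_unmetered_network: bool   # e.g. Wi-Fi
    is_idle: bool

def is_eligible(status: DeviceStatus) -> bool:
    """Only devices that can train without degrading the user experience participate."""
    return (
        (status.is_charging or status.battery_level > 0.8)
        and status.on_unmetered_network
        and status.is_idle
    )
```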

Non-IID Data

Unlike centralized training, where data can be shuffled and balanced, federated data is often not independent and identically distributed (non-IID). A keyboard app, for example, sees different languages, writing styles, and topics on each device.

FedAvg handles this reasonably well, though more advanced algorithms (FedProx, FedMA) can improve convergence on highly skewed data distributions [Li et al., 2020].
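For reference, FedProx modifies each device's local objective by adding a proximal term that penalizes drift from the global model. A minimal sketch, assuming `data_loss` is the device's ordinary training loss on its local data and `mu` is the proximal coefficient:

```python
import numpy as np

def fedprox_local_objective(w, w_global, data_loss, mu=0.01):
    # The proximal term keeps local weights close to the global model, which
    # stabilizes training when local data is highly non-IID [Li et al., 2020].
    return data_loss(w) + (mu / 2.0) * np.sum((w - w_global) ** 2)
```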

Real-World Applications

Mobile Keyboards

  • Use Case: Next-word prediction, autocorrect, emoji suggestions
  • Challenge: Typing data is extremely private
  • Solution: Federated learning trains on billions of user interactions without collecting keystrokes

Healthcare

  • Use Case: Disease prediction, medical imaging, diagnostic models
  • Challenge: HIPAA prohibits centralizing patient data
  • Solution: Hospitals collaborate on models while keeping patient records local

Financial Services

  • Use Case: Fraud detection, credit risk modeling
  • Challenge: Regulatory restrictions on sharing customer data
  • Solution: Banks improve models collectively without exposing transaction details

IoT and Edge Devices

  • Use Case: Anomaly detection, predictive maintenance
  • Challenge: High network costs for continuous data upload
  • Solution: Train on-device and send only model updates

References and Further Reading

  1. McMahan, B., et al. (2017). "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS. [arXiv:1602.05629]

    • Original FedAvg paper introducing federated learning and weighted averaging
  2. Kairouz, P., et al. (2021). "Advances and Open Problems in Federated Learning." Foundations and Trends in Machine Learning. [arXiv:1912.04977]

    • Comprehensive survey of federated learning research and challenges
  3. Li, T., et al. (2020). "Federated Optimization in Heterogeneous Networks." MLSys. [arXiv:1812.06127]

    • FedProx: Handling non-IID data and system heterogeneity
  4. Bonawitz, K., et al. (2019). "Towards Federated Learning at Scale: System Design." MLSys. [arXiv:1902.01046]

    • Google's production federated learning infrastructure
  5. Yang, Q., et al. (2019). "Federated Machine Learning: Concept and Applications." ACM TIST.

    • Overview of federated learning taxonomy and applications

Next Steps