Federated Learning

Federated learning is a machine learning approach that enables training models across distributed devices without centralizing data. Instead of moving data to the model, the model travels to the data.

Overview

In traditional machine learning, data is collected from various sources and aggregated in a central location for training. Federated learning inverts this paradigm: the model is distributed to edge devices, trained locally on private data, and only the model updates are sent back to aggregate into a global model.

This approach was pioneered by Google Research and first deployed at scale in Gboard, Google's mobile keyboard, to improve next-word prediction while keeping user typing data on-device [McMahan et al., 2017].

How It Works

The Federated Learning Cycle

  1. Initialization: A central server initializes a global model with random or pre-trained weights
  2. Selection: The server selects a subset of eligible devices to participate in a training round
  3. Distribution: Selected devices download the current global model
  4. Local Training: Each device trains the model on its local data for several epochs
  5. Upload: Devices compute and upload model updates (weight deltas or full weights)
  6. Aggregation: The server aggregates updates using an algorithm like FedAvg
  7. Update: The server updates the global model and the cycle repeats
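The control flow above can be summarized in a short server-side loop. The following is a minimal sketch, not EdgeML's actual API: select_cohort behavior is approximated with random sampling, device.local_train is a hypothetical helper that returns locally trained weights plus the device's sample count, and federated_average is the weighted aggregator sketched in the FedAvg section below.

```python
import random

def run_round(global_weights, devices, cohort_size, local_epochs):
    # 2. Selection: sample a cohort of eligible devices for this round
    cohort = random.sample(devices, cohort_size)

    updates, sample_counts = [], []
    for device in cohort:
        # 3. Distribution + 4. Local training: each device trains the current
        #    global model on its own data for a few epochs (hypothetical helper)
        local_weights, n_samples = device.local_train(global_weights, local_epochs)
        # 5. Upload: the device returns its locally trained weights and sample count
        updates.append(local_weights)
        sample_counts.append(n_samples)

    # 6. Aggregation + 7. Update: combine the updates into a new global model
    return federated_average(updates, sample_counts)
```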

Key Terminology

  • Round: One complete iteration of the federated learning cycle
  • Global Model: The aggregated model maintained by the server
  • Local Update: Weight changes computed by a device during local training
  • Cohort: The subset of devices selected for a particular round
  • Aggregation: The process of combining local updates into a global model update

Federated Averaging (FedAvg)

EdgeML uses Federated Averaging, the most widely adopted aggregation algorithm in federated learning.

Mathematical Foundation

Given K devices with local datasets D₁, D₂, ..., D_K and model weights w, FedAvg computes:

w_{t+1} = Σ_{k=1}^{K} (n_k / n) · w_k^t

where:
- w_{t+1} is the new global model at round t+1
- n_k is the number of samples on device k
- n = n_1 + n_2 + ... + n_K is the total number of samples across all participating devices
- w_k^t are the locally trained weights from device k at round t

This weighted average gives devices with more training data proportionally more influence on the global model, which empirically produces better convergence [McMahan et al., 2017].
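As a concrete illustration of the formula, here is a minimal NumPy sketch of the weighted average (an illustration, not EdgeML's actual implementation). `updates` holds each device's locally trained weight array and `sample_counts` the corresponding n_k values:

```python
import numpy as np

def federated_average(updates, sample_counts):
    """FedAvg: weight each local update by its share of the total samples."""
    total = sum(sample_counts)                  # n = sum of all n_k
    new_global = np.zeros_like(updates[0])
    for w_k, n_k in zip(updates, sample_counts):
        new_global += (n_k / total) * w_k       # (n_k / n) * w_k^t
    return new_global
```

For example, federated_average([w_a, w_b], [10, 10000]) gives device b roughly 1,000 times the influence of device a, matching the weighting described above.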

Why Weighted Averaging Matters

Simply averaging model weights (unweighted) treats all devices equally, regardless of how much data they have. A device with 10 samples would have the same influence as one with 10,000 samples, leading to:

  • Slower convergence: The model takes longer to learn meaningful patterns
  • Bias toward small datasets: Devices with little data can skew the global model
  • Poor generalization: The model may not represent the true data distribution

Weighted averaging ensures that devices contribute proportionally to their data volume, leading to faster convergence and better model quality.
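As a worked example, suppose device A has 10 samples and a locally trained weight of 0.9, while device B has 10,000 samples and a weight of 0.1. The unweighted average is 0.5, but the weighted average is (10/10,010) · 0.9 + (10,000/10,010) · 0.1 ≈ 0.101, so the global model tracks the device that actually holds most of the data.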

Why EdgeML Uses Federated Learning

Privacy by Design

Raw user data never leaves the device. Only model updates are transmitted, which:

  • Prevents central data breaches from exposing user information
  • Complies with privacy regulations (GDPR, CCPA, HIPAA)
  • Builds user trust through transparent data practices

Access to Distributed Data

Many valuable datasets cannot be centralized due to:

  • Privacy regulations: Healthcare data (HIPAA), financial data (PCI-DSS)
  • Legal restrictions: Cross-border data transfer limitations
  • Practical constraints: Network bandwidth, storage costs, data sovereignty

Federated learning enables training on this distributed data without moving it.

Real-World Learning

Models learn from actual usage patterns in production environments:

  • Natural distribution: Training data reflects real user behavior
  • Diverse contexts: Models see data from varied geographic, demographic, and temporal contexts
  • Continuous improvement: Models adapt to evolving usage patterns

Challenges and Tradeoffs

Communication Costs

Federated learning requires multiple rounds of model distribution and update collection. EdgeML optimizes this through:

  • Delta compression: Sending only weight changes, not full models
  • Update frequency control: Configurable rounds and local epochs
  • Model format optimization: ONNX, TFLite, and CoreML for efficient serialization
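The following is a rough sketch of the delta-compression idea; the function names and the sparsification threshold are illustrative assumptions, not EdgeML's API:

```python
import numpy as np

def compress_update(local_weights, global_weights, threshold=1e-3):
    """Send only the weight changes, zeroing near-zero deltas so they compress well."""
    delta = local_weights - global_weights      # weight changes, not full weights
    delta[np.abs(delta) < threshold] = 0.0      # drop negligible changes (sparsification)
    return delta

def apply_update(global_weights, delta):
    """Server side: reconstruct the device's weights from the compressed delta."""
    return global_weights + delta
```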

Heterogeneous Devices

Edge devices have varying computational capabilities, battery levels, and network conditions. EdgeML handles this through:

  • Device selection: Only devices meeting eligibility criteria participate
  • Asynchronous rounds: Devices don't need to synchronize perfectly
  • Graceful degradation: Partial round participation is acceptable
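A device-selection gate might look like the sketch below; the specific thresholds and the DeviceStatus fields are illustrative assumptions, not EdgeML's actual eligibility criteria:

```python
from dataclasses import dataclass

@dataclass
class DeviceStatus:
    battery_level: float         # 0.0 - 1.0
    is_charging: bool
    on_unmetered_network: bool   # e.g. Wi-Fi
    is_idle: bool

def is_eligible(status: DeviceStatus) -> bool:
    """Only devices that can train without degrading the user experience participate."""
    return (
        (status.is_charging or status.battery_level > 0.8)
        and status.on_unmetered_network
        and status.is_idle
    )
```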

Non-IID Data

Unlike centralized training, where data can be shuffled and balanced, federated data is often not independent and identically distributed (non-IID). A keyboard app, for example, sees different languages, writing styles, and topics on each device.

FedAvg handles this reasonably well, though more advanced algorithms (FedProx, FedMA) can improve convergence on highly skewed data distributions [Li et al., 2020].
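For reference, FedProx modifies each device's local objective by adding a proximal term that penalizes drift from the global model. A minimal sketch, assuming `data_loss` is the device's ordinary training loss on its local data and `mu` is the proximal coefficient:

```python
import numpy as np

def fedprox_local_objective(w, w_global, data_loss, mu=0.01):
    # The proximal term keeps local weights close to the global model, which
    # stabilizes training when local data is highly non-IID [Li et al., 2020].
    return data_loss(w) + (mu / 2.0) * np.sum((w - w_global) ** 2)
```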

Real-World Applications

Mobile Keyboards

  • Use Case: Next-word prediction, autocorrect, emoji suggestions
  • Challenge: Typing data is extremely private
  • Solution: Federated learning trains on billions of user interactions without collecting keystrokes

Healthcare

  • Use Case: Disease prediction, medical imaging, diagnostic models
  • Challenge: HIPAA prohibits centralizing patient data
  • Solution: Hospitals collaborate on models while keeping patient records local

Financial Services

  • Use Case: Fraud detection, credit risk modeling
  • Challenge: Regulatory restrictions on sharing customer data
  • Solution: Banks improve models collectively without exposing transaction details

IoT and Edge Devices

  • Use Case: Anomaly detection, predictive maintenance
  • Challenge: High network costs for continuous data upload
  • Solution: Train on-device and send only model updates

References and Further Reading

  1. McMahan, B., et al. (2017). "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS. [arXiv:1602.05629]

    • Original FedAvg paper introducing federated learning and weighted averaging
  2. Kairouz, P., et al. (2021). "Advances and Open Problems in Federated Learning." Foundations and Trends in Machine Learning. [arXiv:1912.04977]

    • Comprehensive survey of federated learning research and challenges
  3. Li, T., et al. (2020). "Federated Optimization in Heterogeneous Networks." MLSys. [arXiv:1812.06127]

    • FedProx: Handling non-IID data and system heterogeneity
  4. Bonawitz, K., et al. (2019). "Towards Federated Learning at Scale: System Design." MLSys. [arXiv:1902.01046]

    • Google's production federated learning infrastructure
  5. Yang, Q., et al. (2019). "Federated Machine Learning: Concept and Applications." ACM TIST.

    • Overview of federated learning taxonomy and applications

Next Steps