Privacy Model

EdgeML is built so that raw, sensitive data never leaves the device. The platform is designed around data minimization: training happens locally, and only model updates are shared with the server.

Data Flow Architecture

Privacy Guarantees

What stays on-device

  • Raw user data (texts, images, sensor data)
  • Feature extraction results
  • Local training batches

What is shared

  • Model weight updates (or deltas)
  • Training metadata (sample counts, basic metrics)

Update Privacy Comparison

Threat model (MVP)

  • Prevent server-side collection of raw data
  • Reduce data exposure by keeping training local
  • Provide audit trails for model changes

Privacy-Preserving Mechanisms

1. Local Training

All training happens on the device, never in the cloud:

def train_locally(base_model):
    """Training function runs on-device only"""
    model = load_model(base_model)

    # Local data NEVER leaves the device
    local_data = load_local_dataset()  # Private user data

    # Train locally
    for epoch in range(3):
        for batch in local_data:
            loss = model.train_step(batch)

    # Return only model weights, NOT data
    return model.state_dict()

Privacy benefit: Raw data (images, text, sensor readings) is never transmitted over the network.
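
For context, here is a minimal usage sketch showing how a function like train_locally might be registered with the client API that appears in the HIPAA example later on this page. The import path is an assumption; treat it as illustrative rather than a confirmed EdgeML interface.

from edgeml import FederatedClient  # assumed import path (illustrative)

client = FederatedClient(api_key="ek_live_...")
client.train_from_remote(
    model="fraud-detector",
    local_train_fn=train_locally  # runs on-device; only the returned weights are uploaded
)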

2. Weight Updates Only

EdgeML transmits only model parameters, not training data:

# What gets uploaded:
{
    "model_id": "fraud-detector",
    "version": "1.0.0",
    "updates": {
        "layer1.weight": [...],  # Numerical weights only
        "layer1.bias": [...],
        "layer2.weight": [...]
    },
    "sample_count": 1000,  # Aggregate count, no individual records
    "metrics": {
        "loss": 0.42,
        "accuracy": 0.89
    }
}

# What NEVER gets uploaded:
# - Individual training examples
# - User identifiers
# - Sensitive features (names, addresses, photos)

Privacy benefit: Even if network traffic is intercepted, raw data is not exposed.
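
To illustrate the principle, here is a hedged sketch of how an upload payload could be assembled so that it contains only numeric parameters and aggregate metadata. The helper name build_update_payload and the PyTorch-style tensors are assumptions for illustration, not a confirmed EdgeML API.

def build_update_payload(model, model_id, version, sample_count, metrics):
    """Illustrative helper: package only numeric weights and aggregate stats."""
    updates = {
        name: tensor.detach().cpu().tolist()  # plain numbers only, never raw examples
        for name, tensor in model.state_dict().items()  # assumes PyTorch-style tensors
    }
    return {
        "model_id": model_id,
        "version": version,
        "updates": updates,
        "sample_count": sample_count,  # aggregate count, no individual records
        "metrics": metrics             # e.g. {"loss": 0.42, "accuracy": 0.89}
    }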

3. Differential Privacy (Planned)

Differential privacy adds calibrated noise to model updates to prevent individual data points from being reverse-engineered.

Example:

import numpy as np

# Without differential privacy
original_weight_update = 0.453

# With differential privacy (ε=1.0)
epsilon = 1.0        # privacy budget
sensitivity = 0.01   # maximum influence of a single example (illustrative value)
noisy_update = original_weight_update + np.random.laplace(0, sensitivity / epsilon)
# → e.g. 0.467 (slightly perturbed)

Privacy benefit:

  • Individual training examples cannot be recovered from model updates
  • Provides mathematical privacy guarantees (ε-differential privacy)
  • Protects against membership inference attacks

Trade-off: Adding noise slightly reduces model accuracy, but empirically the impact is small (1-3% accuracy loss) for reasonable privacy budgets [Abadi et al., 2016].

EdgeML's planned implementation (not in MVP):

  • User-configurable privacy budget (ε)
  • Automatic noise calibration based on model architecture
  • Privacy accounting across multiple rounds
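
Because differential privacy is not yet implemented, the following is only a sketch of one common approach (clip each update's L2 norm, then add Gaussian noise calibrated to the clipping norm, as in Abadi et al., 2016). The function name and constants are illustrative, not EdgeML's final design.

import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1):
    """Illustrative DP step: clip the update's global L2 norm, then add Gaussian noise."""
    # Compute a single global L2 norm over all parameters
    flat = np.concatenate([w.ravel() for w in update.values()])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))

    noisy = {}
    for name, w in update.items():
        clipped = w * scale  # bound each device's influence
        noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
        noisy[name] = clipped + noise
    return noisy

# Usage: weight updates as numpy arrays keyed by layer name
update = {"layer1.weight": np.random.randn(4, 4), "layer1.bias": np.random.randn(4)}
private_update = privatize_update(update)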

4. Secure Aggregation (Planned)

Secure aggregation uses cryptographic techniques to compute the average of updates without the server seeing individual contributions.

Privacy benefit:

  • Server cannot see individual device updates
  • Even a compromised server cannot isolate a single device's contribution
  • Protects against "honest-but-curious" server attacks

Cryptographic protocol (Bonawitz et al., 2017):

  1. Devices generate pairwise shared secrets using Diffie-Hellman
  2. Each device masks its update with these secrets
  3. Server sums masked updates
  4. Masks cancel out in aggregation, revealing only the sum
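
To make the cancellation concrete, here is a hedged toy sketch of pairwise masking in which random vectors stand in for the Diffie-Hellman-derived secrets. A production protocol (as in Bonawitz et al., 2017) also handles key agreement, dropouts, and finite-field arithmetic, all of which are omitted here.

import numpy as np

rng = np.random.default_rng(0)
num_devices, dim = 3, 4

# Each device's true (private) update
true_updates = [rng.normal(size=dim) for _ in range(num_devices)]

# Pairwise masks: for i < j, device i adds +mask[(i, j)] and device j subtracts it
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_devices) for j in range(i + 1, num_devices)}

masked_updates = []
for i in range(num_devices):
    masked = true_updates[i].copy()
    for (a, b), m in masks.items():
        if a == i:
            masked += m
        elif b == i:
            masked -= m
    masked_updates.append(masked)

# The server only ever sees masked updates, yet the masks cancel in the sum
assert np.allclose(sum(masked_updates), sum(true_updates))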

EdgeML's planned implementation (not in MVP):

  • Based on Google's secure aggregation protocol [Bonawitz et al., 2017]
  • Threshold cryptography for fault tolerance
  • Efficient for 100-10,000 devices

Privacy Attacks and Defenses

Model Inversion Attacks

Attack: Reconstruct training data by analyzing model parameters.

Example: Given a facial recognition model, can you generate faces it was trained on?

EdgeML's defense:

  • Aggregate updates from 100+ devices → individual contributions are diluted
  • Differential privacy (planned) adds noise to prevent reconstruction
  • Regular model retraining prevents memorization

Research: Model inversion is most effective on models trained on very few samples (<100). Federated learning with large cohorts (1000+ devices) makes this attack impractical [Fredrikson et al., 2015].

Membership Inference Attacks

Attack: Determine if a specific data point was in the training set.

Example: Given a medical model, can you determine if Alice's health record was used for training?

EdgeML's defense:

  • Aggregation across many devices reduces signal
  • Differential privacy (planned) provides provable protection
  • Limit local epochs to prevent overfitting

Research: Membership inference success drops from 80% (centralized) to 50-60% (federated with 100+ devices) [Shokri et al., 2017].

Poisoning Attacks

Attack: Malicious device submits corrupted updates to degrade model performance or inject backdoors.

Example: Attacker sends updates that cause the model to misclassify specific inputs.

EdgeML's defense:

  • Statistical outlier detection: Reject updates far from the median
  • Byzantine-robust aggregation: Use coordinate-wise median or trimmed mean instead of a simple average (see the sketch after this list)
  • Reputation systems: Track device reliability over time
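
A hedged sketch of the first two defenses, combining a robust outlier filter with a coordinate-wise trimmed mean; the threshold mirrors the outlier_threshold server setting shown later, but the function itself is illustrative rather than EdgeML's shipped aggregator.

import numpy as np

def robust_aggregate(updates, outlier_threshold=3.0, trim_fraction=0.1):
    """Illustrative defense: drop outlier updates, then take a coordinate-wise trimmed mean."""
    stacked = np.stack(updates)  # shape: (num_devices, num_parameters)

    # 1. Statistical outlier detection: reject updates far from the median update
    dists = np.linalg.norm(stacked - np.median(stacked, axis=0), axis=1)
    mad = np.median(np.abs(dists - np.median(dists))) + 1e-12
    kept = stacked[np.abs(dists - np.median(dists)) / mad < outlier_threshold]

    # 2. Byzantine-robust aggregation: trim the extremes of each coordinate, then average
    k = int(len(kept) * trim_fraction)
    sorted_vals = np.sort(kept, axis=0)
    trimmed = sorted_vals[k:len(kept) - k] if k > 0 else sorted_vals
    return trimmed.mean(axis=0)

# Usage: flattened weight updates from 20 honest devices plus one poisoned update
updates = [np.random.randn(10) for _ in range(20)]
updates.append(np.random.randn(10) * 50)  # out-of-range update, rejected by the filter
aggregated = robust_aggregate(updates)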

Planned enhancements:

  • Secure aggregation prevents attacker from seeing other updates
  • Differential privacy limits impact of individual malicious updates

Compliance and Regulations

GDPR (General Data Protection Regulation)

Key requirements:

  • Data minimization: Only collect necessary data → ✅ EdgeML keeps data on-device
  • Right to erasure: Users can delete their data → ✅ Data stays local, user controls deletion
  • Data portability: Users can export their data → ✅ Data never centralized
  • Purpose limitation: Data used only for specified purpose → ✅ Model training only

EdgeML's approach:

  • Raw data never leaves the device → no central data processing
  • Model updates are not "personal data" under GDPR (anonymized aggregates)
  • Users can opt out without affecting others

HIPAA (Health Insurance Portability and Accountability Act)

Key requirements:

  • Protected Health Information (PHI): Must be secured and not disclosed
  • Minimum necessary: Only access minimum data needed

EdgeML's approach:

  • PHI stays on-device (e.g., patient records remain in hospital systems)
  • Only model updates (not PHI) transmitted to server
  • Enables multi-hospital collaboration without centralizing patient data

Use case: Hospital consortium training disease prediction model:

# Hospital A
client_a = FederatedClient(api_key="...")
client_a.train_from_remote(
    model="disease-predictor",
    local_train_fn=train_on_local_patients  # PHI stays local
)

# Hospital B
client_b = FederatedClient(api_key="...")
client_b.train_from_remote(
    model="disease-predictor",
    local_train_fn=train_on_local_patients  # PHI stays local
)

# Server aggregates WITHOUT seeing patient data
federation.train(model="disease-predictor", min_updates=10)

CCPA (California Consumer Privacy Act)

Key requirements:

  • Right to know: Users can see what data is collected
  • Right to delete: Users can request data deletion
  • Right to opt-out: Users can opt out of data "sale"

EdgeML's approach:

  • Transparent: Users see that only model updates (not data) are shared
  • Deletion: User data stays local, can be deleted without affecting system
  • No "sale": Model updates are not sold or shared with third parties

Privacy Configuration

Client-Side Controls

client = FederatedClient(
    api_key="ek_live_...",
    privacy_budget=1.0,    # Differential privacy ε (planned)
    max_local_epochs=3,    # Limit overfitting
    sample_fraction=0.1,   # Use only 10% of local data
    opt_in_required=True   # User must explicitly consent
)

Server-Side Controls

federation = Federation(
    api_key="ek_live_...",
    min_updates=100,          # Require many devices for aggregation
    outlier_threshold=3.0,    # Reject updates >3σ from median
    secure_aggregation=True   # Enable cryptographic aggregation (planned)
)

Privacy vs. Utility Trade-offs

Guidelines for balancing privacy and utility:

| Scenario | Privacy Settings | Expected Impact |
|---|---|---|
| Public dataset (MNIST) | No DP, plain aggregation | No accuracy loss |
| Internal company data | Light DP (ε=10), plain aggregation | <1% accuracy loss |
| Healthcare data (HIPAA) | Strong DP (ε=1), secure aggregation | 2-5% accuracy loss |
| Financial data (PCI-DSS) | Strong DP (ε=0.5), secure aggregation | 5-10% accuracy loss |

Tuning recommendations:

  • Start without differential privacy to establish baseline accuracy
  • Gradually increase privacy (reduce ε) while monitoring model quality
  • Use more devices and more rounds to compensate for privacy overhead
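
One hedged way to follow these recommendations is a small ε sweep that tightens the privacy budget only while accuracy stays within an acceptable margin of the non-private baseline. run_federated_round and the accuracy numbers below are placeholders for illustration, not EdgeML APIs or measured results.

# Placeholder accuracies for illustration only; in practice each value comes
# from a real federated training run at the given privacy budget.
simulated_accuracy = {None: 0.90, 10.0: 0.897, 5.0: 0.893, 1.0: 0.884, 0.5: 0.85}

def run_federated_round(epsilon):
    """Stand-in for one federated round at privacy budget ε (None = no DP); returns accuracy."""
    return simulated_accuracy[epsilon]

baseline_accuracy = run_federated_round(None)  # establish the no-DP baseline first
max_acceptable_drop = 0.02                     # e.g. tolerate up to 2 points of accuracy loss

chosen_epsilon = None
for epsilon in [10.0, 5.0, 1.0, 0.5]:          # decreasing ε = stronger privacy
    accuracy = run_federated_round(epsilon)
    if baseline_accuracy - accuracy <= max_acceptable_drop:
        chosen_epsilon = epsilon               # tightest budget that still meets the quality bar
    else:
        break                                  # accuracy dropped too far; stop tightening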

References

  1. Abadi, M., et al. (2016). "Deep Learning with Differential Privacy." ACM CCS. [arXiv:1607.00133]

    • Practical differential privacy for deep learning
  2. Bonawitz, K., et al. (2017). "Practical Secure Aggregation for Privacy-Preserving Machine Learning." ACM CCS. [arXiv:1611.04482]

    • Cryptographic protocol for secure aggregation at scale
  3. Fredrikson, M., et al. (2015). "Model Inversion Attacks that Exploit Confidence Information." ACM CCS. [PDF]

    • Demonstrates privacy risks of model inversion
  4. Shokri, R., et al. (2017). "Membership Inference Attacks Against Machine Learning Models." IEEE S&P. [arXiv:1610.05820]

    • Quantifies privacy leakage through membership inference
  5. Kairouz, P., et al. (2021). "Advances and Open Problems in Federated Learning." [arXiv:1912.04977]

    • Comprehensive survey including privacy challenges
