Developer Documentation

Machine Learning Pipeline for Unknown Pollution Factor Identification

A neural network predicts AQI values and estimates each pollutant's contribution ratio. The pipeline then computes the residual discrepancy between the predicted and actual AQI values.
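A minimal sketch of the residual calculation follows. The sign convention (actual minus predicted) and the helper name `aqi_discrepancy` are assumptions for illustration; the API may define the discrepancy differently.

```python
def aqi_discrepancy(actual_aqi: float, predicted_aqi: float) -> dict:
    """Return the signed and relative residual between observed and predicted AQI.

    Assumption: residual = actual - predicted, so a positive value means the
    model under-predicts and something outside the modeled pollutants may be
    contributing.
    """
    residual = actual_aqi - predicted_aqi
    # Guard against division by zero when the observed AQI is 0.
    relative = residual / actual_aqi if actual_aqi else 0.0
    return {"residual": residual, "relative": relative}
```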

Input Features

features (array, required)

Array of 18 engineered features including pollutant AQI values, interactions, ratios, and encoded location data.

city (string, optional)

Target city for region-specific analysis. Defaults to the global model when omitted.

pollutants (object, required)

Pollutant AQI values: CO_AQI, Ozone_AQI, NO2_AQI, PM2.5_AQI
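The 18 features can be derived from the four pollutant values plus two location codes. The sketch below follows the ordering shown in the comments of the example request later on this page. The interaction terms (CO × Ozone and NO2 × PM2.5) and the dominant-pollutant encoding (index in the order CO, Ozone, NO2, PM2.5) are inferred from the example numbers, not confirmed by the API. The names `build_features`, `POLLUTANT_ORDER`, `country_code`, and `city_code` are hypothetical.

```python
# Assumed pollutant order, taken from the example payload.
POLLUTANT_ORDER = ("CO_AQI", "Ozone_AQI", "NO2_AQI", "PM2.5_AQI")


def build_features(pollutants: dict, country_code: int, city_code: int) -> list:
    """Assemble the 18-element feature vector in the example payload's order."""
    values = [pollutants[name] for name in POLLUTANT_ORDER]
    co, ozone, no2, pm25 = values

    total = sum(values)
    highest, lowest = max(values), min(values)

    # Interactions inferred from the example: 45 * 78 = 3510, 92 * 156 = 14352.
    interactions = [co * ozone, no2 * pm25]
    # Total, average, max, min, range.
    stats = [total, total / len(values), highest, lowest, highest - lowest]
    # Dominant pollutant encoded as its index in POLLUTANT_ORDER (assumption).
    dominant = values.index(highest)
    # Each pollutant's share of the total; left unrounded.
    ratios = [v / total if total else 0.0 for v in values]

    return [country_code, city_code, *values, *interactions, *stats, dominant, *ratios]


features = build_features(
    {"CO_AQI": 45, "Ozone_AQI": 78, "NO2_AQI": 92, "PM2.5_AQI": 156},
    country_code=1,
    city_code=23,
)
print(len(features))
```

Because the ratios are not rounded, they can differ from the hand-written example values in the third decimal place.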

Model Architecture

Input Layer: 18 features

Hidden Layers: 128 → 64 → 32 neurons

Dropout: 30% (prevents overfitting)

Output 1: AQI Prediction (regression)

Output 2: Contribution Ratios (softmax)

Performance: R² > 0.85
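To make the shape of this network concrete, here is a dependency-free forward-pass sketch with random, untrained weights. Several details are assumptions, since the documentation does not state them: ReLU activations, dropout applied after every hidden layer, and four softmax outputs (one per pollutant). All function names are hypothetical, and a real implementation would normally use a deep learning framework.

```python
import math
import random

LAYER_SIZES = [18, 128, 64, 32]  # input width and hidden widths from the spec
N_POLLUTANTS = 4                 # assumed: one softmax output per pollutant
DROPOUT_RATE = 0.3


def init_layer(n_in, n_out, rng):
    """Create (weights, biases) with He-style random weights."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(seed=0):
    rng = random.Random(seed)
    hidden = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
    return {
        "hidden": hidden,
        "aqi_head": init_layer(LAYER_SIZES[-1], 1, rng),               # Output 1
        "ratio_head": init_layer(LAYER_SIZES[-1], N_POLLUTANTS, rng),  # Output 2
    }


def forward(model, features, training=False, rng=None):
    """Shared trunk followed by a regression head and a softmax head."""
    x = list(features)
    for layer in model["hidden"]:
        x = relu(dense(x, layer))
        if training:  # dropout is active only during training
            x = dropout(x, DROPOUT_RATE, rng or random)
    aqi = dense(x, model["aqi_head"])[0]
    ratios = softmax(dense(x, model["ratio_head"]))
    return aqi, ratios
```

Both heads read the same 32-unit representation, and the softmax guarantees that the contribution ratios are non-negative and sum to 1.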

import os

import requests

# Stage 1: Discrepancy Detection
url = "https://api.airflux.io/v1/detect-discrepancy"

payload = {
  "city": "Hyderabad",
  "pollutants": {
    "CO_AQI": 45,
    "Ozone_AQI": 78,
    "NO2_AQI": 92,
    "PM2.5_AQI": 156
  },
  "features": [
    # Country_Encoded, City_Encoded
    1, 23,
    # Pollutant AQI values
    45, 78, 92, 156,
    # Interactions
    3510, 14352,
    # Total, Avg, Max, Min, Range
    371, 92.75, 156, 45, 111,
    # Dominant pollutant encoded
    3,
    # Ratios
    0.121, 0.210, 0.248, 0.421
  ]
}

headers = {
  "Content-Type": "application/json",
  # Read the key from the environment; Python does not expand "$AIRFLUX_API_KEY" inside a string
  "Authorization": f"Bearer {os.environ['AIRFLUX_API_KEY']}"
}

response = requests.post(url, json=payload, headers=headers, timeout=10)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
stage1_result = response.json()

print(f"Predicted AQI: {stage1_result['aqi_prediction']}")
print(f"Discrepancy: {stage1_result['discrepancy']}")

# Stage 2: Unknown Factor Identification
url2 = "https://api.airflux.io/v1/identify-factors"

payload2 = {
  "features": payload["features"],
  "discrepancy": stage1_result["discrepancy"],
  "contributions": stage1_result["contribution_ratios"]
}

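response2 = requests.post(url2, json=payload2, headers=headers, timeout=10)
response2.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
stage2_result = response2.json()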

print(f"Unknown Factor: {stage2_result['factor_name']}")
print(f"Confidence: {stage2_result['confidence']:.2%}")
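The two calls above can be wrapped in a single helper so the stage 1 output is fed into stage 2 in one place. The sketch below is one possible arrangement, not an official client. The names `post_json` and `run_pipeline` are hypothetical, and the response fields are those used in the example above. It uses only the standard library (`urllib.request`) and takes the transport as a parameter so it can be replaced with `requests` or a stub in tests.

```python
import json
import urllib.request

BASE_URL = "https://api.airflux.io/v1"


def post_json(url, payload, api_key, timeout=10):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def run_pipeline(payload, api_key, post=post_json):
    """Run stage 1, then pass its discrepancy and ratios into stage 2."""
    stage1 = post(f"{BASE_URL}/detect-discrepancy", payload, api_key)
    stage2_payload = {
        "features": payload["features"],
        "discrepancy": stage1["discrepancy"],
        "contributions": stage1["contribution_ratios"],
    }
    stage2 = post(f"{BASE_URL}/identify-factors", stage2_payload, api_key)
    return {
        "aqi_prediction": stage1["aqi_prediction"],
        "discrepancy": stage1["discrepancy"],
        "factor_name": stage2["factor_name"],
        "confidence": stage2["confidence"],
    }
```

Passing a stub as `post` lets the chaining logic be checked without network access or a real API key.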

Model Performance: Hyderabad, India

Stage 1 R² Score: 0.87

Stage 2 Accuracy: 89.2%

Training Samples: 4,000+

Avg Response Time: ~140 ms
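For reference, R² compares the model's squared error with that of always predicting the mean. The sketch below uses the standard definition, R² = 1 − SS_res / SS_tot. The function name `r2_score` is chosen here for illustration, and this page does not state how the reported score was computed.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual error
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # variance around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot
```

A perfect model scores 1.0, and a model that always predicts the mean of the observed values scores 0.0.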