
DIY Machine-Learning Forecasts: Nowcasting with Your Station Data
Train a simple ML model to generate short-range forecasts from your own station data, using Python, scikit-learn, and historical observations for practical nowcasting.
Quick Answer
Nowcasting (predicting weather conditions 1–6 hours ahead) is where local station data outperforms global models. Your station captures microclimatic patterns that GFS and ECMWF cannot resolve at their grid spacing. With a year of archived data, a random forest model in scikit-learn can produce useful temperature and precipitation probability forecasts for your specific location in under 50 lines of Python. This guide walks through the full pipeline, from feature engineering to deployment.
What This Guide Covers
We cover what nowcasting means and why local data matters, feature engineering from raw station observations (pressure tendency, temperature change rate, humidity patterns, wind shifts), model selection (starting with random forest, comparing gradient boosting), proper time-series train/test splitting, model evaluation metrics, deploying the model on a Raspberry Pi or server, and understanding the limitations of single-station forecasting. The data preparation techniques build directly on the Python and Pandas analysis guide.
For context on how professional ensemble models work and where your local model fits in, see the ensemble forecasting overview.
Prerequisites
- At least one year of continuous station data (temperature, pressure, humidity, wind, rain) at 5–15 minute intervals
- Python 3.9+ with scikit-learn, pandas, numpy, matplotlib
- Familiarity with basic ML concepts (training vs test sets, overfitting)
pip install scikit-learn pandas numpy matplotlib
Why Local Station Data Matters for Nowcasting
Global numerical weather prediction (NWP) models like GFS run at roughly 13 km grid spacing; the ECMWF HRES model resolves 9 km. Your backyard station measures conditions at a single, precisely known point. For 1–6 hour forecasts at your specific location, the recent trend in your own data (pressure dropping 3 hPa in the last 2 hours, humidity climbing, wind shifting southwest) contains more predictive signal than a global model's interpolated output at your coordinates.
This is not a replacement for NWP. For forecasts beyond 6–12 hours, physics-based models win decisively. But for "will it rain in the next 2 hours?" or "how cold will it get by dawn?", your station's recent history is gold.
Step 1: Load and Prepare Your Data
Start by loading your station archive and computing the features that have predictive power:
import pandas as pd
import numpy as np
df = pd.read_csv("station_archive.csv", parse_dates=["dateTime"])
df.set_index("dateTime", inplace=True)
df.sort_index(inplace=True)
# Ensure consistent 5-minute intervals
df = df.resample("5min").mean().interpolate(method="time", limit=6)
The interpolation fills gaps of up to 30 minutes (6 × 5 min). Larger gaps should be left as NaN and dropped from training later.
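As a quick check (a minimal sketch using the outTemp column loaded above), you can count how many rows the interpolation left unfilled; these are the longer outages that will be dropped before training:
# Rows still missing after interpolation come from gaps longer than 30 minutes
n_missing = df["outTemp"].isna().sum()
print(f"Rows still missing: {n_missing} ({100 * n_missing / len(df):.1f}% of {len(df)})")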
Step 2: Feature Engineering
Raw sensor readings are inputs, but the predictive signal lives in how they are changing. Compute features that capture trends and patterns:
# Pressure tendency (change over the last 3 hours), a strong predictor
df["pressure_3h"] = df["barometer"] - df["barometer"].shift(36)  # 36 × 5 min = 3 h
df["pressure_1h"] = df["barometer"] - df["barometer"].shift(12)
# Temperature rate of change
df["temp_1h"] = df["outTemp"] - df["outTemp"].shift(12)
df["temp_3h"] = df["outTemp"] - df["outTemp"].shift(36)
# Humidity change
df["humidity_1h"] = df["outHumidity"] - df["outHumidity"].shift(12)
# Wind features
df["wind_speed_avg_1h"] = df["windSpeed"].rolling(12).mean()
df["wind_dir_sin"] = np.sin(np.radians(df["windDir"]))
df["wind_dir_cos"] = np.cos(np.radians(df["windDir"]))
# Dew point depression (temperature minus dew point) indicates proximity to saturation
def dew_point(t, rh):
    # Magnus formula approximation (temperature in °C, relative humidity in %)
    a, b = 17.27, 237.7
    alpha = (a * t) / (b + t) + np.log(rh / 100.0)
    return (b * alpha) / (a - alpha)
df["dewpoint"] = dew_point(df["outTemp"], df["outHumidity"])
df["dp_depression"] = df["outTemp"] - df["dewpoint"]
# Time features (diurnal cycle matters for temperature)
df["hour_sin"] = np.sin(2 * np.pi * df.index.hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * df.index.hour / 24)
df["month_sin"] = np.sin(2 * np.pi * df.index.month / 12)
df["month_cos"] = np.cos(2 * np.pi * df.index.month / 12)
Critical rule: Every feature must use only past data. Using future data (e.g., tomorrow's pressure) leaks information and produces unrealistically good training scores that collapse in production. This is the number one mistake in time-series ML.
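One way to catch leakage early, sketched here with the pressure_3h feature from above (the cutoff index is arbitrary), is to recompute a feature on data truncated at some timestamp and confirm the value at that timestamp does not change when future rows are removed:
# If removing future rows changes a feature's value at the cutoff,
# that feature is looking ahead. Pick a cutoff where data is present.
cutoff = df.index[-1000]
truncated = df.loc[:cutoff, "barometer"]
recomputed = (truncated - truncated.shift(36)).loc[cutoff]
assert np.isclose(df.loc[cutoff, "pressure_3h"], recomputed), "pressure_3h leaks future data"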
Step 3: Define Forecast Targets
What are we predicting? Start with two practical targets:
# Target 1: Temperature 2 hours ahead
df["target_temp_2h"] = df["outTemp"].shift(-24) # 24 Γ 5min = 2h
# Target 2: Rain in next 2 hours (binary classification)
df["rain_2h"] = df["rain"].rolling(24).sum().shift(-24)
df["target_rain_2h"] = (df["rain_2h"] > 0.2).astype(int) # >0.2mm threshold
Drop rows where either features or targets are NaN:
feature_cols = [
"outTemp", "outHumidity", "barometer", "windSpeed",
"pressure_3h", "pressure_1h", "temp_1h", "temp_3h",
"humidity_1h", "wind_speed_avg_1h", "wind_dir_sin", "wind_dir_cos",
"dp_depression", "hour_sin", "hour_cos", "month_sin", "month_cos",
]
data = df[feature_cols + ["target_temp_2h", "target_rain_2h"]].dropna()
Step 4: Train-Test Split (Time-Series Aware)
Never use random train/test splits for time-series data. The test set must be entirely after the training set to simulate real-world deployment:
split_date = data.index[-1] - pd.Timedelta(days=60)  # hold out the last 60 days for testing
train = data[data.index <= split_date]
test = data[data.index > split_date]
X_train = train[feature_cols]
X_test = test[feature_cols]
y_train_temp = train["target_temp_2h"]
y_test_temp = test["target_temp_2h"]
y_train_rain = train["target_rain_2h"]
y_test_rain = test["target_rain_2h"]
Step 5: Train Models
Temperature Forecast (Regression)
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Random Forest
rf_temp = RandomForestRegressor(n_estimators=200, max_depth=15, random_state=42, n_jobs=-1)
rf_temp.fit(X_train, y_train_temp)
pred_temp_rf = rf_temp.predict(X_test)
print(f"RF β MAE: {mean_absolute_error(y_test_temp, pred_temp_rf):.2f} Β°C")
print(f"RF β RMSE: {np.sqrt(mean_squared_error(y_test_temp, pred_temp_rf)):.2f} Β°C")
# Gradient Boosting (often slightly better)
gb_temp = GradientBoostingRegressor(n_estimators=300, max_depth=5, learning_rate=0.1, random_state=42)
gb_temp.fit(X_train, y_train_temp)
pred_temp_gb = gb_temp.predict(X_test)
print(f"GB β MAE: {mean_absolute_error(y_test_temp, pred_temp_gb):.2f} Β°C")
print(f"GB β RMSE: {np.sqrt(mean_squared_error(y_test_temp, pred_temp_gb)):.2f} Β°C")
Typical MAE for 2-hour temperature forecasts with a well-placed station: 0.5–1.5 °C. That is competitive with NWP model output for your specific location.
Rain Probability (Classification)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
rf_rain = RandomForestClassifier(n_estimators=200, max_depth=10, class_weight="balanced", random_state=42)
rf_rain.fit(X_train, y_train_rain)
pred_rain = rf_rain.predict(X_test)
print(classification_report(y_test_rain, pred_rain, target_names=["No Rain", "Rain"]))
Use class_weight="balanced" because rainy hours are a minority class. Without balancing, the model learns to always predict "no rain" and scores well on accuracy while being useless for the prediction you actually care about.
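For an actual alert, class probabilities are more useful than hard labels. A sketch (the 0.3 threshold is an arbitrary starting point, to be tuned against your own false-alarm tolerance):
# predict_proba returns [P(no rain), P(rain)] per row; take the rain column
proba_rain = rf_rain.predict_proba(X_test)[:, 1]
alerts = (proba_rain >= 0.3).astype(int)  # tune this threshold on validation data
print(classification_report(y_test_rain, alerts, target_names=["No Rain", "Rain"]))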
Step 6: Feature Importance
Understanding which features matter most validates that the model is learning physically meaningful patterns:
importances = rf_temp.feature_importances_
for name, imp in sorted(zip(feature_cols, importances), key=lambda x: -x[1]):
    print(f"  {name}: {imp:.3f}")
Expected result: pressure_3h, temp_1h, hour_sin/hour_cos, and dp_depression should rank highly. If a feature like month_cos dominates, the model may be memorising seasonal patterns rather than learning weather dynamics, a sign of overfitting.
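Impurity-based importances can overstate correlated features. As a cross-check (not required for the pipeline), scikit-learn's permutation importance on the held-out test set asks how much the score drops when each feature is shuffled:
from sklearn.inspection import permutation_importance
# Shuffle each feature in turn and measure the loss in test-set score
perm = permutation_importance(rf_temp, X_test, y_test_temp, n_repeats=10,
                              random_state=42, n_jobs=-1)
for name, imp in sorted(zip(feature_cols, perm.importances_mean), key=lambda x: -x[1]):
    print(f"  {name}: {imp:.3f}")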
Step 7: Deploy the Model
For production use on a Raspberry Pi or server, save the trained model and run predictions via cron:
import joblib
joblib.dump(rf_temp, "model_temp_2h.pkl")
joblib.dump(rf_rain, "model_rain_2h.pkl")
A prediction script loads the models, reads the latest station data, computes features, and generates forecasts:
model = joblib.load("model_temp_2h.pkl")
latest = compute_features(get_latest_data()) # your data pipeline
forecast_temp = model.predict(latest[feature_cols].values.reshape(1, -1))
print(f"Temperature in 2 hours: {forecast_temp[0]:.1f} Β°C")
Run this every 15 minutes via cron and publish the forecast alongside your current observations. Push it to your dashboard, include it in your weather page template, or log it for verification.
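For the verification log, one minimal approach (the file name is an assumption) is to append each forecast with its issue time, so it can be scored against the observation that arrives two hours later:
import csv
from datetime import datetime, timezone
# Append issue time and forecast; score once the 2 h window has closed
with open("forecast_log.csv", "a", newline="") as f:
    csv.writer(f).writerow([datetime.now(timezone.utc).isoformat(),
                            round(float(forecast_temp[0]), 1)])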
Limitations
Be honest about what this can and cannot do:
- Single-station blindness. Your station cannot see weather approaching from upwind. A front 50 km away is invisible until it arrives. NWP models see the full atmosphere.
- No dynamics. The ML model learns statistical correlations, not atmospheric physics. It will fail on unusual weather patterns it has not seen in training.
- Forecast horizon. Skill degrades rapidly beyond 6 hours. Beyond 12 hours, even well-trained local models add little value over NWP.
- Data quality dependence. If your sensors drift (see the maintenance and calibration guide), the model's inputs become unreliable and forecasts degrade. Garbage in, garbage out.
Common Mistakes
- Data leakage. Using future data as a feature (even accidentally) produces unrealistically good training scores that collapse when deployed. Always verify that every feature is computed from past data only.
- Random train/test split. Shuffling time-series data and splitting randomly allows the model to learn from future observations during training. Always split chronologically.
- Overfitting to seasonal patterns. A model that memorises "December is cold" without learning pressure-temperature dynamics will fail on mild December days. Use limited tree depth and cross-validate across seasons (see the sketch after this list).
- Ignoring class imbalance. Rain events are rare compared to dry periods. Without class balancing, the model always predicts "no rain" and achieves deceptively high accuracy while being useless.
- Not retraining. Station characteristics change (sensor drift, local environment changes, new obstructions). Retrain the model every 6–12 months with recent data.
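For the seasonal cross-validation mentioned above, scikit-learn's TimeSeriesSplit keeps every fold chronological, so no fold trains on data from its own future. A sketch using the temperature model and training set from earlier:
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
# Each fold trains on an earlier span and validates on the span that follows
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(rf_temp, X_train, y_train_temp, cv=tscv,
                         scoring="neg_mean_absolute_error", n_jobs=-1)
print(f"MAE per fold: {(-scores).round(2)}")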
Related Reading
- Python Data Analysis with Pandas – data preparation fundamentals
- Ensemble Forecasting and AI Models – professional forecasting context
- Station Data Sanity Checks – data quality for ML inputs
- Maintenance and Calibration – keeping sensors accurate
- Publishing Fundamentals – sharing your forecasts
- Community Support – discussion and feedback
FAQ
How much data do I need to train a useful model? One full year is the minimum to capture seasonal variation. Two or more years give more robust results. A model trained on summer data alone will fail in winter.
Can I use deep learning (LSTM, Transformer) instead? Yes, but for single-station nowcasting, the added complexity rarely justifies the improvement. Random forest and gradient boosting are robust, fast to train, and interpretable. Start simple.
Should I include NWP model output as a feature? If you have access to NWP forecasts for your location (e.g., via Open-Meteo API), using them as additional features can significantly improve your model. The combination of NWP dynamics and local station observations is powerful.
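A minimal sketch of fetching such features (the coordinates are placeholders; the hourly variable names follow Open-Meteo's documented API, and the timestamps still need aligning with your station index):
import requests
# Free, no-key endpoint; adjust latitude/longitude to your station
url = ("https://api.open-meteo.com/v1/forecast"
       "?latitude=51.50&longitude=-0.12"
       "&hourly=temperature_2m,surface_pressure,precipitation")
hourly = requests.get(url, timeout=10).json()["hourly"]
nwp = pd.DataFrame(hourly)
nwp["time"] = pd.to_datetime(nwp["time"])
nwp = nwp.set_index("time")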
How do I validate that my model is actually useful? Compare its MAE and RMSE against two baselines: (1) persistence forecast (assume current conditions continue unchanged), and (2) climatological average (use the historical mean for this time of year). Your model should beat both. If it does not beat persistence, it is not adding value.
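A sketch of both baselines using the variables from the earlier steps (the climatology here is a rough hour-of-day mean, not a full day-of-year normal):
# Baseline 1: persistence - the current temperature is the 2 h forecast
mae_persist = mean_absolute_error(y_test_temp, X_test["outTemp"])
# Baseline 2: crude climatology - mean target temperature by hour of day,
# learned on the training period only
hourly_clim = y_train_temp.groupby(train.index.hour).mean()
clim_pred = test.index.hour.map(hourly_clim).to_numpy()
mae_clim = mean_absolute_error(y_test_temp, clim_pred)
print(f"Persistence MAE: {mae_persist:.2f} °C, climatology MAE: {mae_clim:.2f} °C")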