2 posts tagged with "configuration"


Edge Gateway Hot-Reload and Watchdog Patterns for Industrial IoT [2026]

· 12 min read

Here's a scenario every IIoT engineer dreads: it's 2 AM on a Saturday, your edge gateway in a plastics manufacturing plant has lost its MQTT connection to the cloud, and nobody notices until Monday morning. Forty-eight hours of production data — temperatures, pressures, cycle counts, alarms — gone. The maintenance team wanted to correlate a quality defect with process data from Saturday afternoon. They can't.

This is a reliability problem, and it's solvable. The patterns that separate a production-grade edge gateway from a prototype are: configuration hot-reload (change settings without restarting), connection watchdogs (detect and recover from silent failures), and graceful resource management (handle reconnections without memory leaks).

This guide covers the architecture behind each of these patterns, with practical design decisions drawn from real industrial deployments.

[Figure: Edge gateway hot-reload and firmware patterns]

The Problem: Why Edge Gateways Fail Silently

Industrial edge gateways operate in hostile environments: temperature swings, electrical noise, intermittent network connectivity, and 24/7 uptime requirements. The failure modes are rarely dramatic — they're insidious:

  • MQTT connection drops silently. The broker stops responding, but the client library doesn't fire a disconnect callback because the TCP connection is still half-open.
  • Configuration drift. An engineer updates tag definitions on the management server, but the gateway is still running the old configuration.
  • Memory exhaustion. Each reconnection allocates new buffers without properly freeing the old ones. After enough reconnections, the gateway runs out of memory and crashes.
  • PLC link flapping. The PLC reboots or loses power briefly. The gateway keeps polling, getting errors, but never properly re-detects or reconnects.

Solving these requires three interlocking systems: hot-reload for configuration, watchdogs for connections, and disciplined resource management.

Pattern 1: Configuration File Hot-Reload

The simplest and most robust approach to configuration hot-reload is file-based with stat polling. The gateway periodically checks if its configuration file has been modified (using the file's modification timestamp), and if so, reloads and applies the new configuration.

Design: stat() Polling vs. inotify

You have two options for detecting file changes:

stat() polling — Check the file's st_mtime on every main loop iteration:

on_each_cycle():
    current_stat = stat(config_file)
    if current_stat.mtime != last_known_mtime:
        reload_configuration()
        last_known_mtime = current_stat.mtime

inotify (Linux) — Register for kernel-level file change notifications:

fd = inotify_add_watch(config_file, IN_MODIFY)
poll(fd) // blocks until file changes
reload_configuration()

For industrial edge gateways, stat() polling wins. Here's why:

  1. It's simpler. No file descriptor management, no edge cases with inotify watches being silently dropped.
  2. It works across filesystems. inotify doesn't work on NFS, CIFS, or some embedded filesystems. stat() works everywhere.
  3. The cost is negligible. A single stat() call takes ~1 microsecond. Even at 1 Hz, it's invisible.
  4. It naturally integrates with the main loop. Industrial gateways already run a polling loop for PLC reads. Adding a stat() check is one line.
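
As a concrete illustration, here is a minimal C sketch of the stat()-based check. The config_changed() helper and the single cached mtime are illustrative, not lifted from any particular gateway:

#include <sys/stat.h>
#include <time.h>

static time_t last_known_mtime;

/* Call once per main-loop iteration; returns 1 if the config file changed. */
static int config_changed(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                       /* missing or unreadable: treat as unchanged */

    if (st.st_mtime != last_known_mtime) {
        last_known_mtime = st.st_mtime;
        return 1;
    }
    return 0;
}

In the main loop this reduces to a single call: if (config_changed("/etc/gateway/daemon.json")) reload_configuration(); (the path is just an example).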

Graceful Reload: The Teardown-Rebuild Cycle

When a configuration change is detected, the gateway must:

  1. Stop active PLC connections. For EtherNet/IP, destroy all tag handles. For Modbus, close the serial port or TCP connection.
  2. Free allocated memory. Tag definitions, batch buffers, connection contexts — all of it.
  3. Re-read and validate the new configuration.
  4. Re-detect the PLC and re-establish connections with the new tag map.
  5. Resume data collection with a forced initial read of all tags.

The critical detail is step 2. Industrial gateways often use a pool allocator instead of individual malloc/free calls. All configuration-related memory is allocated from a single large buffer. On reload, you simply reset the allocator's pointer to the beginning of the buffer:

// Pseudo-code: pool allocator reset
config_memory.write_pointer = config_memory.base_address
config_memory.used_bytes = 0
config_memory.free_bytes = config_memory.total_size

This eliminates the risk of memory leaks during reconfiguration. No matter how many times you reload, memory usage stays constant.
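
A minimal sketch of such a pool allocator in C, assuming a statically sized region; the names pool_alloc and pool_reset are illustrative:

#include <stddef.h>
#include <stdint.h>

#define CONFIG_POOL_SIZE (256 * 1024)

static uint8_t pool[CONFIG_POOL_SIZE];
static size_t  pool_used;

/* Bump-pointer allocation: no per-object free, no fragmentation. */
static void *pool_alloc(size_t size)
{
    size = (size + 7u) & ~(size_t)7u;          /* 8-byte alignment */
    if (pool_used + size > CONFIG_POOL_SIZE)
        return NULL;                           /* pool exhausted */
    void *p = &pool[pool_used];
    pool_used += size;
    return p;
}

/* On configuration reload: throw away everything in one step. */
static void pool_reset(void)
{
    pool_used = 0;
}

Everything parsed out of the configuration (tag definitions, batch buffers, connection contexts) is allocated with pool_alloc(); a reload is then just pool_reset() followed by re-parsing.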

Multi-File Configuration

Production gateways often have multiple configuration files:

  • Daemon config — Network settings, serial port parameters, batch sizes, timeouts
  • Device configs — Per-device-type tag maps (one JSON file per machine model)
  • Connection config — MQTT broker address, TLS certificates, authentication tokens

Each file should be watched independently. If only the daemon config changes (e.g., someone adjusts the batch timeout), you don't need to re-detect the PLC — just update the runtime parameter. If a device config changes (e.g., someone adds a new tag), you need to rebuild the tag chain.

A practical approach: when the daemon config changes, set a flag to force a status report on the next MQTT cycle. When a device config changes, trigger a full teardown-rebuild of that device's tag chain.

Pattern 2: Connection Watchdogs

The most dangerous failure mode in MQTT-based telemetry is the silent disconnect. The TCP connection appears alive (no RST received), but the broker has stopped processing messages. The client's publish calls succeed (they're just writing to a local socket buffer), but data never reaches the cloud.

The MQTT Delivery Confirmation Watchdog

The robust solution uses MQTT QoS 1 delivery confirmations as a heartbeat:

// Track the timestamp of the last confirmed delivery
last_delivery_timestamp = 0

on_publish_confirmed(packet_id):
    last_delivery_timestamp = now()

on_watchdog_check(): // runs every N seconds
    if last_delivery_timestamp == 0:
        return // no data sent yet, nothing to check

    elapsed = now() - last_delivery_timestamp
    if elapsed > WATCHDOG_TIMEOUT:
        trigger_reconnect()

With MQTT QoS 1, the broker sends a PUBACK for every published message. If you haven't received a PUBACK in, say, 120 seconds, but you've been publishing data, something is wrong.

The key insight is that you're not watching the connection state — you're watching the delivery pipeline. A connection can appear healthy (no disconnect callback fired) while the delivery pipeline is stalled.
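
For example, with the Eclipse Mosquitto client library the PUBACK for a QoS 1 message surfaces through the publish callback. A minimal sketch; the timeout value and the trigger_reconnect() hook are assumptions, not part of the library:

#include <mosquitto.h>
#include <time.h>

#define WATCHDOG_TIMEOUT 120    /* seconds without a confirmed delivery */

void trigger_reconnect(void);   /* assumed hook: wakes the reconnection thread */

static volatile time_t last_delivery;

/* Fired by libmosquitto once the broker has acknowledged a QoS 1 publish. */
static void on_publish(struct mosquitto *mosq, void *obj, int mid)
{
    (void)mosq; (void)obj; (void)mid;
    last_delivery = time(NULL);
}

/* Call periodically from the main loop. */
static void watchdog_check(void)
{
    if (last_delivery == 0)
        return;                                 /* nothing published yet */
    if (time(NULL) - last_delivery > WATCHDOG_TIMEOUT)
        trigger_reconnect();
}

The callback is registered with mosquitto_publish_callback_set() before the network loop is started; messages must be published with QoS 1 for the acknowledgment to exist at all.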

Reconnection Strategy: Async with Backoff

When the watchdog triggers, the reconnection must be:

  1. Asynchronous — Don't block the PLC polling loop. Data collection should continue even while MQTT is reconnecting. Collected data gets buffered locally.
  2. Non-destructive — The MQTT loop thread must be stopped before destroying the client. Stopping the loop with force=true ensures no callbacks fire during teardown.
  3. Complete — Disconnect, destroy the client, reinitialize the library, create a new client, set callbacks, start the loop, then connect. Half-measures (just calling reconnect) often leave stale state.

A dedicated reconnection thread works well:

reconnect_thread():
    while true:
        wait_for_signal() // semaphore blocks until watchdog triggers

        log("Starting MQTT reconnection")
        stop_mqtt_loop(force=true)
        disconnect()
        destroy_client()
        cleanup_library()

        // Re-initialize from scratch
        init_library()
        create_client(device_id)
        set_credentials(username, password)
        set_tls(certificate_path)
        set_protocol(MQTT_3_1_1)
        set_callbacks(on_connect, on_disconnect, on_message, on_publish)
        start_loop()
        set_reconnect_delay(5, 5, no_exponential)
        connect_async(host, port, keepalive=60)

        signal_complete() // release semaphore

Why a separate thread? The connect_async call can block for up to 60 seconds on DNS resolution or TCP handshake. If this runs on the main thread, PLC polling stops. Industrial processes don't wait for your network issues.
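
Here is a sketch of that signaling skeleton using POSIX threads and semaphores. mqtt_full_reconnect() stands in for the teardown/rebuild sequence above and is not a real library call:

#include <pthread.h>
#include <semaphore.h>

static sem_t reconnect_request;   /* posted by the watchdog */
static sem_t reconnect_done;      /* posted when reconnection finishes */

void mqtt_full_reconnect(void);   /* assumed: implements the sequence shown above */

static void *reconnect_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&reconnect_request);   /* block until the watchdog asks for it */
        mqtt_full_reconnect();          /* may take tens of seconds; PLC polling is unaffected */
        sem_post(&reconnect_done);
    }
    return NULL;
}

/* Called from the watchdog: never blocks the polling loop. */
void trigger_reconnect(void)
{
    sem_post(&reconnect_request);
}

At startup you sem_init() both semaphores and pthread_create() the thread once; the watchdog only ever calls trigger_reconnect(), which returns immediately.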

PLC Connection Watchdog

MQTT isn't the only connection that needs watching. PLC connections — both EtherNet/IP and Modbus TCP — can also fail silently.

For Modbus TCP, the watchdog logic is simpler because each read returns an explicit error code:

on_modbus_read_error(error_code):
    if error_code in [ETIMEDOUT, ECONNRESET, ECONNREFUSED, EPIPE, EBADF]:
        close_modbus_connection()
        set_link_state(DOWN)
        // Will reconnect on next polling cycle

For EtherNet/IP via libraries like libplctag, a return code of -32 (connection failed) should trigger:

  1. Setting the link state to DOWN
  2. Destroying the tag handles
  3. Attempting re-detection on the next cycle

A critical detail: track consecutive errors, not individual ones. A single timeout might be a transient hiccup. Three consecutive timeouts (error_count >= 3) indicate a real problem. Break the polling cycle early to avoid hammering a dead connection.
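
A sketch of the consecutive-error rule wrapped around a libmodbus read; the threshold constant and the set_link_state_down() helper are illustrative:

#include <modbus.h>
#include <stdint.h>

#define MAX_CONSECUTIVE_ERRORS 3

void set_link_state_down(void);   /* assumed helper: publishes link state, marks for re-detect */

static int consecutive_errors;

/* Returns 0 to keep polling, -1 if the link was declared down. */
static int poll_tag(modbus_t *ctx, int addr, int nb, uint16_t *dest)
{
    if (modbus_read_registers(ctx, addr, nb, dest) == -1) {
        if (++consecutive_errors >= MAX_CONSECUTIVE_ERRORS) {
            modbus_close(ctx);            /* drop the dead connection */
            set_link_state_down();
            consecutive_errors = 0;
            return -1;                    /* caller breaks out of the polling cycle */
        }
        return 0;                         /* transient hiccup: keep going */
    }
    consecutive_errors = 0;               /* any success resets the counter */
    return 0;
}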

The gateway should treat the connection state itself as a telemetry point. When the PLC link goes up or down, immediately publish a link state tag — a boolean value with do_not_batch: true:

link_state_changed(device, new_state):
    publish_immediately(
        tag_id=LINK_STATE_TAG,
        value=new_state, // true=up, false=down
        timestamp=now()
    )

This gives operators cloud-side visibility into gateway connectivity. A dashboard can show "Device offline since 2:47 AM" instead of just "no data" — which is ambiguous (was the device off, or was the gateway offline?).

Pattern 3: Store-and-Forward Buffering

When MQTT is disconnected, you can't just drop data. A production gateway needs a paged ring buffer that accumulates data during disconnections and drains it when connectivity returns.

Paged Buffer Architecture

The buffer divides a fixed-size memory region into pages of equal size:

Total buffer: 2 MB
Page size: ~4 KB (derived from max batch size)
Pages: ~500

Page states:
FREE → Available for writing
WORK → Currently being written to
USED → Full, queued for delivery

The lifecycle:

  1. Writing: Data is appended to the WORK page. When it's full, WORK moves to the USED queue, and a FREE page becomes the new WORK page.
  2. Sending: When MQTT is connected, the first USED page is sent. On PUBACK confirmation, the page moves to FREE.
  3. Overflow: If all pages are USED (buffer full, MQTT down for too long), the oldest USED page is recycled as the new WORK page. This loses the oldest data to preserve the newest — the right tradeoff for most industrial applications.
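
A sketch of the underlying data structures for this lifecycle, assuming a statically allocated region; the names are illustrative:

#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE (2 * 1024 * 1024)   /* 2 MB total */
#define PAGE_SIZE   4096                /* derived from max batch size */
#define PAGE_COUNT  (BUFFER_SIZE / PAGE_SIZE)

enum page_state { PAGE_FREE, PAGE_WORK, PAGE_USED };

struct page {
    enum page_state state;
    size_t          used;               /* bytes written so far */
    uint8_t         data[PAGE_SIZE];
};

struct ring_buffer {
    struct page pages[PAGE_COUNT];
    size_t      work;                   /* index of the page currently being filled */
    size_t      oldest_used;            /* next page to send, or to recycle on overflow */
};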

Thread safety is critical. The PLC polling thread writes to the buffer, the MQTT thread reads from it, and the PUBACK callback advances the read pointer. A mutex protects all buffer operations:

buffer_add_data(data, size):
    lock(mutex)
    append_to_work_page(data, size)
    if work_page_full():
        move_work_to_used()
        try_send_next()
    unlock(mutex)

on_puback(packet_id):
    lock(mutex)
    advance_read_pointer()
    if page_fully_delivered():
        move_page_to_free()
    try_send_next()
    unlock(mutex)

on_disconnect():
    lock(mutex)
    connected = false
    packet_in_flight = false // reset delivery state
    unlock(mutex)

Sizing the Buffer

Buffer sizing depends on your data rate and your maximum acceptable offline duration:

buffer_size = data_rate_bytes_per_second × max_offline_seconds

For a typical deployment:

  • 50 tags × 4 bytes average × 1 read/second = 200 bytes/second
  • With binary encoding overhead: ~300 bytes/second
  • Maximum offline duration: 2 hours (7,200 seconds)
  • Buffer needed: 300 × 7,200 = ~2.1 MB

A 2 MB buffer with 4 KB pages gives you ~500 pages — roughly two hours of offline operation at this data rate.

The Minimum Three-Page Rule

The buffer needs at minimum 3 pages to function:

  1. One WORK page (currently being written to)
  2. One USED page (queued for delivery)
  3. One page in transition (being delivered, not yet confirmed)

If you can't fit 3 pages in the buffer, the page size is too large relative to the buffer. Validate this at initialization time and reject invalid configurations rather than failing at runtime.
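
The check itself is tiny; a sketch, assuming the two sizes come from the daemon config:

#include <stdio.h>
#include <stddef.h>

/* Returns 0 if the buffer geometry is usable, -1 otherwise. */
static int validate_buffer_geometry(size_t buffer_size, size_t page_size)
{
    if (page_size == 0 || buffer_size / page_size < 3) {
        fprintf(stderr, "buffer too small: need at least 3 pages of %zu bytes\n", page_size);
        return -1;   /* refuse to start with an invalid configuration */
    }
    return 0;
}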

Pattern 4: Periodic Forced Reads

Even with change-detection enabled (the compare flag), a production gateway should periodically force-read all tags and transmit their values regardless of whether they changed. This serves several purposes:

  1. Proof of life. Downstream systems can distinguish "the value hasn't changed" from "the gateway is dead."
  2. State synchronization. If the cloud-side database lost data (a rare but real scenario), periodic full-state updates resynchronize it.
  3. Clock drift correction. Over time, individual tag timers can drift. A periodic full reset realigns all tags.

A practical approach: reset all tags on the hour boundary. Check the system clock, and when the hour rolls over, clear all "previously read" flags. Every tag will be read and transmitted on its next polling cycle, regardless of change detection:

on_each_read_cycle():
    current_hour = localtime(now()).hour
    previous_hour = localtime(last_read_time).hour

    if current_hour != previous_hour:
        reset_all_tags() // clear read-once flags
        log("Hourly forced read: all tags will be re-read")

This adds at most one extra transmission per tag per hour — a negligible bandwidth cost for significant reliability improvement.

Pattern 5: SAS Token and Certificate Expiry Monitoring

If your MQTT connection uses time-limited credentials (like Azure IoT Hub SAS tokens or short-lived TLS certificates), the gateway must monitor expiry and refresh proactively.

For SAS tokens, extract the se (expiry) parameter from the connection string and compare it against the current system time:

on_config_load(sas_token):
    expiry_timestamp = extract_se_parameter(sas_token)

    if current_time > expiry_timestamp:
        log_warning("Token has expired!")
        // Still attempt connection — the broker will reject it,
        // but the error path will trigger a config reload
    else:
        time_remaining = expiry_timestamp - current_time
        log("Token valid for %d hours", time_remaining / 3600)

Don't silently fail. If the token is expired, log a prominent warning. The gateway should still attempt to connect (the broker rejection will be informative), but operations teams need visibility into credential lifecycle.

For TLS certificates, monitor both the certificate file's modification time (has a new cert been deployed?) and the certificate's validity period (is it about to expire?).
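
A sketch of the se extraction in C, assuming the Azure-style token format where the parameter is a Unix timestamp ("...&se=<unix-timestamp>&..."); error handling is kept minimal:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns the expiry as a Unix timestamp, or 0 if the parameter is missing. */
static time_t sas_token_expiry(const char *token)
{
    const char *se = strstr(token, "se=");
    if (se == NULL)
        return 0;
    return (time_t)strtoll(se + 3, NULL, 10);
}

static void check_token(const char *token)
{
    time_t expiry = sas_token_expiry(token);
    time_t now    = time(NULL);

    if (expiry == 0)
        fprintf(stderr, "WARNING: no se= parameter found in SAS token\n");
    else if (now > expiry)
        fprintf(stderr, "WARNING: SAS token has expired!\n");
    else
        printf("Token valid for %ld hours\n", (long)((expiry - now) / 3600));
}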

How machineCDN Implements These Patterns

machineCDN's edge gateway — deployed on OpenWRT-based industrial routers in plastics manufacturing plants — implements all five patterns:

  • Configuration hot-reload using stat() polling on the main loop, with pool-allocated memory for zero-leak teardown/rebuild cycles
  • Dual watchdogs for MQTT delivery confirmation (120-second timeout) and PLC link state (3 consecutive errors trigger reconnection)
  • Paged ring buffer with 2 MB capacity, supporting both JSON and binary encoding, with automatic overflow handling that preserves newest data
  • Hourly forced reads that ensure complete state synchronization regardless of change detection
  • SAS token monitoring with proactive expiry warnings

These patterns enable 99.9%+ data capture rates even in plants with intermittent cellular connectivity — because the gateway collects data continuously and back-fills when connectivity returns.

Implementation Checklist

If you're building or evaluating an edge gateway for industrial IoT, verify that it supports:

Capability | Why It Matters
Config hot-reload without restart | Zero-downtime updates, no data gaps during reconfiguration
Pool-based memory allocation | No memory leaks across reload cycles
MQTT delivery watchdog | Detects silent connection failures
Async reconnection thread | PLC polling continues during MQTT recovery
Paged store-and-forward buffer | Preserves data during network outages
Consecutive error thresholds | Avoids false-positive disconnections
Link state telemetry | Distinguishes "offline gateway" from "idle machine"
Periodic forced reads | State synchronization and proof-of-life
Credential expiry monitoring | Proactive certificate/token management

Conclusion

Reliability in industrial IoT isn't about preventing failures — it's about recovering from them automatically. Networks will drop. PLCs will reboot. Certificates will expire. The question is whether your edge gateway handles these events gracefully or silently loses data.

The patterns in this guide — hot-reload, watchdogs, store-and-forward, forced reads, and credential monitoring — are the difference between a gateway that works in the lab and one that works at 3 AM on a holiday weekend in a plant with spotty cellular coverage.

Build for the 3 AM scenario. Your operations team will thank you.

JSON-Based PLC Tag Configuration: Building Maintainable IIoT Device Templates [2026]

· 12 min read

If you've ever stared at a spreadsheet of 200 PLC register addresses trying to figure out which ones your SCADA system is actually polling, you know the pain. Traditional tag configuration — hardcoded in ladder logic comments, scattered across HMI screens, buried in proprietary configuration tools — doesn't scale.

The solution that's gaining traction in modern IIoT deployments is declarative, JSON-based tag configuration. Instead of configuring your data collection logic in opaque proprietary formats, you define your device's entire tag map as a structured JSON document. This approach brings version control, template reuse, and automated validation to the industrial data layer.

In this guide, we'll walk through the architecture of a production-grade JSON tag configuration system, drawing from real patterns used in industrial edge gateways connecting to Allen-Bradley Micro800 PLCs via EtherNet/IP and to various devices via Modbus RTU and TCP.

[Figure: JSON-based PLC tag configuration for IIoT]

Why JSON for PLC Tag Configuration?

The traditional approach to configuring PLC data collection involves vendor-specific tools: RSLinx for Allen-Bradley, TIA Portal for Siemens, or proprietary gateway configurators. These tools work, but they create several problems at scale:

  • No version control. You can't git diff a proprietary binary config file.
  • No templating. When you deploy the same machine type across 50 sites, you're manually recreating the same configuration 50 times.
  • No validation. Typos in register addresses don't surface until runtime.
  • No automation. You can't script the generation of configurations from a master device database.

JSON solves all of these. A tag configuration becomes a text file that can be:

  • Stored in Git with full change history
  • Templated per device type (one JSON per machine model)
  • Validated against a schema before deployment
  • Generated programmatically from engineering databases

Anatomy of a Tag Configuration Document

A well-structured PLC tag configuration document needs to capture several layers of information:

Device-Level Metadata

Every configuration file should identify the device type it applies to, carry a version string for change tracking, and specify the protocol:

{
  "device_type": 1010,
  "version": "a3f7b2c",
  "name": "Continuous Blender Model X",
  "protocol": "ethernet-ip",
  "plctags": [ ... ]
}

The device_type field is a numeric identifier that maps to a specific machine model. When an edge gateway auto-detects a PLC (by reading a known register), it uses this type ID to look up the correct configuration file. The version field — ideally a short Git hash — lets you track which configuration version is running on each gateway in the field.

For Modbus devices, you'd also include protocol-specific parameters:

{
  "device_type": 5000,
  "version": "b8e1d4a",
  "name": "Temperature Control Unit",
  "protocol": "modbus-rtu",
  "base_addr": 48,
  "baud": 9600,
  "parity": "even",
  "data_bits": 8,
  "stop_bits": 1,
  "byte_timeout": 4,
  "resp_timeout": 100,
  "plctags": [ ... ]
}

Notice the serial link parameters are part of the same document. This is deliberate — you want a single source of truth for "how to talk to this device and what to read from it."

Tag Definitions: The Core Data Model

Each tag in the configuration represents a single data point you want to collect from the PLC. A complete tag definition captures:

{
  "name": "barrel_zone1_temp",
  "id": 42,
  "type": "float",
  "ecount": 2,
  "sindex": 0,
  "interval": 5,
  "compare": true,
  "do_not_batch": false
}

Let's break down each field:

name — A human-readable identifier for the tag. For EtherNet/IP (CIP) devices, this is the actual PLC tag name. For Modbus, it's a descriptive label since Modbus uses numeric addresses.

id — A numeric identifier used in the wire protocol when transmitting data to the cloud. Using compact integer IDs instead of string names dramatically reduces payload sizes — critical when you're sending telemetry over cellular connections.

type — The data type of the register value. Common types include:

Type | Size | Range | Use Case
bool | 1 byte | 0 or 1 | Alarm states, run/stop status
int8 | 1 byte | -128 to 127 | Small counters, mode selectors
uint8 | 1 byte | 0 to 255 | Status codes, alarm bytes
int16 | 2 bytes | -32,768 to 32,767 | Temperature (×10), pressure
uint16 | 2 bytes | 0 to 65,535 | RPM, flow rate, raw ADC values
int32 | 4 bytes | ±2.1 billion | Production counters, energy
uint32 | 4 bytes | 0 to 4.2 billion | Lifetime counters, timestamps
float | 4 bytes | IEEE 754 | Temperature, weight, setpoints

ecount (element count) — How many consecutive elements to read. For a single register, this is 1. For a 32-bit float stored across two Modbus registers, this is 2. For an array of 10 temperature readings, this is 10.

sindex (start index) — The starting element index for array reads. Combined with ecount, this lets you read slices of PLC arrays without pulling the entire array.

interval — How often (in seconds) to poll this tag. This is where you make intelligent decisions about bandwidth:

  • 1 second: Critical alarms, emergency stops, safety interlocks
  • 5 seconds: Process temperatures, pressures, flows
  • 30 seconds: Setpoints, mode selectors (change infrequently)
  • 300 seconds: Configuration parameters, serial numbers

compare — When true, the gateway compares each new reading against the previous value and only transmits if the value changed. This is the single most impactful optimization for reducing bandwidth and cloud ingestion costs.

do_not_batch — When true, the value is transmitted immediately rather than being accumulated into a batch payload. Use this for critical alarms that need sub-second cloud visibility.

Modbus Address Conventions

For Modbus devices, each tag also carries an addr field that encodes both the register address and the function code:

{
  "name": "process_temp",
  "id": 10,
  "addr": 400100,
  "type": "float",
  "ecount": 2,
  "interval": 5,
  "compare": true
}

The address convention follows a well-established pattern:

Address Range | Modbus Function Code | Register Type
0 – 65,536 | FC 01 | Coils (read/write)
100,000 – 165,536 | FC 02 | Discrete Inputs (read)
300,000 – 365,536 | FC 04 | Input Registers (read)
400,000 – 465,536 | FC 03 | Holding Registers (R/W)

So addr: 400100 means "holding register at address 100, read via function code 3." This convention eliminates ambiguity about which Modbus function to use — the address itself encodes it.

Why this matters: A common source of bugs in Modbus deployments is using the wrong function code. Someone configures a tag to read address 100 with FC 03 when the device exposes it as an input register (FC 04). With the address convention above, the function code is implicit and unambiguous.
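
A sketch of decoding this convention into a function code plus zero-based register offset; the struct and function names are illustrative:

#include <stdint.h>

struct modbus_target {
    uint8_t  function;   /* Modbus function code to use for reads */
    uint16_t offset;     /* zero-based register/coil address on the wire */
};

/* Decode a configuration address such as 400100 into FC + offset. */
static struct modbus_target decode_addr(uint32_t addr)
{
    struct modbus_target t;

    if (addr >= 400000)      { t.function = 3; t.offset = (uint16_t)(addr - 400000); }  /* holding registers */
    else if (addr >= 300000) { t.function = 4; t.offset = (uint16_t)(addr - 300000); }  /* input registers */
    else if (addr >= 100000) { t.function = 2; t.offset = (uint16_t)(addr - 100000); }  /* discrete inputs */
    else                     { t.function = 1; t.offset = (uint16_t)addr; }             /* coils */
    return t;
}

decode_addr(400100) yields function code 3 and offset 100, matching the example above.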

Advanced Patterns: Calculated and Dependent Tags

Simple register reads cover 80% of use cases. But industrial devices often pack multiple boolean values into a single 16-bit alarm word, or have tags whose values only matter when a parent tag changes.

Calculated Tags: Extracting Bits from Alarm Words

Many PLCs pack 16 individual alarm flags into a single uint16 register. Rather than reading 16 separate coils, you read one register and extract the bits:

{
  "name": "alarm_word_1",
  "id": 50,
  "addr": 400200,
  "type": "uint16",
  "ecount": 1,
  "interval": 1,
  "compare": true,
  "calculated": [
    {
      "name": "high_temp_alarm",
      "id": 51,
      "type": "bool",
      "shift": 0,
      "mask": 1
    },
    {
      "name": "low_pressure_alarm",
      "id": 52,
      "type": "bool",
      "shift": 1,
      "mask": 1
    },
    {
      "name": "motor_overload",
      "id": 53,
      "type": "bool",
      "shift": 2,
      "mask": 1
    }
  ]
}

When alarm_word_1 is read, the gateway automatically:

  1. Reads the raw uint16 value
  2. For each calculated tag, applies the right-shift and mask to extract the bit
  3. Compares the extracted boolean against its previous value
  4. Only transmits if the bit actually changed

This is vastly more efficient than polling 16 individual coils — one Modbus read instead of 16, with identical semantic output.
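
The extraction itself is one shift and one mask per calculated tag; a sketch:

#include <stdbool.h>
#include <stdint.h>

/* Extract one calculated boolean from a raw alarm word. */
static bool extract_bit(uint16_t raw, unsigned shift, uint16_t mask)
{
    return ((raw >> shift) & mask) != 0;
}

/* Example: raw = 0x0005 -> high_temp_alarm (shift 0) = true,
 * low_pressure_alarm (shift 1) = false, motor_overload (shift 2) = true. */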

Dependent Tags: Event-Driven Secondary Reads

Some tags only need to be read when a related tag changes. For example, you might have a machine_state register that changes between IDLE, RUNNING, and FAULT. When it changes, you want to immediately read a block of diagnostic registers — but you don't want to poll those diagnostics every cycle when the machine state is stable.

{
  "name": "machine_state",
  "id": 100,
  "addr": 400001,
  "type": "uint16",
  "ecount": 1,
  "interval": 1,
  "compare": true,
  "dependents": [
    {
      "name": "fault_code",
      "id": 101,
      "addr": 400010,
      "type": "uint16",
      "ecount": 1,
      "interval": 60
    },
    {
      "name": "fault_timestamp",
      "id": 102,
      "addr": 400011,
      "type": "uint32",
      "ecount": 2,
      "interval": 60
    }
  ]
}

When machine_state changes, the gateway forces an immediate read of all dependent tags, regardless of their normal polling interval. This gives you:

  • Low latency on state transitions — fault diagnostics arrive within 1 second of the fault occurring
  • Low bandwidth during steady state — diagnostic registers are only polled every 60 seconds when nothing is happening

Contiguous Register Optimization

One of the most impactful optimizations in Modbus data collection is contiguous register grouping. Instead of making separate Modbus read requests for each tag, the gateway sorts tags by address and groups adjacent registers into single bulk reads.

Consider these tags:

[
  { "name": "temp_1", "addr": 400100, "ecount": 1 },
  { "name": "temp_2", "addr": 400101, "ecount": 1 },
  { "name": "temp_3", "addr": 400102, "ecount": 1 },
  { "name": "pressure", "addr": 400103, "ecount": 2 }
]

A naive implementation makes four separate Modbus requests. An optimized one makes one request: read 5 registers starting at address 400100. The response contains all four values, which are dispatched to the correct tag definitions.

For this optimization to work, the configuration system must:

  1. Sort tags by address at load time, not at runtime
  2. Validate that function codes match — you can't group a coil read (FC 01) with a holding register read (FC 03)
  3. Respect maximum packet sizes — Modbus TCP allows up to 125 registers per read; some devices are more restrictive
  4. Respect polling intervals — only group tags that share the same polling interval

The performance difference is dramatic. A typical PLC with 50 Modbus tags might require 50 individual reads (50 × ~10ms = 500ms per cycle) or 5 grouped reads (5 × ~10ms = 50ms per cycle). That's a 10× improvement in polling speed.
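
A sketch of the grouping pass, assuming the tags are already sorted by address and pre-filtered to a single function code and polling interval; the struct names are illustrative:

#include <stddef.h>
#include <stdint.h>

#define MAX_REGS_PER_READ 125   /* Modbus TCP limit; some devices allow less */

struct tag      { uint32_t addr; uint16_t ecount; };
struct read_req { uint32_t start; uint16_t count; };

/* Merge adjacent tags into bulk read requests. Returns the number of requests. */
static size_t group_reads(const struct tag *tags, size_t n,
                          struct read_req *out, size_t out_max)
{
    size_t nreq = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t end = tags[i].addr + tags[i].ecount;   /* one past the last register */

        if (nreq > 0 &&
            tags[i].addr <= out[nreq - 1].start + out[nreq - 1].count &&
            end - out[nreq - 1].start <= MAX_REGS_PER_READ) {
            /* Contiguous (or overlapping) with the previous request: extend it. */
            if (end > out[nreq - 1].start + out[nreq - 1].count)
                out[nreq - 1].count = (uint16_t)(end - out[nreq - 1].start);
        } else if (nreq < out_max) {
            out[nreq].start = tags[i].addr;
            out[nreq].count = tags[i].ecount;
            nreq++;
        }
        /* If 'out' is full, a real implementation would flush and continue. */
    }
    return nreq;
}

For the four tags above this yields a single request: 5 registers starting at 400100.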

IEEE 754 Float Handling: The Register Order Problem

Reading 32-bit floating-point values over Modbus is notoriously tricky because the Modbus specification doesn't define register byte ordering for multi-register values. A float spans two 16-bit registers, and different PLCs may store them in different orders:

  • Big-endian (AB CD): Register N contains the high word, N+1 the low word
  • Little-endian (CD AB): Register N contains the low word, N+1 the high word
  • Mid-endian (BA DC or DC BA): Each word's bytes are swapped

Your tag configuration should support specifying the byte order, or at least document which convention your gateway assumes. Most libraries (libmodbus, for example) provide helper functions like modbus_get_float() that assume big-endian by default — but always verify against your specific PLC.

Pro tip: When commissioning a new device, read a register where you know the expected value (e.g., a temperature setpoint showing 72.0°F on the HMI). If the gateway reads 72.0, your byte order is correct. If it reads 2.388e-38 or 1.23e+12, you have a byte-order mismatch.
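
For reference, here is a sketch of reassembling a float from two registers under the two most common word orders (recent libmodbus versions also ship explicit helpers such as modbus_get_float_abcd() for each ordering):

#include <stdint.h>
#include <string.h>

/* Registers as read from the wire: reg[0] = register N, reg[1] = register N+1. */

/* Big-endian word order (AB CD): register N holds the high word. */
static float float_from_regs_abcd(const uint16_t reg[2])
{
    uint32_t raw = ((uint32_t)reg[0] << 16) | reg[1];
    float f;
    memcpy(&f, &raw, sizeof f);   /* reinterpret the 32-bit pattern as IEEE 754 */
    return f;
}

/* Little-endian word order (CD AB): register N holds the low word. */
static float float_from_regs_cdab(const uint16_t reg[2])
{
    uint32_t raw = ((uint32_t)reg[1] << 16) | reg[0];
    float f;
    memcpy(&f, &raw, sizeof f);
    return f;
}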

Binary vs. JSON Telemetry Encoding

Once you've collected your tag values, you need to transmit them. Your configuration should support both JSON and binary encoding, with the choice driven by bandwidth constraints:

JSON encoding is human-readable and debuggable:

{
  "groups": [{
    "ts": 1709500800,
    "device_type": 1010,
    "serial_number": 85432,
    "values": [
      { "id": 42, "values": [72.3] },
      { "id": 43, "values": [true] }
    ]
  }]
}

Binary encoding is 3-5× smaller. A typical binary frame packs:

  • 1-byte header marker
  • 4-byte group count
  • Per group: 4-byte timestamp, 2-byte device type, 4-byte serial number, 4-byte value count
  • Per value: 2-byte tag ID, 1-byte status, 1-byte value count, 1-byte value size, then raw value bytes

A batch that's 2,000 bytes in JSON might be 400 bytes in binary. Over a cellular connection billed per megabyte, that savings compounds fast.
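
To make the layout concrete, here is a sketch that packs a single group holding one float value into that frame. The header marker value and the little-endian field order are assumptions for illustration; a real wire format would pin both down explicitly:

#include <stdint.h>
#include <string.h>

#define FRAME_MARKER 0xA5   /* illustrative header marker */

/* Append a little-endian integer of 'len' bytes to the buffer. */
static size_t put_uint(uint8_t *buf, size_t pos, uint64_t v, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + len;
}

/* Pack one group with one float value; buf must hold at least 28 bytes here.
 * Returns the frame length in bytes. */
static size_t pack_frame(uint8_t *buf, uint32_t ts, uint16_t device_type,
                         uint32_t serial, uint16_t tag_id, float value)
{
    size_t pos = 0;
    uint32_t raw;
    memcpy(&raw, &value, sizeof raw);

    buf[pos++] = FRAME_MARKER;                 /* 1-byte header marker */
    pos = put_uint(buf, pos, 1, 4);            /* 4-byte group count */

    pos = put_uint(buf, pos, ts, 4);           /* 4-byte timestamp */
    pos = put_uint(buf, pos, device_type, 2);  /* 2-byte device type */
    pos = put_uint(buf, pos, serial, 4);       /* 4-byte serial number */
    pos = put_uint(buf, pos, 1, 4);            /* 4-byte value count */

    pos = put_uint(buf, pos, tag_id, 2);       /* 2-byte tag ID */
    buf[pos++] = 0;                            /* 1-byte status (0 = OK, illustrative) */
    buf[pos++] = 1;                            /* 1-byte value count */
    buf[pos++] = 4;                            /* 1-byte value size (float = 4 bytes) */
    pos = put_uint(buf, pos, raw, 4);          /* raw value bytes */

    return pos;
}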

Putting It All Together: Configuration Lifecycle

A production deployment follows this lifecycle:

  1. Template creation: For each machine model, create a JSON tag configuration. Store it in Git.
  2. Deployment: Push configurations to edge gateways via your device management platform. The gateway monitors the config file and reloads automatically when it changes.
  3. Auto-detection: When the gateway starts, it queries the PLC for its device type (a known register). It then matches the type to the correct configuration file.
  4. Validation: At load time, validate register addresses (no duplicates, valid ranges), data types, and interval values. Reject invalid configs before they cause runtime errors.
  5. Runtime: The gateway polls tags according to their configured intervals, applies change detection, groups contiguous registers, and batches values for transmission.

How machineCDN Handles Tag Configuration

machineCDN's edge gateway uses this exact pattern — JSON-based device templates that are automatically selected based on PLC auto-detection. Each machine type in a plastics manufacturing facility (blenders, dryers, granulators, chillers, TCUs) has its own configuration template with pre-mapped tags, optimized polling intervals, and calculated alarm decomposition.

When a new machine is connected, the gateway detects the PLC type, loads the matching template, and starts collecting data — typically in under 30 seconds with zero manual configuration. For plants running 20+ machines across 5 different models, this eliminates weeks of commissioning time.

Common Pitfalls

1. Overlapping addresses. Two tags pointing to the same register with different IDs will cause confusion in your data pipeline. Validate for uniqueness at load time.

2. Wrong element count for floats. A 32-bit float on Modbus requires ecount: 2 (two 16-bit registers). Setting ecount: 1 gives you garbage data.

3. Polling too fast on serial links. Modbus RTU over RS-485 at 9600 baud can handle roughly 10-15 register reads per second. If you configure 50 tags at 1-second intervals, you'll never keep up. Budget your polling rate against your link speed.

4. Missing change detection on high-volume tags. Without compare: true, every reading gets transmitted. For a tag polled every second, that's 86,400 data points per day — even if the value never changed.

5. Batch timeout too long. If your batch timeout is 60 seconds but an alarm fires, it won't reach the cloud for up to a minute unless that alarm tag has do_not_batch: true.

Conclusion

JSON-based tag configuration isn't just a nice-to-have — it's a fundamental enabler for scaling IIoT deployments. It brings software engineering best practices (version control, templating, validation, automation) to a domain that has traditionally relied on manual, vendor-specific tooling.

The key design principles are:

  • One file per device type with version tracking
  • Rich tag metadata covering data types, intervals, and delivery modes
  • Hierarchical relationships for calculated and dependent tags
  • Protocol-aware addressing that encodes function codes implicitly
  • Contiguous register grouping for optimal Modbus performance

Get this foundation right, and you'll spend your time analyzing machine data instead of debugging data collection.