2 posts tagged with "configuration"


Edge Gateway Hot-Reload and Watchdog Patterns for Industrial IoT [2026]

· 12 min read

Here's a scenario every IIoT engineer dreads: it's 2 AM on a Saturday, your edge gateway in a plastics manufacturing plant has lost its MQTT connection to the cloud, and nobody notices until Monday morning. Forty-eight hours of production data — temperatures, pressures, cycle counts, alarms — gone. The maintenance team wanted to correlate a quality defect with process data from Saturday afternoon. They can't.

This is a reliability problem, and it's solvable. The patterns that separate a production-grade edge gateway from a prototype are: configuration hot-reload (change settings without restarting), connection watchdogs (detect and recover from silent failures), and graceful resource management (handle reconnections without memory leaks).

This guide covers the architecture behind each of these patterns, with practical design decisions drawn from real industrial deployments.

[Figure: Edge gateway hot-reload and firmware patterns]

The Problem: Why Edge Gateways Fail Silently

Industrial edge gateways operate in hostile environments: temperature swings, electrical noise, intermittent network connectivity, and 24/7 uptime requirements. The failure modes are rarely dramatic — they're insidious:

  • MQTT connection drops silently. The broker stops responding, but the client library doesn't fire a disconnect callback because the TCP connection is still half-open.
  • Configuration drift. An engineer updates tag definitions on the management server, but the gateway is still running the old configuration.
  • Memory exhaustion. Each reconnection allocates new buffers without properly freeing the old ones. After enough reconnections, the gateway runs out of memory and crashes.
  • PLC link flapping. The PLC reboots or loses power briefly. The gateway keeps polling, getting errors, but never properly re-detects or reconnects.

Solving these requires three interlocking systems: hot-reload for configuration, watchdogs for connections, and disciplined resource management.

Pattern 1: Configuration File Hot-Reload

The simplest and most robust approach to configuration hot-reload is file-based with stat polling. The gateway periodically checks if its configuration file has been modified (using the file's modification timestamp), and if so, reloads and applies the new configuration.

Design: stat() Polling vs. inotify

You have two options for detecting file changes:

stat() polling — Check the file's st_mtime on every main loop iteration:

on_each_cycle():
    current_stat = stat(config_file)
    if current_stat.mtime != last_known_mtime:
        reload_configuration()
        last_known_mtime = current_stat.mtime

inotify (Linux) — Register for kernel-level file change notifications:

fd = inotify_add_watch(config_file, IN_MODIFY)
poll(fd) // blocks until file changes
reload_configuration()

For industrial edge gateways, stat() polling wins. Here's why:

  1. It's simpler. No file descriptor management, no edge cases with inotify watches being silently dropped.
  2. It works across filesystems. inotify doesn't work on NFS, CIFS, or some embedded filesystems. stat() works everywhere.
  3. The cost is negligible. A single stat() call takes ~1 microsecond. Even at 1 Hz, it's invisible.
  4. It naturally integrates with the main loop. Industrial gateways already run a polling loop for PLC reads. Adding a stat() check is one line.
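
As a concrete illustration, here is a minimal C sketch of the stat()-based check. The config_changed() helper and the single cached mtime are illustrative, not lifted from any particular gateway:

#include <sys/stat.h>
#include <time.h>

static time_t last_known_mtime;

/* Call once per main-loop iteration; returns 1 if the config file changed. */
static int config_changed(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                       /* missing or unreadable: treat as unchanged */

    if (st.st_mtime != last_known_mtime) {
        last_known_mtime = st.st_mtime;
        return 1;
    }
    return 0;
}

In the main loop this reduces to a single call: if (config_changed("/etc/gateway/daemon.json")) reload_configuration(); (the path is just an example).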

Graceful Reload: The Teardown-Rebuild Cycle

When a configuration change is detected, the gateway must:

  1. Stop active PLC connections. For EtherNet/IP, destroy all tag handles. For Modbus, close the serial port or TCP connection.
  2. Free allocated memory. Tag definitions, batch buffers, connection contexts — all of it.
  3. Re-read and validate the new configuration.
  4. Re-detect the PLC and re-establish connections with the new tag map.
  5. Resume data collection with a forced initial read of all tags.

The critical detail is step 2. Industrial gateways often use a pool allocator instead of individual malloc/free calls. All configuration-related memory is allocated from a single large buffer. On reload, you simply reset the allocator's pointer to the beginning of the buffer:

// Pseudo-code: pool allocator reset
config_memory.write_pointer = config_memory.base_address
config_memory.used_bytes = 0
config_memory.free_bytes = config_memory.total_size

This eliminates the risk of memory leaks during reconfiguration. No matter how many times you reload, memory usage stays constant.
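
A minimal sketch of such a pool allocator in C, assuming a statically sized region; the names pool_alloc and pool_reset are illustrative:

#include <stddef.h>
#include <stdint.h>

#define CONFIG_POOL_SIZE (256 * 1024)

static uint8_t pool[CONFIG_POOL_SIZE];
static size_t  pool_used;

/* Bump-pointer allocation: no per-object free, no fragmentation. */
static void *pool_alloc(size_t size)
{
    size = (size + 7u) & ~(size_t)7u;          /* 8-byte alignment */
    if (pool_used + size > CONFIG_POOL_SIZE)
        return NULL;                           /* pool exhausted */
    void *p = &pool[pool_used];
    pool_used += size;
    return p;
}

/* On configuration reload: throw away everything in one step. */
static void pool_reset(void)
{
    pool_used = 0;
}

Everything parsed out of the configuration (tag definitions, batch buffers, connection contexts) is allocated with pool_alloc(); a reload is then just pool_reset() followed by re-parsing.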

Multi-File Configuration

Production gateways often have multiple configuration files:

  • Daemon config — Network settings, serial port parameters, batch sizes, timeouts
  • Device configs — Per-device-type tag maps (one JSON file per machine model)
  • Connection config — MQTT broker address, TLS certificates, authentication tokens

Each file should be watched independently. If only the daemon config changes (e.g., someone adjusts the batch timeout), you don't need to re-detect the PLC — just update the runtime parameter. If a device config changes (e.g., someone adds a new tag), you need to rebuild the tag chain.

A practical approach: when the daemon config changes, set a flag to force a status report on the next MQTT cycle. When a device config changes, trigger a full teardown-rebuild of that device's tag chain.

Pattern 2: Connection Watchdogs

The most dangerous failure mode in MQTT-based telemetry is the silent disconnect. The TCP connection appears alive (no RST received), but the broker has stopped processing messages. The client's publish calls succeed (they're just writing to a local socket buffer), but data never reaches the cloud.

The MQTT Delivery Confirmation Watchdog

The robust solution uses MQTT QoS 1 delivery confirmations as a heartbeat:

// Track the timestamp of the last confirmed delivery
last_delivery_timestamp = 0

on_publish_confirmed(packet_id):
    last_delivery_timestamp = now()

on_watchdog_check(): // runs every N seconds
    if last_delivery_timestamp == 0:
        return // no data sent yet, nothing to check

    elapsed = now() - last_delivery_timestamp
    if elapsed > WATCHDOG_TIMEOUT:
        trigger_reconnect()

With MQTT QoS 1, the broker sends a PUBACK for every published message. If you haven't received a PUBACK in, say, 120 seconds, but you've been publishing data, something is wrong.

The key insight is that you're not watching the connection state — you're watching the delivery pipeline. A connection can appear healthy (no disconnect callback fired) while the delivery pipeline is stalled.
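
For example, with the Eclipse Mosquitto client library the PUBACK for a QoS 1 message surfaces through the publish callback. A minimal sketch; the timeout value and the trigger_reconnect() hook are assumptions, not part of the library:

#include <mosquitto.h>
#include <time.h>

#define WATCHDOG_TIMEOUT 120    /* seconds without a confirmed delivery */

void trigger_reconnect(void);   /* assumed hook: wakes the reconnection thread */

static volatile time_t last_delivery;

/* Fired by libmosquitto once the broker has acknowledged a QoS 1 publish. */
static void on_publish(struct mosquitto *mosq, void *obj, int mid)
{
    (void)mosq; (void)obj; (void)mid;
    last_delivery = time(NULL);
}

/* Call periodically from the main loop. */
static void watchdog_check(void)
{
    if (last_delivery == 0)
        return;                                 /* nothing published yet */
    if (time(NULL) - last_delivery > WATCHDOG_TIMEOUT)
        trigger_reconnect();
}

The callback is registered with mosquitto_publish_callback_set() before the network loop is started; messages must be published with QoS 1 for the acknowledgment to exist at all.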

Reconnection Strategy: Async with Backoff

When the watchdog triggers, the reconnection must be:

  1. Asynchronous — Don't block the PLC polling loop. Data collection should continue even while MQTT is reconnecting. Collected data gets buffered locally.
  2. Non-destructive — The MQTT loop thread must be stopped before destroying the client. Stopping the loop with force=true ensures no callbacks fire during teardown.
  3. Complete — Disconnect, destroy the client, reinitialize the library, create a new client, set callbacks, start the loop, then connect. Half-measures (just calling reconnect) often leave stale state.

A dedicated reconnection thread works well:

reconnect_thread():
    while true:
        wait_for_signal() // semaphore blocks until watchdog triggers

        log("Starting MQTT reconnection")
        stop_mqtt_loop(force=true)
        disconnect()
        destroy_client()
        cleanup_library()

        // Re-initialize from scratch
        init_library()
        create_client(device_id)
        set_credentials(username, password)
        set_tls(certificate_path)
        set_protocol(MQTT_3_1_1)
        set_callbacks(on_connect, on_disconnect, on_message, on_publish)
        start_loop()
        set_reconnect_delay(5, 5, no_exponential)
        connect_async(host, port, keepalive=60)

        signal_complete() // release semaphore

Why a separate thread? The connect_async call can block for up to 60 seconds on DNS resolution or TCP handshake. If this runs on the main thread, PLC polling stops. Industrial processes don't wait for your network issues.
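
Here is a sketch of that signaling skeleton using POSIX threads and semaphores. mqtt_full_reconnect() stands in for the teardown/rebuild sequence above and is not a real library call:

#include <pthread.h>
#include <semaphore.h>

static sem_t reconnect_request;   /* posted by the watchdog */
static sem_t reconnect_done;      /* posted when reconnection finishes */

void mqtt_full_reconnect(void);   /* assumed: implements the sequence shown above */

static void *reconnect_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&reconnect_request);   /* block until the watchdog asks for it */
        mqtt_full_reconnect();          /* may take tens of seconds; PLC polling is unaffected */
        sem_post(&reconnect_done);
    }
    return NULL;
}

/* Called from the watchdog: never blocks the polling loop. */
void trigger_reconnect(void)
{
    sem_post(&reconnect_request);
}

At startup you sem_init() both semaphores and pthread_create() the thread once; the watchdog only ever calls trigger_reconnect(), which returns immediately.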

PLC Connection Watchdog

MQTT isn't the only connection that needs watching. PLC connections — both EtherNet/IP and Modbus TCP — can also fail silently.

For Modbus TCP, the watchdog logic is simpler because each read returns an explicit error code:

on_modbus_read_error(error_code):
    if error_code in [ETIMEDOUT, ECONNRESET, ECONNREFUSED, EPIPE, EBADF]:
        close_modbus_connection()
        set_link_state(DOWN)
        // Will reconnect on next polling cycle

For EtherNet/IP via libraries like libplctag, a return code of -32 (connection failed) should trigger:

  1. Setting the link state to DOWN
  2. Destroying the tag handles
  3. Attempting re-detection on the next cycle

A critical detail: track consecutive errors, not individual ones. A single timeout might be a transient hiccup. Three consecutive timeouts (error_count >= 3) indicate a real problem. Break the polling cycle early to avoid hammering a dead connection.
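
A sketch of the consecutive-error rule wrapped around a libmodbus read; the threshold constant and the set_link_state_down() helper are illustrative:

#include <modbus.h>
#include <stdint.h>

#define MAX_CONSECUTIVE_ERRORS 3

void set_link_state_down(void);   /* assumed helper: publishes link state, marks for re-detect */

static int consecutive_errors;

/* Returns 0 to keep polling, -1 if the link was declared down. */
static int poll_tag(modbus_t *ctx, int addr, int nb, uint16_t *dest)
{
    if (modbus_read_registers(ctx, addr, nb, dest) == -1) {
        if (++consecutive_errors >= MAX_CONSECUTIVE_ERRORS) {
            modbus_close(ctx);            /* drop the dead connection */
            set_link_state_down();
            consecutive_errors = 0;
            return -1;                    /* caller breaks out of the polling cycle */
        }
        return 0;                         /* transient hiccup: keep going */
    }
    consecutive_errors = 0;               /* any success resets the counter */
    return 0;
}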

The gateway should treat the connection state itself as a telemetry point. When the PLC link goes up or down, immediately publish a link state tag — a boolean value with do_not_batch: true:

link_state_changed(device, new_state):
    publish_immediately(
        tag_id=LINK_STATE_TAG,
        value=new_state, // true=up, false=down
        timestamp=now()
    )

This gives operators cloud-side visibility into gateway connectivity. A dashboard can show "Device offline since 2:47 AM" instead of just "no data" — which is ambiguous (was the device off, or was the gateway offline?).

Pattern 3: Store-and-Forward Buffering

When MQTT is disconnected, you can't just drop data. A production gateway needs a paged ring buffer that accumulates data during disconnections and drains it when connectivity returns.

Paged Buffer Architecture

The buffer divides a fixed-size memory region into pages of equal size:

Total buffer: 2 MB
Page size: ~4 KB (derived from max batch size)
Pages: ~500

Page states:
FREE → Available for writing
WORK → Currently being written to
USED → Full, queued for delivery

The lifecycle:

  1. Writing: Data is appended to the WORK page. When it's full, WORK moves to the USED queue, and a FREE page becomes the new WORK page.
  2. Sending: When MQTT is connected, the first USED page is sent. On PUBACK confirmation, the page moves to FREE.
  3. Overflow: If all pages are USED (buffer full, MQTT down for too long), the oldest USED page is recycled as the new WORK page. This loses the oldest data to preserve the newest — the right tradeoff for most industrial applications.
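
A sketch of the underlying data structures for this lifecycle, assuming a statically allocated region; the names are illustrative:

#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE (2 * 1024 * 1024)   /* 2 MB total */
#define PAGE_SIZE   4096                /* derived from max batch size */
#define PAGE_COUNT  (BUFFER_SIZE / PAGE_SIZE)

enum page_state { PAGE_FREE, PAGE_WORK, PAGE_USED };

struct page {
    enum page_state state;
    size_t          used;               /* bytes written so far */
    uint8_t         data[PAGE_SIZE];
};

struct ring_buffer {
    struct page pages[PAGE_COUNT];
    size_t      work;                   /* index of the page currently being filled */
    size_t      oldest_used;            /* next page to send, or to recycle on overflow */
};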

Thread safety is critical. The PLC polling thread writes to the buffer, the MQTT thread reads from it, and the PUBACK callback advances the read pointer. A mutex protects all buffer operations:

buffer_add_data(data, size):
    lock(mutex)
    append_to_work_page(data, size)
    if work_page_full():
        move_work_to_used()
        try_send_next()
    unlock(mutex)

on_puback(packet_id):
    lock(mutex)
    advance_read_pointer()
    if page_fully_delivered():
        move_page_to_free()
    try_send_next()
    unlock(mutex)

on_disconnect():
    lock(mutex)
    connected = false
    packet_in_flight = false // reset delivery state
    unlock(mutex)

Sizing the Buffer

Buffer sizing depends on your data rate and your maximum acceptable offline duration:

buffer_size = data_rate_bytes_per_second × max_offline_seconds

For a typical deployment:

  • 50 tags × 4 bytes average × 1 read/second = 200 bytes/second
  • With binary encoding overhead: ~300 bytes/second
  • Maximum offline duration: 2 hours (7,200 seconds)
  • Buffer needed: 300 × 7,200 = ~2.1 MB

A 2 MB buffer with 4 KB pages gives you ~500 pages — roughly two hours of offline operation at this data rate.

The Minimum Three-Page Rule

The buffer needs at minimum 3 pages to function:

  1. One WORK page (currently being written to)
  2. One USED page (queued for delivery)
  3. One page in transition (being delivered, not yet confirmed)

If you can't fit 3 pages in the buffer, the page size is too large relative to the buffer. Validate this at initialization time and reject invalid configurations rather than failing at runtime.
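
The check itself is tiny; a sketch, assuming the two sizes come from the daemon config:

#include <stdio.h>
#include <stddef.h>

/* Returns 0 if the buffer geometry is usable, -1 otherwise. */
static int validate_buffer_geometry(size_t buffer_size, size_t page_size)
{
    if (page_size == 0 || buffer_size / page_size < 3) {
        fprintf(stderr, "buffer too small: need at least 3 pages of %zu bytes\n", page_size);
        return -1;   /* refuse to start with an invalid configuration */
    }
    return 0;
}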

Pattern 4: Periodic Forced Reads

Even with change-detection enabled (the compare flag), a production gateway should periodically force-read all tags and transmit their values regardless of whether they changed. This serves several purposes:

  1. Proof of life. Downstream systems can distinguish "the value hasn't changed" from "the gateway is dead."
  2. State synchronization. If the cloud-side database lost data (a rare but real scenario), periodic full-state updates resynchronize it.
  3. Clock drift correction. Over time, individual tag timers can drift. A periodic full reset realigns all tags.

A practical approach: reset all tags on the hour boundary. Check the system clock, and when the hour rolls over, clear all "previously read" flags. Every tag will be read and transmitted on its next polling cycle, regardless of change detection:

on_each_read_cycle():
    current_hour = localtime(now()).hour
    previous_hour = localtime(last_read_time).hour

    if current_hour != previous_hour:
        reset_all_tags() // clear read-once flags
        log("Hourly forced read: all tags will be re-read")

This adds at most one extra transmission per tag per hour — a negligible bandwidth cost for significant reliability improvement.

Pattern 5: SAS Token and Certificate Expiry Monitoring

If your MQTT connection uses time-limited credentials (like Azure IoT Hub SAS tokens or short-lived TLS certificates), the gateway must monitor expiry and refresh proactively.

For SAS tokens, extract the se (expiry) parameter from the connection string and compare it against the current system time:

on_config_load(sas_token):
    expiry_timestamp = extract_se_parameter(sas_token)

    if current_time > expiry_timestamp:
        log_warning("Token has expired!")
        // Still attempt connection — the broker will reject it,
        // but the error path will trigger a config reload
    else:
        time_remaining = expiry_timestamp - current_time
        log("Token valid for %d hours", time_remaining / 3600)

Don't silently fail. If the token is expired, log a prominent warning. The gateway should still attempt to connect (the broker rejection will be informative), but operations teams need visibility into credential lifecycle.

For TLS certificates, monitor both the certificate file's modification time (has a new cert been deployed?) and the certificate's validity period (is it about to expire?).
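
A sketch of the se extraction in C, assuming the Azure-style token format where the parameter is a Unix timestamp ("...&se=<unix-timestamp>&..."); error handling is kept minimal:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns the expiry as a Unix timestamp, or 0 if the parameter is missing. */
static time_t sas_token_expiry(const char *token)
{
    const char *se = strstr(token, "se=");
    if (se == NULL)
        return 0;
    return (time_t)strtoll(se + 3, NULL, 10);
}

static void check_token(const char *token)
{
    time_t expiry = sas_token_expiry(token);
    time_t now    = time(NULL);

    if (expiry == 0)
        fprintf(stderr, "WARNING: no se= parameter found in SAS token\n");
    else if (now > expiry)
        fprintf(stderr, "WARNING: SAS token has expired!\n");
    else
        printf("Token valid for %ld hours\n", (long)((expiry - now) / 3600));
}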

How machineCDN Implements These Patterns

machineCDN's edge gateway — deployed on OpenWRT-based industrial routers in plastics manufacturing plants — implements all five patterns:

  • Configuration hot-reload using stat() polling on the main loop, with pool-allocated memory for zero-leak teardown/rebuild cycles
  • Dual watchdogs for MQTT delivery confirmation (120-second timeout) and PLC link state (3 consecutive errors trigger reconnection)
  • Paged ring buffer with 2 MB capacity, supporting both JSON and binary encoding, with automatic overflow handling that preserves newest data
  • Hourly forced reads that ensure complete state synchronization regardless of change detection
  • SAS token monitoring with proactive expiry warnings

These patterns enable 99.9%+ data capture rates even in plants with intermittent cellular connectivity — because the gateway collects data continuously and back-fills when connectivity returns.

Implementation Checklist

If you're building or evaluating an edge gateway for industrial IoT, verify that it supports:

Capability | Why It Matters
Config hot-reload without restart | Zero-downtime updates, no data gaps during reconfiguration
Pool-based memory allocation | No memory leaks across reload cycles
MQTT delivery watchdog | Detects silent connection failures
Async reconnection thread | PLC polling continues during MQTT recovery
Paged store-and-forward buffer | Preserves data during network outages
Consecutive error thresholds | Avoids false-positive disconnections
Link state telemetry | Distinguishes "offline gateway" from "idle machine"
Periodic forced reads | State synchronization and proof-of-life
Credential expiry monitoring | Proactive certificate/token management

Conclusion

Reliability in industrial IoT isn't about preventing failures — it's about recovering from them automatically. Networks will drop. PLCs will reboot. Certificates will expire. The question is whether your edge gateway handles these events gracefully or silently loses data.

The patterns in this guide — hot-reload, watchdogs, store-and-forward, forced reads, and credential monitoring — are the difference between a gateway that works in the lab and one that works at 3 AM on a holiday weekend in a plant with spotty cellular coverage.

Build for the 3 AM scenario. Your operations team will thank you.

JSON-Based PLC Tag Configuration: Building Maintainable IIoT Device Templates [2026]

· 12 min read

If you've ever stared at a spreadsheet of 200 PLC register addresses trying to figure out which ones your SCADA system is actually polling, you know the pain. Traditional tag configuration — hardcoded in ladder logic comments, scattered across HMI screens, buried in proprietary configuration tools — doesn't scale.

The solution that's gaining traction in modern IIoT deployments is declarative, JSON-based tag configuration. Instead of configuring your data collection logic in opaque proprietary formats, you define your device's entire tag map as a structured JSON document. This approach brings version control, template reuse, and automated validation to the industrial data layer.

In this guide, we'll walk through the architecture of a production-grade JSON tag configuration system, drawing from real patterns used in industrial edge gateways connecting to Allen-Bradley Micro800 PLCs via EtherNet/IP and to various devices via Modbus RTU and TCP.

[Figure: JSON-based PLC tag configuration for IIoT]

Why JSON for PLC Tag Configuration?

The traditional approach to configuring PLC data collection involves vendor-specific tools: RSLinx for Allen-Bradley, TIA Portal for Siemens, or proprietary gateway configurators. These tools work, but they create several problems at scale:

  • No version control. You can't git diff a proprietary binary config file.
  • No templating. When you deploy the same machine type across 50 sites, you're manually recreating the same configuration 50 times.
  • No validation. Typos in register addresses don't surface until runtime.
  • No automation. You can't script the generation of configurations from a master device database.

JSON solves all of these. A tag configuration becomes a text file that can be:

  • Stored in Git with full change history
  • Templated per device type (one JSON per machine model)
  • Validated against a schema before deployment
  • Generated programmatically from engineering databases

Anatomy of a Tag Configuration Document

A well-structured PLC tag configuration document needs to capture several layers of information:

Device-Level Metadata

Every configuration file should identify the device type it applies to, carry a version string for change tracking, and specify the protocol:

{
  "device_type": 1010,
  "version": "a3f7b2c",
  "name": "Continuous Blender Model X",
  "protocol": "ethernet-ip",
  "plctags": [ ... ]
}

The device_type field is a numeric identifier that maps to a specific machine model. When an edge gateway auto-detects a PLC (by reading a known register), it uses this type ID to look up the correct configuration file. The version field — ideally a short Git hash — lets you track which configuration version is running on each gateway in the field.

For Modbus devices, you'd also include protocol-specific parameters:

{
  "device_type": 5000,
  "version": "b8e1d4a",
  "name": "Temperature Control Unit",
  "protocol": "modbus-rtu",
  "base_addr": 48,
  "baud": 9600,
  "parity": "even",
  "data_bits": 8,
  "stop_bits": 1,
  "byte_timeout": 4,
  "resp_timeout": 100,
  "plctags": [ ... ]
}

Notice the serial link parameters are part of the same document. This is deliberate — you want a single source of truth for "how to talk to this device and what to read from it."

Tag Definitions: The Core Data Model

Each tag in the configuration represents a single data point you want to collect from the PLC. A complete tag definition captures:

{
  "name": "barrel_zone1_temp",
  "id": 42,
  "type": "float",
  "ecount": 2,
  "sindex": 0,
  "interval": 5,
  "compare": true,
  "do_not_batch": false
}

Let's break down each field:

name — A human-readable identifier for the tag. For EtherNet/IP (CIP) devices, this is the actual PLC tag name. For Modbus, it's a descriptive label since Modbus uses numeric addresses.

id — A numeric identifier used in the wire protocol when transmitting data to the cloud. Using compact integer IDs instead of string names dramatically reduces payload sizes — critical when you're sending telemetry over cellular connections.

type — The data type of the register value. Common types include:

Type | Size | Range | Use Case
bool | 1 byte | 0 or 1 | Alarm states, run/stop status
int8 | 1 byte | -128 to 127 | Small counters, mode selectors
uint8 | 1 byte | 0 to 255 | Status codes, alarm bytes
int16 | 2 bytes | -32,768 to 32,767 | Temperature (×10), pressure
uint16 | 2 bytes | 0 to 65,535 | RPM, flow rate, raw ADC values
int32 | 4 bytes | ±2.1 billion | Production counters, energy
uint32 | 4 bytes | 0 to 4.2 billion | Lifetime counters, timestamps
float | 4 bytes | IEEE 754 | Temperature, weight, setpoints

ecount (element count) — How many consecutive elements to read. For a single register, this is 1. For a 32-bit float stored across two Modbus registers, this is 2. For an array of 10 temperature readings, this is 10.

sindex (start index) — The starting element index for array reads. Combined with ecount, this lets you read slices of PLC arrays without pulling the entire array.

interval — How often (in seconds) to poll this tag. This is where you make intelligent decisions about bandwidth:

  • 1 second: Critical alarms, emergency stops, safety interlocks
  • 5 seconds: Process temperatures, pressures, flows
  • 30 seconds: Setpoints, mode selectors (change infrequently)
  • 300 seconds: Configuration parameters, serial numbers

compare — When true, the gateway compares each new reading against the previous value and only transmits if the value changed. This is the single most impactful optimization for reducing bandwidth and cloud ingestion costs.

do_not_batch — When true, the value is transmitted immediately rather than being accumulated into a batch payload. Use this for critical alarms that need sub-second cloud visibility.

Modbus Address Conventions

For Modbus devices, each tag also carries an addr field that encodes both the register address and the function code:

{
  "name": "process_temp",
  "id": 10,
  "addr": 400100,
  "type": "float",
  "ecount": 2,
  "interval": 5,
  "compare": true
}

The address convention follows a well-established pattern:

Address Range | Modbus Function Code | Register Type
0 – 65,536 | FC 01 | Coils (read/write)
100,000 – 165,536 | FC 02 | Discrete Inputs (read)
300,000 – 365,536 | FC 04 | Input Registers (read)
400,000 – 465,536 | FC 03 | Holding Registers (R/W)

So addr: 400100 means "holding register at address 100, read via function code 3." This convention eliminates ambiguity about which Modbus function to use — the address itself encodes it.

Why this matters: A common source of bugs in Modbus deployments is using the wrong function code. Someone configures a tag to read address 100 with FC 03 when the device exposes it as an input register (FC 04). With the address convention above, the function code is implicit and unambiguous.
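
A sketch of decoding this convention into a function code plus zero-based register offset; the struct and function names are illustrative:

#include <stdint.h>

struct modbus_target {
    uint8_t  function;   /* Modbus function code to use for reads */
    uint16_t offset;     /* zero-based register/coil address on the wire */
};

/* Decode a configuration address such as 400100 into FC + offset. */
static struct modbus_target decode_addr(uint32_t addr)
{
    struct modbus_target t;

    if (addr >= 400000)      { t.function = 3; t.offset = (uint16_t)(addr - 400000); }  /* holding registers */
    else if (addr >= 300000) { t.function = 4; t.offset = (uint16_t)(addr - 300000); }  /* input registers */
    else if (addr >= 100000) { t.function = 2; t.offset = (uint16_t)(addr - 100000); }  /* discrete inputs */
    else                     { t.function = 1; t.offset = (uint16_t)addr; }             /* coils */
    return t;
}

decode_addr(400100) yields function code 3 and offset 100, matching the example above.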

Advanced Patterns: Calculated and Dependent Tags

Simple register reads cover 80% of use cases. But industrial devices often pack multiple boolean values into a single 16-bit alarm word, or have tags whose values only matter when a parent tag changes.

Calculated Tags: Extracting Bits from Alarm Words

Many PLCs pack 16 individual alarm flags into a single uint16 register. Rather than reading 16 separate coils, you read one register and extract the bits:

{
  "name": "alarm_word_1",
  "id": 50,
  "addr": 400200,
  "type": "uint16",
  "ecount": 1,
  "interval": 1,
  "compare": true,
  "calculated": [
    {
      "name": "high_temp_alarm",
      "id": 51,
      "type": "bool",
      "shift": 0,
      "mask": 1
    },
    {
      "name": "low_pressure_alarm",
      "id": 52,
      "type": "bool",
      "shift": 1,
      "mask": 1
    },
    {
      "name": "motor_overload",
      "id": 53,
      "type": "bool",
      "shift": 2,
      "mask": 1
    }
  ]
}

When alarm_word_1 is read, the gateway automatically:

  1. Reads the raw uint16 value
  2. For each calculated tag, applies the right-shift and mask to extract the bit
  3. Compares the extracted boolean against its previous value
  4. Only transmits if the bit actually changed

This is vastly more efficient than polling 16 individual coils — one Modbus read instead of 16, with identical semantic output.
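
The extraction itself is one shift and one mask per calculated tag; a sketch:

#include <stdbool.h>
#include <stdint.h>

/* Extract one calculated boolean from a raw alarm word. */
static bool extract_bit(uint16_t raw, unsigned shift, uint16_t mask)
{
    return ((raw >> shift) & mask) != 0;
}

/* Example: raw = 0x0005 -> high_temp_alarm (shift 0) = true,
 * low_pressure_alarm (shift 1) = false, motor_overload (shift 2) = true. */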

Dependent Tags: Event-Driven Secondary Reads

Some tags only need to be read when a related tag changes. For example, you might have a machine_state register that changes between IDLE, RUNNING, and FAULT. When it changes, you want to immediately read a block of diagnostic registers — but you don't want to poll those diagnostics every cycle when the machine state is stable.

{
  "name": "machine_state",
  "id": 100,
  "addr": 400001,
  "type": "uint16",
  "ecount": 1,
  "interval": 1,
  "compare": true,
  "dependents": [
    {
      "name": "fault_code",
      "id": 101,
      "addr": 400010,
      "type": "uint16",
      "ecount": 1,
      "interval": 60
    },
    {
      "name": "fault_timestamp",
      "id": 102,
      "addr": 400011,
      "type": "uint32",
      "ecount": 2,
      "interval": 60
    }
  ]
}

When machine_state changes, the gateway forces an immediate read of all dependent tags, regardless of their normal polling interval. This gives you:

  • Low latency on state transitions — fault diagnostics arrive within 1 second of the fault occurring
  • Low bandwidth during steady state — diagnostic registers are only polled every 60 seconds when nothing is happening

Contiguous Register Optimization

One of the most impactful optimizations in Modbus data collection is contiguous register grouping. Instead of making separate Modbus read requests for each tag, the gateway sorts tags by address and groups adjacent registers into single bulk reads.

Consider these tags:

[
  { "name": "temp_1", "addr": 400100, "ecount": 1 },
  { "name": "temp_2", "addr": 400101, "ecount": 1 },
  { "name": "temp_3", "addr": 400102, "ecount": 1 },
  { "name": "pressure", "addr": 400103, "ecount": 2 }
]

A naive implementation makes four separate Modbus requests. An optimized one makes one request: read 5 registers starting at address 400100. The response contains all four values, which are dispatched to the correct tag definitions.

For this optimization to work, the configuration system must:

  1. Sort tags by address at load time, not at runtime
  2. Validate that function codes match — you can't group a coil read (FC 01) with a holding register read (FC 03)
  3. Respect maximum packet sizes — Modbus TCP allows up to 125 registers per read; some devices are more restrictive
  4. Respect polling intervals — only group tags that share the same polling interval

The performance difference is dramatic. A typical PLC with 50 Modbus tags might require 50 individual reads (50 × ~10ms = 500ms per cycle) or 5 grouped reads (5 × ~10ms = 50ms per cycle). That's a 10× improvement in polling speed.
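
A sketch of the grouping pass, assuming the tags are already sorted by address and pre-filtered to a single function code and polling interval; the struct names are illustrative:

#include <stddef.h>
#include <stdint.h>

#define MAX_REGS_PER_READ 125   /* Modbus TCP limit; some devices allow less */

struct tag      { uint32_t addr; uint16_t ecount; };
struct read_req { uint32_t start; uint16_t count; };

/* Merge adjacent tags into bulk read requests. Returns the number of requests. */
static size_t group_reads(const struct tag *tags, size_t n,
                          struct read_req *out, size_t out_max)
{
    size_t nreq = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t end = tags[i].addr + tags[i].ecount;   /* one past the last register */

        if (nreq > 0 &&
            tags[i].addr <= out[nreq - 1].start + out[nreq - 1].count &&
            end - out[nreq - 1].start <= MAX_REGS_PER_READ) {
            /* Contiguous (or overlapping) with the previous request: extend it. */
            if (end > out[nreq - 1].start + out[nreq - 1].count)
                out[nreq - 1].count = (uint16_t)(end - out[nreq - 1].start);
        } else if (nreq < out_max) {
            out[nreq].start = tags[i].addr;
            out[nreq].count = tags[i].ecount;
            nreq++;
        }
        /* If 'out' is full, a real implementation would flush and continue. */
    }
    return nreq;
}

For the four tags above this yields a single request: 5 registers starting at 400100.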

IEEE 754 Float Handling: The Register Order Problem

Reading 32-bit floating-point values over Modbus is notoriously tricky because the Modbus specification doesn't define register byte ordering for multi-register values. A float spans two 16-bit registers, and different PLCs may store them in different orders:

  • Big-endian (AB CD): Register N contains the high word, N+1 the low word
  • Little-endian (CD AB): Register N contains the low word, N+1 the high word
  • Mid-endian (BA DC or DC BA): Each word's bytes are swapped

Your tag configuration should support specifying the byte order, or at least document which convention your gateway assumes. Most libraries (libmodbus, for example) provide helper functions like modbus_get_float() that assume big-endian by default — but always verify against your specific PLC.

Pro tip: When commissioning a new device, read a register where you know the expected value (e.g., a temperature setpoint showing 72.0°F on the HMI). If the gateway reads 72.0, your byte order is correct. If it reads 2.388e-38 or 1.23e+12, you have a byte-order mismatch.
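
For reference, here is a sketch of reassembling a float from two registers under the two most common word orders (recent libmodbus versions also ship explicit helpers such as modbus_get_float_abcd() for each ordering):

#include <stdint.h>
#include <string.h>

/* Registers as read from the wire: reg[0] = register N, reg[1] = register N+1. */

/* Big-endian word order (AB CD): register N holds the high word. */
static float float_from_regs_abcd(const uint16_t reg[2])
{
    uint32_t raw = ((uint32_t)reg[0] << 16) | reg[1];
    float f;
    memcpy(&f, &raw, sizeof f);   /* reinterpret the 32-bit pattern as IEEE 754 */
    return f;
}

/* Little-endian word order (CD AB): register N holds the low word. */
static float float_from_regs_cdab(const uint16_t reg[2])
{
    uint32_t raw = ((uint32_t)reg[1] << 16) | reg[0];
    float f;
    memcpy(&f, &raw, sizeof f);
    return f;
}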

Binary vs. JSON Telemetry Encoding

Once you've collected your tag values, you need to transmit them. Your configuration should support both JSON and binary encoding, with the choice driven by bandwidth constraints:

JSON encoding is human-readable and debuggable:

{
  "groups": [{
    "ts": 1709500800,
    "device_type": 1010,
    "serial_number": 85432,
    "values": [
      { "id": 42, "values": [72.3] },
      { "id": 43, "values": [true] }
    ]
  }]
}

Binary encoding is 3-5× smaller. A typical binary frame packs:

  • 1-byte header marker
  • 4-byte group count
  • Per group: 4-byte timestamp, 2-byte device type, 4-byte serial number, 4-byte value count
  • Per value: 2-byte tag ID, 1-byte status, 1-byte value count, 1-byte value size, then raw value bytes

A batch that's 2,000 bytes in JSON might be 400 bytes in binary. Over a cellular connection billed per megabyte, that savings compounds fast.
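
To make the layout concrete, here is a sketch that packs a single group holding one float value into that frame. The header marker value and the little-endian field order are assumptions for illustration; a real wire format would pin both down explicitly:

#include <stdint.h>
#include <string.h>

#define FRAME_MARKER 0xA5   /* illustrative header marker */

/* Append a little-endian integer of 'len' bytes to the buffer. */
static size_t put_uint(uint8_t *buf, size_t pos, uint64_t v, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[pos + i] = (uint8_t)(v >> (8 * i));
    return pos + len;
}

/* Pack one group with one float value; buf must hold at least 28 bytes here.
 * Returns the frame length in bytes. */
static size_t pack_frame(uint8_t *buf, uint32_t ts, uint16_t device_type,
                         uint32_t serial, uint16_t tag_id, float value)
{
    size_t pos = 0;
    uint32_t raw;
    memcpy(&raw, &value, sizeof raw);

    buf[pos++] = FRAME_MARKER;                 /* 1-byte header marker */
    pos = put_uint(buf, pos, 1, 4);            /* 4-byte group count */

    pos = put_uint(buf, pos, ts, 4);           /* 4-byte timestamp */
    pos = put_uint(buf, pos, device_type, 2);  /* 2-byte device type */
    pos = put_uint(buf, pos, serial, 4);       /* 4-byte serial number */
    pos = put_uint(buf, pos, 1, 4);            /* 4-byte value count */

    pos = put_uint(buf, pos, tag_id, 2);       /* 2-byte tag ID */
    buf[pos++] = 0;                            /* 1-byte status (0 = OK, illustrative) */
    buf[pos++] = 1;                            /* 1-byte value count */
    buf[pos++] = 4;                            /* 1-byte value size (float = 4 bytes) */
    pos = put_uint(buf, pos, raw, 4);          /* raw value bytes */

    return pos;
}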

Putting It All Together: Configuration Lifecycle

A production deployment follows this lifecycle:

  1. Template creation: For each machine model, create a JSON tag configuration. Store it in Git.
  2. Deployment: Push configurations to edge gateways via your device management platform. The gateway monitors the config file and reloads automatically when it changes.
  3. Auto-detection: When the gateway starts, it queries the PLC for its device type (a known register). It then matches the type to the correct configuration file.
  4. Validation: At load time, validate register addresses (no duplicates, valid ranges), data types, and interval values. Reject invalid configs before they cause runtime errors.
  5. Runtime: The gateway polls tags according to their configured intervals, applies change detection, groups contiguous registers, and batches values for transmission.

How machineCDN Handles Tag Configuration

machineCDN's edge gateway uses this exact pattern — JSON-based device templates that are automatically selected based on PLC auto-detection. Each machine type in a plastics manufacturing facility (blenders, dryers, granulators, chillers, TCUs) has its own configuration template with pre-mapped tags, optimized polling intervals, and calculated alarm decomposition.

When a new machine is connected, the gateway detects the PLC type, loads the matching template, and starts collecting data — typically in under 30 seconds with zero manual configuration. For plants running 20+ machines across 5 different models, this eliminates weeks of commissioning time.

Common Pitfalls

1. Overlapping addresses. Two tags pointing to the same register with different IDs will cause confusion in your data pipeline. Validate for uniqueness at load time.

2. Wrong element count for floats. A 32-bit float on Modbus requires ecount: 2 (two 16-bit registers). Setting ecount: 1 gives you garbage data.

3. Polling too fast on serial links. Modbus RTU over RS-485 at 9600 baud can handle roughly 10-15 register reads per second. If you configure 50 tags at 1-second intervals, you'll never keep up. Budget your polling rate against your link speed.

4. Missing change detection on high-volume tags. Without compare: true, every reading gets transmitted. For a tag polled every second, that's 86,400 data points per day — even if the value never changed.

5. Batch timeout too long. If your batch timeout is 60 seconds but an alarm fires, it won't reach the cloud for up to a minute unless that alarm tag has do_not_batch: true.

Conclusion

JSON-based tag configuration isn't just a nice-to-have — it's a fundamental enabler for scaling IIoT deployments. It brings software engineering best practices (version control, templating, validation, automation) to a domain that has traditionally relied on manual, vendor-specific tooling.

The key design principles are:

  • One file per device type with version tracking
  • Rich tag metadata covering data types, intervals, and delivery modes
  • Hierarchical relationships for calculated and dependent tags
  • Protocol-aware addressing that encodes function codes implicitly
  • Contiguous register grouping for optimal Modbus performance

Get this foundation right, and you'll spend your time analyzing machine data instead of debugging data collection.