Data Normalization in IIoT: Handling Register Formats, Byte Ordering, and Scaling Factors [2026]
Every IIoT engineer eventually faces the same rude awakening: you've got a perfectly good Modbus connection to a PLC, registers are responding, data is flowing — and every single value is wrong.
Not "connection refused" wrong. Not "timeout" wrong. The insidious kind of wrong where a temperature reading of 23.5°C shows up as 16,828 (the upper half of its float encoding, read as an integer), or a pressure value oscillates between astronomical numbers and zero for no apparent reason.
Welcome to the data normalization problem — the unsexy, unglamorous, absolutely critical layer between raw industrial registers and usable engineering data. Get it wrong, and your entire IIoT platform is built on garbage.
Why Raw Register Values Aren't What You Think
Industrial PLCs don't store data the way modern applications expect. There's no JSON, no typed fields, no self-describing schema. A Modbus register is 16 bits. That's it. What those 16 bits mean depends entirely on context that exists nowhere in the protocol itself.
The Type Problem
Consider a single 16-bit Modbus holding register at address 40001 containing the value 0x4248. What is this?
- Unsigned integer (uint16): 16,968
- Signed integer (int16): 16,968 (identical to uint16 here, because the sign bit is clear)
- BCD-encoded: 4248 (literally the digits)
- Half of a 32-bit float: meaningless without the adjacent register
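To make the ambiguity concrete, here is a small Python sketch that decodes the same 16-bit word three ways. The helper names are illustrative, not from any Modbus library; `struct` is Python's standard binary-packing module.

```python
import struct

def as_uint16(word):
    # Modbus registers arrive as unsigned 16-bit values
    return word & 0xFFFF

def as_int16(word):
    # Reinterpret the same 16 bits as a two's-complement signed value
    return struct.unpack('>h', struct.pack('>H', word & 0xFFFF))[0]

def as_bcd(word):
    # Each 4-bit nibble holds one decimal digit (0-9)
    digits = 0
    for shift in (12, 8, 4, 0):
        nibble = (word >> shift) & 0xF
        if nibble > 9:
            raise ValueError(f"0x{word:04X} is not valid BCD")
        digits = digits * 10 + nibble
    return digits

word = 0x4248
print(as_uint16(word), as_int16(word), as_bcd(word))  # 16968 16968 4248
print(as_int16(0xFF9C))  # -100: the sign bit is set, so the readings diverge
```

The last line shows why "signed or unsigned" is not an academic question: once the top bit is set, the two interpretations differ by 65,536.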
The Modbus protocol gives you zero indication which interpretation is correct. That information lives in the PLC programmer's documentation — if you're lucky enough to have it.
In practice, industrial data acquisition systems need to handle at least these data types from 16-bit register values:
| Type | Size | Register Count | Value Range |
|---|---|---|---|
| Boolean | 1 bit | 1 coil/discrete | 0 or 1 |
| INT8 / UINT8 | 8 bits | 1 register (masked) | -128..127 / 0..255 |
| INT16 / UINT16 | 16 bits | 1 register | -32,768..32,767 / 0..65,535 |
| INT32 / UINT32 | 32 bits | 2 registers | ±2.1 billion / 0..4.2 billion |
| FLOAT32 (IEEE 754) | 32 bits | 2 registers | ±3.4×10³⁸ |
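One way to turn this table into code is a decoder keyed by type name. This is a sketch under stated assumptions: 32-bit types use AB CD register order (high word first), 8-bit types sit in the low byte of their register, and the type names and the `decode` helper are illustrative rather than any particular library's API.

```python
import struct

# Registers consumed per type (booleans come from coils/discrete inputs)
REGISTER_COUNT = {
    'bool': 1, 'int8': 1, 'uint8': 1, 'int16': 1, 'uint16': 1,
    'int32': 2, 'uint32': 2, 'float32': 2,
}

# struct format codes for the packed big-endian bytes of each wider type
FORMATS = {'int16': '>h', 'uint16': '>H', 'int32': '>i',
           'uint32': '>I', 'float32': '>f'}

def decode(registers, dtype):
    """Decode raw 16-bit register values into a typed Python value.

    Assumes AB CD order for 32-bit types and low-byte placement for 8-bit types.
    """
    n = REGISTER_COUNT[dtype]
    if len(registers) != n:
        raise ValueError(f"{dtype} needs {n} register(s), got {len(registers)}")
    if dtype == 'bool':
        return bool(registers[0] & 0x1)
    if dtype in ('int8', 'uint8'):
        low = registers[0] & 0xFF  # mask off the high byte
        return low - 256 if dtype == 'int8' and low > 127 else low
    raw = struct.pack('>' + 'H' * n, *registers)  # big-endian within each register
    return struct.unpack(FORMATS[dtype], raw)[0]

print(decode([0x4248, 0x0000], 'float32'))  # 50.0
print(decode([0xFFFF, 0xFFFE], 'int32'))    # -2
print(decode([0x00F6], 'int8'))             # -10
```

Keeping register counts in one table lets a poller work out how many registers to request for each tag from its type alone.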
The Multi-Register Problem
When a value spans two 16-bit registers, you'd think combining them would be straightforward: take the high word and the low word, shift and OR. But which register contains the high word?
This is where byte ordering — also called endianness — makes or breaks your data pipeline.
Byte Ordering: The Four Flavors of Pain
Modbus uses big-endian byte ordering within each register (MSB first). But when values span multiple registers, there's no standard for register ordering. Different PLC manufacturers chose differently, and now we all live with the consequences.
Big-Endian (AB CD): "Modbus Standard"
The high-order register comes first. If registers 40001-40002 contain 0x4248 and 0x0000:
Register 40001: 0x42 0x48 (bytes A, B)
Register 40002: 0x00 0x00 (bytes C, D)
Combined: 0x42480000
As float: 50.0
This is the most widely assumed ordering and the natural extension of Modbus's big-endian register encoding, although the Modbus specification itself does not define how multi-register values are laid out. Many modern PLCs default to it, but defaults vary between vendors and even between product lines, so confirm the ordering per device rather than by brand.
Mid-Little-Endian (CD AB): "Swapped Registers"
The low-order register comes first:
Register 40001: 0x00 0x00 (bytes C, D)
Register 40002: 0x42 0x48 (bytes A, B)
Combined (registers swapped back): 0x42480000
As float: 50.0
Some older Siemens controllers and various Asian-manufactured PLCs use this ordering.
Mid-Big-Endian (BA DC): "Byte-Swapped"
Bytes within each register are swapped:
Register 40001: 0x48 0x42 (bytes B, A)
Register 40002: 0x00 0x00 (bytes D, C)
Combined (bytes swapped back): 0x42480000
As float: 50.0
Little-Endian (DC BA): "Everything Backwards"
Both the register order and the byte order within each register are reversed. Rare, but it exists in the wild:
Register 40001: 0x00 0x00 (bytes D, C)
Register 40002: 0x48 0x42 (bytes B, A)
Combined (fully reversed): 0x42480000
As float: 50.0
A Practical Detection Method
When commissioning a new PLC, here's how to determine byte order without documentation:
- Find a register you know the approximate value of (a room temperature sensor reading ~22°C, for example)
- Read the two registers as raw hex
- Try all four byte orderings and see which produces a sensible float value
- If none produce sensible floats, the value might be a scaled integer; try raw_value / 10 or raw_value / 100
# Quick byte-order detection script
import struct
reg1 = 0x41B8  # Register N (raw value as read from the PLC)
reg2 = 0x0000  # Register N+1
def swap_bytes(word):
    # Swap the two bytes inside one 16-bit register
    return ((word & 0xFF) << 8) | (word >> 8)
# Rebuild the four float bytes under each ordering
orders = {
    'AB_CD': struct.pack('>HH', reg1, reg2),
    'CD_AB': struct.pack('>HH', reg2, reg1),
    'BA_DC': struct.pack('>HH', swap_bytes(reg1), swap_bytes(reg2)),
    'DC_BA': struct.pack('>HH', swap_bytes(reg2), swap_bytes(reg1)),
}
# For these sample values, only AB_CD yields a plausible room temperature (23.0)
for name, raw in orders.items():
    val = struct.unpack('>f', raw)[0]
    print(f"{name}: {val:.4f}")
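The script above prints all four candidates and leaves the judgment to a human. When commissioning many tags, it can help to filter candidates against an expected engineering range instead. This is a sketch with a hypothetical `plausible_orders` helper; the range is something you supply from process knowledge, since nothing in the registers can tell you what is sensible.

```python
import math
import struct

def swap_bytes(word):
    # Swap the two bytes inside one 16-bit register
    return ((word & 0xFF) << 8) | (word >> 8)

def candidate_floats(reg1, reg2):
    # Decode the register pair under each of the four orderings
    packed = {
        'AB_CD': struct.pack('>HH', reg1, reg2),
        'CD_AB': struct.pack('>HH', reg2, reg1),
        'BA_DC': struct.pack('>HH', swap_bytes(reg1), swap_bytes(reg2)),
        'DC_BA': struct.pack('>HH', swap_bytes(reg2), swap_bytes(reg1)),
    }
    return {name: struct.unpack('>f', raw)[0] for name, raw in packed.items()}

def plausible_orders(reg1, reg2, low, high):
    # Keep only orderings whose value is finite and inside [low, high]
    return [name for name, val in candidate_floats(reg1, reg2).items()
            if math.isfinite(val) and low <= val <= high]

# Room-temperature sensor expected somewhere between 10 and 40 °C
print(plausible_orders(0x41B8, 0x0000, 10.0, 40.0))  # ['AB_CD']
```

Prefer a range that excludes zero where you can: wrong orderings often decode to exact zeros or tiny denormals, which would slip through a range starting at 0.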
IEEE 754 Floating Point: The Industrial Minefield
Most process values — temperatures, pressures, flow rates — are stored as IEEE 754 single-precision floats across two Modbus registers. The conversion itself is well-defined, but the edge cases will ruin your day.
The NaN and Infinity Problem
IEEE 754 reserves certain bit patterns for special values:
- NaN (Not a Number): Exponent = 0xFF, mantissa ≠ 0
- +Infinity: 0x7F800000
- -Infinity: 0xFF800000
- Denormalized numbers: Exponent = 0, mantissa ≠ 0
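These categories can be detected on the raw 32-bit pattern before any float conversion, by splitting out the sign, exponent, and mantissa fields. A minimal sketch, with an illustrative `classify_float32` name:

```python
def classify_float32(bits):
    """Classify a raw 32-bit IEEE 754 single-precision bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 fraction bits
    if exponent == 0xFF:
        if mantissa:
            return 'nan'
        return '-inf' if sign else '+inf'
    if exponent == 0:
        return 'denormal' if mantissa else 'zero'
    return 'normal'

for pattern in (0x42480000, 0x7F800000, 0xFF800000, 0x7FC00000, 0x7FFFFFFF, 0x00000001):
    print(f"0x{pattern:08X}: {classify_float32(pattern)}")
```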
When a PLC sensor faults, some manufacturers write NaN to the register. Others write 0x7FFFFFFF (a NaN bit pattern when decoded as a float, or INT32_MAX when decoded as an integer). Others write -999.9. And some just freeze the last good value indefinitely.
Your data normalization layer must check for these cases:
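import math
import struct
def safe_float_convert(raw_bytes):
    # Decode 4 bytes already arranged in AB CD (big-endian) order
    value = struct.unpack('>f', raw_bytes)[0]
    # NaN or +/-Infinity: the PLC is flagging a fault, not reporting a value
    if math.isnan(value) or math.isinf(value):
        return None, "SENSOR_FAULT"
    # Sanity bounds for typical process values
    if abs(value) > 1e10:
        return None, "OUT_OF_RANGE"
    return value, "OK"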
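The function above catches NaN, Infinity, and gross magnitudes, but not vendor sentinels such as -999.9. One way to handle them, sketched here with a hypothetical per-tag `sentinels` tuple, is to compare against each sentinel after rounding it to float32 precision: a register can never hold -999.9 exactly, so a naive `==` against the Python literal would never match.

```python
import math
import struct

def to_float32(x):
    # Round a Python float to the nearest IEEE 754 single-precision value
    return struct.unpack('>f', struct.pack('>f', x))[0]

def validate_reading(raw_bytes, sentinels=(), limit=1e10):
    """Return (value, status) for a 4-byte AB CD float, flagging sentinels."""
    value = struct.unpack('>f', raw_bytes)[0]
    if math.isnan(value) or math.isinf(value):
        return None, "SENSOR_FAULT"
    # Vendor fault markers, compared at float32 precision
    if any(value == to_float32(s) for s in sentinels):
        return None, "SENSOR_FAULT"
    if abs(value) > limit:
        return None, "OUT_OF_RANGE"
    return value, "OK"

fault = struct.pack('>f', -999.9)
print(validate_reading(fault, sentinels=(-999.9,)))                    # (None, 'SENSOR_FAULT')
print(validate_reading(struct.pack('>f', 23.5), sentinels=(-999.9,)))  # (23.5, 'OK')
```

A frozen last-good value cannot be caught this way, because it is a perfectly valid float; that case needs the cross-checks discussed under the pitfalls.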
The libmodbus modbus_get_float() Trap
If you're using libmodbus (the de facto C library for Modbus), be aware that modbus_get_float() uses a specific byte ordering that may not match your PLC. The library provides:
- modbus_get_float_abcd(): big-endian (AB CD)
- modbus_get_float_dcba(): little-endian (DC BA)
- modbus_get_float_badc(): mid-big-endian (BA DC)
- modbus_get_float_cdab(): mid-little-endian (CD AB)
The default modbus_get_float() historically used CDAB ordering, which may not be what your PLC outputs. Because a wrong ordering still decodes to a valid-looking float, this mismatch is an easy way to corrupt data silently. Always use the explicit variant that matches your PLC.
Modbus Function Codes and Register Addressing
Understanding which Modbus function code to use isn't just a protocol detail — it determines what kind of data you're reading and how to interpret it.
The Four Register Spaces
| Address Range | Function Code | Type | Access |
|---|---|---|---|
| 00001–09999 | FC 01 (Read Coils) | Discrete outputs | Read/Write |
| 10001–19999 | FC 02 (Read Discrete Inputs) | Discrete inputs | Read Only |
| 30001–39999 | FC 04 (Read Input Registers) | 16-bit analog inputs | Read Only |
| 40001–49999 | FC 03 (Read Holding Registers) | 16-bit general purpose | Read/Write |
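The leading digit of the address (0xxxx, 1xxxx, 3xxxx, 4xxxx) selects the register space, and therefore the function code. The prefix itself is never sent on the wire: in the classic five-digit convention used in the table, numbering starts at 1, so 40001 is holding register offset 0. The examples below use a six-digit form in which the digits after the prefix are the zero-based register offset: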
- Address 300800 → Input Register 800 → FC 04
- Address 400500 → Holding Register 500 → FC 03
- Address 100005 → Discrete Input 5 → FC 02
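The mapping in these examples fits in a small lookup. This sketch assumes the six-digit, zero-based convention used above (prefix × 100,000 + register offset); tools that use the classic five-digit, one-based form need a different split, and `split_address` is an illustrative name.

```python
# Leading digit of the address -> Modbus read function code
FUNCTION_CODE_BY_PREFIX = {
    0: 1,  # coils -> FC 01 Read Coils
    1: 2,  # discrete inputs -> FC 02 Read Discrete Inputs
    3: 4,  # input registers -> FC 04 Read Input Registers
    4: 3,  # holding registers -> FC 03 Read Holding Registers
}

def split_address(address):
    """Split a six-digit tag address into (function_code, register_offset)."""
    prefix, offset = divmod(address, 100_000)
    if prefix not in FUNCTION_CODE_BY_PREFIX:
        raise ValueError(f"address {address} has no readable register space")
    return FUNCTION_CODE_BY_PREFIX[prefix], offset

print(split_address(300800))  # (4, 800)
print(split_address(400500))  # (3, 500)
print(split_address(100005))  # (2, 5)
```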
Getting the function code wrong doesn't always cause an error — some PLCs silently return data from the wrong register space, giving you valid-looking but completely incorrect values.
Optimizing Register Reads
Reading registers one at a time is expensive. A single Modbus TCP transaction takes ~10ms round-trip on a local network, and ~100ms over cellular. If you have 200 tags, that's 2-20 seconds per poll cycle.
The key optimization: batch contiguous registers into single reads. Modbus allows reading up to 125 registers (FC 03/04) or 2,000 coils (FC 01/02) in a single request.
The algorithm:
- Sort tags by register address
- Group contiguous addresses that share the same function code
- Limit each batch to ~50 registers, a conservative size well under the protocol's 125-register maximum (some controllers reject or mishandle larger requests)
- Insert 50ms delays between batches to avoid overwhelming the PLC
Tags sorted by address:
[addr=0, FC1], [addr=1, FC1], [addr=2, FC1] → batch 1 (3 coils)
[addr=100000, FC2], [addr=100001, FC2] → batch 2 (2 discrete)
[addr=300000, FC4], [addr=300001, FC4] → batch 3 (2 input regs)
[addr=400500, FC3], [addr=400520, FC3] → two separate batches (gap)
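The grouping steps can be sketched as follows. Assumptions: tag addresses follow the six-digit prefix scheme from the example, each tag declares how many registers (or coils) it occupies, and only strictly contiguous tags are merged. The `Tag` and `Batch` types are illustrative.

```python
from dataclasses import dataclass, field

FC_BY_PREFIX = {0: 1, 1: 2, 3: 4, 4: 3}  # leading digit -> read function code

@dataclass
class Tag:
    name: str
    address: int     # prefix * 100000 + offset, e.g. 400500
    count: int = 1   # registers (or coils) the tag spans; 2 for int32/float

@dataclass
class Batch:
    function_code: int
    start: int       # first register offset in the request
    length: int      # number of registers (or coils) to read
    tags: list = field(default_factory=list)

def build_batches(tags, max_length=50):
    """Group sorted, contiguous, same-function-code tags into read requests."""
    batches = []
    for tag in sorted(tags, key=lambda t: t.address):
        fc = FC_BY_PREFIX[tag.address // 100_000]
        offset = tag.address % 100_000
        last = batches[-1] if batches else None
        if (last is not None and last.function_code == fc
                and last.start + last.length == offset
                and last.length + tag.count <= max_length):
            last.length += tag.count   # extend the current request
            last.tags.append(tag.name)
        else:
            batches.append(Batch(fc, offset, tag.count, [tag.name]))
    return batches

example = [Tag(f"t{a}", a) for a in (0, 1, 2, 100000, 100001, 300000, 300001, 400500, 400520)]
for b in build_batches(example):
    print(b.function_code, b.start, b.length)
```

For the example tags this produces five requests, matching the grouping shown above. A poll loop would issue one FC 01/02/03/04 request per batch, with the ~50 ms pause between requests.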
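With 50-register batches, 200 densely packed single-register tags need 4 requests instead of 200, so batching can reduce poll times by roughly 10-50x, depending on how contiguous the tag addresses are.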
Scaling Factors and Engineering Units
Raw register values rarely represent engineering units directly. Common scaling patterns:
Fixed-Point Scaling
Many PLCs store fractional values as integers with an implied decimal point:
- Temperature: raw / 10 → 235 means 23.5°C
- Pressure: raw / 100 → 14500 means 145.00 PSI
- Percentage: raw / 10 → 997 means 99.7%
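Fixed-point values interact with the type problem from earlier: a negative temperature stored as `raw / 10` arrives as a large unsigned number unless it is reinterpreted as int16 first. A minimal sketch, assuming the divisor and signedness are known per tag (`fixed_point` is an illustrative name):

```python
def fixed_point(raw, divisor, signed=True):
    """Convert a 16-bit fixed-point register to engineering units."""
    raw &= 0xFFFF
    if signed and raw >= 0x8000:
        raw -= 0x10000        # two's-complement int16 reinterpretation
    return raw / divisor

print(fixed_point(235, 10))     # 23.5 (°C)
print(fixed_point(14500, 100))  # 145.0 (PSI)
print(fixed_point(0xFF9C, 10))  # -10.0 (°C); would read 6543.6 if treated as unsigned
```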
Linear Scaling (k1/k2 factors)
Some values require multiplicative scaling: engineering_value = raw * k1 / k2
This is common with analog sensors where the PLC stores raw ADC counts:
- 4-20mA sensor spanning 0-100 PSI
- Raw range: 0-4095 (12-bit ADC)
- Engineering value: raw * 100 / 4095 (k1 = 100, k2 = 4095)
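The k1/k2 form is purely multiplicative, so it only works when raw zero corresponds to engineering zero, which holds here if the analog card maps 4 mA to count 0. If the card instead digitizes the full 0-20 mA range over 0-4095, 4 mA lands at about 819 counts and an offset term is needed. Both cases are sketched below; the function names are illustrative.

```python
def k1_k2(raw, k1, k2):
    # engineering_value = raw * k1 / k2 (no offset term)
    return raw * k1 / k2

def span_scale(raw, raw_min, raw_max, eng_min, eng_max):
    # Linear map from [raw_min, raw_max] onto [eng_min, eng_max], offset included
    return eng_min + (raw - raw_min) * (eng_max - eng_min) / (raw_max - raw_min)

# 0-100 PSI sensor, card maps 4 mA -> 0 counts and 20 mA -> 4095 counts
print(k1_k2(4095, 100, 4095))  # 100.0
print(k1_k2(2048, 100, 4095))  # ~50.01

# Same sensor on a card that digitizes 0-20 mA, so 4 mA -> 819 counts
print(span_scale(819, 819, 4095, 0.0, 100.0))   # 0.0
print(span_scale(4095, 819, 4095, 0.0, 100.0))  # 100.0
```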
Bit Extraction from Word Registers
Often, a single 16-bit register encodes multiple boolean values as individual bits. To extract a specific alarm or status:
bit_value = (register_value >> shift_count) & mask
For example, extracting alarm bit 5 from a status word:
status_register = 0x0024 (binary: 0000 0000 0010 0100)
alarm_bit_5 = (0x0024 >> 5) & 0x01 → 1 (alarm active)
alarm_bit_3 = (0x0024 >> 3) & 0x01 → 0 (no alarm)
This is extremely common in industrial chillers, dryers, and blenders where a single status register contains 16 independent alarm flags.
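Decoding a whole status word is just the same shift-and-mask step in a loop. The alarm names below are hypothetical placeholders; real bit assignments come from the equipment vendor's register map.

```python
def extract_bit(register_value, shift_count, mask=0x01):
    # bit_value = (register_value >> shift_count) & mask
    return (register_value >> shift_count) & mask

def active_alarms(status_word, bit_names):
    """Return the names of every set bit in a 16-bit status word."""
    return [bit_names.get(bit, f"bit_{bit}")
            for bit in range(16)
            if extract_bit(status_word, bit)]

# Hypothetical bit map for a chiller status register
CHILLER_BITS = {2: "high_discharge_pressure", 5: "low_refrigerant"}

print(extract_bit(0x0024, 5))               # 1
print(extract_bit(0x0024, 3))               # 0
print(active_alarms(0x0024, CHILLER_BITS))  # ['high_discharge_pressure', 'low_refrigerant']
```

The `mask` parameter also covers multi-bit fields: a mask of 0x03 reads a two-bit mode value instead of a single flag.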
Change Detection and Delivery Strategies
Not all data changes are equal. A temperature that shifts by 0.1°C doesn't need the same urgency as a compressor fault alarm.
Compare-Before-Send
For slowly-changing analog values, compare the new reading against the previous:
- Skip identical values: If temperature is still 23.5°C, don't transmit
- Deadband filtering: Only transmit if |new - old| > threshold
- Periodic forced reads: Even unchanged values should be sent every N minutes (typically 60) to confirm the sensor is still alive
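These three rules combine into a single decision per reading. This sketch assumes a per-tag deadband and a forced-send interval (defaulting to 60 minutes); timestamps are passed in so the logic is easy to test, and the `ChangeFilter` name is illustrative.

```python
class ChangeFilter:
    """Decide whether a new reading should be transmitted."""

    def __init__(self, deadband=0.0, force_interval_s=3600.0):
        self.deadband = deadband                  # 0.0 means "skip only identical values"
        self.force_interval_s = force_interval_s  # periodic forced send
        self.last_sent_value = None
        self.last_sent_time = None

    def should_send(self, value, now):
        first = self.last_sent_value is None
        changed = first or abs(value - self.last_sent_value) > self.deadband
        overdue = (not first) and now - self.last_sent_time >= self.force_interval_s
        if changed or overdue:
            self.last_sent_value = value
            self.last_sent_time = now
            return True
        return False

f = ChangeFilter(deadband=0.2)
readings = [(23.5, 0), (23.5, 60), (23.6, 120), (24.0, 180), (24.0, 3780)]
print([f.should_send(v, t) for v, t in readings])  # [True, False, False, True, True]
```

The deadband is measured against the last *sent* value, not the last polled one, so slow drift still triggers a send once it accumulates past the threshold.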
Immediate vs. Batched Delivery
Critical values (alarms, run/stop status, link state) should be delivered immediately — bypassing any batch collection. Process values (temperatures, pressures, flow rates) can be batched into periodic payloads.
A typical delivery strategy:
| Data Type | Interval | Batched | Compare |
|---|---|---|---|
| Alarm flags | 1s poll | No (immediate) | Yes (change only) |
| Run/stop status | 1s poll | No (immediate) | Yes (change only) |
| Temperature | 60s poll | Yes | No (always send) |
| Energy consumption | 60s poll | Yes | Yes (change only) |
| Serial number | 3600s poll | Yes | Yes (change only) |
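The table translates into a per-tag policy. Below is one hypothetical way to route readings: a reading that survives the comparison step either goes out immediately or joins a buffer that is flushed on the next batch cycle. The `Policy` fields mirror the table's columns; nothing here reflects a specific product's configuration format.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    interval_s: int   # poll interval in seconds
    batched: bool     # False -> deliver immediately
    compare: bool     # True -> send only when the value changes

POLICIES = {
    "alarm_flags": Policy(1, batched=False, compare=True),
    "run_status": Policy(1, batched=False, compare=True),
    "temperature": Policy(60, batched=True, compare=False),
    "energy": Policy(60, batched=True, compare=True),
    "serial_number": Policy(3600, batched=True, compare=True),
}

def route(tag, value, last_values, batch_buffer):
    """Return 'immediate', 'batched', or 'skipped' for one reading."""
    policy = POLICIES[tag]
    if policy.compare and last_values.get(tag) == value:
        return "skipped"
    last_values[tag] = value
    if not policy.batched:
        return "immediate"        # caller publishes right away
    batch_buffer.append((tag, value))
    return "batched"

last, buffer = {}, []
print(route("alarm_flags", 0x0024, last, buffer))  # immediate
print(route("alarm_flags", 0x0024, last, buffer))  # skipped
print(route("temperature", 23.5, last, buffer))    # batched
print(route("temperature", 23.5, last, buffer))    # batched
```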
Common Pitfalls and How to Avoid Them
1. Silent Byte-Order Mismatch
Symptom: Values look "almost right" but fluctuate randomly.
Cause: Wrong byte ordering makes the exponent bits of a float mix with mantissa bits.
Fix: Read a known value (setpoint) and try all four byte orderings.
2. The 50ms Modbus Timing Rule
Symptom: Intermittent read errors, especially under load.
Cause: Many PLCs need ~50ms between consecutive Modbus requests.
Fix: Insert usleep(50000) between batch reads. Some RTU devices need even longer (100-200ms).
3. Connection Reset After Timeout
Symptom: ETIMEDOUT, then all subsequent reads fail.
Cause: After a timeout, the TCP connection is in an indeterminate state; a late reply to the timed-out request can still arrive and be misread as the response to the next request.
Fix: On ETIMEDOUT, ECONNRESET, ECONNREFUSED, EPIPE, or EBADF, close and reconnect immediately. Flush the Modbus buffers before retrying.
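A minimal reconnect-on-error wrapper, assuming a hypothetical client object with `connect()`, `close()`, and `flush()` methods and a read callable that raises `OSError` carrying an `errno`. Real Modbus libraries report errors differently, so treat this as the shape of the logic rather than a drop-in.

```python
import errno

# Errors after which the connection state can no longer be trusted
RECONNECT_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.ECONNREFUSED,
                    errno.EPIPE, errno.EBADF}

def read_with_reconnect(client, read, retries=1):
    """Run read(client); on a connection-level error, reset the link and retry."""
    for attempt in range(retries + 1):
        try:
            return read(client)
        except OSError as exc:
            if exc.errno not in RECONNECT_ERRNOS or attempt == retries:
                raise
            client.close()    # drop the indeterminate connection
            client.connect()  # open a fresh one
            client.flush()    # discard any stale bytes before retrying

class FakeClient:
    # Hypothetical stand-in for a real Modbus client
    def __init__(self):
        self.calls = 0
        self.reconnects = 0

    def connect(self):
        self.reconnects += 1

    def close(self):
        pass

    def flush(self):
        pass

def flaky_read(client):
    # Fails once with ETIMEDOUT, then succeeds
    client.calls += 1
    if client.calls == 1:
        raise OSError(errno.ETIMEDOUT, "timed out")
    return [0x4248, 0x0000]

c = FakeClient()
print(read_with_reconnect(c, flaky_read), c.reconnects)  # [16968, 0] 1
```

Errors outside the reconnect set (a permissions error, say) are re-raised untouched, since tearing down a healthy connection would not fix them.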
4. Stale Data After Sensor Failure
Symptom: Temperature "frozen" at its last good value for hours.
Cause: The PLC continues returning the last valid reading even after the sensor disconnects.
Fix: Monitor read status codes. Cross-reference with the PLC's own sensor-failure alarm bits. If a sensor failure alarm is active, mark the associated temperature value as invalid regardless of what the register says.
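The cross-check can be sketched as a small validity rule. Assumptions: a hypothetical `FAILURE_BITS` mapping from each temperature tag to the status word and bit that signal its sensor failure, plus an optional "frozen value" heuristic that flags a reading repeated for a configurable number of polls. Neither the mapping nor the threshold comes from any standard; both depend on the equipment.

```python
from collections import defaultdict

# Hypothetical: tag -> (status-word tag, bit number of its sensor-failure alarm)
FAILURE_BITS = {"supply_temp": ("status_word_1", 4)}

class ValidityChecker:
    def __init__(self, max_unchanged_polls=None):
        # Repeats allowed before a value is flagged as possibly frozen (None disables)
        self.max_unchanged = max_unchanged_polls
        self.last = {}
        self.unchanged = defaultdict(int)

    def check(self, tag, value, registers):
        """Return (value or None, status) for one reading."""
        status_tag, bit = FAILURE_BITS.get(tag, (None, None))
        if status_tag is not None and (registers[status_tag] >> bit) & 1:
            return None, "SENSOR_FAILURE_ALARM"  # the alarm wins over the register
        if self.last.get(tag) == value:
            self.unchanged[tag] += 1
        else:
            self.unchanged[tag] = 0
            self.last[tag] = value
        if self.max_unchanged is not None and self.unchanged[tag] >= self.max_unchanged:
            return value, "SUSPECT_FROZEN"
        return value, "OK"

checker = ValidityChecker(max_unchanged_polls=3)
print(checker.check("supply_temp", 21.7, {"status_word_1": 0x0000}))  # (21.7, 'OK')
print(checker.check("supply_temp", 21.7, {"status_word_1": 0x0010}))  # (None, 'SENSOR_FAILURE_ALARM')
```

Apply the frozen-value heuristic only to signals that normally jitter; a setpoint is supposed to stay constant and would be flagged falsely.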
5. Hourly Data Reset
Symptom: Dashboard shows periodic "spikes" at the top of every hour.
Cause: Some polling systems reset their comparison baselines hourly, forcing a full re-read and re-delivery of all tags.
Fix: This is actually intentional in well-designed systems: it ensures that even unchanged values are periodically confirmed. But your visualization layer needs to expect it and not treat it as real data changes.
How machineCDN Handles Data Normalization
machineCDN's edge infrastructure handles the normalization layer at the gateway level — before data ever reaches the cloud. The gateway daemon supports both EtherNet/IP and Modbus (TCP and RTU), with per-device configuration that specifies:
- Exact data type for every tag (bool, int8, uint8, int16, uint16, int32, uint32, float)
- Register addresses and element counts
- Poll intervals per tag (from 1 second to 1 hour)
- Change detection with compare-before-send
- Immediate delivery for critical alarms vs. batched delivery for process values
- Configurable scaling factors (k1/k2 linear transforms)
- Bit extraction from word registers with shift and mask parameters
By handling normalization at the edge, machineCDN ensures that only clean, typed, engineering-unit data reaches the cloud platform — eliminating an entire category of data quality issues that plague centralized architectures.
Conclusion
Data normalization is the unsung hero of industrial IoT. It's not glamorous work — there are no fancy dashboards or AI models involved. But every analytics insight, every predictive maintenance model, and every automated alert depends on getting this layer right.
The key principles:
- Know your byte order before you trust any multi-register value
- Validate float conversions — check for NaN, infinity, and out-of-range values
- Batch contiguous registers to minimize poll cycle times
- Differentiate delivery urgency — alarms immediately, process data in batches
- Cross-reference sensor status — don't trust a register value if the sensor failure alarm is active
Get normalization right, and everything downstream works. Get it wrong, and you'll spend months debugging phantom issues that were really just byte-order problems from day one.