
Data Normalization in IIoT: Handling Register Formats, Byte Ordering, and Scaling Factors [2026]

11 min read
MachineCDN Team
Industrial IoT Experts

Every IIoT engineer eventually faces the same rude awakening: you've got a perfectly good Modbus connection to a PLC, registers are responding, data is flowing — and every single value is wrong.

Not "connection refused" wrong. Not "timeout" wrong. The insidious kind of wrong where a temperature reading of 23.5°C shows up as 17,219, or a pressure value oscillates between astronomical numbers and zero for no apparent reason.

Welcome to the data normalization problem — the unsexy, unglamorous, absolutely critical layer between raw industrial registers and usable engineering data. Get it wrong, and your entire IIoT platform is built on garbage.

Why Raw Register Values Aren't What You Think

Industrial PLCs don't store data the way modern applications expect. There's no JSON, no typed fields, no self-describing schema. A Modbus register is 16 bits. That's it. What those 16 bits mean depends entirely on context that exists nowhere in the protocol itself.

The Type Problem

Consider a single 16-bit Modbus holding register at address 40001 containing the value 0x4248. What is this?

  • Unsigned integer (uint16): 16,968
  • Signed integer (int16): 16,968 (identical here because the sign bit is clear)
  • BCD-encoded: 4248 (literally the digits)
  • Half of a 32-bit float: meaningless without the adjacent register

The Modbus protocol gives you zero indication which interpretation is correct. That information lives in the PLC programmer's documentation — if you're lucky enough to have it.
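To make the ambiguity concrete, here is a small sketch that decodes one raw register each of the single-register ways listed above. `interpretations` is an illustrative helper, not part of any Modbus library. A second value with the sign bit set shows where uint16 and int16 diverge and where BCD decoding fails outright.

```python
def interpretations(reg: int) -> dict:
    """Decode one raw 16-bit Modbus register several ways (illustrative only)."""
    if not 0 <= reg <= 0xFFFF:
        raise ValueError("a Modbus register holds exactly 16 bits")
    as_uint16 = reg
    # Two's complement: a set top bit means the value is negative.
    as_int16 = reg - 0x10000 if reg & 0x8000 else reg
    # BCD: each 4-bit nibble is one decimal digit; nibbles above 9 are invalid.
    nibbles = [(reg >> shift) & 0xF for shift in (12, 8, 4, 0)]
    as_bcd = None
    if all(n <= 9 for n in nibbles):
        as_bcd = int("".join(str(n) for n in nibbles))
    return {"uint16": as_uint16, "int16": as_int16, "bcd": as_bcd}

print(interpretations(0x4248))  # {'uint16': 16968, 'int16': 16968, 'bcd': 4248}
print(interpretations(0xFF9C))  # {'uint16': 65436, 'int16': -100, 'bcd': None}
```

Nothing in the register tells you which of these is right; the choice has to come from the tag configuration.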

In practice, industrial data acquisition systems need to handle at least these data types from Modbus coils and 16-bit registers:

| Type | Size | Register Count | Value Range |
|---|---|---|---|
| Boolean | 1 bit | 1 coil/discrete input | 0 or 1 |
| INT8 / UINT8 | 8 bits | 1 register (masked) | -128..127 / 0..255 |
| INT16 / UINT16 | 16 bits | 1 register | -32,768..32,767 / 0..65,535 |
| INT32 / UINT32 | 32 bits | 2 registers | ±2.1 billion / 0..4.2 billion |
| FLOAT32 (IEEE 754) | 32 bits | 2 registers | ±3.4×10³⁸ |
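As a rough sketch of how a decoder maps raw register values onto the types in this table, the Python below assumes high-word-first (AB CD) order for 32-bit types and takes 8-bit values from the low byte of the register. Both are assumptions that must match the actual PLC, and `decode` is an illustrative name rather than a library API.

```python
import struct

def decode(regs: list, dtype: str):
    """Decode raw 16-bit register values into one of the table's types.

    Assumptions: 32-bit types are high word first (AB CD); 8-bit types
    live in the low byte of a single register.
    """
    hi = regs[0]
    if dtype == "uint8":
        return hi & 0xFF                      # mask off the unused high byte
    if dtype == "int8":
        b = hi & 0xFF
        return b - 0x100 if b & 0x80 else b   # sign-extend the low byte
    if dtype == "uint16":
        return hi
    if dtype == "int16":
        return hi - 0x10000 if hi & 0x8000 else hi
    # 32-bit types: shift the high word up 16 bits and OR in the low word.
    word32 = (hi << 16) | regs[1]
    if dtype == "uint32":
        return word32
    if dtype == "int32":
        return word32 - 0x1_0000_0000 if word32 & 0x8000_0000 else word32
    if dtype == "float32":
        # Reinterpret the same 32 bits as an IEEE 754 single-precision float.
        return struct.unpack(">f", struct.pack(">I", word32))[0]
    raise ValueError(f"unsupported type: {dtype}")

print(decode([0x4248, 0x0000], "float32"))  # 50.0
print(decode([0xFFFF, 0xFFFE], "int32"))    # -2
```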

The Multi-Register Problem

When a value spans two 16-bit registers, you'd think combining them would be straightforward: take the high word and the low word, shift and OR. But which register contains the high word?

This is where byte ordering — also called endianness — makes or breaks your data pipeline.

Byte Ordering: The Four Flavors of Pain

Modbus uses big-endian byte ordering within each register (MSB first). But when values span multiple registers, there's no standard for register ordering. Different PLC manufacturers chose differently, and now we all live with the consequences.

Big-Endian (AB CD) — "Modbus Standard"

The high-order register comes first. If registers 40001-40002 contain 0x4248 and 0x0000:

Register 40001: 0x42  0x48  (bytes A, B)
Register 40002: 0x00 0x00 (bytes C, D)
Combined: 0x42480000
As float: 50.0

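This is the most common ordering and the one the Modbus spec's big-endian convention implies. Allen-Bradley, Schneider, and many modern PLCs commonly use it, but word order can still vary by model, firmware, and configuration, so confirm it for each device.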

Mid-Little-Endian (CD AB), a.k.a. "Swapped Registers"

The low-order register comes first:

Register 40001: 0x00  0x00  (bytes C, D)
Register 40002: 0x42 0x48 (bytes A, B)
Combined: 0x42480000
As float: 50.0

Some older Siemens controllers and various Asian-manufactured PLCs use this ordering.

Mid-Big-Endian (BA DC) — "Byte-Swapped"

Bytes within each register are swapped:

Register 40001: 0x48  0x42  (bytes B, A)
Register 40002: 0x00 0x00 (bytes D, C)
Combined: 0x42480000
As float: 50.0

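Little-Endian (DC BA), a.k.a. "Everything Backwards"

Both the register order and the bytes within each register are reversed. Rare, but it exists in the wild:

Register 40001: 0x00 0x00 (bytes D, C)
Register 40002: 0x48 0x42 (bytes B, A)
Combined: 0x42480000
As float: 50.0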
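Once the order is known, a decoder only needs to put the four bytes back into A B C D order. A minimal sketch, assuming the order is supplied as a string naming which byte sits in each position as read; `regs_to_float` is an illustrative helper. Fed the four register layouts above, it returns 50.0 every time.

```python
import struct

def regs_to_float(reg_n: int, reg_n1: int, order: str) -> float:
    """Rebuild a float32 from registers N and N+1 in the given byte order.

    `order` lists where bytes A (most significant) .. D (least significant)
    sit as received: "ABCD", "CDAB", "BADC" or "DCBA".
    """
    if sorted(order) != list("ABCD"):
        raise ValueError(f"not a permutation of ABCD: {order}")
    # Bytes exactly as they arrived: register N first, MSB first within each.
    raw = struct.pack(">HH", reg_n, reg_n1)
    # Which received position holds each lettered byte.
    pos = {letter: i for i, letter in enumerate(order)}
    # Reassemble in canonical A, B, C, D order and decode as big-endian.
    canonical = bytes(raw[pos[letter]] for letter in "ABCD")
    return struct.unpack(">f", canonical)[0]

examples = {
    "ABCD": (0x4248, 0x0000),
    "CDAB": (0x0000, 0x4248),
    "BADC": (0x4842, 0x0000),
    "DCBA": (0x0000, 0x4842),
}
for order, (r1, r2) in examples.items():
    print(order, regs_to_float(r1, r2, order))  # each line ends in 50.0
```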

A Practical Detection Method

When commissioning a new PLC, here's how to determine byte order without documentation:

  1. Find a register you know the approximate value of (a room temperature sensor reading ~22°C, for example)
  2. Read the two registers as raw hex
  3. Try all four byte orderings and see which produces a sensible float value
  4. If none produce sensible floats, the value might be a scaled integer — try raw_value / 10 or raw_value / 100
```python
# Quick byte-order detection script
import struct

reg1 = 0x41B8  # Register N
reg2 = 0x0000  # Register N+1

# Try all four orderings: each entry rebuilds the bytes in A B C D order
orders = {
    'AB_CD': struct.pack('>HH', reg1, reg2),
    'CD_AB': struct.pack('>HH', reg2, reg1),
    'BA_DC': struct.pack('>HH',
                         ((reg1 & 0xFF) << 8) | (reg1 >> 8),
                         ((reg2 & 0xFF) << 8) | (reg2 >> 8)),
    'DC_BA': struct.pack('>HH',
                         ((reg2 & 0xFF) << 8) | (reg2 >> 8),
                         ((reg1 & 0xFF) << 8) | (reg1 >> 8)),
}

for name, raw in orders.items():
    val = struct.unpack('>f', raw)[0]
    print(f"{name}: {val:.4f}")
```

IEEE 754 Floating Point: The Industrial Minefield

Most process values — temperatures, pressures, flow rates — are stored as IEEE 754 single-precision floats across two Modbus registers. The conversion itself is well-defined, but the edge cases will ruin your day.

The NaN and Infinity Problem

IEEE 754 reserves certain bit patterns for special values:

  • NaN (Not a Number): Exponent = 0xFF, mantissa ≠ 0
  • +Infinity: 0x7F800000
  • -Infinity: 0xFF800000
  • Denormalized numbers: Exponent = 0, mantissa ≠ 0
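These patterns can be recognized directly from the raw 32-bit word, before any float conversion. A minimal sketch, assuming the two registers have already been combined into one integer in A B C D order; `classify_float32_bits` is an illustrative name.

```python
def classify_float32_bits(bits: int) -> str:
    """Classify a raw 32-bit word using the IEEE 754 single-precision rules."""
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 fraction bits
    if exponent == 0xFF:
        if mantissa:
            return "nan"
        return "-inf" if bits & 0x8000_0000 else "+inf"
    if exponent == 0:
        return "denormal" if mantissa else "zero"
    return "normal"

print(classify_float32_bits(0x7F800000))  # +inf
print(classify_float32_bits(0x7FFFFFFF))  # nan
```

Note that 0x7FFFFFFF, a fault value some vendors write, already falls in the NaN range when read as a float.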

When a PLC sensor faults, some manufacturers write NaN to the register. Others write 0x7FFFFFFF. Others write -999.9. And some just freeze the last good value indefinitely.

Your data normalization layer must check for these cases:

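```python
import math
import struct

def safe_float_convert(raw_bytes):
    """Convert 4 bytes (already in A B C D order) to a float, flagging faults."""
    value = struct.unpack('>f', raw_bytes)[0]

    # NaN or +/-Infinity usually means the PLC flagged a sensor fault
    if math.isnan(value) or math.isinf(value):
        return None, "SENSOR_FAULT"

    # Sanity bounds for typical process values
    if abs(value) > 1e10:
        return None, "OUT_OF_RANGE"

    return value, "OK"
```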
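`safe_float_convert` catches NaN and infinity, but not a vendor sentinel such as -999.9. Comparing floats for equality is also fragile here: the float32 nearest to -999.9 is not exactly equal to Python's -999.9. One option, sketched below with a hypothetical per-device sentinel set, is to compare raw 32-bit patterns instead of decoded values.

```python
import struct

def float_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern nearest to x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Hypothetical sentinel list for one device, taken from its documentation.
# Stored as raw bit patterns so float rounding cannot break the comparison.
DEVICE_SENTINELS = {float_bits(-999.9), 0x7FFFFFFF}

def check_sentinel(raw_bytes: bytes, sentinels=DEVICE_SENTINELS) -> str:
    """Return "SENSOR_FAULT" if 4 raw bytes (A B C D order) match a sentinel."""
    bits = struct.unpack(">I", raw_bytes)[0]
    return "SENSOR_FAULT" if bits in sentinels else "OK"

print(check_sentinel(struct.pack(">f", -999.9)))      # SENSOR_FAULT
print(check_sentinel(bytes.fromhex("42480000")))      # OK
```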

The libmodbus modbus_get_float() Trap

If you're using libmodbus (the de facto C library for Modbus), be aware that modbus_get_float() uses a specific byte ordering that may not match your PLC. The library provides:

  • modbus_get_float_abcd() — big-endian
  • modbus_get_float_dcba() — little-endian
  • modbus_get_float_badc() — mid-big
  • modbus_get_float_cdab() — mid-little

The default modbus_get_float() historically combined the two registers in CD AB (word-swapped) order, which is not what most PLCs output. Because the result is still a valid-looking float rather than an error, this mismatch is a common source of silent data corruption in industrial IoT deployments. Always use the explicit variant that matches your PLC.
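The effect is easy to reproduce in Python without libmodbus. This sketch decodes the same two registers both ways; `combine_abcd` and `combine_cdab` are illustrative helpers, not libmodbus functions. A 50.0 written in AB CD order comes back from a word-swapped decode as a tiny denormal, which no NaN or range check will flag.

```python
import struct

def combine_abcd(src):
    """High word first: src[0] holds bytes A B, src[1] holds C D."""
    return struct.unpack(">f", struct.pack(">HH", src[0], src[1]))[0]

def combine_cdab(src):
    """Word-swapped: treat src[1] as the high word (CD AB)."""
    return struct.unpack(">f", struct.pack(">HH", src[1], src[0]))[0]

regs = [0x4248, 0x0000]    # 50.0 written by a PLC that uses AB CD
print(combine_abcd(regs))  # 50.0
print(combine_cdab(regs))  # a tiny denormal, roughly 2.4e-41
```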

Modbus Function Codes and Register Addressing

Understanding which Modbus function code to use isn't just a protocol detail — it determines what kind of data you're reading and how to interpret it.

The Four Register Spaces

| Address Range | Function Code | Type | Access |
|---|---|---|---|
| 00001–09999 | FC 01 (Read Coils) | Discrete outputs (coils) | Read/Write |
| 10001–19999 | FC 02 (Read Discrete Inputs) | Discrete inputs | Read-only |
| 30001–39999 | FC 04 (Read Input Registers) | 16-bit analog inputs | Read-only |
| 40001–49999 | FC 03 (Read Holding Registers) | 16-bit general purpose | Read/Write |

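The leading digit of the address identifies the register space and therefore the function code: 0xxxx → FC 01, 1xxxx → FC 02, 3xxxx → FC 04, 4xxxx → FC 03. The table uses the classic five-digit form; many tools also accept a six-digit form with room for more registers, as in these examples:

  • Address 300800 → Input Register 800 → FC 04
  • Address 400500 → Holding Register 500 → FC 03
  • Address 100005 → Discrete Input 5 → FC 02

Check whether your tool treats the digits after the prefix as a one-based register number or a zero-based protocol offset. In classic notation, 40001 is holding register 1, which goes out on the wire as offset 0, so mixing the two conventions shifts every read by one register.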

Getting the function code wrong doesn't always cause an error — some PLCs silently return data from the wrong register space, giving you valid-looking but completely incorrect values.
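A configuration loader can derive the function code from the address rather than storing it separately, which removes one way to get it wrong. A minimal sketch, assuming six-digit addresses whose last five digits are used directly as the offset; if your tool numbers registers from 1, subtract 1 before building the request. `parse_address` is a hypothetical helper.

```python
# Prefix digit -> Modbus function code, per the register-space table.
FUNCTION_CODES = {0: 1, 1: 2, 3: 4, 4: 3}

def parse_address(address: int) -> tuple:
    """Split a six-digit address like 400500 into (function_code, offset).

    Assumes the five digits after the prefix are the offset as-is.
    """
    prefix, offset = divmod(address, 100_000)
    if prefix not in FUNCTION_CODES:
        raise ValueError(f"no register space for prefix {prefix} in {address}")
    return FUNCTION_CODES[prefix], offset

print(parse_address(300800))  # (4, 800)
print(parse_address(400500))  # (3, 500)
```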

Optimizing Register Reads

Reading registers one at a time is expensive. A single Modbus TCP transaction takes ~10ms round-trip on a local network, and ~100ms over cellular. If you have 200 tags, that's 2-20 seconds per poll cycle.

The key optimization: batch contiguous registers into single reads. Modbus allows reading up to 125 registers (FC 03/04) or 2,000 coils (FC 01/02) in a single request.

The algorithm:

  1. Sort tags by register address
  2. Group contiguous addresses that share the same function code
  3. Limit each batch to ~50 registers, a conservative size well below the protocol maximum, since some controllers support less than the full 125
  4. Insert 50ms delays between batches to avoid overwhelming the PLC
Tags sorted by address:
[addr=0, FC1], [addr=1, FC1], [addr=2, FC1] → batch 1 (3 coils)
[addr=100000, FC2], [addr=100001, FC2] → batch 2 (2 discrete)
[addr=300000, FC4], [addr=300001, FC4] → batch 3 (2 input regs)
[addr=400500, FC3], [addr=400520, FC3] → two separate batches (gap)
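The algorithm above can be sketched in a few lines. This version assumes each tag is described as (address, function_code, register_count) using the six-digit addresses from the example, merges only exactly adjacent tags, and for simplicity applies the same 50-register cap to coils even though FC 01/02 allow far more. `read_fn` is a hypothetical stand-in for whatever client call issues the request.

```python
import time

def plan_batches(tags, max_batch=50):
    """Group (address, function_code, register_count) tags into read requests.

    Returns (function_code, start_address, count) tuples.
    """
    batches = []  # each entry: [function_code, start_address, count]
    for addr, fc, size in sorted(tags):
        if batches:
            last = batches[-1]
            same_fc = last[0] == fc
            adjacent = addr == last[1] + last[2]
            fits = last[2] + size <= max_batch
            if same_fc and adjacent and fits:
                last[2] += size  # extend the current batch
                continue
        batches.append([fc, addr, size])  # gap, new FC, or batch full
    return [tuple(b) for b in batches]

def poll(batches, read_fn, delay_s=0.05):
    """Issue each batch via read_fn(fc, start, count), pausing between requests."""
    results = []
    for i, (fc, start, count) in enumerate(batches):
        if i:
            time.sleep(delay_s)  # give the PLC breathing room between requests
        results.append(read_fn(fc, start, count))
    return results

tags = [(0, 1, 1), (1, 1, 1), (2, 1, 1), (100000, 2, 1), (100001, 2, 1),
        (300000, 4, 1), (300001, 4, 1), (400500, 3, 1), (400520, 3, 1)]
for batch in plan_batches(tags):
    print(batch)
```

For the example tags this yields five requests, matching the grouping shown above: three coils, two discrete inputs, two input registers, and two separate holding-register reads because of the gap.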

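Depending on how contiguous the tag map is, this batching strategy can reduce poll times by 10-50x. With 50-register batches, for example, 200 contiguous single-register tags need 4 requests instead of 200.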

Scaling Factors and Engineering Units

Raw register values rarely represent engineering units directly. Common scaling patterns:

Fixed-Point Scaling

Many PLCs store fractional values as integers with an implied decimal point:

  • Temperature: raw / 10 → 235 means 23.5°C
  • Pressure: raw / 100 → 14500 means 145.00 PSI
  • Percentage: raw / 10 → 997 means 99.7%
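One detail these examples hide: values that can go negative, such as freezer temperatures, are often stored as signed 16-bit integers, so the sign must be restored before dividing. A minimal sketch, assuming two's complement int16 registers and a power-of-ten divisor; `fixed_point` is an illustrative name.

```python
def fixed_point(raw: int, decimals: int, signed: bool = True) -> float:
    """Convert a 16-bit fixed-point register to engineering units.

    decimals=1 means raw / 10, decimals=2 means raw / 100, and so on.
    """
    if signed and raw & 0x8000:
        raw -= 0x10000  # two's complement, e.g. 0xFF9C -> -100
    return raw / 10 ** decimals

print(fixed_point(235, 1))     # 23.5
print(fixed_point(0xFF9C, 1))  # -10.0
```

Reading 0xFF9C as unsigned would instead report 6543.6, a plausible-looking but wrong value.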

Linear Scaling (k1/k2 factors)

Some values require multiplicative scaling: engineering_value = raw * k1 / k2

This is common with analog sensors where the PLC stores raw ADC counts:

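  • 4-20mA sensor spanning 0-100 PSI
  • Raw range: 0-4095 (12-bit ADC), assuming the input card maps 4 mA to 0 counts and 20 mA to 4095
  • Engineering value: raw * 100 / 4095

If the card instead reports a non-zero count at 4 mA, a pure k1/k2 multiply is not enough; the transform also needs an offset term.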
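The k1/k2 transform is a one-liner, while the offset case needs a span mapping. A minimal sketch; `span_scale` and the 819-count value for 4 mA (a hypothetical card that spans 0-20 mA over 0-4095) are illustrative assumptions.

```python
def linear_scale(raw: float, k1: float, k2: float) -> float:
    """engineering_value = raw * k1 / k2"""
    if k2 == 0:
        raise ValueError("k2 must be non-zero")
    return raw * k1 / k2

def span_scale(raw, raw_lo, raw_hi, eng_lo, eng_hi):
    """Map raw counts in [raw_lo, raw_hi] onto [eng_lo, eng_hi] (offset-aware)."""
    return eng_lo + (raw - raw_lo) * (eng_hi - eng_lo) / (raw_hi - raw_lo)

print(linear_scale(4095, 100, 4095))         # 100.0
print(span_scale(2457, 819, 4095, 0, 100))   # 50.0 (12 mA on a 0-20 mA card)
```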

Bit Extraction from Word Registers

Often, a single 16-bit register encodes multiple boolean values as individual bits. To extract a specific alarm or status:

bit_value = (register_value >> shift_count) & mask

For example, extracting alarm bit 5 from a status word:

status_register = 0x0024  (binary: 0000 0000 0010 0100)
alarm_bit_5 = (0x0024 >> 5) & 0x01 → 1 (alarm active)
alarm_bit_3 = (0x0024 >> 3) & 0x01 → 0 (no alarm)

This is extremely common in industrial chillers, dryers, and blenders where a single status register contains 16 independent alarm flags.
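Decoding a whole status word into named alarms is a small extension of the formula above. The bit-to-name map here is hypothetical; real assignments come from the equipment documentation. The `mask` parameter also covers multi-bit fields.

```python
# Hypothetical alarm map for one status word.
ALARM_BITS = {
    2: "LOW_WATER_FLOW",
    3: "HIGH_PRESSURE",
    5: "COMPRESSOR_FAULT",
}

def extract_bit(register_value: int, shift_count: int, mask: int = 0x01) -> int:
    """bit_value = (register_value >> shift_count) & mask"""
    return (register_value >> shift_count) & mask

def active_alarms(status_word: int, alarm_bits=ALARM_BITS) -> list:
    """Names of every mapped alarm whose bit is set in status_word."""
    return [name for bit, name in sorted(alarm_bits.items())
            if extract_bit(status_word, bit)]

print(active_alarms(0x0024))  # ['LOW_WATER_FLOW', 'COMPRESSOR_FAULT']
```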

Change Detection and Delivery Strategies

Not all data changes are equal. A temperature that shifts by 0.1°C doesn't need the same urgency as a compressor fault alarm.

Compare-Before-Send

For slowly-changing analog values, compare the new reading against the previous:

  • Skip identical values: If temperature is still 23.5°C, don't transmit
  • Deadband filtering: Only transmit if |new - old| > threshold
  • Periodic forced reads: Even unchanged values should be sent every N minutes (typically 60) to confirm the sensor is still alive
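The three rules above fit in one small class. A minimal sketch, assuming the deadband is measured against the last value actually sent (so slow drift is still reported eventually) and that the caller can inject a clock for testing; `ChangeFilter` is an illustrative name.

```python
import time

class ChangeFilter:
    """Compare-before-send with an optional deadband and a periodic forced send.

    deadband=0 suppresses only identical values; max_age_s forces a resend
    even when nothing changed (3600 s matches "typically 60 minutes").
    """

    def __init__(self, deadband=0.0, max_age_s=3600.0, clock=time.monotonic):
        self.deadband = deadband
        self.max_age_s = max_age_s
        self.clock = clock
        self.last_value = None
        self.last_sent_at = None

    def should_send(self, value) -> bool:
        now = self.clock()
        if self.last_value is None:
            send = True                                   # first reading
        elif now - self.last_sent_at >= self.max_age_s:
            send = True                                   # periodic heartbeat
        elif self.deadband > 0:
            send = abs(value - self.last_value) > self.deadband
        else:
            send = value != self.last_value               # skip identical values
        if send:
            self.last_value = value
            self.last_sent_at = now
        return send
```

Comparing against the last sent value rather than the last read value prevents a sequence of small steps, each under the deadband, from drifting silently past it.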

Immediate vs. Batched Delivery

Critical values (alarms, run/stop status, link state) should be delivered immediately — bypassing any batch collection. Process values (temperatures, pressures, flow rates) can be batched into periodic payloads.

A typical delivery strategy:

| Data Type | Poll Interval | Batched | Compare |
|---|---|---|---|
| Alarm flags | 1 s | No (immediate) | Yes (change only) |
| Run/stop status | 1 s | No (immediate) | Yes (change only) |
| Temperature | 60 s | Yes | No (always send) |
| Energy consumption | 60 s | Yes | Yes (change only) |
| Serial number | 3600 s | Yes | Yes (change only) |
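The table can be expressed as a per-tag policy that the poller consults after each read. A minimal sketch; `TagPolicy`, `route`, and the `send_now` callback are hypothetical names, and the tag names mirror the table rows.

```python
from dataclasses import dataclass

@dataclass
class TagPolicy:
    """One row of the delivery table."""
    name: str
    interval_s: int
    batched: bool
    compare: bool

POLICIES = [
    TagPolicy("alarm_flags", 1, batched=False, compare=True),
    TagPolicy("run_status", 1, batched=False, compare=True),
    TagPolicy("temperature", 60, batched=True, compare=False),
    TagPolicy("energy", 60, batched=True, compare=True),
    TagPolicy("serial_number", 3600, batched=True, compare=True),
]

def route(policy, value, previous, send_now, batch):
    """Apply compare-before-send, then deliver immediately or queue for the batch."""
    if policy.compare and value == previous:
        return "skipped"
    if policy.batched:
        batch.append((policy.name, value))  # goes out with the next periodic payload
        return "batched"
    send_now(policy.name, value)            # critical values bypass batching
    return "sent"
```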

Common Pitfalls and How to Avoid Them

1. Silent Byte-Order Mismatch

Symptom: Values look "almost right" but fluctuate randomly. Cause: Wrong byte ordering makes the exponent bits of a float mix with mantissa bits. Fix: Read a known value (setpoint) and try all four byte orderings.

2. The 50ms Modbus Timing Rule

Symptom: Intermittent read errors, especially under load. Cause: Many PLCs need ~50ms between consecutive Modbus requests. Fix: Insert usleep(50000) between batch reads. Some RTU devices need even longer (100-200ms).

3. Connection Reset After Timeout

Symptom: ETIMEDOUT, then all subsequent reads fail. Cause: After a timeout, the TCP connection is in an indeterminate state. Fix: On ETIMEDOUT, ECONNRESET, ECONNREFUSED, EPIPE, or EBADF — close and reconnect immediately. Flush the Modbus buffers before retrying.
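The reconnect rule translates directly into a retry wrapper. A minimal Python sketch, assuming a hypothetical client object with close(), connect() and flush() methods. Python reports socket timeouts as TimeoutError (socket.timeout on older versions), often without errno set, so those are treated like ETIMEDOUT.

```python
import errno
import socket

# errno codes after which the connection is treated as unusable.
RECONNECT_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.ECONNREFUSED,
                    errno.EPIPE, errno.EBADF}

def needs_reconnect(exc: OSError) -> bool:
    """True for timeouts and the connection-level errno values above."""
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return True
    return exc.errno in RECONNECT_ERRNOS

def read_with_reconnect(client, read, retries=1):
    """Call read(client); on a connection-level error, reconnect, flush and retry.

    `client` is hypothetical: any object with close(), connect() and flush().
    """
    for attempt in range(retries + 1):
        try:
            return read(client)
        except OSError as exc:
            if not needs_reconnect(exc) or attempt == retries:
                raise
            client.close()    # drop the half-dead TCP session
            client.connect()  # open a fresh one immediately
            client.flush()    # discard stale bytes before retrying
```

Other errors (for example, an illegal-address exception from the PLC) are re-raised untouched, since reconnecting would not fix them.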

4. Stale Data After Sensor Failure

Symptom: Temperature "frozen" at last good value for hours. Cause: PLC continues returning the last valid reading even after sensor disconnect. Fix: Monitor read status codes. Cross-reference with the PLC's own sensor-failure alarm bits. If a sensor failure alarm is active, mark the associated temperature value as invalid regardless of what the register says.
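The cross-check can be as simple as gating each reading on its alarm bit. A minimal sketch, assuming you know which bit of which alarm word reports the failure of each sensor; the mapping and `validated_reading` are hypothetical.

```python
def validated_reading(value, alarm_word: int, fault_bit: int):
    """Return (value, quality), discarding the value if its sensor-fault bit is set."""
    if (alarm_word >> fault_bit) & 0x01:
        # The PLC may keep returning the last good value; don't trust it.
        return None, "SENSOR_FAULT"
    return value, "OK"

print(validated_reading(23.5, 0x0024, 5))  # (None, 'SENSOR_FAULT')
print(validated_reading(23.5, 0x0024, 3))  # (23.5, 'OK')
```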

5. Hourly Data Reset

Symptom: Dashboard shows periodic "spikes" at the top of every hour. Cause: Some polling systems reset their comparison baselines hourly, forcing a full re-read and re-delivery of all tags. Fix: This is actually intentional in well-designed systems — it ensures that even unchanged values are periodically confirmed. But your visualization layer needs to expect it and not treat it as real data changes.

How machineCDN Handles Data Normalization

machineCDN's edge infrastructure handles the normalization layer at the gateway level — before data ever reaches the cloud. The gateway daemon supports both EtherNet/IP and Modbus (TCP and RTU), with per-device configuration that specifies:

  • Exact data type for every tag (bool, int8, uint8, int16, uint16, int32, uint32, float)
  • Register addresses and element counts
  • Poll intervals per tag (from 1 second to 1 hour)
  • Change detection with compare-before-send
  • Immediate delivery for critical alarms vs. batched delivery for process values
  • Configurable scaling factors (k1/k2 linear transforms)
  • Bit extraction from word registers with shift and mask parameters

By handling normalization at the edge, machineCDN ensures that only clean, typed, engineering-unit data reaches the cloud platform — eliminating an entire category of data quality issues that plague centralized architectures.

Conclusion

Data normalization is the unsung hero of industrial IoT. It's not glamorous work — there are no fancy dashboards or AI models involved. But every analytics insight, every predictive maintenance model, and every automated alert depends on getting this layer right.

The key principles:

  1. Know your byte order before you trust any multi-register value
  2. Validate float conversions — check for NaN, infinity, and out-of-range values
  3. Batch contiguous registers to minimize poll cycle times
  4. Differentiate delivery urgency — alarms immediately, process data in batches
  5. Cross-reference sensor status — don't trust a register value if the sensor failure alarm is active

Get normalization right, and everything downstream works. Get it wrong, and you'll spend months debugging phantom issues that were really just byte-order problems from day one.