
Data Normalization in IIoT: Handling Register Formats, Byte Ordering, and Scaling Factors

MachineCDN Team
Industrial IoT Experts


You've successfully polled your PLC. Registers are coming back as arrays of 16-bit unsigned integers. Your Modbus transaction completed without error. Now what?

The raw register values sitting in your receive buffer are useless until you transform them into meaningful engineering units — degrees Celsius, PSI, gallons per minute, kilowatt-hours. This transformation is where a shocking number of IIoT deployments break down, producing subtly wrong data that goes unnoticed for weeks until someone realizes the chiller outlet temperature has been reading 16,384°F.

This guide covers the real-world data normalization challenges you'll face when connecting to industrial equipment, and the strategies that actually work at scale.

The Fundamental Problem: Registers Are Just Numbers

Industrial protocols like Modbus operate on fixed-width registers. Modbus registers are 16 bits wide, giving you a range of 0–65,535 for unsigned values or -32,768 to 32,767 for signed values. EtherNet/IP is more flexible — you request data by tag name and specify the element size (1, 2, or 4 bytes) — but you still receive raw bytes that need interpretation.

The same raw value 0x4248 means completely different things depending on context:

  • As a uint16: 16,968
  • As an int16: 16,968
  • As the high word of an IEEE 754 float (paired with 0x0000): 50.0
  • As a temperature in tenths of a degree: 1,696.8°C (obviously wrong for a chiller)
  • As a temperature in hundredths of a degree: 169.68°C (still wrong)
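The interpretations above can be reproduced with Python's standard struct module. This is a minimal sketch; the variable names are illustrative only.

```python
import struct

raw = 0x4248  # one 16-bit Modbus register

as_uint16 = raw
# Reinterpret the same 16 bits as a signed (two's-complement) value
as_int16 = struct.unpack(">h", struct.pack(">H", raw))[0]
# Pair with a second register (0x0000) and decode as a big-endian float32
as_float = struct.unpack(">f", struct.pack(">HH", raw, 0x0000))[0]
# Fixed-point interpretations with an implied decimal point
as_tenths = raw / 10
as_hundredths = raw / 100

print(as_uint16, as_int16, as_float, as_tenths, as_hundredths)
# 16968 16968 50.0 1696.8 169.68
```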

Getting the interpretation right requires knowing three things for every data point:

  1. The data type (bool, int8, uint8, int16, uint16, int32, uint32, float32)
  2. The word/byte order (big-endian, little-endian, or the dreaded mid-endian)
  3. The scaling factor (raw value × scale = engineering unit)

Data Types in Industrial Registers

Boolean Values (1-bit)

Modbus coils (FC 01) and discrete inputs (FC 02) are inherently boolean. But many PLCs also pack boolean status flags into 16-bit holding or input registers, using individual bits to represent different states.

For example, a 16-bit status word at register 200 might encode:

  • Bit 0: Compressor A running
  • Bit 1: Compressor B running
  • Bit 2: Pump active
  • Bit 3: Alarm present
  • Bits 4–7: Operating mode (4-bit enum)
  • Bits 8–15: Reserved

Extracting individual bits requires shift-and-mask operations:

raw_value = read_register(200)  # Placeholder for your Modbus client's read call; returns uint16

compressor_a = (raw_value >> 0) & 0x01  # Bit 0
compressor_b = (raw_value >> 1) & 0x01  # Bit 1
pump_active  = (raw_value >> 2) & 0x01  # Bit 2
alarm        = (raw_value >> 3) & 0x01  # Bit 3
op_mode      = (raw_value >> 4) & 0x0F  # Bits 4-7 (4-bit value)

This is an extremely common pattern in industrial PLCs. The PLC programmer packs multiple boolean and small-integer values into a single register to minimize the register map footprint. Your edge gateway needs to understand these bit-field definitions to extract meaningful individual signals.

Important subtlety: When you're doing change-on-value comparison for efficient data delivery, you need to compare the parent register (the full 16-bit word), then recalculate all derived bit-field values whenever the parent changes. Don't compare individual bits independently — you'll miss transitions that happen simultaneously in a single scan cycle.
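One way to apply this is to keep the last raw word per status register and re-derive every field whenever it changes. The sketch below assumes the example bit layout for register 200; the class and function names are hypothetical.

```python
def decode_status_word(raw_value: int) -> dict:
    """Split the example status word into its individual fields."""
    return {
        "compressor_a": bool(raw_value & 0x01),
        "compressor_b": bool((raw_value >> 1) & 0x01),
        "pump_active": bool((raw_value >> 2) & 0x01),
        "alarm": bool((raw_value >> 3) & 0x01),
        "op_mode": (raw_value >> 4) & 0x0F,
    }


class StatusWordTracker:
    """Compares the full 16-bit parent word and re-derives all fields on change."""

    def __init__(self):
        self.last_raw = None  # None means "never read"

    def update(self, raw_value: int):
        """Return the decoded fields if the parent word changed, else None."""
        if raw_value == self.last_raw:
            return None
        self.last_raw = raw_value
        return decode_status_word(raw_value)


tracker = StatusWordTracker()
first = tracker.update(0x0025)      # first read: always reported
unchanged = tracker.update(0x0025)  # same word: nothing to send
changed = tracker.update(0x002A)    # four bits flipped within one scan
```

Because the comparison happens on the whole word, the four simultaneous bit changes in the last call are reported together as one consistent snapshot.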

16-bit Integers (int16 / uint16)

The most common register type in Modbus. One register = one value. Simple and reliable.

However, watch out for signed vs. unsigned interpretation:

  • Register value 0xFFFF as uint16 = 65,535
  • Register value 0xFFFF as int16 = -1

Most temperature sensors, pressure transducers, and flow meters deliver signed 16-bit values because they can go negative (temperatures below zero, differential pressures). If your normalization layer treats these as unsigned, a -5°C reading shows up as 65,531 — a classic data corruption bug that passes automated range checks because it's technically within the uint16 range.
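If your Modbus client hands back unsigned register values, the signed reinterpretation is a one-line two's-complement conversion. A minimal sketch, with a hypothetical helper name:

```python
def to_int16(raw: int) -> int:
    """Reinterpret a raw 16-bit register (0..65535) as a two's-complement int16."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"not a 16-bit register value: {raw}")
    # If the sign bit (bit 15) is set, the value is negative
    return raw - 0x10000 if raw & 0x8000 else raw


print(to_int16(0xFFFF))  # -1
print(to_int16(65531))   # -5
print(to_int16(505))     # 505
```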

32-bit Values from 16-bit Registers (The Hard Part)

This is where things get genuinely tricky. A 32-bit value — whether integer or float — occupies two consecutive 16-bit Modbus registers. But which register contains the high word and which contains the low word?

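There are four common orderings for a 32-bit value stored in two 16-bit registers (A is the most significant byte, D the least significant):

Ordering                         | Register N              | Register N+1            | Also called
---------------------------------|-------------------------|-------------------------|---------------------------------------------
Big-endian (AB CD)               | High word               | Low word                | "Motorola order"
Little-endian word order (CD AB) | Low word                | High word               | Word-swapped; loosely called "Intel order"
Mid-endian (BA DC)               | High word, byte-swapped | Low word, byte-swapped  | Byte-swapped; rare
Full little-endian (DC BA)       | Low word, byte-swapped  | High word, byte-swapped | Rare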

Most industrial equipment uses one of the first two, but you cannot assume — even within the same manufacturer's product line. We've seen cases where Company X's chiller uses big-endian word order and the same company's dryer uses little-endian.

Reconstructing 32-bit Integers

For a uint32 stored in big-endian word order:

register_n  = 0x0001  # High word
register_n1 = 0x86A0  # Low word

value = (register_n << 16) | register_n1
# value = 0x000186A0 = 100,000

For little-endian word order (same raw data, different result):

register_n  = 0x0001  # Low word
register_n1 = 0x86A0  # High word

value = (register_n1 << 16) | register_n
# value = 0x86A00001 = 2,258,632,705 ← COMPLETELY WRONG

Same registers, same raw data, and the results differ by more than four orders of magnitude. This is why byte ordering is the #1 source of "the data looks plausible but is subtly wrong" bugs in IIoT systems.
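A small helper that takes the ordering as a parameter keeps this decision in configuration rather than scattered through the code. The sketch below is one possible shape, using the AB CD style labels as ordering names; the function names are hypothetical.

```python
def swap_bytes(reg: int) -> int:
    """Swap the two bytes inside one 16-bit register."""
    return ((reg & 0xFF) << 8) | (reg >> 8)


def registers_to_uint32(reg_n: int, reg_n1: int, order: str = "ABCD") -> int:
    """Combine two 16-bit registers into a uint32 for the given ordering label."""
    if order == "ABCD":    # high word first
        high, low = reg_n, reg_n1
    elif order == "CDAB":  # low word first
        high, low = reg_n1, reg_n
    elif order == "BADC":  # high word first, bytes swapped inside each register
        high, low = swap_bytes(reg_n), swap_bytes(reg_n1)
    elif order == "DCBA":  # low word first, bytes swapped inside each register
        high, low = swap_bytes(reg_n1), swap_bytes(reg_n)
    else:
        raise ValueError(f"unknown ordering: {order}")
    return (high << 16) | low


def to_int32(value: int) -> int:
    """Reinterpret a uint32 as a two's-complement int32."""
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(registers_to_uint32(0x0001, 0x86A0, "ABCD"))  # 100000
print(registers_to_uint32(0x0001, 0x86A0, "CDAB"))  # 2258632705
```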

Reconstructing IEEE 754 Floats

Floating-point values are even more treacherous because the wrong byte order doesn't produce an obviously incorrect value — it produces a different float that might look plausible.

An IEEE 754 32-bit float consists of:

  • 1 sign bit
  • 8 exponent bits
  • 23 mantissa bits

When stored across two Modbus registers, the correct reconstruction depends entirely on the device's word order:

# Big-endian word order (most common in industrial PLCs)
register_n = 0x4248 # High word
register_n1 = 0x0000 # Low word

# Combine to 32-bit: 0x42480000
# Sign: 0, Exponent: 0x84 (132-127=5), Mantissa: 0x480000
# Value = 1.5625 × 2^5 = 50.0 ✓

# Little-endian word order (same registers)
# Combine to 32-bit: 0x00004248
# Sign: 0, Exponent: 0x00 (denormalized), Mantissa: 0x004248
# Value ≈ 2.38e-41 ← astronomically wrong but not obviously impossible

Practical tip: The fastest way to determine a device's word order is to read a known physical value. If the chiller setpoint is displayed as 50°F on the HMI and the registers for that setpoint read [0x4248, 0x0000], you know it's big-endian. If you see [0x0000, 0x4248], it's little-endian. Use a known reference point; don't guess.
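The reference-point check is easy to automate during commissioning. This sketch decodes the two registers under both word orders and reports which one reproduces the value shown on the HMI; the function names and the tolerance default are assumptions.

```python
import math
import struct


def registers_to_float(reg_n: int, reg_n1: int, high_word_first: bool = True) -> float:
    """Decode two 16-bit registers as an IEEE 754 float32."""
    if not high_word_first:
        reg_n, reg_n1 = reg_n1, reg_n
    return struct.unpack(">f", struct.pack(">HH", reg_n, reg_n1))[0]


def detect_word_order(reg_n: int, reg_n1: int, expected: float, tol: float = 0.5) -> str:
    """Return which word order reproduces the known physical value."""
    big = registers_to_float(reg_n, reg_n1, high_word_first=True)
    little = registers_to_float(reg_n, reg_n1, high_word_first=False)
    big_ok = math.isclose(big, expected, abs_tol=tol)
    little_ok = math.isclose(little, expected, abs_tol=tol)
    if big_ok and not little_ok:
        return "big-endian word order"
    if little_ok and not big_ok:
        return "little-endian word order"
    return "undetermined"  # neither or both matched: pick another reference value


print(detect_word_order(0x4248, 0x0000, expected=50.0))  # big-endian word order
print(detect_word_order(0x0000, 0x4248, expected=50.0))  # little-endian word order
```

Choose a reference value that is not zero, since 0.0 decodes identically under every ordering.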

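Most PLCs based on legacy architectures use big-endian word order. The libmodbus library ships explicit conversion helpers for each ordering (modbus_get_float_abcd(), modbus_get_float_badc(), modbus_get_float_cdab(), and modbus_get_float_dcba()) alongside the older modbus_get_float(). Prefer the explicit variants, and confirm against a known value which one matches your device. If your library offers only one float conversion, check which word order it assumes and swap the registers yourself when the device disagrees.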

Scaling Factors and Engineering Units

Raw register values rarely correspond directly to engineering units. Common scaling patterns:

Fixed-Point Scaling

Many PLCs store process values as integers with an implicit decimal point:

  • Raw value 505 = 50.5°F (÷10)
  • Raw value 2350 = 23.50 PSI (÷100)
  • Raw value 1234 = 12.34 GPM (÷100)

The scaling factor is defined in the device's register map documentation, and it varies by register. Don't assume all temperatures on the same device use the same scale — we've seen devices where supply temperature uses ÷10 and return temperature uses ÷100.
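Because the divisor varies by register, it belongs in the per-register configuration rather than in a device-wide constant. A minimal sketch with a hypothetical register map (the addresses and names are invented for illustration):

```python
# Hypothetical register map: address -> (name, divisor, unit)
REGISTER_SCALES = {
    100: ("supply_temp", 10, "°F"),
    101: ("return_temp", 100, "°F"),
    102: ("discharge_pressure", 100, "PSI"),
}


def scale_register(address: int, raw: int):
    """Apply the divisor configured for this specific register."""
    name, divisor, unit = REGISTER_SCALES[address]
    return name, raw / divisor, unit


print(scale_register(100, 505))   # ('supply_temp', 50.5, '°F')
print(scale_register(102, 2350))  # ('discharge_pressure', 23.5, 'PSI')
```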

Linear Transformation

Some analog inputs apply a linear transformation:

engineering_value = (raw_value × k1) + k2

Where k1 is the scale factor and k2 is the offset. This is common for pressure transducers and load cells that have a non-zero offset.
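When the register map gives two calibration points instead of k1 and k2 directly, the coefficients follow from the line through those points. The sketch below assumes a hypothetical transducer whose raw counts 4000 and 20000 correspond to 0 and 100 PSI.

```python
def linear_scale(raw: float, k1: float, k2: float) -> float:
    """engineering_value = (raw_value × k1) + k2"""
    return raw * k1 + k2


def coefficients_from_points(raw_lo, eng_lo, raw_hi, eng_hi):
    """Derive (k1, k2) from two calibration points on the same line."""
    if raw_hi == raw_lo:
        raise ValueError("calibration points must have different raw values")
    k1 = (eng_hi - eng_lo) / (raw_hi - raw_lo)
    k2 = eng_lo - k1 * raw_lo
    return k1, k2


# Hypothetical calibration: 4000 counts = 0 PSI, 20000 counts = 100 PSI
k1, k2 = coefficients_from_points(4000, 0.0, 20000, 100.0)
pressure = linear_scale(12000, k1, k2)  # midpoint of the range, about 50 PSI
```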

Unit Conversion

Industrial equipment in the US often reports in imperial units (°F, PSI, GPM), while European equipment uses metric (°C, bar, L/min). Your normalization layer should standardize to a single unit system — ideally the one your operators actually use. A unit flag register (common in many PLCs) tells you whether the device is configured for imperial or metric, so your conversion logic can adapt automatically.
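A unit flag makes the conversion a simple lookup. This sketch standardizes on metric and assumes a hypothetical flag encoding (0 = imperial, 1 = metric); check your device's register map for the real encoding.

```python
PSI_TO_BAR = 0.0689476
GPM_TO_LPM = 3.78541  # US gallons per minute to litres per minute

IMPERIAL_TO_METRIC = {
    "temperature": lambda deg_f: (deg_f - 32.0) * 5.0 / 9.0,  # °F -> °C
    "pressure": lambda psi: psi * PSI_TO_BAR,                 # PSI -> bar
    "flow": lambda gpm: gpm * GPM_TO_LPM,                     # GPM -> L/min
}


def normalize_to_metric(value: float, quantity: str, unit_flag: int) -> float:
    """Convert to metric based on the device's unit flag (assumed 0 = imperial, 1 = metric)."""
    if unit_flag == 1:
        return value  # already metric
    if unit_flag == 0:
        return IMPERIAL_TO_METRIC[quantity](value)
    raise ValueError(f"unexpected unit flag: {unit_flag}")


print(normalize_to_metric(50.0, "temperature", unit_flag=0))  # 10.0
print(normalize_to_metric(10.0, "temperature", unit_flag=1))  # 10.0
```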

Change-on-Value Comparison: Optimizing Data Volume

In a typical industrial installation, most register values don't change most of the time. A chiller operating at steady state might show the same outlet temperature (±0.1°) for hours. Sending that identical value to the cloud every second is wasteful.

The efficient approach is compare-on-change: read the register, compare it to the last known value, and only transmit when the value actually changes.

Implementation Considerations

  1. Compare at the raw register level, not the scaled value. Raw integer comparison is fast and avoids floating-point precision issues. Two float values that look identical after scaling might differ at the bit level due to rounding.

  2. Store the previous raw value per tag. Each tag in your configuration needs a slot for its last-known value and a flag indicating whether it's been read at least once.

  3. Always transmit on first read. When the edge agent starts up or reconnects, every tag should be treated as "changed" — you have no baseline for comparison.

  4. Force periodic full reads. Even with change-detection enabled, force a complete read and transmit of all tags at a regular interval — hourly is a good default. This guards against:

    • Slow drift that never crosses a per-read threshold
    • Lost change events due to network outages
    • Clock drift between the edge and the cloud

  5. Handle error state transitions. A tag transitioning from "read OK" to "read error" is a significant event — transmit it immediately, don't batch it with regular telemetry. Similarly, transitioning back from error to OK should trigger an immediate value delivery.
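The five considerations above fit into a small per-tag state machine. The following is a sketch under stated assumptions, not a reference implementation: the names are hypothetical, time is passed in as seconds, and a repeated read error is reported only once.

```python
FULL_READ_INTERVAL_S = 3600  # hourly forced transmit


class TagState:
    """Per-tag bookkeeping for change-on-value delivery."""

    def __init__(self):
        self.last_raw = None    # previous raw register value
        self.read_once = False  # False until the first successful read
        self.in_error = False   # True while reads are failing
        self.last_sent = 0.0    # time of the last transmit, in seconds


def decide_delivery(state, raw, read_ok, now):
    """Return "immediate", "batch", or None (nothing to send) for one poll."""
    if not read_ok:
        if state.in_error:
            return None      # still failing: already reported
        state.in_error = True
        state.last_sent = now
        return "immediate"   # OK -> error transition

    recovered = state.in_error
    first_read = not state.read_once
    changed = raw != state.last_raw  # raw comparison, before any scaling
    stale = now - state.last_sent >= FULL_READ_INTERVAL_S

    state.in_error = False
    state.read_once = True
    state.last_raw = raw

    if recovered:
        state.last_sent = now
        return "immediate"   # error -> OK transition
    if first_read or changed or stale:
        state.last_sent = now
        return "batch"
    return None


state = TagState()
results = [
    decide_delivery(state, 505, True, 0.0),      # first read
    decide_delivery(state, 505, True, 1.0),      # unchanged
    decide_delivery(state, 506, True, 2.0),      # changed
    decide_delivery(state, None, False, 3.0),    # read error begins
    decide_delivery(state, 506, True, 5.0),      # recovery
    decide_delivery(state, 506, True, 3605.0),   # forced periodic transmit
]
print(results)
# ['batch', None, 'batch', 'immediate', 'immediate', 'batch']
```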

Dependent Tag Reads

Some tag values only make sense in the context of a parent tag's state. For example, a blender's per-hopper feed rate values are only meaningful when the blender's "running" status bit is true. When the running state changes, you may want to immediately force-read all dependent tags to capture the new steady-state values.

This parent-child relationship between tags creates a tree structure:

  • Parent tag changes → force-read all children
  • Child value change → include parent context in the telemetry batch

Designing this dependency graph correctly can reduce data volume by 60–80% compared to naive periodic polling, while ensuring no state transitions are missed.
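The force-read rule can be expressed as a walk over a parent-to-children map. In this sketch the tag names and the map itself are hypothetical, and nested dependencies are followed breadth-first.

```python
def tags_to_force_read(changed_tags, dependents):
    """Return every descendant of the changed tags, each listed once."""
    forced = []
    seen = set(changed_tags)
    queue = list(changed_tags)
    while queue:
        parent = queue.pop(0)
        for child in dependents.get(parent, []):
            if child not in seen:
                seen.add(child)
                forced.append(child)
                queue.append(child)  # a child may have children of its own
    return forced


# Hypothetical dependency map: parent tag -> child tags
DEPENDENTS = {
    "blender_running": ["hopper1_feed_rate", "hopper2_feed_rate"],
    "hopper1_feed_rate": ["hopper1_target"],
}

print(tags_to_force_read(["blender_running"], DEPENDENTS))
# ['hopper1_feed_rate', 'hopper2_feed_rate', 'hopper1_target']
```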

Batching and Framing for Upstream Delivery

Once your values are normalized, they need to be packaged for delivery to the cloud. The two main approaches:

JSON Batching

Human-readable, easy to debug, universally supported:

{
  "device_type": 1017,
  "serial": 16842753,
  "timestamp": 1740700800,
  "groups": [
    {
      "ts": 1740700800,
      "values": [
        {"id": 80, "type": "int16", "value": 505},
        {"id": 81, "type": "int16", "value": 498},
        {"id": 82, "type": "int16", "value": 2350}
      ]
    }
  ]
}

Overhead is significant: field names, punctuation, and formatting typically add 3–5× to the raw data size. For bandwidth-constrained cellular connections, this adds up.

Binary Batching

Pack values as raw bytes with a compact header:

[device_type: 2B][serial: 4B][timestamp: 4B]
[group_count: 2B]
[group_timestamp: 4B][value_count: 2B]
[tag_id: 2B][status: 1B][type: 1B][value: 1-4B]
[tag_id: 2B][status: 1B][type: 1B][value: 1-4B]
...

5–10× more compact than JSON. Critical for deployments where the edge device connects via cellular (4G/LTE) with metered data plans — common in remote manufacturing sites.
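To make the size difference concrete, the sketch below packs the same three int16 values used in the JSON example into the layout above. The type codes and the big-endian field order are assumptions for illustration; the layout itself does not specify them.

```python
import json
import struct

# Hypothetical type codes mapped to struct formats for the value field
TYPE_FORMATS = {1: "B", 2: "h", 3: "H", 4: "i", 5: "I", 6: "f"}
INT16 = 2


def pack_batch(device_type, serial, timestamp, groups):
    """Pack groups given as (group_ts, [(tag_id, status, type_code, value), ...])."""
    # Header: device_type (2B), serial (4B), timestamp (4B), group_count (2B)
    out = bytearray(struct.pack(">HIIH", device_type, serial, timestamp, len(groups)))
    for group_ts, values in groups:
        out += struct.pack(">IH", group_ts, len(values))
        for tag_id, status, type_code, value in values:
            out += struct.pack(">HBB", tag_id, status, type_code)
            out += struct.pack(">" + TYPE_FORMATS[type_code], value)
    return bytes(out)


groups = [(1740700800, [(80, 0, INT16, 505), (81, 0, INT16, 498), (82, 0, INT16, 2350)])]
binary = pack_batch(1017, 16842753, 1740700800, groups)

as_json = json.dumps({
    "device_type": 1017, "serial": 16842753, "timestamp": 1740700800,
    "groups": [{"ts": 1740700800, "values": [
        {"id": 80, "type": "int16", "value": 505},
        {"id": 81, "type": "int16", "value": 498},
        {"id": 82, "type": "int16", "value": 2350},
    ]}],
})

print(len(binary), len(as_json))  # the binary frame is 36 bytes
```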

Buffering for Disconnects

Your upstream connection (MQTT to cloud) will drop. It always does — cellular coverage gaps, broker maintenance, network congestion. A production-grade edge agent needs a ring buffer that:

  1. Accumulates batches locally when the connection is down
  2. Transmits oldest-first when the connection recovers
  3. Uses MQTT QoS 1 (at-least-once delivery) with publish acknowledgment tracking
  4. Drops the oldest data when the buffer fills — stale data is less valuable than fresh data
  5. Splits data into fixed-size pages to prevent memory fragmentation

The buffer size should accommodate at least 24 hours of data at your normal telemetry rate. For a typical 100-tag device polling every 60 seconds with change-on-value filtering, that's roughly 2–5 MB.
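The accumulate, oldest-first, and drop-oldest behaviors can be sketched with a bounded deque. This is a simplified in-memory illustration with hypothetical names; it does not implement fixed-size pages or the MQTT wiring.

```python
from collections import deque


class OutputBuffer:
    """Bounded FIFO of serialized batches that drops the oldest entry when full."""

    def __init__(self, max_batches: int):
        self._queue = deque(maxlen=max_batches)
        self.dropped = 0  # how many batches were lost to overflow

    def push(self, batch: bytes) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque discards the oldest entry on append
        self._queue.append(batch)

    def peek_oldest(self):
        """Next batch to publish, or None if the buffer is empty."""
        return self._queue[0] if self._queue else None

    def ack_oldest(self) -> None:
        """Remove the oldest batch once the broker has acknowledged it."""
        if self._queue:
            self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)


buf = OutputBuffer(max_batches=3)
for payload in (b"a", b"b", b"c", b"d"):  # the fourth push overflows
    buf.push(payload)

print(buf.dropped, buf.peek_oldest())  # 1 b'b'
```

Keeping a batch in the buffer until it is acknowledged, rather than removing it on publish, is what makes at-least-once delivery survive a disconnect mid-send.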

The Full Normalization Pipeline

Putting it all together, here's the complete data path from physical sensor to cloud:

Physical Sensor
      ↓
PLC (analog-to-digital conversion, internal scaling)
      ↓
Protocol Layer (Modbus TCP/RTU or EtherNet/IP read)
      ↓
Raw Register Values (16-bit unsigned integers)
      ↓
Type Interpretation (bool, int16, uint16, int32, uint32, float32)
      ↓
Word/Byte Order Correction (swap registers if needed for 32-bit types)
      ↓
Bit-Field Extraction (shift + mask for packed boolean/enum values)
      ↓
Scaling (raw × k1 + k2, or raw ÷ scale_factor)
      ↓
Change-on-Value Comparison (against the last raw value; skip if unchanged)
      ↓
Batching (group changed values with timestamp)
      ↓
Serialization (JSON or binary framing)
      ↓
Output Buffer (ring buffer with page management)
      ↓
MQTT Publish (QoS 1, with delivery acknowledgment)
      ↓
Cloud Platform (time-series storage, dashboards, alerts)

Each step introduces potential failure modes. The most insidious bugs happen in the middle stages — type interpretation and byte ordering — because they produce data that looks numerically valid but is physically meaningless.

Practical Debugging Techniques

When your data doesn't look right:

  1. Start with a known physical value. Read the setpoint or current temperature directly from the machine's HMI. Then compare what your gateway reports. If they differ, the normalization is wrong — work backward through the pipeline.

  2. Log raw register values. Before any transformation, log the hex values of every register. 0x4248 0x0000 is much easier to diagnose than "50.0" when the question is "why does this sometimes read 2.38e-41?"

  3. Check data types against the register map. If the vendor says register 300 is int16 and you're reading it as uint16, values above 32,767 will appear as large positive numbers instead of negative. Temperatures below zero are the classic trigger for this bug.

  4. Verify word order with float test values. Read a float register with a known value. Convert the raw registers to float in both big-endian and little-endian word order. Normally exactly one will match the physical value. That is your device's word order, and it usually applies to all float/uint32/int32 registers on that device. If neither matches, try the two byte-swapped orderings before suspecting the register address.

  5. Watch for register misalignment. If you're reading two registers for a float but you're off by one register, you're combining half of one float with half of another. The result is a valid IEEE 754 float — just a meaningless one.
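A helper that prints the raw hex next to every candidate interpretation covers steps 2 and 4 in one log line. This is a sketch with a hypothetical function name; the ordering labels follow the AB CD convention, where A is the most significant byte.

```python
import struct


def describe_float_registers(reg_n: int, reg_n1: int) -> dict:
    """Return the raw hex plus the float32 decoded under each byte ordering."""
    # Bytes in the order they sit in the two registers
    b0, b1 = reg_n >> 8, reg_n & 0xFF
    b2, b3 = reg_n1 >> 8, reg_n1 & 0xFF
    # Rearranged into most-significant-first order for each assumed device ordering
    candidates = {
        "ABCD": bytes([b0, b1, b2, b3]),
        "CDAB": bytes([b2, b3, b0, b1]),
        "BADC": bytes([b1, b0, b3, b2]),
        "DCBA": bytes([b3, b2, b1, b0]),
    }
    result = {"raw": f"0x{reg_n:04X} 0x{reg_n1:04X}"}
    for label, payload in candidates.items():
        result[label] = struct.unpack(">f", payload)[0]
    return result


report = describe_float_registers(0x4248, 0x0000)
print(report["raw"], report["ABCD"])  # 0x4248 0x0000 50.0
```

Running the same helper on the register pair one address earlier or later is a quick way to spot the misalignment described in step 5.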

How machineCDN Handles This

machineCDN's edge agent ships with pre-built register map configurations for dozens of industrial equipment types — chillers, blenders, dryers, granulators, feeders, and temperature control units. Each configuration defines:

  • Every tag's data type, register address, and element count
  • Read intervals per tag (1-second for critical alarms, 60-second for temperatures)
  • Change-on-value comparison flags
  • Batch vs. immediate delivery preferences
  • Dependent tag relationships

When the agent connects to a device, it reads a device-type identifier register, matches it to the correct configuration, and begins polling with the right types, byte orders, and scaling factors — no manual register map entry required.

For devices we haven't pre-configured, the platform supports uploading custom register maps in JSON format that follow the same schema. Once configured, the full normalization pipeline runs on the edge, and only clean, correctly-typed engineering values reach the cloud.

If you're tired of debugging byte-order issues at 2 AM on a plant floor, let us handle the protocol complexity.


This article is part of our Industrial Protocols Deep Dive series. Previously: Modbus TCP vs RTU: A Practical Guide. Up next: Protocol Bridging — Translating Between Modbus and MQTT at the Edge.