In this post, we are going to see some simple checksum functions that we can use in small processors like Arduino to verify data integrity.
When working with communication systems or stored data, there is a possibility that the data becomes corrupted. That is, one or more bytes in the sequence may change or even be lost completely.
We will often need to verify data integrity before executing actions. For example, the consequences of moving a robot based on incorrect orders received via wireless communication could be disastrous and even potentially dangerous.
For this, we can add a checksum, a small-sized value calculated from a larger block of data, whose purpose is to detect errors introduced into the sequence.
In a communication or storage system, the checksum is sent/received or stored along with the data. During processing, the checksum is calculated again and checked against the available checksum to verify data integrity.
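As a sketch of this receive-side check, assuming a hypothetical frame layout where the sender appends an XOR checksum of the payload as the last byte (the function names here are illustrative, not part of the article's code):

```cpp
#include <cstdint>
#include <cstddef>

// XOR all payload bytes; the sender appends this value as the last byte of the frame
uint8_t SimpleChecksum(const uint8_t *data, size_t length)
{
    uint8_t value = 0;
    for (size_t i = 0; i < length; i++)
    {
        value ^= data[i];
    }
    return value;
}

// Returns true if the last byte of the frame matches the checksum
// recomputed over the preceding bytes
bool FrameIsValid(const uint8_t *frame, size_t frameLength)
{
    if (frameLength < 2) return false;
    return SimpleChecksum(frame, frameLength - 1) == frame[frameLength - 1];
}
```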
There are multiple checksum functions, with different error detection rates and calculation requirements. On the other hand, the longer the checksum, the greater its reliability, but also the greater the additional communication/storage needs to store the checksum.
Finally, no system is infallible. Even if you sent the same message 100 times (which would be a kind of huge and inefficient “checksum”), you could not guarantee that the same error occurred in all 100 transmission batches. As almost always, it’s a matter of finding a compromise between reliability, length, and calculation load.
Let’s look at implementations of some of the main checksum functions that are light enough to run smoothly on a microprocessor like Arduino.
XOR Checksum
The XOR checksum, also called LRC (Longitudinal Redundancy Check), is widely used for its simplicity and calculation speed. It is calculated by splitting the data sequence into fragments of N bits and XORing all the fragments together.
Usually, the fragments are chosen to be multiples of 8 because calculations are greatly accelerated, as processors are designed to perform actions on byte blocks.
Frequently, the checksum is returned negated. The reason is to differentiate a complete communication failure, which causes all received data to be 0, from a transmission of data consisting of 0’s.
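To see why, note that the negated XOR of an all-zero payload is 0xFF, while a dead line would deliver 0x00 as the "checksum", so the comparison fails (a minimal illustration with a hypothetical helper):

```cpp
#include <cstdint>
#include <cstddef>

// XOR checksum returned negated, so a valid all-zero payload
// does not produce a 0x00 checksum
uint8_t NegatedXor(const uint8_t *data, size_t length)
{
    uint8_t value = 0;
    for (size_t i = 0; i < length; i++) value ^= data[i];
    return (uint8_t)~value;
}
```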
The following code shows an 8-bit XOR checksum.
uint8_t XORChecksum8(const byte *data, size_t dataLength)
{
    uint8_t value = 0;
    for (size_t i = 0; i < dataLength; i++)
    {
        value ^= (uint8_t)data[i];
    }
    return ~value;
}
The code for a 16-bit XOR checksum is as follows.
uint16_t XORChecksum16(const byte *data, size_t dataLength)
{
    uint16_t value = 0;
    for (size_t i = 0; i < dataLength / 2; i++)
    {
        value ^= data[2 * i] + (data[2 * i + 1] << 8);
    }
    if (dataLength % 2) value ^= data[dataLength - 1];
    return ~value;
}
As we said, the XOR checksum is simple and fast to execute, which is why it is very widespread. However, its error detection rate is quite limited.
Due to how the XOR operation works, the XOR checksum actually behaves as a parity bit for each of the columns (bits) of the checksum. Therefore, it can only detect an odd number of bit changes in each column. As a particular case, it is not capable of detecting two blocks received in the wrong order.
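This weakness is easy to reproduce: flipping the same bit in two different bytes leaves the XOR result untouched, and so does reordering the bytes (a minimal demonstration, using an un-negated 8-bit XOR for clarity; the helper name is illustrative):

```cpp
#include <cstdint>
#include <cstddef>

// Plain (un-negated) XOR of all bytes in a buffer
uint8_t XorAll(const uint8_t *data, size_t length)
{
    uint8_t value = 0;
    for (size_t i = 0; i < length; i++) value ^= data[i];
    return value;
}
```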
The average failure rate is 1/k, where k is the number of bits of the checksum, which means an average failure rate of 12.5% for 8 bits and 6.25% for 16 bits.
Additive Checksums
Another widely used family of checksums are the additive checksum families. Additive checksums also divide the data into fragments of N bits and perform the cumulative sum of the blocks.
Depending on how the sum of the received bytes is performed, we have different types of additive checksums.
The simplest case is to use the two’s complement sum, where we simply ignore the carries and let the checksum reset in case of overflow.
Here is an example of an 8-bit additive checksum with two’s complement.
uint8_t TwoComplementChecksum8(const byte *data, size_t dataLength)
{
    uint8_t value = 0;
    for (size_t i = 0; i < dataLength; i++)
    {
        value += (uint8_t)data[i];
    }
    return ~value;
}
And here is a 16-bit additive checksum with two’s complement.
uint16_t TwoComplementChecksum16(const byte *data, size_t dataLength)
{
    uint16_t value = 0;
    for (size_t i = 0; i < dataLength / 2; i++)
    {
        value += data[2 * i] + (data[2 * i + 1] << 8);
    }
    if (dataLength % 2) value += data[dataLength - 1];
    return value;
}
This implementation is simple, but it implies a greater possibility of failure in the most significant bits (MSB), because the carry information does not participate in the checksum calculation.
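The effect is easy to see: with wrap-around (two's complement) addition, flipping the most significant bit of two different bytes makes the carry vanish without a trace (a minimal demonstration, keeping the sum un-negated for clarity; the helper name is illustrative):

```cpp
#include <cstdint>
#include <cstddef>

// Plain 8-bit wrap-around (two's complement) sum of a buffer
uint8_t AddChecksum8(const uint8_t *data, size_t length)
{
    uint8_t value = 0;
    for (size_t i = 0; i < length; i++)
    {
        value += data[i];  // carries out of bit 7 are silently discarded
    }
    return value;
}
```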
An alternative is to use one's complement addition as the sum function. On a processor that works in two's complement (most of them), one's complement addition is simulated by using a larger accumulator and then folding the carry bits back into the final value until no carry remains.
This slightly improves reliability, at the cost of a small increase in calculation cost.
uint8_t OneComplementChecksum8(const byte *data, size_t dataLength)
{
    uint16_t value = 0;
    for (size_t i = 0; i < dataLength; i++)
    {
        value += (uint8_t)data[i];
    }
    while (value >> 8)
        value = (value & 0xFF) + (value >> 8);
    return (uint8_t)~value;
}
uint16_t OneComplementChecksum16(const byte *data, size_t dataLength)
{
    uint32_t value = 0;
    for (size_t i = 0; i < dataLength / 2; i++)
    {
        value += data[2 * i] + (data[2 * i + 1] << 8);
    }
    if (dataLength % 2) value += data[dataLength - 1];
    while (value >> 16)
        value = (value & 0xFFFF) + (value >> 16);
    return (uint16_t)value;
}
Additive checksums perform better than XOR because changes in bits affect other columns thanks to the carry bits, while maintaining high efficiency compared to the XOR case (in some cases identical).
The average failure rate of the checksum with one's complement sum is half that of an XOR checksum with the same number of bits, that is, 1/(2k). Equivalently, on average, an additive checksum is as effective as an XOR checksum with twice the number of bits.
The reason is that the main failure mode of the additive checksum is the simultaneous modification of two bits in the same column with opposite values (a 0 changing to 1 and a 1 changing to 0), while the XOR checksum fails for any simultaneous modification of two bits in the same column, regardless of their values.
That is, the average failure rate for one's complement is 6.25% for an 8-bit checksum and 3.125% for a 16-bit one. In the case of two's complement, it is slightly worse: approximately 7% for 8 bits and 3.3% for 16 bits.
However, they share a major weakness: they are not capable of detecting when two entire blocks arrive in the wrong order because the sum remains unchanged.
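A quick demonstration of this blind spot, swapping two bytes of the message (using a plain 8-bit wrap-around sum so the point is easy to see; the helper name is illustrative):

```cpp
#include <cstdint>
#include <cstddef>

// Plain 8-bit additive checksum (wrap-around sum)
uint8_t SumAll(const uint8_t *data, size_t length)
{
    uint8_t value = 0;
    for (size_t i = 0; i < length; i++) value += data[i];
    return value;
}
```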
Finally, additive checksums perform worse when the distribution of 0’s and 1’s is uniform (which penalizes it in benchmarks with random numbers). The best performance is observed with data having a majority distribution of 0’s.
Fletcher16 Checksum
This verification algorithm was developed by John G. Fletcher (1934–2012) at Lawrence Livermore Labs in the 1970s.
The Fletcher checksum requires at least 16 bits and is formed by the concatenation of two summations. The first summation is simply the sum of the received bytes (with two’s or one’s complement). The second summation is the sum of the first summation at each step.
Here is a simple implementation of the Fletcher Checksum in 16 bits with two’s complement sum.
uint16_t ChecksumFletcher16(const byte *data, size_t count)
{
    uint8_t sum1 = 0;
    uint8_t sum2 = 0;
    for (size_t index = 0; index < count; ++index)
    {
        sum1 = sum1 + (uint8_t)data[index];
        sum2 = sum2 + sum1;
    }
    return (sum2 << 8) | sum1;
}
And here is an implementation with one’s complement sum.
uint16_t ChecksumFletcher16(const byte *data, size_t count)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    for (size_t index = 0; index < count; ++index)
    {
        sum1 = sum1 + (uint8_t)data[index];
        while (sum1 >> 8)
            sum1 = (sum1 & 0xFF) + (sum1 >> 8);
        sum2 = sum2 + sum1;
        while (sum2 >> 8)
            sum2 = (sum2 & 0xFF) + (sum2 >> 8);
    }
    return (sum2 << 8) | sum1;
}
The detection capability of the Fletcher function is significantly superior to the equivalent XOR or additive checksum with the same number of bits (several orders of magnitude).
Furthermore, the Fletcher checksum is capable of detecting complete exchanges of code fragments. The reason is that the second summation accumulates each fragment once it appears in the sequence, until the end of it, through the first summation.
If a fragment changes position in the data, the second summation is able to detect the error, even if the first one is identical.
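This can be checked directly: swapping two bytes leaves the first summation unchanged but alters the second one (a minimal sketch using the two's complement Fletcher16 from above, restated with standard types so the example is self-contained):

```cpp
#include <cstdint>
#include <cstddef>

// Two's complement Fletcher16, as described in the article
uint16_t Fletcher16(const uint8_t *data, size_t count)
{
    uint8_t sum1 = 0;
    uint8_t sum2 = 0;
    for (size_t index = 0; index < count; ++index)
    {
        sum1 = sum1 + data[index];
        sum2 = sum2 + sum1;  // position-weighted accumulation
    }
    return ((uint16_t)sum2 << 8) | sum1;
}
```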
CRC Checksum
CRC codes (Cyclic Redundancy Checks) are among the most widespread checksums in the world of computing. Their calculation is based on polynomial division, where the polynomial used and its order depend, among other things, on the number of bits of the checksum.
CRCs have the best detection rates among those presented here, but their calculation implies a higher load than the others. Although it is possible to speed it up using pre-calculated tables, the requirements are usually excessive to run smoothly on a small processor like Arduino.
Therefore, we will only see, as an example, CRC8, based on the formulas distributed by Dallas/Maxim under the terms of the GNU GPL 3.0 license.
//CRC-8 - based on the CRC8 formulas by Dallas/Maxim
//code released under the terms of the GNU GPL 3.0 license
byte CRC8(const byte *data, size_t dataLength)
{
    byte crc = 0x00;
    while (dataLength--)
    {
        byte extract = *data++;
        for (byte tempI = 8; tempI; tempI--)
        {
            byte sum = (crc ^ extract) & 0x01;
            crc >>= 1;
            if (sum)
            {
                crc ^= 0x8C;
            }
            extract >>= 1;
        }
    }
    return crc;
}
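A useful property of this CRC8 (zero initial value, no final XOR) is that recomputing the CRC over the payload with its own CRC appended yields 0, which gives a compact receive-side check. A sketch (the function is the same as above, restated with standard types so the example is self-contained):

```cpp
#include <cstdint>
#include <cstddef>

// Dallas/Maxim CRC8 (reflected polynomial 0x31), as in the listing above
uint8_t CRC8(const uint8_t *data, size_t dataLength)
{
    uint8_t crc = 0x00;
    while (dataLength--)
    {
        uint8_t extract = *data++;
        for (uint8_t tempI = 8; tempI; tempI--)
        {
            uint8_t sum = (crc ^ extract) & 0x01;
            crc >>= 1;
            if (sum) crc ^= 0x8C;
            extract >>= 1;
        }
    }
    return crc;
}
```

On the receiving side it is enough to run CRC8 over the whole frame (payload plus appended CRC byte) and check that the result is 0.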
CRCs have very good detection rates. The reason is the greater mixing of bits produced by the polynomial division. In general, their behavior is comparable to 1/2^k, a reference limit in checksum analysis (comparable in the sense that it can be measured against that limit, not that it actually reaches it).
Like Fletcher checksums, CRCs detect complete exchanges of fragments. In general, the behavior of a CRC is slightly superior to the equivalent Fletcher with the same number of bits.
The biggest problem is, as we mentioned, the greater calculation requirements. However, despite its simple formulation, the CRC8 checksum seen as an example has a very good failure detection capability, with moderate processor use.
There are several libraries for Arduino for calculating CRC16. However, given that its detection capability is superior to Fletcher16, but not by a wide margin, my advice is to use the latter for its greater simplicity and calculation speed.
Download the Code
All the code from this post is available for download on GitHub.

