A Complete Guide to Character Encoding in HL7 and Redox

Character encoding is one of the most arcane subjects a technologist will ever encounter, yet it underpins literally every piece of text we see on our computers. This article will focus on HL7v2 and the Redox API, and how they treat different encodings.

First things first: you should have some background on the fundamentals of character encoding before reading this article. I recommend this article by Kunststube as a comprehensive introduction. If you feel comfortable with what makes ASCII different from UTF-8, continue reading.

Redox API always uses UTF-8

If you read the introductory article, it should be clear why UTF-8 has taken over as the de facto standard for the web: it’s space-efficient and can represent the entire Unicode character space.

The Redox API expects UTF-8 in what you send us, and similarly, anything you receive from us will be encoded as UTF-8.

If you send 我爱你❤️ into the API, it will show up that way in the Redox dashboard. If you try to send it to an EHR, weird stuff can start to happen.
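Here is a minimal sketch, in Python, of what "speaking UTF-8" to a JSON API looks like on both ends. The payload shape and the `note` key are hypothetical and not the Redox data model; the point is only that the JSON text is serialized and sent as UTF-8 bytes, with the charset declared in the Content-Type header, and decoded as UTF-8 on the way back in.

```python
import json

# Hypothetical payload: the "note" key is illustrative, not a Redox field name.
# The string is 我爱你❤️ written with escapes: three CJK characters,
# U+2764 (heavy black heart) and U+FE0F (emoji variation selector).
text = "\u6211\u7231\u4f60\u2764\ufe0f"
payload = {"note": text}

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes;
# either form is valid JSON, but the body itself goes over the wire as UTF-8.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
headers = {"Content-Type": "application/json; charset=utf-8"}  # sent with the body

# Receiving side: decode the bytes as UTF-8, then parse the JSON.
received = json.loads(body.decode("utf-8"))

print(len(text), "code points,", len(text.encode("utf-8")), "UTF-8 bytes")
# 5 code points, 15 UTF-8 bytes
```

Every one of those five code points takes three bytes in UTF-8, which is exactly the kind of multi-byte content a system built around single-byte character sets cannot store as-is.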

Many products do not handle Unicode well

Many EHR vendors were founded (and developed much of their codebase) before Unicode was conceived, much less widely supported. If you remember trying to get webpages to display correctly in the late ’90s by switching the browser’s encoding, you’ll recall that a lot of software from that era supports legacy character sets (code pages) rather than Unicode. If you’ve ever heard of Windows-1252, that’s one of them: it uses the extra (eighth) bit on top of 7-bit ASCII to add up to 128 more characters.
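To see what the extra bit on top of ASCII buys you in practice, the sketch below decodes single bytes with Python’s built-in `ascii`, `cp1252` (Windows-1252) and `cp437` (Code Page 437) codecs. Bytes below 0x80 mean the same thing in all three; the upper half is where legacy code pages disagree. Windows-1252 actually leaves five of those 128 slots undefined, which Python’s codec reports as a decode error.

```python
from __future__ import annotations

CODECS = ("ascii", "cp1252", "cp437")


def decode_byte(value: int, codec: str) -> str | None:
    """Decode a single byte, returning None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# 0x41 is "A" everywhere; 0xE9 depends entirely on which code page you assume.
for value in (0x41, 0xE9):
    print(hex(value), {codec: decode_byte(value, codec) for codec in CODECS})

# How many of the 128 "extra bit" values Windows-1252 actually defines.
defined = sum(decode_byte(v, "cp1252") is not None for v in range(0x80, 0x100))
print(defined)  # 123
```

The same byte 0xE9 comes out as é in Windows-1252 and Θ in Code Page 437, and plain ASCII rejects it outright, which is why a receiver has to know (not guess) the code page.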

If you have a desperate need to put Unicode into an EHR, check with your Redox install team first. In some cases, only certain fields (like names) will support Unicode, and the way we actually send it can vary depending on how closely the EHR vendor follows the HL7 specs.

HL7v2 support for Unicode is a whole science unto itself

Most of the details on using encodings other than ASCII in HL7v2 are in Chapter 2 of the HL7v2 spec.

The process goes something like this:

The default is 7-bit ASCII
Delimiters must be in 7-bit ASCII (including carriage returns)
If MSH-18 is populated, its first repetition denotes the default encoding for the message
If the first repetition is blank, the default remains 7-bit ASCII
Additional repetitions indicate other coding schemes that may be used in the message
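The steps above can be sketched as a small Python function that reads MSH-18 straight from the raw bytes. Because delimiters are guaranteed to be 7-bit ASCII, the header can be split before the message encoding is known. The `HL7_TO_CODEC` mapping covers only three MSH-18 values and is an assumption for illustration; a real implementation would cover every value the spec defines and would need different handling for multi-byte encodings such as UTF-16, where this byte-level split is not safe.

```python
from __future__ import annotations

# Assumed mapping from a few MSH-18 values to Python codec names (not exhaustive).
HL7_TO_CODEC = {
    "ASCII": "ascii",
    "8859/1": "latin-1",
    "UNICODE UTF-8": "utf-8",
}


def msh18_charsets(raw: bytes) -> tuple[str, list[str]]:
    """Return (default codec, alternate character sets) declared in MSH-18."""
    # Segments end with a carriage return; only the MSH segment is needed here.
    # latin-1 maps each byte to exactly one character, so ASCII delimiters survive.
    header = raw.split(b"\r", 1)[0].decode("latin-1")
    if not header.startswith("MSH"):
        raise ValueError("message must start with an MSH segment")
    fields = header.split(header[3])  # MSH-1 is the field separator itself
    repetition_sep = fields[1][1]     # MSH-2 holds the encoding characters, e.g. ^~\&
    # fields[0] is "MSH", so MSH-n lives at fields[n - 1].
    msh18 = fields[17] if len(fields) > 17 else ""
    first, *alternates = msh18.split(repetition_sep)
    if not first:
        return "ascii", alternates  # blank first repetition: default stays 7-bit ASCII
    if first not in HL7_TO_CODEC:
        raise ValueError(f"unsupported MSH-18 value: {first!r}")
    return HL7_TO_CODEC[first], alternates


# Build a minimal MSH segment; MSH-13 through MSH-17 are left empty.
fields = ["MSH", "^~\\&", "LAB", "FAC", "EHR", "FAC", "20171115", "", "ADT^A01",
          "123", "P", "2.5"] + [""] * 5 + ["UNICODE UTF-8~ISO IR87"]
message = ("|".join(fields) + "\rPID|1").encode("ascii")
print(msh18_charsets(message))  # ('utf-8', ['ISO IR87'])
```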

So doing UTF-8 is as simple as putting “UNICODE UTF-8” in MSH-18, right?

Not quite. As with much of HL7v2, this part of the spec is generally overlooked by implementers, so you can’t count on MSH-18 actually describing how a message is encoded.

To complicate matters, those additional encodings (and EHRs that vary the encoding by field) are accommodated with escape sequences. So you can mix Unicode, ISO IR87, and more in one field, and the parser is supposed to be able to process it.
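For a feel of what the parser has to track, here is a sketch that splits a field at HL7’s character-set escape sequences, `\Cxxyy\` (single-byte) and `\Mxxyyzz\` (multi-byte), whose hex digits identify an ISO 2022 character-set designation. It assumes the default escape character `\` and only labels each run with the escape that precedes it; actually decoding each run (for example, switching to JIS X 0208 for ISO IR87) is deliberately left out, and `split_charset_runs` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Assumes "\" is the HL7 escape character (the 4th character of MSH-2).
# \Cxxyy\   -> single-byte character set escape (two hex bytes)
# \Mxxyyzz\ -> multi-byte character set escape (two or three hex bytes)
CHARSET_ESCAPE = re.compile(
    r"\\(?:C([0-9A-Fa-f]{4})|M([0-9A-Fa-f]{4}(?:[0-9A-Fa-f]{2})?))\\"
)


def split_charset_runs(field: str) -> list[tuple[str | None, str]]:
    """Split a field into (escape hex, or None for the default, text) runs."""
    runs: list[tuple[str | None, str]] = []
    current: str | None = None  # None means the MSH-18 default character set
    pos = 0
    for match in CHARSET_ESCAPE.finditer(field):
        if match.start() > pos:
            runs.append((current, field[pos:match.start()]))
        current = (match.group(1) or match.group(2)).upper()
        pos = match.end()
    if pos < len(field):
        runs.append((current, field[pos:]))
    return runs


# "xyz" is placeholder text standing in for bytes of the switched character set.
print(split_charset_runs("Name \\M2442\\xyz\\C2842\\ end"))
# [(None, 'Name '), ('2442', 'xyz'), ('2842', ' end')]
```

The parser is stateful: every run after an escape inherits that character set until the next escape, which is why per-field translation cannot be a simple one-shot decode.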

How Redox handles this

Our HL7 parser is a pretty nifty piece of tech. We parse all messages out into JSON. Because HL7 delimiters are always 7-bit ASCII, we can do this step without having to worry about the encoding; we then push the actual decoding of each field further down the pipeline where needed and apply individual translations per field as described above.
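A rough Python illustration of that idea (not Redox’s actual parser): split the message on its ASCII delimiters while the field contents are still raw bytes, then decode individual fields only once you know, or decide, which encoding applies. Splitting first is safe for single-byte code pages and for UTF-8, because every byte of a UTF-8 multi-byte sequence is 0x80 or higher and so can never be mistaken for `|` or a carriage return. The names `split_raw` and `to_json` are hypothetical.

```python
from __future__ import annotations

import json


def split_raw(raw: bytes) -> list[list[bytes]]:
    """Split segments (CR) and fields (MSH-1 separator) without decoding anything."""
    field_sep = raw[3:4]  # the byte right after "MSH"
    return [segment.split(field_sep) for segment in raw.split(b"\r") if segment]


def to_json(segments: list[list[bytes]], codec: str = "utf-8") -> str:
    """Decode every field with one codec and emit JSON (per-field codecs also work)."""
    decoded = [[field.decode(codec) for field in segment] for segment in segments]
    return json.dumps(decoded, ensure_ascii=False)


# OBX-6 (units) carries "°C" encoded as UTF-8: the bytes C2 B0 43.
raw = b"MSH|^~\\&|LAB|FAC\rOBX|1|NM|TEMP||37|\xc2\xb0C"
segments = split_raw(raw)
units = segments[1][6]            # still bytes at this point
print(units.decode("utf-8"))      # °C
print(units.decode("cp1252"))     # Â°C  (same bytes, wrong assumption)
print(to_json(segments))
```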

In practice, we haven’t run into many situations where content other than 7-bit ASCII is sent. In some rare cases, though, a handful of tiny symbols can wreak havoc. The degree symbol ° and exponent symbols like ² and ³ have a nasty habit of showing up in units. The degree symbol, for example, is represented as B0 (176) in Windows-1252 and F8 (248) in Code Page 437.
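You can check these byte values with Python’s built-in codecs. The sketch below encodes the three troublemakers from above in Windows-1252, Code Page 437 and UTF-8; it also shows that Code Page 437 has no ³ at all, so that symbol cannot be sent in it without substitution. The helper name `encodings_of` is hypothetical.

```python
from __future__ import annotations


def encodings_of(symbol: str) -> dict[str, str | None]:
    """Hex bytes for `symbol` in each codec, or None if it cannot be represented."""
    result: dict[str, str | None] = {}
    for codec in ("cp1252", "cp437", "utf-8"):
        try:
            result[codec] = symbol.encode(codec).hex()
        except UnicodeEncodeError:
            result[codec] = None
    return result


for symbol in "°²³":
    print(symbol, encodings_of(symbol))
# ° {'cp1252': 'b0', 'cp437': 'f8', 'utf-8': 'c2b0'}
```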

Interestingly enough, the Unicode code point for ° is U+00B0, which UTF-8 encodes as two bytes: C2 B0. So if you interpreted a UTF-8 message as Windows-1252, you’d get Â°; in Code Page 437 you’d get ┬░; and in UTF-8 you’d get the correct ° symbol. Conversely, if the message was actually in Windows-1252 and you read it as UTF-8, you would most likely get some kind of error, because B0 by itself is not a valid UTF-8 sequence. Yikes.
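The same experiment in Python: take the UTF-8 bytes for ° and decode them with the wrong codec, then go the other way and feed a lone Windows-1252 byte to a UTF-8 decoder.

```python
utf8_degree = "°".encode("utf-8")
print(utf8_degree.hex())             # c2b0
print(utf8_degree.decode("cp1252"))  # Â°
print(utf8_degree.decode("cp437"))   # ┬░
print(utf8_degree.decode("utf-8"))   # °

# A Windows-1252 degree sign is the single byte B0, which is a UTF-8
# continuation byte with no lead byte in front of it, so strict decoding fails.
try:
    b"\xb0".decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)  # invalid start byte
```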

Advice for designing applications

At this point, you can start to see how, if we know what symbol was supposed to be sent, we can work our way backwards using a programmer’s calculator.
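Instead of a programmer’s calculator, a short script can do the working backwards: given the garbled text and the symbol that was supposed to arrive, try each pair of "actually encoded as" and "decoded as" codecs and report which combinations reproduce the garbage. The candidate list and the `explain_mojibake` name are assumptions; extend the list with whatever encodings your integration partners might send.

```python
from __future__ import annotations

from itertools import permutations

CANDIDATES = ("utf-8", "cp1252", "cp437", "latin-1")


def explain_mojibake(garbled: str, expected: str) -> list[tuple[str, str]]:
    """Return (encoded_as, decoded_as) pairs that turn `expected` into `garbled`."""
    matches = []
    for encoded_as, decoded_as in permutations(CANDIDATES, 2):
        try:
            if expected.encode(encoded_as).decode(decoded_as) == garbled:
                matches.append((encoded_as, decoded_as))
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot even produce the garbled text
    return matches


print(explain_mojibake("Â°", "°"))
# [('utf-8', 'cp1252'), ('utf-8', 'latin-1')]

# Once the culprit is known, reverse it to recover the intended text.
print("Â°".encode("cp1252").decode("utf-8"))  # °
```

More than one pair can match (Windows-1252 and Latin-1 agree on these bytes), so the script narrows the suspects rather than proving which system was at fault.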

Working backwards is a lot of work, though, so if you’re designing an application that integrates with Redox, keep these things in mind.

Be able to send and receive UTF-8 when talking to the Redox API
Make sure the guts of your application (database, external services, etc.) can handle that UTF-8
Use your eyeballs. As I mentioned above, if someone sending HL7 is not following the rules, bad encodings can be impossible to catch automatically.
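Two cheap checks can support the eyeball work. The sketch below assumes your application ingests raw bytes: the first check rejects anything that is not valid UTF-8, and the second is a heuristic (an assumption, not a standard) that flags text containing Â or Ã followed by a character in U+0080 to U+00BF, the typical footprint of UTF-8 bytes that were decoded as Windows-1252 or Latin-1 somewhere upstream. Neither check can prove the text is right, so treat a hit as a prompt for a human to look. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Heuristic only: Â or Ã followed by a character in U+0080 to U+00BF is the usual
# footprint of UTF-8 text that was decoded as Windows-1252 or Latin-1.
MOJIBAKE_HINT = re.compile("[\u00c2\u00c3][\u0080-\u00bf]")


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_like_mojibake(text: str) -> bool:
    return MOJIBAKE_HINT.search(text) is not None


def triage(raw: bytes) -> str:
    if not is_valid_utf8(raw):
        return "reject: not UTF-8"
    if looks_like_mojibake(raw.decode("utf-8")):
        return "review: possible double-encoding"
    return "ok"


print(triage("37 °C".encode("utf-8")))  # ok
print(triage(b"37 \xb0C"))              # reject: not UTF-8
print(triage("37 °C".encode("utf-8").decode("cp1252").encode("utf-8")))
# review: possible double-encoding
```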

We’re keeping our eyeballs peeled too. Good luck and 慢走 (“take care”).

Developer Tech Talks

Everything you wanted to know about character encoding in HL7 and Redox

November 15, 2017
