
HL7 Analysis with NoSQL MongoDB - 2

I would like to thank everyone who read my article HL7 Analysis with NoSQL MongoDB - 1 and provided their valuable inputs. This has helped me improve the schema design & execute more queries for HL7 message analysis.

Some of the important considerations for this schema design were:

  • We will be inserting data once while loading messages for analysis
  • There will not be any updates
  • Schema should be designed to facilitate faster reads

Here is a snapshot of the schema that I have created. Some sample queries using Map Reduce & Aggregation are provided at the end of this article.

[Image: HL7_MongoDB_Schema]


Let's take the same example HL7 message from the previous post:

MSH|^~\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4
EVN|A01|20031016000000
PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M||2028-9|241 AVE^^Lake City^WA^98125^^^^100|100|(425)111-2222|(425)111-2222||S|CHR|1234567|111-222-333
NK1|2|GARSEN^Melissa
PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Taylor|E|3454|R^20050215|||||||||||||||||||EM|||||20050215
IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818
IN2|1||RETIRED
IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818
IN2|2||RETIRED

Here is a JSON representation of this sample HL7 message:

{
"_id" : "MSGADT003",
"Event" : "A01",
"MsgDt" : "20050215",
 
"Segments" : [
{
"_id" : "MSH",
"Rep" : 1,
"Seq" : 0,
"Val" : "MSH|^~\\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4",
"FC" : 12,
"VF" : 11,
"Fields" : [
{"_id" : "MSH_1", "Val" : "|" },
{"_id" : "MSH_2", "Val" : "^~\\&" },
{"_id" : "MSH_3", "Val" : "ADT" },
{"_id" : "MSH_4", "Val" : "ADI" },
{"_id" : "MSH_5", "Val" : "ADT-1" },
{"_id" : "MSH_6", "Val" : "ADI-1" },
{"_id" : "MSH_7", "Val" : "20050215" },
{"_id" : "MSH_9", "Val" : "ADT^A01" },
{"_id" : "MSH_9_1", "Val" : "ADT" },
{"_id" : "MSH_9_2", "Val" : "A01" },
{"_id" : "MSH_10", "Val" : "MSGADT003" },
{"_id" : "MSH_11", "Val" : "T" },
{"_id" : "MSH_12", "Val" : "2.4" } ]
},
{
"_id" : "EVN",
"Rep" : 1,
"Seq" : 1,
"Val" : "EVN|A01|20031016000000",
"FC" : 2,
"VF" : 2,
"Fields" : [
{"_id" : "EVN_1", "Val" : "A01" },
{"_id" : "EVN_2", "Val" : "20031016000000" } ] 
},
{
"_id" : "PID",
"Rep" : 1,
"Seq" : 2,
"Val" : "PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M|||241 AVE^^Lake City^WA^98125||(425)111-2222|||S|CHR|1234567",
"FC" : 18,
"VF" : 12,
"Fields" : [
{"_id" : "PID_1", "Val" : "1" },
{"_id" : "PID_2", "Val" : "111222333" },
{"_id" : "PID_3", "Val" : "H123123^^^^MR^ADT~111-222-333^^^^SS^ADT",
"Repetitions" : [
{"_id" : "PID_3", "Val" : "H123123^^^^MR^ADT", "Rep" : 1 },
{"_id" : "PID_3_1", "Val" : "H123123", "Rep" : 1 },
{"_id" : "PID_3_5", "Val" : "MR","Rep" : 1 },
{"_id" : "PID_3_6", "Val" : "ADT", "Rep" : 1 },
{"_id" : "PID_3", "Val" : "111-222-333^^^^SS^ADT", "Rep" : 2 },
{"_id" : "PID_3_1", "Val" : "111-222-333", "Rep" : 2 },
{"_id" : "PID_3_5", "Val" : "SS", "Rep" : 2 },
{"_id" : "PID_3_6", "Val" : "ADT", "Rep" : 2 }] 
},
{"_id" : "PID_5", "Val" : "John^Smith" },
{"_id" : "PID_5_1", "Val" : "John" },
{"_id" : "PID_5_2", "Val" : "Smith" },
{"_id" : "PID_6", "Val" : "GARSEN^^Melissa" },
{"_id" : "PID_6_1", "Val" : "GARSEN" },
{"_id" : "PID_6_3", "Val" : "Melissa" },
{"_id" : "PID_7", "Val" : "19380818" },
{"_id" : "PID_8", "Val" : "M" },
{"_id" : "PID_11", "Val" : "241 AVE^^Lake City^WA^98125" },
{"_id" : "PID_11_1", "Val" : "241 AVE" },
{"_id" : "PID_11_3", "Val" : "Lake City" },
{"_id" : "PID_11_4", "Val" : "WA" },
{"_id" : "PID_11_5", "Val" : "98125" },
{"_id" : "PID_13", "Val" : "(425)111-2222" },
{"_id" : "PID_16", "Val" : "S" },
{"_id" : "PID_17", "Val" : "CHR" },
{"_id" : "PID_18", "Val" : "1234567" }] 
},
{
"_id" : "PV1",
"Rep" : 1,
"Seq" : 4,
"Val" : "PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Carron|E|3454|R^20050215|||||||||||||||||||EM|||||20050215",
"FC" : 44,
"VF" : 11,
"Fields" : [
{"_id" : "PV1_1", "Val" : "1" },
{"_id" : "PV1_2", "Val" : "E" },
{"_id" : "PV1_7", "Val" : "D123^Jeff^Carron" },
{"_id" : "PV1_7_1", "Val" : "D123" },
{"_id" : "PV1_7_2", "Val" : "Jeff" },
{"_id" : "PV1_7_3", "Val" : "Carron" },
{"_id" : "PV1_10", "Val" : "MED" },
{"_id" : "PV1_14", "Val" : "7" },
{"_id" : "PV1_17", "Val" : "D123^Jeff^Carron" },
{"_id" : "PV1_17_1", "Val" : "D123" },
{"_id" : "PV1_17_2", "Val" : "Jeff" },
{"_id" : "PV1_17_3", "Val" : "Carron" },
{"_id" : "PV1_18", "Val" : "E" },
{"_id" : "PV1_19", "Val" : "3454" },
{"_id" : "PV1_20", "Val" : "R^20050215" },
{"_id" : "PV1_20_1", "Val" : "R" },
{"_id" : "PV1_20_2", "Val" : "20050215" },
{"_id" : "PV1_39", "Val" : "EM" },
{"_id" : "PV1_44", "Val" : "20050215" }] 
},
{
"_id" : "IN1",
"Rep" : 1,
"Seq" : 5,
"Val" : "IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818",
"FC" : 18,
"VF" : 9,
"Fields" : [
{"_id" : "IN1_1", "Val" : "1" },
{"_id" : "IN1_2", "Val" : "I123" },
{"_id" : "IN1_3", "Val" : "ICOMP1" },
{"_id" : "IN1_4", "Val" : "INS COMP 1" },
{"_id" : "IN1_5", "Val" : "PO BOX 1^^Lake City^WA^98125" },
{"_id" : "IN1_5_1", "Val" : "PO BOX 1" },
{"_id" : "IN1_5_3", "Val" : "Lake City" },
{"_id" : "IN1_5_4", "Val" : "WA" },
{"_id" : "IN1_5_5", "Val" : "98125" },
{"_id" : "IN1_15", "Val" : "1" },
{"_id" : "IN1_16", "Val" : "John^Smith" },
{"_id" : "IN1_16_1", "Val" : "John" },
{"_id" : "IN1_16_2", "Val" : "Smith" },
{"_id" : "IN1_17", "Val" : "01" },
{"_id" : "IN1_18", "Val" : "19380818" }] 
},
{
"_id" : "IN1",
"Rep" : 2,
"Seq" : 7,
"Val" : "IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818",
"FC" : 18,
"VF" : 9,
"Fields" : [
{"_id" : "IN1_1", "Val" : "2" },
{"_id" : "IN1_2", "Val" : "I456" },
{"_id" : "IN1_3", "Val" : "ICOMP2" },
{"_id" : "IN1_4", "Val" : "INS COMP 1" },
{"_id" : "IN1_5", "Val" : "PO BOX 2^^Lake City^WA^98125" },
{"_id" : "IN1_5_1", "Val" : "PO BOX 2" },
{"_id" : "IN1_5_3", "Val" : "Lake City" },
{"_id" : "IN1_5_4", "Val" : "WA" },
{"_id" : "IN1_5_5", "Val" : "98125" },
{"_id" : "IN1_15", "Val" : "8" },
{"_id" : "IN1_16", "Val" : "John^Smith" },
{"_id" : "IN1_16_1", "Val" : "John" },
{"_id" : "IN1_16_2", "Val" : "Smith" },
{"_id" : "IN1_17", "Val" : "01" },
{"_id" : "IN1_18", "Val" : "19380818" }] 
},
{
"_id" : "IN2",
"Rep" : 1,
"Seq" : 6,
"Val" : "IN2|1||RETIRED",
"FC" : 3,
"VF" : 2,
"Fields" : [
{"_id" : "IN2_1", "Val" : "1" },
{"_id" : "IN2_3", "Val" : "RETIRED" }] 
},
{
"_id" : "IN2",
"Rep" : 2,
"Seq" : 8,
"Val" : "IN2|2||RETIRED",
"FC" : 3,
"VF" : 2,
"Fields" : [
{"_id" : "IN2_1", "Val" : "2" },
{"_id" : "IN2_3", "Val" : "RETIRED" }] 
}] 
}

Here is a snapshot of a sample document from MongoVUE:

[Image: HL7_MongoVUE_Snap]
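
Before getting into Map Reduce and Aggregation, this document structure already supports simple lookups with find. A minimal sketch, assuming the messages are stored in a collection named hl7 (the collection name is an assumption; adjust it to your setup):

// Look up a message by its control ID (MSH_10), which is stored as the document _id
db.hl7.findOne({ _id : "MSGADT003" });

// List the control IDs of all messages that contain a PV1 segment
db.hl7.find({ "Segments._id" : "PV1" }, { _id : 1 });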


Example: different Trigger Events with count using Map Reduce

function Map() {
   emit(this.Event, 1);
}

function Reduce(key, values) {
   return Array.sum(values);
}
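
To run this Map Reduce, pass the two functions above to mapReduce from the mongo shell. Here is a minimal sketch, assuming the messages were loaded into a collection named hl7 and the results are written to a collection named event_counts (both names are assumptions):

// Run the Map Reduce above against the "hl7" collection
// and write the per-event counts to the "event_counts" collection
db.hl7.mapReduce(Map, Reduce, { out: "event_counts" });

// Inspect the results, e.g. { "_id" : "A01", "value" : 1 }
db.event_counts.find();

The other Map Reduce examples below can be run in exactly the same way; only the Map and Reduce functions change.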

Example: different Trigger Events with count using Aggregation

{ $project : { "Event" : 1 }},
{ $group: { _id: "$Event", count: {$sum: 1} } }
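
These stages can be run with the aggregate helper. A minimal sketch, again assuming the collection is named hl7:

db.hl7.aggregate([
   { $project : { "Event" : 1 } },
   { $group : { _id : "$Event", count : { $sum : 1 } } }
]);

// Example output for the sample message: { "_id" : "A01", "count" : 1 }

The remaining Aggregation examples below are wrapped in db.hl7.aggregate([ ... ]) in the same way.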

Example: different Segments with count using Map Reduce

function Map() {
   this.Segments.forEach(function (s) {
      emit(s._id, 1);
   });
}

function Reduce(key, values) {
   return Array.sum(values);
}

Example: different Segments with count using Aggregation

{ $unwind: "$Segments" },
{ $project : { "Segments._id": 1}},
{ $group: { _id: "$Segments._id", count: {$sum: 1} } }

Example: distinct values for MSH_3 using Map Reduce

function Map() {
   this.Segments.forEach(function (s) {
      if(s._id == "MSH") {
         s.Fields.forEach(function (f) {
            if(f._id == "MSH_3")
              emit(f.Val, 1);
         });
     }
   });
}

function Reduce(key, values) {
   return Array.sum(values);
}

Example: distinct values for MSH_3 using Aggregation

{ $unwind: "$Segments" },
{ $unwind: "$Segments.Fields" },
{ $match : { "Segments.Fields._id": "MSH_3"} },
{ $group: { _id: "$Segments.Fields.Val" } }

Example: List of MSH_10 with PV1_2 = "E" using Aggregation (MSH_10 is stored as the document _id)

{ $unwind: "$Segments" },
{ $unwind: "$Segments.Fields" },
{ $match : { "Segments.Fields._id": "PV1_2", "Segments.Fields.Val": "E"}},
{ $project : { "_id": 1}}

Here is the list of queries that I have executed successfully with this schema in MongoDB.

  1. Total number of messages in sample
  2. Distinct MSH_9_2 (Trigger Events) [with count]
    1. This will tell you which messages you should handle in your interface
  3. Different Segments in all messages [with count]
    1. This will tell you which segments you should handle in your interface
  4. Different Fields (with values)
    1. This will tell you which fields you need to process
    2. This list contains only those fields for which we have received a value or HL7 null in at least one message.
    3. If a field is empty in all the messages in the sample, it will not appear in this list.
  5. List of fields with HL7 null ("") value [with count]
  6. List of fields with components [with count]
  7. Check if a particular field has components
  8. Check if a particular field has a specified leading character
  9. List of fields with repetitions [with count]
  10. Maximum number of repetitions received for each field
  11. List of different Z Segments [with count] (see the aggregation sketch after this list)
  12. Find messages with a particular segment
  13. Find messages without a particular segment
  14. Find messages with a particular field. For example, messages where PID_19 is present
  15. Find messages with a particular field & value. For example, messages where PV1_2 = E
  16. Find messages with a particular field excluding a given value. For example, messages where PV1_2 != E
  17. Find messages without a particular field
  18. Distinct values of a particular field. For example, different values for MSH_3
  19. Maximum number of fields received in a given segment
  20. Total fields with values (excluding empty fields) in a given segment
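
As a sketch of query 11 above, Z segments (custom segments whose IDs start with "Z") can be counted with Aggregation like this, again assuming the collection is named hl7:

// Count occurrences of each custom Z segment across all messages
db.hl7.aggregate([
   { $unwind : "$Segments" },
   { $match : { "Segments._id" : /^Z/ } },
   { $group : { _id : "$Segments._id", count : { $sum : 1 } } }
]);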

You can download the Python script that inserts HL7 messages into a MongoDB database from my GitHub. I am working on additional scripts:

https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/hl7-2-mongodb.py

https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/readme.txt

 

I welcome any comments or criticism to help improve this.