Jayant Singh's HealthTech Blog: HL7 Analysis with NoSQL MongoDB

I would like to thank all who read my article on HL7 analysis with NoSQL MongoDB - 1 and provided their valuable inputs. This has helped me improve the schema design & execute more queries for HL7 message analysis.

Some of the important considerations during this schema design were

We will be inserting data once while loading messages for analysis
There will not be any updates
Schema should be designed to facilitate faster reads

Here is a snapshot of schema that I have created. I have provided some sample queries using Map Reduce & Aggregation at the end of this article.

Let's take same example HL7 message from previous post

 MSH|^~\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4  
 EVN|A01|20031016000000  
 PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M||2028-9|241 AVE^^Lake City^WA^98125^^^^100|100|(425)111-2222|(425)111-2222||S|CHR|1234567|111-222-333  
 NK1|2|GARSEN^Melissa  
 PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Taylor|E|3454|R^20050215|||||||||||||||||||EM|||||20050215  
 IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818  
 IN2|1||RETIRED  
 IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818  
 IN2|2||RETIRED

Here is a json representation of this sample HL7 message:

 {  
      "_id": "MSGADT003",  
      "Event": "A01",  
      "MsgDt": "20050215",  
      "Segments": [  
           {  
                "_id": "MSH",  
                "Rep": 1,  
                "Seq": 0,  
                "Val": "MSH|^~\\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4",  
                "FC": 12,  
                "VF": 11,  
                "Fields": [  
                     {  
                          "_id": "MSH_1",  
                          "Val": "|"  
                     },  
                     {  
                          "_id": "MSH_2",  
                          "Val": "^~\\&"  
                     },  
                     {  
                          "_id": "MSH_3",  
                          "Val": "ADT"  
                     },  
                     {  
                          "_id": "MSH_4",  
                          "Val": "ADI"  
                     },  
                     {  
                          "_id": "MSH_5",  
                          "Val": "ADT-1"  
                     },  
                     {  
                          "_id": "MSH_6",  
                          "Val": "ADI-1"  
                     },  
                     {  
                          "_id": "MSH_7",  
                          "Val": "20050215"  
                     },  
                     {  
                          "_id": "MSH_9",  
                          "Val": "ADT^A01"  
                     },  
                     {  
                          "_id": "MSH_9_1",  
                          "Val": "ADT"  
                     },  
                     {  
                          "_id": "MSH_9_2",  
                          "Val": "A01"  
                     },  
                     {  
                          "_id": "MSH_10",  
                          "Val": "MSGADT003"  
                     },  
                     {  
                          "_id": "MSH_11",  
                          "Val": "T"  
                     },  
                     {  
                          "_id": "MSH_12",  
                          "Val": "2.4"  
                     }  
                ]  
           },  
           {  
                "_id": "EVN",  
                "Rep": 1,  
                "Seq": 1,  
                "Val": "EVN|A01|20031016000000",  
                "FC": 2,  
                "VF": 2,  
                "Fields": [  
                     {  
                          "_id": "EVN_1",  
                          "Val": "A01"  
                     },  
                     {  
                          "_id": "EVN_2",  
                          "Val": "20031016000000"  
                     }  
                ]  
           },  
           {  
                "_id": "PID",  
                "Rep": 1,  
                "Seq": 2,  
                "Val": "PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M|||241 AVE^^Lake City^WA^98125||(425)111-2222|||S|CHR|1234567",  
                "FC": 18,  
                "VF": 12,  
                "Fields": [  
                     {  
                          "_id": "PID_1",  
                          "Val": "1"  
                     },  
                     {  
                          "_id": "PID_2",  
                          "Val": "111222333"  
                     },  
                     {  
                          "_id": "PID_3",  
                          "Val": "H123123^^^^MR^ADT~111-222-333^^^^SS^ADT",  
                          "Repetitions": [  
                               {  
                                    "_id": "PID_3",  
                                    "Val": "H123123^^^^MR^ADT",  
                                    "Rep": 1  
                               },  
                               {  
                                    "_id": "PID_3_1",  
                                    "Val": "H123123",  
                                    "Rep": 1  
                               },  
                               {  
                                    "_id": "PID_3_5",  
                                    "Val": "MR",  
                                    "Rep": 1  
                               },  
                               {  
                                    "_id": "PID_3_6",  
                                    "Val": "ADT",  
                                    "Rep": 1  
                               },  
                               {  
                                    "_id": "PID_3",  
                                    "Val": "111-222-333^^^^SS^ADT",  
                                    "Rep": 2  
                               },  
                               {  
                                    "_id": "PID_3_1",  
                                    "Val": "111-222-333",  
                                    "Rep": 2  
                               },  
                               {  
                                    "_id": "PID_3_5",  
                                    "Val": "SS",  
                                    "Rep": 2  
                               },  
                               {  
                                    "_id": "PID_3_6",  
                                    "Val": "ADT",  
                                    "Rep": 2  
                               }  
                          ]  
                     },  
                     {  
                          "_id": "PID_5",  
                          "Val": "John^Smith"  
                     },  
                     {  
                          "_id": "PID_5_1",  
                          "Val": "John"  
                     },  
                     {  
                          "_id": "PID_5_2",  
                          "Val": "Smith"  
                     },  
                     {  
                          "_id": "PID_6",  
                          "Val": "GARSEN^^Melissa"  
                     },  
                     {  
                          "_id": "PID_6_1",  
                          "Val": "GARSEN"  
                     },  
                     {  
                          "_id": "PID_6_3",  
                          "Val": "Melissa"  
                     },  
                     {  
                          "_id": "PID_7",  
                          "Val": "19380818"  
                     },  
                     {  
                          "_id": "PID_8",  
                          "Val": "M"  
                     },  
                     {  
                          "_id": "PID_11",  
                          "Val": "241 AVE^^Lake City^WA^98125"  
                     },  
                     {  
                          "_id": "PID_11_1",  
                          "Val": "241 AVE"  
                     },  
                     {  
                          "_id": "PID_11_3",  
                          "Val": "Lake City"  
                     },  
                     {  
                          "_id": "PID_11_4",  
                          "Val": "WA"  
                     },  
                     {  
                          "_id": "PID_11_5",  
                          "Val": "98125"  
                     },  
                     {  
                          "_id": "PID_13",  
                          "Val": "(425)111-2222"  
                     },  
                     {  
                          "_id": "PID_16",  
                          "Val": "S"  
                     },  
                     {  
                          "_id": "PID_17",  
                          "Val": "CHR"  
                     },  
                     {  
                          "_id": "PID_18",  
                          "Val": "1234567"  
                     }  
                ]  
           },  
           {  
                "_id": "PV1",  
                "Rep": 1,  
                "Seq": 4,  
                "Val": "PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Carron|E|3454|R^20050215|||||||||||||||||||EM|||||20050215",  
                "FC": 44,  
                "VF": 11,  
                "Fields": [  
                     {  
                          "_id": "PV1_1",  
                          "Val": "1"  
                     },  
                     {  
                          "_id": "PV1_2",  
                          "Val": "E"  
                     },  
                     {  
                          "_id": "PV1_7",  
                          "Val": "D123^Jeff^Carron"  
                     },  
                     {  
                          "_id": "PV1_7_1",  
                          "Val": "D123"  
                     },  
                     {  
                          "_id": "PV1_7_2",  
                          "Val": "Jeff"  
                     },  
                     {  
                          "_id": "PV1_7_3",  
                          "Val": "Carron"  
                     },  
                     {  
                          "_id": "PV1_10",  
                          "Val": "MED"  
                     },  
                     {  
                          "_id": "PV1_14",  
                          "Val": "7"  
                     },  
                     {  
                          "_id": "PV1_17",  
                          "Val": "D123^Jeff^Carron"  
                     },  
                     {  
                          "_id": "PV1_17_1",  
                          "Val": "D123"  
                     },  
                     {  
                          "_id": "PV1_17_2",  
                          "Val": "Jeff"  
                     },  
                     {  
                          "_id": "PV1_17_3",  
                          "Val": "Carron"  
                     },  
                     {  
                          "_id": "PV1_18",  
                          "Val": "E"  
                     },  
                     {  
                          "_id": "PV1_19",  
                          "Val": "3454"  
                     },  
                     {  
                          "_id": "PV1_20",  
                          "Val": "R^20050215"  
                     },  
                     {  
                          "_id": "PV1_20_1",  
                          "Val": "R"  
                     },  
                     {  
                          "_id": "PV1_20_2",  
                          "Val": "20050215"  
                     },  
                     {  
                          "_id": "PV1_39",  
                          "Val": "EM"  
                     },  
                     {  
                          "_id": "PV1_44",  
                          "Val": "20050215"  
                     }  
                ]  
           },  
           {  
                "_id": "IN1",  
                "Rep": 1,  
                "Seq": 5,  
                "Val": "IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818",  
                "FC": 18,  
                "VF": 9,  
                "Fields": [  
                     {  
                          "_id": "IN1_1",  
                          "Val": "1"  
                     },  
                     {  
                          "_id": "IN1_2",  
                          "Val": "I123"  
                     },  
                     {  
                          "_id": "IN1_3",  
                          "Val": "ICOMP1"  
                     },  
                     {  
                          "_id": "IN1_4",  
                          "Val": "INS COMP 1"  
                     },  
                     {  
                          "_id": "IN1_5",  
                          "Val": "PO BOX 1^^Lake City^WA^98125"  
                     },  
                     {  
                          "_id": "IN1_5_1",  
                          "Val": "PO BOX 1"  
                     },  
                     {  
                          "_id": "IN1_5_3",  
                          "Val": "Lake City"  
                     },  
                     {  
                          "_id": "IN1_5_4",  
                          "Val": "WA"  
                     },  
                     {  
                          "_id": "IN1_5_5",  
                          "Val": "98125"  
                     },  
                     {  
                          "_id": "IN1_15",  
                          "Val": "1"  
                     },  
                     {  
                          "_id": "IN1_16",  
                          "Val": "John^Smith"  
                     },  
                     {  
                          "_id": "IN1_16_1",  
                          "Val": "John"  
                     },  
                     {  
                          "_id": "IN1_16_2",  
                          "Val": "Smith"  
                     },  
                     {  
                          "_id": "IN1_17",  
                          "Val": "01"  
                     },  
                     {  
                          "_id": "IN1_18",  
                          "Val": "19380818"  
                     }  
                ]  
           },  
           {  
                "_id": "IN1",  
                "Rep": 2,  
                "Seq": 7,  
                "Val": "IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818",  
                "FC": 18,  
                "VF": 9,  
                "Fields": [  
                     {  
                          "_id": "IN1_1",  
                          "Val": "2"  
                     },  
                     {  
                          "_id": "IN1_2",  
                          "Val": "I456"  
                     },  
                     {  
                          "_id": "IN1_3",  
                          "Val": "ICOMP2"  
                     },  
                     {  
                          "_id": "IN1_4",  
                          "Val": "INS COMP 1"  
                     },  
                     {  
                          "_id": "IN1_5",  
                          "Val": "PO BOX 2^^Lake City^WA^98125"  
                     },  
                     {  
                          "_id": "IN1_5_1",  
                          "Val": "PO BOX 2"  
                     },  
                     {  
                          "_id": "IN1_5_3",  
                          "Val": "Lake City"  
                     },  
                     {  
                          "_id": "IN1_5_4",  
                          "Val": "WA"  
                     },  
                     {  
                          "_id": "IN1_5_5",  
                          "Val": "98125"  
                     },  
                     {  
                          "_id": "IN1_15",  
                          "Val": "8"  
                     },  
                     {  
                          "_id": "IN1_16",  
                          "Val": "John^Smith"  
                     },  
                     {  
                          "_id": "IN1_16_1",  
                          "Val": "John"  
                     },  
                     {  
                          "_id": "IN1_16_2",  
                          "Val": "Smith"  
                     },  
                     {  
                          "_id": "IN1_17",  
                          "Val": "01"  
                     },  
                     {  
                          "_id": "IN1_18",  
                          "Val": "19380818"  
                     }  
                ]  
           },  
           {  
                "_id": "IN2",  
                "Rep": 1,  
                "Seq": 6,  
                "Val": "IN2|1||RETIRED",  
                "FC": 3,  
                "VF": 2,  
                "Fields": [  
                     {  
                          "_id": "IN2_1",  
                          "Val": "1"  
                     },  
                     {  
                          "_id": "IN2_3",  
                          "Val": "RETIRED"  
                     }  
                ]  
           },  
           {  
                "_id": "IN2",  
                "Rep": 2,  
                "Seq": 8,  
                "Val": "IN2|2||RETIRED",  
                "FC": 3,  
                "VF": 2,  
                "Fields": [  
                     {  
                          "_id": "IN2_1",  
                          "Val": "2"  
                     },  
                     {  
                          "_id": "IN2_3",  
                          "Val": "RETIRED"  
                     }  
                ]  
           }  
      ]  
 }

Here is a snapshot of sample document from MongoVUE

Example: different Trigger Events with count using Map Reduce

 function Map() {  
   emit(this.Event, 1);  
 }  
 function Reduce(key, values) {  
   returnArray.sum(values);  
 }

Example: different Trigger Events with count using Aggregation

 { $project : { "Event" : 1 }},  
 { $group: { _id: "$Event", count: {$sum: 1} } }

Example: different Segments with count using Map Reduce

 function Map() {  
   this.Segments.forEach(function (s) {  
    emit(s._id, 1);  
   });  
 }  
 function Reduce(key, values) {  
   returnArray.sum(values);  
 }

Example: different Segments with count using Aggregation

 { $unwind: "$Segments" },  
 { $project : { "Segments._id": 1}},  
 { $group: { _id: "$Segments._id", count: {$sum: 1} } }

Example: distinct values for MSH_3 using Map Reduce

 function Map() {  
   this.Segments.forEach(function (s) {  
    if(s._id == "MSH") {  
      s.Fields.forEach(function (f) {  
       if(f._id == "MSH_3")  
        emit(f.Val, 1);  
      });  
    }  
   });  
 }  
 function Reduce(key, values) {  
 returnArray.sum(values);  
 }

Example: distinct values for MSH_3 using Aggregation

 { $unwind: "$Segments" },  
 { $unwind: "$Segments.Fields" },  
 { $match : { "Segments.Fields._id": "MSH_3"} },  
 { $group: { _id: "$Segments.Fields.Val" } }

Example: List of MSH_10 with PV1_2="E" using Aggregation

 { $unwind: "$Segments" },  
 { $unwind: "$Segments.Fields" },  
 { $match : { "Segments.Fields._id": "PV1_2", "Segments.Fields.Val": "E"}},  
 { $project : { "_id": 1}}

Here is the list of queries that I have executed successfully with this schema in MongoDB.

Total number of messages in sample
Distinct MSH_9_2 (Trigger Events) [with count]
1. This will tell you which Messages you should handle in your interface
Different Segments in all messages [with count]
1. This will tell you which segments you should handle in you interface
Different Fields (with values)
1. This will tell you which fields you need to process
2. This list contains only those fields for which we have received values or HL7 null in any of the message.
3. If the field is empty in all the messages in sample then it will not appear here in the list.
List of fields which HL7 null (“”) value [with count]
List of fields with components [with count]
Check if particular field has components
Check if particular field has specified leading character
List of fields with repetitions [with count]
Maximum number of repetitions received for each field
List of different Z Segments [with count]
Find messages with particular segment
Find messages without particular segment
Find messages with particular field. For example messages where PID_19 is present
Find messages with particular field & value. For example messages where PV1_2 = E
Find messages with particular field excluding given value. For example messages where PV1_2 != E
Find messages without particular field
Distinct values of particular field. For example different values for MSH_3
Maximum number of fields received in given segment
Total fields with values (excluding empty fields) in given segment

You can download python script to insert HL7 messages into MongoDB database from my github. I am working on other scripts
https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/hl7-2-mongodb.py
https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/readme.txt

I welcome any comments or criticism to help improve this.

Jayant Singh's HealthTech Blog

Monday, 8 July 2013

HL7 Analysis with NoSQL MongoDB - 2

Popular Posts

Search This Blog