I would like to thank all who read my article on HL7 analysis with NoSQL MongoDB - 1 and provided their valuable inputs. This has helped me improve the schema design & execute more queries for HL7 message analysis.
Some of the important considerations during this schema design were
- We will be inserting data once while loading messages for analysis
- There will not be any updates
- Schema should be designed to facilitate faster reads
Here is a snapshot of schema that I have created. I have provided some sample queries using Map Reduce & Aggregation at the end of this article.
Let's take same example HL7 message from previous post
MSH|^~\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4
EVN|A01|20031016000000
PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M||2028-9|241 AVE^^Lake City^WA^98125^^^^100|100|(425)111-2222|(425)111-2222||S|CHR|1234567|111-222-333
NK1|2|GARSEN^Melissa
PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Taylor|E|3454|R^20050215|||||||||||||||||||EM|||||20050215
IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818
IN2|1||RETIRED
IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818
IN2|2||RETIRED
Here is a json representation of this sample HL7 message:
{
"_id": "MSGADT003",
"Event": "A01",
"MsgDt": "20050215",
"Segments": [
{
"_id": "MSH",
"Rep": 1,
"Seq": 0,
"Val": "MSH|^~\\&|ADT|ADI|ADT-1|ADI-1|20050215||ADT^A01|MSGADT003|T|2.4",
"FC": 12,
"VF": 11,
"Fields": [
{
"_id": "MSH_1",
"Val": "|"
},
{
"_id": "MSH_2",
"Val": "^~\\&"
},
{
"_id": "MSH_3",
"Val": "ADT"
},
{
"_id": "MSH_4",
"Val": "ADI"
},
{
"_id": "MSH_5",
"Val": "ADT-1"
},
{
"_id": "MSH_6",
"Val": "ADI-1"
},
{
"_id": "MSH_7",
"Val": "20050215"
},
{
"_id": "MSH_9",
"Val": "ADT^A01"
},
{
"_id": "MSH_9_1",
"Val": "ADT"
},
{
"_id": "MSH_9_2",
"Val": "A01"
},
{
"_id": "MSH_10",
"Val": "MSGADT003"
},
{
"_id": "MSH_11",
"Val": "T"
},
{
"_id": "MSH_12",
"Val": "2.4"
}
]
},
{
"_id": "EVN",
"Rep": 1,
"Seq": 1,
"Val": "EVN|A01|20031016000000",
"FC": 2,
"VF": 2,
"Fields": [
{
"_id": "EVN_1",
"Val": "A01"
},
{
"_id": "EVN_2",
"Val": "20031016000000"
}
]
},
{
"_id": "PID",
"Rep": 1,
"Seq": 2,
"Val": "PID|1|111222333|H123123^^^^MR^ADT~111-222-333^^^^SS^ADT||John^Smith|GARSEN^^Melissa|19380818|M|||241 AVE^^Lake City^WA^98125||(425)111-2222|||S|CHR|1234567",
"FC": 18,
"VF": 12,
"Fields": [
{
"_id": "PID_1",
"Val": "1"
},
{
"_id": "PID_2",
"Val": "111222333"
},
{
"_id": "PID_3",
"Val": "H123123^^^^MR^ADT~111-222-333^^^^SS^ADT",
"Repetitions": [
{
"_id": "PID_3",
"Val": "H123123^^^^MR^ADT",
"Rep": 1
},
{
"_id": "PID_3_1",
"Val": "H123123",
"Rep": 1
},
{
"_id": "PID_3_5",
"Val": "MR",
"Rep": 1
},
{
"_id": "PID_3_6",
"Val": "ADT",
"Rep": 1
},
{
"_id": "PID_3",
"Val": "111-222-333^^^^SS^ADT",
"Rep": 2
},
{
"_id": "PID_3_1",
"Val": "111-222-333",
"Rep": 2
},
{
"_id": "PID_3_5",
"Val": "SS",
"Rep": 2
},
{
"_id": "PID_3_6",
"Val": "ADT",
"Rep": 2
}
]
},
{
"_id": "PID_5",
"Val": "John^Smith"
},
{
"_id": "PID_5_1",
"Val": "John"
},
{
"_id": "PID_5_2",
"Val": "Smith"
},
{
"_id": "PID_6",
"Val": "GARSEN^^Melissa"
},
{
"_id": "PID_6_1",
"Val": "GARSEN"
},
{
"_id": "PID_6_3",
"Val": "Melissa"
},
{
"_id": "PID_7",
"Val": "19380818"
},
{
"_id": "PID_8",
"Val": "M"
},
{
"_id": "PID_11",
"Val": "241 AVE^^Lake City^WA^98125"
},
{
"_id": "PID_11_1",
"Val": "241 AVE"
},
{
"_id": "PID_11_3",
"Val": "Lake City"
},
{
"_id": "PID_11_4",
"Val": "WA"
},
{
"_id": "PID_11_5",
"Val": "98125"
},
{
"_id": "PID_13",
"Val": "(425)111-2222"
},
{
"_id": "PID_16",
"Val": "S"
},
{
"_id": "PID_17",
"Val": "CHR"
},
{
"_id": "PID_18",
"Val": "1234567"
}
]
},
{
"_id": "PV1",
"Rep": 1,
"Seq": 4,
"Val": "PV1|1|E|||||D123^Jeff^Carron|||MED||||7|||D123^Jeff^Carron|E|3454|R^20050215|||||||||||||||||||EM|||||20050215",
"FC": 44,
"VF": 11,
"Fields": [
{
"_id": "PV1_1",
"Val": "1"
},
{
"_id": "PV1_2",
"Val": "E"
},
{
"_id": "PV1_7",
"Val": "D123^Jeff^Carron"
},
{
"_id": "PV1_7_1",
"Val": "D123"
},
{
"_id": "PV1_7_2",
"Val": "Jeff"
},
{
"_id": "PV1_7_3",
"Val": "Carron"
},
{
"_id": "PV1_10",
"Val": "MED"
},
{
"_id": "PV1_14",
"Val": "7"
},
{
"_id": "PV1_17",
"Val": "D123^Jeff^Carron"
},
{
"_id": "PV1_17_1",
"Val": "D123"
},
{
"_id": "PV1_17_2",
"Val": "Jeff"
},
{
"_id": "PV1_17_3",
"Val": "Carron"
},
{
"_id": "PV1_18",
"Val": "E"
},
{
"_id": "PV1_19",
"Val": "3454"
},
{
"_id": "PV1_20",
"Val": "R^20050215"
},
{
"_id": "PV1_20_1",
"Val": "R"
},
{
"_id": "PV1_20_2",
"Val": "20050215"
},
{
"_id": "PV1_39",
"Val": "EM"
},
{
"_id": "PV1_44",
"Val": "20050215"
}
]
},
{
"_id": "IN1",
"Rep": 1,
"Seq": 5,
"Val": "IN1|1|I123|ICOMP1|INS COMP 1|PO BOX 1^^Lake City^WA^98125||||||||||1|John^Smith|01|19380818",
"FC": 18,
"VF": 9,
"Fields": [
{
"_id": "IN1_1",
"Val": "1"
},
{
"_id": "IN1_2",
"Val": "I123"
},
{
"_id": "IN1_3",
"Val": "ICOMP1"
},
{
"_id": "IN1_4",
"Val": "INS COMP 1"
},
{
"_id": "IN1_5",
"Val": "PO BOX 1^^Lake City^WA^98125"
},
{
"_id": "IN1_5_1",
"Val": "PO BOX 1"
},
{
"_id": "IN1_5_3",
"Val": "Lake City"
},
{
"_id": "IN1_5_4",
"Val": "WA"
},
{
"_id": "IN1_5_5",
"Val": "98125"
},
{
"_id": "IN1_15",
"Val": "1"
},
{
"_id": "IN1_16",
"Val": "John^Smith"
},
{
"_id": "IN1_16_1",
"Val": "John"
},
{
"_id": "IN1_16_2",
"Val": "Smith"
},
{
"_id": "IN1_17",
"Val": "01"
},
{
"_id": "IN1_18",
"Val": "19380818"
}
]
},
{
"_id": "IN1",
"Rep": 2,
"Seq": 7,
"Val": "IN1|2|I456|ICOMP2|INS COMP 1|PO BOX 2^^Lake City^WA^98125||||||||||8|John^Smith|01|19380818",
"FC": 18,
"VF": 9,
"Fields": [
{
"_id": "IN1_1",
"Val": "2"
},
{
"_id": "IN1_2",
"Val": "I456"
},
{
"_id": "IN1_3",
"Val": "ICOMP2"
},
{
"_id": "IN1_4",
"Val": "INS COMP 1"
},
{
"_id": "IN1_5",
"Val": "PO BOX 2^^Lake City^WA^98125"
},
{
"_id": "IN1_5_1",
"Val": "PO BOX 2"
},
{
"_id": "IN1_5_3",
"Val": "Lake City"
},
{
"_id": "IN1_5_4",
"Val": "WA"
},
{
"_id": "IN1_5_5",
"Val": "98125"
},
{
"_id": "IN1_15",
"Val": "8"
},
{
"_id": "IN1_16",
"Val": "John^Smith"
},
{
"_id": "IN1_16_1",
"Val": "John"
},
{
"_id": "IN1_16_2",
"Val": "Smith"
},
{
"_id": "IN1_17",
"Val": "01"
},
{
"_id": "IN1_18",
"Val": "19380818"
}
]
},
{
"_id": "IN2",
"Rep": 1,
"Seq": 6,
"Val": "IN2|1||RETIRED",
"FC": 3,
"VF": 2,
"Fields": [
{
"_id": "IN2_1",
"Val": "1"
},
{
"_id": "IN2_3",
"Val": "RETIRED"
}
]
},
{
"_id": "IN2",
"Rep": 2,
"Seq": 8,
"Val": "IN2|2||RETIRED",
"FC": 3,
"VF": 2,
"Fields": [
{
"_id": "IN2_1",
"Val": "2"
},
{
"_id": "IN2_3",
"Val": "RETIRED"
}
]
}
]
}
Here is a snapshot of sample document from MongoVUE
Example: different Trigger Events with count using Map Reduce
function Map() {
emit(this.Event, 1);
}
function Reduce(key, values) {
returnArray.sum(values);
}
Example: different Trigger Events with count using Aggregation
{ $project : { "Event" : 1 }},
{ $group: { _id: "$Event", count: {$sum: 1} } }
Example: different Segments with count using Map Reduce
function Map() {
this.Segments.forEach(function (s) {
emit(s._id, 1);
});
}
function Reduce(key, values) {
returnArray.sum(values);
}
Example: different Segments with count using Aggregation
{ $unwind: "$Segments" },
{ $project : { "Segments._id": 1}},
{ $group: { _id: "$Segments._id", count: {$sum: 1} } }
Example: distinct values for MSH_3 using Map Reduce
function Map() {
this.Segments.forEach(function (s) {
if(s._id == "MSH") {
s.Fields.forEach(function (f) {
if(f._id == "MSH_3")
emit(f.Val, 1);
});
}
});
}
function Reduce(key, values) {
returnArray.sum(values);
}
Example: distinct values for MSH_3 using Aggregation
{ $unwind: "$Segments" },
{ $unwind: "$Segments.Fields" },
{ $match : { "Segments.Fields._id": "MSH_3"} },
{ $group: { _id: "$Segments.Fields.Val" } }
Example: List of MSH_10 with PV1_2="E" using Aggregation
{ $unwind: "$Segments" },
{ $unwind: "$Segments.Fields" },
{ $match : { "Segments.Fields._id": "PV1_2", "Segments.Fields.Val": "E"}},
{ $project : { "_id": 1}}
Here is the list of queries that I have executed successfully with this schema in MongoDB.
- Total number of messages in sample
-
Distinct MSH_9_2 (Trigger Events) [with count]
- This will tell you which Messages you should handle in your interface
-
Different Segments in all messages [with count]
- This will tell you which segments you should handle in you interface
-
Different Fields (with values)
- This will tell you which fields you need to process
- This list contains only those fields for which we have received values or HL7 null in any of the message.
- If the field is empty in all the messages in sample then it will not appear here in the list.
- List of fields which HL7 null (“”) value [with count]
- List of fields with components [with count]
- Check if particular field has components
- Check if particular field has specified leading character
- List of fields with repetitions [with count]
- Maximum number of repetitions received for each field
- List of different Z Segments [with count]
- Find messages with particular segment
- Find messages without particular segment
- Find messages with particular field. For example messages where PID_19 is present
- Find messages with particular field & value. For example messages where PV1_2 = E
- Find messages with particular field excluding given value. For example messages where PV1_2 != E
- Find messages without particular field
- Distinct values of particular field. For example different values for MSH_3
- Maximum number of fields received in given segment
- Total fields with values (excluding empty fields) in given segment
You can download python script to insert HL7 messages into MongoDB database from my github. I am working on other scripts
https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/hl7-2-mongodb.py
https://github.com/j4jayant/HL7-2-MongoDB-Analysis/blob/master/python/readme.txt
I welcome any comments or criticism to help improve this.