Cube Design


i2b2 cube design to enable drill-through


As we know, we can not cross reference individual concept category (Diagnosis, Medication) to each other since all concepts are
in a single concept_dimension table, therefore a way is needed to separate out individual concept category into individual concept dimension table
and to be able to cross reference individual concepts, we need to find a way to distribute concept key (CONCEPT_CD) among
individual dimensions. Drill-through will be much more meaningful if we separate out individual concept category and distribute concept key
among them.

If we look into OBSERVATION_FACT table and group by ENCOUNTR_NUM we get following data:


            select ENCOUNTER_NUM,CONCEPT_CD,count(*)
            from observation_fact
            group by encounter_num,CONCEPT_CD
            having count(*) > 1
            order by encounter_num

ENCOUNTER_NUM    CONCEPT_CD        COUNT
...
...
1000000001        ICD9:V76.2                 2
1000000001        LOINC:11277-1           2
1000000001        LOINC:2000-8             2
1000000001        NDC:00005306343       2
...
...   


and detail fact data without grouping:

ENCOUNTER_NUM    CONCEPT_CD            PATIENT_NUM     OTHER ATTrbs.
1000000001               ICD9:V76.2                   100001
1000000001               ICD9:V76.2                   100001
1000000001               LOINC:11277-1              100001
1000000001               LOINC:11277-1              100001
1000000001               LOINC:2000-8                100001
1000000001               LOINC:2000-8                100001
1000000001               NDC:00005306343          100001
1000000001               NDC:00005306343          100001
1000000002               ICD9:382.10                  100002
1000000002               NDC:00005306000          100002

Here, we find that there is grouping of concepts by encounter number (and 'CONCEPT_CD' being attribute and 'PATIENT_NUM' and 'OTHER ATTrbs' being values
in a EAV model. NOTE: tried out different ways to flatten the attributes and values but none is successful since multiple value columns ('PATIENT_NUM' and 'OTHER ATTrbs' are involved for a single concept attribute and there is no indication of which particular concept can be referenced with which particular concept, for e.g., can we reference 'ICD9:V76.2' with 'NDC:00005306343'?? There is no other indication except for their ENCOUNTER_NUM is same. But there
are 'OTHER ATTrbs' like 'TVal_NUM' with value '101.01' for concept 'ICD9:V76.2' which is not applicable to concept 'NDC:00005306343')

Other option is to include distinct ENCOUNTER_NUM in the individual dimensions by joining to OBSERVATION_FACT but it was also unsuccessful for the
reason given above. 

If we omit 'OTHER ATTrbs' and consider only ENCOUNTER_NUM, CONCEPT_CD and PATIENT_NUM, using SQL we get following data: 

ENCOUNTER_NUM    Diagnosis        Medication                Lab_Test            PATIENT_NUM
1000000001            ICD9:V76.2        NDC:00005306343    LOINC:11277-1        100001
1000000001            ICD9:V76.2        NDC:00005306343    LOINC:11277-1        100001
1000000001                                                                   LOINC:2000-8         100001
1000000001                                                                   LOINC:2000-8         100001
1000000002            ICD9:382.10      NDC:00005306000                                  100002

Data above looks valid from ENCOUNTER view perspective since unique patient count (2) is equal to unique patient count in the fact data and as long as we omit 'OTHER ATTrbs' we can cross reference the concepts because there participation is for a single ENCOUNTER_NUM.

There are seven concepts represented in i2b2 concept dimension which are:

 Concept  concept_cd
 Diagnoses  ICD9:xx     (for e.g., ICD9:569.84, ICD9:E985.6)
 Procedures  ICD9:xx     (for e.g., ICD9:80.23, ICD9:00.93)
 Medications  NDC:xx     (for e.g., NDC:00008077101, NDC:00049276066)
 Lab-tests  LONIC:xx  (for e.g., LOINC:1754-1, LOINC:33238-7)
 Reports  LCS:xx     (for e.g., LCS-I2B2:CT_RPT, LCS-I2B2:MRI_RPT)
 Expression Profiles Data  Affy:xx      (for e.g., Affy:221615_at, Affy:221619_s_at)
 Demographics  DEM|LANGUAGE:x, DEM|AGE:x, DEM|ZIPCODE:x, DEM|MARITAL:x, etc
(for e.g., DEM|LANGUAGE:tois, DEM|LANGUAGE:chinese, DEM|AGE:67, DEM|AGE:72)

Out of which three dimensions- Diagnosis, Medication and Lab-test have been included in the cube since:
  • Diagnoses and Procedures are both ICD9 codes, whether ICD9 codes (for e.g ICD9:569.84) > 100 are Diagnoses is not confirmed.
  • Reports keys are not in fact table.
  • Makes sense to include Demographics concept dimension??

Encounter Year calculation/selection


NOTE: start_date and end_date fields of observation_fact records beginning and ending date for an event or observation or encounter.
While start_date is a not null field, i.e., an event date must be recorded, end_date may not be recorded and the active_status_cd
is used to record whether the event is still ongoing.

Concept Keys of the observation_fact have been distributed among lab_test_concept_cd, medication_concept_cd and diagnosis_concept_cd
fields of encounter_view_fact table but this table does not record start_date and end_date of encounters so to calculate/select 'Year'
of the encounters we need to join observation_fact and select min start_date from a Lab test or a Medication or a Diagnosis and join
to encounter_view_fact.
Please see below scripts.

Scripts:

    CREATE TABLE "I2B2DEMODATA"."ENCOUNTER_YEAR_VIEW_FACT"
   (    "ENCOUNTER_NUM" NUMBER(38,0),
    "PATIENT_NUM" NUMBER(38,0),
    "PROVIDER_ID" VARCHAR2(50 BYTE),
    "LABTEST_CD" VARCHAR2(50 BYTE),
    "MEDICATION_CD" VARCHAR2(50 BYTE),
    "DIAGNOSIS_CD" VARCHAR2(50 BYTE),
    "START_DATE" DATE,
    "END_DATE" DATE,
    "YEAR" VARCHAR2(50 BYTE)
   );

    insert into ENCOUNTER_YEAR_VIEW_FACT(ENCOUNTER_NUM,PATIENT_NUM,PROVIDER_ID,LABTEST_CD,MEDICATION_CD,DIAGNOSIS_CD,START_DATE,END_DATE,YEAR)
    select evf.ENCOUNTER_NUM,PATIENT_NUM,PROVIDER_ID,LABTEST_CD,MEDICATION_CD,DIAGNOSIS_CD,a.start_date,a.end_date,to_char(a.start_date,'YYYY')
    from encounter_view_fact evf
    inner join
    (
    select encounter_num,concept_cd,min(start_date)start_date,max(end_date)end_date
    from observation_fact
    where concept_cd like '%LOINC%'
      or concept_cd like '%ICD9%'
      or concept_cd like '%NDC%'
    group by encounter_num,concept_cd
    )a
    on evf.encounter_num = a.encounter_num 
    and evf.lab_cd = a.concept_cd
    order by a.start_date ;

    create index idx_eyvf_en_no on ENCOUNTER_YEAR_VIEW_FACT(ENCOUNTER_NUM);
    create index idx_eyvf_lab_cd on ENCOUNTER_YEAR_VIEW_FACT(LABTEST_CD);
    create index idx_eyvf_med_cd on ENCOUNTER_YEAR_VIEW_FACT(MEDICATION_CD);
    create index idx_eyvf_diag_cd on ENCOUNTER_YEAR_VIEW_FACT(DIAGNOSIS_CD);
    create index idx_eyvf_year on ENCOUNTER_YEAR_VIEW_FACT(YEAR);

Verify that start_date and end_date from OBSERVATION_FACT are not outside of start_date and end_date
in ENCOUNTER_YEAR_VIEW_FACT, i.e., 'Year' recorded in ENCOUNTER_YEAR_VIEW_FACT is correct.

    select count(*)
    from
    (
    select encounter_num,concept_cd,start_date,end_date from observation_fact
    where concept_cd like '%LOINC%'
    )f
    join ENCOUNTER_YEAR_VIEW_FACT ev
    on f.encounter_num = ev.encounter_num 
    and f.concept_cd = ev.labtest_cd
    and f.start_date < ev.start_date
    and f.end_date > ev.end_date
    --0
    select count(*)
    from
    (
    select encounter_num,concept_cd,start_date,end_date from observation_fact
    where concept_cd like '%ICD9%'
    )f
    join ENCOUNTER_YEAR_VIEW_FACT ev
    on f.encounter_num = ev.encounter_num 
    and f.concept_cd = ev.DIAGNOSIS_CD
    and f.start_date < ev.start_date
    and f.end_date > ev.end_date
    --0
    select count(*)
    from
    (
    select encounter_num,concept_cd,start_date,end_date from observation_fact
    where concept_cd like '%NDC%'
    )f
    join ENCOUNTER_YEAR_VIEW_FACT ev
    on f.encounter_num = ev.encounter_num 
    and f.concept_cd = ev.MEDICATION_CD
    and f.start_date < ev.start_date
    and f.end_date > ev.end_date
    --0

Verify observation_fact and encounter_year_view_fact unique patient count are equal

    select count(distinct patient_num) Patients
    from observation_fact
    where concept_cd like '%LOINC%'
          or concept_cd like '%ICD9%'
          or concept_cd like '%NDC%'
    --131
    select count(distinct encounter_num) Encounters
    from observation_fact
    where concept_cd like '%LOINC%'
          or concept_cd like '%ICD9%'
          or concept_cd like '%NDC%'
    --131
    select count(distinct encounter_num) Patients
    from encounter_year_view_fact
    --131

NOTE: In observation_fact, encounter_num and patient_num represented by same patient visit number, i.e., encounter_num
and patient_num are indentical so measure Patients equals to Encounters.
Comments