Mapping Architecture & Semantic Indexing

This page covers how the unified healthcare schema maps to the existing systems of record, how the semantic indexing pipeline processes it, and how the ReAct agent uses the indexed ontology to serve queries and take actions.


Schema-First Architecture

The healthcare ontology is schema-first: the unified schema (healthcare-schema.yaml), not the set of FDW foreign tables, is the primary knowledge source. FDW acts as a mapping resolution layer that annotates which schema entities have live database connections.

flowchart TD
    subgraph SchemaOverlay [Unified Schema - healthcare-schema.yaml]
        EHR["EHR/EMR Entities<br/>Patient_Master, Encounter, Diagnosis, Medication_Order,<br/>Lab_Result, Vital_Sign, Allergy, Problem_List,<br/>Clinical_Note, Procedure_Record"]
        HIS["HIS Entities - virtual<br/>Bed_Census, Staff_Schedule, OR_Schedule,<br/>Department, Facility, Transfer_Record..."]
        RCM["RCM Entities - virtual<br/>Claim, Charge, Payment, Denial,<br/>Authorization, Payer_Contract..."]
        IMG["Imaging & Lab - virtual<br/>Imaging_Order, Imaging_Result, Lab_Order,<br/>Pathology_Report, DICOM_Study..."]
        POP["Population & External - virtual<br/>Quality_Measure, Care_Gap, SDOH_Assessment,<br/>Registry_Submission, Payer_Roster..."]
        NOTES["Clinical Notes - virtual<br/>Progress_Note, Discharge_Summary,<br/>Operative_Note, Consult_Note..."]
        DER["Derived - unmapped<br/>Risk_Stratification, Early_Warning_Score,<br/>Readmission_Predictor, Acuity_Index, Cohort_Definition"]
    end

    subgraph FDWLayer [FDW Mapping Resolution]
        DISC["FDW Discovery Service<br/>pg_foreign_server + information_schema"]
        MATCH["Match schema fdw_table<br/>to live foreign tables"]
    end

    subgraph Status [Mapping Status]
        MAPPED["MAPPED<br/>Live FDW table exists<br/>Queryable via SQL"]
        VIRTUAL["VIRTUAL<br/>Data via integration<br/>Not directly queryable"]
        UNMAPPED["UNMAPPED<br/>Expected FDW table<br/>not yet connected"]
    end

    EHR --> MATCH
    DISC --> MATCH
    MATCH --> MAPPED
    HIS --> VIRTUAL
    RCM --> VIRTUAL
    IMG --> VIRTUAL
    POP --> VIRTUAL
    NOTES --> VIRTUAL
    DER --> UNMAPPED

Entity Mapping Status

| Status | Meaning | Count | Examples |
| --- | --- | --- | --- |
| Mapped | Live FDW foreign table exists; entity is queryable via query_table tool | ~10 | Patient_Master, Encounter, Diagnosis, Medication_Order, Lab_Result, Vital_Sign, Allergy, Problem_List, Clinical_Note, Procedure_Record |
| Virtual | Entity data flows via integration sync; not directly queryable via FDW | ~35 | Bed_Census, Claim, Imaging_Order, Quality_Measure, Care_Gap, Progress_Note |
| Unmapped | Schema defines the entity but no FDW table or integration connected yet | ~5 | Risk_Stratification, Early_Warning_Score, Readmission_Predictor, Acuity_Index, Cohort_Definition |
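
The three statuses can be read as a small decision rule. The sketch below is one way to express it, assuming each schema entity carries an `fdw_table` name (the field the resolver matches on) and a hypothetical `has_integration` flag. The function name and the example table names are illustrative and are not taken from `fdw_mapping_resolver.py`.

```python
from __future__ import annotations

from enum import Enum


class MappingStatus(Enum):
    MAPPED = "mapped"
    VIRTUAL = "virtual"
    UNMAPPED = "unmapped"


def resolve_status(
    fdw_table: str | None,
    has_integration: bool,  # hypothetical flag, not a documented schema field
    live_tables: set[str],
) -> MappingStatus:
    """Classify one schema entity against the set of live FDW foreign tables."""
    # Mapped: the schema names an FDW table and that table exists right now.
    if fdw_table is not None and fdw_table in live_tables:
        return MappingStatus.MAPPED
    # Virtual: no live table, but data still arrives through an integration sync.
    if has_integration:
        return MappingStatus.VIRTUAL
    # Unmapped: defined in the schema, nothing connected yet.
    return MappingStatus.UNMAPPED


# Hypothetical table names, for illustration only.
live = {"ehr_patient_master"}
patient_status = resolve_status("ehr_patient_master", False, live)
claim_status = resolve_status(None, True, live)
risk_status = resolve_status("risk_stratification", False, live)
```

Checking the live table first means an entity with both a live table and an integration is reported as mapped, which matches the "queryable via SQL" reading of that status.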

Semantic RAG Pipeline (12 Steps)

The pipeline follows the same process as other vertical ontologies, extended with Steps 1A (Unified Schema Extraction) and 1B (FDW Mapping Resolution). Counting Steps 1A, 1B, and 1C separately alongside Steps 2 through 10 gives the 12 steps shown below:

flowchart LR
    subgraph Extraction [Extraction Phase]
        S1A["Step 1A<br/>Unified Schema Extract<br/>~50 entities from *-schema.yaml"]
        S1B["Step 1B<br/>FDW Mapping Resolution<br/>Match to live FDW tables"]
        S1C["Step 1C<br/>Legacy FDW Extract<br/>Non-schema FDW tables"]
        S2["Step 2<br/>Policy Extract<br/>8 policies from *.md"]
        S3["Step 3<br/>Workflow Extract<br/>10 workflows from *.yaml"]
        S4["Step 4<br/>Integration Extract<br/>6 integrations from *.yaml"]
    end

    subgraph Processing [Processing Phase]
        S5["Step 5<br/>Normalize + Dedupe<br/>Merge into OntoBundle"]
        S6["Step 6<br/>Enrich<br/>FHIR/CMS-aware"]
        S7["Step 7<br/>Chunk + Embed"]
    end

    subgraph Loading [Loading Phase]
        S8["Step 8<br/>Load pgvector"]
        S9["Step 9<br/>Load Apache AGE Graph"]
        S10["Step 10<br/>Validate - scenarios"]
    end

    S1A --> S1B --> S1C --> S5
    S2 --> S5
    S3 --> S5
    S4 --> S5
    S5 --> S6 --> S7 --> S8 --> S9 --> S10
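
The edges in the diagram above define a dependency graph: the schema and FDW steps run in sequence, Steps 2 through 4 are independent of them, and everything converges at Step 5. As a rough illustration (not the pipeline's actual scheduler), the same edges can be fed to Python's standard `graphlib` to obtain a valid execution order:

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it,
# transcribed from the flowchart edges.
dependencies = {
    "1A": set(),
    "1B": {"1A"},
    "1C": {"1B"},
    "2": set(),
    "3": set(),
    "4": set(),
    "5": {"1C", "2", "3", "4"},
    "6": {"5"},
    "7": {"6"},
    "8": {"7"},
    "9": {"8"},
    "10": {"9"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The relative order of the extraction steps can vary between runs of the sorter, but Step 5 always follows all six of them, and Steps 6 through 10 always follow in sequence.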

Step Details

| Step | File | What It Does |
| --- | --- | --- |
| 1A | unified_schema_extractor.py | Reads healthcare-schema.yaml; creates OntoDocuments for every entity with FHIR resource type, CMS quality program, ICD/CPT/SNOMED mapping, and FDW mapping status annotations |
| 1B | fdw_mapping_resolver.py | Queries FDWDiscoveryService to match schema entities to live FDW foreign tables; annotates as mapped/virtual/unmapped; enriches mapped entities with live column metadata |
| 1C | fdw_extractor.py | Original FDW extractor for non-schema tables (backward compatibility) |
| 2 | policy_extractor.py | Auto-discovers all *.md from enterprise-knowledge/policies/ (includes 8 healthcare policies) |
| 3 | workflow_extractor.py | Auto-discovers all *.yaml from enterprise-knowledge/workflows/ (includes 10 healthcare workflows) |
| 4 | integration_extractor.py | Auto-discovers all *.yaml from enterprise-knowledge/integrations/ (includes 6 healthcare integrations) |
| 5 | normalizer.py | Merges all extracted documents; deduplicates by ID; merges relationships and structured_metadata on collision |
| 6 | enricher.py | Schema entities: auto-enriched with FHIR resource type, CMS quality program classification, and FDW status. Policies/workflows/integrations: LLM-enriched via gpt-4o-mini |
| 7 | chunker.py | 1 document = 1 chunk; batch embedded (20/batch) via OpenAI text-embedding-3-small |
| 8 | vector_loader.py | Upserted to pgvector control_plane_embeddings with content_type: onto_schema, onto_policy, onto_workflow, onto_integration |
| 9 | graph_loader.py | Nodes (Entity) and edges (triggers/syncs_to/constrained_by/depends_on/validates) merged into Apache AGE enterprise_onto graph |
| 10 | validator.py | Black-box test scenarios validating retrieval quality across 6 dimensions |
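
Step 5 is the point where collisions between extractors are resolved. The following is a minimal sketch of deduplicating by ID while merging `relationships` and `structured_metadata`. The `OntoDocument` fields other than those two, the tuple shape of a relationship, and the "later value wins" rule for conflicting metadata keys are all assumptions for illustration; `normalizer.py` may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class OntoDocument:
    # Field layout is assumed; only relationships and structured_metadata
    # are named in the step description.
    id: str
    content: str
    relationships: list = field(default_factory=list)
    structured_metadata: dict = field(default_factory=dict)


def normalize(docs):
    """Deduplicate documents by ID, merging relationships and metadata."""
    merged = {}
    for doc in docs:
        existing = merged.get(doc.id)
        if existing is None:
            # First sighting: store a copy so the inputs are never mutated.
            merged[doc.id] = OntoDocument(
                doc.id,
                doc.content,
                list(doc.relationships),
                dict(doc.structured_metadata),
            )
            continue
        # Collision: append relationships not already present, keeping order.
        for rel in doc.relationships:
            if rel not in existing.relationships:
                existing.relationships.append(rel)
        # Assumed conflict rule: the later document wins on duplicate keys.
        existing.structured_metadata.update(doc.structured_metadata)
    return list(merged.values())


# Two extractors emit the same entity with different annotations.
from_schema = OntoDocument(
    "Encounter", "Encounter entity",
    [("depends_on", "Patient_Master")], {"fhir": "Encounter"},
)
from_fdw = OntoDocument(
    "Encounter", "Encounter entity",
    [("depends_on", "Patient_Master"), ("syncs_to", "Claim")],
    {"fdw_status": "mapped"},
)
bundle = normalize([from_schema, from_fdw])
```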

What Gets Indexed

| Source | Content Type | Approx. Count |
| --- | --- | --- |
| Unified schema (EHR + HIS + RCM + Imaging/Lab + Population + Notes) | onto_schema | ~50 |
| Policies (8 healthcare) | onto_policy | ~50+ (split by section) |
| Workflows (10 healthcare) | onto_workflow | ~10 |
| Integrations (6 healthcare) | onto_integration | ~6 |
| Total | | ~116+ |
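
Because Step 7 produces one chunk per document and embeds 20 chunks per batch, the approximate total above translates directly into a number of embedding calls. A small worked tally using the approximate counts from the table follows; the helper name `batched` is illustrative, and no embedding API is called:

```python
def batched(items, size=20):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Approximate counts per content type, taken from the table above.
approx_counts = {
    "onto_schema": 50,
    "onto_policy": 50,
    "onto_workflow": 10,
    "onto_integration": 6,
}
total = sum(approx_counts.values())

# One document = one chunk, so the chunk count equals the document count.
chunks = [f"chunk-{i}" for i in range(total)]
batch_sizes = [len(batch) for batch in batched(chunks)]
print(total, batch_sizes)
```

With these figures the run needs six embedding batches: five full batches of 20 and a final batch of 16. Policies split into more sections would raise the count accordingly.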

ReAct Agent and Tools

The ReAct agent uses the indexed ontology to answer questions and take actions. The flow is: Search ontology -> Reason with policies -> Execute actions -> Validate compliance.

Tool Inventory

Read Tools

| Tool | Domain | What It Does |
| --- | --- | --- |
| search_enterprise_knowledge | Core | Hybrid vector + graph search across all ontology types |
| search_schema_knowledge | Core | Vector search over FDW table definitions |
| discover_tables / discover_columns / query_table | Core | FDW table discovery and parameterized SQL queries |
| check_policy_compliance | Governance | Validates proposed actions against indexed policies |
| get_patient_360 | Healthcare | Assemble unified patient profile across EHR, HIS, RCM, Imaging, and Notes |
| get_clinical_history | Healthcare | Query longitudinal clinical history for a patient (diagnoses, procedures, medications) |
| get_encounter_timeline | Healthcare | Retrieve chronological encounter timeline with associated orders, results, and notes |
| get_care_gaps | Healthcare | Identify open care gaps for a patient or population cohort against HEDIS/CMS measures |
| get_bed_status | Healthcare | Query real-time bed census and availability by unit, department, or facility |

Write Tools

| Tool | Risk Level | What It Does |
| --- | --- | --- |
| create_clinical_order | LOW_RISK_WRITE | Create a clinical order (lab, imaging, referral) in the EHR |
| update_care_gap | LOW_RISK_WRITE | Close or update a care gap with intervention documentation |
| create_care_coordinator_task | LOW_RISK_WRITE | Assign a care coordination task to a care manager |
| escalate_critical_result | HIGH_RISK_WRITE | Escalate a critical lab or imaging result to the ordering provider |
| update_discharge_plan | LOW_RISK_WRITE | Update discharge plan with disposition, follow-up, and medication reconciliation |
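
The risk levels lend themselves to a simple lookup that an agent runtime could consult before executing a write. The sketch below records the levels from the table. The approval gate itself is a hypothetical policy added for illustration, since this page does not say how HIGH_RISK_WRITE tools are handled at execution time.

```python
from enum import Enum


class RiskLevel(Enum):
    LOW_RISK_WRITE = "LOW_RISK_WRITE"
    HIGH_RISK_WRITE = "HIGH_RISK_WRITE"


# Tool names and risk levels as listed in the Write Tools table.
WRITE_TOOLS = {
    "create_clinical_order": RiskLevel.LOW_RISK_WRITE,
    "update_care_gap": RiskLevel.LOW_RISK_WRITE,
    "create_care_coordinator_task": RiskLevel.LOW_RISK_WRITE,
    "escalate_critical_result": RiskLevel.HIGH_RISK_WRITE,
    "update_discharge_plan": RiskLevel.LOW_RISK_WRITE,
}


def needs_human_approval(tool_name: str) -> bool:
    """Hypothetical gate: only HIGH_RISK_WRITE tools require sign-off."""
    return WRITE_TOOLS[tool_name] is RiskLevel.HIGH_RISK_WRITE


high_risk = sorted(name for name in WRITE_TOOLS if needs_human_approval(name))
print(high_risk)
```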

End-to-End ReAct Flow

sequenceDiagram
    participant User
    participant Agent as ReAct Agent
    participant RAG as Ontology Search
    participant Policy as Policy Check
    participant SoR as System of Record

    User->>Agent: "Which patients on Unit 4B are high risk for readmission and have open care gaps?"
    Agent->>RAG: search_enterprise_knowledge("readmission risk Unit 4B care gaps")
    RAG-->>Agent: Risk_Stratification schema + readmission-prevention-policy + care-gap-closure workflow + Bed_Census
    Agent->>Agent: REASON: Need current census for Unit 4B, readmission risk scores, and open care gaps
    Agent->>SoR: get_bed_status(unit="4B")
    SoR-->>Agent: 28 patients currently on Unit 4B
    Agent->>SoR: get_care_gaps(unit="4B", risk_level="high")
    SoR-->>Agent: 6 patients flagged high-risk, 14 open care gaps across cohort
    Agent->>Policy: check_policy_compliance("care_gap_outreach", "Risk_Stratification", "high-risk cohort")
    Policy-->>Agent: COMPLIANT — proactive outreach required per POL-QM-001 Section 4
    Agent->>User: 6 high-risk patients on Unit 4B with 14 open care gaps. Top priority: 2 patients with HbA1c and depression screening gaps due within 48 hrs. Recommend care coordinator task assignment per quality policy.
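
The tool-calling portion of the sequence above can be traced in code with stubbed tools. In this sketch the tool names and arguments follow the diagram, while the return shapes, the canned values (copied from the diagram's example replies), and the `answer_unit_question` helper are assumptions; the real tools query the systems of record and the ontology search step is omitted.

```python
def get_bed_status(unit):
    # Stub: canned census matching the example reply in the diagram.
    return {"unit": unit, "patient_count": 28}


def get_care_gaps(unit, risk_level):
    # Stub: canned cohort summary matching the example reply.
    return {"high_risk_patients": 6, "open_care_gaps": 14}


def check_policy_compliance(action, entity, context):
    # Stub: canned verdict; the return shape is an assumption.
    return {"status": "COMPLIANT", "policy": "POL-QM-001", "section": 4}


def answer_unit_question(unit):
    """Run the read -> read -> policy check sequence and build a summary."""
    trace = []

    census = get_bed_status(unit=unit)
    trace.append("get_bed_status")

    gaps = get_care_gaps(unit=unit, risk_level="high")
    trace.append("get_care_gaps")

    verdict = check_policy_compliance(
        "care_gap_outreach", "Risk_Stratification", "high-risk cohort"
    )
    trace.append("check_policy_compliance")

    # Stop before recommending anything if the policy check fails.
    if verdict["status"] != "COMPLIANT":
        return "Outreach blocked by policy check.", trace

    summary = (
        f"{gaps['high_risk_patients']} high-risk patients on Unit {unit} "
        f"with {gaps['open_care_gaps']} open care gaps "
        f"(census: {census['patient_count']})."
    )
    return summary, trace


summary, trace = answer_unit_question("4B")
print(summary)
```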

UI Integration

Data Plane Page

  • Vertical selector filters data sources by domain (All / Healthcare / Supply Chain / CRM)
  • Each source node shows ontology entity count and FHIR resource coverage
  • Source cards display FDW mapping status (mapped / virtual) and entity count badges

Control Plane Page

  • Semantic Layer tab shows vertical-level stats (Healthcare: 50 entities, 10 workflows, 8 policies, 6 integrations)
  • FHIR resource distribution badges (Patient, Encounter, Condition, MedicationRequest, Observation, Claim, Procedure, DiagnosticReport)
  • Knowledge Formation and Semantic Explorer tabs support system and FHIR filtering

Reasoning Page

  • ReAct Tools tab organizes tools into Read Tools and Write Tools with domain badges (Core / Healthcare / Governance)
  • AI Copilot system prompt includes healthcare context and tool selection strategy

Configuration

The pipeline is configured via SemanticRagConfig:

| Parameter | Default | Purpose |
| --- | --- | --- |
| schema_dir | enterprise-knowledge/ | Directory containing *-schema.yaml files |
| policy_path | enterprise-knowledge/policies/ | Directory with policy Markdown files |
| workflow_path | enterprise-knowledge/workflows/ | Directory with workflow YAML files |
| integration_path | enterprise-knowledge/integrations/ | Directory with integration YAML files |
| skip_unified_schema | false | Skip Step 1A (unified schema extraction) |
| skip_fdw_mapping | false | Skip Step 1B (FDW mapping resolution) |
| enrich_with_llm | true | Enable LLM enrichment for non-schema docs |
| skip_graph | false | Skip Apache AGE graph loading |

Trigger reindex via: POST /api/v1/control-plane/reindex
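
For orientation, the parameters and defaults above can be mirrored in a small dataclass, and the reindex call can be prepared with the standard library. This is a sketch under assumptions: the real `SemanticRagConfig` may not be a dataclass, the base URL is a placeholder, and the endpoint's request body and authentication are not documented here, so the request is built without a body and is not sent.

```python
from dataclasses import dataclass
import urllib.request


@dataclass
class SemanticRagConfigSketch:
    # Stand-in that mirrors the documented parameters and defaults.
    schema_dir: str = "enterprise-knowledge/"
    policy_path: str = "enterprise-knowledge/policies/"
    workflow_path: str = "enterprise-knowledge/workflows/"
    integration_path: str = "enterprise-knowledge/integrations/"
    skip_unified_schema: bool = False
    skip_fdw_mapping: bool = False
    enrich_with_llm: bool = True
    skip_graph: bool = False


def build_reindex_request(base_url: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that triggers a reindex."""
    url = base_url.rstrip("/") + "/api/v1/control-plane/reindex"
    return urllib.request.Request(url, method="POST")


# Example: a run that skips graph loading and LLM enrichment.
config = SemanticRagConfigSketch(skip_graph=True, enrich_with_llm=False)

# "http://localhost:8000" is a placeholder host, not a documented address.
request = build_reindex_request("http://localhost:8000")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that call is left out because it depends on a running control plane.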

