Enterprise Data Simulator - CRM (Salesforce)

Screenshots

To capture additional screenshots of the simulator UI, run the Playwright script: python3 docs/simulators/capture-simulator-screenshots.py

Overview

The SFDC (Salesforce) Simulator replaces the live PostgreSQL Foreign Data Wrapper (FDW) connection to Salesforce with an in-memory data store backed by S3 dumps. This allows the NexusAI platform to query Salesforce CRM data (Accounts, Leads, Opportunities, Contacts, and custom objects) during development and testing without a live Salesforce org or a PostgreSQL FDW extension.

Key capabilities:

Loads a complete snapshot of Salesforce FDW tables from S3 into memory
Implements the same FDWDiscoveryService and FDWQueryService interfaces as the real FDW, so consuming code works unchanged
Supports filtering, sorting, pagination, and column selection on in-memory data
Can dump fresh data from a live FDW runtime to S3 for later use
Mode can be toggled between Simulator and Real at runtime via the UI or API

Architecture

Real Mode vs Simulator Mode

The platform uses SFDC_SIMULATOR_ENABLED (and the SSM parameter /nexus-ai/{env}/sfdc/enabled) to decide how FDW queries are served. When the simulator is enabled, fdw_api.configure() injects the SFDCSimulatorService in place of the real PostgreSQL FDW services.

flowchart TB
    subgraph realMode [Real Mode]
        AppR[NexusAI Backend]
        FDW_API_R[FDW API Router]
        Discovery_R[FDWDiscoveryService]
        Query_R[FDWQueryService]
        PG[(PostgreSQL\nwith FDW extension)]
        SFDC_R[Salesforce API]

        AppR --> FDW_API_R
        FDW_API_R --> Discovery_R
        FDW_API_R --> Query_R
        Discovery_R --> PG
        Query_R --> PG
        PG -->|Foreign Tables| SFDC_R
    end

    subgraph simMode [Simulator Mode]
        AppS[NexusAI Backend]
        FDW_API_S[FDW API Router]
        SimService[SFDCSimulatorService\nin-memory dicts]
        S3[(S3 Bucket\nsfdc-simulator)]

        AppS --> FDW_API_S
        FDW_API_S --> SimService
        SimService -->|load_from_s3| S3
    end
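
The provider swap shown in the diagram can be sketched as follows. This is a minimal illustration, not the actual body of fdw_api.configure(): the function names, the set of accepted truthy strings, and the tuple return shape are all assumptions made for the example.

```python
import os


def simulator_enabled(env=None) -> bool:
    """Read SFDC_SIMULATOR_ENABLED; unset means Real mode (documented default: false).

    The accepted truthy spellings are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    value = str(env.get("SFDC_SIMULATOR_ENABLED", "false"))
    return value.strip().lower() in {"1", "true", "yes"}


def resolve_providers(simulator, real_discovery, real_query, env=None):
    """Return the (discovery, query) pair the FDW router should use.

    In simulator mode one object plays both roles, because the simulator
    implements both service interfaces.
    """
    if simulator_enabled(env):
        return simulator, simulator
    return real_discovery, real_query
```

Passing `env` explicitly keeps the helper easy to exercise without touching the process environment.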

Interface Compatibility

The SFDCSimulatorService implements the same method signatures as the real services:

Interface	Methods	Description
`FDWDiscoveryService`	`list_foreign_servers()`, `list_foreign_tables()`, `get_table_columns()`, `refresh_cache()`	Schema discovery (servers, tables, columns)
`FDWQueryService`	`execute_query()`	Data queries with filters, sorting, pagination

This means no code changes are required in any FDW consumer -- the swap is transparent.
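
One way to picture this compatibility is with structural typing. The sketch below declares the method names from the table as `typing.Protocol` classes and checks a toy simulator against them. The parameter and return shapes are assumptions (the real signatures live in the services themselves), and `TinySimulator` is a hypothetical stand-in, not the real SFDCSimulatorService.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class DiscoveryLike(Protocol):
    # Method names come from the interface table; argument shapes are assumed.
    def list_foreign_servers(self) -> List[Any]: ...
    def list_foreign_tables(self) -> List[Any]: ...
    def get_table_columns(self, schema: str, table: str) -> List[Any]: ...
    def refresh_cache(self) -> None: ...


@runtime_checkable
class QueryLike(Protocol):
    def execute_query(self, request: Any) -> Any: ...


class TinySimulator:
    """Toy stand-in that serves both roles from in-memory dicts."""

    def __init__(self):
        self._columns = {("nexus_data", "Account"): ["Id", "Name"]}
        self._data = {("nexus_data", "Account"): [{"Id": "001", "Name": "Acme"}]}

    def list_foreign_servers(self):
        return []

    def list_foreign_tables(self):
        return sorted(self._data)

    def get_table_columns(self, schema, table):
        return self._columns[(schema, table)]

    def refresh_cache(self):
        return None

    def execute_query(self, request):
        return self._data[(request["schema"], request["table"])]
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the methods exist, not that their signatures match.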

Component Map

Component	Location	Role
SFDCSimulatorService	`nexus-backend/src/services/sfdc_simulator_service.py`	Core service: loads S3 dumps, serves queries in-memory
Simulator API	`nexus-backend/src/apis/sfdc_simulator_api.py`	FastAPI router for simulator admin operations
Dump Script	`nexus-backend/scripts/sfdc-simulator/dump_sfdc_via_fdw.py`	CLI tool to dump live FDW data to S3
FDW Models	`nexus-backend/src/models/fdw_models.py`	Shared Pydantic models (ForeignServerInfo, ForeignTableInfo, etc.)
UI Panel	`nexus-ui/src/components/sfdc-simulator/SFDCSimulatorPanel.tsx`	Main React component for the Salesforce Simulator tab
UI Service	`nexus-ui/src/services/sfdcSimulatorService.ts`	Frontend API client

S3 Storage Layout

Data is stored in a structured JSON format under a single S3 prefix:

s3://{bucket}/{prefix}/
├── manifest.json                        # Dump metadata (timestamp, source, counts)
├── servers.json                         # Foreign server definitions
├── tables.json                          # Foreign table list (all schemas)
├── columns/
│   ├── nexus_data.Account.json    # Column metadata per table
│   ├── nexus_data.Lead.json
│   └── ...
├── data/
│   ├── nexus_data.Account.json    # Row data per table
│   ├── nexus_data.Lead.json
│   └── ...
└── _dump_status.json                    # Current dump operation status

File	Contents
`manifest.json`	Dump timestamp, source URL, schema name, table/row counts, elapsed time
`servers.json`	Array of FDW server definitions (server name, wrapper type, options)
`tables.json`	Array of all foreign tables (schema, table name, server, column count)
`columns/{schema}.{table}.json`	Array of column metadata (name, data type, nullable, ordinal position)
`data/{schema}.{table}.json`	Array of row objects (or `{"rows": [...]}` format)
`_dump_status.json`	Tracks whether a dump is currently running (for multi-pod consistency)
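
The naming scheme and the two accepted row formats can be handled with a few small helpers. This is a sketch under the layout described above; the helper names are hypothetical and not taken from the service code.

```python
def object_key(prefix: str, kind: str, schema: str, table: str) -> str:
    """Build the S3 key for a per-table file, e.g. data/nexus_data.Account.json."""
    if kind not in ("columns", "data"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{prefix.rstrip('/')}/{kind}/{schema}.{table}.json"


def split_table_key(key: str):
    """Recover (schema, table) from a key ending in {schema}.{table}.json."""
    name = key.rsplit("/", 1)[-1]
    if not name.endswith(".json"):
        raise ValueError(f"not a JSON dump file: {key!r}")
    schema, table = name[: -len(".json")].split(".", 1)
    return schema, table


def normalize_rows(payload):
    """Accept either a bare array of rows or the {"rows": [...]} wrapper."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("rows"), list):
        return payload["rows"]
    raise ValueError("unrecognized data file format")
```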

Data Sources and Population

Where the Data Comes From

Unlike the WXCC simulator, which generates synthetic data, the SFDC simulator uses real Salesforce data that has been dumped from a live FDW runtime. The data flow is:

flowchart LR
    subgraph liveEnv [Live Environment]
        SFDC_Live[Salesforce Org]
        PG_Live[PostgreSQL + FDW]
        ALB[Application Load Balancer\nFDW REST API]
    end

    subgraph dumpProcess [Dump Process]
        DumpScript["dump_sfdc_via_fdw.py\nor POST /dump"]
    end

    subgraph s3Storage [S3 Storage]
        S3[(S3 Bucket\nsfdc-simulator)]
    end

    subgraph simEnv [Simulator Environment]
        SimService[SFDCSimulatorService]
        InMemory["In-Memory Dicts\n_servers, _tables,\n_columns, _data"]
    end

    SFDC_Live --> PG_Live
    PG_Live --> ALB
    ALB --> DumpScript
    DumpScript --> S3
    S3 -->|load_from_s3| SimService
    SimService --> InMemory

Dump Process

The dump process queries a running FDW API to extract:

Foreign servers -- via GET /api/fdw/servers
Foreign tables -- via GET /api/fdw/tables
Column metadata -- via GET /api/fdw/tables/{schema}/{table}/columns for each table
Row data -- via POST /api/fdw/query for each selected table
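
The call sequence above can be written out as a simple plan. This sketch only lists the method and path of each request; it reads "for each table" as every discovered table and "for each selected table" as the dumped subset. The body of the POST /api/fdw/query request is not described here, so it is deliberately left out, and the function name is hypothetical.

```python
from typing import List, Tuple


def dump_plan(schema: str, all_tables: List[str], selected: List[str]) -> List[Tuple[str, str]]:
    """Return the (method, path) calls in the order the dump performs them."""
    plan = [("GET", "/api/fdw/servers"), ("GET", "/api/fdw/tables")]
    # Column metadata is fetched per table.
    plan += [("GET", f"/api/fdw/tables/{schema}/{t}/columns") for t in all_tables]
    # Row data is queried once per selected table.
    plan += [("POST", "/api/fdw/query") for _ in selected]
    return plan
```

For three discovered tables and one selected table, the plan holds six requests: two discovery calls, three column lookups, and one row query.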

Priority Tables

When dumping without specifying tables, only high-value tables are included by default:

Table	Type
`Account`	Standard Salesforce object
`Lead`	Standard Salesforce object
`Opportunity`	Standard Salesforce object
`Contact`	Standard Salesforce object
`RecordType`	Standard Salesforce object
`cspmb__Price_Item__c`	Custom managed package object
`ContentVersion`	File/document versions
`ContentDocumentLink`	Document-to-record associations
`Note`	Note records

Use --all to dump every table in the schema (slower, but comprehensive).
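
The selection rule can be sketched as a small function. The priority list is copied from the table above; letting an explicit table list win over --all, and skipping names that the schema does not contain, are assumptions of this sketch rather than documented behavior.

```python
PRIORITY_TABLES = [
    "Account", "Lead", "Opportunity", "Contact", "RecordType",
    "cspmb__Price_Item__c", "ContentVersion", "ContentDocumentLink", "Note",
]


def select_tables(available, tables_arg=None, dump_all=False):
    """Pick the tables whose rows get dumped.

    available  -- table names discovered in the schema
    tables_arg -- comma-separated names, as passed to --tables
    dump_all   -- True when --all is given
    """
    if tables_arg:
        wanted = [t.strip() for t in tables_arg.split(",") if t.strip()]
    elif dump_all:
        return list(available)
    else:
        wanted = PRIORITY_TABLES
    present = set(available)
    return [t for t in wanted if t in present]
```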

Load Process

When the simulator starts (or when Load Data is clicked in the UI):

manifest.json is downloaded and validated
servers.json is parsed into ForeignServerInfo objects
tables.json is parsed into ForeignTableInfo objects
All columns/*.json files are downloaded and parsed into ColumnInfo lists, indexed by (schema, table) tuple
All data/*.json files are downloaded and parsed into row lists, indexed by (schema, table) tuple
The service sets _loaded = True and records the load timestamp

The loaded data is fully queryable via the standard FDW query interface.
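
The load steps can be sketched against a plain dict that stands in for S3 (key to JSON text). This is an illustration of the sequence only: the real service downloads from S3 and parses into Pydantic models, whereas this sketch keeps plain dicts and lists, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def load_dump(objects, prefix="sfdc-simulator"):
    """Load a dump from a {key: json_text} mapping into in-memory structures."""
    def read(name):
        return json.loads(objects[f"{prefix}/{name}"])

    if f"{prefix}/manifest.json" not in objects:
        raise FileNotFoundError("manifest.json not found")

    state = {
        "manifest": read("manifest.json"),
        "servers": read("servers.json"),
        "tables": read("tables.json"),
        "columns": {},
        "data": {},
    }
    for key, text in objects.items():
        for kind in ("columns", "data"):
            head = f"{prefix}/{kind}/"
            if key.startswith(head) and key.endswith(".json"):
                schema, table = key[len(head):-len(".json")].split(".", 1)
                payload = json.loads(text)
                # Data files may wrap rows as {"rows": [...]}.
                if kind == "data" and isinstance(payload, dict):
                    payload = payload.get("rows", [])
                state[kind][(schema, table)] = payload  # indexed by tuple
    state["loaded"] = True
    state["loaded_at"] = datetime.now(timezone.utc)
    return state
```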

Using the Simulator (UI)

Open the SFDC Simulator from the Operations > Simulators > Salesforce Simulator tab.

Page Layout

[Screenshot: SFDC Simulator Overview]

The Salesforce Simulator page is organized into these sections:

Header -- Title, Refresh / Load Data / Reset buttons
SFDC Mode Toggle -- Switch between Simulator and Real mode
Simulator Status -- Card showing enabled/loaded state, S3 bucket/prefix, table and row counts
Statistics -- Visual summary: Tables, Tables With Data, Total Rows
FDW Objects Table -- List of all loaded foreign tables with schema, name, column count, row count
Dump from Live FDW -- Section for creating fresh dumps from a running real FDW

Mode Toggle

The mode toggle at the top of the page (visible in the overview screenshot as "SFDC Mode") shows the current data source:

Simulator badge (green) -- The platform is using in-memory simulator data.
Real badge -- The platform is using the live PostgreSQL FDW connection.

Click Switch to Real or Switch to Simulator to toggle the mode. This:

Updates the SSM parameter /nexus-ai/{env}/sfdc/enabled
Changes how fdw_api resolves its service providers
Takes effect immediately for subsequent FDW queries

Warning

Switching to Real mode requires a working PostgreSQL FDW connection with valid Salesforce credentials. If the FDW is not configured, queries will fail.

Loading Data

Click Load Data in the header to trigger SFDCSimulatorService.load_from_s3(). This downloads all S3 dump files into memory. The status card will update to show:

Loaded badge (green) -- Data is ready to serve
Not Loaded badge (red) -- No data in memory (click Load Data to load it)

The load operation runs in the background. It typically completes in a few seconds for small datasets; large schemas can take 30 seconds or more.

Auto-Load

The service auto-loads from S3 on the first query if data hasn't been loaded yet (ensure_loaded() is called before any query). However, clicking Load Data explicitly gives you visibility into the load result.
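
The auto-load behavior follows a common lazy-initialization pattern, sketched below. The class and its `loader` callable are hypothetical; only the ensure_loaded() name and the _loaded flag come from this document. A real service would likely also need to guard against concurrent loads, which this sketch omits.

```python
class LazySimulator:
    """Loads data on first use, then serves every later call from memory."""

    def __init__(self, loader):
        self._loader = loader      # callable returning {(schema, table): rows}
        self._data = {}
        self._loaded = False

    def ensure_loaded(self):
        # Only the first call pays the load cost.
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True

    def row_count(self, schema, table):
        self.ensure_loaded()       # called before any query
        return len(self._data.get((schema, table), []))
```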

Simulator Status Card

The status card displays:

Field	Description
Enabled	Whether the simulator service is active
Loaded	Whether S3 data has been loaded into memory
S3 Bucket	The S3 bucket holding the dump data
S3 Prefix	The S3 key prefix (default: `sfdc-simulator`)
Tables	Total number of foreign tables in the dump
Tables with Data	Number of tables that have at least one row
Total Rows	Sum of all row counts across all tables
Dump Timestamp	When the S3 data was originally dumped
Last Loaded	When the data was last loaded into memory
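
The three counters on the card follow directly from the definitions in the table. A minimal sketch, assuming tables are identified by (schema, table) tuples and rows are held in a dict keyed the same way; the output field names mirror the status response shown in the API Reference.

```python
def summarize(tables, data):
    """Compute the status-card counters from loaded tables and row data.

    tables -- list of (schema, table) tuples for every table in the dump
    data   -- dict mapping (schema, table) to a list of row dicts
    """
    rows_per_table = [len(data.get(key, [])) for key in tables]
    return {
        "table_count": len(tables),
        "tables_with_data": sum(1 for n in rows_per_table if n > 0),
        "total_rows": sum(rows_per_table),
    }
```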

FDW Objects Table

The FDW Objects table (visible at the bottom of the overview screenshot) lists all loaded foreign tables:

Column	Description
Schema	Salesforce schema name (e.g., `nexus_data`)
Table	Table name (e.g., `Account`, `Lead`, `Opportunity`)
Columns	Number of columns in the table
Rows	Number of data rows loaded

Click any row to open the Data Browser Modal, which shows the actual records in a paginated table.

A filter input above the table lets you search tables by name.

The Data Browser Modal displays:

Column headers matching the table schema
Row data in a scrollable, paginated table
Navigation controls (limit, offset)

This is useful for verifying that the dump contains the expected data before running analytics pipelines.

Dump from Live FDW

The Dump from Live FDW section at the bottom of the page allows you to create a fresh data snapshot from a running real FDW environment:

Enter the Source URL of a deployed FDW runtime (the ALB URL).
Optionally specify a schema name (default: nexus_data).
Click Start Dump.

The dump runs in the background, querying the live FDW API and uploading results to S3. Progress is tracked via _dump_status.json in S3 for multi-pod consistency.

After the dump completes, click Load Data to refresh the in-memory data from the new S3 dump.

Resetting Data

Click Reset to reload the S3 dump into memory. This discards any in-memory modifications and re-fetches all data from S3, restoring the simulator to the state of the most recent dump.

Using the Simulator (CLI)

Dump Script

The dump script (nexus-backend/scripts/sfdc-simulator/dump_sfdc_via_fdw.py) extracts data from a live FDW runtime and saves it to S3 or a local directory.

Basic Usage

# Dump priority tables from a deployed environment to S3
python scripts/sfdc-simulator/dump_sfdc_via_fdw.py \
  --url http://k8s-alb.elb.amazonaws.com \
  --s3-bucket nexus-ai-dev-sfdc-simulator

# Auto-detect ALB URL from kubectl and dump to S3
python scripts/sfdc-simulator/dump_sfdc_via_fdw.py \
  --auto-detect \
  --s3-bucket nexus-ai-dev-sfdc-simulator

# Dump specific tables only
python scripts/sfdc-simulator/dump_sfdc_via_fdw.py \
  --url http://k8s-alb.elb.amazonaws.com \
  --tables Account,Lead,Contact,Opportunity \
  --s3-bucket nexus-ai-dev-sfdc-simulator

# Dump ALL tables (slow -- queries every table in the schema)
python scripts/sfdc-simulator/dump_sfdc_via_fdw.py \
  --url http://k8s-alb.elb.amazonaws.com \
  --all \
  --s3-bucket nexus-ai-dev-sfdc-simulator

# Local-only dump (no S3 upload)
python scripts/sfdc-simulator/dump_sfdc_via_fdw.py \
  --url http://k8s-alb.elb.amazonaws.com \
  --local-dir ./sfdc-dump

CLI Options Reference

Option	Type	Default	Description
`--url`	URL	--	ALB URL of the deployed FDW runtime
`--auto-detect`	flag	off	Auto-detect ALB URL from `kubectl get ingress`
`--namespace`	string	all	Kubernetes namespace for auto-detection
`--schema`	string	`nexus_data`	PostgreSQL schema to dump
`--tables`	string	--	Comma-separated table names (overrides priority list)
`--all`	flag	off	Dump data for all tables in the schema
`--limit`	int	10,000	Max rows per table
`--s3-bucket`	string	--	S3 bucket for upload
`--s3-prefix`	string	`sfdc-simulator`	S3 key prefix
`--aws-profile`	string	`external-access`	AWS CLI profile name
`--local-dir`	path	--	Local directory for dump (alternative to S3)

Note

Either --s3-bucket or --local-dir (or both) must be specified.
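
For orientation, the options table and the rule in the note can be mirrored with `argparse`. This is a sketch built from the table, not the script's actual parser: the `dump_all` destination name and the exact error wording are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Dump FDW data to S3 or a local directory")
    p.add_argument("--url", help="ALB URL of the deployed FDW runtime")
    p.add_argument("--auto-detect", action="store_true")
    p.add_argument("--namespace", help="Kubernetes namespace for auto-detection")
    p.add_argument("--schema", default="nexus_data")
    p.add_argument("--tables", help="comma-separated table names")
    p.add_argument("--all", action="store_true", dest="dump_all")
    p.add_argument("--limit", type=int, default=10000, help="max rows per table")
    p.add_argument("--s3-bucket")
    p.add_argument("--s3-prefix", default="sfdc-simulator")
    p.add_argument("--aws-profile", default="external-access")
    p.add_argument("--local-dir")
    return p


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # At least one output target is required.
    if not (args.s3_bucket or args.local_dir):
        parser.error("either --s3-bucket or --local-dir (or both) must be specified")
    return args
```

Calling `parse` without either output option exits with a usage error, as `argparse` does for any invalid invocation.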

API Reference

Simulator Admin Endpoints (`/api/v1/sfdc-simulator`)

Method	Endpoint	Description
GET	`/status`	Simulator status (enabled, loaded, bucket, table/row counts)
POST	`/load`	Load data from S3 into memory (background)
POST	`/reset`	Reset (reload from S3)
GET	`/objects`	List all loaded tables with row counts
GET	`/objects/{schema}/{table}`	Get records for a table (supports `limit`, `offset` query params)
GET	`/objects/{schema}/{table}/columns`	Get column metadata for a table
POST	`/dump`	Dump from a live FDW runtime to S3 (background)
GET	`/dump/status`	Check dump operation progress
POST	`/restart-pods`	Rolling restart of backend pods
GET	`/health`	Health check

Config Endpoints (`/api/v1/config`)

Method	Endpoint	Description
GET	`/sfdc-mode`	Get current SFDC mode (`simulator` or `real`)
PUT	`/sfdc-mode`	Set SFDC mode (body: `{"enabled": true/false}`)

Example: Get Simulator Status

curl http://localhost:8000/api/v1/sfdc-simulator/status

Response:

{
  "enabled": true,
  "loaded": true,
  "bucket": "nexus-ai-dev-sfdc-simulator",
  "prefix": "sfdc-simulator",
  "schema": "nexus_data",
  "table_count": 245,
  "tables_with_data": 12,
  "total_rows": 8543,
  "dump_timestamp": "2026-03-14T10:30:00+00:00",
  "last_loaded": "2026-03-15T08:00:00+00:00"
}

Example: Load Data from S3

curl -X POST http://localhost:8000/api/v1/sfdc-simulator/load

Response:

{
  "success": true,
  "message": "Loaded 245 tables, 12 with data, 8543 rows in 3.2s",
  "tables": 245,
  "tables_with_data": 12,
  "total_rows": 8543
}

Example: Toggle Mode

# Switch to simulator mode
curl -X PUT http://localhost:8000/api/v1/config/sfdc-mode \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

# Switch to real FDW mode
curl -X PUT http://localhost:8000/api/v1/config/sfdc-mode \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
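
The same toggle can be issued from Python with the standard library. The sketch below only builds the request, so it can be inspected before sending; the helper name is hypothetical, and the endpoint and body shape are the ones shown in the curl examples.

```python
import json
import urllib.request


def build_mode_request(base_url: str, simulator: bool) -> urllib.request.Request:
    """Build the PUT request that switches SFDC mode.

    simulator=True selects Simulator mode, False selects Real mode.
    """
    body = json.dumps({"enabled": bool(simulator)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/config/sfdc-mode",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`.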

Example: Browse Table Data

# Get first 50 rows from the Account table
curl "http://localhost:8000/api/v1/sfdc-simulator/objects/nexus_data/Account?limit=50&offset=0"
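
When paging through a table from a script, the URL and the sequence of offsets can be generated as below. Both helper names are hypothetical; the path and the `limit` and `offset` query parameters are the ones documented above.

```python
from urllib.parse import quote, urlencode


def objects_url(base_url, schema, table, limit=50, offset=0):
    """Build the browse URL for one page of a table's records."""
    path = (
        "/api/v1/sfdc-simulator/objects/"
        f"{quote(schema, safe='')}/{quote(table, safe='')}"
    )
    query = urlencode({"limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}{path}?{query}"


def page_offsets(total_rows, limit):
    """Offsets needed to cover total_rows in pages of size limit."""
    return list(range(0, total_rows, limit))
```

For example, 120 rows at 50 per page need offsets 0, 50, and 100.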

Configuration Reference

Environment Variables

Variable	Default	Description
`SFDC_SIMULATOR_ENABLED`	`false`	Enable the SFDC simulator (use in-memory data instead of real FDW)
`SFDC_SIMULATOR_BUCKET`	--	S3 bucket containing the FDW data dump
`SFDC_SIMULATOR_PREFIX`	`sfdc-simulator`	S3 key prefix for dump files

System Settings (UI)

The SFDC Simulator settings are also available in the application Settings page under System > SFDC Simulator. This provides an admin-friendly UI for editing the SSM parameters without using the CLI or API:

Setting	Type	Default	Description
Enabled	Toggle	off	Enable SFDC simulator mode (use in-memory data instead of real FDW)
Bucket	Text	`nexus-ai-{env}-sfdc-simulator`	S3 bucket for SFDC simulator data dumps
Prefix	Text	`sfdc-simulator`	S3 prefix for SFDC simulator data

Changes are persisted to AWS SSM Parameter Store and take effect immediately. Use Save Changes to apply or Reset to revert to defaults.

AWS SSM Parameters

Parameter Path	Description
`/nexus-ai/{env}/sfdc/enabled`	`true`/`false` -- master enable flag (toggled by Config API or Settings UI)
`/nexus-ai/{env}/sfdc/bucket`	S3 bucket name for simulator data dumps
`/nexus-ai/{env}/sfdc/prefix`	S3 key prefix for dump files

In-Memory Data Structure

When loaded, the service holds data in these Python structures:

Attribute	Type	Description
`_servers`	`List[ForeignServerInfo]`	Foreign server definitions
`_tables`	`List[ForeignTableInfo]`	Foreign table metadata
`_columns`	`Dict[(schema, table), List[ColumnInfo]]`	Column metadata indexed by table
`_data`	`Dict[(schema, table), List[dict]]`	Row data indexed by table
`_manifest`	`Dict`	Dump metadata (source, timestamp, counts)
`_loaded`	`bool`	Whether data has been loaded from S3
`_loaded_at`	`datetime`	When the last load completed

Query Capabilities

The in-memory query engine supports:

Feature	Description
Column selection	Return only specified columns
Filtering	Operators: `eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `like`, `in`
Sorting	Single or multiple `order_by` clauses, ascending or descending
Pagination	`limit` and `offset` for page-based access
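
A minimal sketch of how such an engine can work over a list of row dicts. The filter shape `(column, operator, value)`, the `(column, direction)` sort shape, and the SQL-style, case-sensitive reading of `like` (`%` for any run, `_` for one character) are assumptions of this example, not the service's documented request format.

```python
import re


def _like(value, pattern):
    # Translate SQL-style wildcards into a regular expression.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


OPS = {
    "eq": lambda v, x: v == x,
    "neq": lambda v, x: v != x,
    "gt": lambda v, x: v is not None and v > x,
    "gte": lambda v, x: v is not None and v >= x,
    "lt": lambda v, x: v is not None and v < x,
    "lte": lambda v, x: v is not None and v <= x,
    "like": lambda v, x: v is not None and _like(str(v), x),
    "in": lambda v, x: v in x,
}


def run_query(rows, columns=None, filters=(), order_by=(), limit=None, offset=0):
    """Filter, sort, paginate, then project; returns a new list of dicts."""
    out = [r for r in rows if all(OPS[op](r.get(col), val) for col, op, val in filters)]
    # Apply sort keys from last to first; the stable sort keeps earlier keys dominant.
    for col, direction in reversed(list(order_by)):
        out.sort(key=lambda r: r.get(col), reverse=(direction == "desc"))
    end = None if limit is None else offset + limit
    out = out[offset:end]
    if columns:
        out = [{c: r.get(c) for c in columns} for r in out]
    return out
```

Sorting on a column that mixes `None` with other values raises a `TypeError` in this sketch; a fuller implementation would need a rule for ordering nulls.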

Troubleshooting

Status shows "Not Loaded" after pod restart

The in-memory data does not persist across pod restarts. Click Load Data or wait for the first query to trigger auto-loading via ensure_loaded().

"manifest.json not found in S3" error

The S3 bucket has no dump data. You need to create a dump first:

Use the Dump from Live FDW section in the UI, or
Run the dump script: python scripts/sfdc-simulator/dump_sfdc_via_fdw.py --url <ALB_URL> --s3-bucket <BUCKET>

Load takes a long time or times out

Large schemas with hundreds of tables and thousands of rows can take 30+ seconds to load. Solutions:

Dump only priority tables (the default) instead of using --all.
Check the S3 bucket region matches the pod's region to minimize latency.
If a single table has an excessive number of rows, re-dump with a lower --limit.

Mode switch has no effect

Verify the SSM parameter was updated: GET /api/v1/config/sfdc-mode
The mode change affects new queries but does not reload data. If switching to simulator, ensure data is loaded.
Some operations may cache the mode at startup. A pod restart (POST /api/v1/sfdc-simulator/restart-pods) ensures all pods pick up the new mode.

FDW Objects table shows 0 rows for all tables

The dump may have only captured metadata (servers, tables, columns) without row data. Check manifest.json:

aws s3 cp s3://<bucket>/sfdc-simulator/manifest.json - | python -m json.tool

Look at tables_with_data and total_rows_dumped. If zero, re-run the dump with explicit table names or --all.
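
The same check can be scripted. This sketch takes the manifest text (for example, the output of the `aws s3 cp` command above) and reports whether the dump captured any rows. It assumes both fields named above are numeric counts; the function name is hypothetical.

```python
import json


def dump_has_rows(manifest_text: str) -> bool:
    """Return True if the manifest reports at least one table with row data."""
    manifest = json.loads(manifest_text)
    return (
        manifest.get("tables_with_data", 0) > 0
        and manifest.get("total_rows_dumped", 0) > 0
    )
```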

Dump fails with connection errors

Verify the source URL points to a running FDW runtime with a reachable /api/fdw/servers endpoint.
Check that the source environment has valid Salesforce FDW credentials configured.
Lower --limit if tables with very large row counts cause requests to time out.
Ensure the dump script has network access to the ALB (VPN, security groups, etc.).

Data appears stale

The simulator serves whatever was last loaded from S3. To get fresh data:

Run a new dump from the live FDW (UI or CLI).
Click Load Data (or Reset) to reload from S3.