Provider Plugins
Declarative Cloudflare-Hosted Provider Plugins
Status: Proposal (CAS-3499)
Date: 2026-05-17
Owner: Backend (Artificer)
Supersedes: PLAN-unified-providers.md Phase 3, CAS-3494
1. Decision Summary
Casaconomy provider plugins become downloadable data specs hosted on Cloudflare and executed by one in-app interpreter on desktop and iOS.
Hard invariant:
- Downloaded provider plugins are declarative data only.
- No downloaded executable code.
- No eval/runtime scripting.
- No WASM/native plugin loading.
The existing config-driven pipeline (ProviderConfig transport + ProcessorConfig parse + generic processors) is promoted to the long-term architecture, not deprecated.
2. Goals And Non-Goals
Goals:
- Add/update providers without app-store release.
- Keep one provider model across desktop + iOS.
- Preserve App Store policy safety boundary.
- Make provider authoring compatible with recorder workflow.
- Add strong trust + provenance controls because plugins parse financial data.
Non-goals:
- Arbitrary scripting language for providers.
- JS/WASM/native provider runtime.
- Replacing current transport/parser model with hardcoded provider modules.
- Dynamic execution outside the bounded interpreter.
3. Architecture
3.1 Runtime Components
- Provider Registry Service (new): fetches plugin index + specs from Cloudflare, verifies signatures, validates schema, stores approved versions locally.
- Provider Execution Engine (existing+hardening): interprets provider transport/fetch spec and processor mapping spec, then yields normalized transaction rows.
- Provider Store (existing DB tables extended): stores installed plugin metadata, enabled/disabled state, pinning, provenance, and local override markers.
- UI surfaces (new): plugin catalog, install/update/rollback, trust status, enable toggles.
3.2 Data Model
Plugin package is JSON (or compact JSON+detached signature), versioned:
- plugin_id: stable logical id (seb.se-style namespace)
- version: semantic-ish provider version (2026.05.17.1 acceptable)
- provider_config: declarative transport spec
- processor_config: declarative parse/transform spec
- capabilities: declared feature flags (pagination, oauth, csv, json, etc.)
- constraints: engine version range + platform compatibility
- redaction_hints: declarative map of sensitive fields so masking/redaction remains enforced for downloaded specs
- integrity: hash of canonical payload
- signature: detached signature over canonical payload
- published_at, publisher, changelog
No field in schema may contain executable snippets. Any existing free-form script surfaces are removed/locked.
3.3 Storage And Resolution
Resolution order at runtime:
- Local pinned plugin version (if pinned)
- Latest locally installed verified version
- Bundled fallback provider templates (read-only)
Provider execution only runs fully validated + signature-verified specs.
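The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the engine's code: the Installed and Resolved types are hypothetical, and falling through to the next tier when a pinned version is missing or unverified is an assumption the proposal does not spell out.

```rust
/// Where the resolved spec came from, in the priority order of §3.3.
#[derive(Debug, PartialEq)]
enum Resolved<'a> {
    Pinned(&'a str),
    LatestVerified(&'a str),
    BundledFallback,
}

/// One locally installed plugin version (hypothetical shape).
struct Installed {
    version: String,
    verified: bool, // passed schema validation + signature verification
}

/// Assumption: dotted versions such as 2026.05.17.1 compare numerically,
/// component by component.
fn version_key(v: &str) -> Vec<u64> {
    v.split('.').map(|p| p.parse::<u64>().unwrap_or(0)).collect()
}

/// Pick the spec to execute. Unverified versions are never eligible,
/// even when pinned.
fn resolve<'a>(pinned: Option<&str>, installed: &'a [Installed]) -> Resolved<'a> {
    if let Some(pin) = pinned {
        if let Some(hit) = installed.iter().find(|i| i.verified && i.version == pin) {
            return Resolved::Pinned(hit.version.as_str());
        }
    }
    installed
        .iter()
        .filter(|i| i.verified)
        .max_by_key(|i| version_key(&i.version))
        .map(|i| Resolved::LatestVerified(i.version.as_str()))
        .unwrap_or(Resolved::BundledFallback)
}
```

Comparing versions numerically rather than as strings matters here: 2026.05.17.10 must sort after 2026.05.17.2.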
3.4 Schema-Level Safety Constraints
Safety constraints that apply at plugin-schema validation time (not deferred to runtime hardening):
- Regex pattern length limits for FetchValue::Prefetch::pattern and processor Regex.pattern.
- Regex feature restrictions to avoid catastrophic backtracking patterns.
- Maximum replacement length for processor Regex.replacement.
- redaction_hints required when fields contain account numbers, personal identifiers, auth tokens, or provider session identifiers.
- Reject plugin activation if any schema safety constraint fails.
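A schema-time regex check along these lines might look like the sketch below. The numeric limits and the list of rejected constructs are hypothetical (the proposal fixes neither), and the substring scan is a deliberately crude stand-in for a real regex parser.

```rust
/// Hypothetical limits; the proposal does not fix concrete numbers.
const MAX_PATTERN_LEN: usize = 256;
const MAX_REPLACEMENT_LEN: usize = 128;

#[derive(Debug, PartialEq)]
enum SchemaError {
    PatternTooLong,
    ReplacementTooLong,
    ForbiddenConstruct(&'static str),
}

/// Constructs rejected outright in this sketch: lookaround and
/// backreferences. A real validator would parse the pattern properly.
const FORBIDDEN: [&str; 6] = ["(?=", "(?!", "(?<=", "(?<!", "\\1", "\\k<"];

/// Validate one Regex transform (or a Prefetch pattern, with no replacement).
fn check_regex(pattern: &str, replacement: Option<&str>) -> Result<(), SchemaError> {
    if pattern.chars().count() > MAX_PATTERN_LEN {
        return Err(SchemaError::PatternTooLong);
    }
    for token in FORBIDDEN {
        if pattern.contains(token) {
            return Err(SchemaError::ForbiddenConstruct(token));
        }
    }
    if let Some(r) = replacement {
        if r.chars().count() > MAX_REPLACEMENT_LEN {
            return Err(SchemaError::ReplacementTooLong);
        }
    }
    Ok(())
}
```

Any Err result would block activation, matching the last rule in the list above.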
3.5 Provider Expression Model (Existing Engine, No New Code)
The central design question (“how does pure data express bank-specific extraction without per-provider code?”) is already answered by the current engine:
- preload.js is already a generic parameterized extractor that reads window.providerConfig and adapts behavior from config fields (readySelector, autoClickSelectors, dataArrayKey, parentFields, dateField, idField, etc.).
- ProviderConfig is the transport DSL: typed fetch/lookup expressions (Prefetch { method, url, pattern }, Template, Static, Initial) with data-only parameters.
- ProcessorConfig is the parse DSL: typed transforms and mapping/filter vocabulary (ParseDate, ToNumber, Regex, field mappings, JSON path, excel sheet/header selectors, Equals/Contains/IsEmpty conditions).
Design consequence:
- A downloadable provider spec is a ProviderConfig + ProcessorConfig bundle interpreted by already-bundled engines.
- Adding a new provider requires zero new preload.js code and zero new Rust execution-engine code.
- Downloaded plugins remain data-only and stay inside the iOS-safe execution boundary.
Engine-hardening audit target:
- Audit all current provider configs to confirm there is no free-form script/eval surface; reject future schema additions that introduce one.
4. Cloudflare Distribution Model
4.1 Hosting Layout
Cloudflare R2 + CDN (or Workers static assets):
- plugins/index.json: signed catalog
- plugins/<plugin_id>/<version>/plugin.json
- plugins/<plugin_id>/<version>/plugin.sig
- optional metadata blobs (CHANGELOG.md, diagnostics)
index.json contains current channel pointers (stable, maybe beta later), minimum app engine version, revoked versions.
4.2 Update Protocol
- App fetches signed index.json on schedule + manual refresh.
- For each enabled plugin, app compares installed version vs catalog target.
- Download candidate spec, verify hash + signature + schema.
- Stage as pending until user approves update policy (auto/manual depends on setting).
- Promote to active only after successful dry-run sanity checks.
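The staging rules above amount to a small decision function. The sketch below is illustrative only: the Stage, UpdatePolicy, and Candidate names are hypothetical, and treating a failed dry-run as a rejection (rather than leaving the candidate pending) is an assumption based on the mandatory dry-run gate recommended in §10.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Stage {
    Rejected,
    Pending,
    Active,
}

#[derive(Clone, Copy)]
enum UpdatePolicy {
    Auto,
    Manual,
}

/// Results of the checks named in the update protocol (hypothetical shape).
struct Candidate {
    hash_ok: bool,
    signature_ok: bool,
    schema_ok: bool,
    dry_run_ok: bool,
}

/// Decide where a downloaded candidate lands.
fn next_stage(c: &Candidate, policy: UpdatePolicy, user_approved: bool) -> Stage {
    // Integrity, signature, and schema failures are never recoverable.
    if !(c.hash_ok && c.signature_ok && c.schema_ok) {
        return Stage::Rejected;
    }
    let approved = match policy {
        UpdatePolicy::Auto => true,
        UpdatePolicy::Manual => user_approved,
    };
    if !approved {
        return Stage::Pending;
    }
    // Assumption: a failed dry-run rejects the candidate outright.
    if c.dry_run_ok {
        Stage::Active
    } else {
        Stage::Rejected
    }
}
```

Keeping the decision pure makes it straightforward to unit-test every combination without touching the network or the database.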
4.3 Offline Behavior
If network/catalog unavailable:
- Keep using last verified installed versions.
- Never disable an already working installed provider solely due to fetch outage.
- Surface stale-catalog warning in UI.
5. Security And Trust Model
5.1 Threats
- Tampered plugin payload in transit/storage
- Malicious or compromised publisher key
- Unsafe parser config causing data exfiltration or overreach
- Regression update breaking parse correctness silently
5.2 Controls
- Detached signatures (Ed25519 recommended) on catalog and every plugin payload.
- Trust root key(s) pinned in app bundle; support key rotation manifest.
- Strict JSON schema validation with deny-by-default unknown executable fields.
- Domain allowlist constraints in ProviderConfig enforced by engine.
- Bounded transform language only (existing typed transforms), no code interpolation.
- Size/time guards: response size cap, parse row cap, mapping recursion cap.
- Provenance ledger in DB: who published, hash, installed-at, activated-at.
- Revocation list in signed catalog; revoked versions are blocked from activation.
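As one illustration of the domain allowlist control, the sketch below checks a fetch URL against a plugin's declared hosts using only the standard library. The function names are hypothetical, and the matching rules (https only, exact host or subdomain, no userinfo) are assumptions; the proposal only states that the engine enforces an allowlist.

```rust
/// Extract the host from an https URL without external crates.
/// Returns None for non-https URLs and for URLs carrying userinfo.
fn https_host(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("https://")?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if authority.is_empty() || authority.contains('@') {
        return None;
    }
    // Drop an optional :port suffix.
    authority.split(':').next()
}

/// Deny by default: the host must equal an allowlisted domain or be a
/// subdomain of one (assumed matching rule).
fn host_allowed(url: &str, allowlist: &[&str]) -> bool {
    let Some(host) = https_host(url) else {
        return false;
    };
    let host = host.to_ascii_lowercase();
    allowlist.iter().any(|allowed| {
        let allowed = allowed.to_ascii_lowercase();
        host == allowed || host.ends_with(&format!(".{}", allowed))
    })
}
```

Matching on a leading dot rather than a bare suffix keeps evilbank.example from passing as bank.example.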
5.3 User Trust UX
Every plugin shows:
- publisher identity
- signature status
- publish date
- permissions/capabilities summary
- whether it is official-household or imported-local
Imported local configs remain supported but explicitly marked untrusted until approved.
5.4 Redaction And Telemetry Parity
Downloaded specs must not weaken current data-protection guarantees:
- Extracted financial data follows existing redaction/masking guarantees (parity with current redact.ts behavior) before any bug-report, telemetry, diagnostics, or review-snapshot surface.
- redaction_hints is part of plugin metadata and is validated before activation.
- Engine diagnostics must default to masked output for sensitive fields.
6. Generic Scraper Engine Design
Per regent direction: “think really hard how to do this — the ENGINE will get really complex.”
The engine splits into two distinct parts:
Scraper (this section — the hard part): Navigate to the right authenticated page state, execute the right fetch or in-page extraction, deliver raw bytes/JSON to Rust.
Mapper (§3.5 — the easy part): Take raw bytes/JSON from the scraper, apply ProcessorConfig field mappings and transforms, produce Vec<TransactionRow>.
A downloadable plugin is a data bundle configuring both halves. The scraper config lives in ProviderConfig; the mapper config lives in ProcessorConfig. The bank-specific knowledge is entirely in the data. Neither half downloads or executes new code.
6.1 Extraction Strategy Taxonomy
The fetch_url field is the strategy selector. The bundled engine dispatches on it:
| Strategy | fetch_url pattern | How data is obtained |
|---|---|---|
| http-fetch | https://... real URL | HTTP GET or POST within the authenticated webview session. Session cookies are present; response bytes forwarded to Rust + ProcessorConfig. Current example: AMEX (/api/servicing/v1/financials/documents?file_format=excel). |
| nextjs-rsc | extract://nextjs-rsc | Extracts from the RSC flight payload (self.__next_f global). Handles Next.js 13+ App Router banks. Retries up to 20× at 500 ms intervals to cover async streaming/hydration. Current example: SEB/Spendwise portal. |
| nextjs-data | extract://nextjs-data | Extracts from the __NEXT_DATA__ JSON script tag. Handles Next.js Pages Router banks (the older, still-common variant). |
| script-json | extract://script-json | Scans all inline <script> tags for embedded JSON matching a declared dataArrayKey. Handles SPAs that inline transaction data in script blocks. |
| (future v1.1) xhr-capture | extract://xhr-capture | Intercepts the XHR the bank’s own page issues during normal load, forwards the matching response. Requires new bundled extractor entry. See §6.7 Gap 1. |
| (future v1.2) dom-table | extract://dom-table | Extracts rows from an HTML <table> matching a declared CSS selector and column index mapping. See §6.7 Gap 2. |
The strategy name is a string constant in the config. Adding a new strategy means adding one entry to the bundled extractor registry in preload.js — a new engine capability, not a per-provider script. Plugins declare their required engine version in constraints.engine_version; the app will not activate a plugin requiring a strategy the installed engine does not support.
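The dispatch on fetch_url can be pictured as a lookup from a string to a closed set of strategies. The real dispatch lives in the bundled preload.js; the Rust sketch below only illustrates the shape for engine v1.0, with hypothetical type names and the assumption that http-fetch URLs are absolute https URLs.

```rust
/// Strategies available in engine v1.0 (see the table above).
#[derive(Debug, PartialEq)]
enum Strategy {
    HttpFetch,
    NextjsRsc,
    NextjsData,
    ScriptJson,
}

#[derive(Debug, PartialEq)]
enum DispatchError {
    /// An extract:// name this engine version does not bundle.
    UnsupportedStrategy(String),
    /// Neither an extract:// selector nor an https URL.
    NotAUrl,
}

/// Map a plugin's fetch_url to a bundled strategy.
fn strategy_for(fetch_url: &str) -> Result<Strategy, DispatchError> {
    if let Some(name) = fetch_url.strip_prefix("extract://") {
        return match name {
            "nextjs-rsc" => Ok(Strategy::NextjsRsc),
            "nextjs-data" => Ok(Strategy::NextjsData),
            "script-json" => Ok(Strategy::ScriptJson),
            other => Err(DispatchError::UnsupportedStrategy(other.to_string())),
        };
    }
    if fetch_url.starts_with("https://") {
        Ok(Strategy::HttpFetch)
    } else {
        Err(DispatchError::NotAUrl)
    }
}
```

Because the set of strategies is closed, an unknown name such as extract://xhr-capture fails loudly on a v1.0 engine instead of being interpreted as something else.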
6.2 Auth and Navigation Layer
Before the main fetch or extract, the scraper handles page navigation and auth completion. All auth/navigation config is declarative — no provider code.
| Config key | Location | Purpose |
|---|---|---|
| readySelector | fetch_body static | CSS selector: wait until this element appears in DOM before triggering the fetch. Handles SPA hydration. Timeout: 15 s, then overlay hides and fetch proceeds anyway. |
| autoClickSelectors | fetch_body static | Comma-separated CSS selectors: click matching elements as they appear. Handles BankID initiation buttons, cookie banners, “continue” steps. Per-selector timeout: 1.5 s. |
| overlayText | fetch_body static | User-visible overlay text while the engine navigates (default: “Loading…”). |
For http-fetch providers, the webview navigates to login_url, then landing_page_url. The readySelector / autoClickSelectors can run on the landing page before the final fetch_url request fires.
6.3 Prefetch Step Model
Some banks require extracting a CSRF token, session ID, or account key from one endpoint before the main data fetch. FetchValue::Prefetch models this:

    "fetch_query": {
      "account_key": {
        "prefetch": {
          "method": "GET",
          "url": "https://bank/api/session",
          "pattern": "\"accountKey\":\"([^\"]+)\""
        }
      }
    }

- The webview issues the prefetch request within its authenticated session.
- The pattern (a regex) extracts a named value from the response body.
- The extracted value is stored in window.extractedValues[key].
- In the main fetch, a body/query field left as an empty string is substituted with the matching extracted value.
- Multiple prefetch steps execute in parallel before the main fetch fires.
Prefetch is data-only. The pattern is a regex string, not code. Length and construct constraints apply (§3.4).
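The substitution step can be sketched as follows. In the real engine this happens in the webview against window.extractedValues; the Rust version below is only an illustration of the rule, and reporting still-empty keys back to the caller is an assumption about how a missing prefetch value would be surfaced.

```rust
use std::collections::HashMap;

/// Fill fields left as "" with the value extracted under the same key.
/// Returns the keys that stayed empty so the caller can raise a typed error.
fn substitute(
    fields: &mut HashMap<String, String>,
    extracted: &HashMap<String, String>,
) -> Vec<String> {
    let mut missing = Vec::new();
    for (key, value) in fields.iter_mut() {
        // Non-empty fields are static values and are left untouched.
        if value.is_empty() {
            match extracted.get(key) {
                Some(v) => *value = v.clone(),
                None => missing.push(key.clone()),
            }
        }
    }
    missing.sort(); // deterministic order for diagnostics
    missing
}
```

Only empty-string placeholders are ever replaced, so a prefetch result cannot overwrite a value the plugin declared statically.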
6.4 Extractor Configuration Vocabulary
When using extract:// strategies, the fetch_body carries extractor configuration as static values alongside the navigation config:
| Key | Default | Description |
|---|---|---|
| dataArrayKey | "transactions" | JSON key of the target data array within the extracted content. |
| parentFields | "" | Comma-separated field names to copy from a parent object (e.g., card group) into each extracted row. Example: "nameOnCard,maskedCardNumber". |
| wrapperFormat | "cardGroups" | Output wrapper structure: "cardGroups" (nested groups → transaction groups → transactions) or "flat" (direct array). |
| dateField | "" | Field containing dates to normalize to YYYY-MM-DD (ISO truncation). |
| idField | "" | Numeric ID field to convert to string (avoids JS integer precision loss for large IDs). |
6.5 Mapper Layer (ProcessorConfig — the Easy Part)
Once the scraper delivers raw bytes/JSON to Rust, ProcessorConfig takes over. This is conceptually solved — the bounded typed vocabulary already exists:
- Formats: Json, Excel, Csv
- Transforms: ParseDate { format, timezone }, ToNumber, Regex { pattern, replacement }
- Filters: Equals { field, value }, Contains { field, value }, IsEmpty { field }
- Selectors: json_path, excel_sheet, excel_header_rows, field_mappings
Each FieldMapping declares target_field (destination in TransactionRow), source_field (column index for Excel, JSON path for JSON), default_value, and optional transform. No code; entirely data.
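A field-mapping pass over one raw record might look like the sketch below. The struct mirrors the four FieldMapping fields named above, but its Rust shape, the string-typed output row, the ToNumber cleanup rules (decimal comma, space stripping), and the diagnostic wording are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Only one transform is modelled here.
enum Transform {
    ToNumber,
}

struct FieldMapping {
    target_field: &'static str,
    source_field: &'static str,
    default_value: Option<&'static str>,
    transform: Option<Transform>,
}

/// Map one raw record to target fields. A missing source falls back to
/// default_value; misses and failed transforms are reported, not hidden.
fn apply(
    raw: &HashMap<&str, &str>,
    mappings: &[FieldMapping],
) -> (HashMap<String, String>, Vec<String>) {
    let mut row = HashMap::new();
    let mut diagnostics = Vec::new();
    for m in mappings {
        let value = raw.get(m.source_field).copied().or(m.default_value);
        let Some(value) = value else {
            diagnostics.push(format!("mapping miss: {}", m.source_field));
            continue;
        };
        let out = match m.transform {
            // Assumption: ToNumber accepts a decimal comma and strips spaces.
            Some(Transform::ToNumber) => {
                let cleaned = value.replace(' ', "").replace(',', ".");
                match cleaned.parse::<f64>() {
                    Ok(n) => n.to_string(),
                    Err(_) => {
                        diagnostics.push(format!("transform failed: {}", m.target_field));
                        continue;
                    }
                }
            }
            None => value.to_string(),
        };
        row.insert(m.target_field.to_string(), out);
    }
    (row, diagnostics)
}
```

Returning diagnostics alongside the row is one way to support the per-row "why was this dropped" reporting called for in the Phase B hardening checklist.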
6.6 Failure Handling and Robustness
Failure handling is bundled in the engine, not per-provider:
- Async hydration delays: nextjs-rsc retries up to 20 times at 500 ms each (10 s total) before failing. This covers Next.js RSC streaming. Constant in the bundle; not configurable per plugin in v1.
- Page readiness timeout: readySelector times out at 15 s; overlay hides and fetch proceeds. Prevents indefinite hang on slow-loading banks.
- Empty extraction: Engine emits typed diagnostics (“why row dropped”, mapping misses, transform failures; Phase B hardening). Surfaces error to UI rather than silently succeeding with zero rows.
- HTTP errors: Non-2xx responses from http-fetch or prefetch steps produce typed errors forwarded to Rust and surfaced to the UI.
- Auth redirect: If the bank’s landing page redirects to login mid-session, the webview shows the login page; the user completes auth. The readySelector waits for landing-page readiness after the redirect resolves. No special handling needed.
- Plugin-declared timeouts: constraints.scraper_timeout_ms is reserved in the schema for a future engine version to support overrides. Not in v1.
6.7 Honest Gap Assessment
The engine covers a well-defined set of real-world bank patterns. The gaps below are explicit follow-up engine capability work — not downloadable code, but new entries in the bundled extractor registry. Each has a severity estimate and requires a dedicated engine CAS before Phase C rollout can target banks that need it.
Covered by engine v1.0:
- REST/JSON API accessible via session-cookie auth (AMEX pattern)
- Next.js App Router RSC payload (SEB/Spendwise pattern)
- Next.js Pages Router __NEXT_DATA__
- Generic inline script JSON
- BankID / single-button auth flows via autoClickSelectors
- CSRF / auth-token prefetch via FetchValue::Prefetch
- Wait-for-hydration via readySelector
- Excel, CSV, JSON file downloads from API endpoint
Gap 1 — XHR intercept/capture (severity: HIGH)
Some banks serve transaction data only through XHR requests the bank’s own page makes during normal load. The window.providerConfig recorder already captures these (it wraps fetch and XMLHttpRequest in recording mode). The gap is that there is no extract://xhr-capture replay strategy: the engine has no way to intercept a specific outgoing XHR at load time and forward its response, because the XHR is authenticated with runtime state that exists only during the page’s own JS execution.
Solution design: add extract://xhr-capture extractor that installs a fetch/XHR wrapper at document_start, matches outgoing requests against a declared URL pattern, and forwards the first matching response to Rust. Config: urlPattern (regex), method (GET/POST). This requires a new extractor function in preload.js — a bundled capability addition, not downloadable code.
Action: file engine capability CAS for xhr-capture as part of Phase B hardening.
Gap 2 — HTML DOM table scraping (severity: MEDIUM)
Legacy bank portals rendering transactions as HTML <table> elements with no embedded JSON or API endpoint. No extractor handles this today.
Solution design: add extract://dom-table extractor with declared config: tableSelector (CSS selector), headerRow (boolean), columnMappings (column index → named field). Returns JSON array of row objects for ProcessorConfig to transform.
Action: file engine capability CAS for dom-table.
Gap 3 — Multi-fetch pagination (severity: MEDIUM)
Banks returning page 1 of N where N is only known from the first response (a response header or body field). Current engine issues one fetch per sync.
Solution design: add a pagination block to ProviderConfig: totalPagesField (JSON path into first response), pageUrlTemplate (URL with {page} placeholder), maxPages (safety cap). Engine runs the page loop internally, merges results before handing off to ProcessorConfig.
Action: file engine capability CAS for paginated fetch.
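The proposed page loop could be sketched as below. This is a design illustration for Gap 3, not existing engine code: fetch_page stands in for the authenticated webview request, rows are plain strings for brevity, and failing when totalPages exceeds maxPages (instead of silently truncating) is an assumption.

```rust
/// Expand a pageUrlTemplate by replacing its {page} placeholder.
fn page_url(template: &str, page: u32) -> String {
    template.replace("{page}", &page.to_string())
}

/// Run the page loop with a safety cap and merge the results.
/// `fetch_page` is a hypothetical callback returning the rows of one page.
fn fetch_all<F>(total_pages: u32, max_pages: u32, mut fetch_page: F) -> Result<Vec<String>, String>
where
    F: FnMut(u32) -> Result<Vec<String>, String>,
{
    // Assumption: exceeding the cap is an error, not a silent truncation.
    if total_pages > max_pages {
        return Err(format!(
            "totalPages {} exceeds maxPages {}",
            total_pages, max_pages
        ));
    }
    let mut rows = Vec::new();
    for page in 1..=total_pages {
        // Any failed page aborts the whole sync.
        rows.extend(fetch_page(page)?);
    }
    Ok(rows)
}
```

Aborting on the first failed page avoids handing ProcessorConfig a partial result that looks complete.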
Gap 4 — Conditional multi-step navigation (severity: LOW-MEDIUM)
Banks requiring a conditional navigation sequence (select account → pick date range → confirm → download). autoClickSelectors handles simple linear click sequences; conditional state machines are not expressible.
Assessment: most Swedish target banks fall into the readySelector + autoClickSelectors pattern or have accessible API endpoints. Defer until a specific bank pattern requires it.
Action: document as known limitation; flag during recorder-based provider authoring.
Gap 5 — OAuth 2.0 authorization code flow (severity: LOW)
Banks requiring a proper OAuth dance where the app intercepts the redirect callback and exchanges codes. Most current targets use session-cookie auth within the webview.
Assessment: deferred. A separate design is warranted before attempting implementation.
6.8 Engine Version Contract
The constraints.engine_version field in the plugin schema creates a capability contract between the plugin and the installed app:
| Engine version | Strategies added |
|---|---|
| v1.0 | http-fetch, nextjs-rsc, nextjs-data, script-json |
| v1.1 | xhr-capture |
| v1.2 | dom-table, paginated |
The Registry Service rejects activation of a plugin declaring engine_version: ">=1.1" on an app with engine v1.0. The app surfaces a “plugin requires app update” message in the catalog UX.
Engine versioning is separate from app version: the engine capability level is a named constant in the bundle, incremented only when new extractor strategies are added.
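The capability check itself is small. The sketch below handles only the ">=X.Y" constraint form used in the example above; the function names are hypothetical, and treating any other syntax as unparseable (so the caller rejects the plugin) is an assumption.

```rust
/// Engine capability level as (major, minor).
type Version = (u32, u32);

/// Parse "1.1" or "v1.1" into a Version.
fn parse_version(s: &str) -> Option<Version> {
    let (major, minor) = s.trim().trim_start_matches('v').split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Returns Some(true/false) for a ">=X.Y" constraint, or None when the
/// constraint cannot be parsed and the plugin should be rejected.
fn engine_satisfies(constraint: &str, engine: Version) -> Option<bool> {
    let required = parse_version(constraint.trim().strip_prefix(">=")?)?;
    // Tuples compare lexicographically: major first, then minor.
    Some(engine >= required)
}
```

With the engine level held as a constant in the bundle, the Registry Service would call engine_satisfies once per candidate plugin before activation.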
6.9 Hardening Checklist (Phase B)
Before rollout, harden the engine:
- Remove any remaining script/eval path from provider execution flow.
- Constrain URL templating to declared FetchValue placeholders only.
- Validate all variable substitutions against declared schema; reject unknown keys.
- Add deterministic per-row diagnostics: why a row was dropped, which field mapping missed, which transform failed.
- Add canonical test vectors per plugin version (used in dry-run gate).
- Apply regex safety constraints (§3.4) to all existing provider configs before migration.
7. Recorder Pipeline Reconciliation
PLAN-provider-recorder.md becomes the authoring pipeline for this model:
- Record provider traffic (desktop tooling).
- Generate draft ProviderConfig + ProcessorConfig.
- Iterate with dry-run parse/live harness.
- Publish signed plugin artifact to Cloudflare catalog.
- App downloads + verifies + offers activation.
Recorder output format should be extended with publish-ready metadata (plugin id, changelog, constraints), but still emits declarative config payloads.
8. Superseded And Preserved Parts
Superseded from PLAN-unified-providers.md:
- Phase 3 premise that config-driven path should be removed/deprecated.
- End-state of code-first compiled provider modules as the canonical model.
Preserved:
- Separation of transport orchestration in services from parse execution.
- Existing ProviderConfig / ProcessorConfig semantics as base contract.
- Generic processor path as the single runtime engine.
9. Rollout Plan
Phase A: Spec + Trust Foundations
- Define plugin schema v1 and catalog schema v1.
- Encode regex safety constraints directly in schema v1 (length bounds + restricted constructs).
- Add redaction_hints to schema v1 and enforce validation requirements.
- Implement signature verification + trust root loading.
- Add local DB tables/columns for plugin provenance and state.
- Retire/rewrite PLAN-unified-providers.md to remove the superseded code-first Phase 3 direction.
Phase B: Execution Hardening
- Remove any dynamic script surfaces.
- Add strict validators and runtime guards.
- Add deterministic diagnostics and test vectors.
Phase C: Cloudflare Distribution
- Publish signed catalog + artifacts.
- Implement fetch/stage/activate lifecycle.
- Add revocation + key-rotation support.
Phase D: UX + Policy
- Catalog UI and enable/disable/update flows.
- Trust surfaces and warnings.
- Update policy controls (manual/auto).
Phase E: Recorder Integration
- Recorder outputs publish-ready plugin manifests.
- Publish toolchain signs and pushes to Cloudflare.
Phase F: iOS Provider Scraping Port (sequenced follow-on — NOT concurrent with A–E)
Gate: desktop Phases A–E complete, desktop test harness green, and regent has verified a working desktop round-trip with a real provider.
Scope:
- iOS consumer side only: bundled generic scraper (preload.js + ProviderConfig + ProcessorConfig) running under iOS WKWebView, provider catalog/enable UI on mobile, auth/login flow on iOS including BankID app-handoff.
- Recorder and authoring tooling stay desktop-only; iOS consumes specs, does not record.
- iOS no-downloaded-code invariant: already satisfied — the declarative model downloads data specs only; the bundled scraper interprets them. This holds identically on the iOS execution path; no change to the invariant.
- iOS test architecture leg: see §11.8.
Phase F must NOT be broken into implementation CASes until the desktop gate above is satisfied.
10. Open Decisions For Regent Sign-Off
Recommendations are included (Saga + MoC). Regent confirms or overrides from phone.
| Decision | Recommendation | Rationale |
|---|---|---|
| Signing authority model | Single household key for v1; delegated per-provider keys deferred | Conservative start; rotation manifest in bundle from day one so delegation can be added without bundle change |
| Auto-update default for stable channel | Manual approve; opt-in to auto | Consistent with risk-mitigation.md conservative ethos; users of financial data tooling should see what changed |
| Dry-run gate before activation | Mandatory strict | A plugin that fails dry-run never activates; no soft-warning bypass in v1 |
| Revocation of already-enabled versions | Grace window + loud warning; hard disable only if confirmed-malicious | Avoids breaking active users on a flag error; kill-switch is still fast (config push, hours not days) |
| Beta channel | Defer | No second distribution tier in v1; stable channel only |
11. Automated Test Architecture (CF-Hosted Fake Provider Harness)
Regent mandate (2026-05-18): testing is a first-class design concern because the system is by nature painful to test manually (real bank auth, BankID, no determinism). The test architecture must exercise the real trust+distribution chain — not a local shortcut — and run in CI without a Mac, a human, or a real bank.
11.1 Design Principles
- Real distribution chain, fake data. Fake providers are distributed via the same signed-catalog → download → hash+signature verify → enable path as production plugins. No test bypasses the integrity or signature check.
- Auth exercised, not skipped. Fake bank servers serve canned login pages with stub auth flows (the same autoClickSelectors / readySelector config the real scraper drives). BankID is replaced by a fake button; the navigation code path is identical.
- Gap↔fake 1:1 mapping. One fake provider per scraper extraction strategy. A green test proves “the generic scraper handles this pattern via config alone.” A red test marks a gap not yet implemented.
- RED→GREEN is the acceptance gate. Each Phase A–E implementation CAS is not done until its corresponding fake-provider end-to-end test is green in CI.
- Mapper tests are independent. ProcessorConfig fixture tests (raw bytes → Vec<TransactionRow>) run in the standard Rust test harness: no webview, fast, pure unit.
11.2 Fake Provider Infrastructure
Fake CF distribution layer:
A second CF Workers namespace (plugins-test.casaconomy.workers.dev or equivalent) hosts:
- index.json: signed fake catalog pointing at fake plugin versions
- plugins/<fake_id>/<version>/plugin.json: real signed plugin specs (using a test signing key)
- plugins/<fake_id>/<version>/plugin.sig: detached signatures
App in test mode reads PLUGIN_CATALOG_URL from env to point at the fake catalog. The trust chain (download, hash, signature, schema validation) runs identically to production. The test signing key is pinned in the test bundle separately from the production trust root.
Fake bank server (CF Worker per fake provider):
Each fake bank Worker serves:
- GET /login: stub login page HTML with a fake auth button matching the provider’s autoClickSelectors
- GET /landing: landing page HTML containing canned transaction data in the extraction format the strategy reads
- GET /api/data (for http-fetch fakes): canned Excel/JSON bytes response
The fake bank Worker is stateless and deterministic: same request always returns the same canned response.
11.3 Gap↔Fake Mapping Table
Each row is one fake provider. “Phase gate” is the CI gate that must turn GREEN for the corresponding implementation phase to close.
| Fake provider ID | Extraction strategy | Auth simulation | Fixture data | Mirrors | Phase gate |
|---|---|---|---|---|---|
| fake-amex.test | http-fetch | Login redirect + readySelector | Canned Excel bytes (3 rows) | AMEX | Phase B |
| fake-seb.test | extract://nextjs-rsc | Login redirect + autoClickSelectors (fake BankID btn) + readySelector | Landing page with self.__next_f RSC payload | SEB/Spendwise | Phase B |
| fake-nextjs-pages.test | extract://nextjs-data | Login redirect + readySelector | Landing page with __NEXT_DATA__ script tag | (new pattern) | Phase B |
| fake-script-json.test | extract://script-json | Login redirect + readySelector | Landing page with inline <script> JSON block | (new pattern) | Phase B |
| fake-xhr-capture.test | extract://xhr-capture | Login redirect + readySelector | Landing page that issues an XHR to /api/data on load | Gap 1 (§6.7) | Gap 1 capability CAS |
| fake-dom-table.test | extract://dom-table | Login redirect + readySelector | Landing page with <table> transaction rows | Gap 2 (§6.7) | Gap 2 capability CAS |
| fake-paginated.test | paginated http-fetch | Login redirect + readySelector | Canned 3-page paginated API (?page=1,2,3) | Gap 3 (§6.7) | Gap 3 capability CAS |
Gap fakes start RED at Phase B. They turn GREEN when the corresponding engine capability CAS ships. That transition is the proof that “the generic scraper covers this bank pattern via config alone, zero new code.”
11.4 Full End-to-End Test Sequence
For each fake provider in §11.3, the automated test executes:
- Catalog fetch: fetch and verify the signed index.json from the fake CF catalog Worker
- Plugin download: download plugin.json + plugin.sig for the target fake provider version
- Integrity + signature verify: run the production verification path (hash, Ed25519 sig against test trust root)
- Schema validate: run the production schema validator against the plugin payload
- Enable plugin: install and activate via the production Provider Registry Service
- Scraper run: open a headless WebView pointing at fake-<id>/login; let the scraper drive nav + auth + fetch/extract via the same bundled preload.js code path as a real bank
- Assert output: compare extracted Vec<TransactionRow> against the expected fixture (deterministic per fake)
Steps 1–5 test the trust/distribution chain. Steps 6–7 test the scraper engine.
11.5 CI Environment
Tests run on a Linux GitHub Actions runner:
- Headless WebView via WebKitGTK + Xvfb (the same WebKit engine Tauri uses on Linux)
- Fake CF Workers deployed to test namespace before the CI run (or mocked via a local HTTP server for speed; the CF deployment gate runs as a separate smoke step)
- No Mac-specific dependencies; no real bank connectivity
- PLUGIN_CATALOG_URL, TRUST_ROOT_KEY (test key), and FAKE_BANK_BASE_URL injected via CI env vars
11.6 Mapper Fixture Tests (Separate Track)
The mapper side (ProcessorConfig) has independent fixture-based unit tests that don’t require a WebView:
- Input: raw bytes (synthetic Excel/JSON/CSV files matching the real schemas)
- Run: ProcessorConfig field-mapping pipeline + transforms
- Assert: Vec<TransactionRow> matches expected baseline
- Parity test: synthetic AMEX Excel fixture → AMEX ProcessorConfig → assert expected rows
Run via cargo test, fast, no infrastructure dependency.
11.7 Relationship to Provider Recorder
| | Provider Recorder | Fake Harness |
|---|---|---|
| Purpose | Author real provider specs from live traffic | Test the scraper/mapper against deterministic fixtures in CI |
| When used | During provider development (desktop, human-in-loop) | Every PR |
| Data source | Real bank traffic | Canned/synthetic fixture data |
| Output | ProviderConfig + ProcessorConfig spec | Test pass/fail + gap coverage signal |
Recorder output seeds fake providers: capture real traffic once, adapt for the stub server, commit to the fake-providers CF deployment. They share the schema; they don’t share infrastructure.
11.8 iOS Test Leg (Phase F)
The gap↔fake test set (§11.3) runs on both legs: desktop (§11.5) and iOS (this section). The iOS leg replaces the “manual device nightmare” — the regent’s core concern — with automated iOS Simulator runs using the same fake CF infrastructure.
CI environment for iOS leg:
- macOS GitHub Actions runner (required — iOS builds and Simulator require Xcode/macOS)
- iOS Simulator (not physical device) via xcodebuild test
- Same fake CF Worker endpoints used by the desktop leg; no iOS-specific server changes
- Same test signing key + fake catalog URL injected via CI env vars
- Gate: iOS leg tests are added in Phase F and must be green before Phase F implementation CASes can close
What the iOS leg exercises vs. desktop leg:
The end-to-end sequence (§11.4 steps 1–7) runs identically. What differs:
- Step 6 uses iOS WKWebView instead of WebKitGTK; all extractor strategies must behave identically
- Fake BankID auth uses the same stub button + autoClickSelectors approach for the in-webview flow. Deep-link handoff (BankID as a separate app) is NOT tested in CI and remains manual for now; the CI leg proves the engine code path works
iOS-specific design risks for Phase F (explicit, not glossed):
- BankID iOS deep-link handoff (risk: HIGH): On iOS, real BankID auth redirects the user to the BankID.app and back to the calling app via a custom URL scheme. This is fundamentally different from the desktop flow (where BankID runs in the webview). The declarative autoClickSelectors config covers the in-webview tap, but the OS-level app redirect requires a registered URL scheme handler in the Tauri iOS app and a resume flow. Phase F must design this explicitly before implementation.
- WKWebView Tauri IPC on iOS (risk: MEDIUM): The window.__TAURI_INTERNALS__.invoke bridge is implemented differently on iOS (uses WKScriptMessageHandler rather than the desktop IPC mechanism). handle_base64_data and related commands must be verified under the iOS bridge. Known to work in current Tauri iOS builds for other features, but must be confirmed for the scraper data path.
- xhr-capture strategy on iOS WKWebView (risk: MEDIUM): The XHR/fetch wrapping in the xhr-capture extractor uses XMLHttpRequest.prototype.open patching. This is standard JS and should work in WKWebView on iOS, but App Store review has historically scrutinised JavaScript injection more tightly on iOS. Must verify the approach passes review and that WKWebView on iOS does not restrict the prototype patching.
- Cookie persistence and session scope (risk: LOW): WKWebView on iOS has its own separate cookie store (not shared with Safari). Session cookies from the bank login are scoped to the WKWebView instance. This matches desktop behavior, with no architectural difference, but must be confirmed during Phase F bringup.
- WebKit API surface differences (risk: LOW): Minor API differences between WebKitGTK (desktop CI) and WKWebView (iOS) could affect edge cases in the RSC/__NEXT_DATA__/script-json extractors. The fake-provider CI run on iOS is the only reliable way to catch these without real devices.
12. Acceptance Criteria For This Design CAS
- Proposal explicitly states declarative-only invariant and iOS policy boundary.
- Proposal supersedes unified-providers Phase 3 deprecation premise.
- Proposal reconciles recorder plan as authoring path.
- Proposal defines trust, signature, provenance, and revocation model.
- Proposal defines a staged rollout sequence suitable for implementation breakdown.
- Proposal includes a concrete Generic Scraper Engine chapter (§6) with extraction-strategy taxonomy, auth/navigation model, prefetch step model, honest gap assessment, and engine version contract.
- Proposal includes a first-class Automated Test Architecture chapter (§11) with gap↔fake 1:1 mapping table, full end-to-end test sequence, CI acceptance-gate definition, mapper fixture test track, and iOS test leg with explicit design risks.
- §9 rollout plan includes sequenced Phase F (iOS port), gated on desktop A–E + harness green, with explicit iOS-specific scope constraints and design risks.
- §10 open decisions confirmed by regent (2026-05-18: “agree with all”).
What changed
2026-05-18 — CAS-3499 Declarative Provider Plugins (shipped)
The declarative plugin system shipped across Phases A–E. Provider plugins are now downloadable from Cloudflare R2, signature-verified (Ed25519), schema-validated, and activated inside the app without any executable code leaving the trust boundary. The full catalog UI is live in Settings. This architecture doc was updated from proposal to delivered status.