Data Reconciliation Strategy for High-Volume Inform Implementations

Overview

When scaling Validic Inform to handle large data volumes, using main streams for reconciliation offers advantages over replay streams.

Challenge

While replay streams are useful for smaller implementations, they have limitations for high-volume scenarios:

  • No load balancing capability

  • Single connection per stream

  • Processing overhead for full 30-day data retrieval

Solution: Main Streams for Reconciliation

Using main streams for reconciliation provides:

  • Load balancing across up to 3 client connections per stream

  • Automatic storage of disconnect timestamps for 7 days

  • Ability to run alongside primary streams

  • More efficient data processing

Implementation Strategy

  1. Stream Creation:

    POST https://streams.v2.validic.com/streams
    {
      "name": "reconciliation-stream",
      "resource_filter": ["summary", "measurement"]
    }

    • Customize resource_filter based on your needs

  2. Monitor Until Caught Up:

    • Track created_at timestamps from incoming payloads

    • Compare them against the current time, allowing for each source's sync pattern

    • Most sources sync within 2-15 minutes

    • Some sources like Dexcom have built-in delays (e.g., 1 hour)

  3. Disconnect Process:

    • Disconnect cleanly once caught up

    • Validic stores the disconnect timestamp for 7 days

    • No need to track the disconnect time locally

  4. Next Reconciliation Run:

    • Must reconnect within 7 days

    • Stream automatically resumes from the last disconnect

    • Beyond 7 days, the stream will return the default 30-day window of data
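The "caught up" check and the 7-day reconnect rule described above can be sketched roughly as follows. This is an illustration only, not Validic's implementation: the function names, the lag table, and the assumption that created_at arrives as an ISO 8601 string are all hypothetical, and the lag values are simply the figures quoted in this article. On a quiet stream the newest payload may stay old, so a real client would likely pair this check with an idle timeout.

```python
from datetime import datetime, timedelta, timezone

# Assumed lag allowances, taken from the figures quoted in this article.
# Adjust them to the sources your organization actually uses.
DEFAULT_LAG = timedelta(minutes=15)
SOURCE_LAG = {"dexcom": timedelta(hours=1)}

# Disconnect timestamps are stored for 7 days (per the article).
DISCONNECT_RETENTION = timedelta(days=7)


def parse_created_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp (assumed format); naive values are treated as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def is_caught_up(latest_created_at: datetime, now: datetime, source: str = "") -> bool:
    """True when the newest payload seen is within the source's expected lag of now."""
    lag = SOURCE_LAG.get(source.lower(), DEFAULT_LAG)
    return now - latest_created_at <= lag


def next_run_mode(last_disconnect: datetime, now: datetime) -> str:
    """Describe what a reconnect at `now` is expected to deliver."""
    if now - last_disconnect <= DISCONNECT_RETENTION:
        return "resume"       # stream continues from the last disconnect
    return "full_window"      # stream returns the default 30-day window


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    newest = now - timedelta(minutes=5)
    print(is_caught_up(newest, now))                       # within the 15-minute allowance
    print(next_run_mode(now - timedelta(days=9), now))     # past the 7-day retention
```

Because Validic keeps the disconnect timestamp, next_run_mode is only a planning aid here, for example to warn an operator that a scheduled run has slipped past the 7-day window and will trigger a full 30-day delivery.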

Implementation Notes

Stream Management

  • Consider multiple streams for different data types if the volume is high enough to warrant it

  • Default limits:

    • 5 streams per organization

    • 3 connections per stream

  • Contact your Validic representative to discuss adjusting these limits for specific use cases
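When splitting reconciliation across several streams, it can help to check the plan against the default limits before creating anything. The sketch below builds one request body per stream in the shape shown under Stream Creation and rejects plans that would exceed 5 streams. The function names and stream names are hypothetical, and the code only prepares JSON bodies; it does not call the Validic API.

```python
import json
from typing import Dict, List

# Default limits quoted in this article; your organization's limits may differ.
MAX_STREAMS_PER_ORG = 5
MAX_CONNECTIONS_PER_STREAM = 3


def plan_streams(groups: Dict[str, List[str]], existing_streams: int = 0) -> List[str]:
    """Return one JSON request body per planned stream.

    `groups` maps a stream name to the resource types it should carry.
    Raises ValueError if the plan would exceed the default stream limit.
    """
    total = existing_streams + len(groups)
    if total > MAX_STREAMS_PER_ORG:
        raise ValueError(
            f"plan needs {total} streams, default limit is {MAX_STREAMS_PER_ORG}"
        )
    return [
        json.dumps({"name": name, "resource_filter": resources})
        for name, resources in groups.items()
    ]


def connection_capacity(stream_count: int) -> int:
    """Maximum number of load-balanced client connections across the given streams."""
    return stream_count * MAX_CONNECTIONS_PER_STREAM


if __name__ == "__main__":
    bodies = plan_streams(
        {
            "reconciliation-summary": ["summary"],
            "reconciliation-measurement": ["measurement"],
        },
        existing_streams=1,  # e.g. the primary stream already in use
    )
    for body in bodies:
        print(body)
    print(connection_capacity(len(bodies)))
```

Counting existing streams in the check matters because reconciliation streams run alongside primary streams and share the same per-organization limit.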

Data Processing

Appendix: Estimated Processing Times

Intraday:

Historical Data Processing:

  • Most sources: ≤ 60 minutes

  • Fitbit: Up to 24 hours

  • Garmin: ≤ 8 hours

New Data Availability:

  • Most sources: 2-15 minutes

  • Some activity sources have built-in delays

  • Mobile SDK data: varies by platform and configuration

CGM:

Historical Data Processing:

  • Abbott: Up to 24 hours

  • Dexcom: Up to 24 hours

New Data Availability:

  • CGM sources have built-in delays

  • Generally ~1.5 hours