Data Reconciliation Strategy for High-Volume Inform Implementations
Overview
When scaling Validic Inform to handle large data volumes, reconcile with main streams rather than replay streams. Main streams can be load balanced across multiple connections and resume from the last disconnect, so they avoid re-reading the full 30-day window on every run.
Challenge
While replay streams are useful for smaller implementations, they have limitations for high-volume scenarios:
No load balancing capability
Single connection per stream
Processing overhead for full 30-day data retrieval
Solution: Main Streams for Reconciliation
Using main streams for reconciliation provides:
Load balancing across up to 3 client connections per stream
Automatic storage of disconnect timestamps for 7 days
Ability to run alongside primary streams
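More efficient data processing, because a reconnect within 7 days resumes from the last disconnect instead of retrieving the full 30-day window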
Implementation Strategy
Stream Creation:
POST https://streams.v2.validic.com/streams
{
  "name": "reconciliation-stream",
  "resource_filter": ["summary", "measurement"]
}
Customize resource_filter based on your needs.
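The request above could be built in Python roughly as follows. This is a minimal sketch: passing the credential as a `token` query parameter and the helper name `build_create_stream_request` are assumptions for illustration, not confirmed API details, so check your Validic credentials for the actual authentication scheme. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

STREAMS_URL = "https://streams.v2.validic.com/streams"


def build_create_stream_request(name, resource_filter, token):
    """Build (but do not send) the POST request that creates a stream.

    Passing `token` as a query parameter is an assumption for illustration.
    """
    body = json.dumps({"name": name, "resource_filter": resource_filter})
    url = STREAMS_URL + "?" + urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_stream_request(
    "reconciliation-stream", ["summary", "measurement"], token="YOUR_TOKEN"
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any stream is created against your organization's limit.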
Monitor Until Caught Up:
Track created_at timestamps from incoming payloads
Compare them against the current time, allowing for each source's sync delay
Most sources sync within 2-15 minutes
Some sources like Dexcom have built-in delays (e.g., 1 hour)
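The caught-up check described above can be sketched as a comparison between the newest created_at value and the current time, with a per-source allowance. This is a rough sketch under stated assumptions: that created_at is an ISO 8601 string, that naive timestamps are UTC, and that the source keys and the `is_caught_up` helper are hypothetical names. The 15-minute and 1-hour allowances come from the figures above.

```python
from datetime import datetime, timedelta, timezone

# Expected sync lag per source, taken from the figures above.
# The source keys and the fallback value are illustrative assumptions.
DEFAULT_LAG = timedelta(minutes=15)
SOURCE_LAG = {"dexcom": timedelta(hours=1)}


def parse_created_at(value):
    """Parse an ISO 8601 timestamp; treat naive values as UTC (assumption)."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_caught_up(created_at, source, now=None):
    """Return True once the newest payload is within the source's expected lag."""
    now = now or datetime.now(timezone.utc)
    lag = SOURCE_LAG.get(source, DEFAULT_LAG)
    return now - parse_created_at(created_at) <= lag
```

In practice you would likely require the check to hold for several consecutive payloads before disconnecting, since a single recent event does not guarantee the backlog is empty.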
Disconnect Process:
Disconnect cleanly once caught up
Validic stores disconnect timestamp for 7 days
No need to track disconnect time locally
Next Reconciliation Run:
Must reconnect within 7 days
Stream automatically resumes from the last disconnect
Beyond 7 days, the stream will return the default 30-day window of data
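Validic tracks the disconnect time for you, but a scheduler may still want to estimate how much data the next run will deliver. The sketch below illustrates the 7-day rule only; the `expected_backfill` helper is hypothetical, and treating a gap of exactly 7 days as expired is a conservative assumption rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone

RESUME_WINDOW = timedelta(days=7)    # how long the disconnect timestamp is kept
DEFAULT_WINDOW = timedelta(days=30)  # window returned once the timestamp expires


def expected_backfill(last_disconnect, reconnect_at):
    """Estimate how much history a reconnect at `reconnect_at` will deliver."""
    gap = reconnect_at - last_disconnect
    if gap < RESUME_WINDOW:
        # Stream resumes from the last disconnect, so only the gap is replayed.
        return gap
    # Assumption: a gap of exactly 7 days is treated as already expired.
    return DEFAULT_WINDOW
```

Scheduling reconciliation runs comfortably inside the 7-day window (for example, daily) keeps each run small and avoids the 30-day fallback.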
Implementation Notes
Stream Management
Consider multiple streams for different data types if the volume is high enough to warrant it
Default limits:
5 streams per organization
3 connections per stream
Contact your Validic representative to discuss limit adjustments for specific use cases
Data Processing
Implement checksum validation to ensure you don’t write duplicates to your database
Account for source-specific sync patterns
Consider time zones in timestamp comparisons
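One way to approach the checksum validation mentioned above is to hash a canonical encoding of each payload and skip any payload whose hash has already been written. The sketch below is illustrative only: `payload_checksum` and `Deduplicator` are hypothetical names, the in-memory set stands in for a unique index in your database, and hashing the whole payload is an assumption (you may prefer to hash only a stable subset of fields).

```python
import hashlib
import json


def payload_checksum(payload):
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Deduplicator:
    """Tracks checksums already written; a set stands in for a database index."""

    def __init__(self):
        self._seen = set()

    def is_new(self, payload):
        """Return True the first time a payload is seen, False on repeats."""
        checksum = payload_checksum(payload)
        if checksum in self._seen:
            return False
        self._seen.add(checksum)
        return True
```

Because the reconciliation stream runs alongside the primary stream, the same event can arrive on both; checking `is_new` before each write keeps the second copy out of the database.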
Appendix: Estimated Processing Times
Intraday:
Historical Data Processing:
Most sources: ≤ 60 minutes
Fitbit: Up to 24 hours
Garmin: ≤ 8 hours
New Data Availability:
Most sources: 2-15 minutes
Some activity sources have built-in delays
Mobile SDK data: varies by platform and configuration
CGM:
Historical Data Processing:
Abbott: Up to 24 hours
Dexcom: Up to 24 hours
New Data Availability:
CGM sources have built-in delays, generally about 1.5 hours