Missing Data when Using latest.json
This article pertains to: Legacy API (V1)
Why is my data out of sync with Validic when using latest.json?
Data can be missed in a few ways when using latest.json. Some background first: latest.json was designed for near real-time data retrieval. The result set is generated each time a request is made, so each subsequent page is also generated at the time it is requested. Any start_date and end_date filters you provide apply to the last_updated field of each record. Here is where data can be missed:
When a result set spans more than one page, a record on the page you have just retrieved may be updated right before you request the next page. When that happens, the first record on the second page moves up to the first page and is missed when the second page is requested. The risk of missing data grows with the number of pages requested and with the time it takes to process each page.
Using the end_date filter when requesting latest.json can create false confidence that data was retrieved for the entire time period specified. The end_date sent in a request can be ahead of Validic's server time, making it seem as though data updated between Validic's server time and the request time was retrieved.
Incrementing the start_date filter from your last end_date when requesting latest.json will cause updates made to records within that second to be missed.
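The pagination issue described above can be illustrated with a small in-memory simulation. This is only a sketch and does not call the Validic API: `fetch_page` is a hypothetical stand-in for a paged request, and it assumes records are ordered by last_updated ascending and that each page is computed when it is requested.

```python
from datetime import datetime, timedelta


def fetch_page(records, page, limit):
    """Return the ids on one page, ordered by last_updated at call time."""
    ordered = sorted(records, key=lambda r: r["last_updated"])
    start = (page - 1) * limit
    return [r["id"] for r in ordered[start:start + limit]]


base = datetime(2024, 1, 1, 12, 0, 0)
# Six records, each updated one second after the previous one.
records = [
    {"id": i, "last_updated": base + timedelta(seconds=i)} for i in range(1, 7)
]

page1 = fetch_page(records, page=1, limit=3)

# Record 2 (already retrieved on page 1) is updated before page 2 is requested,
# so it moves to the end of the ordering and every later record shifts up.
records[1]["last_updated"] = base + timedelta(seconds=60)

page2 = fetch_page(records, page=2, limit=3)

seen = set(page1) | set(page2)
missed = sorted(r["id"] for r in records if r["id"] not in seen)
print(page1, page2, missed)
```

In this run, record 4 was originally the first record on the second page. After the update it shifts onto the first page, which has already been retrieved, so it never appears in any page the client receives.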
Here are solutions to these issues. The first addresses pagination; the second addresses both date filter issues:
Make requests more frequently, over shorter time intervals, and increase the page record limit to 1000 by specifying the limit parameter in your requests to the Validic API. Both measures reduce the chance that a result set is paginated.
Omit the end_date filter when requesting latest.json and use the summary -> end_date sent in Validic's response as the start_date for your next request. This prevents server time differences and partial seconds from affecting result sets. Alternatively, you may want to delay your end_date so that it is not ahead of Validic's server time, and/or overlap your end_date with the next start_date.
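The second solution could be sketched as follows. This is a minimal illustration under stated assumptions, not Validic's client code: `fetch` is a hypothetical callable that performs the HTTP request and returns the decoded JSON body as a dict, the timestamp format shown is assumed, and only the summary -> end_date field and the start_date and limit parameters come from this article.

```python
def poll_latest(fetch, start_date, limit=1000):
    """Request latest.json once and return (response, next_start_date).

    `fetch` is a hypothetical callable: it takes the query parameters and
    returns the decoded JSON response as a dict.
    """
    # No end_date is sent, so the window is bounded by the server's own clock.
    params = {"start_date": start_date, "limit": limit}
    response = fetch(params)
    # Reuse the server-reported end_date as the next start_date.
    # If the summary is missing, keep the current start_date and retry later.
    next_start = response.get("summary", {}).get("end_date") or start_date
    return response, next_start


sent = []  # records the parameters of each simulated request


def fake_fetch(params):
    # Stand-in for a real HTTP call; the response shape is an assumption.
    sent.append(params)
    return {"summary": {"end_date": "2024-01-01T12:05:00+00:00"}}


response, next_start = poll_latest(fake_fetch, "2024-01-01T12:00:00+00:00")
print(next_start)
```

Because each request starts exactly where the server said the previous one ended, consecutive windows share a boundary and never rely on the client's clock. If you overlap windows instead, expect to receive some records twice and de-duplicate them on your side.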