What are the best practices to avoid slow API responses?
There are a few optimizations you can apply to avoid slow API responses:
Increasing the number of records returned per page
If you omit the limit parameter, our API returns 1000 records per request by default. This reduces the number of requests you need to make to paginate through all the records you want to retrieve. With fewer pages to fetch, you also stay in the lower page ranges that respond faster, which improves your overall data retrieval speed.
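As a minimal sketch of such a request: the base URL below is a hypothetical placeholder, while limit, start_date, and end_date are the parameters discussed in this article.

```python
import requests

# Hypothetical base URL; substitute your actual API endpoint.
BASE_URL = "https://api.example.com/records.json"

# Omitting the "limit" parameter lets the API fall back to its
# default of 1000 records per request, so fewer pages are needed.
params = {
    "start_date": "2024-01-01T00:00:00Z",  # example scope only
    "end_date": "2024-01-01T01:00:00Z",
    # no "limit" here: the 1000-records-per-page default applies
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()
records = response.json()
```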
Reducing the number of records you need to paginate through
Even with more records returned per request, the total number of records you need to retrieve may still be large enough to require paginating through a sizable recordset. When paginating through a large page count, typically anything beyond page=500, response times become significantly slower. To reduce the number of pages you need to paginate through, narrow the scope of each request by adjusting your start_date and end_date parameters. For example, instead of running your requests every hour with a 1-hour scope, you can retrieve data every 30 minutes with a 30-minute scope. This reduces the number of pages per retrieval run, and each request keeps a fast request-response cycle.
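For illustration, here is one way to derive a 30-minute window for each polling run. The endpoint and the ISO 8601 timestamp format are assumptions; start_date and end_date are the parameters described above.

```python
from datetime import datetime, timedelta, timezone

import requests

BASE_URL = "https://api.example.com/records.json"  # hypothetical endpoint


def fetch_last_window(minutes: int = 30):
    """Fetch only the records within the last `minutes` minutes."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    params = {
        # ISO 8601 timestamps are an assumption; use your API's format.
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


# Running this every 30 minutes keeps each request's recordset small,
# so few (or no) extra pages are needed per run.
records = fetch_last_window(minutes=30)
```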
Utilizing latest.json as a change log service
A further refinement of the scope-reduction approach above is to call latest.json at even shorter intervals. We usually recommend polling every 5 minutes so you get the latest updates and persist them to your database right away. In practice, this returns very few records, so one or a few requests cover everything within the 5-minute scope. Some integrations even poll every 2 minutes, usually to avoid pagination entirely. The trade-off is more HTTP requests, but your data is delivered faster.
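A minimal polling loop against latest.json could look like the sketch below. The base URL and the persist step are placeholders you would replace; the 5-minute interval is the recommendation above.

```python
import time

import requests

LATEST_URL = "https://api.example.com/latest.json"  # hypothetical URL
POLL_INTERVAL_SECONDS = 5 * 60  # poll every 5 minutes, per the recommendation


def persist(records):
    """Placeholder: write the retrieved records to your database."""
    print(f"persisting {len(records)} records")


while True:
    response = requests.get(LATEST_URL, timeout=30)
    response.raise_for_status()
    # With a 5-minute scope, latest.json usually returns very few
    # records, so a single request typically covers everything.
    persist(response.json())
    time.sleep(POLL_INTERVAL_SECONDS)
```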
You can go even further and add logic that decides when to poll at 5-minute intervals and when to poll once an hour. By analyzing how many records are updated in your database per hour, you can identify which hours are busy (a lot of updates coming in) and which are not. You can then poll more frequently during busy hours and less often during off-hours.
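The busy-hour range below is an assumption you would derive from your own update statistics; the sketch only shows the interval-selection logic.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical busy hours (UTC), derived from analyzing how many
# records are updated in your database per hour.
BUSY_HOURS = set(range(8, 18))  # e.g. 08:00-17:59 sees frequent updates


def poll_interval_seconds(now: Optional[datetime] = None) -> int:
    """Poll every 5 minutes during busy hours, hourly otherwise."""
    now = now or datetime.now(timezone.utc)
    return 5 * 60 if now.hour in BUSY_HOURS else 60 * 60
```

In the polling loop above, you would then sleep for poll_interval_seconds() instead of a fixed 5-minute interval.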
Paginating in parallel processes
In our API responses, we include a summary object that shows how many records are available based on the scope of your request. For example, if the "results" field of the summary shows 3159, there are that many records to retrieve for the given start_date and end_date. Since the API returns 1000 records per request, the initial request already tells you how many pages are available. In this example, you need 3 extra requests to get pages 2-4, which you can run in parallel (see the sketch after this list):
1st request: Get the first page and determine the total number of records to retrieve
2nd request (parallel): Get with params "page=2" for records 1001-2000
3rd request (parallel): Get with params "page=3" for records 2001-3000
4th request (parallel): Get with params "page=4" for records 3001-3159
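Putting the breakdown above into code, here is a hedged sketch using Python threads. The summary object, its "results" field, and the 1000-record page size come from this article; the base URL, date values, and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://api.example.com/records.json"  # hypothetical endpoint
PAGE_SIZE = 1000  # the API's default of 1000 records per request

params = {  # example scope; use your own start_date/end_date
    "start_date": "2024-01-01T00:00:00Z",
    "end_date": "2024-01-01T01:00:00Z",
}


def fetch_page(page: int) -> dict:
    response = requests.get(BASE_URL, params={**params, "page": page}, timeout=30)
    response.raise_for_status()
    return response.json()


# 1st request: get the first page and read the summary to learn
# how many records are available for this scope.
first = fetch_page(1)
total = first["summary"]["results"]  # e.g. 3159 in the example above

# Ceiling division gives the total page count; page 1 is already fetched.
last_page = -(-total // PAGE_SIZE)  # 3159 records -> 4 pages

# Remaining pages (2..last_page) are fetched in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    remaining = list(pool.map(fetch_page, range(2, last_page + 1)))

all_pages = [first, *remaining]
```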
We highly recommend implementing the options above, not just individually but in combination with each other. The goal is to ensure you receive updates consistently and immediately, improving your users' experience.