Comscore crawls a URL whenever a partner requests it through the Comscore programmatic API and no data for that URL is currently in our cache. Our standard process caches URL categorizations for at most 48 hours, but URLs that change frequently may be rescanned every 10–15 minutes.
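The caching policy above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Comscore's implementation: the class `CategorizationCache`, the `changes_frequently` flag, and the use of 10 minutes (the low end of the 10–15 minute range) for frequently changing URLs are all hypothetical. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MAX_TTL_SECONDS = 48 * 60 * 60   # standard maximum: 48 hours
FREQUENT_TTL_SECONDS = 10 * 60   # hypothetical: low end of the 10-15 minute range


@dataclass
class CacheEntry:
    categories: List[str]
    expires_at: float


class CategorizationCache:
    """Hypothetical cache illustrating the TTL policy; not Comscore's actual code."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, url: str, categories: List[str], changes_frequently: bool = False) -> None:
        # Frequently changing URLs get a short TTL; everything else gets the 48-hour maximum.
        ttl = FREQUENT_TTL_SECONDS if changes_frequently else MAX_TTL_SECONDS
        self._entries[url] = CacheEntry(categories, self._clock() + ttl)

    def get(self, url: str) -> Optional[List[str]]:
        entry = self._entries.get(url)
        if entry is None or self._clock() >= entry.expires_at:
            # Cache miss (absent or expired): this is where a crawl would be triggered.
            self._entries.pop(url, None)
            return None
        return entry.categories


# Usage with a fake clock (seconds since start).
now = [0.0]
cache = CategorizationCache(clock=lambda: now[0])
cache.put("https://example.com/news", ["news"], changes_frequently=True)
cache.put("https://example.com/about", ["corporate"])
now[0] = 11 * 60
print(cache.get("https://example.com/news"))   # → None (expired after 10 minutes)
print(cache.get("https://example.com/about"))  # → ['corporate']
```

Keeping the TTL choice in `put` means the refresh frequency is decided once, when a categorization is stored, rather than on every lookup.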
In the most common (and ideal) scenario, an API request for a "good" (processable) URL that generates a cache miss proceeds as follows:
- Request 1) The API returns a transient error
- Request 2) The API returns an environment-level response (fetched from a subsequent caching tier) with a short TTL of 30 seconds, indicating that the underlying error is transient
- Request 3) The API returns a page-level response
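From the partner's side, the three-request sequence above suggests a simple polling loop. The sketch below is hypothetical throughout: the response shape (a dict with a `level` key of `"error"`, `"environment"`, or `"page"`, plus an optional `ttl`), the `fetch` callable, and the 2-second wait after a transient error (loosely based on the typical 1 to 3 second download time) are assumptions, not the documented Comscore API.

```python
import time
from typing import Any, Callable, Dict, Optional

Response = Dict[str, Any]


def fetch_categorization(
    url: str,
    fetch: Callable[[str], Response],
    max_attempts: int = 5,
    error_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Response]:
    """Poll a hypothetical endpoint until a page-level response arrives.

    Returns the page-level response if one arrives within max_attempts,
    otherwise the most recent environment-level response (or None).
    """
    fallback: Optional[Response] = None
    for attempt in range(max_attempts):
        response = fetch(url)
        level = response.get("level")
        if level == "page":
            return response
        if level == "environment":
            # Usable but coarse result; its short TTL (e.g. 30 s) says when to ask again.
            fallback = response
            delay = response.get("ttl", 30)
        else:
            # Transient error: the crawl was most likely just queued.
            delay = error_delay
        if attempt < max_attempts - 1:
            sleep(delay)
    return fallback


# Usage with scripted responses that mimic the three requests described above.
scripted = iter([
    {"level": "error"},
    {"level": "environment", "ttl": 30, "categories": ["news"]},
    {"level": "page", "categories": ["news", "sports"]},
])
waits = []
result = fetch_categorization("https://example.com/story", lambda u: next(scripted), sleep=waits.append)
print(result["level"])  # → page
print(waits)            # → [2.0, 30]
```

Falling back to the environment-level response keeps the client useful even when a page-level result is slow to arrive, for example when the publishing host is unavailable.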
A cache miss is immediately forwarded to the back end for crawling and processing. The overall processing time depends mostly on the availability of the publishing host and on the current request load for that particular domain. In general, a URL's content is downloaded in 1 to 3 seconds; the processing itself takes only milliseconds.