The Comscore crawler uses a scanner that identifies the elements containing a page's main content.
- Navigational elements in the header or footer are automatically excluded from processing as they might skew the accuracy of the result.
- Other elements, such as “most popular articles” modules, are also excluded. When a terrible breaking news story occurs, almost every page on a news site will refer to the event in some form in the elements surrounding the actual content. By focusing our analysis on the actual content of the page, we prevent whole sites from being rendered unsafe.
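The exclusion rules above can be illustrated with a minimal sketch. It assumes that non-content regions are marked up with HTML5 tags such as `header`, `footer`, `nav` and `aside`. The tag set and the names `EXCLUDED_TAGS`, `MainContentExtractor` and `extract_main_text` are hypothetical and only serve the illustration; this is not Comscore's scanner, which presumably relies on far richer signals.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as "surrounding elements" (assumption).
EXCLUDED_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainContentExtractor(HTMLParser):
    """Collects text that is not nested inside any excluded tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # number of excluded tags currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in EXCLUDED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with header, footer, nav and aside content removed."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><header>Site menu</header><nav><a>Home</a></nav>"
    "<article><h1>Rugby final</h1><p>Match report.</p></article>"
    "<aside>Most popular: breaking news</aside>"
    "<footer>Imprint</footer></body></html>"
)
main_text = extract_main_text(page)
```

In this toy page, the "Most popular" teaser sits in an `aside`, so its breaking-news wording never reaches the analysis of the article itself.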
Once we’ve extracted the content, the next step is to create a pattern profile. This is the fundamental difference from the semantic approach used by other vendors in the market. Our technology identifies patterns that contribute to the aboutness of a page. The result is a weighted profile that can be compared to a fingerprint of the page. We also subtract normality to increase the focus on the page's specifics even further.
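To make the idea of a weighted profile with "normality" subtracted more concrete, here is a rough sketch. It rests on strong simplifying assumptions: patterns are reduced to single lowercase words, and normality is modeled as a table of background frequencies. The function `pattern_profile` and the `background` values are hypothetical; the actual patterns and weighting used by the technology are not described here.

```python
import re
from collections import Counter


def pattern_profile(text: str, background: dict) -> dict:
    """Build a weighted profile: relative frequency minus background frequency.

    Assumption: a "pattern" is a single word and "normality" is a background
    frequency table. Terms that are no more frequent than normal are dropped.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)

    profile = {}
    for term, count in counts.items():
        weight = count / total - background.get(term, 0.0)
        if weight > 0:
            profile[term] = weight

    norm = sum(profile.values())
    if norm == 0:
        return {}
    # Normalize so that the weights of the profile sum to 1.
    return {term: weight / norm for term, weight in profile.items()}


# Hypothetical background frequencies for very common words (assumption).
background = {"the": 0.3, "of": 0.2, "is": 0.2}
profile = pattern_profile(
    "The waste of the old computer is electronic waste", background
)
```

In this example the common words fall away and "waste" ends up with the highest weight, which is the fingerprint-like effect the paragraph describes.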
This pattern profile is then matched against a dynamic mesh of contextual nodes (Dynamic Category matching). This mesh is ultra-granular and spans roughly 350k contextual nodes. Each node is a cluster of highly similar patterns and is interconnected, at varying strength, with other nodes. A multivariable search that considers the full context of the profile identifies the most relevant nodes. The final output of our category analysis contains not just one but several categories, each weighted by contextual relevance. The level of detail of this mesh exceeds traditional semantic / linguistic approaches by orders of magnitude. For example, a page on "Computer Waste" will show relevance to "Green Tech" or "Sustainable Computing", as well as to topics not even mentioned on the page itself, such as "Eco-Labeling of IT Products". As another example, a page may be categorized under sports and rugby but also under legal issues, because we identified content related to “domestic violence allegations” on the page.
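The matching step can be sketched in a few lines, again under loud assumptions: each node is represented as a small weighted term vector, relevance is measured with cosine similarity, and relevance spreads to connected nodes in proportion to the connection strength. The `NODES` and `EDGES` data, the scores and the function `match_categories` are invented for illustration; the real mesh and its search are not documented here.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy mesh: four nodes instead of roughly 350k (assumption).
NODES = {
    "Computer Waste": {"computer": 0.5, "waste": 0.5},
    "Green Tech": {"green": 0.4, "energy": 0.3, "waste": 0.3},
    "Eco-Labeling of IT Products": {"label": 0.6, "certification": 0.4},
    "Rugby": {"rugby": 0.7, "match": 0.3},
}
# Hypothetical connection strengths between nodes (assumption).
EDGES = {"Computer Waste": {"Eco-Labeling of IT Products": 0.5}}


def match_categories(profile, nodes, edges, top_k=3):
    """Return up to top_k (category, relevance) pairs, most relevant first."""
    direct = {name: cosine(profile, pattern) for name, pattern in nodes.items()}
    scores = dict(direct)
    # Let relevance flow one step along the connections of the mesh.
    for name, score in direct.items():
        for neighbor, strength in edges.get(name, {}).items():
            scores[neighbor] = max(scores.get(neighbor, 0.0), score * strength)
    ranked = sorted(
        ((name, score) for name, score in scores.items() if score > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]


page_profile = {"waste": 0.4, "computer": 0.2, "old": 0.2, "electronic": 0.2}
categories = match_categories(page_profile, NODES, EDGES)
category_names = [name for name, _ in categories]
```

Because relevance propagates along the connections, "Eco-Labeling of IT Products" appears in the result even though none of its terms occur in the profile, which mirrors the "Computer Waste" example above.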
The page is also assessed for Brand Safety, which comes with its own dataset, because even the most mundane categories can show up in a potentially objectionable context. The actual category of a page’s content might not be brand-harming by itself and, conversely, a single keyword such as “[fashion] disaster” is not objectionable either. We therefore combine ultra-granular contextual analysis with keyword analysis to ensure that all objectionable elements are identified and taken into account.
On top of the available Brand Safety attributes, our customers can apply a customization layer by specifying keywords and key phrases they would like to avoid within the page’s content.
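How contextual categories and a customer's own keyword list might be combined can be sketched as follows. Everything here is an assumption made for illustration: the set of risky categories, the relevance threshold, the whole-word phrase matching and the names `find_phrases` and `assess_page` are hypothetical and do not describe the actual Brand Safety dataset or product interface.

```python
import re


def find_phrases(text: str, phrases: list) -> list:
    """Return the phrases that occur in the text as whole words (case-insensitive).

    Assumption: each phrase starts and ends with a letter or digit.
    """
    lowered = text.lower()
    hits = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, lowered):
            hits.append(phrase)
    return hits


def assess_page(text, categories, risky_categories, customer_phrases, threshold=0.3):
    """Combine weighted categories with a customer-specific phrase list."""
    flagged = [
        name
        for name, weight in categories
        if name in risky_categories and weight >= threshold
    ]
    custom_hits = find_phrases(text, customer_phrases)
    return {
        "safe": not flagged and not custom_hits,
        "flagged_categories": flagged,
        "custom_hits": custom_hits,
    }


result = assess_page(
    text="The rugby star faces domestic violence allegations after the final.",
    categories=[("Rugby", 0.8), ("Legal Issues", 0.5)],  # hypothetical weights
    risky_categories={"Legal Issues"},                    # hypothetical attribute
    customer_phrases=["allegations", "gambling"],         # customer's own list
)
clean = assess_page(
    text="Rugby final ends in a draw.",
    categories=[("Rugby", 0.9)],
    risky_categories={"Legal Issues"},
    customer_phrases=["allegations", "gambling"],
)
```

Keeping the customer phrases as a separate input reflects the idea of a customization layer: the contextual assessment stays the same for everyone, while each customer adds their own exclusions on top.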