Mayer-Schönberger and Cukier's
Big Data ... book (my review
here) sparked a few thoughts on how its N=All argument (the redundancy of sample extrapolation when all of the data points are available) relates to media audience research.
Audience measurement today uniquely straddles both the big data world of digital media, where the entire audience is captured in real time (indeed, in the N=All sense, it is one of the purest-play big data situations there is), and the 'small data' world of offline media with its sample-extrapolated audiences.
I say 'uniquely' not because other domains are exclusively one or the other but because of two distinctive characteristics. The first is the asymmetry between data availability and each side's share of the pie: half or more of ad spend is based on 'small data' measurements. Too much is riding on sample extrapolation for it to go away anytime soon.
The second, less obvious characteristic is the interdependency between the two types of data: sample-based 'small' data is a critical requirement for unlocking the full value of the available big data itself.
At the heart of this is the need for machine-level (server / IP / browser) data to be translated into human, target-audience-level data. Sample surveys are the crucial bridge. The interdependency flows in both directions: big data is also being used to improve sample-based systems.
These dynamics, together with concerns about whether current systems adequately measure today's complex media consumption, are driving 'hybrid' research: the fusion of 'census' (N=All) big data and 'survey' or 'panel' sample data to estimate audiences that neither could estimate adequately alone.
Three key applications stand out: (a) target-audience reach/frequency metrics for online campaigns, wherein clickstream data (impressions, uniques, etc.) are matched against panel demographics through website tags / cookies; (b) supplementing existing TV audience measurement with Return Path Data (RPD) from set-top boxes, thereby augmenting the sample size, especially for vehicles like Pay TV that may be underrepresented in the normal panel. RPD will also be a crucial part of 'Addressable TV' advertising (the digital insertion of one-on-one adverts on the TV set); and (c) 'Total Video' measurement: combined TV + online video viewership at both levels, i.e. the total audience of TV stations across standard and online channels AND the total video exposure of ad campaigns across various online and offline channels.
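To make application (a) concrete, here is a minimal, illustrative sketch of the underlying fusion logic: census impression logs record every cookie, but demographics are known only for the matched panel subset, whose target-audience share is then projected onto the census unique count. All field names, the target definition and the data are hypothetical.

```python
# Hypothetical sketch: fusing census impression logs with panel demographics
# via matched cookie IDs to estimate target-audience reach.
census_impressions = [  # one row per ad impression from server logs (made-up)
    {"cookie": "c1"}, {"cookie": "c1"}, {"cookie": "c2"},
    {"cookie": "c3"}, {"cookie": "c4"},
]
panel = {  # cookie -> demographics, known only for panelists (made-up)
    "c1": {"age": 34, "gender": "F"},
    "c3": {"age": 45, "gender": "M"},
}

def in_target(demo: dict) -> bool:
    # Hypothetical target audience: adults aged 18-34
    return 18 <= demo["age"] <= 34

# Share of matched (panel) cookies that fall in the target...
unique_cookies = {i["cookie"] for i in census_impressions}
matched = [c for c in unique_cookies if c in panel]
target_share = sum(in_target(panel[c]) for c in matched) / len(matched)

# ...projected onto the full census unique count to estimate target reach.
total_uniques = len(unique_cookies)
estimated_target_reach = target_share * total_uniques
```

Real hybrid systems weight and model the match far more carefully (panel cookies are neither a random nor a complete sample of census cookies), but the census-to-panel projection step is the essential idea.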
Given the size of TV and the growth of multiscreen online video, 'Total Video' as a combination of both is arguably the Holy Grail of audience research today. This is where most of the industry's focus currently lies and where much is indeed underway. Many projects are being directed by Joint Industry Committees (JICs), and at the forefront of most are the current TV measurement agencies: Nielsen and Rentrak in the US, BARB in the UK, AGF/AGOF in Germany, Mediametrie in France, etc. Among the notable others are comScore and, reflecting the broadening stakeholder base of this field, Google, which has projects with measurement agencies like Mediametrie and Kantar. Closer to home in the MENA region, Ipsos Connect has a Fusion project in its initial stages. These are only a handful of examples, mostly from the US and Western Europe, but other similar projects are in progress there and elsewhere.
All of which is not to say that there is a magic button in place now or even around the corner. These projects are very much works in progress. Adding to the methodological complexities and logistical difficulties (increasingly including issues of viewability and 'bot' fraud) are the operational challenges of working with multiple bodies, stakeholders and partners, each with their own expectations and interests. Costs are a major challenge, especially when there is often not enough clarity about the demand in the first place.
Given these challenges and complexities, turnaround into usable planning software is bound to lag the rapid evolution of media consumption. That should not, however, prevent planners from applying their understanding of these measurement issues to think creatively about the available data and develop their own back-of-the-envelope planning guidelines.
In the MENA region, for example, one needs to go beyond E-GRPs, which are relatively simple, and into calibration against TV GRPs, where the differences between measurement systems of unequal quality add significant complexity. As it stands, it is an apples-to-oranges comparison: 'actual' online video ad views captured in real time versus sample-extrapolated TV views based on next-day telephone interviews (CATI). The latter relies on the respondent's memory recall, down to 15-minute slots on the previous day, and does not capture commercial-break ad views.
An apples-to-apples calibration could work by estimating the 'accuracy losses' between the two systems via comparable differences versus electronically measured TV systems, such as peoplemeters, which do capture ad viewership. While such a calibration is indicative at best, it can nonetheless provide useful guidelines for video budget allocation between online and offline channels, rather than leaving the split to be decided more arbitrarily.
Returning in conclusion to the starting point of this piece: big data in media audience research in the strictest N=All sense is limited mainly to online campaign analytics data, wherein all of the exposures are captured. The rest needs to be qualified as N=Nearly All. Whether because most census big data is actually proprietary (e.g., the server data of websites), or is difficult to extract (e.g., all STB data across all operators in a specific market), or is non-representative of the total market (e.g., IPTV STB data in the MENA market), or any combination of these, all of the data points are simply not available. What census big data does is add to panel sample data, giving it more depth and breadth.
In other words, be it in the largest formal systems being developed at an industry level or the smallest in-house calibration projects, the alleged death of sampling in the big data era simply does not hold up to the reality of media audience measurement. Sampling quality affects not only the offline media it directly measures but also the relative value, and therefore budget share, of the online media with which it shares the pie. Hence there is a real need to ensure that the old-fashioned checks are in place: that samples are robust, random and representative, and that the analyses are rigorous. This is particularly true for markets where those checks were much less in focus to begin with. The big data era, far from making it redundant, accords 'small data' arguably greater importance than ever before.
Welcome to the Medium Data world of audience research!