Syndicated data pt2: Why is syndicated data such a pain?

Syndicated data is a total pain because of (somewhat needless) complexity and horrible interfaces.

Why is it so complicated?

The first thing to understand about syndicated data is that there is no exact answer.  If you wanted to study real estate transactions in a county, you could come up with an exact number and dollar amount for the year's transactions; the data is publicly reported and stored in a central location.  If you worked at Bank of America and wanted to know how many of your customers were late paying their credit cards, you could look up the exact number in your internal databases.  But if you work at General Mills and want to know how much cereal Publix sold last year, you have to use an estimate.

Retailers do not all give their complete sales data to syndicated providers.  Some give no information at all.  Yet customers of syndicated data want to know what the entire market looks like; they want the same information for every retailer.  That forces the syndicated data providers to come up with estimates.

Nielsen and IRI have excellent data scientists tasked with producing the estimates.  Yet that does not stop the data from having some fundamental problems for analysts.  Projecting future sales from past results automatically involves uncertainty, and when those past results are themselves estimates, which by their nature contain uncertainty, the noise in the projections grows.  The estimates also change over time: if you pull Nielsen's reported July sales for Nestle in September, it will not always match the number you pulled in August, and it will probably change again in October.

Another complicating factor is that the totals do not match.  For example, if you added up the sales of all reported cereals, the sum would not equal the total for the cereal category.  It would be close, but not exact, because both figures are separate estimates.

One of the cornerstones of analysis is error checking: making sure the sum of the parts adds up to the total at every stage.  That is impossible with syndicated data.
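Here is a quick sketch of that sum-of-parts check, using made-up figures.  In a clean internal database the gap would be zero; with syndicated estimates it rarely is.

```python
# Hypothetical, made-up figures: item-level cereal sales vs. the
# provider's own estimate of the category total.
item_sales = {
    "Brand A Cereal": 1_240_000.0,
    "Brand B Cereal": 980_000.0,
    "Brand C Cereal": 455_000.0,
}
reported_category_total = 2_700_000.0

sum_of_parts = sum(item_sales.values())
gap = reported_category_total - sum_of_parts

print(f"sum of parts:    {sum_of_parts:,.0f}")             # 2,675,000
print(f"category total:  {reported_category_total:,.0f}")  # 2,700,000
print(f"unexplained gap: {gap:,.0f}")                      # 25,000
```

With real syndicated data there is nothing you can do about that 25,000; both numbers are "correct" estimates, so the usual sanity check simply does not apply.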

There are also multiple, separate databases.  The main Nielsen and IRI databases cover grocery stores and other large outlets like Walmart and dollar stores.  But if you want natural and specialty data, like Whole Foods, or non-UPC data, such as sales behind the deli counter, those live in different databases that do not connect to the overall data.

Business results are measured in time periods.  Companies want to see results over a year, a quarter, a month, and sometimes even a week.  Syndicated data is reported by week, so weekly and yearly results are easy, but monthly and quarterly results require yet another estimate, because weeks do not line up with calendar months.
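One common workaround (a minimal sketch, not any provider's official method) is to prorate each syndicated week across calendar months by day count, assuming sales are spread evenly over the seven days:

```python
from datetime import date, timedelta

def allocate_week_to_months(week_end: date, sales: float) -> dict:
    """Prorate a 7-day syndicated week ending on week_end across
    calendar months, assuming even sales across the days."""
    allocation = {}
    for offset in range(6, -1, -1):
        day = week_end - timedelta(days=offset)
        key = (day.year, day.month)
        allocation[key] = allocation.get(key, 0.0) + sales / 7.0
    return allocation

# A week ending Saturday 2023-07-01 has six days in June and one in July.
print(allocate_week_to_months(date(2023, 7, 1), 7000.0))
# {(2023, 6): 6000.0, (2023, 7): 1000.0}
```

The even-daily-sales assumption is obviously wrong for grocery (weekends are bigger), which is exactly why any monthly number built this way is one more estimate stacked on top of the others.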

If that were not enough to convince you syndicated data is a pain, wait until you see the programs you need to use to pull it.

Why are the interfaces so bad?

Syndicated data user interfaces are objectively terrible and painful to use.  It does not matter which one; I have never heard of a single one that is as easy to use as simply querying a database with SQL Server or Teradata.  From the antiquated MS Excel interface of Nielsen Planners to the web-based IRI Liquid Data or Nielsen AOD, all of the interfaces add complexity and time to the process.

No one knows for sure why Nielsen and IRI built such awful user interfaces.  My theory is that they are an attempt to differentiate their services: each provides its own way of querying the data in order to "make things easier/better" for end users.  In reality, these are little more than front ends that build a SQL query.

There are three reasons why the interfaces are so bad.  First, there is no way to quickly build a query.  Instead of typing a query, the user has to click through multiple menus that do not populate until prior choices are made.  You know those web-based questionnaires that load the next question only after you have answered the current one?  Yep, that's what all of the syndicated interfaces are like.  It is maddening.

Second, you have to click certain selections in a particular order, or they do not work correctly.  If you want dollar sales summed instead of listed as individual rows, you had better click 'sum' before you start, or it does not work.  The same goes for other options like 'through' or 'relative.'

The third reason the interfaces are so painful is that there is no way to quickly preview your results.  Experienced analysts typically build a query iteratively, pulling a small slice of the data to check their work as they go.  'Select top 100…' lets a SQL user rapidly see just the first 100 rows of a query, which makes troubleshooting and exploration fast.  Syndicated interfaces have no preview function at all.
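For readers who have not worked this way, here is the preview workflow the interfaces lack, sketched against a toy SQLite table (SQLite spells the row cap LIMIT where SQL Server spells it TOP):

```python
import sqlite3

# Toy table standing in for a large sales fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(f"item_{i}", float(i)) for i in range(1000)],
)

# Pull only the first 100 rows to sanity-check the query; in SQL Server
# the equivalent is 'SELECT TOP 100 * FROM sales'.
preview = conn.execute("SELECT * FROM sales LIMIT 100").fetchall()
print(len(preview))  # 100 rows back in an instant, so mistakes surface fast
```

That tight check-and-adjust loop is exactly what is missing when every syndicated pull runs the full query and makes you wait for the complete result.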

Summary

To be completely fair, much of the complexity of syndicated data is necessary.  What they are estimating is genuinely complicated: the sales of multiple products across multiple retailers, and each one's share of the total.  They have also been updating the interfaces, though in my opinion AOD is different from Planners, not significantly better.

Unfortunately for analysts in grocery or consumer packaged goods, syndicated data is both awful and necessary.  I do not see a scenario where that changes any time soon.  The data is getting even more fragmented, with several retailers going exclusively to either IRI or Nielsen; perhaps that will spur a change?

In case you missed it, here is my first post on syndicated data: Part 1: What is syndicated data?
