Addressing Gaps in Market Level Databases Shu Wen Ng University of North Carolina at Chapel Hill March 2012 UNC Food Research Program Team Barry Popkin, PhD Shu Wen Ng, PhD Meghan Slining, PhD David Guilkey, PhD Phil Bardsley, PhD Izabela Annis, MS Donna Miles, PhD Dan Blanchette Lauren Butler, RD Emily Ford, RD Peggy Rupert, RD Doctoral Students Funding Sources: Robert Wood Johnson Foundation, NationalInstitutes of Health 1
Food & Nutrient Availability Data (ERS, USDA) Inedible share, wastage & spoilage 3 Foods & beverages from stores Away from Home CPGs products purchased AVAILABLE without FOODS UPCs &(e.g., NUTRIENTS foods (e.g., food from stores free-weight service, vending, Food & Nutrient produce, Availability meats, Surveys (ARS, concessions) USDA) bulk items) Other (e.g., gifts, inkind) Inedible share, wastage & spoilage 4 2
Homescan Commercially available barcode/upc level data on CPGs purchased by households (key respondent scans food purchases with UPCs) Data history back to mid-1990s; ongoing Nationally representative survey of 35,000-60,000 households in static panel per year. Targeted direct recruitment by mail and online Cross-sectional (but 30-40% are retained from year to year) Raw data is at the household-shopping episode-upc level Includes >400,000 UPCs/ year Costs: Raw disaggregated UPC data is about $100,000/year Key measures: Demographics (gender, age, race/ethnicity, income, education, employment, marital status, household composition) Geography (market& census region) Purchase details such as quantity, price, date, store UPC details such as brand, product description, category type Can be linked to TD-links (containing store information) Homescan Key Advantages: Large sample; potential longitudinal design Analyses of household purchases, including prices and quantities, patterns & frequency of purchases Captures purchases from all stores Key Limitations: No detailed random-weight items since 2008 No information about away-from-home food and beverage purchases that were never brought into the house by all members of the household Do not know what is actually eaten (large bias if spoilage is high or systematically different) Do not know proportions of consumption among household members No nutrition information already linked 3
Scantrack Point-of-sale, UPC level data on volume & dollar sales of food stores, food/drug combinations, drug stores, mass merchandisers (F/D/M), and convenience stores (Cstores) Data history for F/D/M goes back 3-yrs; C-stores only 2 yrs; ongoing Stratified systematic probability sample, designed to measure consumer sales across major markets, region, and projectable to the US Raw data is at the market (or total US)-weekly (or quarterly or annual)-upc level Includes >400,000 UPCs/ yr Costs: Raw market level, weekly data is about $200,000/yr Key measures: Brand,Product description, Category type,dollar sales, Unit or volume sales, Vendor, Percentage of stores in sample selling each product, Price (regular & promotion) Scantrack Key Advantages Analyses of sales, prices, and quantities by brand and UPC, sold from stores within major markets and nationally Retailer-specific information also is available for purchase (using TD-Links) More accurate measure of what products are sold/purchased Key Limitations Only representative of the stores included in their sample No information about sales from smaller F/D/M & C-stores, vending machines, restaurants, food trucks, mom and pop stores, ethnic markets, specialty markets, etc. Does not include non-upc coded products No nutrition information 4
Total Store Advantage Similar to Scantrack, but larger samples of stores Data history goes back 5-yrs Market coverage across channels are better than Scantrack Costs: Raw market level, weekly data is about $400,000/yr Key Advantages: same as Scantrack Key Limitations Mostly the same as Scantrack Claims to contain some NFP data for about 20% of UPCs Gladson Database that includes NFP labels, ingredient list and claims for products with UPCs (national brands and private label items) Since 1999; claims to be updated weekly Sample grows as new products enter the market (claims to have ~ 2,000 new or reformulated UPCs each week) Key Measures: NFP information Full ingredient listings, warning, claims on the packaging, date of last update Manufacturer & brand by UPC Costs depends on vintage of data, snapshot or annual subscription 5
Gladson Key Advantages Merge by UPC to sales or purchase data Possibly capture UPCs, brands or categories of foods that have reformulated significantly Key Limitations Much fewer records than actual UPCs (often limited flavors or product size of a product is included) New nutritional information for the same UPC writes over prior data Unclear how frequently or comprehensively data is updated Nutrition information limited to FDA requirements Only basic nutrients NFP data not required for raw produce & fish, delicatessen foods, bakery & confections sold directly to consumers from preparation locales, self-service bulk foods ±20% allowance Reporting rules (<5 kcals/serving can be calorie free ; <0.5g fat/serving can be fat free ; rounding to tenth or fifth unit) Not currently linked to food or dietary guidance systems Mintel GNPD + Datamonitor PLA Mintel Global New Product Database (GNPD) Global monitoring of new CPGs at the UPC level Sample grows as new or re-launched products enter market Key Measures: NFP information Manufacturer & brand by UPC Costs: annual subscription for GNPD Academic is $ 9000 Key Advantages Merge by UPC to sales or purchase data Possibly capture UPCs, brands or categories of foods that have reformulated significantly Key Limitations Nutrition information more incomplete compared to Gladson NFP data entered as string and requires intensive data management to parse (data entry errors make this challenging) Same limitations as Gladson 6
Mintel GNPD + Datamonitor PLA Datamonitor Product Launch Analytics (PLA) Basically the same as Mintel GNPD, but Started including NFP data starting Jan 2009 Smaller sample (in 2010, only had <5,000 recordswith any nutrition information) Merge by barcode (UPC) Weekly/monthly/annual data on sales/purchases & prices with basic nutrition information for each barcode in the US, specific markets or by households Mintel GNPD + Datamonitor PLA 7
How valid is NFP data? Nutrition information limited to FDA requirements Outside box vs. inside box Chemical analyses is the best way to validate this NDL work Comparing commercial source of NFP data to what is on the outside of boxes Need geographically diverse field work to compare most current NFP data with what exists in the stores Can food industry assist by creating centralized database with their NFP information? 15 Estimating Nutrients One way to derive missing nutrient information USDA University of Minnesota NCC Linear programming at the UPC level Order of ingredients Each UPC has basic nutrition information Link ingredients to a database of foods including commercial ingredients with more complete nutrient information Minimize error based on nutrient specific error tolerances Applications: Added sugars; Potassium Even better: more information required on NFP 16 8
Merge by barcode (UPC) Mintel GNPD + Datamonitor PLA ESHA Linear programming to estimate nutrients Weekly/monthly/annual data on sales/purchases & prices with basic + select nutrition information for each barcode in the US, specific markets or by households ESHA Research, Inc. nutritional data Includes >44,000 food items (including food additives and Industrial Ingredients), and recipe database Widely used by nutrition researchers & food product developers for nutrient analysis and food label creation Each item contains nutrient information on: total calories, calories from fat, total fat, saturated fat, trans fat, total sugars, total carbohydrate, protein, dietary fiber, sodium, cholesterol, vitamin A, vitamin C, calcium, iron and potassium Has information on MyPyramid Equivalence Exchanges and Groups Sample grows and is updated monthly Sources: USDA, industrial ingredient manufacturers, commodity groups Cost: Depends on categories specified Merge by barcode (UPC) Mintel GNPD + Datamonitor PLA Weekly/monthly/annual data on sales/purchases & prices with basic + select nutrition information for each barcode in the US, specific markets or by households Translating Household data to individuals As proxy for individual intake Regression approach Focus on certain subpopulation (e.g., single adult households) Disadvantages: Incomplete data on household food acquisition (random weights, AFH) Don t really know individual share within household Wastage & spoilage Advantages: 10-12 months of purchase Possibly conduct longitudinal analyses to determine causality ESHA Linear programming to estimate nutrients 9
Current Gaps in Commercial Data Foods & beverages from stores Away from Home CPGs products purchased without UPCs (e.g., foods (e.g., food from stores Food & Nutrient Availability free-weight Data (ERS, service, USDA) vending, produce, meats, concessions) bulk items) Other (e.g., gifts, inkind) Inedible share, wastage & spoilage 19 Thank you Shu Wen Ng: shuwen@unc.edu Meghan Slining: slining@unc.edu 20 10