Tuesday, January 1, 2008

Nutritional Information for Recipes

You may remember a while ago when I talked about certain idiots that seem to feel that caloric intake defines the healthfulness of a food. As it turns out, calories are important, but they're not the only thing that tells us about the health content of our food. There are other important factors, such as the amounts of protein, fat, carbohydrates, sodium, etc. Even the types of carbs or fat are important, so much so, in fact, that companies that sell packaged food (in the US, at least) are required to list these and other data somewhere on the package.

The problem is, this kind of data isn't easy to calculate. Large manufacturers possess the resources to do this, but what about restaurants? The big chains also have the ability to do this, but what about the mom and pop places? Even popular restaurants with only one or two locations have difficulty doing this. And unfortunately, there are some loose cannons out there in our local and national governments that are trying to force restaurants to do these things. If these places are unable to do so, then they will be forced to close their doors. It's the sort of legislation that seems to be designed to hurt the big guys, but is really more effective at killing the little guys.

Eventually, the loose cannons may win. And when they do, the little guys need to be able to conform in order to stay in business. Fortunately, some of the smarter parts of the US government have provided us with various resources to do so. One of the more important resources is called the USDA National Nutrient Database for Standard Reference, and is available as a free download from the USDA website. If you want something a little easier to use than the raw data, you can also search the database online, or download an application to let you search on your home computer (Windows or Linux with WINE) or on your Palm OS device.

A lot of companies are already using the Standard Reference, which at the time of this writing is at Release 20. As it turns out, most of these companies aren't in the business of providing nutritional analysis (such as on the side of a package); they are in the business of providing diet software and services. While some packaged foods are listed, much of the database contains information about individual ingredients. Information for an ingredient's raw state is almost always there, but sometimes information for that ingredient is also listed for frozen, steamed, packaged in various states, etc. Information is also listed for various serving sizes, with 100 grams almost always being present. If you use their software, you can also specify your own serving sizes, which is really just an abstracted view of the database.

There are three types of downloads available: Full, Abbreviated and Update Files. The Full version is available in ASCII and MS Access formats. Abbreviated (which contains all of the products, but not all of the nutritional data such as alcohol, caffeine, theobromine, etc.) is available in ASCII and MS Excel formats. The Update Files (ASCII only) are designed to update from one release to the next (such as from SR19 to SR20). Because the ASCII files are viewable on all platforms, and most people are interested in the entire database, I will focus on the Full ASCII files (available packaged together in a .zip file).

There are several files that belong to the standard reference. Many of the files refer to each other, kind of like a pseudo-relational database. In fact, using a simple filter, the data in these files is pretty simple to import into an actual database. The files are:

FOOD_DES.txt Food Description File
FD_GROUP.txt Food Group File
NUT_DATA.txt Nutrient Data File
NUTR_DEF.txt Nutrient Definition File
SRC_CD.txt Source Code File
DERIV_CD.txt Derivation Code File
WEIGHT.txt Weight File
FOOTNOTE.txt Footnote File
DATA_SRC.txt Sources of Data File
DATSRCLN.txt Data Source Link File
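
To give a feel for what such an import filter might look like, here is a minimal sketch in Python that loads a couple of FOOD_DES.txt rows into an in-memory SQLite table. It leans on the field layout described further down (carets between fields, tildes around text values). The table name, the column names and the `import_food_des` helper are my own assumptions, not anything defined by the USDA, and only the first four columns are kept.

```python
import csv
import sqlite3

# Two sample rows copied from FOOD_DES.txt as quoted elsewhere in this post.
SAMPLE = [
    "~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87",
    "~14154~^~1400~^~Energy drink, RED BULL, with added caffeine, niacin, "
    "pantothenic acid, vitamins B6 and B12~^~ENGY DRK,RED BULL,W/ ADD CAFFEINE,"
    "NIACIN,PANTO,VIT B6 & B12~^~~^~Red Bull North America/Red Bull GmbH~^~Y~^~~^0^~~^"
    "6.25^4.00^9.00^4.00",
]


def import_food_des(conn, lines):
    """Hypothetical helper: load the first four FOOD_DES fields into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS food_des ("
        "ndb_no TEXT PRIMARY KEY, fdgrp_cd TEXT, long_desc TEXT, shrt_desc TEXT)"
    )
    # '^' separates fields and '~' wraps text values, so csv can do the splitting.
    rows = csv.reader(lines, delimiter="^", quotechar="~")
    conn.executemany(
        "INSERT INTO food_des VALUES (?, ?, ?, ?)",
        (row[:4] for row in rows),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
import_food_des(conn, SAMPLE)

# Look up everything in food group 1400 (the group the Red Bull row belongs to).
for ndb_no, long_desc in conn.execute(
    "SELECT ndb_no, long_desc FROM food_des WHERE fdgrp_cd = '1400'"
):
    print(ndb_no, long_desc)
```

The same function should accept an open file object in place of the sample list, since `csv.reader` works on any iterable of lines. The other files could be loaded the same way, each into its own table keyed on the fields they share.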

The files are delimited, but not in the way you might think. Fields are separated by carets (the ^ above the 6 key), and each text field has a tilde (the ~ character) on either side of it. An empty text field shows up as two tildes with nothing between them, and an empty numeric field is simply nothing between two carets. For instance, the first line of the FOOD_DES.txt file looks like this:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

The first field in this file is the NDB_No, the 5-digit unique identifier for the food item. The second field is the FdGrp_Cd, a 4-digit unique identifier for the food group that the item belongs to (as laid out in the FD_GROUP.txt file). The third and fourth fields are long and short descriptions, and so on. Full descriptions for each of the fields are available in the SR20_doc.pdf packaged in the accompanying .zip file, starting on page 22.
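
In the sample line, carets sit between fields and tildes wrap the text values, which happens to line up with what Python's csv module calls a delimiter and a quote character. Here is a small sketch that splits the butter line and labels its first four fields. NDB_No and FdGrp_Cd are the names used above; `Long_Desc` and `Shrt_Desc` are labels I'm assuming for the two descriptions, and `parse_sr_line` is a hypothetical helper.

```python
import csv

# Sample line copied from FOOD_DES.txt above.
LINE = "~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87"

# Labels for the first four fields; the last two names are assumptions.
FIELD_NAMES = ["NDB_No", "FdGrp_Cd", "Long_Desc", "Shrt_Desc"]


def parse_sr_line(line):
    """Split one line into a list of strings, stripping the tildes."""
    return next(csv.reader([line], delimiter="^", quotechar="~"))


fields = parse_sr_line(LINE)
food = dict(zip(FIELD_NAMES, fields))

print(food["NDB_No"], food["FdGrp_Cd"], food["Long_Desc"])
# prints: 01001 0100 Butter, salted
```

Everything comes back as a string, including the numeric fields at the end of the line, so those would still need converting with `float()` before doing any math on them. Empty fields come back as empty strings.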

Of particular interest is the NUT_DATA.txt file, which contains the bulk of the data. There are several rows for each ingredient, each of which represents a particular nutrient (as laid out in the NUTR_DEF.txt file). Some of the nutrients available (from NUTR_DEF) are:

~203~^~g~^~PROCNT~^~Protein~^~2~^~600~
~204~^~g~^~FAT~^~Total lipid (fat)~^~2~^~800~
~205~^~g~^~CHOCDF~^~Carbohydrate, by difference~^~2~^~1100~
~207~^~g~^~ASH~^~Ash~^~2~^~1000~
~208~^~kcal~^~ENERC_KCAL~^~Energy~^~0~^~300~
~209~^~g~^~STARCH~^~Starch~^~2~^~2200~
~210~^~g~^~SUCS~^~Sucrose~^~2~^~1600~
~211~^~g~^~GLUS~^~Glucose (dextrose)~^~2~^~1700~
~212~^~g~^~FRUS~^~Fructose~^~2~^~1800~
~213~^~g~^~LACS~^~Lactose~^~2~^~1900~
~214~^~g~^~MALS~^~Maltose~^~2~^~2000~
~221~^~g~^~ALC~^~Alcohol, ethyl~^~1~^~18200~
~255~^~g~^~WATER~^~Water~^~2~^~100~
~257~^~g~^~~^~Adjusted Protein~^~2~^~700~
~262~^~mg~^~CAFFN~^~Caffeine~^~0~^~18300~
~263~^~mg~^~THEBRN~^~Theobromine~^~0~^~18400~
~268~^~kj~^~ENERC_KJ~^~Energy~^~0~^~400~
~269~^~g~^~SUGAR~^~Sugars, total~^~2~^~1500~
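
Since every row in NUT_DATA.txt points back at one of these nutrient numbers, it helps to have the definitions in a lookup table. The sketch below, in Python, builds one from a few of the rows above. The attribute names on `Nutrient` are my own guesses based on what the columns appear to hold (the last two look like a number of decimal places and a sort order), so check them against SR20_doc.pdf before relying on them.

```python
import csv
from collections import namedtuple

# A few rows copied from the NUTR_DEF.txt listing above.
NUTR_DEF_SAMPLE = """\
~203~^~g~^~PROCNT~^~Protein~^~2~^~600~
~208~^~kcal~^~ENERC_KCAL~^~Energy~^~0~^~300~
~257~^~g~^~~^~Adjusted Protein~^~2~^~700~
~262~^~mg~^~CAFFN~^~Caffeine~^~0~^~18300~
""".splitlines()

# Attribute names are assumptions, inferred from the sample rows.
Nutrient = namedtuple(
    "Nutrient", "nutr_no units tagname description decimals sort_order"
)


def load_nutrients(lines):
    """Hypothetical helper: map nutrient number -> Nutrient record."""
    reader = csv.reader(lines, delimiter="^", quotechar="~")
    return {row[0]: Nutrient(*row) for row in reader}


nutrients = load_nutrients(NUTR_DEF_SAMPLE)
caffeine = nutrients["262"]
print(caffeine.description, caffeine.units)
# prints: Caffeine mg
```

Note that the row for Adjusted Protein has an empty tag name, which comes through as an empty string rather than causing an error.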

Let's take Red Bull, for instance:

~14154~^~1400~^~Energy drink, RED BULL, with added caffeine, niacin, pantothenic acid, vitamins B6 and B12~^~ENGY DRK,RED BULL,W/ ADD CAFFEINE,NIACIN,PANTO,VIT B6 & B12~^~~^~Red Bull North America/Red Bull GmbH~^~Y~^~~^0^~~^6.25^4.00^9.00^4.00

The nutrient data line for the caffeine looks like this:

~14154~^~262~^30^2^^~13~^~AI~^~~^~~^1^29^31^1^^^~~^

These rows are defined for 100 gram portions. This particular product has 30mg of caffeine per 100 grams. The gram weight of a can of Red Bull is 255, which means that each little 8.3oz can of Red Bull contains about 76mg of caffeine (30 × 255 / 100 = 76.5).
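
The arithmetic is simple enough to sketch in a few lines of Python. This assumes, as described above, that the third field of the nutrient data line is the amount per 100 grams. The `amount_in_serving` helper is hypothetical, and the 255 gram can weight is the figure quoted above.

```python
import csv

# The caffeine row for Red Bull, copied from NUT_DATA.txt above.
NUT_DATA_LINE = "~14154~^~262~^30^2^^~13~^~AI~^~~^~~^1^29^31^1^^^~~^"
CAN_GRAMS = 255  # gram weight of one can, as stated above

row = next(csv.reader([NUT_DATA_LINE], delimiter="^", quotechar="~"))
ndb_no, nutr_no = row[0], row[1]
per_100g = float(row[2])  # values in this file are per 100 gram portion


def amount_in_serving(per_100g, serving_grams):
    """Scale a per-100-gram value to a serving of the given gram weight."""
    return per_100g * serving_grams / 100.0


caffeine_mg = amount_in_serving(per_100g, CAN_GRAMS)
print(ndb_no, nutr_no, caffeine_mg)
# prints: 14154 262 76.5
```

The same scaling applies to any other nutrient row for the same food, as long as the unit from NUTR_DEF.txt is carried along with the number.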

Lost? Do what I did: go to the online search form and look up Red Bull. The file format is pretty difficult to read manually, and not really easy to work with programmatically. When I'm working with my own import of the database, I use the online search form to check my work, to make sure I'm doing it right.

So if the database is such a pain to work with, why bother? Because, as the name implies, it's a Standard Reference. There are a lot of companies (not just online diet services) that use this database. The important thing is not how easy the raw data is to read. Raw data was never meant to be used like that. What we need is an abstraction. The online search form is an abstraction. The software available for Palm or Windows is an abstraction. It's something that's easier to read and understand.

The database is also considered to be very accurate. And it's updated on a regular basis. When I first started playing with it, it was at SR15. And most importantly, it's free! Why bother spending loads of cash on a book that will be out of date in a few months anyway? Once you understand how to import the main database, importing the updates is trivial.

I'll be taking a closer look at the database in future posts, but I wanted to make people aware of what it is, and what it's about first. Feel free to take a look at the docs, but unless you're technically inclined, they probably won't do you much good. But the online search form is useful for pretty much anybody.
