Michael Wimsatt's GitHub blog

About Quantary
Contact Me

View my GitHub profile
View my vanity page

Deep Dive on One Year of Weight Data - Part III

In the last post, we did a deep dive on calories consumed and burned to test how close actual results were to predicted results. While they were (perhaps surprisingly) close, they weren’t right on. I seemed to lose more weight than I should have based on my average calorie deficit over the year. So, why might that be?

Analyzing sources of error

What are some possible sources of error in this analysis?

  • Errors in estimated calorie consumption
  • Errors in estimated exercise calories
  • Errors in base calorie consumption estimate
  • Missing data
  • Errors in the 3,500 calories per pound estimate

I would expect the calorie consumption and missing data (basically more calorie consumption) to be skewed the opposite direction (overestimating calorie deficit), and I have no reason to question my exercise calories, so if this is a result of miscalculating the calorie deficit it probably comes down to the estimated base calorie consumption.

Of course, this is wrapped up in the estimated exercise calories, since it varies based on average base activity level. My recorded exercise is meant to include just “exercise events” - running, mostly, plus some long walks, and maybe some strength training here and there. But, through the course of the day at work, and on weekends (I live in the very walkable New York City), I’m walking a lot (maybe three miles per day, on average). Some of this walking is included in the “sedentary” multiplier of 1.2, but perhaps not enough?

Let’s calculate the change in TDEE for various activity levels based on my average weight and age during the weight loss period:

for mult in arange(1.2, 1.5, .05):
    print str(mult) + '\t' + str(tdee(235,72,36,'m',mult) - tdee(235,72,36,'m',1.2))
1.2	0.0
1.25	109.7969722
1.3	219.5939444
1.35	329.3909166
1.4	439.1878888
1.45	548.984861
1.5	658.7818332

The first takeaway here is that this calculation is very sensitive to activity level (at least for men of a certain size). When we measure calorie deficits in hundreds of calories per day, the difference between “sedentary” and “lightly active” in our base calorie consumption is almost 400 calories. That means if you are trying to lose one pound per week, choosing the wrong activity level could nearly obliterate (or double) your weight loss! Whoa!

It looks like a multiplier of 1.3 would equate to an extra 220 calories deficit per day. This is more than the 100 calories we need to explain, but I think we have some oppositely skewed errors as mentioned above, and 1.3 seems to be pretty close to where the LoseIt! app is estimating an office worker’s activity level.

While this could explain why I seemingly lost more weight than I should have, I’d also like to understand how my imperfect data collection might have skewed results.

Estimating the missing impact of missing data

So, now I want to check my assumptions about the missing data being skewed toward exaggerating my calorie deficit. One way would be to look for discrepancies at the weekly level (daily weight data is plagued by noise, as illustrated above).

First, to take as much noise out as possible, I’m going to work with the moving average weight as illustrated above.

weightdf['Weight Trend'] = smooth
weightdf['Weight Trend'][ts('6-29-2009')] - weightdf['Weight Trend'][ts('6-20-2010')]

Note that since the weight trend is a lagging indicator, it shows a slightly lower weight loss (68 lbs vs 71 lbs) than the absolute weight measurements. This works out to .19 lbs per day and a calorie deficit of 670 calories. These are a bit different from what we saw above, but small compared to the size of the errors we’re trying to assess.

Now let’s turn our daily data into weekly data.

###Resampling daily data to weekly

First, let’s create an indicator which will give us an “average” number of missing data days we have in a week. We’ll also use Pandas’ resample function to aggregate average weekly data for each week in the set.

weightdf['Missing Data'] = weightdf['Calorie Deficit'].isnull() * 1
wkweightdf = weightdf[['Base Calories', 'Exercise Calories', 
                       'Food Calories', 'Calorie Deficit']].resample('W-SUN', 
                                                                     how='mean', kind='period')
wkweightdf['Missing Data'] = weightdf[['Missing Data']].resample('W-SUN', 
                                                                 how='sum', kind='period')
wkweightdf['Final Weight'] = weightdf[['Weight Trend']].resample('W-SUN', 
                                                                 how='last', kind='period')
start_weight = smooth['2009-6-28']
wkweightdf['Weight Lost'] = -wkweightdf['Final Weight'].diff()
           Base Calories  Exercise Calories  Food Calories  Calorie Deficit  Missing Data  Final Weight  Weight Lost
    count      51.000000          51.000000      51.000000        51.000000     51.000000     51.000000    50.000000
    mean     2616.102449         228.925728    2255.773560       605.649044      0.745098    231.886921     1.290055
    std       135.951091          93.386778     229.313245       285.993383      1.180562     18.016530     1.103964
    min      2421.003397          36.115429    1732.755000       -70.308145      0.000000    205.732605    -1.362209
    25%      2496.532686         180.785714    2099.063929       409.795619      0.000000    215.728934     0.550104
    50%      2611.690882         235.428571    2271.230000       574.015235      0.000000    231.019346     1.363960
    75%      2718.113674         291.338571    2405.318571       800.780589      1.500000    244.978118     2.103550
    max      2907.842395         438.086143    2770.237500      1156.036502      5.000000    270.235350     3.374956

Ideally, we would have summed some, struck a difference among others, and took averages on yet others, but I got lazy and just took averages. Possible correction in the future.

Looking at the data

Now, for what you’ve all been waiting to see - a scatter matrix!

pd.scatter_matrix(wkweightdf[['Base Calories', 'Exercise Calories', 'Food Calories', 'Calorie Deficit',
                             'Missing Data', 'Weight Lost']], figsize=(10,10), diagonal='kde');

What jumps out here?

  • There does seem to be a bit of correlation between number of days of missing data and (lack of) weight loss, or at least the very worst weight weeks were also those where I recorded data on only a few days.
  • There’s the expected positive correlation between calorie deficit and weight loss.
  • There’s an apparently strong correlation between base calories and calorie deficit (i.e. the higher my base calorie burn, the higher my calorie deficit).
  • There might be a positive correlation between exercise calories and food calories, but that’s to be expected because I was managing toward a net calorie budget.
  • There’s no apparent strong correlation between exercise calories and calorie deficit, but food calories are strongly negatively correlated with calorie deficit.

Let’s dig into a few of these observations a little more closely.

Missing data

While there does appear to be some relationship between missing data and weight loss (mostly in the extreme), it doesn’t appear to be incredibly strong. But it does show that what we don’t know is probably skewing total estimated calorie deficit to the optimistic side. In general, though, missing a couple of days in a week doesn’t seem to throw off the numbers too badly. I believe the value of tracking plays out more in longer term trends (as evidenced by the weight gain I experienced at the end of this process when I stopped tracking altogether).

Weight Loss vs Calorie Deficit

The correlation is positive, as we’d expect, which seems to imply that the weight loss is experienced somewhat coincident with the behavior change (or the delay is within a week’s time). However, it may be that a week with good behavior is more likely to be surrounded by weeks of good behavior and the effects seen in one well-behaved week might be the result of the week or weeks before it. It’s still satisfying to see the relationship here, though.

As to the specific relationship (i.e. does the weight loss match theory?), I think that deserves some more targeted analysis. We’ll come back to this.

Base Calories as driver of Calorie Deficit

Since the calorie deficit is a calculation involving base calories, it’s not surprising to see a relationship here. And it is clear that as my weight dropped and my base calorie consumption fell, I did not maintain the same calorie deficit. This plays out in the graph of my weight loss. Toward the end I was losing about one pound per week as opposed to the 1.5+ pounds I was losing early on. I simply was not cutting my food calories (or adding offsetting exercise) at a rate sufficient to maintain my target calorie deficit.

I don’t know if maintaining that deficit was just less realistic at the lower weight or if, after six months of pretty strict calorie control, I just started letting go a bit. I know my lifestyle began to loosen up about that time, so I suspect it’s a mix of the two.

I guess this isn’t really correlation analysis of independent phenomena since one is calculated from the other, but it gets at which pieces of the equation contribute the most variance.

Exercise vs Food as drivers of weight loss

I’m sure you’ve read many times that exercise is not sufficient for weight loss. That was definitely my experience. My exercise appears to have no correlation with calorie deficit or weight loss. It was all about the food I ate and my base calorie burn. Food was obviously a much more sensitive lever on my calorie deficit. Absent ultramarathon training, it’s just very hard to get enough exercise calories to offset a diet out of control.

Exercise is certainly important for overall health, and running is an addition to my lifestyle that I will value as long as I can still get out there. It also made the food restrictions more bearable. On days when I ran, I could maybe have an extra beer or two without torpedoing my goals. Also, I feel (but don’t have the data to show) that the general healthfulness that comes with running also subtly encourages me to make healthier food choices.

How predictive is my Calorie Deficit model?

Now let’s take a closer look at the calorie deficit and related weight loss. Is “calories in and calories out” a sufficient model for weight loss? Let’s model what the theoretical weight loss was for each week and compare it to the results.

Calculating theoretical weekly weight loss

First we need to estimate how much weight i should have lost in a given week. For a few reasons I won’t go into here, I’ll go back to the daily data, then recalculate the weekly data.

wkweightdf['Total Deficit'] = weightdf[['Calorie Deficit']].resample('W-SUN', 
                                                                 how='sum', kind='period')
wkweightdf['Theoretical Weight Loss'] = wkweightdf['Total Deficit']/3500.0
plt.plot(wkweightdf['Theoretical Weight Loss'], wkweightdf['Weight Lost'], 'bo');
plt.ylabel('Actual Weight Lost (lb)')
plt.xlabel('Projected Weight Lost (lb)');
z = np.polyfit(wkweightdf['Theoretical Weight Loss'].tail(50),wkweightdf['Weight Lost'].tail(50),1)
m = z[0]; b = z[1]
x = linspace(-2,4)
y = m*x + b
plt.plot(x,y,'r-', label='Fit')
plt.plot(x,x,'r--', label='Perfect')

From this data, it appears that my actual weight loss experience was more sensitive to calorie deficit than the standard rule of 3,500 calories per pound would imply. Why might that be?

  1. There aren’t 3,500 calories per pound.
  2. It’s a little more complex than 3,500 calories per pound if body fat/muscle composition are taken into account.
  3. There were biases in my estimation of calories (in and out) that seemed to be proportionally off to the same degree (or the data would look less linear)

So, I did a little research and number 2 seems to be the culprit. While there are 3,500 metabolizable calories in a pound of fat, there are only 600 metabolizable calories in a pound of muscle. With a slope of 1.35 pounds per pound between actual and theoretical, it looks like I was burning, on average, 2,500 calories per pound lost. I’ll leave the math to you, but this works out to me losing about one pound of muscle for every pound of fat I lost over the year.

bfpct = fatwatch_data['Body Fat'][:'06-2010'] # Some recorded body fat % data in original file
# bfpct[bfpct==0] = NaN                         # Drop zero values in sparse data
fat = (bfpct * w1y/100)                         # body fat
muscle = w1y - fat                            # muscle
bodycomp = DataFrame({'Fat': fat, 'Muscle': muscle, 'Body Fat %': bfpct})
for col in bodycomp.columns:
    bodycomp[col] = bodycomp[col].interpolate()
bodycomp[:'05-2010'].plot(secondary_y=['Body Fat %'], style='-');

Well, this chart seems to belie my earlier conclusion. If these body fat percentages are to be believed, I managed to gain muscle mass during the year of weight loss. To be honest, I’m skeptical of this. My body fat measurements came from a standard body weight scale - which is notoriously inaccurate. I was doing almost no strength training during this period, and it’s hard to imagine that running alone would have made up the difference. I’ll admit, I’m at a loss. Any thoughts?


In this series of posts, we used tools from the Python data analysis ecosystem to really dig into the data I generated while losing weight over one year. While there were no earth-shattering discoveries, I got more familiar with some excellent tools (pandas, iPython notebook), and some minor insight into what is important and not so important in weight loss and maintenance.

I’m back on the weight-loss wagon, and now I’m tracking a lot more data. I use a FitBit One to track motion for calorie consumption. I’m much more diligent about capturing calorie consumption. I’m tracking my sleep nightly, too. Hopefully, I’ll get down to my goal and have another success story to dissect in a year or so.

    comments powered by Disqus