Reproducible Research Peer Assignment 2 Bmis

Observable Weight Distributions and Children's Individual Weight Assessment


  • H. Shelton Brown III,

    Corresponding author
    1. Division of Management, Policy and Community Health, University of Texas School of Public Health, Austin, Texas, USA
  • Alexandra E. Evans,

    1. Michael & Susan Dell Center for Advancement of Healthy Living, University of Texas School of Public Health, Austin, Texas, USA
    2. Division of Behavioral Science, University of Texas School of Public Health, Austin, Texas, USA
  • Gita G. Mirchandani,

    1. Family Health Research and Program Development, Texas Department of State Health Services, Austin, Texas, USA
  • Steven H. Kelder,

    1. Michael & Susan Dell Center for Advancement of Healthy Living, University of Texas School of Public Health, Austin, Texas, USA
    2. Division of Epidemiology, University of Texas School of Public Health, Austin, Texas, USA
  • Deanna M. Hoelscher

    1. Michael & Susan Dell Center for Advancement of Healthy Living, University of Texas School of Public Health, Austin, Texas, USA
    2. Division of Behavioral Science, University of Texas School of Public Health, Austin, Texas, USA


Social network theory suggests obesity is “contagious” within peer groups in that known friends highly influence weight. On the other hand, an alternative model suggests that observable weight distributions affect perception of one's own obesity level. We examine whether the BMI levels of the most obese classmates in the individual student's grade by gender are positively associated with “under-assessment” of obesity and overweight (i.e., independently measured obesity or overweight, but subjective self-assessment of normal weight). The data are from the 2004–2005 School Physical Activity and Nutrition III (SPAN) survey, a stratified, multistage probability sample of 4th, 8th, and 11th grade public school children in Texas. We used logistic regression to test whether the gender-specific 85th percentile BMI level within the individual student's grade at their school is positively associated with “under-assessment” of obesity and overweight. The results show that students are much more likely to under-assess their own weight if the gender-specific 85th percentile BMI level is higher in their grade at their school. These data suggest that observable weight distributions play a key role in the obesity epidemic.

Recently, peer weight has been shown to be strongly associated with obesity among adults (1) and children (2). Peer effects have also been shown to be important in alcohol and smoking research (3,4). But while peer effects are undoubtedly important, the fact that peer groups are voluntarily formed between people with common tastes for diet, body types, etc., makes causation a difficult claim. Alternatively, people may assess their own weight against a fixed ideal (5), or an ideal which evolves over time (6). In the latter case, people subconsciously place themselves in the weight distribution they observe. Whether all of the people in the observed weight distribution are personally known is irrelevant, as long as they are seen on a regular basis. As with peer effects (2,3,4), the center of the BMI distribution may set the norm in terms of self-assessment of weight. However, heavier people in an individual's observable distribution may also be thought of as setting the norm for heavier people, defining the weight above which the observing person would feel unhappy about his or her size. In that respect, observable weight models are more inherently rightward looking than peer-based models based on the center of the BMI distribution in that people also compare themselves to observable people weighing more.

Self- or parent-assessed weight is often incongruent with objectively measured weight among children (7,8,9,10,11,12). Rather than using (and understanding) the clinical definitions of obesity as a reference, children may be assessing their own weight within the context of the weight distribution of other children they observe. Children spend a large proportion of their time in school around a set group of children, some of whom they become friends with, others whom they know less well. Although they mutually form peer groups, many children they observe in school are not friends. By analogy, children may feel short if they observe a “taller” height distribution for children in their grade. Whether the taller children are known to them is irrelevant.

The purpose of this paper is to examine the effect of within-grade weight distributions on self-perceived weight among classmates. We employ a unique data set of 4th, 8th, and 11th grade classmates in Texas during the 2004–2005 school year. Within each school, ∼50 students per grade in two to three classrooms were surveyed. BMI, which is weight in kilograms divided by height in meters squared, was measured by School Physical Activity and Nutrition III (SPAN) staff. It is therefore possible to estimate summary measures of the length of the right tail of the BMI distribution, which we hypothesize are used in forming social norms related to acceptable weight or body size. Our summary measures for the right tail are the 85th and 95th percentile BMI in the grade at each school.

Understanding the relative importance of peer groups and observed distributions is important. Attempts at weight loss for children are linked to the correct perception of overweight or obese status (13).

Data and Methods

The SPAN project is a child obesity surveillance system developed and implemented by the Michael & Susan Dell Center for Advancement of Healthy Living at the University of Texas School of Public Health with support from the Texas Department of State Health Services during the 2004–2005 school year. SPAN uses a stratified, multistage probability sample of public school children in Texas that includes representative samples of white/other, black or African-American, and Mexican-American/Latino or Hispanic youth. A full description of the SPAN study design and its participants has been documented for the first wave (14), and a brief summary is provided here. The sampling scheme provided regionally representative data for the state. A single grade was selected to represent each developmental level of school children: 4th grade for elementary school, 8th grade for middle school, and 11th grade for high school. Within each school selected, students in two or three classrooms or ∼50 students were surveyed. Questions about family income were not asked, so we used percent socioeconomically disadvantaged at the school as a proxy. SPAN III data protocols and instruments have been assessed for reliability (15).

Heights and weights of the children were measured by either SPAN staff or trained state or local health department employees. Height was measured to the nearest 0.1 cm with a portable stadiometer (Perspective Enterprises Portable Adult Measuring Unit PE-AIM-101, Portage, MI) and weight was measured to the nearest 0.1 kg with a portable digital scale with remote display (SECA 770 or Tanita BWB-800S, Arlington Heights, IL). Overweight children have BMIs at or above the 85th percentile but below the 95th percentile for sex and age, based on the Centers for Disease Control definition (http://www.cdc.gov/growthcharts); obese children have BMIs at or above the 95th percentile for sex and age, based on the Centers for Disease Control definition and American Medical Association terminology (15).
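The percentile cut-offs for overweight and obesity can be sketched as a small helper. This is a minimal illustration, assuming the sex- and age-specific BMI percentile has already been looked up from the CDC growth charts; the function name `weight_status` and the label "not overweight" are hypothetical and not part of the SPAN protocol.

```python
def weight_status(bmi_percentile: float) -> str:
    """Classify weight status from a sex- and age-specific BMI percentile (0-100).

    Assumption: the percentile was obtained elsewhere (e.g., from CDC growth
    charts); this helper only applies the 85th and 95th percentile cut-offs.
    """
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    return "not overweight"
```

Because the two bands share the 95th percentile as a boundary, the checks run from the highest cut-off downward so that each child falls into exactly one category.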

A key question in SPAN III addresses self-perception of obesity status. It is “Compared to other students in your grade who are as tall as you, do you think you weigh “The right amount,” “Too much” or “Too little (or not enough)”?” Two dependent variables were modeled with logistic regressions. The first dependent variable was coded as one to mark obese children who incorrectly assess their weight, and as zero otherwise. The second dependent variable was coded as one to mark either overweight or obese children who incorrectly assess their weight, and as zero otherwise. In both dependent variables, the person incorrectly assesses him or herself to be normal weight when he or she is obese or overweight. We will henceforth refer to this as under-assessment of obesity and under-assessment of obesity or overweight.
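The coding of the two dependent variables might look like the following sketch. It assumes that only the answer "The right amount" counts as a self-assessment of normal weight, as described above; the function names and the status labels are hypothetical.

```python
RIGHT_AMOUNT = "The right amount"  # survey answer treated as "normal weight"


def under_assesses_obesity(status: str, answer: str) -> int:
    """1 if a measured-obese child reports weighing the right amount, else 0."""
    return int(status == "obese" and answer == RIGHT_AMOUNT)


def under_assesses_obesity_or_overweight(status: str, answer: str) -> int:
    """1 if a measured-obese or -overweight child reports the right amount, else 0."""
    return int(status in {"obese", "overweight"} and answer == RIGHT_AMOUNT)
```

If the answer "Too little (or not enough)" were also meant to count as under-assessment, the comparison with `RIGHT_AMOUNT` would become a membership test over both answers.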

Our hypothesis is that within grades at schools with BMI distributions which are right-skewed, the obese and overweight children will be more likely to under-assess their weight. Thus, we identify the 85th and 95th percentiles of BMI by gender for each grade by school. For grades by school with right-skewed distributions of BMI, the 85th and 95th percentiles of BMI will be higher, and correspondingly, the social norm for BMI will be higher in that graduating class.
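The grouping step described above can be sketched as follows. The record field names are hypothetical, and the linear-interpolation percentile is an assumption: the source does not state which percentile rule was used, and this sketch ignores the survey weights of the complex sample design.

```python
from collections import defaultdict


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def percentile(values, p: float) -> float:
    """Linear-interpolation percentile (p in 0..100) of a non-empty sequence."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tail_percentiles(records):
    """Return the 85th and 95th BMI percentiles per (school, grade, gender).

    Each record is a dict with the hypothetical keys
    'school', 'grade', 'gender', 'weight_kg', and 'height_m'.
    """
    groups = defaultdict(list)
    for r in records:
        key = (r["school"], r["grade"], r["gender"])
        groups[key].append(bmi(r["weight_kg"], r["height_m"]))
    return {
        key: {"p85": percentile(vals, 85), "p95": percentile(vals, 95)}
        for key, vals in groups.items()
    }
```

Each student would then be assigned the `p85` or `p95` value of his or her own school, grade, and gender group as the contextual regressor.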

All estimates and statistical tests were performed taking into account the complex sample design features. STATA (version 8.0; StataCorp, College Station, TX) was used to analyze the data.


Results

Figure 1 shows histograms of 4th grade weights by school. The distributions of boys' and girls' median BMI and of boys' and girls' 85th percentile BMI by school are displayed. Although there is modest variation in median BMI across schools, the 85th percentile BMI varies much more. Therefore, the BMI of the (potential) pace-setters within each 4th grade varies by school, which we hypothesize will affect under-assessment of weight.

Twelve percent of the sample under-assess obesity, while 26.5% under-assess obesity or overweight.

Table 1 shows two models estimated for all three grades separately. The first set of models is in the upper half of the table. The dependent variable is the under-assessment of obesity. In the models for each grade, obese males are more likely to under-assess their own obesity compared to obese females. However, it is surprising that race and ethnicity do not appear to be associated with under-assessment. For 4th graders, speaking Spanish at home increases the likelihood of under-assessment, while speaking a language at home other than English or Spanish lowers the likelihood of under-assessment, but is only significant at the 10% level.

In the second set of models in the lower half of Table 1, the dependent variable is the under-assessment of obesity or overweight. The results are similar. However, 4th grade boys were not significantly more likely to perceive themselves to be normal or thin when they are actually obese or overweight. Hispanic students in the 11th grade are more likely to under-assess their weight, as are 8th grade African Americans. For the 4th grade, speaking Spanish at home is associated with under-assessment of weight. For 8th graders, the higher the gender-, grade-, and school-specific 85th percentile, the more likely the under-assessment of weight. It appears that under-assessment is more likely in the case of measured obesity than obesity or overweight.

The logistic regressions in Table 1 were repeated using median BMI by grade and school. The results were similar in magnitude and significance, indicating that obese and overweight children also pay attention to the center of the BMI distribution when assessing weight.

Discussion and Conclusion

Our results show that the right tail of locally observable weight distributions, in this case the weight distributions by grade by school, affects self-assessment of weight. This is particularly true among 4th and 8th graders, who are more likely to under-assess their own weight if the weight distribution of same-gender classmates in their grade and school is right-skewed. Eleventh graders are less likely to consider the context of their grade's weight distribution by school when assessing their own weight level. This may be due to greater influences outside of the school, such as the media, a larger network of friends, and greater mobility; alternatively, it may be that the 11th grade samples were drawn from larger schools than the 4th and 8th grade samples, which may have attenuated the effects. In all grades, only gender was consistently as important as the school's BMI context in the under-assessment of weight.

Christakis and Fowler note that there are multiplier effects in that weight loss by one person could inspire weight loss in others (1). Although their findings and ours are not mutually exclusive, our results imply that the multiplier effects of potential weight loss among the heaviest individuals in the observable distribution may be as high as those of weight loss among individuals near the center of the BMI distribution. Weight loss among the most obese is also more likely to affect the subgroup with the most potential problems with obesity: others who are obese or overweight or who are approaching obesity or overweight. This is because the most obese unintentionally affect the perception of weight for everyone. Therefore, it may be that programs and policies should focus their scarce resources on obese and overweight students in ways that do not stigmatize them. For instance, health promotion might be best implemented in schools not only with high obesity or overweight prevalence, but also with right-skewed tails in their BMI distributions.

There are some limitations in the study. First, we would like to know the level of friendship bonds between students. Then we could compare the more proximal peer effects to wider peer group effects at the grade level. Second, we would ideally like to be able to follow students into the higher grades to determine whether previous weight distributions affect future weight perceptions. Third, the question used in determining under-assessment invited students to compare their weight to that of students of the same grade and the same height, rather than asking whether they weighed more or less than fellow students, which might have led to some confusion about the construct being measured. However, this question was developed using an extensive process including focus groups, cognitive interviews with students, and expert opinion, and it showed moderate agreement in 4th grade students (16) and substantial agreement in 8th grade students (17).

Despite these limitations, these data suggest that grade context has a major influence on children's self-assessment of weight.


We thank the SPAN study group for comments. Approval for the SPAN III study was obtained from the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston, as well as the Texas Department of State Health Services Institutional Review Board and participating school districts. Parents completed either an active or passive consent (depending on school district procedures for parental consent), and children completed a child assent form.


Article Information


2010 North American Association for the Study of Obesity (NAASO)


Publication History

  • Submitted for publication December 08, 2008; Accepted for publication in final form April 28, 2009


References

  1. Christakis NA, Fowler JH. The spread of obesity in a large social network over 32 years. N Engl J Med 2007;357:370–379.
  2. Trogdon JG, Nonnemaker J, Pais J. Peer effects in adolescent overweight. J Health Econ 2008;27:1388–1399.
  3. Powell LM, Tauras JA, Ross H. The importance of peer effects, cigarette prices and tobacco control policies for youth smoking behavior. J Health Econ 2005;24:950–968.
  4. Norton EC, Lindrooth RC, Ennett ST. Controlling for the endogeneity of peer substance use on adolescent alcohol and tobacco use. Health Econ 1998;7:439–453.
  5. Lakdawalla D, Philipson T. The growth of obesity and technological change: a theoretical and empirical investigation. National Bureau of Economic Research, 2002; Working Paper 8965.
  6. Burke MA, Heiland F. Social dynamics of obesity. Econ Inq 2007;45:571–591.
  7. Akerman A, Williams ME, Meunier J. Perception versus reality: an exploration of children's measured body mass in relation to caregivers' estimates. J Health Psychol 2007;12:871–882.
  8. Boa-Sorte N, Neri LA, Leite ME et al. Maternal perceptions and self-perception of the nutritional status of children and adolescents from private schools. J Pediatr (Rio J) 2007;83:349–356.
  9. Hackie M, Bowles CL. Maternal perception of their overweight children. Public Health Nurs 2007;24:538–546.
  10. He M, Evans A. Are parents aware that their children are overweight or obese? Do they care? Can Fam Physician 2007;53:1493–1499.
  11. Huang JS, Becerra K, Oda T et al. Parental ability to discriminate the weight status of children: results of a survey. Pediatrics 2007;120:e112–e119.
  12. Wald ER, Ewing LJ, Cluss P et al. Parental perception of children's weight in a paediatric primary care setting. Child Care Health Dev 2007;33:738–743.
  13. Ojala K, Vereecken C, Välimaa R et al. Attempts to lose weight among overweight and non-overweight adolescents: a cross-national survey. Int J Behav Nutr Phys Act 2007;4:50.
  14. Hoelscher DM, Day RS, Lee ES et al. Measuring the prevalence of overweight in Texas schoolchildren. Am J Public Health 2004;94:1002–1008.
  15. Barlow Expert Committee. Expert committee recommendations regarding the prevention, assessment, and treatment of child and adolescent overweight and obesity: summary report. Pediatrics 2007;120 Suppl 4:S164–S192.
  16. Penkilo M, George GC, Hoelscher DM. Reproducibility of the School-Based Nutrition Monitoring Questionnaire among fourth-grade students in Texas. J Nutr Educ Behav 2008;40:20–27.
  17. Hoelscher DM, Day RS, Kelder SH, Ward JL. Reproducibility and validity of the secondary level School-Based Nutrition Monitoring student questionnaire. J Am Diet Assoc 2003;103:186–194.


BMI data and models

Population for empirical evaluation. The Swiss Health Survey (SHS) is a population-based cross-sectional survey. Since 1992, it has been conducted every five years by the Swiss Federal Statistical Office [17]. For this study, we restricted the sample from the 2012 survey to 16,427 individuals aged between 18 and 74 years. Height and weight were self-reported by telephone interview. Records with extreme values of height or weight were excluded (highest and lowest percentile by sex). Smoking status was categorized into never smoked, former smokers, light smokers (1–9 cigarettes per day), moderate smokers (10–19), and heavy smokers (> 19). Individuals who never smoked stated that they did not currently smoke and never regularly smoked for longer than a six-month period; former smokers had quit smoking but had smoked for more than 6 months during their life. One cigarillo or pipe was counted as two cigarettes, and one cigar was counted as four cigarettes. The following adjustment variables were included: fruit and vegetable consumption, physical activity, and alcohol intake. Information on the number of days per week fruits and vegetables were consumed was available. We chose to categorize as close to the “5-a-day” recommendation as possible [18]. Fruit and vegetable consumption was combined in one binary variable that comprised the information on whether both fruits and vegetables were consumed daily or not. The variable describing physical activity was defined as the number of days per week a subject started to sweat during leisure time physical activity and was categorized as > 2 days, 1–2 days, or none. Alcohol intake was included using the continuous variable grams per day. Education was included as highest degree obtained and was categorized as mandatory (International Standard Classification of Education, ISCED 1-2), secondary II (ISCED 3-4), or tertiary (ISCED 5-8) [19]. Nationality had two categories: Swiss and foreign. Language region, reflecting cultural differences within Switzerland, was categorized as German/Romansh, French, or Italian.

Models for BMI distributions. Binary logistic regression and ordered and unordered polytomous logistic regression [20] were previously applied to the analysis of BMI distributions based on ad hoc categorized BMI values. We will review the corresponding parameterizations and compare the model parameters in the common framework of model (1) before introducing the novel continuous outcome logistic regression for the analysis of BMI distributions.

  • Binary logistic regression. For a binary outcome, such as non-obesity vs. obesity ($\text{BMI}_{30} = I(\text{BMI} \le 30)$), the regression function is defined for non-obese individuals only:

    $$r(30 \mid \text{smk}, \text{sex}, x) = \alpha_{30} + \gamma_{\text{smk:sex}} + x\beta,$$

    with intercept $\alpha_{30}$, main and interaction parameters $\gamma$ of smoking and sex, and regression coefficients or covariate parameters $\beta$. This model evaluates the conditional distribution function for BMI only at $b = 30$. Note that a change of the BMI cut-off point $b$ leads to a different model, and thus different parameter estimates for all parameters $\alpha_b$, $\gamma$, and $\beta$. Such models have been reported for $b = 25$ or $b = 30$ [11,12].

  • Ordered polytomous logistic regression. This model is also known as proportional odds logistic regression for an ordered categorical outcome, such as the WHO categories [3] underweight ($\text{BMI}_{18.5} = I(\text{BMI} \le 18.5)$), normal weight ($\text{BMI}_{(18.5,25]} = I(18.5 < \text{BMI} \le 25)$), overweight ($\text{BMI}_{(25,30]} = I(25 < \text{BMI} \le 30)$), and obese ($\text{BMI} > 30$). For these four categories, the model is defined by three category-specific regression functions

    $$\begin{aligned}
    r(18.5 \mid \text{smk}, \text{sex}, x) &= \alpha_{18.5} + \gamma_{\text{smk:sex}} + x\beta \\
    r(25 \mid \text{smk}, \text{sex}, x) &= \alpha_{(18.5,25]} + \gamma_{\text{smk:sex}} + x\beta \\
    r(30 \mid \text{smk}, \text{sex}, x) &= \alpha_{(25,30]} + \gamma_{\text{smk:sex}} + x\beta
    \end{aligned}$$

    or, in more compact notation, by $r(b \mid \text{smk}, \text{sex}, x) = \alpha(b) + \gamma_{\text{smk:sex}} + x\beta$ with intercept function

    $$\alpha(b) = \begin{cases} \alpha_{18.5} & b = 18.5 \\ \alpha_{(18.5,25]} & b = 25 \\ \alpha_{(25,30]} & b = 30. \end{cases}$$

    The parameters $\gamma$ and $\beta$ are the same for all three regression functions and can be interpreted as category-independent log-odds ratios as a consequence of the proportional odds assumption on these parameters. The intercept function increases monotonically. Ordered polytomous logistic regression can be understood as a series of binary logistic regression models where only the intercept is allowed to change with increasing BMI values at cut-off points chosen ad hoc. Self-reported BMI values using the WHO criteria have been analyzed by such a model in [7]. The BMI distribution of children categorized at marginal percentiles has been analyzed by a proportional odds model in [13].

    An extension of ordered polytomous regression to continuous responses, treating the intercept function $\alpha$ as a step function at the observations with subsequent non-parametric maximum likelihood estimation, was recently suggested by [21]. Unlike the model and estimation procedure discussed here, their method does not allow for the different likelihood contributions presented in the next section.

  • Unordered polytomous logistic regression. Multinomial logistic regression is equivalent to polytomous logistic regression for an unordered outcome and is a generalization of the proportional odds model, as it allows for category-specific parameters $\gamma(b)$ and $\beta(b)$ in the regression function

    $$r(b \mid \text{smk}, \text{sex}, x) = \alpha(b) + \gamma(b)_{\text{smk:sex}} + x\beta(b)$$

    for $b \in \{18.5, 25, 30\}$. The model can be used to test the proportional odds assumption, i.e., $\gamma \equiv \gamma(b)$ and $\beta \equiv \beta(b)$ for all $b \in \{18.5, 25, 30\}$. Typically, the model is introduced as a model of the conditional density by the relationship between density and distribution function for discrete variables (as in (2)). This model is very popular for the analysis of BMI-related outcomes [8–10].
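To make the proportional odds parameterization concrete, the sketch below turns the three cumulative regression functions into probabilities for the four WHO categories. The intercept values and the shift are made up for illustration, and the function name is hypothetical; nothing here is estimated from the SHS data.

```python
import math


def expit(z: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def who_category_probabilities(alphas, shift):
    """Probabilities of underweight, normal weight, overweight, and obese.

    alphas: increasing intercepts at the cut-offs 18.5, 25, and 30.
    shift:  the common term gamma_smk:sex + x*beta shared by all cut-offs
            (the proportional odds assumption).
    """
    if list(alphas) != sorted(alphas):
        raise ValueError("intercepts must increase with the cut-off")
    # Cumulative probabilities P(BMI <= b) at b = 18.5, 25, 30.
    cdf = [expit(a + shift) for a in alphas]
    bounds = [0.0] + cdf + [1.0]
    # Category probabilities are differences of adjacent cumulative values.
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


# Illustrative (made-up) values: a positive shift raises P(BMI <= b) at every
# cut-off and therefore lowers the probability of the obese category.
baseline = who_category_probabilities((-3.0, 0.0, 1.5), shift=0.0)
shifted = who_category_probabilities((-3.0, 0.0, 1.5), shift=0.5)
```

Allowing `shift` to differ by cut-off would give the cumulative analogue of the category-specific parameters γ(b) and β(b) described for the unordered model.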

The novel continuous outcome logistic regression model can be viewed as a generalization of the above-introduced models from discrete to continuous outcomes. Like these discrete models, the continuous BMI logistic regression model does not require strong parametric assumptions for the conditional BMI distribution, yet it allows the conceptually continuous BMI variable to be modeled by a continuous distribution, regardless of the scale of the actual BMI measurements.

The most important aspect here is a smooth and monotonically increasing intercept function α(b). In an unconditional model for the marginal BMI distribution

$$\text{logit}(\mathbb{P}(\text{BMI} \le b)) = r(b) = \alpha(b),$$

such an intercept function can model arbitrary BMI distribution functions by the term expit(α(b)) (technical details of the specification and estimation of such an intercept function are given in the Appendix). This essentially removes the need to specify a strict parametric distribution, such as the normal, for BMI. Because of a potential impact of both smoking and sex of the individual on the entire distribution, we stratify this intercept function with respect to these two variables, i.e., one specific intercept function is dedicated to each combination of smoking and sex:

$$\text{logit}(\mathbb{P}(\text{BMI} \le b \mid \text{smk}, \text{sex})) = r(b \mid \text{smk}, \text{sex}) = \alpha(b)_{\text{smk:sex}}.$$

This model is also assumption free, because arbitrary BMI distribution functions can be assigned to each combination of sex and smoking.

To facilitate model interpretation, we assume that regression coefficients β of the remaining covariates are constant across the entire BMI distribution in our final model

$$\text{logit}(\mathbb{P}(\text{BMI} \le b \mid \text{smk}, \text{sex}, x)) = r(b \mid \text{smk}, \text{sex}, x) = \alpha(b)_{\text{smk:sex}} + x\beta. \tag{4}$$

The regression coefficients β are log-odds ratios of all possible events BMI ≤ b, b > 0. The interpretation of the parameters β is the same in logistic regression, proportional odds regression, and the novel continuous BMI logistic regression (4). Of course, these constant regression coefficients might be incorrectly specified. Residual analysis, for example using the residual U = ℙ(BMI ≤ b | smk, sex, x) for a subject with BMI b, can help to detect such misspecifications. Similar to Cox-Snell residuals, the residual U is uniform when the model is correct.
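The constancy of the log-odds ratio across cut-offs can be checked numerically with a small sketch. The intercept function used here, a0 + a1·log(b), is purely an illustrative assumption chosen because it is smooth and increasing; it is not the specification given in the Appendix, and all parameter values are made up.

```python
import math


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def make_intercept(a0: float, a1: float):
    """Hypothetical smooth, increasing intercept function alpha(b) = a0 + a1*log(b)."""
    if a1 <= 0:
        raise ValueError("a1 must be positive so that alpha(b) increases in b")
    return lambda b: a0 + a1 * math.log(b)


def bmi_cdf(b, alpha, x, beta):
    """P(BMI <= b | stratum, x) = expit(alpha(b) + x*beta) for one smk:sex stratum."""
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    return expit(alpha(b) + linear)


alpha = make_intercept(-32.0, 10.0)  # made-up stratum-specific intercept
beta = [0.3]                         # made-up covariate effect

# The log-odds ratio between x = 1 and x = 0 is the same at every cut-off b.
lor_22 = logit(bmi_cdf(22, alpha, [1.0], beta)) - logit(bmi_cdf(22, alpha, [0.0], beta))
lor_31 = logit(bmi_cdf(31, alpha, [1.0], beta)) - logit(bmi_cdf(31, alpha, [0.0], beta))
```

Under these assumptions, the residual U for a subject is simply `bmi_cdf` evaluated at that subject's observed BMI and covariates.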

Our model (4) can be understood as a joint model of all possible binary logistic regression models for the outcomes BMI ≤ b with b > 0 under two constraints: (1) the sex- and smoking-level-specific intercept function increases with the cut-off point b and is not allowed to jump abruptly, so fewer parameters are required in this joint model; (2) the regression coefficients β are held constant as b increases. Instead of restricting our attention to specific binary logistic regression models defined by some cut-off points chosen ad hoc, we can answer questions about the odds ratios for all or specific events BMI ≤ b post hoc based on this model.

The interpretation of the sex- and smoking-specific intercept functions, and thus the associations of smoking and sex with BMI, however, is fundamentally different from the interpretation of the regression coefficients β. Because we allow the entire BMI distribution to change with these two variables in more complex ways, there is no simple interaction term γ that captures these parameters in model (4). However, model (4) allows computation of the log-odds ratios for some event BMI ≤ b between, for example, female former smokers and females who never smoked for all x as

$$r(b \mid \text{former smoker}, \text{female}, x) - r(b \mid \text{never smoked}, \text{female}, x) = \alpha(b)_{\text{former smoker:female}} - \alpha(b)_{\text{never smoked:female}}.$$

In this way, the parameters and contrasts we are interested in are not directly parameterized in model (4) but nevertheless can be obtained from this model by relatively simple contrasts. The events BMI ≤ b are not restricted to those of a specific categorization of the BMI measurements (such as the WHO categories). Due to the smoothness of the underlying intercept functions, log-odds ratios can be computed for arbitrary BMI values b > 0.
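As a worked step, the contrast for female former smokers versus females who never smoked can be written out to show why it holds for all x: the covariate term appears in both regression functions and cancels. The symbol OR(b) for the resulting odds ratio is notation introduced here for illustration only.

```latex
\begin{align*}
r(b \mid \text{former smoker}, \text{female}, x) - r(b \mid \text{never smoked}, \text{female}, x)
  &= \bigl(\alpha(b)_{\text{former smoker:female}} + x\beta\bigr)
   - \bigl(\alpha(b)_{\text{never smoked:female}} + x\beta\bigr) \\
  &= \alpha(b)_{\text{former smoker:female}} - \alpha(b)_{\text{never smoked:female}}, \\
\mathrm{OR}(b)
  &= \exp\bigl(\alpha(b)_{\text{former smoker:female}} - \alpha(b)_{\text{never smoked:female}}\bigr).
\end{align*}
```

Unlike the coefficients β, this odds ratio is a function of b, so it may differ between, say, b = 25 and b = 30.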

Likelihoods for BMI models. Because the regression function r is defined for all possible BMI values b in model (4), the likelihood (2) can be evaluated for all types of intervals $(\underline{b}, \bar{b}]$ and also for “exact” BMI values computed as the ratio of weight and squared height. We distinguished between four different likelihood contributions corresponding to four different BMI measurement scales.

  • WHO categories (WHO). The BMI for each individual was reported in one of the four WHO categories corresponding to the intervals ≤ 18.5 (underweight), (18.5, 25] (normal weight), (25, 30] (overweight), and > 30 (obese). The likelihood contribution of a normal-weight individual is thus

    $$\text{expit}(r(25 \mid \text{smk}, \text{sex}, x)) - \text{expit}(r(18.5 \mid \text{smk}, \text{sex}, x)).$$

  • Other categories (Int 1). Other studies might have used a different categorization scheme, e.g., the 12 categories defined by BMI intervals of length two:

    $$\le 17,\ (17, 19],\ (19, 21],\ \ldots,\ (35, 37],\ > 37.$$

    An individual with a BMI value between 19 and 21 thus contributes

    $$\text{expit}(r(21 \mid \text{smk}, \text{sex}, x)) - \text{expit}(r(19 \mid \text{smk}, \text{sex}, x))$$

    to the likelihood.

  • Numeric intervals (Int 2). With weight measured in kilograms and height in meters, the BMI is calculated according to its definition as $\text{BMI} = \text{weight}/\text{height}^2$. However, for an individual 1.75 m tall weighing 76 kg, all BMI values between $75.5/1.755^2 = 24.51$ and $76.5/1.745^2 = 25.12$ are consistent with this individual due to rounding error. Thus, this individual contributes

    $$\text{expit}(r(25.12 \mid \text{smk}, \text{sex}, x)) - \text{expit}(r(24.51 \mid \text{smk}, \text{sex}, x))$$

    to the likelihood, which automatically takes the measurement error into account. These intervals can be expected to be much larger in studies that rely on self-reported weights and heights.

  • Exact measurements (Exact). If extreme precision was used to measure weight and height, $\text{BMI} = \text{weight}/\text{height}^2$ can be considered an “exact” observation. Because the interval around this value is very narrow, one can approximate the likelihood contribution by the density of the conditional BMI distribution

    $$f(b \mid \text{smk}, \text{sex}, x) = \frac{\partial}{\partial b}\,\text{expit}(r(b \mid \text{smk}, \text{sex}, x))$$

    evaluated at the “exact” BMI value.

It is important to note that it is possible to evaluate the likelihood when a mixture of these different BMI measurement scales is applied to subsets of the individuals. In subject-level meta analyses, for example, it would be possible to estimate a joint model based on studies using different BMI categorizations or no categorization at all. From a purely theoretical point of view, the application of numeric intervals that take rounding error into account (Int 2) is most appropriate. The remaining three procedures must be considered approximate.
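The four likelihood contributions can be sketched with a few helper functions. The regression function `r` below is a made-up smooth, increasing function of b for a single covariate pattern, and all function names are hypothetical; the finite-difference density is a numerical stand-in for the derivative of the conditional distribution function, not the estimation procedure of the Appendix.

```python
import math


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def interval_contribution(r, lower, upper):
    """expit(r(upper)) - expit(r(lower)); None marks an open-ended interval."""
    hi = 1.0 if upper is None else expit(r(upper))
    lo = 0.0 if lower is None else expit(r(lower))
    return hi - lo


def rounding_interval(weight_kg, height_m, weight_step=1.0, height_step=0.01):
    """BMI range consistent with weight and height rounded to the given steps."""
    lower = (weight_kg - weight_step / 2) / (height_m + height_step / 2) ** 2
    upper = (weight_kg + weight_step / 2) / (height_m - height_step / 2) ** 2
    return lower, upper


def exact_contribution(r, b, h=1e-5):
    """Density at b, approximated by a central finite difference of the CDF."""
    return (expit(r(b + h)) - expit(r(b - h))) / (2 * h)


def r(b):
    """Made-up regression function for one smk:sex stratum and fixed x."""
    return -32.0 + 10.0 * math.log(b)


# WHO categories: the four contributions partition the whole BMI range.
who = [interval_contribution(r, lo, hi)
       for lo, hi in [(None, 18.5), (18.5, 25), (25, 30), (30, None)]]

# Numeric interval for a person reported as 1.75 m tall and 76 kg.
low, high = rounding_interval(76, 1.75)
int2 = interval_contribution(r, low, high)

# "Exact" measurement at the nominal BMI value.
exact = exact_contribution(r, 76 / 1.75 ** 2)
```

Because every contribution is computed from the same function `r`, individuals measured on different scales can enter one likelihood, which is what makes the mixture of measurement scales possible.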
