Validation of 24-Hour Dietary Recalls Across the Lifespan: Methodological Considerations for Clinical and Biomedical Research

Carter Jenkins, Dec 02, 2025

Abstract

This article provides a comprehensive analysis of the validity of the 24-hour dietary recall method across different age groups, from young children to older adults. It explores the foundational principles of dietary validation, details age-specific methodological applications, and addresses key challenges such as under-reporting and measurement error. Aimed at researchers, scientists, and drug development professionals, the content synthesizes current evidence on validation techniques, including comparison with doubly labeled water and weighed food records. The review emphasizes the critical implications of accurate dietary assessment for nutritional epidemiology, clinical trial design, and the development of safe and effective pharmaceuticals for diverse populations.

The Core Principles and Age-Specific Challenges of Dietary Recall Validation

In nutritional research, the concept of "validity" is not monolithic. It encompasses distinct dimensions that serve different research purposes. Group-level validity refers to the accuracy of a dietary assessment method for estimating mean intakes and distributions within a population, which is sufficient for epidemiological studies examining diet-disease relationships at the population level. In contrast, individual-level validity denotes the precision required to accurately classify an individual's intake relative to others or to a reference standard, which is necessary for clinical diagnostics, personalized nutrition, and dietary counseling. This distinction is crucial when evaluating 24-hour dietary recalls (24HR), as their performance varies significantly depending on whether the intended use involves population-level surveillance or individual-level assessment.

The growing integration of technology into dietary assessment tools, including web-based and mobile applications, has introduced new dimensions to this validity paradigm. While these tools can reduce administrative burden and improve data standardization, they do not automatically eliminate fundamental challenges such as misreporting, recall bias, and portion size estimation errors [1] [2]. This analysis examines how different 24HR validation approaches and their outcomes vary across age groups, with specific implications for research design and interpretation.

Quantitative Validity Comparisons Across Age Groups

Table 1: Validity Metrics of 24-Hour Recalls Across Different Populations

Population Group	Reference Method	Energy Intake Correlation	Misreporting Direction	Key Findings
Danish Adults (n=71, 53.2±9.1 years) [1]	Biomarkers (Urinary potassium, serum folate)	Energy vs. TEE: ρ=0.38; Protein: ρ=0.45; Potassium: ρ=0.42	Under-reporting (87% classified as acceptable reporters)	Strong correlation for folate (ρ=0.62); Useful for ranking individuals by intake
Burkinabe Adolescents (10-14 years) [3]	Observed Weighed Records	Equivalence within 15% bound	Under-reporting (Mean ratio: 0.92)	Better accuracy in older adolescents (12-14 years) vs. younger (10-11 years)
French-Canadian Adolescents (n=111, 12-17 years) [4]	Interviewer-administered 24HR	Significant for most nutrients (range: 0.24-0.52)	Over-reporting (8.8% higher energy intake)	36.6% classified in same quartile, 39.6% in adjacent quartile
Older Korean Adults (n=119, 72.2±8.0 years) [5]	Weighed Food Intake	No significant difference for energy/macronutrients	Portion size overestimation (mean ratio: 1.34)	Recalled 71.4% of foods consumed; women more accurate than men
Various Adults (Systematic Review) [6]	Doubly Labeled Water	Variable across studies	Predominant under-reporting (more frequent in females)	24HR had less variation in under-reporting compared to other methods

Table 2: Technology-Based Dietary Assessment Tools and Their Applications

Tool Name	Target Population	Key Features	Validation Evidence
myfood24 [1]	Adults and adolescents	Web-based, supports weighed food records, recipe builder	Validated against biomarkers; Strong reproducibility for most nutrients (ρ≥0.50)
ASA24 [7]	Age 12+ (5th grade reading level)	Automated self-administered, multiple-pass method	Adapted from USDA AMPM; Over 1,000 publications using collected data
Foodbook24 [8]	Diverse populations	Multilingual, expanded food lists for different ethnicities	Strong correlations for 58% of nutrients compared to interviewer-led recall
R24W (French-Canadian) [4]	French-speaking adolescents	Web-based, automated multiple-pass method in French	Acceptable relative validity vs. interviewer-administered 24HR
Traqq App [9]	Adolescents (under evaluation)	Ecological momentary assessment with short recall windows (2-hour & 4-hour recalls)	Protocol includes comparison with FFQ and interviewer-administered 24HR

Methodological Protocols in Validation Studies

Biomarker-Based Validation Protocols

The highest standard for validating dietary assessment methods involves comparison against objective biomarkers, which provides a reference measure independent of self-reporting errors. The myfood24 validation study exemplifies this approach in healthy Danish adults [1]. The protocol incorporated:

Study Design: A repeated cross-sectional design with two assessment periods 4±1 weeks apart
Dietary Assessment: Participants completed seven-day weighed food records using the myfood24 web-based tool
Biomarker Collection: Fasting blood samples (for serum folate) and 24-hour urine collections (for urea and potassium)
Energy Metabolism Assessment: Resting energy expenditure measured via indirect calorimetry
Statistical Classification: Application of the Goldberg cut-off to identify acceptable energy reporters

This comprehensive approach allowed researchers to compare estimated nutrient intakes against objective measures of actual nutrient exposure, strengthening validity conclusions beyond what is possible with method-to-method comparisons alone.
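The Goldberg cut-off step in the protocol above can be sketched as follows. This is a minimal illustration and not the code used in the myfood24 study: the default physical activity level (1.55) and the coefficients of variation (23% for within-subject energy intake, 8.5% for basal metabolic rate, 15% for physical activity level) are the values commonly quoted in the Goldberg cut-off literature and are assumptions here. A study that measures resting energy expenditure directly, as in [1], may well use a smaller BMR variation. The function names are hypothetical.

```python
import math


def goldberg_cutoffs(pal=1.55, days=7, n=1, cv_wei=23.0, cv_wb=8.5,
                     cv_tp=15.0, sd=2.0):
    """Return (lower, upper) limits for the EI:BMR ratio.

    pal     assumed physical activity level of the population
    days    number of recorded days per person
    n       group size (1 when classifying individuals)
    cv_*    assumed coefficients of variation, in percent
    sd      number of standard deviations (2 gives ~95% limits)
    """
    # Combined variation of intake, BMR and physical activity.
    s = math.sqrt(cv_wei ** 2 / days + cv_wb ** 2 + cv_tp ** 2)
    factor = math.exp(sd * (s / 100.0) / math.sqrt(n))
    return pal / factor, pal * factor


def classify_reporter(energy_intake, bmr, **kwargs):
    """Label one participant from reported intake and BMR (same units)."""
    lower, upper = goldberg_cutoffs(**kwargs)
    ratio = energy_intake / bmr
    if ratio < lower:
        return "under-reporter"
    if ratio > upper:
        return "over-reporter"
    return "acceptable reporter"


if __name__ == "__main__":
    low, high = goldberg_cutoffs()
    print(f"EI:BMR limits for 7 recorded days: {low:.2f} to {high:.2f}")
    print(classify_reporter(1500, 1600))  # made-up values in kcal/day
```

With seven recorded days and individual-level classification, these defaults give limits of roughly 1.05 and 2.28 for the EI:BMR ratio; different assumptions about activity level or variation shift both limits.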

Direct Observation Protocols

For populations where biomarker collection is challenging, such as adolescents, direct observation protocols provide a robust alternative. The Burkinabe adolescent validation study employed rigorous observational methods [3]:

Trained Observers: Research assistants accompanied adolescents from first to last meal, weighing all foods and beverages before and after consumption using digital scales accurate to 1g
Standardized Procedures: When recipe preparation couldn't be observed directly, research assistants recorded food descriptions with distinguishing ingredients
24HR Implementation: Multiple-pass 24-hour recalls conducted the day after observation using standardized portion estimation aids
Equivalence Testing: Statistical analysis to determine if mean differences between methods fell within predefined acceptable bounds (±15%)

This protocol demonstrated that adolescents aged 12-14 years could provide valid recalls without parental assistance, an important consideration for study design in this age group.
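The equivalence-testing step can be illustrated with a confidence-interval check on recalled-to-observed ratios. The sketch below is an assumption-laden simplification, not the published analysis: it works on log ratios, uses a normal approximation instead of a t-distribution, and declares equivalence when the 90% confidence interval of the geometric mean ratio lies inside 0.85 to 1.15 (the ±15% bound). The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def equivalence_by_ci(recalled, observed, bound=0.15, alpha=0.05):
    """Check whether recalled intake is equivalent to observed intake.

    recalled, observed  paired intakes per participant (same units)
    bound               equivalence margin as a proportion (0.15 = 15%)
    alpha               one-sided error rate; the interval is 1 - 2*alpha
    """
    log_ratios = [math.log(r / o) for r, o in zip(recalled, observed)]
    centre = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(len(log_ratios))
    z = NormalDist().inv_cdf(1 - alpha)  # normal approximation (assumption)
    ci_low = math.exp(centre - z * se)
    ci_high = math.exp(centre + z * se)
    return {
        "mean_ratio": math.exp(centre),  # geometric mean of the ratios
        "ci": (ci_low, ci_high),
        "equivalent": (1 - bound) <= ci_low and ci_high <= (1 + bound),
    }


if __name__ == "__main__":
    observed = [2000.0, 1800.0, 2200.0, 1900.0, 2100.0]  # made-up kcal/day
    recalled = [1850.0, 1700.0, 1980.0, 1800.0, 1900.0]
    print(equivalence_by_ci(recalled, observed))
```

A mean ratio below 1 with an interval inside the bounds would correspond to the pattern reported for this study: modest under-reporting that still meets the equivalence criterion.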

Comparative Method Protocols

Many validation studies compare new digital tools against established interviewer-administered methods. The French-Canadian R24W validation in adolescents exemplifies this approach [4]:

Counterbalanced Design: Participants completed both web-based (R24W) and interviewer-administered 24HR recalls in varying order to control for sequence effects
Standardized Reference Method: Registered dietitians conducted interviews using the USDA Automated Multiple-Pass Method (AMPM) with portion size aids
Multiple Administrations: Participants completed up to three R24W recalls within one month to assess impact of repeated administrations
Comprehensive Statistical Analysis: Paired t-tests, correlation analysis, cross-classification, and Bland-Altman plots for agreement assessment

This protocol revealed that the web-based system produced slightly higher intake estimates for most nutrients compared to interviewer-administered recalls, highlighting how method characteristics can influence reporting patterns.
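Cross-classification, one of the agreement statistics listed above, can be sketched in a few lines. The example assigns each participant to a quartile under each method and reports the share placed in the same, adjacent, and opposite quartiles. It is a simplified illustration with hypothetical function names: quartiles are assigned by rank, and tied values are split arbitrarily by sort order.

```python
def quartile_by_rank(values):
    """Assign quartiles 0-3 by rank; ties are broken by sort order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    quartiles = [0] * len(values)
    for rank, index in enumerate(order):
        quartiles[index] = rank * 4 // len(values)
    return quartiles


def cross_classify(method_a, method_b):
    """Percent of participants in same, adjacent, and opposite quartiles."""
    qa, qb = quartile_by_rank(method_a), quartile_by_rank(method_b)
    n = len(method_a)
    gaps = [abs(a - b) for a, b in zip(qa, qb)]
    return {
        "same": 100.0 * sum(g == 0 for g in gaps) / n,
        "adjacent": 100.0 * sum(g == 1 for g in gaps) / n,
        "opposite": 100.0 * sum(g == 3 for g in gaps) / n,
    }


if __name__ == "__main__":
    web = [1800, 2100, 2500, 1600, 2300, 2900, 2000, 2700]      # made-up
    interview = [1700, 2200, 2300, 1900, 2100, 2800, 1850, 2600]
    print(cross_classify(web, interview))
```

Reporting same-quartile and adjacent-quartile agreement side by side, as in Table 1, shows how often two methods rank a participant similarly even when absolute intakes differ.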

[Figure: validation concept framework]

Age-Specific Considerations in 24-Hour Recall Validity

Adolescent Populations

Adolescents present unique validation challenges due to cognitive development stages, irregular eating patterns, and social influences. Research indicates that validity improves with age during adolescence, with studies supporting the use of 24HR without parental assistance for those aged 12-14 years [3]. The French-Canadian R24W validation found that while the tool showed acceptable relative validity for group-level assessment among adolescents aged 12-17, nutrient-specific variations occurred, with saturated fat intake overestimated by 25.2% compared to interviewer-administered recalls [4]. Technological adaptations like the Traqq app, which uses repeated short recall windows (2-hour and 4-hour recalls) instead of traditional 24-hour periods, aim to address memory-related limitations in this population [9].

Older Adult Populations

Older adults present distinct challenges related to cognitive changes and dietary patterns. The Korean validation study in adults aged ≥60 years revealed that while energy and macronutrient intake estimates were generally accurate at the group level, participants recalled only 71.4% of foods consumed and significantly overestimated portion sizes (mean ratio: 1.34) [5]. This suggests that 24HR methods maintain group-level validity for energy and macronutrients in older populations despite substantial errors in individual food reporting. The study also found sex differences, with women demonstrating better food item recall accuracy than men (75.6% vs. 65.2%), highlighting how participant characteristics can moderate validity in this age group.

General Adult Populations

In general adult populations, systematic reviews comparing self-reported energy intake against doubly labeled water measurements reveal consistent under-reporting across various 24HR methodologies [6]. This under-reporting is more pronounced in women and shows substantial variability between individuals, supporting the use of these methods for group-level rather than individual-level assessment. Technology-based tools like myfood24 demonstrate strong reproducibility for most nutrients (ρ≥0.50) when administered repeatedly, though performance varies by specific nutrient [1].
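The ρ values quoted for reproducibility and validity in this article are rank correlations. As a minimal sketch of how such a coefficient is obtained, the code below computes Spearman's ρ as a Pearson correlation on average ranks. The function names and data are hypothetical, and a statistics package would normally be used instead.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation between two sets of intake estimates."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    first = [62.0, 75.5, 88.1, 70.2, 95.4]   # made-up protein intakes, g/day
    second = [65.3, 71.0, 90.2, 74.8, 92.7]
    print(round(spearman_rho(first, second), 2))
```

Because the coefficient depends only on ranks, a tool can rank individuals well (high ρ) while still showing a systematic bias in absolute intake.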

Table 3: Essential Research Reagents and Tools for Dietary Validation Studies

Tool Category	Specific Examples	Research Application	Key Considerations
Reference Standards	Doubly Labeled Water (DLW) [6], Urinary Nitrogen [1], Serum Folate [1]	Objective validation against energy expenditure or nutrient status	Consider cost, participant burden, and analytical requirements
Portion Size Estimation Aids	Digital food scales [3], Geometric portion models [5], Food atlases, Image-assisted methods	Improve quantification of consumed amounts	Cultural appropriateness and food-specific accuracy varies
Technology Platforms	myfood24 [1], ASA24 [7], Foodbook24 [8], R24W [4]	Automated dietary data collection and nutrient analysis	Requires validation for specific study populations and cultural contexts
Dietary Databases	UK CoFID [8], Canadian Nutrient File [4], Local composition databases	Nutrient calculation from reported foods	Database completeness for ethnic and culturally-specific foods
Statistical Packages	PC-SIDE [10], Equivalence testing protocols [3], Misreporting analysis	Adjust for within-person variation and analyze method agreement	Specialized methods required for dietary data structure

Implications for Research Design and Implementation

The distinction between group-level and individual-level validity has profound implications for research design. For group-level applications such as epidemiological studies and population surveillance, a single 24HR administration may suffice for estimating population means, though the number of recalls required depends on the nutrient of interest and study population heterogeneity [10]. Research demonstrates that increasing from one to three 24HR administrations significantly improves usual intake estimation and reduces misclassification in population assessments [10].
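Why additional recalls help can be illustrated with the classic variance-ratio relationship from the dietary assessment literature. If day-to-day (within-person) variance is a multiple of between-person variance, the correlation between the mean of n recalls and true usual intake is sqrt(n / (n + ratio)). The sketch below is illustrative only: it is not the PC-SIDE procedure cited in [10], it ignores systematic misreporting, and the variance ratios and function names are hypothetical.

```python
import math


def correlation_with_usual_intake(n_recalls, variance_ratio):
    """Expected correlation between the mean of n recalls and usual intake.

    variance_ratio  within-person variance divided by between-person variance
    """
    return math.sqrt(n_recalls / (n_recalls + variance_ratio))


def recalls_needed(target_r, variance_ratio):
    """Smallest number of recalls that reaches the target correlation."""
    if not 0 < target_r < 1:
        raise ValueError("target_r must be between 0 and 1")
    return math.ceil(variance_ratio * target_r ** 2 / (1 - target_r ** 2))


if __name__ == "__main__":
    for n in (1, 2, 3):
        r = correlation_with_usual_intake(n, variance_ratio=1.0)
        print(f"{n} recall(s): expected r = {r:.2f}")
    print(recalls_needed(0.9, variance_ratio=2.0))
```

Nutrients with high day-to-day variability have larger variance ratios and therefore need more recalls, which is one reason the required number depends on the nutrient of interest.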

For individual-level applications including clinical assessment and personalized interventions, most 24HR methods show insufficient precision for reliable individual classification without repeated administrations. The finding that only 36.6% of adolescents were classified in the same quartile by both web-based and interviewer-administered recalls highlights this limitation [4]. Individual-level assessment requires either repeated administrations or integration with objective biomarkers to improve precision.

Cultural and linguistic adaptation emerges as a critical factor in tool validity. The expansion of Foodbook24 to include Brazilian and Polish food items and languages improved its appropriateness for diverse populations in Ireland [8]. Similarly, tools must be specifically validated for different age groups, as cognitive abilities, dietary patterns, and reporting capabilities vary significantly across the lifespan [3] [5] [4].

The validity of 24-hour dietary recalls is fundamentally contextual, dependent on both the population being assessed and the intended research application. Current evidence supports the use of these methods for group-level assessment across age groups when appropriately validated for specific populations. For individual-level applications, however, most 24HR methods require repeated administrations or biomarker integration to achieve sufficient precision. The growing integration of technology into dietary assessment offers opportunities to reduce administrative burden and improve standardization but does not eliminate fundamental validity limitations. Researchers must carefully match their choice of dietary assessment method to their specific research questions, recognizing the distinct evidence requirements for group-level surveillance versus individual-level assessment.

Accurate measurement of energy intake (EI) is fundamental to nutritional epidemiology, obesity research, and understanding the relationship between diet and chronic diseases. Without valid dietary assessment methods, linking nutritional exposures to health outcomes becomes unreliable, potentially leading to spurious conclusions. A persistent challenge in nutritional science has been the inherent inaccuracy of self-reported dietary data, which relies on participants' memory, perception of portion sizes, and honesty in reporting. For decades, this problem plagued research, even leading to misconceptions such as the belief that individuals with obesity had low energy intakes, when the issue was actually systematic under-reporting of consumption.

The doubly labeled water (DLW) method has emerged as the reference standard for validating dietary assessment tools because it provides an objective measure of total energy expenditure (TEE). In weight-stable individuals, TEE equals EI, creating a robust benchmark against which self-reported intake can be compared. This guide examines how DLW validation has revealed systematic misreporting across different dietary assessment methods and age populations, providing researchers with crucial insights for interpreting nutritional data and designing future studies.

Understanding Doubly Labeled Water Methodology

Fundamental Principles and Physiological Basis

The doubly labeled water technique measures carbon dioxide production to calculate energy expenditure in free-living individuals over extended periods. The method is founded on the principle that when body weight and composition remain stable, total energy expenditure must equal energy intake. This makes it an ideal reference method for validating self-reported dietary intake without the biases inherent in self-report measures.

The DLW method involves orally administering a dose of water containing two stable isotopes: deuterium (²H) and oxygen-18 (¹⁸O). Deuterium leaves the body only as water, while oxygen-18 is eliminated as both water and carbon dioxide. The difference in elimination rates between the two isotopes therefore provides a measure of carbon dioxide production, which can be converted to energy expenditure using standard calorimetric equations. Measurement runs over 7-14 days in most validation studies, capturing typical variations in physical activity patterns.

Detailed Experimental Protocol

The standard DLW protocol involves precise measurements and careful sample handling:

Baseline urine sample collection prior to isotope administration to determine natural background levels of the isotopes.
Oral administration of a prepared dose of DLW based on body weight (typically 1.1 g per kg of body weight), containing precisely measured quantities of ²H₂O and H₂¹⁸O [11].
Post-dose urine sampling at predetermined intervals (typically at days 1, 2, 13, and 14 in a 14-day protocol) to track isotope elimination rates [11].
Isotopic analysis of urine samples using isotope ratio mass spectrometry to determine the enrichment of both isotopes in each sample.
Calculation of carbon dioxide production using the formula: rCO₂ (mol/day) = 0.4554 × TBW × (1.007ko - 1.041kh), where TBW is total body water, ko is the elimination rate of ¹⁸O, and kh is the elimination rate of ²H [11].
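Conversion to total energy expenditure using the modified Weir equation: TEE (kcal/day) = 22.4 × [3.9 × (rCO₂ / FQ) + 1.1 × rCO₂], where rCO₂ is the rate of carbon dioxide production in mol/day, 22.4 L/mol converts moles of gas to liters, and FQ is the food quotient, so that rCO₂ / FQ estimates oxygen consumption [11].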
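The dose and calculation steps above can be sketched as follows. This is a hedged illustration rather than the analysis code of any cited study. It uses the rCO₂ formula and the 1.1 g/kg dose as stated in this article (split into 1.03 g of H₂¹⁸O and 0.07 g of ²H₂O per kg), and applies Weir's equation in its usual form, which needs oxygen consumption as well as carbon dioxide production. Oxygen consumption is estimated here as rCO₂ divided by a food quotient. The food quotient of 0.85, the molar gas volume of 22.4 L/mol, total body water expressed in moles, and all function names are assumptions.

```python
def dlw_dose_grams(body_weight_kg, h2_18o_per_kg=1.03, d2o_per_kg=0.07):
    """Return (H2-18O grams, 2H2O grams) for a weight-based dose."""
    return body_weight_kg * h2_18o_per_kg, body_weight_kg * d2o_per_kg


def co2_production_mol_per_day(tbw_mol, k_o, k_h):
    """rCO2 from total body water (mol) and elimination rates (per day)."""
    return 0.4554 * tbw_mol * (1.007 * k_o - 1.041 * k_h)


def tee_kcal_per_day(rco2_mol_per_day, food_quotient=0.85,
                     litres_per_mol=22.4):
    """Weir-type conversion of CO2 production to energy expenditure."""
    vco2_litres = rco2_mol_per_day * litres_per_mol
    vo2_litres = vco2_litres / food_quotient  # assumed food quotient
    return 3.9 * vo2_litres + 1.1 * vco2_litres


if __name__ == "__main__":
    # Made-up example: 60 kg adult, ~40 kg body water (about 2220 mol).
    print(dlw_dose_grams(60.0))
    rco2 = co2_production_mol_per_day(tbw_mol=2220.0, k_o=0.12, k_h=0.10)
    print(f"rCO2 = {rco2:.1f} mol/day")
    print(f"TEE  = {tee_kcal_per_day(rco2):.0f} kcal/day")
```

Because TEE scales with the difference between two similar elimination rates, small analytical errors in either rate translate into proportionally larger errors in rCO₂, which is why high-precision isotope ratio mass spectrometry is required.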

[Figure: experimental workflow for DLW validation studies]

Comparative Performance of Dietary Assessment Methods Across Age Groups

Validation studies using DLW have consistently revealed that all self-reported dietary assessment methods exhibit some degree of misreporting, though the magnitude and direction vary considerably by method, age group, and population characteristics.

Quantitative Comparison of Method Accuracy

Table 4: Performance of Dietary Assessment Methods Validated by Doubly Labeled Water

Assessment Method	Population Age	Mean Bias (EI vs TEE)	Group-Level Validity	Individual-Level Validity	Key References
24-Hour Multiple Pass Recall	Children (5-7 years)	+250 kJ/d overestimation	Acceptable	Poor	[12]
24-Hour Multiple Pass Recall	Young Children (4-7 years)	No significant difference	Valid	Poor	[13]
24-Hour Recall	Adults (20-49 years)	-307.5 kcal/d underestimation	Questionable	Poor	[11]
Food Records	Children (1-18 years)	-262.9 kcal/d underestimation	Variable	Poor	[14]
Food Frequency Questionnaire	Children (1-18 years)	+44.5 kcal/d overestimation	Variable	Poor	[14]
Observer-recorded Food Records + Recall	Overweight/Obese Adults	-3.1% (women) to +3% (men)	Valid	Acceptable	[15]

Age-Specific Patterns in Reporting Accuracy

Children and Adolescents: In younger children (ages 4-7), the 24-hour multiple pass recall method shows a slight tendency toward overreporting, with one study finding a median overestimation of 250 kJ/d, though this was statistically significant only in girls [12]. The degree of inaccuracy appears to decrease as children age, with school-aged children showing less bias than preschool children in longitudinal assessments. A systematic review of 33 studies comparing dietary assessment methods with DLW in children aged 1-18 years found that food records significantly underestimated total energy intake (TEI) by a mean of 262.9 kcal/day, while food frequency questionnaires and 24-hour recalls showed no significant differences from DLW-estimated TEE at the group level [14].

Adults: In adult populations, underreporting becomes more pronounced. A 2022 study of Korean adults aged 20-49 years found that 24-hour diet recalls underestimated energy intake by 12.0% compared to TEE measured by DLW, with an underprediction rate of 60.5% across all subjects [11]. The degree of underreporting was similar in women (11.8%) and men (12.2%). A systematic review from 2019 encompassing 59 studies and 6,298 adults confirmed that the majority of dietary assessment methods demonstrate significant underreporting when compared to DLW-measured TEE [6].

Elderly Populations: Research on rural elderly populations (mean age 74 years) has shown that 3-day self-reported diet records consistently underestimated energy intake compared to TEE measured by DLW [16]. However, physical activity recall methods using age- and gender-specific estimates of resting metabolic rate accurately estimated TEE for this demographic group.

Detailed Experimental Protocols for Key Validation Studies

24-Hour Multiple Pass Recall in Children

The validation study of 24-hour multiple pass recall against DLW in sixty-three children (median age 6 years) followed a rigorous protocol:

DLW Protocol: Total energy expenditure was measured using the DLW method over a specified period. The DLW was prepared on a per kilogram of total body weight basis by combining 1.03 g of H₂¹⁸O (10% enriched) and 0.07 g of ²H₂O (99.9% enriched) [11].
Dietary Assessment: Energy intake was estimated using the standardized 24-hour multiple pass recall method, which employs a structured interview process to enhance recall accuracy.
Statistical Analysis: The agreement between TEE and EI was assessed using Bland-Altman analysis, which revealed a group bias of overestimation of EI by 250 kJ/d with wide limits of agreement (-2880 to 2380 kJ/d; the interval is centered on -250 kJ/d, consistent with differences computed as TEE minus EI), indicating poor accuracy at the individual level [12].
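The Bland-Altman step can be sketched as follows. The example computes the mean difference (group bias) and the 95% limits of agreement as bias ± 1.96 standard deviations of the paired differences. Differences are taken here as EI minus TEE, which is a choice made for this sketch; published analyses may use the opposite sign. The data and function name are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(energy_intake, energy_expenditure):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Differences are computed as intake minus expenditure, so a positive
    bias means reported intake exceeds measured expenditure on average.
    """
    diffs = [ei - tee for ei, tee in zip(energy_intake, energy_expenditure)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    ei = [8000, 9000, 7000, 8500]    # made-up values, kJ/day
    tee = [7500, 9500, 7200, 8000]
    bias, lower, upper = bland_altman(ei, tee)
    print(f"bias = {bias:.0f} kJ/d, limits = {lower:.0f} to {upper:.0f} kJ/d")
```

A small bias combined with wide limits is the pattern described above: the method can look acceptable for a group mean while individual estimates miss by thousands of kilojoules per day.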

Food Record Validation in Adult Populations

A comprehensive validation study in overweight and obese individuals implemented this protocol:

Study Population: 32 healthy women and 22 healthy men with mean BMIs of 29.5 and 30.3 kg/m², respectively.
Dietary Assessment Method: Combined observer-recorded weighed-food records for cafeteria meals with 24-hour snack recalls for foods consumed outside the cafeteria. This approach reduced reliance on participant memory and estimation skills.
Validation Period: 2-week assessment during which body weight was measured at beginning and end to confirm energy balance.
Results: The mean EI was 96.9% ± 17.0% and 103% ± 18.9% of measured TEE for women and men, respectively, with no significant weight changes, supporting the validity of this combined method for this population [15].

Research Reagent Solutions for DLW Studies

Table 5: Essential Research Reagents and Materials for DLW Validation Studies

Reagent/Material	Technical Specifications	Primary Function	Example Application
Deuterium Oxide (²H₂O)	99.9% isotopic enrichment	Stable isotope tracer for water turnover measurement	Labeled water component in DLW dose [11]
Oxygen-18 Water (H₂¹⁸O)	10% isotopic enrichment	Stable isotope tracer for both water and carbon dioxide turnover	Second labeled component in DLW dose [11]
Isotope Ratio Mass Spectrometer	High-precision analytical instrument	Measurement of isotope ratios in biological samples	Analysis of ²H and ¹⁸O enrichment in urine samples [11]
Urine Collection Vials	Cryogenic storage capabilities	Secure sample preservation during study period	Collection and storage of urine samples at predetermined intervals
Laboratory Information Management System	Specialized software for isotopic studies	Tracking and managing sample data throughout analysis	Maintaining chain of custody for numerous samples in large studies
Certified Reference Materials	Isotopic standards for calibration	Quality control and method validation	Ensuring analytical accuracy across multiple batches

Advancements and Alternative Approaches

Predictive Equations from Large-Scale DLW Data

The International Atomic Energy Agency Doubly Labeled Water Database has enabled the development of sophisticated predictive equations for TEE. Using 6,497 DLW measurements from individuals aged 4 to 96 years, researchers have derived a regression equation that predicts expected TEE from easily acquired variables such as body weight, age, and sex [17]. This approach provides 95% predictive limits that can screen for misreporting in dietary studies without requiring actual DLW measurement for every participant.

When applied to large datasets like the National Diet and Nutrition Survey and National Health and Nutrition Examination Survey, this equation identified misreporting in approximately 27.4% of dietary reports. Furthermore, the analysis revealed that macronutrient composition from dietary reports was systematically biased as the level of misreporting increased, potentially leading to spurious associations between diet components and body mass index [17].
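The screening idea can be sketched without reproducing the published regression equation, whose coefficients are not given in this article. The example assumes that each participant's 95% predictive limits for TEE have already been computed elsewhere and simply flags reported intakes that fall outside them. The field names, function name, and values are hypothetical.

```python
def screen_energy_reports(reports):
    """Flag reported intakes outside externally supplied predictive limits.

    reports  list of dicts with keys 'reported_ei', 'tee_lower', 'tee_upper'
             (all in the same energy units); the limits are assumed to come
             from a separately applied TEE prediction equation
    Returns (labels, percent_flagged).
    """
    labels = []
    for report in reports:
        if report["reported_ei"] < report["tee_lower"]:
            labels.append("under-report")
        elif report["reported_ei"] > report["tee_upper"]:
            labels.append("over-report")
        else:
            labels.append("plausible")
    flagged = sum(label != "plausible" for label in labels)
    return labels, 100.0 * flagged / len(labels)


if __name__ == "__main__":
    sample = [  # made-up values, kcal/day
        {"reported_ei": 1400, "tee_lower": 1900, "tee_upper": 3100},
        {"reported_ei": 2300, "tee_lower": 1800, "tee_upper": 2900},
        {"reported_ei": 2600, "tee_lower": 2000, "tee_upper": 3300},
        {"reported_ei": 2100, "tee_lower": 1700, "tee_upper": 2800},
    ]
    print(screen_energy_reports(sample))
```

Keeping the flag as a label instead of dropping records allows a later sensitivity analysis that compares results with and without the implausible reports.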

Mathematical Modeling as an Alternative to DLW

Recent research has validated mathematical methods that estimate long-term changes in free-living energy intake using only repeated body weight measurements and initial demographic information. In a study of 140 individuals over two years, this approach produced mean energy intake change values within 40 kcal/d of those obtained using the DLW method combined with DXA scans [18]. For individual subjects, the root mean square deviation between the model and DLW method was 215 kcal/d, making this a promising inexpensive alternative to resource-intensive DLW studies for certain research applications.
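The two agreement figures quoted above, a mean difference at the group level and a root mean square deviation at the individual level, can be computed as in this short sketch. The data are made up and the function names are hypothetical.

```python
import math
from statistics import mean


def mean_difference(model, reference):
    """Average of (model - reference): group-level agreement."""
    return mean(m - r for m, r in zip(model, reference))


def root_mean_square_deviation(model, reference):
    """Typical size of the individual-level disagreement."""
    squared = [(m - r) ** 2 for m, r in zip(model, reference)]
    return math.sqrt(mean(squared))


if __name__ == "__main__":
    model_change = [100.0, -200.0, 50.0]   # kcal/d, made-up values
    dlw_change = [150.0, -250.0, 0.0]
    print(mean_difference(model_change, dlw_change))
    print(root_mean_square_deviation(model_change, dlw_change))
```

Positive and negative individual errors cancel in the mean difference but not in the root mean square deviation, which is why a method can agree closely on average while individual estimates differ by a couple of hundred kilocalories per day.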

Implications for Research and Practice

The consistent finding across validation studies is that while some dietary assessment methods may provide reasonable estimates of energy intake at the group level, all self-report methods show poor accuracy at the individual level. This has crucial implications for both research and clinical practice:

Nutritional Epidemiology: Associations between self-reported dietary intake and health outcomes should be interpreted with caution, particularly for nutrients that correlate strongly with total energy intake.
Study Design: Researchers should consider incorporating objective measures like DLW or predictive equations in validation subsamples to quantify and adjust for misreporting biases.
Clinical Assessment: In clinical settings where precise energy intake measurement is crucial, methods with less reliance on participant memory and estimation (such as observer-recorded records) should be prioritized.
Method Selection: The appropriate dietary assessment method depends on the research question, population characteristics, and available resources, with the understanding that all self-report methods have limitations that must be acknowledged and addressed analytically.

The DLW method remains the gold standard for validating dietary assessment tools, and its application has fundamentally advanced our understanding of the limitations inherent in self-reported dietary data across age groups and populations.

Accurate dietary and memory recall is a cornerstone of nutritional epidemiology, cognitive health assessment, and clinical trials. However, the accuracy of self-reported information is not uniform across the population; it is significantly influenced by age-related changes in cognitive function, physiological processes, and lifestyle factors. Understanding these variations is critical for researchers and drug development professionals who rely on precise data, such as that from 24-hour dietary recalls (24HR) and memory tests, to draw valid conclusions about diet-disease relationships and cognitive health interventions. This guide objectively compares recall accuracy across different age groups by synthesizing current validation studies, providing detailed experimental protocols, and presenting quantitative data to inform methodological choices in research.

The following tables summarize key findings from recent studies on recall accuracy across different age groups and assessment domains.

Table 6: Age-Related Differences in 24-Hour Dietary Recall (24HR) Accuracy

Age Group	Study/Context	Recall Accuracy Metric	Key Finding	Reference
Older Adults (≥60 years)	Korean adults (mean 72.2 yrs) vs. weighed intake	Food item recall rate	Recalled 71.4% of foods consumed	[19]
		Portion size estimation	Overestimated by a mean ratio of 1.34	[19]
		Energy & nutrient intake	Differences vs. weighed intake were not statistically significant	[19]
		Sex difference	Women (75.6%) recalled more foods than men (65.2%)	[19]
Adults (General)	Japanese adults using Web-based 24HR	Correlation with Weighed Food Record (WFR)	Moderate correlation for energy/nutrients (Median r: Men=0.51, Women=0.38)	[20]
		Bias for most nutrients	Within ±10% of WFR	[20]

Table 7: Age-Related Differences in Cognitive Task Performance

Cognitive Domain	Age Group	Experimental Condition	Key Finding	Reference
Auditory Sentence Recall	Young (mean 21.15 yrs)	Time-compressed, dual sentences	Higher recall accuracy than older adults	[21]
	Older (mean 64.50 yrs)	Time-compressed, dual sentences	Significant decline in recall accuracy; most pronounced in this condition	[21]
Working Memory	Healthy Older Adults (mean 64.8 yrs)	After tPBM (Transcranial Photobiomodulation)	3-back task accuracy significantly improved	[22]
Verbal Memory	Older Adults (60-85 yrs)	After overnight olfactory enrichment	Verbal memory retention improved by 215%	[23]

Factors Influencing Recall Accuracy Across the Lifespan

The decline in recall accuracy with age is not monolithic but is influenced by a confluence of interacting factors, grouped below into cognitive, physiological, and lifestyle and methodological pathways.

[Figure: primary pathways through which age affects recall accuracy]

Cognitive Factors

Cognitive resources are fundamental for accurate encoding and retrieval of information. Age-related cognitive decline directly impacts the efficacy of this process.

Working Memory and Processing Speed: The auditory recall study [21] demonstrated that older adults struggle significantly more than younger adults when recalling time-compressed sentences in a dual-task setting. The researchers attributed this to age-related reductions in working memory capacity and processing speed, which are essential for holding and manipulating auditory information. The study found that older adults with higher baseline processing speed were somewhat protected against the negative effects of time compression.
Attentional Resources and Cognitive Load: The same study observed that as cognitive load increased (from single to dual sentences and from natural to compressed speech), older adults showed a disproportionate decline in recall accuracy [21]. This supports the framework that understanding speech and recalling it later is an effortful process that draws upon a finite pool of cognitive resources, which diminishes with age.
Memory Systems: Recall tasks engage different memory systems. Dietary recalls rely heavily on episodic memory (recalling specific events) and prospective memory (remembering to report everything) [24]. Aging is associated with a well-documented decline in episodic memory, which directly contributes to the omission of food items observed in dietary studies of older adults [19].

Physiological Factors

Biological aging, from sensory perception to cellular processes, forms the foundation upon which cognitive functions operate.

Sensory Acuity: The auditory study [21] was conducted with participants having normal or near-normal hearing thresholds, yet age-related performance declines were still stark. This suggests that even mild, subclinical hearing loss, which is prevalent in aging, can compound cognitive load. When listening effort increases to decode degraded auditory signals, fewer cognitive resources are available for encoding and recall.
Brain Structure and Function: Neuroimaging from the auditory study revealed that younger adults robustly activated the premotor cortex during listening tasks, an area critical for speech perception and planning [21]. Older adults, in contrast, showed significantly weaker activation in this region, indicating a fundamental shift in the neural mechanisms supporting recall. Furthermore, a new biomarker of brain aging, DunedinPACNI, is derived from measures of brain structure on MRI scans, and higher scores are linked to faster cognitive decline, reduced physical capabilities, and higher risk for dementia and chronic disease [25].
Pace of Biological Aging: The DunedinPACNI research [25] demonstrates that the speed of biological aging varies between individuals. This "Pace of Aging" is a multisystem physiological process that predicts cognitive and physical decline, providing a biological basis for the variability in recall accuracy observed among chronologically similar older adults.

Lifestyle and Methodological Factors

External circumstances and the way recall is assessed also play a critical role in accuracy.

Socioeconomic and Educational Status: The dietary study in Niger highlighted how food insecurity and lack of access to diverse foods can influence dietary patterns and, by extension, the complexity of conducting accurate dietary recalls [26]. Higher education levels are generally associated with better cognitive performance and have been investigated as a factor that might improve dietary reporting accuracy, though findings are mixed [19].
Dietary Complexity: The validation of the Japanese web-based 24HR system noted the particular challenge of assessing intake in populations that consume many mixed dishes [20]. Capturing these dishes requires sophisticated recipe databases, and inaccurate descriptions of complex meals are a potential source of error. This error may disproportionately affect older adults, who may be less familiar with the digital interfaces used to enter such detail.
Interview Mode and Retention Interval: A study on progressive recall methods found that shortening the time between eating and reporting (retention interval) led to a significantly higher number of foods being reported for evening meals [27]. This suggests that memory decay is a major source of error in traditional 24HR, a factor that likely has a greater impact on older adults with declining memory function.

Detailed Experimental Protocols in Recall Validation

To critically appraise the data on recall accuracy, it is essential to understand the methodologies used in the validation studies cited.

Protocol 1: Validation of 24HR Against Weighed Food Records in Older Adults

This protocol is adapted from the study by Mun et al. (2025) [19].

Objective: To assess the validity of 24HRs in free-living older Korean adults by comparison with weighed food intakes.
Study Design: A one-day feeding study with a subsequent 24HR interview.
Participants: 119 adults aged 60 years and older (mean age 72.2 ± 8.0 years).
Methodology:
- Weighed Food Intake (Reference Method): Participants consumed three self-served meals in a controlled setting. The weight of all foods served and the leftovers was discreetly measured to calculate the exact weight of food consumed.
- 24-Hour Recall (Test Method): On the day following the feeding study, a detailed 24HR interview was conducted by trained personnel. Interviews were performed either in person or via an online video call.
- Data Analysis:
  - Food Item Matching: The proportion of foods consumed that were correctly reported (matches), omitted (exclusions), and incorrectly reported (intrusions) was calculated.
  - Portion Size and Nutrient Analysis: The ratio of reported to weighed portion sizes was computed. Energy and nutrient intakes from the 24HR were compared to those from the weighed records using paired t-tests or non-parametric equivalents. Linear regression was used to examine associations with participant characteristics (sex, age, BMI, education, interview mode).
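
The food item matching and portion size steps above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in [19]: the function names, the exact-name matching rule, the choice of denominators, and the example foods are all assumptions made for clarity. Real studies typically apply coding rules to decide when a reported item counts as a match.

```python
def classify_food_items(consumed, reported):
    """Split food items into matches, omissions (exclusions), and intrusions."""
    consumed_set, reported_set = set(consumed), set(reported)
    matches = consumed_set & reported_set      # eaten and reported
    omissions = consumed_set - reported_set    # eaten but not reported
    intrusions = reported_set - consumed_set   # reported but not eaten
    return matches, omissions, intrusions


def matching_rates(consumed, reported):
    """Match and omission rates per consumed item; intrusion rate per reported item.

    The denominators are an assumption; studies differ in how they define them.
    """
    matches, omissions, intrusions = classify_food_items(consumed, reported)
    n_consumed, n_reported = len(set(consumed)), len(set(reported))
    if n_consumed == 0:
        raise ValueError("at least one consumed item is required")
    return {
        "match_rate": len(matches) / n_consumed,
        "omission_rate": len(omissions) / n_consumed,
        "intrusion_rate": len(intrusions) / n_reported if n_reported else 0.0,
    }


def portion_ratio(reported_grams, weighed_grams):
    """Ratio of reported to weighed portion size (>1 means overestimation)."""
    if weighed_grams <= 0:
        raise ValueError("weighed portion must be positive")
    return reported_grams / weighed_grams


# Hypothetical single-participant example (not data from the study).
consumed = ["rice", "kimchi", "soup", "fish", "tea"]
reported = ["rice", "kimchi", "soup", "juice"]
rates = matching_rates(consumed, reported)
print(rates)
print(portion_ratio(201.0, 150.0))
```

In this toy example, three of five consumed items are matched, two are omitted, and one of four reported items is an intrusion.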

Protocol 2: Assessing Auditory Sentence Recall Under Cognitive Load

This protocol is adapted from Sinfield et al. (2025), as reported in [21].

Objective: To investigate how sentence quantity and speech rate affect sentence recall accuracy and neural activity in young and older adults.
Study Design: A multi-modal, within-subjects laboratory study.
Participants: 40 participants with normal hearing (20 young adults, mean 21.15 years; 20 older adults, mean 64.50 years).
Methodology:
- Experimental Tasks: Participants completed a delayed sentence recall task under four conditions, crossing sentence number (single vs. dual) and speech rate (natural vs. 50% time-compressed).
- Behavioral Measures: Recall accuracy was scored based on the correct repetition of the sentences.
- Neurophysiological Recording: Systemic physiology augmented functional near-infrared spectroscopy (SPA-fNIRS) was used to monitor hemodynamic responses in brain regions including the left inferior frontal gyrus, premotor cortex, and auditory cortex. Galvanic Skin Response (GSR) was measured to assess physiological arousal.
- Subjective and Cognitive Measures: Participants rated subjective workload using the NASA-TLX questionnaire. Cognitive abilities were assessed using working memory and processing speed subtests from the Wechsler Adult Intelligence Scale (WAIS).
- Data Analysis: Linear mixed models were used to assess the effects of age, sentence number, and speech rate on recall accuracy and brain activation.

The Scientist's Toolkit: Key Research Reagents and Materials

Table 3: Essential Materials for Recall Validation Research

Item	Function in Research	Specific Example from Literature
Weighed Food Records (WFR)	The gold standard reference method for validating dietary assessment tools. Participants weigh and record all consumed foods.	Used as the validation standard against the Web24HR in Japanese adults [20] and against the 24HR in older Korean adults [19].
Automated Self-Administered 24HR (ASA24)	A web-based, automated system for self-administered 24-hour dietary recalls, improving scalability and reducing cost.	One of three technology-assisted 24HR methods evaluated for accuracy and cost-effectiveness in a controlled feeding study protocol [28].
Functional Near-Infrared Spectroscopy (fNIRS)	A non-invasive neuroimaging technique using light to measure cortical hemodynamic responses, suitable for complex task environments.	Used to measure brain activation in the premotor cortex and other areas during auditory sentence recall tasks [21]. SPA-fNIRS was the specific variant used.
DunedinPACNI Algorithm	An open-source algorithm that calculates a biological brain age index from a single T1-weighted MRI scan, predicting cognitive decline and health outcomes.	Used to quantify individual differences in the pace of brain aging and link it to future cognitive and physical health [25].
Controlled Feeding Study	A study design where researchers provide all meals and unobtrusively weigh intake, creating a "true" value for comparison with recall methods.	The core design for validating the accuracy of ASA24, Intake24, and mFR24 dietary tools [28] and for validating 24HR in older adults [19].
n-back Task	A cognitive test used to assess working memory capacity. Participants indicate when the current stimulus matches one presented "n" steps back.	Used to measure working memory performance in healthy older adults before and after transcranial photobiomodulation (tPBM) intervention [22].
Multiple-Pass 24-Hour Recall Protocol	A structured interview technique involving multiple passes (quick list, forgotten foods, time/occasion, detail, final review) to enhance recall completeness.	The foundational method automated by systems like Intake24 and ASA24 [28] [27].

The evidence clearly demonstrates that age is a critical determinant of recall accuracy in both dietary and cognitive domains. Older adults consistently show higher rates of omission, greater portion size miscalibration, and increased vulnerability to high cognitive load conditions. These deficits are rooted in a combination of declining cognitive resources, age-related physiological changes in sensory and brain systems, and methodological challenges. For researchers and drug development professionals, these findings underscore the necessity of adopting age-sensitive methodologies. This includes using validated, technology-assisted tools, considering shorter retention intervals, accounting for sensory deficits, and potentially incorporating biomarkers of biological aging into study designs. Acknowledging and adjusting for these age-related influences is not merely a methodological refinement but a fundamental requirement for generating reliable and valid data in aging populations.

In nutritional epidemiology and public health research, self-reported dietary data serve as a cornerstone for understanding the relationship between diet and health outcomes across populations. The 24-hour dietary recall (24HR) is one of the most widely used methods for capturing individual dietary intake, where participants report all foods and beverages consumed over the previous 24 hours. However, these self-reported data are susceptible to various forms of measurement error that can substantially impact the validity of research findings and subsequent public health recommendations.

Understanding the nature and magnitude of these errors, which are categorized primarily as systematic error (bias) or random error, is crucial for interpreting study results accurately and designing robust research methodologies. This is particularly relevant when comparing validation studies across different demographic groups, such as children, adolescents, adults, and elderly populations, who may exhibit distinct reporting patterns and capabilities. This article examines the spectrum of error in self-reported dietary data, with a specific focus on 24-hour recall methodologies across different age groups, providing researchers with evidence-based strategies to identify, quantify, and mitigate these errors in nutritional research.

Theoretical Framework: Systematic vs. Random Error

Measurement error in self-reported data fundamentally falls into two distinct categories, each with different implications for research validity and statistical inference.

Systematic error, also known as bias, represents a consistent distortion in measurement that deviates from the true value in a predictable direction [29] [30]. In dietary assessment, this manifests as consistent over-reporting or under-reporting of food intake. Systematic error cannot be reduced simply by increasing sample size or repeating measurements, as it stems from inherent flaws in the measurement process itself [29]. Key subtypes of systematic error include:

Intake-related bias: The "flattened-slope" phenomenon where individuals with high true intake tend to under-report, while those with low true intake tend to over-report [29].
Person-specific bias: Systematic reporting patterns related to individual characteristics such as body image concerns, social desirability, or cognitive factors [29].

Random error (within-person random error) represents unpredictable fluctuations in measurements that occur from one administration to another [29]. In dietary assessment, this includes day-to-day variation in actual food intake as well as incidental errors in reporting [29]. Unlike systematic error, random error does not consistently push measurements in one direction but creates noise in the data. While it does not inherently bias mean estimates in large samples, it reduces precision and increases variance, potentially obscuring true relationships between variables [30].

Table 1: Fundamental Differences Between Systematic and Random Error

Characteristic	Systematic Error (Bias)	Random Error
Direction	Consistent deviation in one direction	Unpredictable fluctuations in both directions
Effect on results	Reduces accuracy	Reduces precision
Statistical impact	Biased mean estimates	Increased variance
Reduction through repetition	Not reduced by repeated measures	Reduced by repeated measures
Primary sources in dietary recall	Social desirability, memory bias, body image concerns	Day-to-day intake variation, incidental reporting mistakes
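
The contrast in the table can be illustrated with a small simulation. The sketch below assumes a simple additive model (reported intake = true intake + constant bias + Gaussian noise), which is a common simplification rather than the model used in any cited study. The true intake, bias, noise level, and function names are hypothetical.

```python
import random
import statistics


def mean_of_recalls(true_intake, bias, sd_random, n_recalls, rng):
    """Average of n_recalls reports, each = true intake + bias + random noise."""
    reports = [true_intake + bias + rng.gauss(0, sd_random) for _ in range(n_recalls)]
    return statistics.fmean(reports)


def summarize(n_recalls, n_people=2000, seed=1):
    """Return (mean error, between-person spread) of the averaged reports."""
    rng = random.Random(seed)
    true_intake = 2000.0   # kcal/day, hypothetical and identical for everyone
    bias = -200.0          # systematic under-reporting, hypothetical
    sd_random = 500.0      # within-person random error, hypothetical
    means = [
        mean_of_recalls(true_intake, bias, sd_random, n_recalls, rng)
        for _ in range(n_people)
    ]
    return statistics.fmean(means) - true_intake, statistics.stdev(means)


err_1, spread_1 = summarize(n_recalls=1)
err_3, spread_3 = summarize(n_recalls=3)
print(f"1 recall : mean error {err_1:.0f} kcal, spread {spread_1:.0f} kcal")
print(f"3 recalls: mean error {err_3:.0f} kcal, spread {spread_3:.0f} kcal")
```

Under these assumptions, averaging three recalls narrows the spread by roughly a factor of the square root of three, while the mean error stays near the built-in bias of -200 kcal. That is the pattern the table describes: repetition addresses random error but not systematic error.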

Visualizing Error Types and Their Impacts

The following diagram illustrates how systematic and random errors affect dietary data collection and analysis across different stages of the research process:

Figure 1: Pathways through which systematic and random errors enter and affect the dietary research process. Systematic errors (red) consistently distort reported intake, while random errors (blue) introduce variability into statistical analysis.

Case Study: Validation of Web-Based 24-Hour Recalls in Adolescents

A recent study investigating the relative validity of a web-based self-administered 24-hour dietary recall (R24W) among active adolescents provides compelling evidence of both systematic and random errors in self-reported dietary data [31]. The study compared the R24W against a traditional interviewer-administered 24-hour recall as the reference method in a sample of 272 French-speaking adolescents aged 12-17 years from Québec.

Experimental Protocol

The validation study employed a comparative design where participants completed both assessment methods:

Self-administered web-based recall (R24W): Participants completed three recalls using the web-based platform over one month, guided by a multiple-pass approach with portion size images [31].
Interviewer-administered recall: Registered dietitians conducted one interview using the USDA Automated Multiple-Pass Method (AMPM) with portion estimation aids [31].
Statistical analysis: Researchers used paired t-tests, correlation analysis, cross-classification, weighted Kappa, and Bland-Altman plots to assess agreement between methods [31].
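
Of the agreement statistics listed above, the Bland-Altman analysis is perhaps the easiest to sketch. The following is a minimal illustration with hypothetical energy intakes, not the study's code or data. It computes the mean difference, the conventional 95% limits of agreement (mean ± 1.96 SD of the differences), and the slope of the differences regressed on the pairwise means as a simple indicator of proportional bias.

```python
import statistics


def bland_altman(test_values, reference_values):
    """Mean difference and 95% limits of agreement between two methods."""
    if len(test_values) != len(reference_values) or len(test_values) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [t - r for t, r in zip(test_values, reference_values)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    return {
        "mean_difference": mean_diff,
        "lower_loa": mean_diff - 1.96 * sd_diff,
        "upper_loa": mean_diff + 1.96 * sd_diff,
    }


def proportional_bias_slope(test_values, reference_values):
    """Least-squares slope of (test - reference) on the mean of the two methods."""
    diffs = [t - r for t, r in zip(test_values, reference_values)]
    means = [(t + r) / 2 for t, r in zip(test_values, reference_values)]
    mx, my = statistics.fmean(means), statistics.fmean(diffs)
    sxx = sum((x - mx) ** 2 for x in means)
    if sxx == 0:
        raise ValueError("pairwise means are all identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, diffs))
    return sxy / sxx


# Hypothetical energy intakes (kcal) for five participants.
web_recall = [2600, 2100, 3000, 1800, 2500]
interview = [2400, 2000, 2700, 1900, 2300]
agreement = bland_altman(web_recall, interview)
slope = proportional_bias_slope(web_recall, interview)
print(agreement)
print(slope)
```

A slope clearly different from zero would suggest that the gap between methods changes with intake level; a formal analysis would also test that slope for statistical significance.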

Key Quantitative Findings

Table 2: Validation Results of Web-Based vs. Interviewer-Administered 24-Hour Recalls in Adolescents [31]

Metric	Web-Based R24W	Interviewer-Administered	Difference (%)	Statistical Significance
Energy intake (mean kcal)	2558 ± 1128	2444 ± 998	+8.8%	p < 0.05
Saturated fat intake	-	-	+25.2%	p < 0.001
% Energy from fat	-	-	+6.5%	p < 0.05
Correlation coefficients (nutrients)	Range: 0.24 to 0.52	-	-	p < 0.01 for most nutrients
Cross-classification same quartile	36.6%	-	-	-
Cross-classification adjacent quartile	39.6%	-	-	-
Cross-classification misclassified	5.7%	-	-	-

The findings revealed significant systematic error, with the web-based tool consistently yielding higher estimates for energy and most nutrients compared to the interviewer-administered recall [31]. The proportional bias observed in Bland-Altman plots for 7 out of 25 nutrients further confirms the presence of systematic error that varies by intake level [31]. The study also demonstrated that completing at least two recalls with the R24W increased precision, highlighting a strategy to mitigate random error [31].
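
The quartile cross-classification reported in Table 2 can be sketched as follows. This is an illustrative implementation under stated assumptions: quartiles are assigned by rank within each method, ties are broken by input order, and the "opposite" category (lowest versus highest quartile) is assumed to correspond to what the table calls misclassified. The data are hypothetical.

```python
def quartile_ranks(values):
    """Assign each value to a quartile (0-3) by rank; ties broken by input order."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values to form quartiles")
    order = sorted(range(n), key=lambda i: values[i])
    quartiles = [0] * n
    for rank, idx in enumerate(order):
        quartiles[idx] = min(3, rank * 4 // n)
    return quartiles


def cross_classify(test_values, reference_values):
    """Share of participants in the same, adjacent, two-apart, or opposite quartile."""
    if len(test_values) != len(reference_values):
        raise ValueError("series must have equal length")
    q_test = quartile_ranks(test_values)
    q_ref = quartile_ranks(reference_values)
    gaps = [abs(a - b) for a, b in zip(q_test, q_ref)]
    n = len(gaps)
    labels = ["same", "adjacent", "two_apart", "opposite"]
    return {label: gaps.count(k) / n for k, label in enumerate(labels)}


# Hypothetical intakes for eight participants under two methods.
method_a = [1, 2, 3, 4, 5, 6, 7, 8]
method_b = [2, 1, 4, 3, 8, 7, 6, 5]
shares = cross_classify(method_a, method_b)
print(shares)
```

In this toy example, half of the participants fall in the same quartile under both methods and half fall in an adjacent quartile.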

Age-Specific Considerations in 24-Hour Recall Validation

The manifestation and magnitude of measurement errors in self-reported dietary data vary considerably across different age groups, necessitating age-appropriate validation and adjustment approaches.

Adolescent Populations

Adolescents present unique challenges for dietary assessment due to ongoing cognitive development, body image concerns, and irregular eating patterns. The R24W validation study demonstrated that web-based tools can achieve acceptable relative validity in this population, with correlation coefficients for nutrients ranging from 0.24 to 0.52 [31]. However, the consistent over-reporting observed (8.8% for energy intake) suggests the presence of systematic error, potentially linked to portion size estimation challenges or social desirability biases in this age group [31].

Broader Age Group Comparisons

Research indicates that the number of 24-hour recalls significantly impacts random error reduction across all age groups. A study in an urban Mexican population found that three 24-hour recalls substantially improved estimates of energy and nutrient intakes compared to single recalls, with particularly dramatic differences in prevalence of inadequacy estimates for nutrients like folate and calcium [10]. For example, in preschool children, the estimated prevalence of inadequacy for folate decreased from 30% with 1-day recall to 3.7% with 3-day recalls [10].

Table 3: Impact of Multiple 24-Hour Recalls on Random Error Reduction Across Age Groups

Age Group	Nutrient	Prevalence of Inadequacy (1-day)	Prevalence of Inadequacy (3-day)	Reduction in Misclassification
Preschool children	Folate	30%	3.7%	87.7%
Preschool children	Calcium	43%	4.6%	89.3%
Various age/sex groups	Fiber	73-99%	Improved estimation	Reduced variance
Various age/sex groups	Iron	31-94%	Improved estimation	Reduced variance
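
The values in the final column for preschool children are consistent with a simple relative reduction, (1-day prevalence − 3-day prevalence) / 1-day prevalence. The short sketch below reproduces that arithmetic. The function name is illustrative, and treating the column as a relative reduction is an inference from the numbers rather than something stated in [10].

```python
def relative_reduction_pct(one_day_pct, three_day_pct):
    """Relative drop in estimated prevalence of inadequacy, as a percentage."""
    if one_day_pct <= 0:
        raise ValueError("1-day prevalence must be positive")
    return (one_day_pct - three_day_pct) / one_day_pct * 100


# Preschool children, values taken from the table above.
folate = relative_reduction_pct(30.0, 3.7)
calcium = relative_reduction_pct(43.0, 4.6)
print(round(folate, 1), round(calcium, 1))
```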

Methodological Strategies for Error Mitigation

Reducing Random Error

Multiple Recall Administrations: Collecting at least two to three 24-hour recalls per participant significantly reduces the impact of day-to-day variation and improves estimation of usual intake [31] [10]. Statistical modeling approaches (e.g., PC-SIDE software) can further adjust for within-person variation [10].
Large Sample Sizes: Increasing sample size helps counterbalance random error through the averaging effect across a larger population [30].
Standardized Protocols: Implementing controlled experimental procedures and standardized training for researchers reduces incidental measurement variability [30].

Addressing Systematic Error

Instrument Calibration: Regular validation against reference methods (e.g., recovery biomarkers, interviewer-administered recalls) helps identify and correct for consistent biases [31] [30].
Triangulation: Using multiple assessment methods (e.g., recalls, food records, biomarkers) provides complementary data streams to identify and adjust for systematic biases [30].
Age-Appropriate Tools: Developing and validating assessment instruments specifically for target age groups, such as the R24W for adolescents, improves contextual appropriateness and reduces systematic reporting biases [31].

Essential Research Reagent Solutions

Table 4: Key Methodological Tools for Dietary Recall Validation Studies

Research Tool	Primary Function	Application in Error Reduction
Web-Based 24HR Platforms (ASA24, R24W)	Self-administered dietary data collection	Standardizes data collection to reduce random error; incorporates multiple-pass approach to reduce systematic recall bias [31] [7]
Statistical Modeling Software (PC-SIDE)	Adjustment for day-to-day variation in intake	Corrects for random error in estimates of usual intake distribution [10]
Multiple-Pass Method (AMPM)	Structured interview technique for dietary recalls	Reduces systematic recall bias through progressive probing [31]
Portion Size Visualization Aids	Image-assisted food quantity estimation	Minimizes systematic error in portion size reporting [31]
Biomarker Validation	Objective verification of nutrient intake	Identifies and quantifies systematic reporting biases [32]

The validation of self-reported dietary data requires careful consideration of both systematic and random errors across different population groups. Evidence from recent studies demonstrates that while web-based 24-hour recalls show acceptable validity for use with adolescents, they still exhibit significant systematic biases in nutrient estimation [31]. The number of recall days substantially impacts random error reduction, with multiple administrations (2-3 days) providing markedly improved estimates of usual intake compared to single recalls [10].

Future research should continue to develop and validate age-appropriate assessment tools that minimize both types of error through improved interface design, enhanced portion size estimation aids, and statistical adjustment procedures. Particular attention should be paid to how systematic biases may vary across developmental stages, cultural contexts, and socioeconomic groups to ensure valid dietary assessment across diverse populations.

Implementing and Adapting 24-Hour Recall Protocols for Different Age Cohorts

The Automated Multiple-Pass Method (AMPM) represents the current gold standard for collecting 24-hour dietary recalls in large-scale nutritional studies. Developed by the United States Department of Agriculture (USDA), this computerized, interviewer-administered method employs a structured 5-step approach designed to enhance complete and accurate food recall while reducing respondent burden [33]. The method's robustness stems from its systematic approach to mitigating recall errors through multiple cognitive passes that stimulate memory.

Within nutritional epidemiology, accurate dietary assessment is fundamental for investigating diet-disease relationships, yet traditional methods often suffer from significant measurement error. The AMPM framework directly addresses these limitations through its standardized protocol, which has been validated across diverse populations [34] [35]. As research increasingly focuses on life-stage nutritional requirements, understanding how the AMPM performs across different age groups becomes crucial for interpreting dietary data and designing age-appropriate interventions.

Experimental Protocols and Methodological Framework

The AMPM Core Structure

The AMPM utilizes five distinct passes to comprehensively capture dietary intake:

Quick List: Respondents freely list all foods and beverages consumed the previous day without interviewer prompting.
Forgotten Foods: Interviewers probe for commonly omitted items (e.g., sweets, beverages, snacks).
Time and Occasion: Temporal organization of eating occasions to create a chronological framework.
Detail Cycle: Thorough probing for food descriptions, preparation methods, and portion sizes using standardized measurement aids.
Final Review: Opportunity for respondents to confirm completeness and accuracy of the reported information [33].

This multi-pass structure strategically addresses different cognitive processes to enhance memory retrieval, making it particularly valuable for populations with potential recall challenges, including older adults [5].

Validation Study Designs

The validity of the AMPM has been rigorously tested through various experimental designs:

Doubly Labeled Water (DLW) Validation: In a seminal study with 524 volunteers aged 30-69, energy intake collected via AMPM was compared against total energy expenditure measured using the doubly labeled water technique. This biomarker-based approach provides an objective measure of energy reporting accuracy [35].

Controlled Feeding Studies: Researchers discreetly weighed all food consumed by participants (n=119 older Korean adults) and compared these values with AMPM-derived recalls conducted the following day. This design provides a direct measure of reporting accuracy for specific foods and nutrients [5].

Cross-Method Comparisons: Large-scale field trials (n=1,081) have compared the AMPM with its self-administered counterpart, the Automated Self-Administered 24-Hour Recall (ASA24), to evaluate equivalence in reported intakes across different administration modes [36].
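
For the doubly labeled water design, the core comparison is between reported energy intake (EI) and measured total energy expenditure (TEE). The sketch below shows one way to express that comparison, assuming weight-stable participants so that TEE approximates true intake. The function names, the example values, and in particular the ±25% cutoff for an "acceptable" reporter are hypothetical placeholders. The cited study's actual cutoffs are not given here and should be taken from [35].

```python
def energy_reporting_difference_pct(reported_ei_kcal, tee_kcal):
    """Percentage difference of reported EI from TEE (negative = under-reporting)."""
    if tee_kcal <= 0:
        raise ValueError("TEE must be positive")
    return (reported_ei_kcal - tee_kcal) / tee_kcal * 100


def classify_reporter(reported_ei_kcal, tee_kcal, lower_ratio, upper_ratio):
    """Label a participant by EI:TEE ratio against caller-supplied cutoffs."""
    ratio = reported_ei_kcal / tee_kcal
    if ratio < lower_ratio:
        return "under-reporter"
    if ratio > upper_ratio:
        return "over-reporter"
    return "acceptable reporter"


# Hypothetical participant: reports 2225 kcal, DLW-measured TEE is 2500 kcal.
diff = energy_reporting_difference_pct(2225.0, 2500.0)
# Cutoffs of 0.75 and 1.25 are illustrative only, not those used in [35].
label = classify_reporter(2225.0, 2500.0, lower_ratio=0.75, upper_ratio=1.25)
print(f"{diff:.1f}% -> {label}")
```

Keeping the cutoffs as explicit arguments makes it harder to apply an unstated threshold by accident, which matters because the share of "acceptable reporters" depends directly on where those bounds are set.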

The following diagram illustrates the typical AMPM validation workflow against objective reference measures:

Figure 1: AMPM Validation Study Workflow

Performance Data Across Population Subgroups

Quantitative Validation Metrics

The AMPM's performance varies across demographic groups, with particular implications for research involving different age cohorts. The following table synthesizes key validation metrics from multiple studies:

Table 1: AMPM Validation Metrics Across Population Subgroups

Population	Sample Size	Reference Method	Energy Reporting Difference	Food Item Recall Accuracy	Key Findings
Adults (30-69 years) [35]	524 (50% female)	Doubly Labeled Water	Overall: -11% underreporting; Normal BMI: -3% underreporting; Obese: Highest underreporting	Not specified	78% of men, 74% of women classified as "acceptable reporters"
Older Korean Adults (≥60 years) [5]	119 (60% female)	Weighed Food Intake	Non-significant difference	71.4% of foods recalled; Women: 75.6%; Men: 65.2%	Significant portion size overestimation (mean ratio: 1.34)
Obese Women [34]	49 (BMI 30-45)	Controlled Feeding	Non-significant difference	Not specified	More accurate than normal-weight and overweight women
Normal-Weight & Overweight Women [34]	49 (BMI 20-29.9)	Controlled Feeding	Significant overestimation: 8-10% for energy and carbohydrates	Not specified	Protein intake also significantly overestimated

Age-Specific Considerations

Research specifically examining older adult populations reveals distinctive patterns in AMPM performance. In a study of free-living older Korean adults (mean age 72.2±8.0 years), participants recalled approximately 71.4% of foods consumed but demonstrated significant overestimation of portion sizes (mean ratio: 1.34) [5]. This discrepancy between food item recall and portion estimation highlights the complex nature of memory-related challenges in older populations.

Sex differences in reporting accuracy were particularly pronounced among older adults, with women recalling 75.6% of consumed foods compared to 65.2% in men (P=0.0001) [5]. This substantial gap suggests that age-related validation studies must consider sex as a critical effect modifier when interpreting dietary data.

Interestingly, despite these challenges in food item enumeration and portion estimation, energy and macronutrient intake estimates in older adults were generally accurate compared to weighed intakes, with no statistically significant differences [5]. This apparent paradox may reflect offsetting errors: omitted food items lower estimated intake while overestimated portions (mean ratio: 1.34) raise it, so group-level nutrient estimates can remain close to weighed values even when item-level reporting is inaccurate.

Technological Evolution: From Interviewer-Administered to Automated Systems

The ASA24 Adaptation

The National Cancer Institute (NCI) adapted the AMPM methodology to create the Automated Self-Administered 24-Hour Recall (ASA24), a web-based, self-administered system that maintains the core multiple-pass structure while eliminating the need for interviewer administration [7]. This technological evolution has significant implications for large-scale studies across diverse age groups, particularly given different technological literacy levels.

Comparative studies demonstrate strong equivalence between AMPM and ASA24. In the Food Reporting Comparison Study (FORCS; n=1,081), 87% of 20 analyzed nutrients and food groups were statistically equivalent at a 20% bound [36]. The proportions reporting supplement use were equivalent (ASA24: 46% vs. AMPM: 43%), with only minor subgroup variations [37]. Participant preference data revealed a strong inclination toward the self-administered system, with 70% preferring ASA24 over the interviewer-administered AMPM [36].

Administration Mode Considerations Across Age Groups

The table below compares key methodological considerations between AMPM and ASA24 across different administration contexts:

Table 2: Methodological Comparison Across Administration Modes

Characteristic	AMPM (Interviewer-Administered)	ASA24 (Self-Administered)
Staff Requirements	Requires trained interviewers	Automated administration
Cost Structure	Higher personnel costs	Lower marginal cost per recall
Technological Barriers	Minimal for participants	Requires computer/smartphone access and digital literacy
Participant Preference	30% preferred in FORCS trial [36]	70% preferred in FORCS trial [36]
Older Adult Suitability	Potentially better for those with limited tech experience	May present barriers for some older populations
Data Collection Context	Originally in-person or telephone [33]	Online platform, accessible 24/7 [7]
Supplement Reporting	43% reported use [37]	46% reported use [37]

Recent adaptations have explored hybrid models, particularly relevant for older adult populations. The validation of online video call administration of 24-hour recalls among older Korean adults found few significant differences in accuracy compared to in-person interviews [5], suggesting promising alternatives for balancing the benefits of interviewer assistance with the practical advantages of remote data collection.

Research Reagent Solutions

Successful implementation of the Multiple-Pass Method in validation studies requires specific materials and technical components:

Table 3: Essential Research Reagents for Dietary Recall Validation

Research Reagent	Function/Application	Example Implementation
Doubly Labeled Water (DLW)	Objective biomarker for total energy expenditure validation	Compared with AMPM-derived energy intake in 524 adults [35]
Standardized Portion Size Aids	Enhanced visual estimation of food amounts	Food models, measuring cups, rulers mailed to participants [36]
Weighed Food Protocol	Direct measurement of actual consumption for validation	Discreetly weighed self-served meals in older Korean adults [5]
Food and Nutrient Databases	Standardized nutrient composition analysis	USDA Food and Nutrient Database for Dietary Studies (FNDDS) [36]
ASA24 Web Platform	Self-administered 24-hour recall based on AMPM	Used in 1,265+ studies collecting ~173,000 recall days [36] [7]

The Multiple-Pass Method, particularly in its AMPM implementation, provides a validated, standardized approach for collecting 24-hour dietary recalls that demonstrates robust performance across diverse populations. However, the method's effectiveness varies across age groups and demographic characteristics, with older adults showing distinct patterns of food item recall accuracy and portion size estimation errors.

The ongoing evolution from interviewer-administered to automated systems like ASA24 presents both opportunities and challenges for dietary assessment across the lifespan. While these technological advancements offer scalability and participant preference advantages, they must be carefully evaluated for use with older populations who may face technological barriers. Future methodological research should continue to refine these approaches to address age-specific cognitive and behavioral factors in dietary recall, particularly as nutritional epidemiology increasingly focuses on life-stage specific dietary patterns and their health implications.

Accurately measuring dietary intake and health-related quality of life in children presents unique methodological challenges due to their evolving cognitive abilities. This guide compares the performance of parent-proxy reporting against child self-reporting and examines how memory limitations impact data validity across different age groups. Evidence indicates that while proxy reporting is necessary for young children, its concordance with child reports varies significantly based on the child's age, the domain being assessed, and the quality of the parent-child relationship. Similarly, dietary recall accuracy in children is substantially influenced by cognitive demands and retention intervals, with web-based tools offering promising but imperfect solutions.

Quantitative Comparison of Assessment Method Accuracy

Table 1: Performance Metrics of Dietary Assessment Methods in Children

Assessment Method	Population	Match Rate vs. Observed	Omission Rate vs. Observed	Intrusion Rate vs. Observed	Key Limitation
ASA24-Kids-2012 (Self-Administered) [38]	Children 9-11 yrs (Lunch)	37%	35%	27%	Less accurate than interviewer-administered recalls
Interviewer-Administered 24-hr Recall [38]	Children 9-11 yrs (Lunch)	57%	23%	20%	Requires trained staff, can be expensive
ASA24-Kids-2012 (Self-Administered) [38]	Children 9-11 yrs (Dinner)	53%	36%	12%	Performance varies by meal context
Interviewer-Administered 24-hr Recall [38]	Children 9-11 yrs (Dinner)	76%	15%	9%	Performance varies by meal context
Progressive Recall (Intake24) [39]	Adults (Evening Meal)	Reported 5.2 foods (mean)	N/A	N/A	Shorter retention interval improves detail
Standard 24-hr Recall (Intake24) [39]	Adults (Evening Meal)	Reported 4.2 foods (mean)	N/A	N/A	Longer retention interval reduces detail

Table 2: Concordance and Cognitive Factors in Pediatric Assessment

Factor	Impact on Assessment Reliability	Supporting Data
Child's Age	Children under 8 often require a proxy; those aged 10+ may self-report with assistance [38]. Concordance between parent and adolescent reports is lowest in mid-adolescence [40].	NHANES uses adult-assisted reporting for children 6-11 years [38].
Relationship Factors	Parent-child relationship quality (warmth, closeness, communication) is a stronger predictor of reporting concordance than demographic variables like parent gender or role [40].	Systematic review of 21 studies on adolescent wellbeing [40].
Domain Being Measured	Concordance varies by domain; mothers' reports more closely match adolescent self-reports on psychological and emotional wellbeing than other domains [40].	Patterns identified in systematic review [40].
Memory & Retention Interval	Shorter time between eating and recall significantly improves accuracy. Memory for eating events deteriorates within hours [39].	Progressive recalls had retention intervals 15.2 hours shorter on average, leading to more foods reported [39].
Neurodevelopmental Status	Children with Neurodevelopmental Disorders (NDD) show deficits in recognition and paired association tasks compared to typically developing peers [41].	Web-based memory testing in 57 children with NDD vs. 128 with typical development [41].

Detailed Experimental Protocols

Validation Protocol for Automated Dietary Recalls in Children

The validation study for ASA24-Kids-2012 provides a robust template for evaluating dietary assessment tools in pediatric populations [38].

Design: Quasi-experimental study conducted across two sites.
Participants: 69 children aged 9-11 years (n=38 in site 1, n=31 in site 2).
Criterion Method: Direct observation by trained staff. In site 1, observers recorded foods and portions consumed during school lunch. In site 2, observations occurred during a community-based dinner.
Intervention: The following day, children completed both the ASA24-Kids-2012 and a standardized interviewer-administered 24-hour dietary recall in randomized order.
Data Analysis: Foods were classified as:
- Matches: Correctly reported and consumed.
- Omissions: Consumed but not reported.
- Intrusions: Reported but not consumed.
Statistical Analysis: Calculation of match, omission, and intrusion rates. Correlation coefficients between observed and reported serving sizes for matched foods.
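The item-level classification above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual coding procedure: the function names, the exact-string comparison, and the example foods are all assumptions. Each rate is expressed as a share of all classified items, which is consistent with the rows of Table 1 summing to roughly 100%, though the study's exact denominators are not stated here.

```python
from collections import Counter

def classify_foods(observed, reported):
    """Count matches, omissions, and intrusions for one meal.

    observed: foods the observer recorded as eaten (criterion method)
    reported: foods the child reported in the recall
    Items are compared as exact strings, which is a simplification;
    real studies rely on trained coders to judge whether items match.
    """
    obs, rep = Counter(observed), Counter(reported)
    matches = sum((obs & rep).values())     # eaten and reported
    omissions = sum((obs - rep).values())   # eaten but not reported
    intrusions = sum((rep - obs).values())  # reported but not eaten
    return matches, omissions, intrusions

def rates(matches, omissions, intrusions):
    """Express each class as a percentage of all classified items."""
    total = matches + omissions + intrusions
    if total == 0:
        raise ValueError("no food items to classify")
    return {
        "match": 100 * matches / total,
        "omission": 100 * omissions / total,
        "intrusion": 100 * intrusions / total,
    }

# Hypothetical lunch and recall, for illustration only
observed = ["pizza", "milk", "apple", "carrots"]
reported = ["pizza", "milk", "cookie"]
m, o, i = classify_foods(observed, reported)
print(rates(m, o, i))
```

Using multisets (Counter) rather than plain sets keeps repeated items, such as two cartons of milk, from collapsing into one.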

Protocol for Web-Based Cognitive and Memory Testing

This protocol assesses cognitive capabilities relevant to dietary recall in both typically developing children and those with neurodevelopmental disorders [41].

Ethics & Recruitment: Ethics approval obtained; participants recruited via school boards and family support groups. Parents/guardians provide informed consent and demographic/medical information.
Platform: Web-based "Memory Game" accessible via URL on participants' own devices.
Tasks: Two primary tasks administered:
- Object Recognition: A simpler, hippocampal-independent task to assess basic attention and participation.
- Paired Association: Requires learning and recalling associations between pictures, testing hippocampal function.
Testing Schedule:
- Day 1: Training followed by immediate testing for short-term memory.
- Day 2: Recall testing 24 hours later for long-term memory assessment.
Data Collected: Accuracy scores and reaction times for both immediate and 24-hour recall.

Visual Workflows for Key Experimental Designs

Diagram 1: Dietary Recall Validation Study Workflow. This chart outlines the quasi-experimental design used to validate the ASA24-Kids-2012 system against the gold standard of observed intake [38].

Diagram 2: Web-Based Memory Assessment Protocol. This workflow shows the remote testing procedure used to evaluate short-term and long-term memory in children, including those with neurodevelopmental disorders (NDD) compared to typically developing (TD) children [41].

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Tools for Pediatric Assessment Validation Research

Tool / Solution	Function in Research	Example Use Case
ASA24-Kids	Automated, self-administered 24-hour dietary recall system adapted for children with reduced food list and simplified probes [38].	Validating against observed intake in children 9-11 years old [38].
NIH Toolbox Cognition Battery (NIHTB-CB)	iPad-based battery measuring attention, episodic memory, language, working memory, and executive function [42].	Assessing cognitive domains relevant to recall ability in children 7-17 years [42].
Web-Based Memory Game	Remote testing platform for paired association and object recognition tasks across multiple time points [41].	Evaluating STM and LTM in children with NDD vs. typical development [41].
R24W	French-language web-based automated 24-hour recall using meal-based approach and portion size images [43].	Validation against controlled feeding studies in adults [43].
Intake24	Open-source system automating multiple-pass 24-hour recall method, validated against interviewer-led recalls [39].	Implementing progressive recall method with shorter retention intervals [39].
Parent-Proxy BPD-QoL Questionnaire	Disease-specific instrument measuring health-related quality of life in young children with bronchopulmonary dysplasia [44].	Assessing HRQoL in children 4-8 years via parent report when self-report is not feasible [44].

The validation of 24-hour recall methods across different age groups must account for fundamental developmental limitations. For children under 8-10 years, parent-proxy reporting remains necessary but requires careful interpretation, with relationship quality being a critical factor in accuracy. For self-reporting children, cognitive limitations—particularly in memory and association—significantly impact data validity. Emerging web-based and mobile technologies show promise in mitigating these challenges through shorter retention intervals, engaging interfaces, and automated coding, but they consistently show lower accuracy than interviewer-administered methods or observed intake. Future research should focus on developing standardized, validated tools that account for the cognitive developmental stage of the target pediatric population.

Accurate dietary assessment is a cornerstone of nutritional epidemiology, essential for understanding diet-disease relationships and informing public health policy. For the adult population, the 24-hour dietary recall (24HR) is a widely used method, available in both interviewer-led and self-administered formats. The emergence of web-based, self-administered tools promises reduced administrative costs and greater scalability for large studies. However, their validity must be rigorously assessed against established methods. This guide objectively compares the performance of self-administered web tools against interviewer-led recalls for adults, synthesizing recent validation evidence to inform researcher selection.

Comparative Performance Data: Self-Administered vs. Interviewer-Led Recalls

The table below summarizes key quantitative findings from recent validation studies conducted in adult populations across different geographic and cultural contexts.

Study (Country)	Tool/Method Name	Reference Method	Key Findings on Agreement & Accuracy
Italy (2025) [45]	FOODCONS (Self-Administered)	FOODCONS (Interviewer-Led)	Energy & Nutrients: No statistically significant difference in mean intakes of energy, macronutrients, or micronutrients over two days. Agreement: Good agreement for energy, carbohydrates, and fiber (Bland-Altman). Correlation: Good concordance for food group intakes.
United Kingdom [46]	myfood24 (Online)	Biomarkers (Urine, etc.)	Attenuation: Results were attenuated compared to biomarkers (attenuation factors ~0.2-0.3). Performance: This level of attenuation was similar to the interviewer-based tool. Nutrient Estimates: Generally 10-20% lower than interviewer-based tool, with wide limits of agreement.
Ireland (2025) [8]	Foodbook24 (Web-based)	Interviewer-Led 24HR	Correlation: Strong positive correlations for 58% of nutrients and 44% of food groups (r=0.70-0.99). Food Omissions: Omission rates varied by nationality (e.g., 24% in Brazilian participants vs. 13% in Irish cohort).
Korea (2025) [5] [19]	Interviewer-Administered (In-person/Video)	Weighed Food Intake	Food Item Recall: Participants recalled 71.4% of consumed foods; women (75.6%) were more accurate than men (65.2%). Portion Size: Overestimated (mean ratio: 1.34). Energy/Nutrients: No significant differences for energy and macronutrients.
Korea (2023) [47]	Interviewer-Administered	Weighed Food Intake	Food Item Recall: High recall rate (95% of foods). Portion Size: Only 24% of portions were reported within 10% error; 43% were underreported. Nutrients: Energy and most nutrient intakes were similar to actual, except for underreporting of fat and sodium.

Detailed Experimental Protocols

To critically appraise the evidence, understanding the underlying methodology of these validation studies is crucial. The following workflows detail the protocols from key studies.

Protocol 1: Validation Against Weighed Food Intake (Gold Standard)

This protocol, used in Korean studies, validates the 24HR against a highly controlled measure of true intake [5] [47].

Key Steps Explained:

Participant Recruitment: Community-dwelling adults are recruited. The Korean study [5] focused on individuals aged 60 and older.
Controlled Feeding & Weighing: Participants consume meals in a controlled setting where they serve themselves. Crucially, all food items are discreetly weighed before and after eating to establish a "true" baseline intake with minimal participant awareness [5] [47].
24HR Administration: On the following day, participants complete a 24HR without advance notice of when it will take place. Interviews can be conducted in-person or via online video calls to compare modes [5].
Data Comparison: The reported data is compared to the weighed data. Analysis includes:
- Item Accuracy: Calculating match (correctly reported), exclusion (omitted), and intrusion (falsely reported) rates for food items [5] [47].
- Portion Size Accuracy: Calculating the ratio or percent error between reported and actual weighed portions [5].
- Nutrient Intake Analysis: Comparing energy and nutrient intakes derived from the recall versus the weighed data [47].
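The portion-size comparison in this protocol can be illustrated with a short Python sketch. The function name, the 10% tolerance default (chosen to mirror the "within 10% error" band quoted in the table above), and the gram weights are assumptions for illustration; the cited studies may define their error bands differently.

```python
def portion_accuracy(reported_g, weighed_g, tolerance=0.10):
    """Summarise portion-size accuracy for foods that were eaten and reported.

    reported_g, weighed_g: parallel lists of gram weights.
    tolerance: relative error still counted as accurate (0.10 = 10%).
    """
    if len(reported_g) != len(weighed_g) or not weighed_g:
        raise ValueError("need two non-empty lists of equal length")
    # Ratio > 1 means the portion was overestimated, < 1 underestimated
    ratios = [r / w for r, w in zip(reported_g, weighed_g)]
    n = len(ratios)
    return {
        "mean_ratio": sum(ratios) / n,
        "pct_errors": [100 * (x - 1) for x in ratios],
        "pct_within_tolerance": 100 * sum(abs(x - 1) <= tolerance for x in ratios) / n,
        "pct_underreported": 100 * sum(x < 1 - tolerance for x in ratios) / n,
        "pct_overreported": 100 * sum(x > 1 + tolerance for x in ratios) / n,
    }

# Hypothetical gram weights for four matched foods
reported = [210, 150, 80, 300]
weighed = [200, 200, 100, 200]
summary = portion_accuracy(reported, weighed)
print(summary["mean_ratio"], summary["pct_within_tolerance"])
```

A mean ratio near 1 can hide large errors in both directions, which is why the share of portions inside the tolerance band is reported alongside it.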

Protocol 2: Validation Against Biomarkers and Interviewer-Led Recalls

This protocol is common for validating new self-administered tools where a controlled feeding study is not feasible, using biomarkers and established interviewer methods as reference [46] [8].

Key Steps Explained:

Participant Recruitment: Recruiting a metabolically stable adult population representative of the target group [46].
Reference Collection: Gathering data from one or more reference standards.
- Biomarkers: Collecting 24-hour urine samples to measure biomarkers like nitrogen (for protein), potassium, and sodium, which are not subject to memory bias [46].
- Interviewer-Led 24HR: Conducting recalls using a standardized method, such as the USDA Automated Multiple-Pass Method (AMPM), by trained dietitians [46] [4].
Test Tool Administration: Participants complete the self-administered web-based 24HR tool (e.g., myfood24, Foodbook24). To avoid learning effects, the order of reference and test methods is often randomized [46] [8].
Repeated Measures: To estimate usual intake and capture day-to-day variation, the process is repeated over multiple non-consecutive days, sometimes including weekend days [46] [48].
Statistical Comparison: A suite of analyses is performed:
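- Attenuation: Estimating how strongly measurement error in the tool would weaken (attenuate) an observed diet-disease association, using biomarkers as the reference; attenuation factors closer to 1 indicate less weakening [46].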
- Bland-Altman Plots: Assessing the agreement between two methods and identifying any systematic bias [46] [45].
- Correlation Analysis: Measuring the strength of the relationship for nutrient and food group intakes [8].
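Two of the analyses listed above lend themselves to a compact sketch. The Python below computes Bland-Altman bias with 95% limits of agreement, and a simple attenuation factor taken as the slope from regressing biomarker-based intake on reported intake. It is a simplified illustration under stated assumptions: the function names and data are hypothetical, and published validation studies typically use measurement error models with repeated measures rather than a single ordinary least-squares slope.

```python
from statistics import mean, stdev

def bland_altman(test, reference):
    """Return bias and 95% limits of agreement for paired measurements."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def attenuation_factor(reported, biomarker):
    """Slope from regressing biomarker-based intake on reported intake.

    A slope near 1 suggests little attenuation; a slope near 0 suggests
    that associations based on reported intake would be strongly weakened.
    """
    mx, my = mean(reported), mean(biomarker)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reported, biomarker))
    sxx = sum((x - mx) ** 2 for x in reported)
    return sxy / sxx

# Hypothetical protein intakes (g/day): web tool vs. interviewer-led recall
web_tool = [70, 85, 60, 95, 80]
interviewer = [75, 90, 70, 100, 85]
bias, lower, upper = bland_altman(web_tool, interviewer)

# Hypothetical reported vs. biomarker-based protein intakes (g/day)
reported = [60, 70, 80, 90, 100]
biomarker = [74, 76, 80, 84, 86]
slope = attenuation_factor(reported, biomarker)
print(bias, lower, upper, slope)
```

A negative bias here means the web tool reads lower than the interviewer-led recall on average; the width of the limits shows how far individual differences scatter around that average.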

The Scientist's Toolkit: Essential Research Reagents & Materials

This table outlines key components required for conducting and validating 24-hour dietary recalls in research settings.

Item	Function in Dietary Research
Standardized Food Composition Database	Converts reported food consumption into energy and nutrient values. Examples include the UK's CoFID, Canada's Nutrient File, and local databases. Studies must ensure databases are comprehensive and include culturally-specific foods [46] [8] [45].
Portion Size Estimation Aids	Assist participants in estimating the volume of food consumed. Web tools use photo atlases with multiple portion sizes [46] [4], while interviewer-led methods may use physical aids like clay models, rulers, or household utensils [4].
24HR Administration Software	Platforms like ASA24, myfood24, FOODCONS, and R24W structure the recall process, often using a multiple-pass method to enhance completeness and automate coding [46] [4] [45].
Biomarker Assay Kits	Provide an objective, non-self-report measure of intake for specific nutrients. Urinary nitrogen (protein), potassium, and sodium are commonly used as reference biomarkers [46]. Doubly labeled water is the gold standard for measuring total energy expenditure, against which reported energy intake can be validated [48].
Structured Interview Protocols	Guide interviewer-led recalls to minimize bias. The USDA's Automated Multiple-Pass Method is a widely adopted standard that uses a five-step process to enhance memory and probe for forgotten foods [4] [45].

For dietary assessment in adult populations, both self-administered web tools and interviewer-led recalls demonstrate utility, with the optimal choice depending on the research priorities.

Interviewer-Led Recalls remain a robust standard, particularly for complex diets and demographic groups where personal interaction can improve reporting completeness. However, they are resource-intensive [45] [47].
Self-Administered Web Tools like myfood24, FOODCONS, and Foodbook24 offer a scalable and cost-effective alternative. They perform comparably to interviewer-led methods for estimating energy and many macronutrients, making them suitable for large-scale studies where high throughput is essential [46] [8] [45].

Researchers should note that all self-report methods are susceptible to systematic errors, such as portion size misestimation and food omission. The decision should therefore balance the need for precision, scale, resources, and the specific cultural dietary context of the study population.

The accurate assessment of dietary intake is a cornerstone of nutritional research, public health monitoring, and clinical care. The 24-hour dietary recall is one of the most widely used tools for this purpose. However, its validity is not uniform across all population subgroups. Older adults present a unique set of challenges—including a high prevalence of polypharmacy, multiple chronic conditions, and sensory deficits—that can significantly impact the accuracy of self-reported dietary data. This guide critically examines these special considerations, comparing validation data and methodological approaches relevant to researchers and drug development professionals focused on this demographic.

Comparative Validity and Key Challenges Across Age Groups

The validity of 24-hour dietary recalls varies across age groups due to differences in cognitive function, health status, and lifestyle. The table below summarizes key findings from validation studies in different populations, highlighting the specific challenges and performance metrics in older adults.

Table 1: Comparison of 24-Hour Dietary Recall Validity Across Age Groups

Age Group	Reported Energy Intake vs. Reference	Key Challenges Identified	Impact on Nutrient Reporting	Correlation with Reference Method
Adolescents [31]	8.8% higher than interviewer-administered recall	N/A for sensory/cognitive	Saturated fat intake significantly higher (25.2%)	Significant for most nutrients (range: 0.24–0.52)
General Adult Population [49]	Under-reporting common, requires biomarkers (e.g., DLW) for calibration	Day-to-day variation, memory	Varies by nutrient; requires statistical adjustment for usual intake	N/A
Older Adults (Korean Women) [50]	No significant difference in energy intake	Specific food under-reporting (sauces, kimchi), portion size estimation	Fat and sodium significantly under-reported	N/A

The data indicate that while adolescents may tend to over-report energy intake relative to an interviewer-administered recall [31], the primary issue in the broader adult population, including older adults, is often under-reporting or inaccurate reporting of specific food items [49] [50]. For older adults, the challenge is less about overall energy recall and more about the accurate identification and portion estimation of specific food items, particularly sauces and foods with complex compositions like kimchi, leading to significant under-reporting of nutrients like sodium and fat [50].

Special Considerations in Older Adults

The unique physiological and psychosocial profile of older adults introduces specific biases and errors in dietary recall.

Polypharmacy and Comorbidities: Older adults are the most rapidly growing segment of the population with a high prevalence of multimorbidity [51]. This inevitably leads to polypharmacy, defined as the use of multiple medications [51]. The concomitant use of several drugs increases the risk of adverse drug reactions (ADRs) and drug-nutrient interactions [51]. For instance, certain medications can alter taste perception or cause dry mouth, which may influence food choices and, consequently, the reporting of dietary intake. Furthermore, conditions like chronic kidney disease can necessitate complex dietary restrictions that are difficult to capture accurately with a single 24-hour recall [51].
Sensory Deficits: Age-related sensory impairments are common and profoundly impact the dietary assessment process. Visual impairment can make it difficult to read food labels, distinguish between medications, and, crucially, see portion size aids or food models used during the interview process [52]. Hearing impairment can lead to misunderstandings during the interviewer-administered recall, potentially resulting in misreported foods or portions [52]. These challenges are magnified in individuals with dual sensory impairment [52].

Detailed Experimental Protocols for Validation

To ensure validity in older populations, specific methodological protocols are essential. The following are detailed from key studies.

The Multiple-Pass Method (MPM) with Adaptations

A validation study in older Korean women used a controlled-feeding design to test an interviewer-administered 24-hour recall based on the MPM [50].

Protocol:
- Quick List: Participants provided an unstructured list of all foods and beverages consumed.
- Forgotten Foods: Interviewers probed for foods commonly forgotten in Korean cuisine (e.g., snacks, condiments like soy sauce and red pepper paste).
- Time and Occasion: The time and name of each eating occasion were recorded.
- Detail Cycle: Each food was described in detail, including portion size using measuring guides, food models, and pictures.
- Final Probe: A final review was conducted for any additional items [50].
Key Adaptations for Older Adults:
- A picture of a traditional Korean food tray was shown to aid memory.
- Food models for staple foods (rice, soup) and various measuring guides (cups, grids, rulers) were used to improve portion size estimation [50].

Controlled-Feeding Study as a Reference

The same study employed a controlled-feeding protocol as the reference standard to validate the recall [50].

Design: Participants were housed and provided all meals and snacks for 5 days. Research staff monitored compliance during mealtimes.
Dietary Measurement: All ingredients were measured to the nearest 0.1 gram during preparation, and prepared foods were weighed to the nearest 1 gram before serving.
Recall Interview: On a randomly selected day, participants completed a single 24-hour recall using the adapted MPM before breakfast, without prior warning [50].

This rigorous design allows for a direct comparison between actual and reported intake, providing a strong measure of accuracy specific to the challenges of a rice-based diet and an older demographic.

Research Workflow for Dietary Recall in Older Adults

The diagram below illustrates a comprehensive research workflow for implementing and validating 24-hour dietary recalls in older populations, integrating the critical considerations of polypharmacy, comorbidities, and sensory deficits.

The Scientist's Toolkit: Essential Research Reagents and Materials

The following table details key materials and their functions for conducting validated dietary assessment research in older adults.

Table 2: Essential Research Reagents and Materials for Dietary Recall Studies in Older Adults

Tool/Reagent	Function/Application in Research
Standardized Multiple-Pass Method (MPM) Protocol	A structured interview guide with multiple memory prompts to minimize forgetting and standardize data collection across participants and interviewers [31] [50].
Portion Size Visualization Aids	Food models, photographs, graduated bowls, cups, and rulers to help participants with visual impairments or cognitive decline estimate and report quantities of consumed foods more accurately [50].
Biomarker Assays (e.g., Doubly Labeled Water, Urinary Sodium)	Objective reference measures to validate self-reported energy intake (DLW) or specific nutrient intake (urinary sodium for salt), crucial for identifying and correcting for systematic biases like under-reporting [49].
Canadian Nutrient File (CNF) or Country-Specific Database	A standardized food composition database that links reported food items to nutrient values, enabling the calculation of energy and nutrient intake from qualitative food recall data [31].
Statistical Software (e.g., PC-SIDE)	Specialized software used to adjust intake data for within-person variation and estimate the distribution of "usual intake" in a population, which is essential for assessing nutrient adequacy and diet-disease relationships [10] [49].
Sensory & Cognitive Assessment Tools	Questionnaires or simple tests to screen for hearing loss, visual acuity, and cognitive impairment, allowing researchers to adapt their methodology or account for these factors in the analysis [52].

Validating 24-hour dietary recalls in older adults demands a tailored approach that accounts for polypharmacy, comorbidities, and sensory deficits. Evidence suggests that while overall energy intake may be recalled with reasonable accuracy in controlled settings, significant biases exist for specific nutrients and food items. Employing robust, adapted methodologies—such as the multiple-pass technique with sensory aids, multiple recalls, and the use of objective biomarkers—is critical for generating reliable data. For researchers in drug development and public health, acknowledging and mitigating these specific sources of error is essential for understanding the true diet-health relationship in this growing demographic.

The 24-hour dietary recall (24HR) is a foundational tool in nutritional epidemiology, providing a detailed snapshot of an individual's food and beverage intake over a single day. However, a fundamental challenge lies in the high day-to-day variation in what people consume. A single day's intake is rarely a true reflection of an individual's "usual" or long-term diet. Consequently, a critical question for researchers designing studies and interpreting data is: how many recalls are needed to estimate usual intake accurately? The answer has profound implications for the validity of findings linking diet to health outcomes, the assessment of nutrient inadequacy prevalence, and the development of public health policies. This guide synthesizes current experimental data to objectively compare the performance of different recall frequencies and provide evidence-based protocols for determining the optimal number of recall days.

Core Principles: Why Multiple Recalls Are Essential

A single 24-hour dietary recall is sufficient to estimate the mean usual intake of a population. However, because it captures only one day, it cannot characterize an individual's habitual diet or accurately determine the distribution of intakes within a group—both of which are essential for estimating the prevalence of inadequate or excessive intake [53].

The following diagram illustrates the critical decision-making workflow for determining recall frequency, balancing accuracy with practical constraints like cost and participant burden.

The necessity for multiple recalls stems from within-person variation, the natural day-to-day fluctuations in an individual's diet. This variation is often greater than the between-person variation (the differences in usual intake between individuals in a population). If not accounted for, high within-person variation widens the observed intake distribution, so the proportion of people with very low or very high intakes is overestimated. Statistical methods like the National Cancer Institute (NCI) method can correct for this, but their effectiveness is dependent on having multiple recalls from at least a subset of the study population [54] [53].
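To make the variance argument concrete, the sketch below estimates within- and between-person variance from repeat recalls and shows how the spread of observed intakes narrows as more days are averaged. It uses a plain balanced one-way decomposition, which is an assumption made for illustration and is not the NCI method; the function names and intake values are hypothetical.

```python
from statistics import mean, variance

def variance_components(recalls):
    """Estimate within- and between-person variance from repeat recalls.

    recalls: one list of daily intakes per person, each with the same
    number of days k >= 2 (a balanced design, assumed for simplicity).
    """
    k = len(recalls[0])
    if k < 2 or any(len(days) != k for days in recalls):
        raise ValueError("need the same number (>= 2) of recalls per person")
    within = mean(variance(days) for days in recalls)
    person_means = [mean(days) for days in recalls]
    # The variance of k-day means still carries within/k of day-to-day noise
    between = max(variance(person_means) - within / k, 0.0)
    return within, between

def observed_variance(within, between, n_days):
    """Expected variance across people of n-day mean intakes."""
    return between + within / n_days

# Hypothetical energy intakes (kcal/day), two recalls per person
recalls = [[1800, 2200], [2500, 2100], [1500, 1900], [2600, 3000]]
within, between = variance_components(recalls)
for n_days in (1, 2, 3):
    print(n_days, observed_variance(within, between, n_days))
```

With a single day, the full within-person variance is added to the true between-person spread, which is why single-day distributions have inflated tails.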

Comparative Analysis of Recall Frequencies

The optimal number of recall days is not a one-size-fits-all solution. It is influenced by the study's primary objective, the specific dietary components of interest, and practical constraints. The table below summarizes key experimental findings from recent validation studies.

Table 1: Comparison of Recall Frequencies and Their Performance

Recall Protocol	Key Experimental Findings	Impact on Accuracy	Recommended Use Cases
Single Day	In an urban Mexican population, a single day led to a high prevalence of nutrient inadequacy (e.g., folate at 30% in children) which changed dramatically with more days [10].	Does not account for day-to-day variation; can severely misrepresent the distribution of usual intake and prevalence of inadequacy.	Estimating population mean intake in cross-sectional studies.
Two Non-Consecutive Days	A study in Chinese adults found that two non-consecutive days could, to some extent, be substituted for three consecutive days, with the NCI method further improving accuracy [54].	Reduces within-person variation compared to a single day. Allows for application of statistical adjustment methods to estimate usual intake.	Large national surveys (e.g., NHANES in the US); studies where participant burden is a major concern.
Three Non-Consecutive Days	Research in China demonstrated that three non-consecutive days provided superior accuracy for percentile estimates compared to two days or three consecutive days [54].	Provides a more reliable basis for estimating the distribution of usual intake. Further improves the performance of statistical adjustment methods.	Studies requiring more precise estimates of the proportion of the population above/below a dietary threshold.
Three Consecutive Days	A Mexican study showed that using three days with variance adjustment led to a more accurate estimation of usual intake, drastically changing inadequacy estimates (e.g., child calcium inadequacy dropped from 43% to 4.6%) [10].	Consecutive days may be correlated (e.g., a high-intake day followed by a low-intake day). Non-consecutive days are generally preferred to avoid this.	Less ideal due to potential day-to-day correlation, but still a major improvement over a single day if logistics dictate.

Beyond the number of days, the pattern of data collection also matters. Evidence from a study of Chinese adults, which used the average of at least 23 recalls per participant as a reference, found that non-consecutive days yield greater accuracy than consecutive days. For percentiles of intake, the accuracy order was three non-consecutive days, followed by three consecutive days, then two non-consecutive days, with two consecutive days being the least accurate. The difference between two and three days was larger than the difference between consecutive and non-consecutive days [54].

Detailed Experimental Protocols and Methodologies

To ensure the collection of high-quality data, researchers employ standardized protocols. The following section details the methodologies cited in the comparative analysis.

The Multiple-Pass 24-Hour Recall Method

The Automated Multiple-Pass Method (AMPM), used in major surveys like NHANES, is a structured interview technique designed to enhance memory and reduce omission [7] [4]. Its five passes are:

Quick List: The respondent provides an uninterrupted list of all foods and beverages consumed the previous day.
Forgotten Foods: The interviewer uses specific probes for commonly omitted items (e.g., sweets, sugary drinks, condiments).
Time and Occasion: The respondent associates each food with a time and eating occasion (e.g., "breakfast," "afternoon snack").
Detail Cycle: The interviewer gathers detailed descriptions of each food, including preparation method and portion size, often aided by visual tools like food photographs or 3D models.
Final Review: The interviewer provides a final summary of all reported items for the respondent to confirm or correct.

This method has been adapted into web-based, self-administered tools like the ASA24 and Canada's R24W, which automate the multiple-pass process for use in large-scale studies [7] [4].

Protocol for a Validation Study Comparing Recall Frequencies

A 2022 study in China provides a robust protocol for comparing recall frequencies [54]. The workflow, from participant recruitment to final data analysis, is visualized below.

Participants: 595 adults from diverse geographic locations in China.
Gold Standard: Each participant completed a very high number of 24-hour recalls (28 over a single year). The average of all valid recalls (23 or more) for each individual was defined as their "true" usual intake.
Test Scenarios: The researchers created four test scenarios from the full dataset: two consecutive days (C2), three consecutive days (C3), two non-consecutive days (NC2), and three non-consecutive days (NC3).
Analysis: Dietary intake for each scenario was calculated using two methods: the simple within-person mean (WPM) and the more sophisticated NCI method. The estimates from the test scenarios were then compared to the "true" intake to calculate bias and mean relative bias.

This design allowed for a direct, empirical comparison of the accuracy afforded by different recall frequencies and statistical methods.
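The comparison step can be sketched as follows. The within-person mean (WPM) is simply each person's average over the recall days in a scenario; the definitions of bias and mean relative bias used here are assumptions for illustration, since the study's exact formulas are not reproduced in this article. All names and numbers are hypothetical.

```python
from statistics import mean

def within_person_means(recalls):
    """Average each person's recall days (the simple WPM estimate)."""
    return [mean(days) for days in recalls]

def bias_summary(estimates, truth):
    """Mean bias (intake units) and mean relative bias (percent).

    Assumed definitions: bias = estimate - truth, and relative bias =
    100 * (estimate - truth) / truth, each averaged over participants.
    """
    if len(estimates) != len(truth) or not truth:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [e - t for e, t in zip(estimates, truth)]
    rel = [100 * (e - t) / t for e, t in zip(estimates, truth)]
    return mean(diffs), mean(rel)

# Hypothetical reference ("true") usual energy intakes, kcal/day
truth = [2000, 2500, 1800]
# Hypothetical scenario: two non-consecutive days (NC2) per person
nc2 = [[1900, 2300], [2400, 2400], [1700, 2100]]
mean_bias, mean_rel_bias = bias_summary(within_person_means(nc2), truth)
print(mean_bias, mean_rel_bias)
```

Running the same summary for each scenario (C2, C3, NC2, NC3) would give a like-for-like ranking of recall designs against the reference.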

Protocol for a Study on Nutrient Inadequacy

A national survey in urban Mexico collected three non-consecutive 24HRs from 1,073 individuals [10].

Objective: To determine if estimates of nutrient intake and inadequacy based on three days were better than those based on one day.
Method: The researchers used the software PC-SIDE to adjust the intake distributions for day-to-day variability using the three days of data.
Comparison: They then compared the prevalence of inadequacy for various nutrients (e.g., fiber, iron, calcium) calculated from a single day's data versus the usual intake estimated from the three days after adjustment.
Finding: The variance in the estimated usual intake distribution from three days was smaller. More importantly, the prevalence of inadequacy was often drastically different, demonstrating that a single day can be highly misleading for this purpose.
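The effect described in this protocol can be illustrated with a simplified adjustment. The sketch below shrinks each person's multi-day mean toward the group mean by the ratio of between-person to observed standard deviation, then compares the prevalence below a cut-off with that from a single day. This is a classical shrinkage approach chosen for brevity, not the algorithm implemented in PC-SIDE; the function names, calcium values, and the 800 mg cut-off are hypothetical.

```python
from statistics import mean, variance

def shrink_to_usual(recalls):
    """Shrink person means toward the group mean (balanced design assumed)."""
    k = len(recalls[0])
    person_means = [mean(days) for days in recalls]
    grand = mean(person_means)
    within = mean(variance(days) for days in recalls)
    var_means = variance(person_means)
    # Remove the day-to-day noise still present in k-day means
    between = max(var_means - within / k, 0.0)
    factor = (between / var_means) ** 0.5 if var_means > 0 else 0.0
    return [grand + factor * (m - grand) for m in person_means]

def prevalence_below(intakes, cutoff):
    """Percentage of people whose intake falls below the cutoff."""
    return 100 * sum(x < cutoff for x in intakes) / len(intakes)

# Hypothetical calcium intakes (mg/day), three recalls per person
recalls = [
    [600, 900, 900],
    [1300, 1100, 1200],
    [500, 800, 800],
    [1500, 1400, 1300],
    [750, 1000, 950],
]
cutoff = 800  # hypothetical requirement, for illustration only
day_one = [days[0] for days in recalls]
single_day_prev = prevalence_below(day_one, cutoff)
adjusted_prev = prevalence_below(shrink_to_usual(recalls), cutoff)
print(single_day_prev, adjusted_prev)
```

In this toy example the single-day estimate flags three of five people as inadequate, while the adjusted estimate flags one, the same direction of change reported in the Mexican data. Five participants are far too few for a real analysis.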

Table 2: Key Reagents and Tools for 24-Hour Dietary Recall Research

Tool or Resource	Function	Example & Notes
Standardized Interview Protocol	Provides a structured, validated method for conducting recalls to minimize interviewer bias and improve completeness.	Automated Multiple-Pass Method (AMPM) is the gold standard [7] [4].
Portion Size Estimation Aids	Helps respondents visualize and accurately report the quantities of food consumed.	Food photographs (with multiple portion sizes), 3D food models, and common household measures [5] [4].
Nutrient Composition Database	Converts reported foods and portions into estimates of energy and nutrient intakes.	USDA FoodData Central, Canadian Nutrient File (CNF), or country-specific databases. Culturally relevant databases are critical [8] [55].
Dietary Recall Software	Automates the interview process, data coding, and nutrient analysis, improving efficiency and standardization.	ASA24 (US, Canada, Australia), Foodbook24 (Ireland), SER-24H (Chile) [8] [55] [7].
Statistical Modeling Software	Adjusts intake data from multiple recalls to estimate the distribution of usual intake, removing the effect of day-to-day variation.	Software implementing the NCI Method or the ISU Method (e.g., in R or SAS) is essential [54] [10].

The development of culturally and linguistically adapted software is a critical advancement. For instance, the SER-24H in Chile was developed to include over 7,000 local food items and 1,400 culturally based recipes, without which dietary assessment would be inaccurate [55]. Similarly, Foodbook24 was expanded with 546 foods and translated into Polish and Portuguese to accurately capture the diets of diverse populations in Ireland [8].

The evidence clearly demonstrates that a single 24-hour dietary recall is inadequate for characterizing usual dietary intake at the individual level or for determining the prevalence of nutrient inadequacy in a population. The choice of recall frequency is a balance between statistical precision and practical feasibility.

For estimating population means: A single 24HR administered to a representative sample may be sufficient [53].
For estimating usual intake distributions and prevalence of inadequacy: Collecting multiple non-consecutive 24HRs is mandatory. Evidence supports the use of at least two non-consecutive days, with three providing significantly greater accuracy, especially for nutrients with high day-to-day variability [54] [10].
To optimize resources: Researchers can combine a simple dietary tool (like a food frequency questionnaire) in the entire cohort with multiple 24HRs in a representative sub-sample. The data from the sub-sample can then be used to calibrate the main instrument and correct for measurement error using statistical methods like those from the NCI [56].

In summary, while two non-consecutive 24HRs are the minimum standard for moving beyond population means, three or more non-consecutive recalls, combined with advanced statistical adjustment, represent the current best practice for accurately estimating usual intake and informing meaningful public health recommendations.

Identifying and Mitigating Sources of Error and Bias in Age-Diverse Studies

The validity of 24-hour dietary recalls, a cornerstone of nutritional epidemiology, is consistently challenged by systematic under-reporting of energy intake. Research indicates that this bias is not random but is significantly influenced by participant characteristics such as Body Mass Index (BMI), age, and social desirability bias. This guide compares how these factors affect the accuracy of 24-hour recalls across different populations, providing researchers with a synthesis of experimental data and methodologies to critically evaluate and improve dietary assessment.

Quantitative Comparison of Key Studies on Under-Reporting

The table below summarizes findings from key studies investigating the impact of BMI, age, and social desirability on the accuracy of 24-hour dietary recalls.

Study Focus & Citation	Study Population & Design	Key Quantitative Findings on Under-Reporting
Social Desirability Bias [57]	Adults; comparison of 7-day diet recall (7DDR) with multiple 24-hr recalls [57].	A strong negative correlation was found between social desirability score and reported nutrient intake [57]. The downward bias was about 50 kcal per point on the social desirability scale (approx. 450 kcal over its interquartile range) and was approximately twice as large for women as for men [57].
Age-Related Accuracy [58]	120 children (8-13 years); comparison of a web-based (ASA24) vs. interviewer-administered 24-hr recall [58].	The overall match rate between recall methods was 47.8% [58]. Match rates were significantly lower in younger children (8-9 years old) compared to older children (10-13 years old). Omissions were most common among 8-year-olds [58].
BMI and Weight Stigma [59]	39 adults with BMI ≥25; three 24-hr recalls compared to Resting Metabolic Rate (RMR) [59].	Participants with obesity under-reported by a mean of 477 kcal (p = 0.02) [59]. Participants classified as overweight over-reported by a mean of 144 kcal (not significant). Weight stigma constructs did not statistically predict reporting accuracy in this pilot study [59].
Physical Activity Recall [60]	Adolescents and adults; comparison of a Previous-Day Recall (PDR) for activity against the activPAL monitor [60].	Reporting errors in the activity PDR were not associated with BMI or social desirability [60]. The PDR showed high correlations with the reference measure (Sedentary: r = 0.60 to 0.81; Active: r = 0.52 to 0.80) [60].

Detailed Experimental Protocols

Understanding the methodologies behind the data is crucial for interpretation and application.

Protocol for Assessing Social Desirability Bias

This study design tests how the need for social approval skews self-reported intake [57].

Core Method: A cross-sectional comparison where participants complete multiple 24-hour diet recalls (24HR) on randomly assigned days, which serve as a basis for comparison. They also complete one 7-day diet recall (7DDR) at the beginning and another at the end of the test period [57].
Bias Measurement: All participants complete a standardized social desirability scale. The scores from this scale are then used in correlation and multiple linear regression analyses to determine their relationship with, and effect on, the nutrient intake scores derived from the 7DDRs relative to the 24HRs [57].
Key Outcome: The analysis quantifies the downward bias (in kcal) per point on the social desirability scale, revealing the magnitude of under-reporting attributable to this psychological trait [57].
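
The key outcome, a bias expressed in kcal per scale point, is the slope of a regression line. The sketch below assumes a single-predictor least-squares fit rather than the multiple regression used in the study, and the scores and energy differences are invented for illustration.

```python
from statistics import mean

def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: social desirability score per participant, and the
# difference (kcal) between 7DDR energy and the mean of that person's 24HRs.
sd_score = [5, 10, 15, 20, 25]
diff_kcal = [320, 140, -130, -420, -610]

slope, intercept = ols_slope_intercept(sd_score, diff_kcal)
print(f"Estimated bias: {slope:.1f} kcal per scale point")
```

A negative slope means that higher social desirability scores go with lower reported energy on the 7DDR relative to the 24HRs.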

Protocol for Validating Recalls in Pediatric Populations

This protocol assesses the feasibility and accuracy of self-administered recalls in children [58].

Core Method: A quasi-experimental study where children (e.g., ages 8-13) are randomly assigned to first complete either an automated self-administered 24-hour recall (ASA24) or a traditional interviewer-administered recall. They subsequently complete the other method for the same time interval [58].
Accuracy Analysis: The reported food items from the two methods are compared and categorized as:
- Matches: The same specific food is reported in both.
- Omissions: Foods reported only in the interviewer-administered recall.
- Intrusions: Foods reported only in the self-administered recall [58].
Statistical Evaluation: Multivariate analysis of variance (MANOVA) is used to determine if the percentages of matches, omissions, and intrusions differ significantly by age group, sex, and race/ethnicity [58].
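
The match, omission, and intrusion categories map naturally onto set operations. The sketch below makes two assumptions: foods are compared by exact name after lowercasing, and percentages are taken over all distinct foods reported in either recall. Real studies code foods with far more care, and the food lists here are hypothetical.

```python
def classify_items(interviewer_items, self_admin_items):
    """Split reported foods into matches, omissions, and intrusions."""
    reference = {item.strip().lower() for item in interviewer_items}
    test = {item.strip().lower() for item in self_admin_items}
    return {
        "matches": reference & test,      # reported in both recalls
        "omissions": reference - test,    # only in the interviewer recall
        "intrusions": test - reference,   # only in the self-administered recall
    }

def as_percentages(categories):
    """Express each category as a share of all distinct foods reported."""
    total = sum(len(items) for items in categories.values())
    return {name: 100 * len(items) / total for name, items in categories.items()}

# Hypothetical recalls from one child for the same day.
interviewer = ["Milk", "cheerios", "apple", "ham sandwich", "orange juice"]
self_admin = ["milk", "cheerios", "ham sandwich", "cookies"]

categories = classify_items(interviewer, self_admin)
percentages = as_percentages(categories)
print(categories)
print(percentages)
```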

Protocol for Assessing BMI and Weight Stigma Effects

This mixed-methods design explores the relationship between weight stigma and energy reporting accuracy [59].

Core Method: Participants with overweight and obesity complete three unannounced telephone-administered 24-hour recalls using a multiple-pass method [59].
Reference Measure: Resting Metabolic Rate (RMR) is measured via indirect calorimetry under standardized conditions (fasting, rest, no stimulants). While not a direct measure of total energy intake, RMR provides a lower-bound estimate for identifying implausibly low energy reports [59].
Stigma and Bias Assessment: Participants complete validated surveys measuring weight bias internalization and experiences of weight stigma (e.g., the Modified Weight Bias Internalization Scale) [59].
Data Integration: Multiple linear regression analysis tests whether weight bias internalization, weight bias toward others, and experiences of weight stigma are predictive of the accuracy of energy reporting, calculated as (Reported Energy Intake / Measured RMR) [59].
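
The accuracy ratio in this protocol is a simple quotient. The sketch below computes it from the mean of a participant's recalls and flags ratios below 1.0 as implausibly low. That threshold is an assumption based on RMR being a lower bound for energy needs, not a cutoff taken from the study, and the participant values are hypothetical.

```python
from statistics import mean

def ei_rmr_ratio(recall_kcal, rmr_kcal):
    """Mean reported energy intake across recalls divided by measured RMR."""
    return mean(recall_kcal) / rmr_kcal

def is_implausibly_low(ratio, floor=1.0):
    """Flag reports below the floor (assumed: intake cannot stay below RMR)."""
    return ratio < floor

# Hypothetical participants: three 24HR energy values (kcal) and measured RMR.
participants = {
    "P01": ([1450, 1300, 1600], 1700),
    "P02": ([2300, 2100, 2500], 1650),
}

ratios = {pid: ei_rmr_ratio(kcal, rmr) for pid, (kcal, rmr) in participants.items()}
flags = {pid: is_implausibly_low(r) for pid, r in ratios.items()}
print(ratios)
print(flags)
```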

Research Workflow and Factor Relationships

The following diagram illustrates the conceptual framework and relationships between key factors that influence under-reporting in dietary recalls, as revealed by experimental data.

The Scientist's Toolkit: Key Reagents and Materials

The table below lists essential tools and instruments used in the featured validation studies.

Item Name	Function in Research Context
Social Desirability Scales	Validated questionnaires (e.g., Marlowe-Crowne Scale for adults) used to quantify a participant's tendency to respond in a socially acceptable manner, which is a source of measurement bias [57] [60].
Multiple-Pass 24-Hour Recall Protocol	A structured interview technique involving multiple passes (steps) to minimize forgotten foods and improve portion size estimation. It is the methodological gold standard for interviewer-administered recalls [46] [49] [59].
Automated Self-Administered 24-Hour Recall (ASA24)	A web-based tool developed by the National Cancer Institute (NCI) that allows participants to self-report their dietary intake without an interviewer, reducing administrative costs and facilitating repeated measures [46] [58].
Indirect Calorimeter	A device (e.g., Parvo Medics TrueOne 2400) used to measure Resting Metabolic Rate (RMR) via oxygen consumption and carbon dioxide production. It provides an objective lower-bound estimate to identify implausibly low energy intake reports [59].
Bioelectrical Impedance Analysis (BIA) Scale	A device (e.g., Tanita TBF-310) used to measure body composition (weight, body fat percentage, fat-free mass), which is used to characterize the study population and calculate BMI [59].
Activity Monitors (activPAL, ActiGraph)	Wearable sensors used as objective reference measures in validation studies for physical activity and sedentary behavior, providing a benchmark against which self-report tools like the Previous-Day Recall (PDR) are compared [60].

Accurate dietary assessment is fundamental for understanding the links between nutrition and non-communicable diseases, which account for over 80% of premature mortality in some regions [61]. While tools like the 24-hour dietary recall (24HR) and Food Frequency Questionnaires (FFQ) are widely used, their accuracy must be validated against objective measures. Biomarkers from serum and urine provide a critical, unbiased reference for this validation [61]. This process is not uniform across populations; age significantly influences dietary reporting accuracy and metabolic response [5] [62]. This guide compares the performance of dietary validation protocols, highlighting how age-specific factors affect the validation of both macronutrient and micronutrient intake.

Experimental Protocols for Validation Studies

A robust validation study requires a carefully designed protocol to ensure results are reliable and comparable. The following are detailed methodologies from key studies in the field.

The PERSIAN Cohort Validation Study

This large-scale study validated a 113-item FFQ against multiple reference methods [61].

Participant Recruitment: 978 individuals from seven different cohort centers in Iran were enrolled to ensure representation of diverse ethnic and dietary habits [61].
Study Design & Timeline:
- An initial FFQ (FFQ1) was completed upon enrollment.
- This was followed by two 24-hour dietary recalls (24HR) each month for twelve months (a total of 24 recalls per participant).
- Seasonal biological samples (serum and 24-hour urine) were collected four times throughout the year.
- A final FFQ (FFQ2) was administered at the end of the 12-month period [61].
Dietary Assessment: The FFQ was interviewer-administered using a food album and models to standardize portion size estimation. The 24HRs used the USDA multiple-pass method and were conducted either in person or over the phone [61].
Biomarker Analysis: Validity was assessed using the "triad method," comparing FFQ data to the 24HRs and to selected biomarkers in the serum and urine samples. Reproducibility was measured by comparing FFQ1 and FFQ2 [61].
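
The text names the triad method without giving its formula. In the commonly used formulation of the method of triads, each instrument's validity coefficient is derived from the three pairwise correlations among the FFQ (Q), the reference recalls (R), and the biomarker (B). The sketch below assumes that formulation; the correlation values are hypothetical and are not results from the PERSIAN study.

```python
from math import sqrt

def triad_validity_coefficients(r_qr, r_qb, r_rb):
    """Validity coefficients for FFQ (Q), 24HR (R), and biomarker (B).

    Assumes all three pairwise correlations are positive and that the
    measurement errors of the three methods are mutually independent.
    """
    return {
        "ffq": sqrt(r_qr * r_qb / r_rb),
        "recall": sqrt(r_qr * r_rb / r_qb),
        "biomarker": sqrt(r_qb * r_rb / r_qr),
    }

# Hypothetical pairwise correlations for one nutrient.
coefficients = triad_validity_coefficients(r_qr=0.50, r_qb=0.30, r_rb=0.40)
for method, rho in coefficients.items():
    print(f"{method}: {rho:.3f}")
```

Estimates above 1 can occur with this formula when the correlations are imprecise or the independence assumption fails, so they are usually read as a warning sign rather than a true coefficient.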

Validation in Free-Living Older Adults

This study assessed the accuracy of 24HRs in an older East Asian population by comparing them to discreetly weighed food intake, a direct measure of "true" intake [5].

Participant Recruitment: 119 free-living Korean adults aged 60 and older (mean age 72.2) were recruited from the community [5].
Study Design:
- Participants attended a feeding session where they consumed three self-served meals. All food items were weighed before and after consumption without the participants' knowledge to determine actual intake.
- The following day, a single 24HR interview was conducted to assess recalled intake.
Interview Methods: The 24HR was administered either through an in-person interview or an online video call to evaluate the effect of interview mode [5].
Data Analysis: Accuracy was measured by the match rate (percentage of foods consumed that were reported), intrusion rate (foods reported but not consumed), and the ratio of reported to weighed portion sizes. Energy and nutrient intakes from the recall were compared to the weighed values [5].
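
The three accuracy metrics can be sketched as follows. The denominators are assumptions, since the text does not spell them out: the match rate is taken over foods actually consumed, the intrusion rate over foods reported, and the portion ratio is averaged over matched foods only. Food names and weights are hypothetical.

```python
from statistics import mean

def recall_accuracy(weighed_g, reported_g):
    """Compare a recall (food -> grams) with weighed intake (food -> grams)."""
    matched = weighed_g.keys() & reported_g.keys()
    intrusions = reported_g.keys() - weighed_g.keys()
    return {
        "match_rate": len(matched) / len(weighed_g),
        "intrusion_rate": len(intrusions) / len(reported_g),
        # Ratio above 1 means the portion was overestimated in the recall.
        "mean_portion_ratio": mean(reported_g[f] / weighed_g[f] for f in matched),
    }

# Hypothetical data for one participant.
weighed = {"rice": 210, "kimchi": 40, "soup": 300, "fish": 80, "spinach": 50}
reported = {"rice": 250, "kimchi": 60, "soup": 300, "egg": 50}

accuracy = recall_accuracy(weighed, reported)
print(accuracy)
```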

Machine Learning Approach to Age and Nutrient Interactions

Leveraging data from the National Health and Nutrition Examination Survey (NHANES), this study used machine learning to explore the relationship between nutrient intake, age, and Metabolic Syndrome (MetS) [62].

Data Source: Ten cycles of NHANES data (1999-2018) were consolidated and preprocessed [62].
Cohort Definition:
- MetS Group: Defined by the presence of three or more risk factors (e.g., elevated waist circumference, blood pressure, triglycerides, fasting glucose, or low HDL-C).
- Optimal Cardiometabolic Health (OCH) Group: Defined by meeting all criteria for healthy adiposity, blood glucose, lipids, and blood pressure [62].
Machine Learning Modeling: The population was split into age groups (≤44 years and ≥45 years). The Extreme Gradient Boosting (XGBoost) algorithm was used to model the data and identify key predictive features, including nutritional indicators, for MetS across these age groups [62].

The following tables synthesize key quantitative findings from the cited research, facilitating a direct comparison of validation metrics across different nutrients, biomarkers, and age groups.

Table 1: Correlation Coefficients between FFQ and Reference Methods in the PERSIAN Cohort [61]

Nutrient	Correlation with 24HR (FFQ1)	Correlation with 24HR (FFQ2)	Correlation with Biomarkers (Validity Coefficient)
Energy	0.57	0.63	-
Protein	0.56	0.62	~0.4 (Urinary Nitrogen)
Lipids	0.51	0.55	>0.4 (Selected Serum Fatty Acids)
Carbohydrates	0.42	0.51	-
Sodium	-	-	>0.4 (Urinary Sodium)
Folate	-	-	>0.4 (Serum Folate)
Vitamin B6 / B12	<0.4	<0.4	-

Table 2: 24HR Accuracy in Older Adults from the Weighed Intake Study, Overall and by Sex [5]

Metric	Overall	Women	Men
Food Item Match Rate	71.4%	75.6%	65.2%
Exact Portion Size Match	38.0%	-	-
Mean Portion Size Overestimation	1.34x	-	-
Energy & Macronutrient Intake	Not significantly different from weighed intake	-	-

Table 3: Machine Learning Model Performance for Predicting Metabolic Syndrome [62]

Dataset Description	Model	Sensitivity	Specificity	Accuracy	AUC
All Age Groups (Balanced)	XGBoost	-	-	-	> 0.89
Younger Cohort (≤44 years)	XGBoost	-	-	-	-
Middle-aged & Elderly (≥45 years)	XGBoost	-	-	-	-

Visualization of Methodologies and Workflows

The following diagrams illustrate the core experimental designs and analytical processes described in the research, providing a clear visual summary of the complex workflows.

Experimental Workflow for the PERSIAN Cohort Validation Study

Analytical Workflow for Age-Specific Nutrient Impact Analysis

The Scientist's Toolkit: Essential Research Reagent Solutions

This table details key materials and tools required for conducting rigorous dietary validation studies.

Table 4: Key Reagents and Materials for Dietary Validation Research

Item	Function & Application
Semi-Quantitative FFQ	A standardized questionnaire (e.g., 113-item PERSIAN FFQ) to assess long-term, habitual dietary intake for ranking individuals within a population [61].
Validated 24-Hour Recall Protocol	A structured interview method (e.g., USDA multiple-pass) used as a reference to capture recent detailed intake or to validate an FFQ over multiple administrations [61] [5].
Biomarker Assay Kits	Commercial kits for quantifying specific nutrients in biological samples (e.g., serum folate, urinary nitrogen, fatty acid profiles) to provide an objective measure of nutrient intake [61].
Food Atlas / Portion Size Models	Visual aids (photo albums, utensils, 3D models) to help participants accurately estimate and report the portion sizes of consumed foods during recalls or FFQ administration [61].
Biological Sample Collection Supplies	Kits for the proper collection, storage, and transport of biological specimens, including serum vials and 24-hour urine containers [61].

The validation of dietary assessment tools is a nuanced process, where the choice of reference method and the demographic characteristics of the study population critically influence outcomes. The data show that FFQs can effectively rank individuals by nutrient intake when validated against repeated 24HRs and biomarkers [61]. However, 24HRs themselves show age-related variations in accuracy, with older adults demonstrating different food item recall rates and portion size estimation errors [5]. Furthermore, advanced computational approaches confirm that the physiological impact of nutrients is not static but varies across the lifespan, necessitating age-specific analysis in nutritional epidemiology [62]. Therefore, a one-size-fits-all approach to dietary validation is inadequate. Future research and public health strategies must incorporate these age-specific and method-specific insights to develop more precise and effective nutritional interventions.

Accurate dietary assessment is a cornerstone of nutritional epidemiology, public health research, and clinical practice. Within this field, inaccurate self-report of portion sizes is a major cause of measurement error, potentially compromising the validity of research findings and the effectiveness of nutritional interventions [63]. The "flat-slope phenomenon," where large portions tend to be underestimated and small portions overestimated, further complicates accurate reporting [63].

The growing digitization of dietary assessment methods has catalyzed innovation in portion size estimation aids (PSEAs). This article provides a comparative analysis of contemporary tools and methodologies, examining their relative performance across different population groups and settings. Understanding the strengths and limitations of these approaches is essential for researchers selecting appropriate methods for 24-hour dietary recall validation studies across diverse age groups.

Comparative Analysis of Portion Size Estimation Methodologies

Performance Characteristics of Major Estimation Aids

The table below summarizes the key performance characteristics of major portion size estimation methodologies based on recent validation studies:

Table 1: Performance Comparison of Portion Size Estimation Methodologies

Methodology	Reported Accuracy Range	Strengths	Limitations	Optimal Use Context
Text-Based (TB-PSE): household measures, standard portions	31% of estimates within 10% of true intake [63]	Superior accuracy for amorphous foods and liquids [63]; avoids image perception issues	Relies on understanding of measures [63]; potential for vague descriptions	Populations with cooking knowledge
Image-Based (IB-PSE): food photographs, image libraries	13% of estimates within 10% of true intake [63]	Visual reference point; useful for single-unit foods [63]	Influenced by perception and conceptualization [63]; "flat-slope phenomenon" persists	Mixed-diet studies; tech-literate populations; large-scale automated surveys
Progressive Recall: multiple short recalls throughout the day	25% more foods reported for evening meals vs. single recall [39]	Reduces retention interval (15.2 hours shorter on average) [39]; mitigates memory decay	Less convenient for daily lifestyle (65% preference for single recall) [39]; higher participant burden	High-accuracy requirement studies; populations with cognitive limitations
Pictorial Recall Aids: memory joggers for forgotten items	Significantly changes dietary outcomes (p<0.05) [64]	High user uptake; effective for beverages, snacks, fruits [64]	Requires development of context-specific aids	Cross-cultural studies; surveys of children (via caregivers)

Specialized Digital Tools for Dietary Assessment

Table 2: Validation Metrics of Specialized Digital Dietary Assessment Tools

Tool Name	Target Population	Validation Reference Method	Key Outcomes	Nutrient Correlation Coefficients
Intake24 [39] [65]	General population, adolescents, older adults	Interviewer-led 24HR, weighed food records	Comparable to interviewer-led recalls [39]; usability challenges with search terms and portion images [65]	Energy: r=0.79-0.94 [66]
R24W (Canadian) [4]	French-speaking adolescents	Interviewer-administered 24HR (USDA AMPM)	8.8% higher energy intake vs. interview [4]; good acceptance with mandatory tutorial	Significant for all nutrients except % protein and thiamin (range: 0.24-0.52) [4]
Nutrition Data (Swedish) [66]	Adults with Type 1 Diabetes	Unannounced 24-hour recalls	No significant difference in mean intakes [66]; high user acceptability (70% found it easy)	Carbohydrates: r=0.94 [66]
FOODCONS 1.0 (Italian) [45]	Italian adults (18-64 years)	Interviewer-led 24HR using same software	No significant difference in 2-day mean nutrients [45]; good agreement for energy, carbohydrates, fiber	Good concordance at food group level [45]
PortionSize App [67]	Adults in free-living conditions	Digital photography	Equivalent for food weight (grams) [67]; overestimated energy intake	Overestimation in fruits, grains, dairy, protein (11-23% error) [67]

Experimental Protocols for Method Validation

Protocol 1: Direct Comparison of Text-Based vs. Image-Based PSEAs

Objective: To assess the accuracy of portion size estimation using food images (IB-PSE) versus textual descriptions (TB-PSE) [63].

Design: Cross-over study with random assignment to assessment sequences.

Participants: 40 Dutch-speaking adults (20-70 years), stratified by sex and age [63].

Intervention:

Participants consumed a pre-weighed ad libitum lunch with varied food types (amorphous, liquids, single-units, spreads)
Plate waste was weighed to ascertain true intake
Self-reported intake was collected at 2 and 24 hours post-meal using both TB-PSE and IB-PSE in counterbalanced order [63]

Measures:

Primary: Proportion of reported portion sizes within 10% and 25% of true intake
Secondary: Median relative error, agreement via Bland-Altman plots [63]

Key Finding: TB-PSE demonstrated significantly better accuracy than IB-PSE, with 31% vs. 13% of estimates within 10% of true intake [63].
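
The primary and secondary measures in this protocol reduce to a few lines of arithmetic. The sketch below computes the share of estimates within 10% and 25% of true intake and the median relative error; the gram values are hypothetical and the function names are illustrative only.

```python
from statistics import median

def relative_errors_pct(reported_g, true_g):
    """Signed error of each estimate as a percentage of the true weight."""
    return [100 * (r - t) / t for r, t in zip(reported_g, true_g)]

def share_within(errors_pct, tolerance_pct):
    """Proportion of estimates whose absolute error is within the tolerance."""
    return sum(abs(e) <= tolerance_pct for e in errors_pct) / len(errors_pct)

# Hypothetical lunch items: weighed true intake vs. self-reported estimate (g).
true_g = [200, 150, 30, 250, 100]
reported_g = [210, 120, 45, 250, 130]

errors = relative_errors_pct(reported_g, true_g)
within_10 = share_within(errors, 10)
within_25 = share_within(errors, 25)
print("Within 10% of true intake:", within_10)
print("Within 25% of true intake:", within_25)
print("Median relative error (%):", median(errors))
```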

Protocol 2: Validation of Web-Based 24-Hour Recall Tools

Objective: To assess the relative validity of web-based dietary recall tools against interviewer-led recalls.

Common Design Elements:

Participants complete both self-administered web-based and interviewer-administered 24-hour recalls
Non-consecutive assessment days with washout periods
Counterbalanced administration order to avoid bias [68] [4] [45]

Statistical Analysis:

Paired t-tests or Wilcoxon tests for mean intake differences
Correlation coefficients (Spearman/Pearson) for agreement
Bland-Altman plots to assess bias across intake ranges
Cross-classification analysis to determine misclassification rates [68] [66] [4]
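
Of the analyses listed, the Bland-Altman statistics are the least familiar to many readers, so a minimal sketch may help. It computes the mean difference (bias) between two methods and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences). The energy values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical energy intakes (kcal) for the same participants and days.
web_recall = [2100, 1850, 2400, 1600, 2250, 1950]
interview_recall = [2000, 1900, 2300, 1500, 2100, 2000]

bias, lower, upper = bland_altman(web_recall, interview_recall)
print(f"Bias: {bias:.1f} kcal, limits of agreement: {lower:.1f} to {upper:.1f} kcal")
```

In a full analysis the differences are also plotted against the pairwise means to check whether bias changes across the intake range.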

Key Adaptation: The R24W validation study with French-Canadian adolescents included a mandatory tutorial video and was conducted partially in-school under supervision to ensure protocol adherence [4].

Protocol 3: Usability Evaluation of Digital Tools

Objective: To identify user challenges and improve the user experience of digital dietary assessment tools.

Design: Mixed-methods approach combining:

Screen observation recordings during tool use
Think-aloud techniques with real-time feedback
Post-completion usability surveys [65]

Application: This methodology applied to Intake24-NZ identified challenges with search functions, portion size estimation, and food prompts, leading to specific software improvements [65].

Visualizing Methodological Relationships in Portion Size Assessment

Diagram 1: Methodological Relationships in Portion Size Assessment. This diagram illustrates how primary estimation methodologies are implemented in digital tools and their relationship to key performance factors.

The Researcher's Toolkit: Essential Materials for Portion Size Research

Table 3: Essential Research Reagents and Solutions for Portion Size Estimation Studies

Tool/Category	Specific Examples	Research Function	Key Considerations
Validation Reference Standards	Weighed food protocol [63], Digital photography [67], Doubly labeled water [39]	Provides objective measure of true intake for method validation	Cost, participant burden, and ecological validity vary substantially
Portion Size Image Libraries	ASA24 Picture Book [63], Validated portion size photographs [65]	Visual aids for portion size estimation in web-based tools	Require cultural adaptation and validation for local foods [65]
Dietary Assessment Platforms	Intake24 [39] [65], ASA24 [63], R24W [4], FOODCONS [45]	Automated 24-hour recall administration with integrated PSEAs	Variable support for different languages, food databases, and age groups
Food Composition Databases	Canadian Nutrient File (CNF 2015) [4], Swedish Food Database [66], Italian Food Composition Database [45]	Convert reported food consumption to nutrient intakes	Regular updates required to reflect changing food supply
Usability Assessment Tools	Screen recording software, Think-aloud protocols, System Usability Surveys [65]	Identify interface problems and user challenges in digital tools	Essential for optimizing self-administered tools before large-scale deployment

The accurate estimation of portion sizes remains a complex challenge in dietary assessment, with clear trade-offs between different methodological approaches. Text-based estimation demonstrates superior accuracy for many food types, while image-based systems offer practical advantages for large-scale implementation. The emergence of validated digital tools like Intake24, R24W, and FOODCONS represents significant progress, making large-scale dietary surveys more feasible without sacrificing validity relative to interviewer-led methods.

Future research should prioritize addressing the systematic overestimation of certain food groups in digital tools, improving the accuracy of image-based estimation for amorphous foods, and developing more effective implementation strategies for progressive recall methodologies. Furthermore, as cultural adaptation emerges as a critical factor in tool effectiveness [64] [65], researchers should continue to develop and validate region-specific portion size estimation aids that account for local dietary patterns and food cultures.

For researchers designing 24-hour recall validation studies, the selection of portion size estimation methodology should be guided by the target population's technological literacy, the specific food types being assessed, and the balance between precision and participant burden required by the research context.

Accurate dietary assessment is fundamental for public health nutrition, epidemiological research, and understanding the diet-disease relationship. The 24-hour dietary recall (24HR) stands as one of the most widely used methods for capturing detailed dietary intake data at both individual and population levels. However, the validity of this method faces significant challenges when applied to low-income and multicultural populations, where cultural, socioeconomic, and cognitive factors can substantially impact reporting accuracy. This guide examines the performance of 24HR validation across these specific contexts, comparing methodological adaptations and their effectiveness in diverse population groups.

The growing ethnic and socioeconomic diversity in many countries necessitates dietary assessment tools that are both linguistically and culturally appropriate [8]. Simultaneously, cognitive considerations are particularly important when assessing vulnerable groups, including older adults and those with lower socioeconomic status, where memory constraints and limited nutrition literacy may affect recall accuracy [5] [69]. Understanding these factors is crucial for researchers aiming to collect valid dietary data in diverse population studies.

Comparative Performance of 24HR in Diverse Populations

Validation Metrics Across Different Groups

Table 1: Key Validation Findings of 24-Hour Dietary Recalls in Diverse Populations

Population Characteristic	Recall Accuracy Metric	Performance Findings	Study Details
Older Korean Adults (60+ years)	Food item match rate	71.4% overall recall; Women: 75.6%; Men: 65.2% [5]	Validation against weighed food intake [5]
Same population	Portion size estimation	Significant overestimation (mean ratio: 1.34, 95% CI: 1.33-1.34) [5]	Comparison with discreetly weighed portions [5]
Same population	Energy and nutrient intake	No statistically significant differences from weighed records [5]	Weighed intake comparison [5]
Elderly Low-SES Populations	Low energy reporting prevalence	40% of men and 60% of women classified as accurate reporters [69]	Multiple methods (24HR, FFQ, PSFFQ, MPQ) [69]
Brazilian Adults in Ireland	Food omission rate	24% of foods omitted in self-administered recalls [8]	Comparison with interviewer-led recalls [8]
Irish Adults in Ireland	Food omission rate	13% of foods omitted in self-administered recalls [8]	Comparison with interviewer-led recalls [8]

Impact of Demographic and Socioeconomic Factors

The data reveal substantial variation in 24HR accuracy across different demographic groups. Older adults demonstrate particular challenges with complete food reporting and portion size estimation [5]. Sex-based differences are also evident, with women showing higher food item match rates than men (75.6% versus 65.2%) in the older Korean population [5]. This suggests that sex-specific approaches may be necessary for optimal dietary assessment in older populations.

Socioeconomic status significantly impacts reporting accuracy, with low-income elderly populations showing particularly high rates of low-energy reporting [69]. Only 40% of men and 60% of women in this demographic were classified as accurate reporters, highlighting the substantial measurement challenges in these populations [69]. Cultural background also influences reporting completeness, with Brazilian participants in Ireland omitting nearly twice as many foods in self-administered recalls compared to their Irish counterparts (24% versus 13%) [8].

Methodological Protocols for Enhanced Validation

Experimental Designs for Validation Studies

Protocol 1: Weighed Food Intake Validation

The most rigorous validation approach involves comparison with objectively measured food consumption. In a study with older Korean adults, researchers discreetly weighed all food items consumed during three self-served meals [5]. The following day, participants completed 24HR interviews through either in-person or online video calls. This design allowed direct comparison between actual consumption and reported intake, enabling calculation of precise match rates (71.4% overall), omission rates, and portion size estimation accuracy (mean overestimation ratio of 1.34) [5]. This protocol provides high-quality validation data but requires controlled feeding environments, making it resource-intensive.

Protocol 2: Multi-Method Comparison in Low-Income Elderly

This approach utilizes multiple assessment tools to identify systematic reporting patterns. In a study with low-socioeconomic status elderly participants, researchers collected monthly 24HRs over six months, followed by administration of three different questionnaires: a traditional FFQ, a picture-sort FFQ, and a meal pattern questionnaire [69]. The Goldberg equation was applied to determine energy reporting status across all methods. This multi-method design allowed researchers to identify consistent under-reporting patterns and relate these to participant characteristics [69]. The protocol revealed that under-reporting resulted from omissions across both major food groups and discretionary energy foods.
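
For readers unfamiliar with the Goldberg approach, the sketch below shows one common formulation of the lower cut-off for the ratio of reported energy intake to basal metabolic rate. The default physical activity level (1.55) and coefficients of variation (23%, 8.5%, 15%) are widely cited general-purpose values and are assumptions here; the parameters actually used in the cited study are not given in the text.

```python
from math import exp, sqrt

def goldberg_lower_cutoff(pal=1.55, days=1, n=1,
                          cv_wei=23.0, cv_wb=8.5, cv_tp=15.0, sd_min=-2.0):
    """Lower plausibility cut-off for reported energy intake / BMR.

    pal    : assumed physical activity level of the group
    days   : number of recall days per person
    n      : number of people (1 when classifying individuals)
    cv_wei : within-person CV of energy intake (%)
    cv_wb  : CV of repeated BMR measurement or prediction error (%)
    cv_tp  : between-person CV of physical activity level (%)
    sd_min : -2 gives the lower bound of a 95% confidence interval
    """
    s = sqrt(cv_wei ** 2 / days + cv_wb ** 2 + cv_tp ** 2)
    return pal * exp(sd_min * (s / 100) / sqrt(n))

# Example: classifying one individual from six recalls (hypothetical setup).
cutoff = goldberg_lower_cutoff(days=6, n=1)
print(f"EI:BMR below {cutoff:.2f} would be classified as low-energy reporting")
```

More recall days and larger groups both narrow the confidence band, so the cut-off moves closer to the assumed physical activity level.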

Protocol 3: Cross-Cultural Tool Validation. For multicultural adaptations, researchers expanded the Foodbook24 dietary recall tool by adding 546 foods commonly consumed by Brazilian and Polish populations and translating interfaces into relevant languages [8]. The validation consisted of three phases: (1) expansion of the food list, (2) acceptability testing using qualitative approaches, and (3) comparison studies where participants completed both self-administered (Foodbook24) and interviewer-led recalls on the same day, repeated after two weeks [8]. Correlation analyses (Spearman rank) assessed agreement for food groups and nutrients, identifying specific categories with lower concordance (e.g., potatoes and nuts).
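The Spearman rank correlation used in this kind of comparison is the Pearson correlation of the ranked values, with tied values sharing an average rank. The sketch below implements it with the standard library only; the intake values are invented, and in practice a statistics package would normally be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up intakes (g/day) of one food group for five participants.
self_administered = [120, 80, 0, 200, 150]
interviewer_led = [110, 95, 20, 180, 190]

rho = spearman(self_administered, interviewer_led)
print(round(rho, 2))
```

A high rank correlation shows that the two methods order participants similarly; it does not show that they agree on absolute amounts.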

Technological Adaptations for Diverse Populations

Table 2: Technology-Based Adaptations for 24-Hour Dietary Recalls

Adaptation Type	Specific Features	Target Populations	Reported Benefits
Multilingual Interfaces	Brazilian Portuguese, Polish translations [8]	Immigrant populations	Improved food identification and reporting accuracy [8]
Culturally Expanded Food Lists	546 additional foods for Brazilian/Polish diets [8]	Specific ethnic groups	86.5% of consumed foods available in updated list [8]
Web-Based Platforms (ASA24)	Automated self-administered 24HR [7]	General population aged 12+	Free, scalable, multiple non-consecutive day recalls [7]
Progressive Recall Methods	Multiple brief recalls throughout day [39]	Populations with memory limitations	Shorter retention intervals (15.2 hours less), more foods reported for evening meals [39]
Portion Size Visualization	Food photographs for estimation [39]	Low literacy populations	Standardized portion size assessment without requiring scales

Cognitive Considerations in Recall Methodology

Memory and Retention Interval Interventions

Cognitive limitations present significant challenges for dietary recall accuracy, particularly in older adults and those with lower educational attainment. Research indicates that memories of eating episodes begin deteriorating within an hour after consumption [39]. The progressive recall method, which involves multiple brief recalls throughout the day rather than a single 24-hour recall, has been shown to reduce retention intervals by an average of 15.2 hours [39]. This approach significantly increased the number of foods reported for evening meals (5.2 foods versus 4.2 foods in conventional 24HR) [39], suggesting that reducing memory burden through shortened retention intervals can enhance reporting completeness.
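The retention interval is the time between an eating occasion and the recall that captures it. The sketch below contrasts a single next-day recall with several same-day recalls. The meal and recall times are invented for illustration and are not meant to reproduce the 15.2-hour figure reported in [39]; retention_hours is a hypothetical helper.

```python
from datetime import datetime


def retention_hours(eating_times, recall_times):
    """Hours from each eating occasion to the first recall that follows it."""
    intervals = []
    for eaten in eating_times:
        later = [r for r in recall_times if r >= eaten]
        if not later:
            raise ValueError(f"no recall scheduled after {eaten}")
        intervals.append((min(later) - eaten).total_seconds() / 3600)
    return intervals


# Invented schedule: three meals on day 1.
meals = [
    datetime(2025, 1, 6, 8, 0),
    datetime(2025, 1, 6, 13, 0),
    datetime(2025, 1, 6, 19, 0),
]

# Conventional 24HR: one recall the next morning.
conventional = retention_hours(meals, [datetime(2025, 1, 7, 10, 0)])

# Progressive recall: three brief recalls on the same day.
progressive = retention_hours(meals, [
    datetime(2025, 1, 6, 11, 0),
    datetime(2025, 1, 6, 16, 0),
    datetime(2025, 1, 6, 21, 0),
])

mean_saving = sum(conventional) / 3 - sum(progressive) / 3
print(conventional, progressive, mean_saving)
```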

The multiple-pass 24-hour recall method was specifically designed to mitigate memory-related errors through structured interviewing techniques [49]. This method employs several distinct "passes" to prompt memory: first, a quick list of foods consumed; second, a detailed description of each food and its preparation; third, a time-based review of eating occasions; and fourth, a final review for forgotten items [49]. This systematic approach helps overcome the natural limitations of human memory in recalling complex behaviors like eating.

Interview Modalities and Cognitive Support

Interview modality appears to have limited impact on reporting accuracy in some populations. Research with older Korean adults found no significant differences in accuracy between in-person and online video call interviews [5], suggesting that video-based methods may be viable alternatives when face-to-face interaction is impractical. This is particularly relevant for populations with mobility limitations or during public health emergencies such as the COVID-19 pandemic.

Cultural factors significantly influence cognitive aspects of dietary recall, including food identification, portion size estimation, and meal definition. Researchers emphasize that "multicomponent dishes that incorporate diverse ingredients may reduce recall accuracy" across different cultural contexts [5]. This is particularly relevant for Asian-style diets consisting of rice-based meals with multiple shared dishes, where individuals may struggle to recall and estimate portions of numerous component foods [5].

Figure 1: Systematic Adaptation Framework for Diverse Populations. This workflow outlines key considerations when modifying 24-hour dietary recall methods for specific population groups.

Table 3: Research Reagent Solutions for 24HR Validation Studies

Tool/Resource	Primary Function	Application Context	Key Features
Weighed Food Records	Validation reference method [5]	Feeding studies in controlled settings	Provides objective consumption data; gold standard for validation
Doubly Labeled Water (DLW)	Energy expenditure measurement [49] [70]	Identification of under-reporting	Objective biomarker for total energy expenditure comparison
Automated Self-Administered 24HR (ASA24)	Self-administered dietary recall [7]	Large-scale studies in diverse populations	Free, web-based, automated coding, multiple non-consecutive days
Foodbook24	Culturally adapted 24HR tool [8]	Multicultural population studies	Expandable food lists, multilingual interfaces, portion size images
Intake24	Progressive recall platform [39]	Populations with memory limitations	Multiple brief recalls throughout day, reduced retention intervals
Multiple-Pass Interview Protocol	Enhanced recall completeness [49]	Interviewer-administered recalls	Structured passes to prompt memory and reduce omissions

The validation of 24-hour dietary recalls in low-income and multicultural settings requires thoughtful adaptations addressing cultural, cognitive, and socioeconomic factors. Key considerations include expanding food lists to reflect cultural dietary patterns, providing multilingual interfaces, implementing methods to reduce memory burden, and acknowledging the significant challenges of accurate dietary reporting in low-income elderly populations.

The evidence suggests that while the 24HR method can provide reasonably accurate estimates of energy and nutrient intake at the group level, significant systematic errors persist at the individual level, particularly in vulnerable populations. Future methodological development should focus on enhancing portion size estimation accuracy, reducing participant burden through technological innovations, and developing standardized adaptation protocols for diverse cultural contexts. As one researcher notes, "Despite all of the challenges and flaws, the data collected using self-reported dietary assessment methods are extremely valuable" [70], highlighting the continued importance of methodological refinement in this field.

Measurement error is a pervasive challenge in scientific research, particularly in fields reliant on self-reported data like nutrition and epidemiology. These errors, which can be either random (reducing precision) or systematic (reducing accuracy), significantly threaten the validity of study findings [71] [72]. In dietary assessment, the 24-hour dietary recall (24HR) is a standard method, but it is susceptible to inaccuracies due to factors like memory lapses, estimation errors, and social desirability bias [71] [73].

The digital transformation of data collection offers promising solutions to mitigate these errors. This guide objectively compares digital platforms for 24HR collection, summarizes validation data, details experimental protocols, and provides visual workflows to help researchers select and implement the most appropriate technological tools for their studies.

Comparative Analysis of Digital Dietary Assessment Platforms

The following table summarizes key digital platforms developed for self-administered 24-hour dietary recalls, which can reduce the logistical burden and potential interviewer-induced biases of traditional methods [45].

Table 1: Comparison of Digital Self-Administered 24-Hour Dietary Recall Platforms

Platform Name	Key Features & Methodology	Reported Performance vs. Traditional 24HR	Target Population & Context
Nutrition Data (Sweden)	Web-based program with mobile view; linked to national food databases; features for carbohydrate counting and insulin tracking for diabetes management [74].	Good validity for energy and macronutrients (Spearman's r=0.79 for energy, r=0.94 for carbohydrates); high user acceptability (88% found it helpful for carb counting) [74].	Swedish adults with Type 1 Diabetes; research and clinical carbohydrate counting [74].
FOODCONS 1.0 (Italy)	Web-based software using the Multiple-Pass Method (quick list, forgotten foods, time/place, detail, review); linked to Italian food composition databases [45].	No significant difference in mean energy/nutrient intakes vs. interviewer-led 24HR; good agreement for energy, carbohydrates, and fiber (Bland-Altman analysis) [45].	Italian adult population; designed for national food consumption surveys following EU Menu guidelines [45].
ASA24 & Intake24 (International)	Automated, self-administered, web-based 24HR systems designed for large-scale data collection [73].	Error in energy estimation is associated with individual cognitive function (e.g., visual attention); poorer performance on Trail Making Test associated with greater error [73].	General population research; used in controlled feeding studies to investigate sources of measurement error [73].

Experimental Protocols for Validating Digital Tools

To ensure the validity of digital tools, rigorous comparison studies are essential. The core methodology involves a crossover design where participants complete both the digital tool and a reference method.

Core Workflow for Platform Validation

The general experimental approach for validating a self-administered digital recall against an interviewer-led recall is summarized in the workflow below.

Detailed Methodology

The validation of the FOODCONS 1.0 software provides a concrete example of a robust protocol [45]:

Participant Recruitment: A convenience sample of adults (e.g., 39 subjects) is recruited. Inclusion criteria typically involve age (18-64 years) and regular internet access. Exclusion criteria often include pregnancy, medically prescribed diets, and professional nutrition expertise to avoid biased results.
Study Design: A randomized crossover design is employed over two non-consecutive study days, including at least one weekend day to capture different eating patterns. Participants are randomized into two groups (A and B) to control for order effects.
Data Collection Procedure: On a study day, one group completes a self-administered 24HR using the digital platform (the test method). After a short interval (e.g., 3 hours), the same group completes an interviewer-led 24HR using the same software platform (the reference method). Using the same software for both methods ensures any differences are due to the mode of administration, not the underlying nutrient database. After a washout period (e.g., 15 days), the process is repeated with the groups switching the order of administration.
Data Analysis: Nutrient and food group intakes from both methods are compared. Statistical analyses include:
- Paired sample t-tests to check for significant differences in mean intakes of energy and nutrients.
- Correlation analysis (e.g., Spearman's rank) to assess the strength of the relationship between the two methods.
- Bland-Altman plots to evaluate the agreement between methods and identify any systematic bias [74] [45].
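The paired comparison and the numerical core of a Bland-Altman analysis can be sketched as follows. This is a minimal standard-library illustration with invented energy intakes, not the FOODCONS study's analysis code. It reports the mean difference (bias), the 95% limits of agreement (bias ± 1.96 SD of the differences), and the paired t statistic. No p-value is computed here; the statistic would be compared with a critical value for n − 1 degrees of freedom or passed to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_agreement(test, reference):
    """Bias, 95% limits of agreement, and paired t statistic."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equal-length samples with n >= 2")

    diffs = [t - r for t, r in zip(test, reference)]
    n = len(diffs)
    bias = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the paired differences

    return {
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff),
        "t_statistic": bias / (sd_diff / math.sqrt(n)),
        "df": n - 1,
    }


# Invented energy intakes (kcal/day) for six participants.
self_administered = [2100, 1850, 2400, 1950, 2250, 1700]
interviewer_led = [2000, 1900, 2300, 2050, 2200, 1750]

agreement = paired_agreement(self_administered, interviewer_led)
print(agreement)
```

In this example the bias is small and the t statistic is well below 2.571, the two-sided 5% critical value for 5 degrees of freedom, yet the limits of agreement span several hundred kcal. This is the typical pattern in which group means agree while individual estimates do not.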

The Scientist's Toolkit: Key Reagents and Materials

For researchers designing a validation study for digital dietary tools, the following resources are essential.

Table 2: Essential Research Materials for Dietary Assessment Validation Studies

Item / Solution	Function & Application in Validation Research
Web-Based Dietary Software	The core test instrument. Platforms like FOODCONS 1.0 or ASA24 are used for both self-administered and interviewer-led recalls to ensure comparability [45].
Validated Food Composition Database	A critical backend component. The software must be linked to a comprehensive and updated national or regional food database (e.g., Sweden's, Italy's) to accurately convert reported foods into nutrient data [71] [74].
Portion Size Visualization Aids	Standardized tools like picture atlases, household measure guides, or portioning utensils. These help participants estimate quantities more accurately, reducing a major source of measurement error [45].
Cognitive Assessment Tools	Validated neuropsychological tests (e.g., Trail Making Test, Wisconsin Card Sorting Test) used to quantify participants' executive function, visual attention, and working memory, which are known to influence recall accuracy [73].
Quality Control Protocols	Standardized scripts for interviewer-led recalls, training manuals for participants using self-administered tools, and data quality checks to ensure consistency and minimize operator-dependent error [71].

Advanced Concepts: Understanding and Modeling Measurement Error

To effectively leverage technology, one must understand the error it aims to reduce. The "Classical Measurement Error" model is frequently applicable to self-reported dietary data. It posits that an error-prone measured value (X*) varies randomly around the true value (X) [72] [75]. This random error increases variability and attenuates (weakens) observed correlations toward zero.

Digital platforms can help mitigate this by standardizing probes and automating checks. In contrast, differential error occurs when the measurement error is related to the study outcome, which can cause more severe bias. An example would be if individuals in a therapy group systematically under-reported unhealthy behaviors more than the control group due to perceived experimenter demand [72] [76]. Digital self-administration, by removing the direct interaction with an interviewer, can potentially reduce this type of social desirability bias.
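The attenuation produced by classical measurement error can be illustrated with a small simulation. Under the classical model the measured value is X* = X + U, with U independent of both X and the outcome, and the observed correlation shrinks by the square root of the reliability ratio var(X) / (var(X) + var(U)). The sketch below uses only the standard library; all parameter values are arbitrary choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate_attenuation(n=20000, sd_true=1.0, sd_error=1.0,
                         slope=0.5, seed=42):
    """Return (correlation with true X, correlation with error-prone X*)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, sd_true) for _ in range(n)]
    # Outcome depends on the true exposure plus unrelated noise.
    outcome = [slope * x + rng.gauss(0, 1) for x in x_true]
    # Classical error: random noise added to the true value.
    x_measured = [x + rng.gauss(0, sd_error) for x in x_true]
    return pearson(x_true, outcome), pearson(x_measured, outcome)


r_true, r_observed = simulate_attenuation()
expected_shrinkage = math.sqrt(1.0 / (1.0 + 1.0))  # sqrt of reliability ratio
print(round(r_true, 3), round(r_observed, 3), round(expected_shrinkage, 3))
```

With equal true and error variances the reliability ratio is 0.5, so the observed correlation should be roughly 0.71 times the true one, apart from sampling noise.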

Digital platforms for 24-hour dietary recall present a valid and efficient alternative to traditional interviewer-led methods. Evidence from studies on tools like FOODCONS 1.0 and Nutrition Data demonstrates they can achieve good agreement with reference methods while offering advantages in cost, scalability, and reduced participant burden [74] [45].

The choice of platform should be guided by the target population, the specific dietary components of interest, and the required level of precision. Researchers should consider factors such as the incorporated food composition database, user interface design, and the cognitive demands placed on participants. As technology evolves, the integration of digital tools with novel methods to account for individual cognitive differences and systematic biases will be crucial for further enhancing the accuracy of dietary assessment in research.

Comparative Analysis of Validation Studies from Pediatrics to Geriatrics

Accurately measuring dietary intake in pediatric populations is essential for nutrition research, yet it presents significant methodological challenges. This review evaluates the validity of the 24-hour dietary recall method for estimating energy intake in children by comparing it against total energy expenditure measured via the doubly labeled water technique, the established gold standard. The analysis synthesizes findings from key validation studies, examining how factors such as the number of recall days, the child's age, and specific interview protocols influence accuracy. Evidence indicates that while multiple days of 24-hour recalls can provide valid estimates of group-level energy intake, their precision for assessing individual intake is limited. This review consolidates the strengths and limitations of this dietary assessment tool to guide its appropriate application in pediatric research.

In the study of diet-disease relationships and energy balance in children, the accurate assessment of dietary intake is a fundamental prerequisite. Among the various available methods, the 24-hour dietary recall (24HR) is widely used in large-scale studies and national nutrition surveys due to its relatively low participant burden and cost [11]. However, as a self-reported method, its validity is perpetually in question, particularly in pediatric populations where memory, attention span, and comprehension abilities are still developing.

The principle of energy physiology states that in weight-stable individuals, total energy expenditure (TEE) should equal energy intake (EI). The doubly labeled water (DLW) method provides a non-invasive, precise measure of TEE under free-living conditions and is considered the gold standard for validating dietary assessment methods [11] [77]. Unlike food records or other dietary tools that also rely on self-report, the DLW method is objective and not subject to correlated reporting errors. Therefore, comparing estimated energy intake from 24-hour recalls to TEE from DLW offers the most robust approach for validating the recall method.

This review aims to critically examine studies that have employed this validation framework in pediatric populations, focusing on the performance of 24-hour recalls across different ages, the impact of study protocol variations, and the method's suitability for both group-level and individual-level dietary assessment.

Methodological Approaches in Key Validation Studies

The validation of 24-hour recalls against TEE involves meticulous protocols for both dietary assessment and energy expenditure measurement. The following table summarizes the design and key characteristics of pivotal studies in children:

Table 1: Overview of Key Pediatric Validation Studies Comparing 24-Hour Recall with Doubly Labeled Water

Study (Population)	Sample Size & Age	24-Hour Recall Protocol	Doubly Labeled Water Protocol	Key Comparison Metrics
Montgomery et al. (2005) [12]	63 children; median age: 6 years	A single 24-hour multiple pass recall (24h MPR).	TEE measured over a specific period (methodology standard for DLW studies).	Mean difference between EI and TEE (bias); Limits of agreement.
Johnson et al. (1996) [77]	24 children; ages: 4-7 years	Three multiple-pass 24-hour recalls over a 14-day period.	TEE measured over 14 days under free-living conditions.	Paired t-test of mean 3-day EI vs TEE; Pearson correlation for individual measures.
Lytle et al. (1993) [78]	49 children; age: 8 years (3rd grade)	24-hour recall assisted by parent-completed food records.	No DLW; reference was direct observation (school meals) + parent observation & recording (home).	Paired t-tests; Pearson/Spearman correlations; Percentage agreement on food items.

A critical examination of these methodologies reveals several key components:

The Doubly Labeled Water Technique: The DLW method involves oral administration of a dose of water containing stable, non-radioactive isotopes of hydrogen (²H) and oxygen (¹⁸O). Urine samples are then collected over a period of time (typically 7-14 days). The differential elimination rates of the two isotopes from the body are used to calculate the rate of carbon dioxide production (rCO₂), which is then converted to TEE using a standardized equation [11]. This process requires specialized isotopic analysis equipment and expertise.
The Multiple-Pass 24-Hour Recall: This structured interview technique is designed to enhance memory and reduce omission. It typically involves five distinct passes: 1) a quick list of all foods and beverages consumed; 2) a forgotten foods probe; 3) the collection of detailed descriptions and amounts; 4) a time and occasion review; and 5) a final probe [77]. This method has become the standard in many large-scale surveys.
Assisted Recall and Observation: In younger children or complex studies, recalls may be assisted by tools such as food records kept by parents [78] or, more recently, by images of meals captured on mobile devices [79]. These aids are intended to provide a reference to improve the accuracy of the child's recall.
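The final step of the DLW calculation, converting the carbon dioxide production rate to TEE, can be sketched as follows. The source does not name the equation used in [11]; this example assumes the abbreviated Weir equation, a respiratory (food) quotient of 0.85, and a molar gas volume of 22.4 L/mol, all of which are assumptions that individual studies may set differently. Deriving rCO₂ from the isotope elimination rates is not shown.

```python
KJ_PER_KCAL = 4.184
LITRES_PER_MOL = 22.4  # assumed molar gas volume


def tee_from_rco2(rco2_mol_per_day, respiratory_quotient=0.85):
    """Estimate TEE (kcal/day) from CO2 production (mol/day).

    Uses the abbreviated Weir equation (an assumed choice):
        EE (kcal) = 3.941 * VO2 (L) + 1.106 * VCO2 (L)
    with VO2 derived from VCO2 and the assumed respiratory quotient.
    """
    if not 0.7 <= respiratory_quotient <= 1.0:
        raise ValueError("respiratory quotient outside physiological range")
    vco2_litres = rco2_mol_per_day * LITRES_PER_MOL
    vo2_litres = vco2_litres / respiratory_quotient
    return 3.941 * vo2_litres + 1.106 * vco2_litres


# Hypothetical CO2 production rate of 12 mol/day.
tee_kcal = tee_from_rco2(12.0)
tee_kj = tee_kcal * KJ_PER_KCAL
print(round(tee_kcal), round(tee_kj))
```

Because VO₂ is back-calculated from the assumed quotient, the TEE estimate is sensitive to that choice; a lower quotient yields a higher TEE for the same rCO₂.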

The following diagram illustrates the typical workflow for a validation study comparing 24-hour recall to the doubly labeled water method.

Comparative Analysis of Validation Data

Synthesizing data from key studies allows for a critical evaluation of the 24-hour recall's performance. The following table presents a quantitative summary of validation outcomes:

Table 2: Summary of Validation Results from Pediatric Studies

Study (Population)	Mean Difference (EI - TEE)	Correlation between EI & TEE	Conclusion on Validity
Montgomery et al. (2005), 6-year-olds [12]	Group bias: +250 kJ/day; overreporting: 7% (girls), 0.9% (boys); wide limits of agreement: (-2880, 2380 kJ/d)	Not specified in the abstract.	Individual level: "Inaccurate at individual level." Estimates become less inaccurate with age.
Johnson et al. (1996), 4-7 year-olds [77]	No significant difference for the group (p=0.65).	Not statistically significant (r=0.25, p=0.24).	Group level: "Sufficient to make valid group estimates." Individual level: "Not precise for individual measurements."
Lytle et al. (1993), 8-year-olds [78]	No significant difference for % energy from fat or sodium; difference in total energy.	Spearman correlations ranged from 0.45 to 0.79 for nutrients.	Method is valid for group comparison in children as young as 8.

Interpretation of Key Findings

Group vs. Individual Validity: A consistent theme across studies is that multiple days of 24-hour recalls can provide a valid measure of mean energy intake for a group of children. This is evidenced by the lack of significant difference between mean estimated EI and TEE in the study by Johnson et al. [77]. However, the low and non-significant correlation at the individual level, coupled with the wide limits of agreement reported by Montgomery et al. [12], clearly demonstrates that the method is not suitable for classifying the intake of individual children (e.g., for diagnosing over- or under-consumption).
Impact of Age and Cognitive Development: The ability of children to accurately recall and report their food intake appears to improve with age. Montgomery et al. noted that the bias in their study of 6-year-olds was lower than in previous studies of pre-school children, suggesting that "estimates of EI become less inaccurate as children age" [12]. Lytle et al. specifically concluded that the method is valid for group comparisons in children as young as 8 years old [78].
The Importance of Multiple Recall Days: Relying on a single 24-hour recall introduces significant random error due to day-to-day variation in a child's diet. A study in an urban Mexican population, while not using DLW, highlighted that using three 24-hour recalls instead of one dramatically improved the estimation of usual intake distributions and reduced misclassification of nutrient inadequacy [10]. For example, the estimated prevalence of folate inadequacy in preschool children dropped from 30% with one day to 3.7% with three days [10]. This underscores that multiple non-consecutive recalls are essential for capturing habitual intake.
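The effect of the number of recall days on estimated inadequacy can be illustrated with a simple variance argument. If intake varies between children with standard deviation σ_b and from day to day within a child with standard deviation σ_w, the mean of d recall days has variance σ_b² + σ_w²/d. Fewer days therefore widen the observed distribution and push more children below a requirement cut-point. The sketch below assumes normally distributed intakes and invented parameter values. It is not the adjustment method used in [10] and does not attempt to reproduce the 30% and 3.7% figures.

```python
import math
from statistics import NormalDist


def prevalence_below(cutoff, mean_intake, sd_between, sd_within, days):
    """Share of children whose d-day mean intake falls below the cutoff."""
    sd_observed = math.sqrt(sd_between ** 2 + sd_within ** 2 / days)
    return NormalDist(mean_intake, sd_observed).cdf(cutoff)


# Invented values for a nutrient (units/day).
params = dict(cutoff=200, mean_intake=300, sd_between=50, sd_within=120)

one_day = prevalence_below(days=1, **params)
three_days = prevalence_below(days=3, **params)
# Usual intake: the day-to-day component averages out entirely.
usual = NormalDist(params["mean_intake"], params["sd_between"]).cdf(
    params["cutoff"])

print(round(one_day, 3), round(three_days, 3), round(usual, 3))
```

Under these assumptions a single day overstates the prevalence of inadequacy several-fold relative to usual intake, and three days remove part but not all of the inflation.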

The Researcher's Toolkit: Essential Reagents and Materials

Successful execution of a DLW-validated 24-hour recall study requires specific materials and methodological tools. The following table details key components of the research toolkit.

Table 3: Essential Research Reagents and Materials for DLW-Recall Validation Studies

Item / Solution	Specifications / Function	Application in Research
Stable Isotopes	¹⁸O-labeled water (e.g., 10% enriched) and ²H-labeled water (e.g., 99.9% enriched) [11].	Administered orally to subjects to label the body water pool and initiate the TEE measurement period.
Isotope Ratio Mass Spectrometer	High-precision analytical instrument (e.g., Finnigan Delta Plus) [11].	Used to analyze the isotopic enrichment (²H and ¹⁸O) in collected urine samples.
Structured Interview Protocol	Multiple-Pass 24-Hour Recall format [77].	A standardized interview script to systematically guide the child/parent through the recall process, minimizing memory lapse.
Portion Size Estimation Aids	Standard household measures, food models, photographs, or digital interfaces.	Critical tools to help children and parents visually conceptualize and report the amounts of food consumed.
Nutrition Analysis Software	Culture-specific databases (e.g., CAN-Pro 4.0 for Korean diets [11]).	Converts the reported foods and portions into estimated energy and nutrient intakes.
Informed Assent Documents	Age-appropriate, easy-to-understand forms, potentially using comics or visuals [80].	Ethical requirement to ensure the child participant understands the research procedures and agrees to participate.

The validation of 24-hour dietary recalls against total energy expenditure measured by doubly labeled water provides a robust framework for assessing the utility of this common dietary assessment tool in pediatric research. The collective evidence indicates that the 24-hour recall, particularly when employing multiple-pass methods and administered over several non-consecutive days, can yield sufficiently accurate data for evaluating mean energy and nutrient intakes at the group level in children as young as 8 years old.

However, researchers must be acutely aware of the method's significant limitations. The wide limits of agreement and poor individual-level correlations mean that 24-hour recalls are not valid for assessing the intake of a single child or for precise classification within a population. The choice to use this method must therefore be guided by the specific research question and the required level of precision. Future advancements, such as the integration of image-assisted recalls and automated self-administered tools [79], hold promise for reducing measurement error and enhancing the feasibility of collecting a larger number of recall days, thereby improving the accuracy of estimating usual intake in pediatric populations.

Aging is a natural, gradual, and irreversible process associated with disruptions in homeostasis, causing several unfavorable changes in body composition and metabolism [81]. These physiological shifts present unique challenges for nutritional science, particularly for validating dietary assessment methods like the 24-hour recall. As global populations age, understanding how aging physiology affects the accuracy of self-reported dietary intake becomes crucial for research quality and public health policy. This review synthesizes current evidence on how age-related changes in body composition and energy metabolism impact the validation of 24-hour dietary recalls across different adult age groups, providing researchers with methodological insights for conducting age-stratified dietary validation studies.

The validation of dietary assessment tools is typically more complex in older adults compared to younger populations due to a constellation of age-related factors. These include changes in body composition (increased fat mass and decreased lean muscle mass), declining energy expenditure, and potential cognitive changes that affect memory recall [82] [81] [83]. Furthermore, the high prevalence of multimorbidity in older populations adds layers of complexity to dietary validation studies. Understanding these physiological factors is essential for designing robust validation protocols that account for age-specific characteristics rather than simply applying methods developed for younger populations.

Physiological Changes in Aging with Validation Implications

Body Composition Alterations

Age-related body composition changes follow a predictable pattern that directly impacts dietary assessment and validation protocols. The most significant changes include:

Sarcopenia: Progressive loss of skeletal muscle mass and strength, with muscle mass peaking around age 30 and declining by 20-40% by age 70 [81]. This change is particularly pronounced in women [81]. The Health ABC study demonstrated that both fat and lean mass independently contribute to mortality risk, with loss of thigh muscle area associated with higher mortality [82].
Adipose Tissue Redistribution: Increases in total body fat mass with a preferential redistribution toward abdominal viscera and infiltration into muscle tissue (myosteatosis) [82]. One study of older Sri Lankans found those ≥70 years had 2.17 times higher odds of high body fat mass compared to those aged 60-64, even after controlling for confounders [84].
Hydration Changes: Aging is associated with decreased muscle water content, which can affect both body composition measurements and potentially fluid intake reporting [81].

These body composition changes are not merely anthropometric concerns; they directly influence metabolic rate, nutrient partitioning, and energy requirements—all fundamental to dietary validation methodologies.

Metabolic and Energetic Adaptations

The aging process brings significant changes to energy metabolism that must be considered in validation studies:

Declining Energy Expenditure: Resting metabolic rate decreases with age due primarily to loss of metabolically active tissue (muscle) and changes in hormonal status [83]. This reduction in energy requirements creates challenges for using traditional energy intake plausibility cut-offs developed for younger populations with higher energy needs.
Altered Nutrient Partitioning: With advancing age, nutrients are differentially allocated between metabolic pathways, with a tendency toward increased fat storage and reduced muscle protein synthesis [81].
Disrupted Energy Homeostasis: The precision of energy intake regulation may diminish with age, potentially leading to more day-to-day variation in energy intake that complicates the estimation of usual intake from single 24-hour recalls [83].

Table 1: Key Age-Related Physiological Changes Affecting 24-Hour Recall Validation

Physiological Parameter	Young Adults (18-35 yrs)	Middle-Aged (36-65 yrs)	Older Adults (65+ yrs)	Validation Implications
Skeletal Muscle Mass	Stable at peak levels	Initial decline begins (∼3-8% per decade)	Accelerated loss (20-40% from peak)	Alters energy requirement estimates for plausibility checks
Fat Mass Distribution	Stable with typical distribution	Beginning of central adiposity	Visceral fat accumulation; muscle fat infiltration	Affects metabolic rate prediction equations
Resting Metabolic Rate	Highest level relative to body weight	Moderate decline (∼1-2% per decade)	Significant decline	Requires age-adjusted cut-offs for under/over-reporting detection
Body Water Content	Optimal hydration	Moderate decline in intracellular water	Significant decrease in muscle water content	Complicates fluid intake assessment and body composition measurement
Energy Requirement Variability	Low day-to-day variability	Moderate variability	High variability due to health fluctuations	Increases required number of recall days for usual intake estimation

Methodological Considerations for Validation Studies Across Age Groups

Reference Method Selection

Choosing appropriate reference methods for validating 24-hour recalls in different age groups requires careful consideration of age-related physiological factors:

Doubly Labeled Water (DLW): Considered the gold standard for measuring energy expenditure in free-living conditions [83]. However, the assumption of energy balance during measurement may be less valid in older populations experiencing unintentional weight changes. Recent research has proposed novel methods comparing reported energy intake (rEI) to measured energy intake (mEI) calculated as measured energy expenditure (mEE) plus changes in energy stores, which may better account for weight instability in older adults [83].
Controlled Feeding Studies: Provide the most precise measure of actual intake but may have limited ecological validity, especially for older adults with specific dietary habits [50]. The study with older Korean women demonstrated that interviewer-administered 24-hour recalls accurately reported 95% of foods consumed, though sauces and kimchi were frequently underreported [50].
Biomarkers: Recovery biomarkers (e.g., nitrogen, potassium) provide objective measures of specific nutrient intake but are limited in number and expensive to implement in large studies, particularly across multiple age groups.
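The energy balance approach described for DLW above can be sketched numerically: measured energy intake (mEI) is taken as measured energy expenditure (mEE) plus the change in body energy stores, and reported intake (rEI) is compared against it. In the sketch below the ±15% plausibility band and all input values are hypothetical; the cited study's actual cut-offs and its method for quantifying energy stores are not reproduced.

```python
def measured_energy_intake(mee_kcal_day, delta_stores_kcal_day):
    """mEI = mEE + change in energy stores (negative during weight loss)."""
    return mee_kcal_day + delta_stores_kcal_day


def classify(rei, reference, tolerance=0.15):
    """Classify reporting against a reference; tolerance is hypothetical."""
    ratio = rei / reference
    if ratio < 1 - tolerance:
        return "under-reported"
    if ratio > 1 + tolerance:
        return "over-reported"
    return "plausible"


# Hypothetical older adult who is losing weight.
mee = 2200.0           # measured energy expenditure, kcal/day
delta_stores = -300.0  # energy stores falling by 300 kcal/day
rei = 1800.0           # reported energy intake, kcal/day

mei = measured_energy_intake(mee, delta_stores)
vs_mee = classify(rei, mee)  # traditional comparison assumes energy balance
vs_mei = classify(rei, mei)  # accounts for negative energy balance
print(mei, vs_mee, vs_mei)
```

In this example the same recall is flagged as under-reported against mEE but is plausible against mEI, which is how a person in negative energy balance can be misclassified when energy balance is assumed.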

Age-Stratified Validation Protocols

Validation protocols should be adapted for different age groups to account for physiological differences:

Younger Adults: Standard protocols using predicted energy expenditure equations generally perform adequately in this population with stable body composition and higher physical activity levels [4].
Middle-Aged Adults: Increasing variability in body composition and the emergence of chronic conditions necessitate more sophisticated approaches. The use of non-consecutive recall days (including both weekdays and weekends) becomes increasingly important [85].
Older Adults: Require special considerations including [84] [83] [50]:
- Age-adjusted physical activity levels for Goldberg cut-offs
- Accounting for weight stability issues in reference method selection
- Potential cognitive screening to ensure recall capability
- Consideration of multimorbidity and medication use
- Potentially longer data collection periods to account for higher day-to-day variability
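
As a rough sketch of how an age-adjusted physical activity level (PAL) changes Goldberg cut-offs, the widely published Goldberg formulation can be coded as follows. The default coefficients of variation (23%, 8.5%, 15%) are commonly quoted values treated here as assumptions, and the PAL values of 1.55 and 1.40 are hypothetical choices for illustration, not recommendations from the cited studies.

```python
import math


def goldberg_cutoffs(pal, n_days, n_subjects=1,
                     cv_w_ei=23.0, cv_w_b=8.5, cv_t_p=15.0, sd=2.0):
    """Return (lower, upper) cut-offs for the ratio rEI:BMR.

    cv_w_ei: within-subject CV of energy intake (%), assumed default
    cv_w_b:  CV of repeated or predicted BMR (%), assumed default
    cv_t_p:  total CV of physical activity level (%), assumed default
    """
    s = math.sqrt(cv_w_ei ** 2 / n_days + cv_w_b ** 2 + cv_t_p ** 2)
    factor = math.exp(sd * (s / 100.0) / math.sqrt(n_subjects))
    return pal / factor, pal * factor


def classify_reporter(rei_kcal_d, bmr_kcal_d, pal, n_days):
    """Classify one individual's reporting status from rEI:BMR."""
    lower, upper = goldberg_cutoffs(pal, n_days)
    ratio = rei_kcal_d / bmr_kcal_d
    if ratio < lower:
        return "under-reporter"
    if ratio > upper:
        return "over-reporter"
    return "plausible reporter"


# Same hypothetical recall data, two assumed PAL values.
print(classify_reporter(1300, 1400, pal=1.55, n_days=2))  # generic adult PAL
print(classify_reporter(1300, 1400, pal=1.40, n_days=2))  # lower PAL, older adult
```

With these hypothetical inputs, the same reported intake is flagged as under-reporting under the generic PAL but falls within the plausible range once a lower PAL is assumed, which is why the choice of PAL matters for older adults.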

Table 2: Comparison of Validation Study Findings Across Age Groups

Study Population	Validation Method	Key Findings	Age-Specific Considerations
Adolescents (12-17 years) [4]	Web-based vs interviewer-administered 24-hour recall	R24W showed 8.8% higher energy intake vs interviewer-administered recall; Significant differences for saturated fat (25.2% higher)	Rapid growth and development affect energy needs; Higher day-to-day intake variability
Young & Middle-Aged Adults [85]	Multiple 24HR forms (2-3 days) vs 28 days as reference	Non-consecutive days more accurate than consecutive; Including weekend day crucial	Form of 24HR (consecutive vs non-consecutive) significantly affects accuracy
Older Adults (Korean Women) [50]	Controlled feeding vs interviewer-administered 24-hour recall	95% match rate for foods; 43% of portion sizes underreported; Fat and sodium underreported	Traditional Korean meal structure (rice, soup, kimchi, banchans) presents specific recall challenges
Older Adults (50-75 years) [83]	Dietary recalls vs DLW and energy balance method	50% under-reporting rate; Novel mEI method identified more over-reporting than traditional mEE method	Weight instability common; Traditional methods may misclassify those in negative energy balance

Experimental Protocols for Age-Stratified Validation

Doubly Labeled Water Protocol for Energy Expenditure Measurement

The DLW method provides the most accurate measure of total energy expenditure in free-living populations across age groups [83]:

Baseline Sample Collection: Collect pre-dose urine sample after an overnight fast.
Dose Administration: Orally administer a dose comprising 1.68 g per kg of body water of oxygen-18 water (10.8 atom percent excess, APE) and 0.12 g per kg of body water of deuterium oxide (99.8 APE).
Post-Dose Sampling: Collect urine samples within 3-4 hours post-dose, then collect two further samples 12 days after ingestion, following the two-point protocol.
Isotope Analysis: Analyze samples using isotope ratio mass spectrometers (Delta V IRMS and Delta Plus IRMS Thermo Fisher).
Energy Expenditure Calculation: Calculate carbon dioxide production (rCO₂) using the equation from Speakman et al. (2021), assuming a respiratory quotient of 0.86, then convert to total daily energy expenditure using the Weir equation.

This protocol requires modification for older adults, who may have impaired renal function or fluid balance issues that affect isotope elimination kinetics.
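
The dosing and energy expenditure steps above can be sketched in a few lines. This is a minimal illustration, assuming total body water is already known and that rCO₂ has been obtained separately from the isotope elimination rates; the rCO₂ equation itself is not reproduced here. The abbreviated Weir coefficients (3.941 and 1.106 kcal per litre) and the molar gas volume of 22.4 L/mol are standard values; the function names and example inputs are hypothetical.

```python
LITRES_PER_MOL = 22.4  # molar volume of an ideal gas at standard conditions


def isotope_doses_g(total_body_water_kg, o18_g_per_kg=1.68, d2o_g_per_kg=0.12):
    """Return (oxygen-18 water dose, deuterium oxide dose) in grams."""
    return (total_body_water_kg * o18_g_per_kg,
            total_body_water_kg * d2o_g_per_kg)


def tee_from_rco2(rco2_mol_per_day, rq=0.86):
    """Convert CO2 production (mol/day) to TEE (kcal/day).

    Uses the abbreviated Weir equation (no urinary nitrogen term):
    TEE = 3.941 * VO2 + 1.106 * VCO2, with gas volumes in litres/day.
    """
    vco2_l = rco2_mol_per_day * LITRES_PER_MOL
    vo2_l = vco2_l / rq  # respiratory quotient = VCO2 / VO2
    return 3.941 * vo2_l + 1.106 * vco2_l


# Hypothetical participant: 35 kg of body water, rCO2 of 20 mol/day.
o18_dose, d2o_dose = isotope_doses_g(35.0)
print(round(o18_dose, 1), round(d2o_dose, 1))  # grams of each labelled water
print(round(tee_from_rco2(20.0)))              # kcal/day
```

Because the respiratory quotient enters the conversion directly, an assumed value of 0.86 that does not match a participant's actual diet will shift the resulting TEE estimate.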

Controlled Feeding Study Protocol with 24-Hour Recall Validation

The controlled feeding protocol provides the strongest design for validating food and nutrient intake reporting across age groups [50]:

Participant Housing: House participants in a controlled environment for 5 days with all meals provided.
Dietary Provision: Provide 3 meals and 2-3 snacks per day, with all ingredients measured to the nearest 0.1 g during preparation and foods measured to the nearest 1 g before serving.
Compliance Monitoring: Research staff should directly monitor meal consumption during mealtimes and check returned food trays to ensure compliance.
Randomized Recall Administration: Randomly select participants for interviewer-administered 24-hour recalls on one of the feeding days, without prior notice of the specific recall day.
Structured Interview Protocol: Conduct face-to-face interviews using the Multiple-Pass Method, adapted to include culturally specific frequently forgotten foods and using appropriate portion size estimation aids.
Data Analysis: Compare reported versus actual food items (categorized as matches, exclusions, or intrusions) and portion sizes (categorized as corresponding [≤10% error], overreported, or underreported).

This protocol was successfully implemented with older Korean women, demonstrating high accuracy for most foods though revealing specific cultural challenges with items like sauces and kimchi [50].
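
The data analysis step of this protocol can be sketched as follows. This is a simplified illustration that matches foods by exact name and uses the 10% tolerance stated above; the food names, gram weights, and function names are hypothetical, and a real analysis would need rules for matching similar or differently described foods.

```python
def classify_items(served, reported):
    """Compare served and reported food names.

    matches:    served and reported
    exclusions: served but not reported (omissions)
    intrusions: reported but not served
    """
    served_set, reported_set = set(served), set(reported)
    return {
        "matches": sorted(served_set & reported_set),
        "exclusions": sorted(served_set - reported_set),
        "intrusions": sorted(reported_set - served_set),
    }


def classify_portion(actual_g, reported_g, tolerance=0.10):
    """Label a reported portion relative to the weighed amount."""
    error = (reported_g - actual_g) / actual_g
    if abs(error) <= tolerance:
        return "corresponding"
    return "overreported" if error > 0 else "underreported"


# Hypothetical meal: weighed amounts served vs. amounts recalled (grams).
served = {"rice": 210, "soup": 250, "kimchi": 40, "soy sauce": 5}
reported = {"rice": 200, "soup": 350, "kimchi": 20, "fruit": 100}

items = classify_items(served, reported)
print(items)
for food in items["matches"]:
    print(food, classify_portion(served[food], reported[food]))

match_rate = len(items["matches"]) / len(served)
print(f"match rate: {match_rate:.0%}")
```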

Figure 1: Interrelationship Between Aging Physiology and Validation Challenges

The Scientist's Toolkit: Essential Reagents and Materials

Table 3: Essential Research Materials for Age-Stratified Dietary Validation Studies

Item	Specification	Application in Validation Research
Doubly Labeled Water Kits	¹⁸O and ²H isotopes with precise dosing materials	Gold-standard measurement of total energy expenditure in free-living conditions across age groups [83]
Bioelectrical Impedance Analyzers	Multi-frequency devices with age-specific equations	Assessment of body composition changes (fat mass, lean mass, body water) in field settings [84]
Dual-Energy X-ray Absorptiometry (DXA)	Fan-beam systems with standardized calibration	Precise measurement of body composition compartments (fat mass, lean mass, bone mineral density) in controlled settings [82] [81]
Stadiometers and Calibrated Scales	Digital precision to 1 mm and 0.1 kg respectively	Accurate anthropometric measurements for metabolic prediction equations [83] [50]
Portion Size Estimation Aids	Food models, measuring guides, concentric circles, rectangular grids, rulers [50]	Improved accuracy of portion size reporting in 24-hour recalls across age groups
Structured Interview Protocols	Multiple-Pass Method adapted for cultural and age-specific factors [50]	Standardized administration of 24-hour recalls to minimize interviewer effects
Dietary Analysis Software	Country-specific nutrient databases (e.g., CAN-Pro 5.0, DIETA 6.0) [81] [50]	Conversion of food intake data to nutrient intake values for validation against reference methods
Quantitative Magnetic Resonance (QMR)	EchoMRI systems with precision <0.5% CV for fat mass [83]	Highly precise measurement of body composition changes for energy intake calculation via energy balance principle

The validation of 24-hour dietary recalls is significantly impacted by age-related physiological changes in body composition and metabolism. Researchers must employ age-stratified validation approaches that account for the specific characteristics of each life stage—from the growth and development phases of adolescence to the sarcopenia and metabolic decline of older age. Key considerations include using appropriate reference methods adjusted for age-specific factors, implementing validation protocols that address the particular challenges of each age group, and interpreting results in the context of known physiological changes. Future research should focus on developing standardized, age-adjusted cut-offs for identifying misreporting and establishing validation protocols specifically designed for older populations with multiple chronic conditions. As dietary assessment increasingly informs public health policy and clinical guidelines across the lifespan, robust age-specific validation methods become essential for generating reliable evidence.

Accurate dietary assessment is a cornerstone of nutritional epidemiology, essential for investigating diet-disease relationships and informing public health guidelines. The 24-hour dietary recall is among the most widely used methods for capturing individual intake in population studies. However, its susceptibility to measurement error, particularly under-reporting of specific nutrients, poses a significant threat to the validity of research findings. This case study examines the consistent and significant under-reporting of two nutrients—sucrose and vitamin C—in adult populations. We explore the magnitude of this under-reporting, the advanced methodologies used to detect it, and the implications for research and public health practice.

Quantitative Evidence of Under-Reporting

Data from validation studies reveal a consistent pattern of under-reporting for sucrose and vitamin C when assessed via self-report methods like 24-hour recalls. The following table summarizes key findings from the literature.

Table 1: Documented Under-Reporting of Sucrose and Vitamin C in Dietary Recalls

Nutrient	Reported Magnitude of Under-Reporting	Study Population & Method	Key Findings
Sucrose	Approximately -20% [86]	140 subjects (15-57 years); Recalled vs. observed intake [86]	Significant discrepancy for sucrose; validity unsatisfactory at individual level but satisfactory for groups [86].
Sucrose	Not directly quantifiable as % under-reporting, but self-report contradicted biomarker [87]	Analysis of Women's Health Initiative; Self-report vs. biomarker-calibrated intake [87]	Self-reported sugars were inversely associated with type 2 diabetes risk, but the biomarker-calibrated estimates showed no association, revealing differential misreporting [87].
Vitamin C	-16% [86]	140 subjects (15-57 years); Recalled vs. observed intake [86]	One of the largest discrepancies among studied nutrients [86].
Vitamin C	Not an under-reporting estimate; recalled intake declined by 23% (1999-2018) [88]	US National Health and Nutrition Examination Survey (NHANES) 24-hour recall data [88]	Mean vitamin C consumption fell from 97 mg/d to 75 mg/d. The proportion of the population with intake below the Estimated Average Requirement (EAR) increased from 38.3% to 47.4% [88].

The evidence indicates that under-reporting is not random. Sugars from solid foods, which are cognitively harder to report, are associated with higher levels of measurement error than sugars from beverages [87]. Furthermore, population-level recall data suggest a concerning decline in vitamin C intake; because these figures are themselves self-reported, under-reporting may exaggerate the apparent shortfall [88].

Experimental Protocols for Validation

To quantify the measurement error inherent in self-reported data, researchers employ rigorous validation studies using objective reference measures. The following experiments detail protocols used to validate intake of sucrose and general energy (a proxy for overall diet reporting).

Biomarker Validation for Sugars Intake (24uSF)

A controlled feeding study was designed to investigate the performance of 24-hour urinary sucrose and fructose (24uSF) as a predictive biomarker for total sugars intake [87].

Study Population: 98 participants, aged 18-70 years, from the US [87].
Study Design: A 15-day controlled feeding study where participants consumed their usual diet, replicated by researchers based on detailed baseline food records [87].
Data Collection:
- All foods were prepared in a metabolic kitchen, and uneaten items were weighed back to calculate actual consumption.
- Participants collected eight non-consecutive 24-hour urine samples throughout the study, which were analyzed for sucrose and fructose excretion [87].
Statistical Analysis: A linear mixed model was used to regress the log 24uSF biomarker on log total sugars intake, explaining 56% of the biomarker variance. The equation was inverted to solve for total sugars intake, creating a calibrated biomarker [87].
Key Outcome: Total sugars intake was the strongest predictor of the 24uSF biomarker, confirming its utility as an objective measure for validating self-reported sugars intake [87].
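
The calibration logic in this protocol (regress log biomarker on log intake, then invert the fitted equation) can be sketched with ordinary least squares. This is a deliberate simplification: the study used a linear mixed model, whereas the sketch below ignores repeated measures and covariates, and the data are synthetic values invented for illustration.

```python
import math


def fit_log_log(intake_g, biomarker_mg):
    """Fit log(biomarker) = intercept + slope * log(intake) by least squares."""
    x = [math.log(v) for v in intake_g]
    y = [math.log(v) for v in biomarker_mg]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrated_intake(biomarker_mg, intercept, slope):
    """Invert the fitted equation to estimate intake from a biomarker value."""
    return math.exp((math.log(biomarker_mg) - intercept) / slope)


# Synthetic feeding-study data: known sugars intake (g/day) and the
# corresponding 24-hour urinary sucrose plus fructose (mg/day).
intake = [50.0, 100.0, 150.0, 200.0]
biomarker = [0.5 * g ** 0.8 for g in intake]  # noise-free, for illustration

intercept, slope = fit_log_log(intake, biomarker)
estimate = calibrated_intake(0.5 * 120.0 ** 0.8, intercept, slope)
print(round(slope, 3), round(estimate, 1))
```

Real biomarker data are noisy, so inverted estimates carry uncertainty that a mixed model with participant-level effects handles more appropriately than this sketch does.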

Energy Intake Validation Using Doubly Labeled Water

The gold standard for validating self-reported energy intake (EI) is the Doubly Labeled Water (DLW) method, which measures total energy expenditure (TEE) in free-living conditions [89].

Underlying Principle: In weight-stable individuals, energy intake is equivalent to total energy expenditure over time. Significant disparity between self-reported EI and TEE from DLW indicates under- or over-reporting [89].
Study Example: A UK study assessed the validity of an online self-reported 24-hour recall system (Intake24) against DLW in 98 adults [89].
DLW Protocol: Participants ingested a dose of water containing stable, non-radioactive isotopes of hydrogen (²H) and oxygen (¹⁸O). Urine samples were collected over 9-10 days. The difference in elimination rates of the two isotopes is used to calculate carbon dioxide production and, thus, TEE [89].
Findings: Participants under-reported energy intake by 25% on average using the online system, a level of under-reporting consistent with traditional interviewer-led recalls [89].
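
Under the weight-stability assumption stated above, the degree of misreporting can be written as a simple percentage difference. The symbols below are introduced here for illustration and are not the study's own notation.

```latex
\text{Misreporting (\%)} =
  \frac{\mathrm{EI}_{\text{reported}} - \mathrm{TEE}_{\text{DLW}}}
       {\mathrm{TEE}_{\text{DLW}}} \times 100
```

For example, with hypothetical values of 1,875 kcal/d reported and 2,500 kcal/d measured by DLW, the result is (1875 - 2500) / 2500 × 100 = -25%, which corresponds to 25% under-reporting.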

The following diagram illustrates the workflow for validating a self-reported dietary assessment tool against an objective reference method like DLW or a urinary biomarker.

The Scientist's Toolkit: Key Research Reagents & Materials

The following table details essential materials and methods used in the experimental protocols cited in this case study.

Table 2: Essential Reagents and Materials for Dietary Validation Research

Tool/Reagent	Function & Application in Validation
24-Hour Urinary Sucrose & Fructose (24uSF)	A predictive biomarker used to objectively assess total sugars intake. It is measured from 24-hour urine collections in controlled feeding studies to calibrate and validate self-reported sugar consumption [87].
Doubly Labeled Water (DLW)	The gold standard reference method for measuring total energy expenditure (TEE) in free-living individuals. It is used to validate the accuracy of self-reported energy intake by comparing EI to TEE [89].
Automated Multiple-Pass Method (AMPM)	A structured, computerized interview protocol for 24-hour recalls. Developed by the USDA, it uses multiple "passes" to minimize forgotten foods and improve portion size estimation, serving as a more robust standard for comparing other self-report tools [4] [90].
Controlled Feeding Study	The definitive study design for biomarker validation. All food is provided by a metabolic kitchen, allowing for precise measurement of "true" intake, which is then compared to biomarker levels or self-reports [87].
Para-Amino Benzoic Acid (PABA)	A compound used to verify the completeness of 24-hour urine collections, ensuring the validity of urinary biomarker measurements like 24uSF [87].

The significant under-reporting of sucrose and vitamin C is not merely a methodological footnote; it has profound implications for nutritional science and public health.

Distorted Diet-Disease Associations: Differential misreporting can create spurious associations or mask real ones. As seen in the Women's Health Initiative, self-reported sugars showed a protective effect against type 2 diabetes, while the biomarker-calibrated data showed no association, fundamentally changing the interpretation [87].
Compromised Monitoring and Guidelines: Reliable population intake data is crucial for monitoring nutritional status and developing evidence-based policies. Systematic under-reporting of vitamin C, coupled with actual declining intakes, can distort estimates of nutritional inadequacy and misdirect public health efforts [88] [91].
Need for Advanced Methodologies: This evidence underscores the necessity of moving beyond simple self-report in key areas of research. The use of recovery biomarkers (like DLW), predictive biomarkers (like 24uSF), and concentration biomarkers (like serum vitamin C) is critical for validating dietary data and correcting for measurement error in diet-disease models [87] [88].

In conclusion, this case study demonstrates that the under-reporting of sucrose and vitamin C in adult populations is a significant and persistent challenge. While 24-hour recalls remain a valuable tool for capturing dietary patterns, their limitations must be acknowledged and addressed. Integrating objective biomarkers into research designs is no longer a luxury but a necessity for producing reliable data that can truly inform our understanding of the role of diet in health and disease.

This guide objectively compares the performance of the 24-hour dietary recall (24HR) method across different age groups by synthesizing findings from key validation studies. The data is critical for researchers, scientists, and drug development professionals to understand the limitations and appropriate applications of this dietary assessment tool in diverse populations.

Validation in Children and Adolescents

Table 1: Summary of Validation Studies in Children and Adolescents

Age Group	Study & Context	Validation Method	Key Findings on 24HR Accuracy	Notable Challenges
Children (8 years)	Lytle et al. (1993), USA [78]	24HR assisted by food records vs. direct observation	No significant difference in % energy from fat; Energy intake differences; 77.9% food item match rate [78]	Requires assistance (parental food records); Differences in energy intake recall [78]
Adolescents (10-14 years)	Burkina Faso Study [3]	Self-administered 24HR vs. Observed Weighed Records (OWR)	Underestimation of energy (mean ratio 0.88-0.92); Energy intake equivalent within 15% bound for 12-14-year-olds [3]	High omission rate (50%), especially snacks, fruits, beverages; Lower accuracy in 10-11-year-olds [3]

Validation in Adults and Older Adults

Table 2: Summary of Validation Studies in Adults and Older Adults

Age Group	Study & Context	Validation Method	Key Findings on 24HR Accuracy	Notable Challenges
Older Adults (60+ years)	Kim et al. (2025), Korea [5]	Interviewer 24HR (in-person/online) vs. Weighed food intake in feeding study	Recalled 71.4% of foods consumed; Overestimated portion sizes (mean ratio: 1.34); No significant difference for energy/macronutrients [5]	Portion size overestimation; Lower food item recall in men (65.2%) vs. women (75.6%) [5]
Advanced Age (80+ years)	Newcastle 85+ & LiLACS NZ [92]	Two 24-hour multiple pass recalls (24hr-MPR) on non-consecutive days	Method was feasible and acceptable; Mean completion time: 22-45 minutes; 83-94% felt it reflected usual intake [92]	Longer completion time for Māori participants; Protocol adaptations needed for indigenous contexts [92]

Detailed Experimental Protocols

The validity of the 24-hour recall is typically tested against a "gold standard" method, with data analyzed to quantify reporting accuracy.

Weighed Food Intake Protocol (Gold Standard)

Procedure: Researchers discreetly weigh all food and beverages served to participants before and after each eating occasion during a controlled feeding study. This provides a precise measure of actual consumption [5].
Application: This method was used in the validation study among free-living older Korean adults, where participants consumed three self-served meals with weighed intake [5].

Observed Weighed Record (OWR) Protocol

Procedure: Trained research assistants accompany participants throughout the day, weighing all foods and drinks consumed at each eating episode using digital scales. They also record the time, place of consumption, and recipe details when possible [3].
Application: This protocol was implemented for adolescents in Burkina Faso, where observers weighed foods both at home and in school settings [3].

24-Hour Dietary Recall Interview Protocol

Multiple-Pass Method: This structured interview technique is designed to enhance memory and reduce omission. The Automated Self-Administered 24-hour (ASA24) assessment tool, widely used in research, utilizes this method [7].
- Pass 1 (Quick List): The respondent gives an uninterrupted list of all foods and drinks consumed the previous day.
- Pass 2 (Forgotten Foods): The interviewer probes with specific questions about categories of easily forgotten foods (e.g., snacks, sweets, beverages).
- Pass 3 (Time & Occasion): The respondent recalls the time and eating occasion for each food.
- Pass 4 (Detail Cycle): Detailed probing for portion sizes (aided by photographs, household measures, or standard models) and further food descriptions.
- Pass 5 (Final Review): A final review of all collected information for any corrections or additions [45].

Diagram 1: Workflow for validating 24-hour dietary recall (24HR) methods, showing the comparison of test methods against gold standards and the key metrics used for analysis.

The Scientist's Toolkit: Essential Materials for 24HR Validation

Table 3: Key Research Reagents and Materials for 24HR Validation Studies

Tool/Reagent	Primary Function	Application Example
Digital Food Scales	Precisely weigh food items before and after consumption to establish "true" intake [3].	Used in Observed Weighed Records (OWR) in Burkina Faso adolescent study [3].
Standardized Portion Aids	Assist participants in estimating and reporting the volume of food consumed [5].	Standard bowls/plates provided to adolescents; food images in Foodbook24 and ASA24 [3] [8].
Web-Based 24HR Platforms	Automate the 24HR process for self-administration, standardize data collection, and reduce interviewer burden [7].	ASA24 (US), Foodbook24 (Ireland), FOODCONS (Italy), and Intake24 (UK) [8] [7] [45].
Validated Food Composition Databases	Convert reported food consumption into estimated nutrient intakes [8].	CoFID (UK), country-specific databases (Brazil, Poland); integrated into tools like Foodbook24 [8].
Multilingual & Culturally Adapted Food Lists	Ensure the tool is relevant and accessible to diverse ethnic and linguistic groups [8].	Foodbook24 expansion with 546 foods, translated into Polish and Portuguese [8].

Discussion and Key Takeaways

Age-Specific Validity: The 24HR method shows varying degrees of accuracy across the lifespan. While it can be valid for group-level comparisons even in children as young as 8 years when assisted, independent recall in younger adolescents (10-11 years) is less accurate. In older adults, the method remains feasible but is prone to portion size overestimation [5] [3] [78].
Technological Evolution: Web-based and self-administered tools (ASA24, Foodbook24, FOODCONS) demonstrate strong correlation with interviewer-led recalls for many nutrients and food groups, offering a cost-effective and scalable alternative without significantly compromising data quality [8] [45].
Critical Role of Inclusivity: Successful dietary assessment requires tools that are culturally and linguistically adapted. Expanding food lists and offering translations are necessary steps to collect accurate data from diverse ethnic minorities and to ensure their representation in nutritional research [8]. This aligns with broader regulatory guidance encouraging the enrollment of diverse populations in clinical studies [93].

The 24-hour dietary recall (24HR) serves as a foundational tool in nutritional epidemiology, enabling the assessment of food and nutrient intake for population-level surveillance and research into diet-disease relationships [85]. However, as a method reliant on human memory and perception, it is inherently subject to measurement error. Quantifying the precise limits of agreement between recalled and actual intake is therefore critical for interpreting dietary data accurately, especially within comparative studies across different age groups. This guide objectively examines the performance of various 24HR methodologies against benchmark measures, presenting experimental data that delineate their accuracy gaps.

The core challenge in dietary assessment lies in the fact that dietary intake is a highly variable behavior, influenced by day-of-week and seasonal effects [85]. Multiple, repeated 24HR collections can yield reliable estimates of usual intake, but this is often infeasible due to staffing, equipment, financial, and temporal constraints [85]. This has spurred the development of innovative approaches, including statistical correction methods and technology-assisted tools, all aimed at bridging the gap between recalled and actual consumption.

Experimental Protocols for 24HR Validation

To quantify the accuracy gaps in dietary recall, researchers employ rigorous experimental designs that compare reported intake to a known reference. The following protocols are central to generating the validation data presented in this guide.

Controlled Feeding Studies

Controlled feeding studies represent the gold standard for validation, as they provide a precise measure of "observed intake." In this design:

Participant Management: Healthy adult participants are recruited to attend a research center on multiple separate days to consume provided meals (e.g., breakfast, lunch, and dinner) [28]. The foods, beverages, and their exact amounts are meticulously measured and recorded by research staff, often to the nearest 0.1 gram during preparation and 1 gram before serving [47].
Unobtrusive Documentation: The consumption period is documented unobtrusively to avoid influencing participant behavior.
Recall Collection: Following a predetermined interval (typically the next day), participants complete a 24HR using the method under investigation [28]. This process is often repeated in a randomized crossover design, where each participant tests multiple 24HR methods, thus allowing for within-subject comparisons [28].
Data Comparison: The reported foods, portion sizes, and calculated nutrient intakes from the 24HR are then compared against the known values of the provided meals.

The Criterion of Multiple Recalls

An alternative to controlled feeding uses the average of a high number of repeated 24HRs as a reference value for an individual's usual intake. One seminal study defined the average of 28 recall days, collected as 7 consecutive days in each of the four seasons, as the "gold standard" [85]. The performance of shorter protocols (e.g., 2 or 3 non-consecutive days) corrected by the National Cancer Institute (NCI) method is then evaluated against this benchmark to determine how well they approximate long-term usual intake [85].

Quantitative Comparison of Recall Accuracy

The accuracy of 24HR methods is multi-faceted, encompassing the correct identification of foods consumed and the precise estimation of their amounts. The following tables summarize key performance data from recent validation studies.

Table 1: Food Item and Portion Size Reporting Accuracy

Study Population	Validation Method	Food Item Accuracy	Portion Size Accuracy	Key Omitted/Underreported Items
Older Korean Women (n=22) [47]	Controlled Feeding	95% Match Rate (Foods correctly reported)	24% Corresponding (≤10% error); 43% Underreported	Sauces (most frequent omission); Kimchi (frequent underreporting)
Irish, Brazilian, Polish Adults (n=349 foods) [8]	Comparison to Visual Food Records	86.5% Availability (Foods listed were in database)	—	Omissions higher in Brazilian cohort (24%) vs. Irish (13%)
General Adult Population [85]	Comparison to 28 Recall Days	—	—	—

Table 2: Nutrient Intake Reporting Accuracy

Nutrient	Study Population	Reporting Accuracy vs. Actual Intake	Key Findings
Energy	Older Korean Women [47]	No significant difference	Recalled intakes were similar to actual intakes.
Fat	Older Korean Women [47]	Underreported	Statistically significant underreporting.
Sodium	Older Korean Women [47]	Underreported	Statistically significant underreporting.
Most Nutrients	Older Korean Women [47]	No significant difference	Protein, carbohydrates, etc., were accurately reported.

Table 3: Comparative Cost and Operational Factors of 24HR Methods

Method Type	Key Features	Relative Cost & Burden	Evidence of Accuracy
Interviewer-Administered (e.g., AMPM)	Structured interview with trained staff; uses food model booklets [28].	High (Personnel, training, travel) [28]	Considered a traditional standard [28].
Web-Based Self-Administered (e.g., ASA24, Intake24)	Automated, participant-led; uses standard images for portion estimation [28] [7].	Lower (No interviewer needed) [28]	Similar error levels to interviewer-administered methods per doubly labeled water studies [28].
Image-Assisted Recall (e.g., mFR24)	Participants take before/after photos; review starts the recall [28].	Moderate (Technology, data management)	Potential to reduce recall bias; under evaluation [28].

Impact of Administration Protocol on Accuracy

Beyond the method of collection, the structure and timing of recalls significantly impact data quality. Research indicates that whether survey days are consecutive or non-consecutive matters more for accuracy than the sheer number of days.

Non-Consecutive vs. Consecutive Days: Collecting 24HRs on non-consecutive days yields more accurate estimates of usual intake than collecting on consecutive days. The improved accuracy from using non-consecutive days outweighs the benefit of adding an extra consecutive day [85].
Inclusion of Weekend Days: Dietary intake on weekends often differs from weekdays. Consequently, a protocol of two non-consecutive 24HRs that includes one weekend day and one weekday demonstrates higher accuracy than a protocol involving two weekdays only [85].
Statistical Correction (NCI Method): Using the NCI method to correct data from short-term 24HRs (e.g., 2 non-consecutive days) can produce estimates of usual intake that are functionally identical to those obtained from more burdensome three-day protocols [85].
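
To show why correcting for within-person variation matters when only a few recall days are available, the sketch below applies a simple variance-components shrinkage to person-level means. It is not the NCI method, which involves transformation and mixed-effects modeling; it is a much simpler stand-in, and the intake values and function name are hypothetical.

```python
import math


def shrink_to_usual_intake(recalls):
    """Shrink each person's mean toward the group mean.

    recalls: list of per-person lists, each with the same number (k >= 2)
    of recall days. Returns (adjusted_means, shrinkage_factor).
    """
    n, k = len(recalls), len(recalls[0])
    means = [sum(person) / k for person in recalls]
    grand_mean = sum(means) / n

    # One-way ANOVA mean squares: within-person and between-person.
    ms_within = sum((x - m) ** 2
                    for person, m in zip(recalls, means)
                    for x in person) / (n * (k - 1))
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)

    # Between-person variance of usual intake (floored at zero).
    var_between = max((ms_between - ms_within) / k, 0.0)
    factor = math.sqrt(var_between * k / ms_between) if ms_between > 0 else 0.0

    adjusted = [grand_mean + factor * (m - grand_mean) for m in means]
    return adjusted, factor


# Hypothetical energy intakes (kcal/day), two non-consecutive recalls each.
data = [[1800, 2200], [2400, 2800], [2000, 2000]]
adjusted, factor = shrink_to_usual_intake(data)
print([round(a) for a in adjusted], round(factor, 2))
```

The adjusted means keep the group mean but have a narrower spread than the raw two-day means, reflecting the removal of day-to-day noise from the estimated distribution of usual intake.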

Table 4: Essential Tools for Dietary Recall Validation Research

Tool Name	Type/Function	Application in Validation Research
ASA24 (Automated Self-Administered 24-hr Recall) [7]	Web-based, self-administered 24HR tool.	Enables high-throughput, automated dietary data collection; used to compare against observed intake in feeding studies [28].
Foodbook24 [8]	Web-based 24HR tool with customizable food lists.	Allows for the inclusion of culturally-specific foods, facilitating accurate dietary assessment in diverse populations.
Image-Assisted mFR24 (mobile Food Record) [28]	Mobile app that uses participant-captured images of food.	Serves as a memory aid; images are used to verify food identification and portion size estimation against observed intake.
NCI Method [85]	Statistical modeling method.	Corrects for within-person variation and measurement error in short-term 24HR data to estimate usual dietary intake.
Doubly Labeled Water (DLW)	Biomarker for energy expenditure.	Provides an objective measure of total energy expenditure against which the accuracy of reported energy intake is validated [28].

Workflow for Validating 24-Hour Dietary Recalls

The following diagram illustrates the standard experimental workflow for a controlled feeding study designed to validate 24-hour recall methods, integrating the key tools and protocols discussed.

The evidence demonstrates that while 24-hour dietary recalls are a vital assessment tool, they are not without significant accuracy gaps. Key limitations include the systematic underreporting of specific nutrients like fat and sodium, and consistent errors in reporting certain food items, such as sauces and condiments. The choice of administration protocol—specifically, using non-consecutive days that include a weekend day and applying statistical correction—is paramount for enhancing accuracy and cost-effectiveness.

For researchers comparing dietary intake across different age groups, these accuracy gaps must be carefully considered. The performance of any 24HR method can be influenced by the cognitive abilities, literacy, and tech-savviness of the age cohort being studied. Future validation research should continue to stratify findings by age and other demographic factors to better quantify and correct for these critical sources of measurement error.

Conclusion

The validation of 24-hour dietary recalls is not a one-size-fits-all endeavor; it requires careful, age-specific methodological tailoring. While the method can provide reasonably accurate data at the group level across all ages, its accuracy at the individual level remains limited, with varying degrees of under-reporting and measurement error. For children, proxy reporting and cognitive development are key considerations, whereas for older adults, physiological changes and polypharmacy introduce unique challenges. Future research must prioritize the development and adoption of standardized, validated protocols that account for this lifecycle variability. For biomedical research and drug development, this is paramount. Accurate dietary data is essential for understanding diet-drug interactions, designing inclusive clinical trials that represent real-world older populations, and ensuring the safety and efficacy of therapeutics. Investing in improved dietary assessment methodology is not merely an academic exercise—it is a critical component of advancing personalized medicine and public health.
