Decoding the Dutch Diet: A Data-Driven Approach to Identifying National Meal Patterns

Public health organizations like the Dutch National Institute for Public Health and the Environment (RIVM) collect vast amounts of data on what citizens eat. But a list of individual ingredients such as “potatoes,” “kale,” and “sausage” doesn’t tell you what people are actually having for dinner. Are they making a traditional stamppot, or three separate dishes?

In a recent project with RIVM, I tackled this exact challenge: transforming raw food consumption records from the Dutch National Food Consumption Survey (DNFCS) into meaningful, dish-level insights. The goal was to uncover what dishes the Dutch are commonly eating, how these patterns have changed over the last 15 years, and how healthy these meals really are.

The DNFCS dataset is incredibly rich, containing detailed food logs from thousands of participants across three survey waves (2007-2021). However, the data exists as a sequence of individual food items consumed during an “eating occasion.” My task was to develop a methodology to group these items into coherent dishes and meal types. I adopted a two-pronged data science approach to solve this.

Method 1: Finding Patterns with Association Rule Mining

My first approach was to identify foods that are frequently consumed together. For this, I used Association Rule Mining (ARM), a technique best known from “market basket analysis” (e.g., “customers who buy diapers also tend to buy beer”).
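For reference, these are the standard measures used to score a rule X ⇒ Y, where T is the set of transactions (here, eating occasions) and each transaction t is the set of food items in it. Confidence is the measure referred to in the rest of this section. Lift, which compares a rule's confidence with how common Y is on its own, is included as the usual companion metric.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{aligned}
```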

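I implemented the FP-growth algorithm, which finds frequent itemsets without the repeated candidate generation of Apriori, to mine the dataset for co-occurring food items within breakfast, lunch, and dinner. The algorithm generates rules such as {Boerenkool, Spekjes} → {Aardappel}: when kale and bacon bits appear in a meal, potatoes are likely to appear as well.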
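To make the mining step concrete, here is a compact pure-Python sketch of FP-growth followed by rule extraction, run on a handful of made-up dinners. It illustrates the algorithm rather than reproducing the project's code: the dinners, the `fpgrowth` and `rules` helpers, and the thresholds (a minimum count of 2 occasions and a minimum confidence of 0.9) are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations


class FPNode:
    """A node in an FP-tree: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def fpgrowth(transactions, min_count, suffix=()):
    """Return {frozenset(itemset): count} for every itemset with count >= min_count.

    `transactions` is a list of (items, weight) pairs, so the same function can
    mine both the raw data (weight 1) and the conditional pattern bases.
    """
    # 1. Count item frequencies and keep only the frequent items.
    freq = defaultdict(int)
    for items, weight in transactions:
        for item in set(items):
            freq[item] += weight
    freq = {item: c for item, c in freq.items() if c >= min_count}

    # 2. Build the FP-tree, inserting each transaction's frequent items in
    #    descending-frequency order so that common prefixes share nodes.
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds this item
    for items, weight in transactions:
        ordered = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight

    # 3. Emit each frequent item (joined with the current suffix), then recurse
    #    on its conditional pattern base: the prefix paths above its nodes.
    result = {}
    for item, count in freq.items():
        itemset = suffix + (item,)
        result[frozenset(itemset)] = count
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fpgrowth(base, min_count, itemset))
    return result


def rules(counts, n_transactions, min_conf):
    """Yield (antecedent, consequent, support, confidence) for confident rules."""
    for itemset, count in counts.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), k)):
                conf = count / counts[antecedent]
                if conf >= min_conf:
                    yield antecedent, itemset - antecedent, count / n_transactions, conf


# Made-up dinners, one list of food items per eating occasion.
dinners = [
    ["boerenkool", "spekjes", "aardappel"],
    ["boerenkool", "spekjes", "aardappel", "rookworst"],
    ["boerenkool", "aardappel", "rookworst"],
    ["pasta", "tomatensaus", "gehakt"],
    ["pasta", "tomatensaus"],
    ["aardappel", "sperziebonen", "kipfilet"],
]
counts = fpgrowth([(d, 1) for d in dinners], min_count=2)
for ante, cons, supp, conf in rules(counts, len(dinners), min_conf=0.9):
    print(sorted(ante), "->", sorted(cons), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

Among the printed rules is `['boerenkool', 'spekjes'] -> ['aardappel']` with support 0.33 and confidence 1.00. In practice, a maintained library implementation (for example `fpgrowth` and `association_rules` in mlxtend's `frequent_patterns` module) would be the more usual choice.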

This rule, which holds with high confidence, identifies a classic Dutch dish: Boerenkool stamppot met spekjes (kale stamppot with bacon bits). By defining categories for common Dutch meals such as AVG (aardappelen, groente, vlees: potatoes, vegetables, and meat), Pasta, and Soups, I could use ARM to validate these categories and quantify their prevalence in the national diet.
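One simple way to turn a meal category such as AVG into a prevalence figure is to tag each eating occasion by the components it contains and count the matches. The sketch below assumes a hypothetical item-to-component mapping, hypothetical helper names, and made-up dinners; a real analysis would need a complete food-group classification.

```python
# Hypothetical mapping from food items to meal components (illustrative only).
COMPONENT = {
    "aardappel": "potato",
    "boerenkool": "vegetable",
    "sperziebonen": "vegetable",
    "spekjes": "meat",
    "rookworst": "meat",
    "kipfilet": "meat",
    "gehakt": "meat",
    "pasta": "pasta",
    "tomatensaus": "sauce",
}
AVG_COMPONENTS = {"potato", "vegetable", "meat"}


def is_avg(occasion):
    """True if an eating occasion has at least one potato, vegetable, and meat item."""
    present = {COMPONENT.get(item) for item in occasion}
    return AVG_COMPONENTS <= present


def prevalence(occasions, predicate):
    """Fraction of eating occasions for which `predicate` holds."""
    return sum(1 for occ in occasions if predicate(occ)) / len(occasions)


# Made-up dinners, one list of food items per eating occasion.
dinners = [
    ["boerenkool", "spekjes", "aardappel"],
    ["boerenkool", "spekjes", "aardappel", "rookworst"],
    ["boerenkool", "aardappel", "rookworst"],
    ["pasta", "tomatensaus", "gehakt"],
    ["pasta", "tomatensaus"],
    ["aardappel", "sperziebonen", "kipfilet"],
]
print(f"AVG share of dinners: {prevalence(dinners, is_avg):.2f}")
```

For these six dinners it prints `AVG share of dinners: 0.67`, since four of them contain a potato, a vegetable, and a meat item.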

Method 2: Clustering Dishes with NLP and Semantic Similarity

While ARM is great for identifying pairings, it struggles with the nuance of complete, complex dishes. To address this, I turned to Natural Language Processing (NLP). The dataset included recipe names for many recorded meals, which provided a rich source of textual data.

My NLP pipeline involved several steps:

Text Combination: For each meal, I created a single text string combining the recipe name (e.g., “omelet (ongevuld)”) and its component ingredients (e.g., “ei heel kippen, olie olijf”).
Sentence Embeddings: I fed this text into a pre-trained sentence-transformer model (all-MiniLM-L6-v2), which converts each string into a 384-dimensional vector intended to capture the semantic meaning of the dish.
Dimensionality Reduction: Because distance-based clustering works poorly in high-dimensional spaces, I used UMAP (Uniform Manifold Approximation and Projection) to reduce the vectors from 384 to just two dimensions while preserving much of their local neighborhood structure.
Clustering: Finally, I applied K-means with k-means++ initialization to the reduced data, grouping semantically similar dishes into distinct clusters.
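The text-combination and clustering steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `combine_text` is a hypothetical helper (the exact string format used in the project is not specified), the embedding and UMAP calls appear only as comments because they need the sentence-transformers and umap-learn packages, and the 2-D coordinates are made up. The clustering itself is standard Lloyd's k-means with k-means++ seeding.

```python
import math
import random


def combine_text(recipe_name, ingredients):
    """Hypothetical helper: join a recipe name and its ingredients into one string."""
    return f"{recipe_name}: {', '.join(ingredients)}"


def kmeans_pp(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on low-dimensional points, seeded with k-means++.

    Assumes `points` is a list of tuples containing at least k distinct points.
    """
    rng = random.Random(seed)
    # k-means++ seeding: the first centre is chosen uniformly at random; each
    # further centre is sampled with probability proportional to the squared
    # distance to the nearest centre chosen so far.
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])

    def assign(cs):
        # Index of the nearest centre for every point.
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    # Lloyd iterations: assign points, then move each centre to its cluster mean.
    for _ in range(n_iter):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return assign(centres), centres


print(combine_text("omelet (ongevuld)", ["ei heel kippen", "olie olijf"]))

# Upstream, the combined texts would be embedded and projected first, e.g.
# (not run here; requires sentence-transformers and umap-learn):
#   from sentence_transformers import SentenceTransformer
#   import umap
#   emb = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)  # shape (n, 384)
#   xy = umap.UMAP(n_components=2).fit_transform(emb)
# Made-up 2-D coordinates standing in for the UMAP output:
xy = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centres = kmeans_pp(xy, k=2)
print(labels)
```

In practice, scikit-learn's `KMeans` (whose default `init` is `"k-means++"`) would be the usual choice; the hand-written version only makes the seeding rule visible.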

This unsupervised approach successfully identified clusters corresponding to “Smoothies,” “Traditional Dutch Stamppot,” “Pasta Dishes,” and more, without any prior labeling.

Key Discoveries and Public Health Impact

The combination of these methods yielded several key insights into the Dutch diet:

Dietary Trends Revealed: The semantic clustering showed how eating habits shifted over time. Between 2007 and 2021, traditional dishes like stamppot saw a decline in popularity at dinner, while lighter items like smoothies became a breakfast staple, especially for women.
Nutritional Quality Assessed: By mapping every food item to the Dutch “Schijf van Vijf” (Wheel of Five) dietary guidelines, I could score the nutritional compliance of each meal. The analysis showed that, on average, dinner is the most guideline-compliant meal, and compliance tends to increase with age. Crucially, the most popular dishes were often only moderately healthy, highlighting a clear target for public health campaigns.
Reproducible Pipeline: To make the analysis robust and repeatable, I orchestrated the data extraction, cleaning, modeling, and visualization steps into an end-to-end pipeline using Prefect. This automates the workflow so that the entire analysis can be rerun from raw data to final figures, which is essential for reproducible results in modern data science.
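A count-based compliance score can be sketched as the share of a meal's items that fall inside the Schijf van Vijf, averaged per meal type. This is a minimal illustration under stated assumptions: the in-wheel flags, item names, meals, and function names are all hypothetical, and the project's scoring may have handled items differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Schijf van Vijf flags (True = item falls inside the guidelines).
IN_WHEEL = {
    "volkoren brood": True,
    "halfvolle melk": True,
    "hagelslag": False,
    "aardappel": True,
    "boerenkool": True,
    "kipfilet": True,
    "rookworst": False,
}


def compliance(items):
    """Frequency-based score: share of a meal's items that are in the wheel."""
    return sum(IN_WHEEL[item] for item in items) / len(items)


def mean_compliance_by_type(meals):
    """Average compliance score per meal type for (meal_type, items) pairs."""
    scores = defaultdict(list)
    for meal_type, items in meals:
        scores[meal_type].append(compliance(items))
    return {meal_type: mean(vals) for meal_type, vals in scores.items()}


# Made-up meals, for illustration only.
meals = [
    ("breakfast", ["volkoren brood", "hagelslag"]),
    ("breakfast", ["volkoren brood", "halfvolle melk"]),
    ("dinner", ["aardappel", "boerenkool", "rookworst"]),
    ("dinner", ["aardappel", "boerenkool", "kipfilet"]),
]
for meal_type, score in mean_compliance_by_type(meals).items():
    print(meal_type, round(score, 2))
```

With these made-up meals the script prints `breakfast 0.75` and `dinner 0.83`.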

Challenges and Lessons Learned

No project is without its challenges. The NLP model had to work directly with Dutch text, as translation proved unreliable for culturally specific dishes like “stamppot.” Furthermore, my initial nutritional compliance metric was based on the frequency of healthy ingredients, not their weight: every ingredient counted equally regardless of portion size, which could slightly overestimate a meal’s healthiness. Acknowledging these limitations is the first step toward future improvements, such as developing a weight-adjusted nutritional score.
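To show how a frequency-based and a weight-adjusted score can diverge, here is a minimal sketch that scores one deliberately lopsided, made-up meal both ways. The in-wheel flags, gram amounts, and function names are hypothetical, and weighting by grams is only one possible adjustment.

```python
# Hypothetical Schijf van Vijf flags and a deliberately lopsided, made-up meal.
IN_WHEEL = {"patat": False, "sla": True, "tomaat": True}
meal = [("patat", 250), ("sla", 20), ("tomaat", 30)]  # (item, grams)


def frequency_score(meal):
    """Share of items in the wheel; every item counts once, whatever its portion."""
    return sum(IN_WHEEL[item] for item, _ in meal) / len(meal)


def weighted_score(meal):
    """Share of the meal's total weight (grams) that comes from in-wheel items."""
    total = sum(grams for _, grams in meal)
    return sum(grams for item, grams in meal if IN_WHEEL[item]) / total


print(round(frequency_score(meal), 2), round(weighted_score(meal), 2))  # 0.67 0.17
```

Two small in-wheel side items lift the frequency score to 0.67, while by weight only about a sixth of the plate falls inside the guidelines.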

Conclusion

This project successfully demonstrates how a multi-faceted data science strategy can transform raw survey data into actionable public health intelligence. By combining classic techniques like Association Rule Mining with modern NLP-driven clustering, I was able to build a comprehensive picture of the Dutch diet, identifying key trends and areas for nutritional improvement.

This data-driven understanding of what people actually eat is invaluable for shaping effective dietary guidelines, educational materials, and public health policies.