CSSE team uses machine learning to win $10,000 in USDA-sanctioned challenge
Published: Sep 7, 2022 9:00 AM
By Joe McAdory
ShubhraKanti "Santu" Karmaker, assistant professor in computer science and software engineering, and his doctoral students Naman Bansal and Alex Knipper, in collaboration with Jingyi Zheng (assistant professor, Department of Mathematics and Statistics) and Wenying Li (assistant professor, Department of Agricultural Economics), won first place and $10,000 in the recent Coleridge Initiative Food for Thought First Interim Challenge, held in association with the U.S. Department of Agriculture (USDA).
Twelve teams of researchers nationwide were selected, through a proposal solicitation and review process, to use natural language processing (NLP) and machine learning techniques to help the USDA’s domestic food and nutrition assistance programs link databases and develop a better understanding of Americans’ dietary habits.
The winning team, named “Auburn Big Data,” spent approximately 750 hours in July and August developing a model that predicts links between retail scanner data (IRI) and the Food and Nutrient Database for Dietary Studies (FNDDS). The model generated the competition’s most IRI-to-FNDDS links with the highest accuracy.
“The USDA is trying to track down how much nutrition people are getting,” said Karmaker. “This is very important for monitoring the health of the general people.
“The goal is to build a national database of how many calories people are buying, or who suffers hunger/malnutrition. But it is very difficult to track because there is no central database for 30 million people. They (the USDA) asked us, ‘Can your computers do this for us?’ That’s what we did — we matched it for them. It might not be perfect, but it’s close and saves them a lot of time.”
But how can machine-learning models possibly understand the dietary habits of Americans? Simple: Analyze people’s purchase histories at the grocery store, then compare those product descriptions with the nutritional database. Teams worked remotely and were provided with confidential information, including anonymized transactions from a variety of grocery outlets.
“Once computers learn what people are buying, we need to make them understand how the same food item can be expressed in many different ways by different people in many different databases, and so forth,” said Karmaker, who directs the college’s Big Data Intelligence laboratory. “We train the machine so that it automatically figures out different ways a food item can be described in English, such as guacamole is essentially avocados with some added spices.”
How? By transforming food items into real-valued vectors with latent semantic meaning, often referred to as embeddings, to ease the machine learning process.
“We’re asking machines to learn patterns,” Karmaker said. “We take natural language descriptions, convert that using deep learning techniques, into the series of numbers. Over time, the machine learns what vectors are similar and what vectors are not similar.”
Algorithm after algorithm is tested until the results capture nutritional similarities between avocados and guacamole, oranges and tangerines, or possibly chocolate ice cream and gelato.
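As a rough illustration of the vector-matching idea described above, the following Python sketch turns each description into a vector and links a scanner-style description to the most similar reference entry using cosine similarity. It is not the team's model. The article describes embeddings learned with deep learning, and this sketch substitutes simple character three-gram counts as a stand-in. The function names and the sample descriptions are hypothetical and were chosen only for illustration.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Map a description to a sparse vector of character n-gram counts.

    Assumption: this is a simple stand-in for a learned embedding.
    """
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    padded = " " + " ".join(cleaned.split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(query, candidates):
    """Return the candidate description most similar to the query."""
    query_vector = embed(query)
    return max(candidates, key=lambda c: cosine(query_vector, embed(c)))


# Hypothetical descriptions, not taken from IRI or FNDDS.
reference_items = [
    "Ice cream, chocolate",
    "Orange juice, fresh",
    "Guacamole with tomatoes",
]
scanner_item = "CHOCOLATE ICE CREAM 1.5 QT"

print(best_match(scanner_item, reference_items))
```

Character counts only capture overlap in spelling, so this sketch can pair differently formatted descriptions of the same product but could not connect guacamole to avocados. Learning that kind of relationship is what the trained embeddings described by Karmaker are meant to do.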
“In academia, some work with simulated, or synthetic data, but we’re solving a real problem,” said Karmaker, who noted that students who participate in experiential, real-world projects have career advantages over those who do not. “We’re talking about real money. We’re talking about human effort. We’re talking about people’s health. Our students are getting their hands dirty with real data. You cannot gain this type of experience in a simulated environment.”
Media Contact: jem0040@auburn.edu, 334.844.3447
Computer science and software engineering doctoral students Alex Knipper, left, and Naman Bansal, right, with assistant professor ShubhraKanti "Santu" Karmaker, center.