Analyzing Child Malnutrition Using Data Mining

María Fernanda López Díaz
Jun 23, 2021

Written by Najla Itzel Hernández Verduzco, Daniela Cruz Reyes, Aldo Carrillo, Luis Enrique Chavez Trejo, and María Fernanda López Díaz.

Abstract

The purpose of this project is to build a clustering model that helps us determine whether there is a relationship between the amount of food produced in a country and the types of malnutrition that occur there.

The data was obtained from Kaggle.

The analysis uses the K-means method, which we considered the best fit for this kind of grouping problem.

The model's results suggest that there is no close relationship between the two data sets.

Childhood malnutrition has affected a large part of the world's population for many years. Economic and social factors are known to be its main causes, and it has serious repercussions on children's lives and on the future of entire societies. But what if we consider this problem in much simpler terms, that is, in terms of the food produced globally and the types of malnutrition among the younger population?

Given these two variables, the question arises: does food production influence the different types of malnutrition within a society? In other words, we want to know whether the amount of food produced worldwide can tell us anything about malnutrition within a population.

Is there a well-defined data mining problem?

With the data sets obtained, we can look for patterns and correlations between these variables that will ultimately help us answer the questions posed above.

Hypothesis

Is there a relationship between the amount of food produced and malnutrition in a country?

Methodology

Case study

The world population is growing rapidly, which creates several problems. Malnutrition stands out among them as a major cause of disease and death, especially among children, and food production plays an important role in mitigating its impact. The objective of this project is to study whether there is any direct connection between the amount of food a country produces per year and the malnutrition that exists in that country.

We relied on two data sets. The first contains information on food production around the world, divided into two sections: food produced for human consumption and food produced for animals (feed). The second provides information on undernutrition in children around the world. Together, they allow us to carry out a multivariate analysis to look for the relationships described above; this type of analysis can also support projections of possible future scenarios.

Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA), or descriptive statistics, is a preliminary and essential step for understanding the data you are going to work with, and it is highly recommended as part of a sound research methodology.

The goal of this analysis is to explore, describe, summarize, and visualize the nature of the data collected for the variables of interest, using simple data summary techniques and graphical methods, without making assumptions for their interpretation.
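As an illustration of the simple summary techniques described above, here is a minimal sketch that computes the count, number of missing values, mean, median, and standard deviation of numeric columns. It uses only the Python standard library; the column names and values are hypothetical toy data, not the project's data.

```python
from statistics import mean, median, stdev

# Hypothetical toy records standing in for the joined table;
# None marks a missing value.
rows = [
    {"Country": "A", "Stunting": 30.1, "Food": 1200.0},
    {"Country": "B", "Stunting": None, "Food": 800.0},
    {"Country": "C", "Stunting": 12.4, "Food": 2500.0},
    {"Country": "D", "Stunting": 22.0, "Food": 950.0},
]

def summarize(records, column):
    """Return count, missing, mean, median and sample std of one numeric column."""
    values = [r[column] for r in records if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),
    }

for col in ("Stunting", "Food"):
    print(col, summarize(rows, col))
```

In a real project the same summary is usually produced by a data frame library; the point here is only which statistics a univariate summary reports.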

Data pre-processing

Dataset 1
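The post does not show its pre-processing steps. Assuming the food production data arrives in long format, with one row per country and element (Food or Feed) and one column per year, a minimal sketch of aggregating it per country could look like this. The column names `Area`, `Element`, `Y2012`, and `Y2013`, and the values, are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a long-format production table: one row per
# country and element (Food or Feed), one column per year.
RAW = """Area,Element,Y2012,Y2013
Mexico,Food,700,720
Mexico,Feed,300,310
Chad,Food,60,
"""

def totals_by_country(text, year_columns):
    """Sum production per (country, element), treating empty cells as missing."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        for col in year_columns:
            cell = row[col].strip()
            if cell:  # skip empty cells instead of counting them as zero
                totals[(row["Area"], row["Element"])] += float(cell)
    return dict(totals)

print(totals_by_country(RAW, ["Y2012", "Y2013"]))
```

Keeping Food and Feed as separate keys preserves the two sections of the data set for later comparison.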

Dataset 2
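For the malnutrition data, a similar hedged sketch: assuming there are several survey rows per country and some indicators are missing, each indicator can be averaged per country while skipping the gaps. The country names, the indicator columns `Stunting` and `Wasting`, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: several surveys per country, some indicators missing (None).
surveys = [
    {"Country": "PERU", "Stunting": 18.0, "Wasting": None},
    {"Country": "PERU", "Stunting": 14.0, "Wasting": 0.6},
    {"Country": "NEPAL", "Stunting": 41.0, "Wasting": 11.0},
]

def country_averages(rows, indicators):
    """Average each indicator per country, skipping missing values."""
    collected = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for ind in indicators:
            if row[ind] is not None:
                collected[row["Country"]][ind].append(row[ind])
    return {
        country: {ind: mean(vals) for ind, vals in by_ind.items()}
        for country, by_ind in collected.items()
    }

print(country_averages(surveys, ["Stunting", "Wasting"]))
```

If every value of an indicator is missing for a country, that indicator simply does not appear in the country's result, which makes the gap visible rather than hiding it as zero.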

Join tables
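Joining the two tables requires a common key. Assuming that key is the country name, and that the two sources may write it with different capitalization, a minimal inner-join sketch could look like this. The table layouts and values are hypothetical.

```python
# Hypothetical per-country aggregates; column names are illustrative only.
food = {
    "Mexico": {"Food": 1500.0, "Feed": 600.0},
    "Peru": {"Food": 400.0, "Feed": 150.0},
    "Chad": {"Food": 120.0, "Feed": 10.0},
}
malnutrition = [
    {"Country": "MEXICO", "Stunting": 12.0, "Underweight": 4.0},
    {"Country": "PERU", "Stunting": 17.0, "Underweight": 3.5},
    {"Country": "NEPAL", "Stunting": 40.0, "Underweight": 27.0},
]

def normalize_name(name):
    """Compare country names case-insensitively, ignoring surrounding spaces."""
    return name.strip().casefold()

def inner_join(food_by_country, malnutrition_rows):
    """Keep only countries present in both tables (like an SQL INNER JOIN)."""
    lookup = {normalize_name(k): (k, v) for k, v in food_by_country.items()}
    joined = []
    for row in malnutrition_rows:
        match = lookup.get(normalize_name(row["Country"]))
        if match is None:
            continue  # country missing from the food table: dropped
        name, production = match
        extra = {k: v for k, v in row.items() if k != "Country"}
        joined.append({"Country": name, **production, **extra})
    return joined

for record in inner_join(food, malnutrition):
    print(record)
```

An inner join silently drops countries that appear in only one table (Chad and Nepal here), so it is worth checking how many rows survive the join.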

Univariate analysis for the resulting table

Numerical vs Numerical

Categorical vs Numerical

Categorical vs Categorical

Standardization
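Standardization usually means rescaling each numeric column to zero mean and unit variance (the z-score), so that no variable dominates the distance computations used by K-means just because of its units. The post does not give its exact formula; this sketch assumes the z-score with the population standard deviation, which is also the convention of scikit-learn's `StandardScaler`. The values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]

# Hypothetical production values for four countries.
food = [120.0, 400.0, 1500.0, 980.0]
z = standardize(food)
print([round(v, 3) for v in z])
```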

Multivariate model

Was a multivariate model found from the data?

Based on the observed data, we need to validate the relationships between variables, so we group the observations for analysis. For this we apply the K-means method because it is simple to implement and, compared with some other clustering methods, partitions large data sets efficiently.
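The project's own code is only linked, not reproduced here. As a minimal sketch of what K-means does, the standard Lloyd iteration is shown below using only the Python standard library (in practice one would typically use a library implementation such as scikit-learn's `KMeans`). The toy points, `k=2`, and the function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialize centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical, already-scaled 2-D points forming two obvious groups.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The result depends on the random initialization, which is why library implementations usually run several initializations and keep the best one.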

K-means

Table Food/Feed

Normalize the values
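The post does not specify how the Food/Feed values were normalized. One common option, shown here only as an assumption, is min-max scaling, which maps each column linearly onto the range [0, 1]. The values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical yearly totals of the Food and Feed elements for three countries.
food = [1500.0, 400.0, 120.0]
feed = [600.0, 150.0, 10.0]
print(min_max_normalize(food), min_max_normalize(feed))
```

Unlike the z-score, min-max scaling keeps every value inside a fixed range, but a single extreme country can compress all the others toward zero.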

Table Malnutrition

Normalize the values

Evaluation

Were the methods evaluated?

In the first run of the model, we observed that two dimensions were not enough to interpret the results, so we added a third dimension, which gave us a clearer visualization of the results.

The model takes into account the columns Country, Type, People, and Quantity.
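We do not know exactly how the columns Country, Type, People, and Quantity were combined into the third dimension. One plausible reading, sketched here purely as an assumption, is that each row becomes a 3-D point made of an encoded Type, People, and Quantity, with Country kept as a label for the plot. The values and the helper name `build_features` are hypothetical.

```python
# Hypothetical rows using the four column names; the values are made up.
rows = [
    {"Country": "Mexico", "Type": "Food", "People": 11000.0, "Quantity": 1500.0},
    {"Country": "Mexico", "Type": "Feed", "People": 11000.0, "Quantity": 600.0},
    {"Country": "Chad", "Type": "Food", "People": 2700.0, "Quantity": 120.0},
]

def build_features(records):
    """Turn rows into 3-D points (type code, people, quantity).

    Country is returned separately as a label, not used as a feature.
    """
    categories = sorted({r["Type"] for r in records})
    type_code = {name: float(i) for i, name in enumerate(categories)}
    points = [(type_code[r["Type"]], r["People"], r["Quantity"]) for r in records]
    countries = [r["Country"] for r in records]
    return points, countries, type_code

points, countries, type_code = build_features(rows)
print(type_code)
print(points)
```

With only two categories, a 0/1 code for Type carries the same information as one-hot encoding with one column dropped. Because People and Quantity are on very different scales, the points would normally be standardized before clustering.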

Results

Was any knowledge generated? Was the model validated?

We generated graphs showing the relationship between the two data sets.

Since we were not performing a predictive analysis but looking for relationships between variables by grouping the data, it was not necessary to "train" and validate the model as a supervised predictor would require.

3D graphs

Correlation

To see the process in more detail, follow this link to the complete code of this investigation: Click here.

Conclusion and discussion

Was the hypothesis tested?

With the results obtained, we find a relationship between the amount of food produced in a country and the number of children suffering from some type of malnutrition, although it is weak.

We computed the correlation and found that the relationship is negative: countries that produce more food tend to have less malnutrition, and countries that produce less food tend to have more. The relationship is weak, which is understandable because malnutrition is influenced by many other factors, such as social, political, and economic ones.
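To make "negative but weak" concrete: the post does not say which coefficient was used, so this sketch assumes the Pearson correlation, computed here from its definition as the covariance divided by the product of the standard deviations. The coefficient lies between -1 and 1; a negative sign means one variable tends to decrease as the other increases, and a value close to 0 means the linear relationship is weak. Like any correlation, it describes association, not cause. The five data points are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical food production (x) and malnutrition rate (y) for five countries.
food = [100.0, 250.0, 400.0, 800.0, 1200.0]
malnutrition = [22.0, 30.0, 18.0, 25.0, 20.0]
r = pearson(food, malnutrition)
print(round(r, 2))
```

For these toy numbers r is about -0.28: the sign says that more production goes with less malnutrition, while the small magnitude says that production alone explains little of the variation.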

Future work

This work is intended as the basis for a future project that predicts whether malnutrition will increase or decrease, or whether there is a way to prevent it. To that end, the data obtained here could be analyzed together with other factors, such as GDP, birth rate, or other variables.
