Imputación de datos faltantes en los ingresos por hogar en la Enaho utilizando el método del K-vecino más cercano

Collazos Tuesta, Oscar Ronald

dc.contributor.advisor	Menacho Chiok, César Higinio
dc.contributor.author	Collazos Tuesta, Oscar Ronald
dc.date.accessioned	2021-08-10T19:27:38Z
dc.date.available	2021-08-10T19:27:38Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/20.500.12996/4851
dc.description	Universidad Nacional Agraria La Molina. Facultad de Economía y Planificación. Departamento Académico de Estadística e Informática	es_PE
dc.description.abstract	La Encuesta Nacional de Hogares (ENAHO), es el instrumento que utiliza el Instituto Nacional de Estadística e Informática (INEI) para recopilar a nivel nacional los datos de los hogares sobre su condiciones económicas, educativas, salud, etc. y que permiten generar indicadores que miden el estado y la evolución de la pobreza, el bienestar y las condiciones de vida de los hogares del Perú, así como para efectuar diagnósticos y medir el alcance de los programas sociales (alimentarios y no alimentarios) en la mejora de las condiciones de vida de la población peruana. Sin embargo, un problema que debe enfrentar la ENAHO es la no respuesta total o parcial en las unidades de muestreo (no respuesta en unidades) o en una pregunta específica (no respuesta por ítem); sobre todo a las preguntas referidas a los ingresos de los hogares. Para el tratamiento de los datos faltantes, se han propuesto una variedad de métodos que comprenden desde el más simple que consiste en la eliminación de las observaciones que tengan algún dato faltante en una de las variables hasta métodos más consistentes basados en un proceso de imputación con los datos faltantes a partir de los datos completos. El objetivo de esta investigación es presentar y aplicar los métodos de imputación de la media y mediana, el método Hot-Deck y el k vecino más cercano para estimar los datos faltantes del Ingreso por hogar en la ENAHO 2017 trimestre 3. Los resultados indican que los datos faltantes del ingreso tienen un mecanismo MCAR. La estimación del intervalo de confianza del 95% para la media de los ingresos imputados, tuvieron amplitudes por el método de la media 131,41 (el menor) mientras que por el k vecino más cercano fue 139,4. Para estimación de la desviación estándar del ingreso, fue el menor para la media 92,97 y k vecino más cercano 100,99. 
Los resultados de la comparación de los métodos de imputación, fueron usando los datos completos para generar una muestra aleatoria de datos faltantes artificiales y luego se hallaron el Cuadrado Medio del Error (ECM) y correlaciones con los datos observados e imputados para cada método. El método del k vecino más cercano tuvo los menores valores de ECM 1412,6 y 444,4 para la media y mediana; mientras que los otros métodos sus valores fueron por la media 1504,5; por la mediana 1619,9 y por el Hot-Deck 1963,7. Los coeficientes de correlaciones resultaron con valores muy similares, para k vecino más cercano 0,968 con la media y 0,964 con la mediana.	es_PE
dc.description.abstract	The National Household Survey (ENAHO) is the instrument used by the National Institute of Statistics and Informatics (INEI) to collect nationwide data on households' economic, educational, health and other conditions. These data are used to generate indicators that measure the state and evolution of poverty, well-being and living conditions of Peruvian households, and to carry out diagnoses and measure the reach of social programs (food and non-food) in improving the living conditions of the Peruvian population. However, ENAHO must deal with total or partial non-response, either in the sampling units (unit non-response) or in a specific question (item non-response), especially in the questions on household income. A variety of methods have been proposed for handling missing data, ranging from the simplest, which deletes every observation with a missing value in any variable, to more consistent methods that impute the missing values from the complete data. The objective of this research is to present and apply mean and median imputation, the Hot-Deck method and the k-nearest neighbor method to estimate the missing values of household income in ENAHO 2017, quarter 3. The results indicate that the missing income data follow an MCAR (missing completely at random) mechanism. The 95% confidence interval for the mean of the imputed income had a width of 131.41 under mean imputation (the smallest) and 139.4 under k-nearest neighbor. The estimated standard deviation of income was smallest under mean imputation, 92.97, compared with 100.99 under k-nearest neighbor.
To compare the imputation methods, a random sample of artificial missing values was generated from the complete data, and the mean squared error (MSE, abbreviated ECM in Spanish) and the correlation between the observed and imputed data were computed for each method. The k-nearest neighbor method had the lowest MSE values, 1412.6 and 444.4 with the mean and the median respectively; the values for the other methods were 1504.5 for mean imputation, 1619.9 for median imputation and 1963.7 for Hot-Deck. The correlation coefficients were very similar across methods; for k-nearest neighbor they were 0.968 with the mean and 0.964 with the median.	en_US
dc.format	application/pdf	en_US
dc.language.iso	spa	es_PE
dc.publisher	Universidad Nacional Agraria La Molina	es_PE
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	*
dc.subject	Hogares	es_PE
dc.subject	Familia	es_PE
dc.subject	Pobreza	es_PE
dc.subject	Encuestas	es_PE
dc.subject	Recolección de datos	es_PE
dc.subject	Análisis de datos	es_PE
dc.subject	Métodos estadísticos	es_PE
dc.subject	Datos estadísticos	es_PE
dc.subject	Perú	es_PE
dc.subject	Método de k-vecino	es_PE
dc.subject	Mecanismo MCAR	es_PE
dc.title	Imputación de datos faltantes en los ingresos por hogar en la Enaho utilizando el método del K-vecino más cercano	es_PE
dc.type	info:eu-repo/semantics/bachelorThesis	en_US
thesis.degree.discipline	Estadística e Informática	es_PE
thesis.degree.grantor	Universidad Nacional Agraria La Molina. Facultad de Economía y Planificación	es_PE
thesis.degree.name	Ingeniero Estadístico Informático	es_PE
dc.subject.ocde	https://purl.org/pe-repo/ocde/ford#4.05.00	es_PE
renati.author.dni	41290137	es_PE
dc.publisher.country	PE	es_PE
dc.type.version	info:eu-repo/semantics/publishedVersion	en_US
renati.advisor.orcid	https://orcid.org/0000-0003-1310-2551	es_PE
renati.advisor.dni	07108718	es_PE
renati.type	https://purl.org/pe-repo/renati/type#tesis	es_PE
renati.level	https://purl.org/pe-repo/renati/level#tituloProfesional	es_PE
renati.discipline	542026	es_PE
renati.juror	Miranda Villagomez, Clodomiro Fernando
renati.juror	Porras Cerrón, Jaime Carlos
renati.juror	López de Castilla Vásquez, Carlos
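
The comparison procedure summarized in the abstract (remove values at random from complete data, impute them with each method, then compute the MSE and the correlation) could be sketched roughly as follows. This is only an illustrative sketch and not the thesis code: the synthetic data, the function names, the choice of k = 5, the Euclidean distance, the random hot-deck donor selection, and the decision to compute the MSE on the removed entries and the correlation on the full completed vector are all assumptions made for the example.

```python
import numpy as np

def knn_impute(X_donors, y_donors, X_missing, k=5, agg=np.mean):
    """Impute each missing value with agg() of the values of its k nearest donors."""
    # Pairwise Euclidean distances: rows are recipients, columns are donors.
    d = np.linalg.norm(X_missing[:, None, :] - X_donors[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest donors
    return agg(y_donors[nearest], axis=1)        # aggregate the donors' values

def compare_methods(X, y, missing_rate=0.1, k=5, seed=0):
    """Delete values completely at random, impute them, and score each method."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mask = np.zeros(n, dtype=bool)               # True marks an artificially missing value
    mask[rng.choice(n, size=int(round(missing_rate * n)), replace=False)] = True
    X_obs, y_obs = X[~mask], y[~mask]
    X_mis, y_true = X[mask], y[mask]
    m = int(mask.sum())

    imputations = {
        "mean": np.full(m, y_obs.mean()),
        "median": np.full(m, np.median(y_obs)),
        # Random hot-deck (assumed variant): copy the value of a randomly drawn donor.
        "hot_deck": rng.choice(y_obs, size=m, replace=True),
        "knn_mean": knn_impute(X_obs, y_obs, X_mis, k=k, agg=np.mean),
        "knn_median": knn_impute(X_obs, y_obs, X_mis, k=k, agg=np.median),
    }

    results = {}
    for name, imputed in imputations.items():
        mse = float(np.mean((y_true - imputed) ** 2))   # error on the removed entries only
        y_filled = y.copy()
        y_filled[mask] = imputed
        corr = float(np.corrcoef(y, y_filled)[0, 1])    # original vs. completed vector
        results[name] = (mse, corr)
    return results

# Synthetic stand-in for household data: two covariates and an income-like response.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 2))
y = 1000 + 300 * X[:, 0] + 200 * X[:, 1] + rng.normal(scale=50, size=400)

results = compare_methods(X, y)
for name, (mse, corr) in results.items():
    print(f"{name:>10}: MSE={mse:12.1f}  corr={corr:.3f}")
```

On data like this, where the response depends strongly on the covariates, the k-nearest neighbor variants would be expected to give a lower MSE than mean, median or hot-deck imputation; the figures printed here come from synthetic data and say nothing about the ENAHO results reported above.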