An analysis of local and global solutions to address Big Data imbalanced classification : a case study with SMOTE preprocessing

Material type: Article
Description: 1 file (592.0 kB)
Holdings
Item type: Book chapter (Capítulo de libro)
Home library: Biblioteca de la Facultad de Informática
Collection: Biblioteca digital
Call number: A1131
URL: Link to resource
Barcode: Not applicable

File format: PDF. -- This document is an intellectual production of the Facultad de Informática - UNLP (Colección BIPA / Biblioteca)

Addressing the huge amount of data that is continuously generated is an important challenge in the Machine Learning field. Traditional techniques must be adapted, or new ones created, and distributed technologies have to be used to cope with the significant scalability constraints imposed by the Big Data context. In many Big Data classification applications, some classes are highly underrepresented, leading to what is known as the imbalanced classification problem. In this scenario, learning algorithms are often biased towards the majority classes, treating minority ones as outliers or noise. Consequently, preprocessing techniques that balance the class distribution have been developed. This can be achieved by suppressing majority instances (undersampling) or by creating minority examples (oversampling). Among the oversampling methods, one of the most widespread is the SMOTE algorithm, which creates artificial examples by interpolating within the neighborhood of each minority class instance. In this work, our objective is to analyze the behavior of SMOTE in Big Data as a function of some key aspects, such as the oversampling degree, the neighborhood size and, especially, the type of distributed design (local vs. global).
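
The record does not include any code, but the core idea of SMOTE described in the abstract can be sketched. The following is a minimal, illustrative implementation assuming numeric features and Euclidean distance for the nearest-neighbor search; the function name smote(), its parameters and the use of NumPy are choices made here for illustration, not taken from the paper.

import numpy as np

def smote(minority_X, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch: each synthetic point is an interpolation between a
    randomly chosen minority instance and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X = np.asarray(minority_X, dtype=float)
    n = len(X)
    k = min(k, n - 1)                               # cannot use more neighbours than points available
    # Pairwise Euclidean distances among minority instances only.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)                         # base minority instance
        j = neighbours[i, rng.integers(k)]          # one of its k nearest neighbours
        gap = rng.random()                          # interpolation factor in [0, 1]
        synthetic[s] = X[i] + gap * (X[j] - X[i])
    return synthetic

The local vs. global distinction studied in the paper can also be illustrated. The sketch below, assuming an Apache Spark RDD API and reusing the smote() helper above (the paper's actual platform and partitioning scheme may differ), applies SMOTE independently inside each partition. This corresponds to a "local" design, where neighbors are searched only among the minority instances that happen to share a partition; a "global" design would instead make every minority instance visible to the neighbor search, for example by gathering or indexing the whole minority class before oversampling.

import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-smote-sketch").getOrCreate()
sc = spark.sparkContext

# Toy minority-class feature vectors, spread over 4 partitions (illustrative data only).
gen = np.random.default_rng(0)
minority = sc.parallelize([gen.random(10) for _ in range(200)], numSlices=4)

K = 5
def oversample_partition(rows):
    X = np.asarray(list(rows))
    if len(X) <= K:                                 # too few co-located points to interpolate
        return iter(X)
    # Local design: neighbours are searched only inside this partition.
    return iter(np.vstack([X, smote(X, n_synthetic=len(X), k=K)]))

locally_oversampled = minority.mapPartitions(oversample_partition)
print(locally_oversampled.count())                  # roughly twice the original minority count
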

Conference: Cloud Computing and Big Data (7th : 2019 : La Plata, Argentina)