NLP Projects | Errol W. Mamani

Text Classification

1. Aggressive Language Identification using a Vocabulary Graph Convolutional Neural Network

I developed a model for detecting offensive and hate speech in text using Graph Neural Networks (GNNs). I represented the vocabulary as a graph whose edges capture relationships between words, which adds corpus-level context to the representation of each text. I implemented the solution in PyTorch, together with supporting libraries for data processing and model training. The model achieved high accuracy in identifying harmful language.
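The vocabulary-graph idea can be illustrated with a short NumPy sketch. This is not the project's code. It is an assumption modelled on the VGCN-BERT formulation: edges are weighted by positive normalized pointwise mutual information (NPMI) between words that co-occur in a sliding window, and one graph convolution computes ReLU(Â X W) with the symmetrically normalized adjacency Â. The function names `build_vocab_graph`, `normalize` and `graph_conv`, the window size, and the toy documents are hypothetical.

```python
import itertools
from collections import Counter

import numpy as np


def build_vocab_graph(docs, window=3):
    """Word-word adjacency weighted by positive NPMI over sliding windows."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for doc in docs:
        # A document shorter than the window counts as a single window.
        for start in range(max(1, len(doc) - window + 1)):
            span = set(doc[start:start + window])
            n_windows += 1
            word_count.update(span)
            pair_count.update(itertools.combinations(sorted(span), 2))
    adj = np.eye(len(vocab))  # self-loops keep each word's own features
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / n_windows
        p_a, p_b = word_count[a] / n_windows, word_count[b] / n_windows
        pmi = np.log(p_ab / (p_a * p_b))
        # NPMI lies in [-1, 1]; a pair seen in every window gets 1.
        npmi = pmi / -np.log(p_ab) if p_ab < 1 else 1.0
        if npmi > 0:  # keep only positively associated pairs
            adj[index[a], index[b]] = adj[index[b], index[a]] = npmi
    return vocab, adj


def normalize(adj):
    """Symmetric normalization D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(a_hat, features, weight):
    """One graph convolution layer: ReLU(A_hat X W)."""
    return np.maximum(a_hat @ features @ weight, 0.0)


# Hypothetical toy corpus, already tokenized.
docs = [["you", "are", "stupid"], ["you", "are", "kind"], ["stupid", "idiot"]]
vocab, adj = build_vocab_graph(docs)
features = np.eye(len(vocab))  # one-hot word features
weight = np.random.default_rng(0).normal(size=(len(vocab), 4))  # untrained
hidden = graph_conv(normalize(adj), features, weight)
```

In a full model, the word representations produced this way would be combined with each document's word occurrences and passed to a classifier. Here the weights are random, so only the graph construction and the shape of the computation are meaningful.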

F1-scores for aggressiveness detection using VGCN; * marks the system developed in this project.

VGCN for Aggressive Detection | paper_here | git_repo

2. Sentiment Analysis for Spanish (Peru) using SVM

I built a sentiment classifier for Peruvian Spanish tweets using Support Vector Machines (SVMs). I collected tweets from Twitter in real time and used them to train the model. The results showed that an SVM can classify the sentiment of short texts accurately.
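A minimal sketch of the SVM classification step, assuming bag-of-words count features and a linear SVM trained with the Pegasos stochastic subgradient method on the hinge loss. The project's actual preprocessing, features, and solver are not described here, so this is illustrative only; the helper names, the hyperparameters `lam` and `epochs`, and the four example tweets are hypothetical. Labels use +1 for positive and -1 for negative sentiment.

```python
import re

import numpy as np


def tokenize(text):
    # Lowercase; \w matches accented Spanish letters in Python 3 regexes.
    return re.findall(r"\w+", text.lower())


def build_vocab(texts):
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {w: i for i, w in enumerate(words)}


def vectorize(texts, vocab):
    """Bag-of-words counts; words not in the vocabulary are ignored."""
    X = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in tokenize(text):
            if tok in vocab:
                X[row, vocab[tok]] += 1.0
    return X


def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    y holds labels in {-1, +1}. For brevity the bias is a constant feature and
    is regularized along with the other weights.
    """
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1  # inside the margin or misclassified
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    return np.where(add_bias(X) @ w >= 0, 1, -1)


# Hypothetical example tweets: +1 positive, -1 negative.
tweets = ["me encanta este lugar", "qué buen servicio, excelente",
          "pésimo servicio, muy malo", "odio esperar tanto, horrible"]
labels = np.array([1, 1, -1, -1])
vocab = build_vocab(tweets)
X = vectorize(tweets, vocab)
w = train_linear_svm(X, labels)
```

Pegasos only keeps the sketch dependency-free; in practice a library solver such as scikit-learn's `LinearSVC` over TF-IDF features is a common choice.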

General pipeline followed to implement the SVM for the sentiment analysis case study.

Análisis masivo de datos en Twitter para identificación de opinión (Large-scale analysis of Twitter data for opinion identification) | thesis_online_repor | git_repo

Text Generation

1. Quechua Indigenous Lyrics Generation using LSTM

I developed a text generation model for the Quechua language using LSTM networks. Together with a team, I created the dataset and then trained the model to generate coherent song lyrics. The project aims to promote and preserve Quechua by making it more accessible in digital formats and supporting its cultural heritage.
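The generation loop of an LSTM language model can be sketched as follows. This assumes a character-level model with a single LSTM layer and temperature sampling, which may differ from the project's setup. The weights are random placeholders (in practice they are learned by minimizing cross-entropy on the lyrics dataset), so the sampled text is gibberish; the class and function names, hidden size, and sample string are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Single LSTM layer with random, untrained weights (placeholders)."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.uniform(-scale, scale, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
        h = sigmoid(o) * np.tanh(c)  # expose a gated view of the cell state
        return h, c


def generate(seed, length, chars, cell, W_out, rng, temperature=1.0):
    """Warm up on `seed`, then sample `length` characters one at a time."""
    one_hot = np.eye(len(chars))
    index = {ch: k for k, ch in enumerate(chars)}
    h = np.zeros(cell.n_hidden)
    c = np.zeros(cell.n_hidden)
    for ch in seed:
        h, c = cell.step(one_hot[index[ch]], h, c)
    out = list(seed)
    for _ in range(length):
        logits = W_out @ h / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        k = rng.choice(len(chars), p=probs)
        out.append(chars[k])
        h, c = cell.step(one_hot[k], h, c)  # feed the sample back in
    return "".join(out)


corpus = "sumaq takiy sumaq sunqu"  # hypothetical stand-in for the lyrics dataset
chars = sorted(set(corpus))
rng = np.random.default_rng(0)
cell = LSTMCell(len(chars), 32, rng)
W_out = rng.normal(0.0, 0.1, (len(chars), 32))
text = generate("sumaq ", 20, chars, cell, W_out, rng)
```

Lower `temperature` values make sampling more conservative and higher values make it more varied; with trained weights this is the usual knob for trading coherence against creativity.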

The video on the right shows a demo of the Quechua lyrics generation.

Generación automática de letras de canciones usando redes neuronales recurrentes para el quechua (Automatic generation of song lyrics for Quechua using recurrent neural networks) | Doc repository online: thesis_online_repor
You can check out the git_repo

Speech Recognition

1. A Comparative Study of Speaker Identification Models

This project compares three machine learning models for speaker identification: Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Support Vector Machines (SVM). Using the same MFCC features, HMM and GMM achieved about 94% accuracy on a set of 100 audio recordings. SVM reached around 90% accuracy when classifying 2 to 5 speakers but performed poorly as the number of speakers grew.
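The GMM approach can be sketched as one diagonal-covariance Gaussian mixture per speaker, fitted with expectation-maximization (EM); a recording is assigned to the speaker whose model gives the highest average log-likelihood over its frames. This is an illustrative NumPy sketch, not the project's code: the number of components, iteration count, variance floor `reg`, and the synthetic 13-dimensional frames standing in for MFCC vectors are all assumptions.

```python
import numpy as np


def log_gaussian_diag(X, means, variances):
    """log N(x | mean_k, diag(var_k)) for every frame x and component k."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (
        d * np.log(2 * np.pi)
        + np.log(variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )


def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def fit_gmm(X, n_components=2, n_iter=50, reg=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with expectation-maximization."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = log_gaussian_diag(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances (floored by reg).
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + reg
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    weights, means, variances = gmm
    log_p = log_gaussian_diag(X, means, variances) + np.log(weights)
    return logsumexp(log_p, axis=1).mean()


def identify(X, models):
    """Return the speaker whose GMM best explains the frames in X."""
    scores = {name: avg_log_likelihood(X, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get)


# Synthetic 13-dimensional frames stand in for MFCC vectors (hypothetical data).
rng = np.random.default_rng(42)
train = {"speaker_a": rng.normal(0.0, 1.0, (300, 13)),
         "speaker_b": rng.normal(2.0, 1.0, (300, 13))}
models = {name: fit_gmm(frames) for name, frames in train.items()}
test_a = rng.normal(0.0, 1.0, (50, 13))
test_b = rng.normal(2.0, 1.0, (50, 13))
```

Averaging the per-frame log-likelihood makes scores comparable across recordings of different lengths. Library implementations such as scikit-learn's `GaussianMixture` provide the same model with more robust initialization.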

You can check out the git_repo_here