Projects
Hindi to English: Transformer-Based Neural Machine Translation
Feb 2020 – Sep 2020
The project translates text from Hindi to English by training a Transformer model. We first cleaned the IIT Bombay CFILT English-Hindi parallel corpus and then applied word-level and sub-word-level tokenization to build the vocabulary. We implemented back-translation to augment the training data by ~3.5 million parallel records. The Transformer model, combined with sub-word-level tokenization and back-translation, achieved a state-of-the-art BLEU score on the Hindi-to-English translation task.
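The back-translation step can be illustrated with a minimal sketch. It assumes the common setup in which monolingual English sentences are passed through a reverse (English-to-Hindi) model to produce synthetic Hindi sources. The names `back_translate` and `toy_reverse_model` are hypothetical, and the toy model is only a placeholder for a trained translator, not the project's actual code.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    target_sentences: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target-side text."""
    pairs = []
    for english in target_sentences:
        english = english.strip()
        if not english:
            continue  # skip blank lines in the monolingual data
        synthetic_hindi = reverse_translate(english).strip()
        if synthetic_hindi:
            # Synthetic source, genuine target: the model still learns to
            # produce human-written English.
            pairs.append((synthetic_hindi, english))
    return pairs


def toy_reverse_model(sentence: str) -> str:
    """Placeholder for a trained English-to-Hindi model (assumption)."""
    return "<hi> " + sentence


synthetic_pairs = back_translate(
    ["Hello world.", "   ", "Good morning."], toy_reverse_model
)
```

The synthetic pairs would then be concatenated with the cleaned parallel corpus before training the Hindi-to-English model.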
See Project
Topic Modeling on Research Articles
Aug 2020
Developed a multi-class classifier that takes the ‘Title’ and ‘Abstract’ of an article as input and predicts the category to which it belongs. Pre-processed the data to remove noise characters and then applied stemming and lemmatization. Trained and compared BERT, RoBERTa, OVR Logistic Regression, and Binary Relevance models. Achieved a micro F1 score of 0.837 on test data of ~9,000 records.
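For reference, the micro F1 metric pools true positives, false positives, and false negatives across all categories before computing a single score. The sketch below is a plain-Python illustration with made-up labels. The function name `micro_f1` and the set-based label format are assumptions, not the project's evaluation code.

```python
from typing import Sequence, Set


def micro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-record label sets."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Illustrative labels only, not the project's data.
truth = [{"cs"}, {"physics", "math"}, {"stats"}]
predicted = [{"cs"}, {"physics"}, {"math"}]
score = micro_f1(truth, predicted)
```

Because counts are pooled before averaging, frequent categories weigh more heavily in the score than rare ones.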
See Project
Dance Form Identification using CNN and Transfer Learning
July 2020
Applied transfer learning with four pre-trained models (VGG16, VGG19, ResNet50, and InceptionV3) on a training set of 364 images. This approach achieved an accuracy of 78.37% on the test set. To further increase accuracy, implemented data augmentation to generate artificial images in batches of 32. Data augmentation along with hyperparameter tuning finally led to an accuracy of 83.89%, which placed me in the top 2% of 5,864 participants.
See Project
Employee Attrition Rate Prediction
June 2020
Implemented XGBRegressor and SGD Regressor to predict the attrition rate of employees using 22 input features. Applied several data pre-processing steps and achieved an accuracy of 81% on test data of 3,000 records.
See Project
Malaria Detection using Transfer Learning and GAN
May 2020 – June 2020
Fine-tuned InceptionV3, VGG16, and VGG19 models on image data of malaria-affected and unaffected cells. For this, I first pre-processed the dataset by resizing the images to 48×48×3 resolution. Implemented a DCGAN to augment the training data with images of cells infected with malaria.
See Project
Real or Not? NLP with Disaster Tweets
Apr 2020 – May 2020
Implemented a bidirectional LSTM model to predict whether a tweet is about a real disaster or not. To further increase accuracy and reduce error, implemented a pre-trained BERT-Base model using PyTorch.
See Project
Malicious URL Detection
Nov 2019 – Dec 2019
The aim of the project was to develop a classifier that determines whether a URL is malicious by analyzing its lexical features. From each URL, I extracted 15 lexical features such as the length of the URL, the digit-to-letter ratio, etc. Performed EDA on the processed data and then trained Random Forest and Support Vector Classifier models. Achieved an accuracy of ~90% on the test dataset.
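A minimal sketch of lexical feature extraction is shown below. Only the URL length and the digit-to-letter ratio are named in the description above. The remaining features and the function name `lexical_features` are illustrative assumptions, not the project's actual 15-feature set.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute simple lexical features from the URL string alone."""
    parsed = urlparse(url)
    digits = sum(ch.isdigit() for ch in url)
    letters = sum(ch.isalpha() for ch in url)
    return {
        "url_length": len(url),
        "digit_letter_ratio": digits / letters if letters else 0.0,
        # The features below are assumed examples, not taken from the project.
        "hostname_length": len(parsed.netloc),
        "path_length": len(parsed.path),
        "dot_count": url.count("."),
        "hyphen_count": url.count("-"),
        "uses_https": parsed.scheme == "https",
    }


features = lexical_features("http://example.com/a1b2")
```

Each URL becomes one row of such features, which can then be passed to a classifier such as a random forest.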
See Project
GeoSharding
Aug 2018 – May 2019
Implemented sharding in the network plane of the blockchain for parallel execution of transactions. Mapped the IP addresses of the blockchain miner nodes to longitude and latitude coordinates and then implemented K-means to cluster geographically close nodes into one shard. Devised and implemented a novel, secure leader election algorithm on each shard using Golang.
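The clustering step can be sketched as plain K-means over (latitude, longitude) pairs. This is a simplified illustration in Python rather than the project's Golang code. It treats coordinates as points on a flat plane, which is only an approximation of geographic distance, and the node coordinates are made up.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def squared_distance(a: Point, b: Point) -> float:
    """Planar squared distance; ignores the Earth's curvature (assumption)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign_shards(points: List[Point], k: int, iterations: int = 50, seed: int = 0) -> List[int]:
    """Return a shard index in [0, k) for every node using K-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each node joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
    return assignment


# Hypothetical miner locations: two in western India, two in the north-eastern US.
nodes = [(19.07, 72.87), (18.52, 73.86), (40.71, -74.00), (42.36, -71.06)]
shards = assign_shards(nodes, k=2)
```

Nodes that share a shard index would then run leader election among themselves.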
See Project