Deep Learning | Mihir Parmar

Semantic Scene Composition for Amodal Instance Segmentation

A novel deep learning architecture that generates realistic occluded compositions by placing given individual objects into a scene in a context-aware manner, while preserving the original full (amodal) masks of the occluded objects. Used a combined STN-GAN framework to learn a projection matrix from encoded geometric and semantic information. Evaluated the generated compositions by fine-tuning a pre-trained Mask R-CNN for amodal instance segmentation and reporting the COCO-style mean Average Precision (mAP).
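The compositing idea can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists and an already-predicted 2x3 affine matrix `theta` standing in for the STN output; all function names are hypothetical. Like an STN sampler, it maps each output pixel back into the source (inverse warping), but it uses pixel coordinates and nearest-neighbour sampling rather than the normalized coordinates and bilinear sampling of a real STN. The occluded object's amodal mask is kept whole, while its visible mask loses the pixels covered by the pasted object; mask IoU, the overlap measure that COCO mask AP thresholds on, then compares the two.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def affine_place(src: Mask, theta: Affine, out_h: int, out_w: int) -> Mask:
    """Warp `src` onto an out_h x out_w canvas using a 2x3 affine matrix.

    For every output pixel (x, y), theta gives the source location to sample
    (inverse mapping, as in an STN sampler). Nearest-neighbour sampling keeps
    the result binary; out-of-bounds samples become background.
    """
    (a, b, tx), (c, d, ty) = theta
    src_h, src_w = len(src), len(src[0])
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = round(a * x + b * y + tx)
            sy = round(c * x + d * y + ty)
            if 0 <= sy < src_h and 0 <= sx < src_w:
                out[y][x] = src[sy][sx]
    return out


def composite_masks(occludee: Mask, occluder: Mask) -> Tuple[Mask, Mask]:
    """Return (amodal, visible) masks of `occludee` once `occluder` is pasted on top."""
    amodal = [row[:] for row in occludee]  # full shape kept as amodal ground truth
    visible = [
        [o & (1 - p) for o, p in zip(row_o, row_p)]
        for row_o, row_p in zip(occludee, occluder)
    ]
    return amodal, visible


def mask_iou(m1: Mask, m2: Mask) -> float:
    """Intersection-over-union of two binary masks of equal size."""
    inter = sum(a & b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    union = sum(a | b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return inter / union if union else 0.0


# Toy scene: a 2x2 occluded object in the top-left corner of a 4x4 canvas.
occludee = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
new_object = [[1, 1],
              [1, 1]]
# Translation-only theta: output pixel (x, y) samples source pixel (x - 1, y - 1).
theta = ((1.0, 0.0, -1.0), (0.0, 1.0, -1.0))
occluder = affine_place(new_object, theta, 4, 4)
amodal, visible = composite_masks(occludee, occluder)
print(mask_iou(visible, amodal))  # 0.75
```

Here one of the four occludee pixels is hidden, so the visible mask covers three quarters of the amodal mask.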

Deep Learning for Vision

Implemented transfer learning on ResNet for image classification, as well as generative architectures (Variational Autoencoder, Least Squares GAN, and DCGAN) for image generation.
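The objectives behind two of these generative models can be sketched in a few lines. This is an illustrative sketch, not the project code: it assumes discriminator outputs are given as plain lists of floats, uses the common 0/1 target coding for the Least Squares GAN, and shows only the KL term and reparameterization trick of the VAE (the full VAE loss adds a reconstruction term). Function names are hypothetical.

```python
import math
from typing import List, Sequence


def lsgan_d_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Discriminator loss: push D(x) toward 1 on real samples and D(G(z)) toward 0 on fakes."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_g_loss(d_fake: Sequence[float]) -> float:
    """Generator loss: push D(G(z)) toward the 'real' target 1."""
    return 0.5 * sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


def vae_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   eps: Sequence[float]) -> List[float]:
    """z = mu + sigma * eps, with eps ~ N(0, I) passed in so the sample is deterministic here."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]
```

Squared-error targets replace the log loss of a standard GAN, so fake samples that are classified correctly but lie far from the decision boundary still produce a gradient.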

Sequential Models for Text Classification and Generation

Worked on data pre-processing of the Amazon Reviews dataset and implemented RNN, LSTM, BiLSTM, and GRU architectures with attention modules for review rating prediction, using a weighted loss and SMOTE to handle class imbalance. Improved the F1 score using the bidirectional transformer-based architectures BERT and RoBERTa. Designed a seq2seq architecture with attention, trained with teacher forcing, to generate summaries of review text.
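Two of the training details above, the class-weighted loss and teacher forcing, can be sketched without a deep learning framework. This is a hedged illustration under assumptions: the inverse-frequency formula w_c = N / (K * n_c) is the "balanced" heuristic used by scikit-learn's `compute_class_weight`, not necessarily the weighting used in the project; predicted distributions are plain dicts mapping rating to probability; and the `<sos>`/`<eos>` tokens and all function names are hypothetical. SMOTE is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights w_c = N / (K * n_c), so rare ratings weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs: Sequence[Dict[int, float]], targets: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Sum of -w_y * log p(y), normalized by the sum of the target weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, targets))
    return total / sum(weights[y] for y in targets)


def teacher_forcing_pairs(summary: List[str], sos: str = "<sos>",
                          eos: str = "<eos>") -> Tuple[List[str], List[str]]:
    """Build (decoder inputs, decoder targets) for one gold summary.

    With teacher forcing, decoding step t is fed the ground-truth token t-1
    instead of the decoder's own previous prediction, so inputs are the gold
    tokens shifted right and targets are the same tokens shifted left.
    """
    return [sos] + summary, summary + [eos]


labels = [5, 5, 5, 4, 1]  # toy ratings, heavily skewed toward 5 stars
weights = balanced_class_weights(labels)
print(weights)  # 5-star reviews get weight 5/9; 4- and 1-star reviews get 5/3
```

Dividing by the sum of the target weights mirrors the mean reduction of PyTorch's `CrossEntropyLoss` when a `weight` tensor is supplied.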

Yelp Review Rating Prediction

Developed and implemented a CNN model for rating prediction and sentiment classification of Yelp restaurant reviews. For the sentiment classification task, demonstrated that a simpler model using only the review's adjectives as features yields results similar to a more complex model that uses the entire review.
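The adjective-only input can be sketched as a filtering step applied before the model. This is an assumption-laden illustration: a real pipeline would identify adjectives with a part-of-speech tagger, whereas here a tiny hypothetical `ADJECTIVES` lexicon stands in for one, and the CNN itself is not shown. The filtered token sequence, not the full review, would then be embedded and fed to the classifier.

```python
import re
from typing import List

# Hypothetical stand-in for a part-of-speech tagger: a tiny adjective lexicon.
ADJECTIVES = {"great", "bland", "friendly", "rude", "delicious", "slow", "cold", "fresh"}


def tokenize(review: str) -> List[str]:
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", review.lower())


def adjective_tokens(review: str) -> List[str]:
    """Keep only adjectives, preserving their order for a sequence model such as a CNN."""
    return [t for t in tokenize(review) if t in ADJECTIVES]


review = "The pasta was cold and bland, but the staff were friendly."
print(len(set(tokenize(review))), adjective_tokens(review))  # 10 ['cold', 'bland', 'friendly']
```

The restricted input shrinks the vocabulary the model must learn from, here from 10 distinct tokens to 3, while keeping the words that carry most of the sentiment.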