Projects

AI Work Pilot System

The AI-Powered Job Scraper and Notification System automates the job search process by extracting job listings and descriptions from platforms like LinkedIn and Indeed. It uses Natural Language Processing (NLP) models for job classification and skill extraction, providing personalized job recommendations based on predefined criteria. The system is built on a modular multi-agent architecture for web scraping, data cleaning, job classification, and notifications.

Key Features

Automated Job Extraction

Implemented intelligent web scraping agents using Selenium, BeautifulSoup, and Requests.
Developed job-specific search queries and custom scraping logic for efficient data extraction from LinkedIn and Indeed.
Integrated error handling and dynamic page loading to improve scraping performance.
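
As a rough illustration of the error handling mentioned above, the following sketch retries a page fetch with exponential backoff. It is not the project's actual code: `fetch_with_retry` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever Requests call or Selenium page load the scraper performs.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url), retrying with exponential backoff on any exception.

    `fetch` is a hypothetical callable (for example a thin wrapper around a
    Requests call or a Selenium page load) that returns the page HTML.
    Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                return None
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return None
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.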

Data Cleaning and Processing

Applied Pandas, NumPy, and Regex for cleaning and structuring raw job data.
Standardized and normalized job titles, descriptions, and company names for consistency.
Filtered irrelevant job postings based on predefined job roles, locations, and keywords.
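
The cleaning and filtering steps above might look roughly like this. The sketch uses only the standard library `re` module rather than Pandas, and the field names (`title`, `location`) and helper names are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List


def normalize_title(title: str) -> str:
    """Lowercase a title, drop bracketed notes and punctuation, collapse spaces."""
    title = title.lower()
    title = re.sub(r"\(.*?\)|\[.*?\]", " ", title)  # remove notes like "(Remote)"
    title = re.sub(r"[^a-z0-9+#. ]", " ", title)     # keep characters used in C++, C#, .NET
    return re.sub(r"\s+", " ", title).strip()


def filter_jobs(
    jobs: Iterable[Dict[str, str]],
    keywords: Iterable[str],
    locations: Iterable[str],
) -> List[Dict[str, str]]:
    """Keep jobs in an allowed location whose normalized title contains a keyword."""
    wanted_keywords = [k.lower() for k in keywords]
    wanted_locations = {loc.lower() for loc in locations}
    kept = []
    for job in jobs:
        title = normalize_title(job["title"])
        if job["location"].strip().lower() not in wanted_locations:
            continue
        if any(k in title for k in wanted_keywords):
            kept.append({**job, "title": title})
    return kept
```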

NLP-Based Job Analysis

Used Hugging Face Transformers and BERT models for job classification and skill extraction.
Performed Named Entity Recognition (NER) to extract relevant skills, tools, and job-specific keywords from job descriptions.
Implemented ranking logic based on extracted keywords and job relevancy scores.
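
The source does not spell out the ranking logic, so the following is only one plausible sketch: score each job by the fraction of wanted skills found among its extracted skills, then sort by that score. The `skills` field and both function names are hypothetical, and the skill lists are assumed to come from the NER step.

```python
from typing import Dict, Iterable, List, Tuple


def relevancy_score(extracted_skills: Iterable[str], wanted_skills: Iterable[str]) -> float:
    """Fraction of wanted skills that appear among a job's extracted skills."""
    wanted = {s.lower() for s in wanted_skills}
    if not wanted:
        return 0.0
    found = {s.lower() for s in extracted_skills}
    return len(wanted & found) / len(wanted)


def rank_jobs(jobs: List[Dict], wanted_skills: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (title, score) pairs sorted from most to least relevant."""
    wanted = list(wanted_skills)
    scored = [(job["title"], relevancy_score(job["skills"], wanted)) for job in jobs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```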

AI-Driven Notification System

Developed an API-based notification system to send personalized job alerts.
Automated daily job search updates and real-time notifications for matching jobs.

Multi-Agent AI System

Web Scraping Agent: Extracts job data from various sources.
Data Cleaning Agent: Processes and cleans raw job listings.
NLP Agent: Analyzes job titles and descriptions.
Notification Agent: Sends real-time notifications using APIs.
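
One way to picture how the four agents hand data to each other is a simple function pipeline. This is a simplified sketch, not the project's architecture: the scraper and notifier are injected as hypothetical callables, and the NLP step is replaced by a plain keyword match rather than a BERT model.

```python
from typing import Callable, Dict, Iterable, List

Job = Dict[str, str]


def cleaning_agent(raw_jobs: List[Job]) -> List[Job]:
    """Strip whitespace from every field and drop listings without a title."""
    cleaned = []
    for job in raw_jobs:
        job = {key: value.strip() for key, value in job.items()}
        if job.get("title"):
            cleaned.append(job)
    return cleaned


def nlp_agent(jobs: List[Job], keywords: Iterable[str]) -> List[Job]:
    """Stand-in for the NLP step: keep jobs whose description mentions a keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        job for job in jobs
        if any(k in job.get("description", "").lower() for k in lowered)
    ]


def run_pipeline(
    scrape: Callable[[], List[Job]],
    notify: Callable[[Job], None],
    keywords: Iterable[str],
) -> int:
    """Scrape, clean, analyze, then notify; returns the number of alerts sent."""
    matches = nlp_agent(cleaning_agent(scrape()), keywords)
    for job in matches:
        notify(job)
    return len(matches)
```

Keeping each agent behind a plain function boundary is what makes it possible to swap one stage (for example, a new job platform) without touching the others.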

Deployment and Version Control

Managed source code using GitHub for version control and team collaboration.
Containerized the system using Docker for scalable and platform-independent deployment.
Scheduled regular updates and maintenance for operational efficiency.

Technical Stack

Web Scraping: Selenium, BeautifulSoup, Requests
Data Processing: Pandas, NumPy, Regex
Natural Language Processing: Hugging Face Transformers (BERT)
APIs & Notifications: Custom REST APIs
Version Control & Deployment: GitHub, Docker

Impact & Achievements

Automated the entire job search process, reducing manual job searching efforts.
Delivered personalized job recommendations based on advanced NLP and AI models.
Enabled seamless project management through GitHub and streamlined deployments using Docker.
Built a scalable, modular architecture supporting future expansions into more job platforms and deeper skill analysis.

This project demonstrates the power of AI-driven automation in job searching by integrating web scraping, data processing, and NLP-based job recommendations in a scalable, efficient system.

Industry Partnered Capstone Project

Associated with Seattle University and a Fortune 500 company

Project Duration: Jan 2024 - Present

Algorithm for Fed Markets: This project, conducted in collaboration with a Fortune 500 company and affiliated with Seattle University, is a comprehensive endeavor aimed at revolutionizing market research strategies for a leading IT distributor. Focused specifically within the Space Force sector, the project introduces an innovative advanced search engine powered by cutting-edge unsupervised Natural Language Processing (NLP) techniques.

The primary goal of this initiative is to overhaul the distributor's internal data analysis and discovery process, which currently relies on manual research and data aggregation from disparate sources. By leveraging advanced NLP algorithms, the project streamlines the extraction, aggregation, and transformation of data, significantly enhancing efficiency and accessibility for the distributor's team.

Through the integration of a streamlined user interface and a robust data pipeline, the project enables seamless navigation through vast amounts of data from various sources. This automation helps the distributor identify emerging market trends and contracting opportunities, make well-informed decisions quickly, and stay competitive in its market.

Throughout the project timeline, spanning from January 2024 to June 2024, the team will collaborate closely with the Fortune 500 company and Seattle University to develop and implement state-of-the-art solutions tailored to the distributor's specific needs. By the project's completion, it is anticipated that the distributor will experience a significant reduction in both time and effort required for market research activities, ultimately leading to improved operational efficiency and strategic decision-making.

Most details are covered by an NDA with the company, but we can showcase the components we worked on: "Advanced Search Tool (NLP-Powered)", "Topic Modelling", "Named Entity Recognition", "Sentiment Analysis", and "Power BI Dashboard".

Streamlit Application for Resume Analysis using NLP and GPT-3.5 Turbo (Recruiter Friendly)

Project Duration: March 2024 - Present

Technical Synopsis: Streamlit-Powered ATS Resume Matcher with OpenAI GPT-3.5 Turbo

This project integrates OpenAI's GPT-3.5 Turbo language model into a user-friendly Streamlit web app. It uses CountVectorizer from scikit-learn to transform resumes and job descriptions into count vectors, then computes their cosine similarity as a match score. Powered by GPT-3.5 Turbo, the system can generate job descriptions for different experience levels (entry, mid, and advanced) and suggest keywords from the job description that are missing from a resume. The PyPDF2 library handles PDF resumes, while secure key management safeguards API credentials. The application streamlines resume screening, giving recruiters a data-driven and user-friendly approach to talent acquisition.

Functionality:

Power Analyze: This feature analyzes how well a resume fits current technology domains, offering insights for job seekers at entry, mid-senior, and advanced levels. It provides scores for domains including Computer Science, Data Science, Machine Learning, Business Analysis, and Cloud Computing.
Matching Analyze: This feature calculates the similarity score between a resume and a job description, suggesting missing keywords from the job description that are not present in the resume.
Comparative Analyze: This feature allows recruiters to parse multiple resumes received for a job posting, ranking candidates based on their suitability and providing options to filter candidates based on score thresholds.
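
The similarity score behind Matching Analyze can be illustrated with a small standard-library sketch. It mirrors what CountVectorizer plus cosine similarity compute, but uses `collections.Counter` instead of scikit-learn. The missing-keyword step here is a plain set difference, which is an assumption for illustration; the app itself is described as using GPT-3.5 Turbo for suggestions. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; '+' and '#' are kept so 'c++' and 'c#' survive."""
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def missing_keywords(resume: str, job_description: str) -> List[str]:
    """Job-description terms absent from the resume, in first-seen order."""
    resume_terms = set(tokenize(resume))
    seen, missing = set(), []
    for term in tokenize(job_description):
        if term not in resume_terms and term not in seen:
            seen.add(term)
            missing.append(term)
    return missing
```

For Comparative Analyze, the same score could be computed for each resume against one job description and the results sorted or filtered by a threshold.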

Power Analyze Results:

General Analysis complete!

Technology	Entry-level	Mid-senior level	Advanced level
Computer Science	47.41%	53.78%	57.07%
Data Science	67.49%	60.96%	65.98%
Machine Learning	55.56%	39.30%	46.02%
Business Analysis	56.63%	48.57%	57.60%
Cloud Computing	43.41%	43.13%	41.37%

Skills: Large Language Models (LLM), Natural Language Processing (NLP), Application Programming Interfaces (API), Streamlit

Bird Species Classification using Deep Learning

Project Duration: September 2022 - December 2022

Technical Synopsis: Sound Recognition with Deep Learning for Bird Species Classification

This project explores the application of machine learning and deep learning techniques for the categorization of bird sounds, aiming to monitor bird population health and biodiversity. It covers the process of sound recognition from feature extraction to classification, utilizing spectrograms and neural networks. The methodology involves constructing a custom neural network for bird sound classification, including data preprocessing, binary and multi-class classification models, and transfer learning. Optimization algorithms and data augmentation techniques are employed to enhance model accuracy and robustness.

Functionality:

Binary Classification: Classifies between two bird species using spectrograms and deep learning models.
Multi-class Classification: Classifies sound clips into one of the twelve species categories using custom neural networks.
Transfer Learning: Explores adapting pre-trained neural network models for improved classification accuracy.
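
Since every model above takes spectrograms as input, a minimal sketch of how one is computed may help. This is a pure-Python illustration with a naive DFT, not the project's preprocessing code; in practice an FFT-based library routine would be used, and the frame size and hop length here are arbitrary assumptions.

```python
import cmath
import math
from typing import List


def spectrogram(signal: List[float], frame_size: int = 64, hop: int = 32) -> List[List[float]]:
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        row = []
        # Naive DFT over the non-negative frequency bins only.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows
```

Bin `k` corresponds to the frequency `k * sample_rate / frame_size`, so a 1000 Hz tone sampled at 8000 Hz peaks in bin 8 when `frame_size` is 64. The resulting 2D array is what an image-style neural network consumes.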

Conclusion:

The project discusses the theoretical background of sound recognition, including feature extraction and classification using spectrograms and neural networks. It also covers hyperparameter tuning and optimization techniques to improve model performance.

This study successfully builds custom neural network models for bird species classification, explores transfer learning, and addresses challenges such as dataset size and overfitting. The report highlights the importance of sound recognition in monitoring bird populations and biodiversity.

Stock Analysis using Yahoo Finance API

Project Duration: May 2021 - Jun 2021

This project examines the behavior of stocks on an annual basis, focusing on tickers such as NFLX and TSLA, among others. It addresses several topics in stock market analysis, including:

Daily Returns: Analyzing the daily returns of stocks to understand their volatility and performance over time.
Moving Averages: Utilizing moving average techniques to identify trends and patterns in stock price movements.
Interdependence of Stocks: Investigating the relationships and dependencies between different stocks to assess their correlations and diversification benefits.
Value at Risk (VaR): Estimating the loss that an investment portfolio is not expected to exceed over a given time horizon, at a chosen confidence level, under normal market conditions.
Forecasting Stock Behavior: Employing advanced techniques such as the Bootstrap method, Monte Carlo simulations, and Geometric Brownian Motion to forecast future stock prices and assess risk.

These techniques are complemented by effective visualizations using Seaborn and Matplotlib, providing insightful graphical representations of the analyzed data.
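
The core calculations listed above can be sketched in a few standard-library functions. This is an illustrative sketch under stated assumptions, not the project's code: prices are plain lists of closing prices, VaR is taken as an empirical quantile of historical returns, and the Geometric Brownian Motion simulator assumes 252 trading days per year. All function names are hypothetical.

```python
import math
import random
from typing import List


def daily_returns(prices: List[float]) -> List[float]:
    """Simple day-over-day percentage change."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def moving_average(prices: List[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive prices."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def historical_var(returns: List[float], confidence: float = 0.95) -> float:
    """Empirical VaR: the loss at the (1 - confidence) quantile, as a positive number."""
    ordered = sorted(returns)
    # Round first to absorb floating-point error in (1 - confidence) * n.
    index = math.floor(round((1.0 - confidence) * len(ordered), 9))
    return -ordered[min(index, len(ordered) - 1)]


def simulate_gbm(
    start_price: float,
    mu: float,
    sigma: float,
    days: int,
    rng: random.Random,
    dt: float = 1 / 252,
) -> List[float]:
    """One Geometric Brownian Motion price path with annualized drift and volatility."""
    price = start_price
    path = [price]
    for _ in range(days):
        shock = rng.gauss(0.0, 1.0)
        price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock)
        path.append(price)
    return path
```

A Monte Carlo estimate repeats `simulate_gbm` many times with different random draws and examines the distribution of final prices.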

How it contributes to becoming a better data engineer, analyst, and scientist:

By working on this project, you will gain valuable skills and experience that are essential for becoming a proficient data engineer, analyst, and scientist:

Data Handling: You will learn how to efficiently handle and manipulate large datasets obtained from Yahoo Finance API, enhancing your data engineering skills.
Data Analysis: Through the application of various statistical and analytical techniques, you will develop a deeper understanding of stock market behavior and trends, honing your data analysis skills.
Problem-Solving: Addressing complex challenges such as forecasting stock behavior and assessing risk will sharpen your problem-solving abilities and critical thinking skills.
Programming: Working with Python libraries such as Pandas, NumPy, Seaborn, and Matplotlib will strengthen your programming skills, particularly in data manipulation and visualization.
Domain Knowledge: You will gain domain-specific knowledge in finance and stock market analysis, which is valuable for future roles in data science and analytics.

Data Analysis and Web Scraping for future Forecasting Purposes

Project Duration: Jan 2022 - Jun 2022 (final-year Bachelor's project)

1.1 Introduction

Data analysis involves understanding and interpreting a dataset to find answers to questions. Web scraping and visualization are powerful methods for automatically collecting content from the internet and presenting it. This project focuses on creating a movie rating forecast by extracting data from the IMDb website, enabling users to make informed decisions about which movies to watch based on ratings.

1.2 Statement of the Problem

The project aims to develop an API that extracts data from multiple websites, preprocesses it, and visualizes it to provide business insights across various disciplines. By incorporating different perspectives into problem-solving, the project seeks to offer comprehensive solutions to queries.

1.3 Objectives

Extract data from various sources using web crawler software written in Python 3.7.
Create an open-source application for comprehending movie-related data.
Extract and analyze comments, ratings, or any attributes related to movies from commercial websites.
Ensure the application works effectively on different websites, whether their pages are static or dynamically rendered.

1.4 Scope

The project utilizes Selenium and Beautiful Soup for web scraping, allowing users to extract data from different websites. The scraping process involves opening the website, inspecting HTML tags, and storing the desired elements into a data frame. Data scraping can be time-consuming and may require additional cleaning steps.
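
The "inspect HTML tags, store the desired elements" step can be sketched as follows. To stay self-contained, this example uses the standard library `html.parser` module as a stand-in for Beautiful Soup, and the markup it expects (`<span class="title">`, `<span class="rating">`) is a made-up assumption, not IMDb's real page structure.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MovieParser(HTMLParser):
    """Collect the text of <span class="title"> and <span class="rating"> tags."""

    FIELDS = ("title", "rating")

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []
        self._field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in self.FIELDS:
            self._field = css_class
            if css_class == "title":
                self.rows.append({})  # each title starts a new row

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_movies(html: str) -> List[Dict[str, str]]:
    """Return one dict per movie found in the given HTML."""
    parser = MovieParser()
    parser.feed(html)
    return parser.rows
```

The returned list of dicts is the shape that `pandas.DataFrame(rows)` accepts, which matches the data frame step described above.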

1.5 Applications

Cost-effective solution for data retrieval and analysis.
Low maintenance and easy implementation.
High data accuracy at scale.
Simplified data retrieval through automation.
Reliable performance and robustness.

1.6 Limitations

Web scraping is limited to predefined layouts on partner websites, and changes in layout may cause the script to fail.
The API service depends on partner websites, and usage may be restricted based on limitations set by these websites.
Performance and reliability of Python scripts depend on server availability and resources.

Conclusion

Web scraping enables the extraction of hidden web data, which is crucial for various applications. The project aims to provide an easy-to-use interface for searching, analyzing, and extracting data from websites.

Future Work

Python's popularity in data science continues to grow, and future enhancements may include integrating machine learning algorithms to automate decision-making processes. Addressing the challenges of web structure inconsistency and implementing AI-driven applications are potential areas for future development.

Exploratory Data Analysis on Titanic Dataset

Embark on a voyage into the world of data exploration with my first-ever project: Exploratory Data Analysis (EDA) on the Titanic Dataset. This project holds a special place in my heart as it marks the inception of my journey into the captivating realm of data science and machine learning.

As an aspiring data enthusiast during my undergraduate days, I embarked on this endeavor fueled by sheer passion and determination. Without the guidance of professors or faculty, I immersed myself in a sea of online resources, from courses on Coursera to tutorials on Udemy, diligently honing my skills and expanding my knowledge base.

This project was a pivotal milestone in my learning journey, intricately woven into the fabric of my pursuit of the Google Data Analytics Professional Certificate. Over the course of four months, I dedicated countless hours to mastering the intricacies of data analysis, drawing inspiration from every challenge encountered.

Initially, navigating through the complexities of data-driven insights seemed like an insurmountable task. However, with perseverance and unwavering determination, I gradually unearthed profound insights from the Titanic Dataset, unraveling the mysteries hidden within.

From discerning the distribution among passengers to uncovering correlations and factors influencing survival probabilities, every visualization and analysis served as a stepping stone in my evolution as a data scientist.

My journey extended beyond static datasets as I delved into the realms of web scraping, visualization techniques, and eventually, model training and optimization. Each step forward brought with it a deeper understanding of the intricacies of data science and machine learning.

As I delved deeper into model training and building, the culmination of my efforts manifested in the form of consistently accurate predictions, a testament to the efficacy of my methodology and the depth of my understanding.

More than just a project, this endeavor epitomizes my relentless pursuit of knowledge and my unwavering commitment to excellence. It is a testament to the transformative power of perseverance and the boundless possibilities that await those who dare to dream.
