ResumeMatch

This project matches job descriptions with a given resume based on TF-IDF similarity. It helps identify the job position that best suits the provided resume.

What is TF-IDF, and why use it?

TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to a document within a collection of documents. It combines two factors:

Term Frequency (TF): How often a word appears in a document. The more often a word appears, the more weight it gets for that document.
Inverse Document Frequency (IDF): How rare a word is across all documents. The rarer a word is, the better it distinguishes the documents that contain it.
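To make the two factors concrete, here is a minimal sketch of the textbook formula (term frequency times the log of inverse document frequency) in plain Python. The function name `tf_idf` and the sample documents are illustrative assumptions, not part of this project. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and normalizes the vectors, so its numbers will differ from these.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (plain, unsmoothed TF-IDF)."""
    # Naive whitespace tokenization; a real pipeline would use a proper tokenizer.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # TF (share of the document) times IDF (log of rarity).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical sample documents, for illustration only.
docs = [
    "python developer with python experience",
    "java developer",
    "project manager",
]
scores = tf_idf(docs)
```

In the first document, "python" is frequent and appears in no other document, so it outweighs "developer", which also appears in the second document. A word that occurs in every document gets an IDF of log(1) = 0 and drops out entirely.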

TF-IDF helps identify important words that are specific to a document while filtering out common words. It has several benefits:

Highlighting Important Words: TF-IDF emphasizes words that are important and relevant to a document.
Extracting Features: It is useful for extracting meaningful features from text for tasks like classification, clustering, and retrieval.
Reducing Dimensionality: TF-IDF can reduce the complexity of text data while retaining crucial information.
Filtering Noise: TF-IDF gives low weight to words that are common across documents (such as stopwords), so distinctive and meaningful words dominate.

One might expect longer documents to score as more similar simply because they contain more words. TF-IDF similarity does not favor longer documents. Term weights reflect how significant a word is within a document (a local measure) and across the collection (a global measure), and when the resulting vectors are normalized to unit length (the default in scikit-learn's TfidfVectorizer), sheer length does not inflate the score. So, whether a document is short or long, TF-IDF allows us to effectively analyze and retrieve relevant information.
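The effect of length can be checked with a small sketch. It assumes similarity is measured with cosine similarity (this README does not name the measure) and uses raw term counts instead of TF-IDF weights to keep the example short. The helper `cosine_similarity` and the sample texts are illustrative, not taken from this project.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is similar to nothing.
    return dot / (norm_a * norm_b)

resume = Counter("python data analysis".split())
short_job = Counter("python data engineer".split())
# The same job text repeated three times: longer, but no new information.
long_job = Counter(("python data engineer " * 3).split())

short_score = cosine_similarity(resume, short_job)
long_score = cosine_similarity(resume, long_job)
```

Both scores come out equal (up to floating-point rounding), because dividing by the vector norms cancels the effect of repeating the same content.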

Requirements

Python 3.7 or later
NLTK library
scikit-learn library

Installation

Clone the repository:

git clone https://github.com/o-bm/ResumeMatch.git

Download the necessary NLTK resources:

import nltk

nltk.download('punkt')
nltk.download('stopwords')

Provide job description files in the "jobs" directory. Each job description should be a text file with a ".txt" extension. Put the resumes in the "resumes" directory. Each resume should also be a text file with a ".txt" extension.

Run the program:

python runner.py

When prompted, enter the full file name of the resume, including the .txt extension.

The program will calculate the similarity between the resume and each job description using TF-IDF. It will then display the most suited job position and a table of the top and bottom five matches.
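The matching step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in runner.py: the function `rank_jobs`, the file names, and the sample texts are hypothetical, the texts are passed in directly instead of being read from the "jobs" and "resumes" directories, and cosine similarity is assumed as the similarity measure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_jobs(resume_text, jobs):
    """Rank job descriptions by TF-IDF cosine similarity to the resume.

    `jobs` maps a job name to its description text (hypothetical layout).
    Returns (name, score) pairs, best match first.
    """
    names = list(jobs)
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the resume and the jobs together so they share one vocabulary.
    matrix = vectorizer.fit_transform([resume_text] + [jobs[n] for n in names])
    # Row 0 is the resume; the remaining rows are the job descriptions.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data, for illustration only.
jobs = {
    "data_scientist.txt": "Build machine learning models in Python and analyze data.",
    "chef.txt": "Prepare meals, manage the kitchen, and design seasonal menus.",
}
resume = "Python programmer experienced in machine learning and data analysis."
ranking = rank_jobs(resume, jobs)
best_name, best_score = ranking[0]
```

Slicing the sorted list (for example `ranking[:5]` and `ranking[-5:]`) gives the top and bottom five matches.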
