Business Problem
- HireCurve’s value proposition is matching job skills to the candidate profile with 80% accuracy. While they were able to deliver this to a certain extent, most of the work was manual and supported by a simple keyword-based classification engine
- This worked when they received 50 applications a day, but they started facing challenges when the volume grew to 500 or more applications per day
- goML worked with the HireCurve team to build a multi-entity classification engine that maps the right skillsets from the candidate profile (structured) and resume (unstructured) data to suitable job openings
Solution
- We worked with the team at HireCurve to assemble the data from two sources:
  - An S3 dump of resume files
  - The application API (REST), used to read the structured data captured during profile creation
- We then performed text cleaning and keyword selection according to hiring criteria, leveraging Python on Amazon SageMaker
- After keyword selection, we ran a cluster analysis to identify the keyword category that relates to a particular job description, followed by feature engineering using keyword frequencies and n-grams
- We then encoded the n-grams and used PCA to reduce the dimensionality of the dataset
- Using word2vec embeddings and a recurrent neural network (RNN), we then classified the resumes into job groups to identify relevant job openings, leveraging Python, pandas, and scikit-learn on Amazon SageMaker
- Finally, the RNN model was deployed for inference on Amazon SageMaker
- The output was published as an API endpoint to be consumed for job recommendations
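
To make the text cleaning and n-gram feature engineering steps more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the code used in the project: the stop-word list, the sample resume text, the vocabulary, and the function names `clean`, `ngrams`, `ngram_counts`, and `encode` are all hypothetical. The PCA, word2vec, and RNN stages are not shown.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real hiring criteria are not in the source.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def clean(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    # '+' and '#' are kept so that tokens such as "c++" or "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count every 1-gram up to max_n-gram in the cleaned text."""
    tokens = clean(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def encode(counts, vocabulary):
    """Encode n-gram counts as a fixed-length frequency vector."""
    # Counter returns 0 for terms that never occurred in the text.
    return [counts[term] for term in vocabulary]


# Hypothetical sample data for illustration only.
resume = ("Experienced in Python and machine learning. "
          "Built machine learning pipelines with Python.")
vocabulary = ["python", "machine learning", "java"]

features = encode(ngram_counts(resume), vocabulary)
print(features)  # [2, 2, 0]
```

Each resume becomes a vector of keyword and n-gram frequencies over a shared vocabulary, which is the kind of input a dimensionality reduction step such as PCA can then compress before classification.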