Hire Like a Data Scientist | How to Screen 1000 Resumes in 50 Seconds with SQL

Screen 1000+ resumes using Roe AI

Right before YC Demo Day, Jason Wang and I posted two Founding Engineer jobs on LinkedIn (linked to Gem) and on Y Combinator.

In less than 2 weeks, 1000+ applications flooded in, with 100 new applications arriving every day.

As newbie recruiters, we take about 1 minute to review one resume plus 30 seconds to navigate the tabs. That adds up to 1500 minutes, or 25 hours, to get through all 1000 resumes, which means that over the next 5 weeks, one of us would have to dedicate 1 full day every week, or 1-2 focused hours per day, just to screen resumes.
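The estimate above is simple arithmetic; here it is spelled out as a small Python sketch using the post's per-resume timings. The 5-day work week is an assumption, and the 100 applications arriving daily are left out, so the real load would only be higher.

```python
# Manual screening cost, using the per-resume timings from the paragraph above.
REVIEW_MIN = 1.0         # minutes to read one resume
NAV_MIN = 0.5            # 30 seconds navigating tabs per resume
N_RESUMES = 1000
WEEKS = 5
WORKDAYS_PER_WEEK = 5    # assumption: a 5-day work week


def screening_hours(n_resumes: int) -> float:
    """Total manual screening time in hours for n_resumes."""
    return n_resumes * (REVIEW_MIN + NAV_MIN) / 60


total_hours = screening_hours(N_RESUMES)                # 1500 minutes
hours_per_week = total_hours / WEEKS
hours_per_workday = hours_per_week / WORKDAYS_PER_WEEK

print(f"total: {total_hours:.1f} h | per week: {hours_per_week:.1f} h | per workday: {hours_per_workday:.1f} h")
# prints "total: 25.0 h | per week: 5.0 h | per workday: 1.0 h"
```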

Screening Criteria

The signals we want to read from a resume are straightforward, but spotting them takes a fair amount of eyeballing and mental arithmetic:

  • has worked at a startup, and ideally also at a big tech company, so they know what to expect from a career on either side
  • has accumulated 2+ years of industry experience
  • has domain expertise in general backend (BE) and infra; we’re not picky about a particular language or web framework, because that knowledge is honestly transferable
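To make the criteria concrete, here is a hedged Python sketch of them as a predicate over the signals a reviewer would pull from one resume. `CandidateProfile`, its field names, and the domain labels are all hypothetical; the point is that each must-have is checked separately (so a rejection is explainable), while big tech experience, which is only "ideally", affects ordering rather than pass/fail.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateProfile:
    """Signals read off one resume. Field names are hypothetical, for illustration only."""
    worked_at_startup: bool
    worked_at_big_tech: bool
    industry_yoe: float
    domains: set = field(default_factory=set)   # e.g. {"backend", "infra"}


TARGET_DOMAINS = {"backend", "infra"}   # language and web framework deliberately ignored
MIN_YOE = 2.0


def criteria_met(p: CandidateProfile) -> dict:
    """Evaluate each must-have separately so a reviewer can see why a candidate passed or failed."""
    return {
        "startup": p.worked_at_startup,
        "two_plus_years": p.industry_yoe >= MIN_YOE,
        "backend_or_infra": bool(p.domains & TARGET_DOMAINS),
    }


def passes_screen(p: CandidateProfile) -> bool:
    """All must-haves hold. Big tech is only 'ideally', so it is not required here."""
    return all(criteria_met(p).values())


def sort_key(p: CandidateProfile) -> tuple:
    """Among passing candidates, put startup + big tech first, then more experience."""
    return (not p.worked_at_big_tech, -p.industry_yoe)


a = CandidateProfile(True, True, 3.5, {"backend", "ml"})
b = CandidateProfile(True, False, 1.0, {"infra"})
print(passes_screen(a), passes_screen(b))  # True False
```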

Startups are stingy with time, so how can we screen candidates in a scalable way?

We picked Roe AI, our own unstructured data warehouse!

Early Results

Let's look at the results first. With Roe AI, we narrowed 1000 applications down to 72. It took me

  • 15 minutes to write the Python resume crawler script
  • 3 minutes to write the Roe SQL query and natural-language prompt

plus

  • 50 seconds for Roe to extract the data from all 1000 resumes based on my custom qualifications

It is BLAZINGLY FAST!
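Putting the manual estimate next to the reported Roe timings gives a rough sense of the gap. This is a back-of-the-envelope Python sketch that treats "1000+" as exactly 1000 and counts the one-off scripting time against Roe.

```python
# Manual vs. Roe-based screening time, using the numbers reported in this post.
N_RESUMES = 1000
MANUAL_MIN = N_RESUMES * 1.5            # 1 min review + 30 s tab navigation per resume
SCRIPT_MIN, QUERY_WRITE_MIN = 15, 3     # one-off: crawler script, SQL query + prompt
QUERY_RUN_MIN = 50 / 60                 # the 50-second extraction run

roe_total_min = SCRIPT_MIN + QUERY_WRITE_MIN + QUERY_RUN_MIN
speedup = MANUAL_MIN / roe_total_min
throughput = N_RESUMES / 50             # resumes processed per second during the run

print(f"Roe: {roe_total_min:.1f} min vs manual: {MANUAL_MIN:.0f} min (~{speedup:.0f}x faster)")
# prints "Roe: 18.8 min vs manual: 1500 min (~80x faster)"
print(f"throughput: {throughput:.0f} resumes/s")
```

Most of those 18.8 minutes are one-off scripting; rerunning the query on a new batch of applications costs only the run time.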

Solution in detail

Step 1 - Wrote a simple Python script to download all the candidates' resumes from Gem and the YC application site.

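The script itself isn't reproduced here, and how it talked to Gem or the YC site isn't described. As a hedged sketch only: assuming you already have a hypothetical `{candidate_id: url}` mapping of directly downloadable resume URLs, a minimal downloader could look like the following. `fetch_bytes` and `download_resumes` are illustrative names, and real Gem/YC downloads likely need authentication, which is why the fetcher is a parameter that can be swapped for an authenticated session (or a fake in tests).

```python
import urllib.request
from pathlib import Path
from typing import Callable, Dict, List


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET via the standard library (no auth, no retries)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_resumes(
    resume_urls: Dict[str, str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[Path]:
    """Save each resume as <out_dir>/<candidate_id>.pdf and return the newly written paths.

    resume_urls is a hypothetical {candidate_id: url} export. Files already on disk
    are skipped, so the script can be rerun cheaply as new applications arrive.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for candidate_id, url in resume_urls.items():
        path = out_dir / f"{candidate_id}.pdf"
        if path.exists():
            continue
        path.write_bytes(fetch(url))
        written.append(path)
    return written
```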

Step 2 - Uploaded the 1000+ resumes to my Roe data warehouse. Now I can query them from the Resume table just like any regular table, and all the conventional SQL clauses just work.
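Roe's upload mechanics aren't shown here, so the sketch below swaps in SQLite (Python's built-in `sqlite3`) purely as a local stand-in for the idea of a `Resume` table whose file column holds raw documents that ordinary SQL can query. It does no AI extraction; the `candidate_id` column and `load_resume_table` helper are hypothetical.

```python
import sqlite3
from typing import Dict


def load_resume_table(conn: sqlite3.Connection, resumes: Dict[str, bytes]) -> int:
    """Create a Resume table with a BLOB `file` column and insert one row per resume."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Resume ("
        "candidate_id TEXT PRIMARY KEY, "
        "file BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO Resume (candidate_id, file) VALUES (?, ?)",
        resumes.items(),
    )
    conn.commit()
    return len(resumes)


conn = sqlite3.connect(":memory:")
load_resume_table(conn, {"c1": b"%PDF-1", "c2": b"%PDF-22"})   # toy bytes, not real PDFs

# Ordinary SQL clauses work on the table: WHERE, ORDER BY, aggregates, ...
rows = conn.execute(
    "SELECT candidate_id, length(file) FROM Resume WHERE length(file) > 0 ORDER BY candidate_id"
).fetchall()
print(rows)  # [('c1', 6), ('c2', 7)]
```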

Step 3 - To extract signals from the resumes, I wrote a SQL query using Roe AI’s multimodal extract_from(<prompt>, <column>) function. As the first argument, I put in a not-too-thoughtful natural-language prompt.

As the second argument, I passed the file column from the Resume table, and I wrote the results into a new table to store them.

(P.S. The JSON schema can also be defined in the UI.)
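For illustration, a JSON Schema for the three signals this post reports extracting (school, a top tech flag, industry YoE) might look like the dict below. The field names are hypothetical and how Roe itself expresses schemas isn't shown; `conforms` is a tiny hand-rolled check for this flat schema only, not a general JSON Schema validator.

```python
import json

# Hypothetical schema for the three fields this post reports extracting.
# Field names are illustrative; Roe's own schema format is not shown here.
RESUME_SIGNALS_SCHEMA = {
    "type": "object",
    "properties": {
        "school": {"type": "string"},
        "worked_at_top_tech": {"type": "boolean"},
        "industry_yoe": {"type": "number"},
    },
    "required": ["school", "worked_at_top_tech", "industry_yoe"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "boolean": bool, "number": (int, float)}


def conforms(record: object, schema: dict = RESUME_SIGNALS_SCHEMA) -> bool:
    """Tiny checker for this flat schema only; not a general JSON Schema validator."""
    if not isinstance(record, dict):
        return False
    props = schema["properties"]
    if any(key not in record for key in schema["required"]):
        return False
    if not schema.get("additionalProperties", True) and set(record) - set(props):
        return False
    for key, spec in props.items():
        if key not in record:
            continue
        value = record[key]
        # bool is a subclass of int in Python, so a JSON true must not pass as a number
        if isinstance(value, bool) and spec["type"] != "boolean":
            return False
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            return False
    return True


sample = json.loads('{"school": "Example University", "worked_at_top_tech": true, "industry_yoe": 3}')
print(conforms(sample))  # True
```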

That's it, one magic SQL query!
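The actual query isn't reproduced in this post, so what follows is only a hypothetical shape, written as a small Python helper to keep one language throughout. Only `extract_from(<prompt>, <column>)`, the `Resume` table, and its file column come from the post; the `CREATE TABLE ... AS SELECT` form, the `candidate_id` and `resume_signals` names, the helper names, and the example prompt are assumptions. The helper doubles single quotes so a prompt containing "candidate's" doesn't break the SQL string literal.

```python
def sql_string_literal(text: str) -> str:
    """Quote text as a standard SQL string literal (single quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def build_extract_query(
    prompt: str,
    source_table: str = "Resume",
    file_column: str = "file",
    id_column: str = "candidate_id",       # hypothetical key column
    target_table: str = "resume_signals",  # hypothetical output table name
) -> str:
    """Return a CREATE TABLE ... AS SELECT statement calling extract_from(prompt, column).

    Only extract_from(<prompt>, <column>) comes from the post; the rest of the statement
    is an assumed shape. Table/column names are interpolated as-is, so pass trusted constants.
    """
    return (
        f"CREATE TABLE {target_table} AS\n"
        f"SELECT {id_column}, extract_from({sql_string_literal(prompt)}, {file_column}) AS signals\n"
        f"FROM {source_table};"
    )


print(build_extract_query("What is the candidate's school?"))
```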

Step 4 - The AI-powered SQL query runs over the 1000 resumes in the Resume table. After 50 seconds, I get JSON output that strictly follows the schema, containing each candidate's school, a boolean for whether they worked at a top tech company, and their years of industry experience (YoE), all of which I can then use for filtering!
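The filtering step after the query can be sketched in plain Python. This assumes the results table yields `(candidate_id, json_text)` pairs and uses the hypothetical field names `worked_at_top_tech` and `industry_yoe`. It applies only two of the screening signals, so it is not the exact filter behind the 72 shortlisted applications, and it skips malformed rows defensively even though the output is expected to follow the schema.

```python
import json
from typing import List, Tuple


def shortlist(rows: List[Tuple[str, str]], min_yoe: float = 2.0) -> List[str]:
    """Rank (candidate_id, json_text) rows by the extracted signals.

    Keeps candidates flagged worked_at_top_tech with at least min_yoe years of
    industry experience, most experienced first. Malformed rows are skipped.
    """
    picked = []
    for candidate_id, text in rows:
        try:
            signals = json.loads(text)
        except json.JSONDecodeError:
            continue
        if not isinstance(signals, dict):
            continue
        yoe = signals.get("industry_yoe")
        if isinstance(yoe, bool) or not isinstance(yoe, (int, float)):
            continue
        if signals.get("worked_at_top_tech") is True and yoe >= min_yoe:
            picked.append((yoe, candidate_id))
    # Most experienced first; ties broken by candidate_id so the order is stable
    picked.sort(key=lambda item: (-item[0], item[1]))
    return [candidate_id for _, candidate_id in picked]
```

The same filter could just as well be a `WHERE` clause over the results table; doing it in Python only makes the parsing and ordering explicit.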

I can also open any resume PDF to check it in detail.

Closing Thoughts

Any time we need to parse new information from the resumes, such as "what research papers has this candidate published?", we just change the prompt in the SQL query and get the results in seconds.

Of course, this is only one of the first steps in screening candidates, but we're super excited about how powerful it is to query unstructured data like this with simple, AI-powered SQL on Roe!

Looking forward, can we take a fully data-driven approach to all kinds of internal unstructured enterprise data, including Slack messages, Google Docs, Notion, and even Zoom?

If you’re interested in leveraging Roe to do the same, DM me directly!

And if we've also piqued your interest in joining us, please apply here!