Hacking Our Hiring, Pt. 2: How to Screen Better
A compendium of wisdom and practical advice, to make hiring better for everyone
This article is adapted from a presentation given by Tiff Fehr and Ryann Grochowski Jones at SRCCON 2018. The version you’re reading features greater detail about hiring efforts within The New York Times’ Interactive News Team, specifically, but offers information on ProPublica’s processes for comparison. Here’s the full series so far.
We must note upfront: this is not the process necessarily followed by our parent news orgs. If you apply to either company, you might not experience this as an applicant. Our own smaller teams have adopted these processes, with the hope that our parent news orgs will come to see some of their virtues. We encourage you to hack your hiring processes, just as we are actively hacking ours! More about that in a bit.
At SRCCON and other news nerd gatherings, colleagues have participated in discussions about our hiring. Our teams worked on this independently, so it was reassuring when we realized we share significant overlaps:
Amid the overlap, we had some differences, too. The types of skills we’re looking for are different, between our teams. The Data team at ProPublica thinks a reporting test is a great idea, but Interactive News doesn’t use tests, for example. Here we’ll focus on the Interactive News team’s process at the NYT, so we can go into greater detail on some specific tactics we recommend.
Assess All Applicants Evenly, Make a Short List
The first step toward even assessments is helping ourselves. We recommend using an evaluation mechanism that ensures our hiring panel learns the same things about each and every applicant. We tailor this assessment to the specifics of the role—making sure we ask about mobile dev skills, or effective dataset collection and analysis—but every applicant for that role is evaluated using the same criteria.
To minimize bias, we recommend working in pairs when assessing all the initial applicants. In fact, use blind pairs, if possible: have two screeners assess every applicant without seeing each other’s comments. This helps mitigate prior-judgement bias.
Providing two initial screeners is a time and staffing investment, to be sure. You could use just one screener, but that puts undue pressure on their judgement, despite their own best efforts. One screener means a greater chance of résumé-reading fatigue, latent biases, or other problems. Two provides accountability and a more even playing field. (Three would be even better, though that’s often not possible.)
It’s also helpful to communicate to applicants that they will receive thorough reviews, from different team members, before any decisions are made. This is important for all sorts of reasons, including this one: if the candidate eventually receives a rejection letter from us, we want them to be assured we really cared about their application, and that we thoughtfully assessed them at every stage.
How We Make a Short List
Let’s say we received 75 applications. Two team members score them using our rubric (more on rubrics soon). Based on Screener A’s scores, 20 applicants would move forward. Based on Screener B’s scores, 25 applicants would move forward. But neither screener know where their lists overlap or scores align.
After both screeners are done reviewing all applicants, their scores are revealed to the broader team, side by side. Fifteen applicants got strong passing grades from both screeners, so those people become our short list. Subsequent discussion about applicants with score disparities may bump that count up to 18 applicants. Those 18 may then get an additional screen by our broader hiring panel (additional colleagues and possibly the hiring manager).
How many applicants should make it past initial screening? That depends on our hiring team’s attention span. We are about to ask a handful of teammates to review 18 applicants each—a definite time commitment—but it is fewer than the original 75 applicants. We could stagger it so they read only 3 at a time to avoid résumé fatigue.
We recommend that our initial screening should reduce an applicant pool by 50% or more. If our initial screening steps aren’t providing a strong enough filter, we make the criteria harder.
Our 18 short-listed candidates then go before our broader hiring panel as anonymized résumés.
Why do we recommend anonymizing applicant materials? We want to reduce bias-triggering information—the places where our brains would prefer to take a shortcut. Shortcuts perpetuate disadvantages and overlook talented applicants who just don’t quite fit our preconceptions about who should fill the role.
Let’s aspire to do better than that. Here are some common biases that appear when reading résumés:
Mental shortcuts we should not use to evaluate our applicants include:
School reputation or academic performance
Prominence or reputation of previous companies
Specific technologies (particularly if we said we don’t require specific experience)
Nebulous “culture fit” (this is often a grab-bag of biases and familiarity errors)
Aptitude for being friends with the applicant
Confidence level (allow for low confidence indicators or nervousness)
Anonymizing materials takes a lot of time, even if we have strongly filtered our applicants via the initial assessments. It means moving everything an applicant offers into a new, standardized template, with attentive editing. It’s a lot of copy/paste.
Who should do the anonymizing?
We recommend our two initial screeners, since they have already seen full details about each applicant. This protects the other hiring panel members. Perhaps you have hiring software or HR staff willing to help out with this step—if so, we recommend you take advantage of it.
The process of anonymizing is very educational. Even if we immediately agree we must avoid the shortcuts and biases listed above, the process forces us to watch our own brains try to help us out with exactly these shortcuts—particularly as résumé fatigue sets in.
If anonymizing seems too time consuming, consider trying it as a team workshop for its educational value. You can use it as a type of mini anti-bias training, particularly if you anonymize the résumés of current team members as a group activity. This exercise gives everyone a place to discuss the biases and shortcuts you want to avoid, while also testing out an anonymization process.
Redacting a résumé sounds relatively easy: blackout the name, identifying details, leading information, etc. But the blackout approach often obscures helpful data, or may even distract the person doing the assessment. If we end up redacting many details, we could obscure a résumé to the point of blandness which would be a disservice to the candidate. No journalist likes being on the receiving end of a heavily redacted document, so let’s not inflict it on ourselves.
Replacements are a better way to redact, by offering sensible but stripped-down equivalents. Here are some examples: (We recommend square braces to mark the edit.)
Replace name with a candidate number “Tiff Fehr” becomes “[Candidate #32]”
Replace a company name with a description of company type/tier “Led interactive projects at The New York Times for 8 years” becomes “Led data projects at [a national newspaper] for [multiple] years”. Is redacting the years too much? Possibly, unless teammates already know the applicant and could associate the specificity of tenure.
Replace URLs with a description of content found there “Check out my portfolio at myportfoliois.cool” becomes “[Applicant provided a link to personal portfolio website featuring both journalistic and freelance projects, including project write-ups]”
Replace identifiable product names with equivalent descriptions “Developed features using APIs for Bloomberg Terminal mobile app” becomes “Developed features using APIs for [subscriber-only, data-centric] mobile app.” This replacement is not perfect, but it’s better than a black bar.
Replace identifiable reporting projects with equivalents, where possible “Published an impactful series about military spending on toxic, munition-related site cleanup in ‘Bombs in Your Backyard’” becomes “Published an impactful series about [government organization] spending on [gov’t site] cleanup in ‘[award-finalist article series]”. Maybe we keep in the military references, depending on our hiring panel peers.
Other suggestions to consider redacting:
Portfolio content and descriptions
Skills/tech that is too specific
Geography that is too specific
Etc, often as a case-by-case judgement
In our SRCCON slide deck you can find more examples of anonymization thinking.
There are many small judgements to make in this process and very few guidelines. We simply try our best and trust our colleagues will want to help fight biases, even if they can guess the information under the replacement.
Consider Crediting Plans Instead of Anonymizing
Instead of anonymizing materials, you could opt to use a crediting plan. Crediting plans are in-depth review scripts that guide assessors through an applicant’s materials, similar to an elaborate checklist. Reviewers then “credit” the applicant for demonstrated abilities and characteristics, with very specific definitions for why they get credit.
However, crediting plans require more work and buy-in up front, particularly related to their ability to counteract biases. But a good crediting plan can replace anonymization—The NYT’s Interactive News team has been heading in this direction, for some of our open positions.
The U.S. federal agency 18F uses crediting plans and has a number of helpful pages on their hiring portal about biases and how to use crediting plans in a hiring process. The Times has started using crediting plans for some Technology groups, as well.
Crafting Our Assessment Rubric
Anonymizing materials won’t address all the possible problems with our screening, even if we’re trying hard to level the playing field. This is because our applications and our initial screening simply can’t get that far into our broader hiring goals, which are in fact quite complex. A good evaluation form helps determine not just an applicant’s background but also what an applicant knows, what an applicant can do and how an applicant works.
We are reading résumé bullet points but hoping to learn about an applicant’s capacity for:
Self-awareness and empathy
This list of goals could be extensive, depending on what the team values in finding new colleagues. The tension is clear, between simple résumés and “we really want to understand the full picture of an applicant.”
An assessment rubric helps diffuse some of this tension, by looking at all our applicants with a consistent framework for judging their abilities. Each point on our rubric should support a few high-level goals, like one specific question that focuses on an applicant’s problem-solving ability and self-teaching capacity.
When our assessment questions are focused, we can anticipate the types of answers that we expect to see. We can describe them in loose terms, on a spectrum of unhelpful to helpful.
The Interactive News desk at The New York Times frequently uses a ~7-part initial screening rubric, with stoplight-coded grades.
(We must note: here we are significantly indebted to Medium’s engineering hiring guidelines. Their series of blog posts about their engineering hiring and assessments filled in blanks we had, as our teams thought about what we could do better. The tables below are derivatives of Medium’s grades for evaluating engineers.)
In our opinion, three-point scales are a good fit for human attention spans, particularly when we are providing graded responses.
Rubrics that use points also help us convert assessment responses into quantitative data. Assessments that focus on themes and completeness can help us understand the emerging “bar” for who makes the short list, but tallying scores into a total helps convey candidate’s strength above that bar.
In addition to the rubric, we ask for freeform notes the assessor would like to share. We also ask about the assessor’s preferred next-step for the applicant—a simple “pass,” “more discussion warranted,” or “definitely advance” is another helpful signal for the hiring panel’s decision making. Some assessors may be generous graders but end up saying “pass”; we need to be able to contextualize that seeming contradiction.
(For those interested in crediting plans instead, we find our crediting plans and assessment rubrics are pretty close to the same thing, when done right. But we still need to hammer on the anti-bias reminders. Crediting leaves our potential shortcuts in plain view while trying to guide us away from them—it only works so long as we are self-aware and making an effort to reduce bias.)
Evaluating the Short List
Back to our short list, which we finished earlier. For our hypothetical short list of 18 candidates, we have two assessments already completed by those initial screeners. Now our hiring panel now looks at our (newly anonymized!) materials and they use the same rubric for their assessments. They submit their scores blind as well.
Once our hiring panel has completed their assessments, we tally all the scores for each applicant. That means two non-anonymous assessments and multiple anonymous assessments, hopefully correcting biases in the process.
Perhaps we tally things into a table like this: (Remember, this converts grades to points for each rubric item.)
We could sort by Average/Median Score (descending), then kick off hiring panel debate about all 18 applicants. Applicant details are still anonymous at this point. We might find ourselves discussing the “mid-sized regional newspaper” experience of Candidate X and other odd permutations. But hopefully that will be about the merits rather than the anonymization.
How to Guide the Debate
In order to push for active decision-making, we recommend starting with lower-scoring applicants and ask for particular “champions” from within the hiring panel. If all agree the score is appropriate, move onward to higher scoring applicants.
Discussion that arises should be yet another check on assessments and scores, too. But we need to make sure the discussion pushes beyond scores and into qualitative information about the specific applicant. We want to promote the qualitative data we collected, e.g. “Let’s go through the panel’s answers for ‘what stood out about this applicant for our in-between cohort.”
By the end of debate, we want to narrow down our applicant pool to our top ten (or so) who will go on to get phone screens. Ultimately, we need to be respectful of time and effort, for both interviewers and interviewees. We want our assessment rubrics to truly make an impact, in terms of decision making. If too many applicants are getting phone screens, it swamps or delays our process. That is not good for anyone involved, applicants or teammates.
For the 65 applicants who did not make it to the phone screen cohort, now is the point where it would be most polite to let them know. It is tempting to keep everyone in play— just in case!—but our hiring panel just invested heavily in decision making, and that effort should not be second-guessed for hypothetical edge cases.
Next in the Series
Next we will go into detail about phone screening and interviewing our short list, as well as collecting assessments from our colleagues that will help drive more intensive decision-making. And we challenge you to rethink cover letters, too.
Tiff Fehr is an assistant editor and lead developer on the Interactive News desk at The New York Times. Previously she worked at msnbc.com (now NBCNews.com) and various Seattle-area mediocre startups.