Independent algorithmic auditing firm Parity AI has partnered with talent acquisition and management platform Beamery to conduct ongoing scrutiny of bias in its artificial intelligence (AI) hiring tools.
Beamery, which uses AI to help businesses identify, recruit, develop, retain and redeploy talent, approached Parity to conduct a third-party audit of its systems, which was completed in early November 2022.
Alongside the audit, Beamery has also published an “explainability statement” outlining its commitment to responsible AI.
Liz O’Sullivan, CEO of Parity, says there is a “significant challenge” for businesses and human resources (HR) teams in reassuring all stakeholders involved that their AI tools are privacy-conscious and do not discriminate against disadvantaged or marginalised communities.
“To do this, businesses must be able to demonstrate that their systems comply with all relevant regulations, including local, federal and international human rights, civil rights and data protection laws,” she says. “We are delighted to work with the Beamery team as an example of a company that genuinely cares about minimising unintentional algorithmic bias, in order to serve their communities well. We look forward to further supporting the company as new regulations arise.”
Sultan Saidov, president and co-founder of Beamery, adds: “For AI to live up to its potential in providing social benefit, there has to be governance of how it is created and used. There is currently a lack of clarity on what this needs to look like, which is why we believe we have a duty to help set the standard in the HR industry by creating the benchmark for AI that is explainable, transparent, ethical and compliant with upcoming regulatory standards.”
Saidov says the transparency and auditability of AI models and their impacts are key.
To build in a higher degree of transparency, Beamery has, for example, implemented “explanation layers” in its platform, so it can articulate the mix and weight of skills, seniority, proficiency and industry relevance given to an algorithmic recommendation, ensuring that end-users can explain effectively what data impacted a recommendation, and which did not.
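As a rough illustration of what an explanation layer of this kind might expose, the sketch below breaks a single recommendation score down into per-factor contributions. It is not Beamery’s implementation: the linear scoring, the function name `explain_score`, the weights and the feature values are all assumptions made for the example.

```python
# Hypothetical sketch: a linear recommendation score with a per-factor breakdown.
# The factor names follow the article; the weights and values are invented.

def explain_score(weights, features):
    """Return the total score and each factor's contribution, largest first."""
    contributions = {
        name: weights[name] * features.get(name, 0.0) for name in weights
    }
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


weights = {
    "skills": 0.5,
    "seniority": 0.2,
    "proficiency": 0.2,
    "industry_relevance": 0.1,
}
candidate = {
    "skills": 0.8,
    "seniority": 0.5,
    "proficiency": 0.5,
    "industry_relevance": 1.0,
}

score, ranked = explain_score(weights, candidate)
for name, value in ranked:
    print(f"{name}: {value:.2f}")
print(f"total: {score:.2f}")
```

A breakdown like this only shows which listed factors moved the score; anything absent from `weights` has, by construction, no influence on it.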
The purpose of AI auditing
Speaking with Computer Weekly about auditing Beamery’s AI, O’Sullivan says Parity looked at the entirety of the system, because the complex social and technical nature of AI systems means the problem cannot be reduced to simple mathematics.
“The first thing that we look at is: is this even possible to do with AI?” she says. “Is machine learning the right approach here? Is it transparent enough for the application, and does the company have enough expertise in place? Do they have the right data collection practices? Because there are some sensitive elements that we need to look at with regard to demographics and protected groups.”
O’Sullivan adds that this was important not simply for future regulatory compliance, but for reducing AI-induced harm generally.
“There have been a couple of times when we have encountered leads where clients have come to us and they’ve said all the right things, they’re doing the measurements, and they’re calculating the numbers that are specific to the model,” she says.
“But then, when you look at the entirety of the system, it’s just not something that’s possible to do with AI or it’s not appropriate for this context.”
O’Sullivan says that, although important, any AI audit based solely on quantitative analysis of technical models will fail to truly understand the impacts of the system.
“As much as we would love to say that anything can be reduced to a quantitative problem, ultimately it’s almost never that simple,” she says. “A lot of times we’re dealing with numbers that are so large that when these numbers get averaged out, that can actually cover up harm. We need to understand how the systems are touching and interacting with the world’s most vulnerable people in order to really get a better sense of whether harms are happening, and often those cases are the ones that are more commonly overlooked.
“That’s what the audits are for – it’s to uncover those difficult cases, those edge cases, to make sure that they are also being protected.”
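The point about averages covering up harm can be shown with a small, entirely hypothetical example. In the sketch below, the group labels, counts and the `selection_rates` helper are assumptions for illustration, not figures from the audit.

```python
# Hypothetical sketch: an overall selection rate can look healthy while
# a small group is selected far less often.
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, selected) pairs.
    Returns (overall_rate, {group: rate})."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        chosen[group] += int(selected)
    overall = sum(chosen.values()) / sum(totals.values())
    by_group = {g: chosen[g] / totals[g] for g in totals}
    return overall, by_group


# Invented data: 950 candidates in group_a, 50 in group_b.
records = (
    [("group_a", True)] * 475
    + [("group_a", False)] * 475
    + [("group_b", True)] * 5
    + [("group_b", False)] * 45
)

overall, by_group = selection_rates(records)
print(f"overall: {overall:.2f}")
for group, rate in by_group.items():
    print(f"{group}: {rate:.2f}")
```

Here the overall rate sits close to the larger group’s rate, so the much lower rate for the smaller group only becomes visible once the results are broken out by group.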
Conducting an effective AI audit
O’Sullivan says Parity started the auditing process by conducting interviews with those involved in developing and deploying the AI, as well as those affected by its operation, so it could gather qualitative information about how the system works in practice.
She says starting with qualitative interviews can help to “uncover areas of risk that we wouldn’t have seen before”, and give Parity a better understanding of which parts of the system need attention, who is ultimately benefiting from it, and what to measure.
For example, while having a human-in-the-loop is often used by companies as a way to signal responsible use of AI, it can also create a significant risk of the human operator’s biases being silently introduced into the system.
However, O’Sullivan says qualitative interviews can be helpful in terms of scrutinising this human-machine interaction. “Humans can interpret machine outputs in a variety of different ways, and in a lot of cases, that varies depending on their backgrounds – both demographically and societally – their job functions, and how they are incentivised. A lot of different things can play a role,” she says.
“Sometimes people just naturally trust machines. Sometimes they naturally distrust machines. And that’s only something you can measure through this process of interviewing – simply saying that you have a human-in-the-loop is not sufficient to mitigate or control harms. I think the bigger question is: how are those humans interacting with the data, and is that itself producing biases that can or should be eliminated?”
Once interviews have been conducted, Parity then examines the AI model itself, from initial data collection practices all the way through to its live implementation.
O’Sullivan adds: “How was it made? What kinds of features are in the model? Are there any standardisation practices? Are there known proxies? Are there any potential proxies? And then we actually do measure each feature in correspondence to protected groups to figure out if there are any unexpected correlations there.
“A lot of this analysis also comes down to the outputs of the model. So we’ll look at the training data, of course, to see if those datasets are balanced. We will look at the practice of evaluation, whether they are defining ground truth in a reasonable way. How are they testing the model? What does that test data look like? Is it also representative of the populations where they are trying to operate? We do this all the way down to production data and what the predictions actually say about these candidates.”
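One simple way to look for unexpected correlations between features and protected groups is sketched below. It is a minimal illustration, not Parity’s method: the feature names, the data, the `pearson` helper and the 0.5 threshold are all assumptions, and a real audit would need to consider non-linear relationships and combinations of features as well.

```python
# Hypothetical sketch: flag features that correlate strongly with membership
# of a protected group, as a first-pass proxy check.
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Invented data: 1 marks membership of a protected group, 0 otherwise.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "postcode_band": [1, 1, 2, 1, 4, 5, 4, 5],
    "skills_score": [3, 5, 4, 6, 4, 6, 3, 5],
}

THRESHOLD = 0.5  # assumed cut-off for this example only
flagged = []
for name, values in features.items():
    r = pearson(values, protected)
    print(f"{name}: r = {r:.2f}")
    if abs(r) >= THRESHOLD:
        flagged.append(name)

print("possible proxies:", flagged)
```

In this invented data, the postcode feature tracks group membership closely while the skills score does not, which is the kind of pattern that would prompt a closer look at the former.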
She adds that part of the problem, particularly with recruitment algorithms, is the sheer number of companies using large corpuses of data scraped from the internet to “extract insights” about job seekers, which invariably leads to other information being used as proxies for race, gender, disability or age.
“Those kinds of correlations are really difficult to tease apart when you’re using a black box model,” she says, adding that to combat this, organisations should be highly selective about which parts of a candidate’s resumé they are focusing on in recruitment algorithms, so that people are only assessed on their skills, rather than an aspect of their identity.
To achieve this with Beamery, Saidov says it uses AI to reduce bias by looking at information about skills, rather than details of a candidate’s background or education: “For example, recruiters can create jobs and focus their hiring on identifying the most important skills, rather than taking the more bias-prone traditional approach – such as years of experience, or where somebody went to school,” he says.
Even here, O’Sullivan says this still presents a challenge for auditors, who need to control for “different ways that those [skill-related] words can be expressed across different cultures”, but that it is still an easier approach “than just trying to figure out from this large blob of unstructured data how qualified the candidate is”.
However, O’Sullivan warns that because audits provide only a snapshot in time, they also need to be conducted at regular intervals, with progress carefully monitored against the last audit.
Beamery has therefore committed to further auditing by Parity in order to limit bias, as well as to ensure compliance with upcoming regulations.
This includes, for example, New York City’s Local Law 144, an ordinance banning AI in employment decisions unless the technology has been subject to an independent bias audit within a year of use; and the European Union’s AI Act and accompanying AI Liability Directive.
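The article does not say which metrics Parity uses or which figures any given regulation requires. One widely used measure in hiring contexts, though, is the impact ratio: each group’s selection rate divided by the highest group’s rate, often read against the “four-fifths” rule of thumb from US employment-selection guidance. The sketch below assumes invented rates, a hypothetical `impact_ratios` helper and a 0.8 threshold purely for illustration.

```python
# Hypothetical sketch: impact ratios relative to the most-selected group.

def impact_ratios(rates_by_group):
    """Divide each group's selection rate by the highest group's rate."""
    best = max(rates_by_group.values())
    if best == 0:
        return {group: 0.0 for group in rates_by_group}
    return {group: rate / best for group, rate in rates_by_group.items()}


rates = {"group_a": 0.50, "group_b": 0.10}  # invented selection rates
ratios = impact_ratios(rates)

FOUR_FIFTHS = 0.8  # rule-of-thumb threshold, assumed for this example
below = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]

for group, ratio in ratios.items():
    print(f"{group}: impact ratio {ratio:.2f}")
print("below threshold:", below)
```

Recomputing the same ratios at each audit gives a simple way to track progress against the previous one.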
The current AI auditing landscape
A major issue that algorithmic auditors keep highlighting with the tech industry is its general inability to document AI development and deployment processes properly.
Speaking during the inaugural Algorithmic Auditing Conference in November 2022, Eticas director Gemma Galdon-Clavell said that in her experience, “people don’t document why things are done, so when you need to audit a system, you don’t know why decisions were taken…all you see is the model – you have no access to how that came about”.
This was corroborated by fellow panellist Jacob Metcalf, a tech ethics researcher at Data & Society, who said firms often will not know basic information, such as whether their AI training sets contain personal data, or what the demographic make-up of those sets is. “If you spend time inside tech companies, you quickly learn that they often don’t know what they’re doing,” he said.
O’Sullivan shares similar sentiments: “For too long, technology companies have operated with this mentality of ‘move fast and break things’ at the expense of good documentation.”
She says that “having good documentation in place to at least leave an audit trail of who asked what questions at which time can really speed up the practice” of auditing, adding that it can also help organisations to iterate on their models and systems more quickly.
On the various upcoming AI regulations, O’Sullivan says they are, if nothing else, an important first step in requiring organisations to examine their algorithms and treat the process seriously, rather than as just another box-ticking exercise.
“You can design an algorithm with the best possible intentions and it can turn out that it ends up harming people,” she says, pointing out that the only way to understand and prevent these harms is to conduct extensive, ongoing audits.
However, she says there is a catch-22 for businesses, in that if some problem is uncovered during an AI audit, they will incur additional liabilities. “We need to change that paradigm, and I am happy to say that it’s been evolving pretty consistently over the last four years and it’s much less of a worry today than it was, but it is still a concern,” she says.
O’Sullivan adds that she is particularly concerned about the tech sector’s lobbying efforts, especially from large, well-resourced companies that are “disincentivised from turning over those rocks” and properly examining their AI systems because of the business costs of problems being identified.
Regardless of the potential costs, O’Sullivan says auditors have an obligation to society to not pull their punches when examining a client’s systems.
“It doesn’t help a client if you try to go easy on them and tell them that there’s not a problem when there is a problem, because ultimately, those problems get compounded and they become bigger problems that will only cause greater risks to the organisation downstream,” she says.