Strict ethical and professional standards should be applied to the development of algorithms with social impacts to recover public trust in the technology, according to a report by BCS, the Chartered Institute for IT.
Titled The exam question: how do we make algorithms do the right thing?, the report analyses the failings that led to Ofqual’s algorithmic exams fiasco to identify how principles of openness, accountability and objectivity can be embedded in the development of algorithms that make high-stakes decisions about people’s lives.
Ofqual’s algorithm, which downgraded nearly 40% of students’ A-Level results, has since been abandoned by the Department for Education (DfE) in favour of teacher-predicted grades following a massive backlash from students.
The report makes a number of recommendations on how public trust in algorithms can be restored, including that the government endorses the ongoing work of BCS, the Royal Statistical Society (RSS) and others to professionalise data science; that government takes a leadership role to ensure good ethical and professional practice in algorithm design becomes ubiquitous; and that algorithms are put through independent impact assessments before going live.
“Lack of public confidence in data analysis will be very damaging in the long term. Information systems that rely on algorithms can be a force for good, but – as students found out to huge cost – we have been using them to make high-stakes judgements about individuals based on data that is subjective, uncertain and partial,” said Bill Mitchell OBE, director of policy at BCS.
“We need true oversight of the way algorithms are used, including identifying unintended consequences, and the capability at a system level to remedy harm that might be caused to an individual when something goes wrong.
“That means, first of all, professionalising data science so that the UK has the most trusted, ethical and sought-after data science teams in the world.”
The report added there is a “lack of professional good practice and professional standards employed around the development and implementation of information systems”, and that any automated system that relies on making statistically based judgements in real time should be considered a “high-risk algorithmic system”.
“Mitigating the risks caused by such systems in policy formulation or implementation requires understanding all the organisational business practices and how interdisciplinary teams work together across policy boundaries,” it said.
Designing algorithms ethically: openness, accountability and objectivity
The report outlined the key stages of algorithmic development and design that organisations should be thinking about, including policy objectives, ownership, data models and data gathering.
“Only after choices about [the above] are made can a set of algorithms be developed that collectively automate those judgements the data is fit to be used for,” the report said, adding that rigorous testing of the system throughout its development cycle, both of distinct elements and the overall package, is needed to understand whether the system is good enough for use.
“All of the above stages require sound ethical judgement to make the best choices possible. They involve many different stakeholders who need the right governance mechanisms to work in close collaboration both quickly and effectively,” it said.
“Fortunately, work on professionalising these practices has already begun… to collaboratively shape and develop the data science profession. We recommend government support and join this collaboration, ensuring that it enthusiastically adopts the professional standards and practice that are developed by this partnership.”
While the report said DfE and Ofqual consulted a wide range of stakeholders on some issues, such as whether calculated grade estimations should be used to replace exams and what type of data should be used as the basis for these calculations, other issues were completely neglected, including how best to maintain standards and how to combine a teacher’s predicted grade with historical data to award a result.
It concluded that “openness means being open about what data will be used, the provenance of that data, how it will be used and what criteria will be used to determine whether the resultant information system is fit for purpose”.
Identifying weaknesses in the attempts to ensure objectivity, the BCS report also said there is a need for clarity around what information systems are intended to achieve at the individual level, and that this should be established “right at the start” of the process.
For example, distributing grades based on the characteristics of different cohorts of students so they are statistically in line with previous years – which is what the Ofqual algorithm did – is different to ensuring each individual student is treated as fairly as possible, something which should have been discussed and understood by all stakeholders from the beginning, it said.
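The gap between those two goals can be illustrated with a minimal sketch. It is not Ofqual's actual model: the function name, the student names, the teacher ranking and the historical grade shares below are all hypothetical, chosen only to show how matching a cohort to a past distribution can override individual predictions.

```python
# Illustrative only: a toy cohort-level allocation, NOT Ofqual's model.
# All names, rankings and shares here are hypothetical.

GRADE_ORDER = ["A", "B", "C"]  # best grade first


def allocate_by_cohort_distribution(ranked_students, historical_shares):
    """Award grades so the cohort's grade shares match previous years.

    ranked_students: student names ordered best-first.
    historical_shares: (grade, share) pairs, best grade first, summing to 1.
    """
    n = len(ranked_students)
    awarded = {}
    start = 0
    cumulative = 0.0
    for grade, share in historical_shares:
        cumulative += share
        end = round(cumulative * n)  # last rank position that gets this grade
        for student in ranked_students[start:end]:
            awarded[student] = grade
        start = end
    # If rounding leaves anyone unassigned, give them the lowest grade.
    for student in ranked_students[start:]:
        awarded[student] = historical_shares[-1][0]
    return awarded


# Hypothetical cohort: three students are predicted an A by their teacher,
# but past cohorts at this school only achieved 20% A grades.
teacher_predictions = {"Asha": "A", "Ben": "A", "Cara": "A", "Dev": "B", "Eli": "C"}
ranking = ["Asha", "Ben", "Cara", "Dev", "Eli"]
past_shares = [("A", 0.2), ("B", 0.4), ("C", 0.4)]

awarded = allocate_by_cohort_distribution(ranking, past_shares)

# A student is downgraded when the awarded grade sits lower in GRADE_ORDER
# than the teacher's prediction.
downgraded = [
    s for s in ranking
    if GRADE_ORDER.index(awarded[s]) > GRADE_ORDER.index(teacher_predictions[s])
]
print(awarded)
print(downgraded)
```

In this toy cohort the overall grade shares match previous years exactly, yet three of the five students receive a lower grade than their teacher predicted, purely because of their position in the ranking.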
In terms of accountability, BCS said: “It is essential to develop effective mechanisms for the joint governance of the design and development of information systems right at the start.”
Refraining from apportioning blame, it added: “The current exam-grading situation should not be attributed to any single government department or office.”
CEO of the RSS, Stian Westlake, however, told Sky News the results fiasco was “a predictable surprise” because of DfE’s demand that Ofqual reduce grade inflation.
“The fact that this could have produced a lot of inaccuracy, which translates into unfair grades for individuals, that was known,” he said.
Despite the RSS offering to help with the algorithm in April, Ofqual said at the time it would only consider accepting its assistance if the academics involved signed non-disclosure agreements (NDAs), which would have prevented them from commenting on the chosen model for five years after results day.