Data feminism, by Catherine D’Ignazio and Lauren F. Klein: A review by Marta Arniani
Data feminism. D’Ignazio, Catherine and Klein, Lauren F. (2020) Cambridge: MIT Press.
The book argues the case for data feminism as a novel approach to data science. Supported by a vast collection of case studies and projects, the two authors, Catherine D’Ignazio (Emerson College) and Lauren F. Klein (Georgia Institute of Technology), guide the reader through seven methodological principles for data feminism. The book addresses a diverse audience of public servants, academics, industry professionals and civil society organisations. Each of these groups can find in Data Feminism a resource for researching and building inclusive digitally mediated services and products.
Considering the role data have in any computer science-related innovation, and in the digital transformation of society tout court, the principles can be seamlessly extended beyond the disciplinary realm of data science. In 2019, the former UN Special Rapporteur on Extreme Poverty and Human Rights, Philip Alston, denounced how, in the current digitalisation of public infrastructures, states are at risk of “stumbling zombie-like into digital welfare dystopia”. The recent history of technology-driven social injustices consistently involves some “hidden features” of digital systems. Nonetheless, in the current techno-solutionist paradigm, the question of how to navigate everyday situations and objects when their digital layer is opaque remains largely unaddressed. Data feminism provides us with a valuable compass. Indeed, the critical concept that D’Ignazio and Klein borrow from feminism is intersectionality: the idea that a person’s identity, and the systemic oppression they can be subject to, is a combination of factors such as ethnicity, age, sexual orientation and education. Intersectionality questions the binaries that regulate the data world, from the way big data are organised to the mundane “female” or “male” boxes in online forms. In data feminism, intersectionality is a lens for seeing through the presumed objectivity of data and uncovering the system of power they embed.
As specified in the introduction, data feminism is not only about gender. Nor should it be dismissed as a militant women’s playground by the ranks of computer science professionals and public decision-makers, who for historical reasons are by and large white men. Data feminism shows how standard practices in data science reinforce existing inequalities, and makes use of data science to challenge and change the distribution of power. The book debunks the presumed objectivity of the discipline, calling on the reader to contextualise data, the collection process, and the people behind them. A data project can be feminist “in content, in that it challenges power by choice of subject matter; in form, in that it challenges power by shifting the aesthetic and/or sensory registers of data communication; and/or process, in that it challenges power by building participatory, inclusive processes of knowledge production.”
Data Feminism is also a matter of ethos and integrity, and the authors put a particular effort into coherence. By applying the principle of reflexivity to their own work, they are transparent about their privileged socio-economic conditions, as well as about their methodology, especially the laborious process put in place to represent a plurality of voices and sources in the book. One may wonder whether authors coming from a less privileged background would have developed the argument similarly, or would simply have had the chance to get a book about this topic published. Another delicate territory the book enters is exposing how the power balance has historically favoured white men. This is presented without slipping into a rhetoric that would disengage them as readers, with a distance allowed by the authors’ positionality.
Although a degree of familiarity with basic data science concepts is taken for granted, the writing is fluid and welcoming to non-specialised readers. However, its anecdotal development waters down the argument at times, and may disappoint a reader looking for scientific conventions in the establishment of the principles. Still, this aligns with the feminism the authors want to pursue, ditching the “performance of objectivity” in their writing and valuing instead a vast catalogue of real-life experiences, each proving how knowledge is always situated.
The book is divided into seven chapters, each covering one principle of data feminism with a careful selection of data case studies and elements from feminist theory. The fil rouge between them is not to take present practices for granted. This is reflected in explicitly analysing and challenging power; in probing who (in terms of human labour and decision-making) is behind a particular data project or dataset; and in rethinking categorisations so that they can embrace a plurality of experiences. These themes often overlap redundantly, articulating in different ways the need to contextualise data and diversify their classification. Indeed, in wanting to provide a comprehensive overview of the topic, the authors are obliged to only hint at many concrete aspects of technology design and implementation. Taken together, the principles trace a feminist epistemology in data science.
Although the reasoning and examples predominantly focus on gender inequalities, the authors also account for forms of data injustice affecting non-white and non-heterosexual groups. They also emphasise, in chapter one, the difference between minorities in purely numerical terms and minority groups that are actively devalued and oppressed in areas like policing.
Chapter one lays the foundation for the book’s architecture of principles by defining “power”, which is functional to principle one, “examine power”. Drawing on Patricia Hill Collins’ matrix of domination, power is defined as the “configuration of structural privilege and structural oppression.” Hence, examining power means unveiling how the data world reflects inequalities that are embedded in society. The argument is built around power imbalances in a data workforce with little gender and racial diversity, whose viewpoints and biases affect the composition of datasets. Disadvantaged groups are either hyper-surveilled (e.g. in welfare, where they constitute the largest share of beneficiaries, and hence also of the datasets) or disregarded (e.g. the lack of gendered datasets in healthcare and the car industry). Data serves the establishment because it is expensive and resource-intensive, meaning that only powerful institutions can reap its benefits: elite research universities for science, governments for surveillance, corporations for business purposes.
Principle number two of the book is to “challenge power”. The chapter exposes how digitalisation processes are crippled by an excess of faith in machine learning, which rarely considers how historical data is the product of structural inequalities, leading machine learning to “predict the past” and reinforce them. The four starting points for challenging power are: collect (compiling counter-data); analyse (demonstrating inequitable outcomes); imagine (aiming for co-liberation of both the oppressors and the oppressed instead of simple fairness); and teach (training new data feminists).
Chapter three presents the principle of respecting multiple forms of knowledge, “elevate emotion and embodiment”. In data science, this can go as far as “data visceralisation”: data representations that the whole body can experience.
Although the argumentation is less precise than in the previous chapters, chapter three introduces the fundamental concept of feminist standpoint theory, which recognises knowledge as situated. In contrast to the imagined absolute objectivity of data science, data feminists are aware that proximity to “objectivity” can only be achieved through the inclusion of multiple and partial standpoints.
Chapter four revolves around the fact that data is information made tractable. Consequently, the fourth principle is to “rethink binaries and hierarchies”. The binary systems of technology resonate with those of gender in Western societies, while other cultures have names and classifications for individuals outside the gender binary. Data feminists should use calculating and measuring to hold power accountable, reclaim overlooked histories, and build collective institutions.
The fifth principle of the book is to “embrace pluralism”. Knowledge is derived from synthesising a multitude of perspectives. Echoing the third principle above (focussing on situated knowledge), chapter five targets the established practice of cleaning data, which tends to sacrifice information and views that do not fit the predominant narrative. Standard techniques create “strangers in the dataset”, situations where the dataset is illegible to new people or in new contexts. The suggested good practice for data scientists and anyone involved in data cleaning is reflexivity: being open about standpoint and methods.
Principle six is to “consider context”. The chapter advocates for contextualisation in data acquisition and data analysis, as well as in their communication. For instance, racism, sexism, and other forces of oppression should always be named when they are present in the numbers.
Datasets are always part of what Christine Borgman defines as a “knowledge infrastructure”, an ecology of people, practices, technology, institutions, objects and relationships. Accordingly, datasets standing alone are incomplete: they need to be complemented with contextual information, such as datasheets for datasets or data user guides. Principle seven is “make labour visible”. This final principle aims to bring to light the often unpaid and invisible human work behind digitally mediated tasks.
Across the book, the authors advance the debate on the social impact of technology by siding with social justice instead of ethics. They argue that ethics directs data scientists towards technological fixes because it pins problems on individuals or technical systems. Framing data science as an abstract and technical discipline does not account for the differences in human impact between the use of data in astrophysics and the use of data in the justice system. Also, by emphasising the primacy of individuals’ knowledge, the data science discipline has historically lacked engagement with the communities behind the datasets. Hence, ethics as an isolated factor does not take into account the matrix of domination, whereas any notion of algorithmic fairness must account for the systemic nature of unfairness. The authors reject the trending “data for good” by exposing how “for good” is an imprecise concept which embeds a position of power. Instead, they advocate for data for co-liberation: holders of power (whether voluntarily or not) need to be liberated as well, and to realise how to be inclusive. In addressing open data, the authors note how most resources have been put into making data available, while the provision of context or documentation has been ignored. The web is overflowing with zombie datasets, published without any clear purpose in mind. There is a missed opportunity in their argument about open data to explicitly criticise technological solutionism and present data feminism as an alternative. This connection would have further legitimised the book in the eyes of readers concerned with the social impact of technology.
Data feminism is an extremely timely title addressing problems that are unfolding in society on a daily basis, and where nothing is natural in how the technology operates. Data technology is a socio-economic construct whose alleged objectivity is its most dangerous attribute. By granting technology a salvific problem-solving capability, decision-makers miss a chance to make it more inclusive. This is where a book like Data Feminism makes a valuable contribution. First, it unveils the historical reasons and the powerful interests that are reflected in technology in practice. Secondly, it provides us with an actionable approach that, if applied by industry professionals, public servants and academics, will have an impact on the inclusiveness of services and products relying on technology, by steering them towards the common good. Finally, it provides citizens at large with a lens to better grasp the data behind the news and data visualisations. Nonetheless, despite the urgency of the topics addressed by Data Feminism and its relative readability, its lack of a specific target audience makes the book most likely to find its way into academic circles that already have a degree of awareness of, or at least curiosity about, feminism.
The book also makes a valuable contribution to feminist theory in general. By exploring the relevance of concepts such as the matrix of domination, or situated knowledge, in the domain of data science, the book reinvigorates ideas that were conceived before the digital era. This not only proves their timeless value and the deep roots of the inequalities the book describes, but also gives feminism renewed legitimacy within our technologically driven contemporary society. Beyond its feminist approach, this is a book about knowledge and power in the digital era, presenting Data Feminism as a lens to observe epistemological blind spots.
Data Feminism is part of the growing number of voices advocating for a more inclusive development of technology, one that embeds and promotes social justice. The book provides a route of liberation for computer science. Its practitioners have been content for too long with applying algorithmic binaries to any social challenge. Data feminism represents a resistance to the classification and banalisation of our daily lives. Instead of letting individuals and society adapt to artificial intelligence and computing capabilities, it is computing that should take into account the variety and unpredictability of human experience.
Marta Arniani
Futuribile
@MartaArniani