The quality of the data on which artificial intelligences are trained is a highly topical issue. If the source information is fallacious, biased or laden with stereotypes, the output produced by an AI will be no better, with implications whose severity depends on how this technological tool is used. Such situations are not new to us; suffice it to say that until the 1980s medicine had an androcentric focus, with major consequences for the prevention, diagnosis and treatment of diseases in women (we discussed this here).
With the democratization of access to the most modern and high-performance generative artificial intelligence systems, the issue of stereotyping - or bias - is becoming urgent. As part of the event taking place March 21-30, SUPSI is organizing practical, popular-science activities that aim to raise awareness of this issue across the academic community and civil society alike; in particular, a hands-on, experiential activity (more information here) through which the stereotypes reproduced in the images and words of AI tools will be explored. These activities will also be led by Alberto Termine, SUPSI researcher at the Dalle Molle Institute for Artificial Intelligence.
Let's start with the problem: how do AIs pick up stereotypes?
“The fundamental difference between the AIs of the recent past and those of today lies in their ability to learn from data how to behave and act. The ‘good old’ AIs, as they are sometimes called (Good Old-Fashioned AI), made decisions based on a knowledge base and a set of reasoning rules specified upstream by programmers. They only ever did what they were told to do, exactly as specified, situation by situation. This type of AI is extremely controllable and safe: there is no way for unexpected or unforeseen behavior to occur. On the other hand, these ‘top-down’ programmed AIs have extremely limited capabilities. If an AI is to perform complex tasks, it must instead be able to learn from the environment and adapt to unexpected situations: hence the relatively recent development of machines capable of learning. Their behavior derives from the information they extract from the data on which they are trained: if this data contains information that is of poor quality, false, or ethically dubious (e.g., racist slurs, gender bias, and the like), the AI will eventually learn it and use it to guide its own behavior and reasoning. If we want AIs to function properly, it is therefore crucial to give them quality data, previously inspected and ‘cleaned up,’ as the jargon goes, of inaccuracies and errors. Unfortunately, this is far from an easy task, given the enormous amount of data required to train the latest generation of systems, such as the generative chatbots we all deal with today (ChatGPT, Gemini, LLaMA, to name a few).”
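To make the contrast concrete, here is a minimal, purely illustrative Python sketch (the loan scenario, the `rule_based_loan_decision` function, the `MajorityLearner` class and the toy data are all hypothetical, not drawn from any real system): a rule-based program does only what its author wrote, while a learner reproduces whatever pattern, fair or biased, its training examples happen to contain.

```python
from collections import Counter

# --- "Good old-fashioned" AI: behaviour fixed upstream by the programmer ---
def rule_based_loan_decision(income: float, has_collateral: bool) -> str:
    # Every case is decided by rules written explicitly by a human.
    if income > 50_000 or has_collateral:
        return "approve"
    return "reject"

# --- Learning AI: behaviour derived from whatever the training data contains ---
class MajorityLearner:
    """Toy learner: memorises the most frequent label seen for each input profile."""
    def __init__(self):
        self.memory = {}

    def fit(self, examples):
        counts = {}
        for profile, label in examples:
            counts.setdefault(profile, Counter())[label] += 1
        self.memory = {p: c.most_common(1)[0][0] for p, c in counts.items()}

    def predict(self, profile):
        return self.memory.get(profile, "unknown")

# Hypothetical historical data that already contains a discriminatory pattern.
biased_history = [
    (("nurse", "medium_income"), "reject"),
    (("nurse", "medium_income"), "reject"),
    (("engineer", "medium_income"), "approve"),
    (("engineer", "medium_income"), "approve"),
]

learner = MajorityLearner()
learner.fit(biased_history)
# The learner faithfully reproduces whatever pattern the data encodes,
# fair or not: bias in, bias out.
print(learner.predict(("nurse", "medium_income")))     # -> "reject"
print(learner.predict(("engineer", "medium_income")))  # -> "approve"
```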
RSI Edu has dedicated a series of popular-science videos to how AI works, its errors and its developments.
Can you give us examples where AI has made mistakes, even serious ones?
“A prime example of the problem of bias in data comes from recent developments of AI in the medical field, and specifically in dermatology. One of the most difficult diagnostic tasks in dermatology is to distinguish nevi (benign moles) from melanomas (very aggressive and dangerous skin cancers). To support physicians in this task, AI-based image recognition systems have been developed that can distinguish images of nevi and melanomas with incredible accuracy - higher than that of the human eye. However, it was noticed that when these systems were applied to people with a dark phototype (dark or olive complexion), their predictive performance dropped considerably. The reason? The AIs had been trained on databases containing predominantly images of people with light complexions. This was not an intentional error; it is simply much easier to find large databases of images of light-skinned people: first, because the databases of Western hospitals (in the USA and Europe) are significantly larger and better stocked than those of developing countries; and second, because it is in the USA, Europe and China that the greatest AI research effort is concentrated. Nevertheless, the presence of this bias has discriminatory consequences for people of color, who end up with access to sub-optimal diagnostic services compared to people with light complexions.”
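As an illustration of how such an imbalance can surface, here is a hedged, entirely synthetic simulation (the single-feature "lesions", the thresholds and the group proportions are invented for the example; it does not reproduce any real dermatology system): a classifier trained on data dominated by one group tends to score noticeably worse on the under-represented one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, threshold):
    """Synthetic 'lesions' reduced to one feature; the value at which a lesion
    is malignant differs between the two phototypes."""
    x = rng.uniform(0, 1, size=(n, 1))
    y = (x[:, 0] > threshold).astype(int)
    return x, y

# Group A (light phototype): malignant above 0.6; group B (dark phototype): above 0.3.
# The training set is heavily skewed toward group A, as in the example above.
xa, ya = make_group(1900, 0.6)   # 95% of training data
xb, yb = make_group(100, 0.3)    #  5% of training data
model = LogisticRegression().fit(np.vstack([xa, xb]), np.concatenate([ya, yb]))

# Evaluate separately on fresh data from each group.
for name, thr in [("light phototype", 0.6), ("dark phototype", 0.3)]:
    x_test, y_test = make_group(2000, thr)
    print(f"{name}: accuracy = {model.score(x_test, y_test):.2f}")
# The under-represented group typically gets markedly lower accuracy,
# even though no one intended to discriminate.
```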
In addition to the examples cited above, what other consequences can be imagined?
“Bias is an extremely pervasive problem and a difficult one to deal with. Sometimes people think it is enough to remove ‘sensitive’ variables from the data to push the problem out the door, but it comes back in through the window. Let's take an example: imagine you are a bank using AI to estimate a customer's risk, on which the decision to grant financing depends. You clearly do not want the system to compute the estimate on the basis of variables such as the customer's ‘gender’ or ‘ethnicity’; you therefore avoid feeding this data to the AI and simply provide it with data on ‘type of work done’ and ‘income.’ It may seem that the risk of bias is eliminated, but this is not the case. There may in fact be very strong statistical correlations (indeed, there are!) between a person's gender or ethnicity and other variables, such as the work they do, that are not considered ‘sensitive.’ These correlations are learned by the AI and exploited to formulate predictions: for example, if female gender turns out to be strongly correlated with a certain type of job, the AI might use information about the job held to infer information about gender, even if this is not given explicitly, and then use it to formulate predictions that discriminate against women. In this way, the bias implicit in the data is not only perpetuated, but can even be reinforced by the use of AI. One of the most serious (and underestimated) consequences of the unaware and uncontrolled use of AI in a wide variety of fields is precisely this: the massive spread and exacerbation of discriminatory biases and attitudes toward minorities (ethnic, gender, social status) that are not adequately represented, or are mis-represented (in the literal sense of the word), in the databases on which AI systems are trained. The problem can be avoided through conscious and informed use of AI, as well as by promoting research programs dedicated to developing ethical and responsible AI.”
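The proxy effect described above can be reproduced in a few lines. The sketch below is a hypothetical simulation (the variables, the strength of the correlation and the historical decision rule are invented for the example): the sensitive attribute is never shown to the model, yet its predictions still differ across groups because 'job type' carries the same information.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

# Synthetic population: gender correlates strongly with job type (the proxy).
gender = rng.integers(0, 2, n)                                 # 0 = male, 1 = female; never given to the model
job_type = np.where(rng.random(n) < 0.85, gender, 1 - gender)  # 85% match -> strong correlation
income = rng.normal(50, 10, n)                                 # income in thousands

# Historical decisions were biased: at equal income, worse outcomes for one gender.
past_approval = (income + rng.normal(0, 5, n) - 15 * gender > 40).astype(int)

# The model is trained WITHOUT the sensitive attribute...
X = np.column_stack([job_type, income])
model = LogisticRegression(max_iter=1000).fit(X, past_approval)

# ...yet its predictions still differ by gender, because job_type acts as a proxy.
pred = model.predict(X)
print("approval rate, men:  ", pred[gender == 0].mean())
print("approval rate, women:", pred[gender == 1].mean())
```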
How strongly is this issue felt in the research world?
“The issue of bias is very much felt in the research world, particularly in academic research. Research programs, investments and publications on the development of fairer and less biased AI systems are steadily increasing. The world's leading AI research centers, until recently mostly focused on improving predictive performance and scalability, are all building in-house expertise in AI ethics and responsible AI. SUPSI, with the Dalle Molle Institute for Artificial Intelligence, is no exception: a chair in AI Epistemology, Logic and Ethics, led by Professor Alessandro Facchini and for which I work, was recently inaugurated in the Department of Innovative Technologies.”
“Often, however, the attention to the problem in academia is not matched by equal attention from the media, especially the international or U.S.-based media. TV, newspapers and the web seem to prefer talking about things like artificial intelligence overtaking human intelligence, taking over and killing us all (the so-called ‘existential risk’), or making our jobs useless and leading to mass layoffs, and so on. These are largely unrealistic risks and scenarios, light years away from the state of the technology (spoiler: no ChatGPT is capable of taking the initiative and exterminating us all, much less replacing the human worker altogether). However, talk of AI exterminating humans or causing mass layoffs creates media hype and prompts investors to bet their capital on companies and startups doing AI. Note in this regard that investments by so-called venture capitalists are often the main, if not only, source of income for many startups in the industry, particularly those focused on developing systems with astonishing performance rather than concrete applications. Unfortunately, talking all the time only about far-fetched risks ends up distracting us from understanding and preventing the real ones, such as bias.”
Is there any way to remedy the situation or have we already reached a point of no return?
“Certainly the problem of bias, like the vast majority of the concrete ethical and social problems that arise from AI research and development, is manageable and containable without giving up the benefits this technology brings. There are two key ingredients: education and research. On the research front, there are several areas devoted precisely to studying methods for making AI systems fairer and better aligned with the values of our society. I work in one of these areas, called explainable AI, which is concerned with making transparent to users the logic (and ‘reasons’) that led an AI to make a certain decision. In this way, the end user can inspect the reasoning done by the machine and decide whether it conforms to their values and standards, including in terms of fairness and bias.”
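A very simple way to expose the ‘reasons’ behind a single decision, in the spirit of explainable AI, is to attribute it to per-feature contributions. The sketch below uses a linear model with invented feature names (income_kCHF, years_employed, job_type are assumptions for the example); real explainability tools are far richer, but the idea is the same: the user can see what pushed the decision one way or the other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical loan-scoring model with named features and synthetic data.
feature_names = ["income_kCHF", "years_employed", "job_type"]
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.5, 0.8, -1.2]) + rng.normal(0, 0.5, 500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def explain(model, x, names):
    """Attribute one decision of a linear model to its input features:
    contribution_i = coefficient_i * feature_value_i (plus the intercept)."""
    contributions = model.coef_[0] * x
    print(f"decision: {'approve' if model.predict([x])[0] else 'reject'}")
    print(f"  intercept: {model.intercept_[0]:+.2f}")
    for name, c in sorted(zip(names, contributions), key=lambda t: -abs(t[1])):
        print(f"  {name}: {c:+.2f}")

# The end user can now inspect which factors drove the decision and judge
# whether they are acceptable (e.g. is 'job_type' doing too much work?).
explain(model, X[0], feature_names)
```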
“Research alone, however, is not enough. As much as we strive to build increasingly safe, reliable and fair AI systems, there is a component of risk that cannot be eliminated. It can, however, be curbed by educating users, that is, by teaching people to use AI responsibly and ethically, even though the potential user base is vast and, nowadays, anyone can access an AI system and use it at will.”
“In our own small way, as SUPSI, we are trying to pursue several initiatives dedicated to educating citizens on the responsible use of AI. We started with schools: teachers and students are among those most exposed to the risks of these new technologies, as well as those who will live in a world where AI is everywhere. The next challenge, to be addressed together with institutions and other local entities (our collaboration with is a good example of this), is to reach the adult and elderly population, who are often more reluctant to engage with new technologies.”
The image accompanying this article was generated by an AI.