Limitations and possibilities of AI in the world of health: the SPAS project

The application of AI algorithms in the world of health presents unique challenges and complexities that require careful consideration and the involvement of specialised research groups. Clinical data can be extremely variable and complex. Algorithms must be able to handle this diversity and adapt to a wide range of clinical scenarios.
The physician's experience, knowledge and needs must be integrated into these AI algorithms. The needs and perspectives of all actors involved in a medical research project, from physician to patient, from researcher to service producer, must be carefully considered.
The SPAS project is an emblematic example of such issues and needs.

Sleep disorders have increased significantly in recent years. A survey by the Federal Statistical Office shows that 23% of the population, one in three people, have occasional sleep disorders, while 5% suffer from them chronically. Taking the national population as a whole, there are therefore 300,000 people suffering from these disorders.

Experts have catalogued more than 80 different types of sleep disorder. The main ones are insomnia (e.g. difficulty falling asleep), hypersomnia (sleepiness during the day), respiratory disorders (e.g. sleep apnoea), parasomnias (e.g. sleepwalking), sleep-wake rhythm disorders and motor disorders.

Polysomnography

For cases deemed severe enough, polysomnography is recommended. The patient sleeps in the hospital where a number of bio-physiological parameters such as brain activity (EEG), eye and muscle movements, oxygen levels, cardiac activity (ECG) and respiration are monitored and recorded. The polysomnographic recording is then divided into 'epochs' of 30 seconds each. During a visual analysis, each epoch is assigned a sleep phase, following the rules of the American Academy of Sleep Medicine (AASM). This preliminary analysis work, which is quite tedious, repetitive and governed by well-defined standards, can take up to two hours.
It therefore seems a suitable task for artificial intelligence algorithms, which have in fact been applied to the classification of sleep stages for many years.
Today, several software packages offer automated or semi-automated classification services. However, their uptake among professionals is still rather limited. Recently, thanks to the increasing computing power available, deep learning - a type of machine learning based on larger and more complex neural networks - has also been employed, achieving excellent results.
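To make the epoch-based procedure described above more concrete, here is a minimal Python sketch of how a recorded signal could be cut into consecutive 30-second epochs before each one is assigned a sleep phase. The function name, the 100 Hz sampling rate and the dummy signal are assumptions chosen for illustration only; they are not taken from the SPAS project.

```python
EPOCH_SECONDS = 30  # epoch length stated in the article


def split_into_epochs(samples, sampling_rate_hz, epoch_seconds=EPOCH_SECONDS):
    """Split a 1-D signal into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = int(sampling_rate_hz * epoch_seconds)
    if samples_per_epoch <= 0:
        raise ValueError("sampling rate and epoch length must be positive")
    n_epochs = len(samples) // samples_per_epoch
    return [
        samples[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Hypothetical example: 95 seconds of a flat signal sampled at 100 Hz.
signal = [0.0] * (95 * 100)
epochs = split_into_epochs(signal, sampling_rate_hz=100)
print(len(epochs), len(epochs[0]))  # 3 complete epochs of 3000 samples each
```

In a real recording every channel (EEG, ECG, respiration and so on) may have its own sampling rate, so each would be split with its own value of `sampling_rate_hz`.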

So why have these algorithms not yet entered the routine of sleep centres?

The SPAS project

In an attempt to find a solution to this decades-old question, the Sleep Physician Assistant System (SPAS) project was born.

The Biomedical Signal Processing (BSP) research group of the Institute of Digital Technologies for Personalised Health Care (MeDiTech), the Sleep Centre of the Inselspital Bern, the University of Bern, and the NeuroTec Centre of the sitem-insel have joined forces with two European companies, Biomax (DE) and Relitech (NL), to create a platform to support the work of healthcare professionals active in the analysis of sleep disorders.

Francesca Faraci, head of the BSP research group, explained where SPAS comes from: 'Our goal is to listen to the needs of doctors to speed up and improve their work. We have automated the identification of the different sleep phases, achieving very high levels of precision and accuracy. We were able to achieve results similar to those of human operators, with the same distribution of variability (hypnodensity graph). However, we realised that the real problems of adopting these tools in the routine of sleep centres go far beyond the performance of the algorithms'.

The real problem: the inherent uncertainty of the data

The globally recognised AASM manual defines five sleep stages: wakefulness (W); the transition between wakefulness and sleep, which lasts only a few minutes (N1); the preparation for deep sleep (N2); regenerative deep sleep (N3); and the stage in which one dreams (REM). A complete sleep cycle (N1-N2-N3-REM) lasts approximately 90-110 minutes, and a person may go through several cycles (4-5) during the same night. The quality of sleep is influenced by many factors, including the overall duration of sleep and the amount of time spent in each stage.
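As a small illustration of the last point, the time spent in each stage can be read directly off a scored night, since every label covers one 30-second epoch. The Python sketch below assumes the scoring is available as a simple list of stage labels; the function name and the example sequence are invented for illustration.

```python
from collections import Counter

EPOCH_SECONDS = 30  # each label covers one 30-second epoch
STAGES = ("W", "N1", "N2", "N3", "REM")  # the five AASM stages named above


def minutes_per_stage(hypnogram):
    """Return the number of minutes spent in each stage of a scored night."""
    counts = Counter(hypnogram)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage labels: {sorted(unknown)}")
    return {stage: counts.get(stage, 0) * EPOCH_SECONDS / 60 for stage in STAGES}


# Hypothetical 50-minute excerpt (100 epochs), not real patient data.
hypnogram = ["W"] * 4 + ["N1"] * 6 + ["N2"] * 40 + ["N3"] * 30 + ["REM"] * 20
summary = minutes_per_stage(hypnogram)
print(summary)
```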

If a physician categorises a certain 30-second epoch as N1, and a colleague categorises the same epoch as N2, the data used by the algorithm will be discordant. This does not mean that either doctor is wrong: both may be right, but they may interpret the standard guidelines differently, according to their own experience or perspective. In fact, the AASM guidelines leave room for the subjective judgement of the physician. As a result, there is an average variability of around 20% between one operator and another, and of around 10% when the same operator analyses the same trace twice.
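One common way to quantify this kind of disagreement is to compare two scorings of the same recording epoch by epoch. The Python sketch below computes the raw percentage of agreement and Cohen's kappa, which corrects that percentage for the agreement expected by chance. The article does not say which metric lies behind the 20% and 10% figures, so this is only an illustration; the two label sequences are made up.

```python
from collections import Counter


def percent_agreement(scorer_a, scorer_b):
    """Fraction of epochs to which both scorers assigned the same stage."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("scorings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return matches / len(scorer_a)


def cohens_kappa(scorer_a, scorer_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(scorer_a)
    observed = percent_agreement(scorer_a, scorer_b)
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scorings of the same ten epochs by two physicians.
scorer_a = ["W", "N1", "N1", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
scorer_b = ["W", "N1", "N2", "N2", "N2", "N2", "N3", "N2", "REM", "REM"]

print(percent_agreement(scorer_a, scorer_b))
print(cohens_kappa(scorer_a, scorer_b))
```

In this invented example the two scorers disagree on two epochs out of ten, which corresponds to the order of magnitude of inter-operator variability mentioned above.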

This variability in evaluation results in a variability of the training data (in technical jargon, the training set), and prevents the algorithm from achieving results that are satisfactory for all sleep physicians. To date, it has been possible to obtain and model the same distribution of uncertainty as in humans, which is the most that can be achieved with an AI algorithm trained with a supervised learning approach. Specifically, a supervised learning approach consists of training an algorithm on the basis of categories (in this specific case sleep phases, technically called classes or labels) that have been assigned by the human expert to each 30-second epoch.

As in any field where artificial intelligence is employed, data quality is central.

Researcher Luigi Fiorillo, who did his PhD in this project, points out that 'the results could be greatly improved if the standards were stricter and left no room for subjective interpretations, thus reducing uncertainty. As this is impossible for a number of reasons, one solution could be to rely on algorithms in a partial way: the algorithms would be responsible for making the simple decisions, and the doctor would be responsible for dealing with the more complex decisions - in cases where both algorithms and humans have high uncertainty. In this way, the physician could speed up the evaluation of polysomnographies, but retain the final say in the analysis of complex cases'.
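The idea of relying on algorithms only partially can be sketched in a few lines of Python. The example assumes that, for every epoch, the algorithm outputs a probability for each of the five stages, and it uses the normalised entropy of those probabilities as a measure of uncertainty. The entropy measure, the threshold of 0.5, the function names and the probabilities are all assumptions made for illustration; the article does not describe how SPAS decides which epochs to hand over to the physician.

```python
import math

STAGES = ("W", "N1", "N2", "N3", "REM")


def normalized_entropy(probs):
    """Entropy of a probability vector, scaled to the range 0 (certain) to 1."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def triage(epoch_probs, threshold=0.5):
    """Split epochs into automatically scored ones and ones needing review.

    `threshold` is a hypothetical cut-off on the normalised entropy.
    """
    auto_scored, needs_review = [], []
    for index, probs in enumerate(epoch_probs):
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError(f"probabilities of epoch {index} do not sum to 1")
        if normalized_entropy(probs) <= threshold:
            best = max(range(len(probs)), key=probs.__getitem__)
            auto_scored.append((index, STAGES[best]))
        else:
            needs_review.append(index)
    return auto_scored, needs_review


# Hypothetical model outputs for three epochs (order: W, N1, N2, N3, REM).
predictions = [
    [0.96, 0.01, 0.01, 0.01, 0.01],  # clearly wakefulness
    [0.05, 0.45, 0.45, 0.03, 0.02],  # ambiguous between N1 and N2
    [0.01, 0.01, 0.02, 0.95, 0.01],  # clearly N3
]
auto_scored, needs_review = triage(predictions)
print(auto_scored)   # epochs the algorithm scores on its own
print(needs_review)  # epochs left to the physician
```

Here the first and third epochs are scored automatically, while the second, where the algorithm hesitates between N1 and N2, is passed to the physician.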

Systems still not widely used

At present, automated and semi-automated systems for the characterisation of sleep phases are not yet fully adopted in specialised centres. Francesca Faraci continues, "There are several reasons for this. There is certainly some mistrust of technology on the part of the healthcare sector, particularly when it intervenes in the diagnosis process. Moreover, methods based on deep learning are considered a 'black box', as to date the functioning of the algorithms cannot be easily interpreted. Many researchers are trying to make the functioning less opaque, but there is still a long way to go. There are also reasons linked to the low usability of the software on the market today, which is too unintuitive and difficult to integrate with the existing computer systems in the various hospitals. Other reasons are related to data security management: these tools often require data to be uploaded to the cloud or to external servers, an action discouraged by data protection policies and healthcare service providers. We trust, however, that in the near future we can overcome these and other challenges of a non-technical nature, in order to build, together with the experts in the field, a solution that facilitates and enhances their work".

The project has resulted in several publications in high-impact scientific journals.
