Voice Features of Sustained Phoneme as COVID-19 Biomarker

Background: The COVID-19 pandemic has resulted in enormous costs to our society. Besides finding medicines to treat those infected by the virus, it is important to find effective and efficient strategies to prevent the spreading of the disease. One key factor to prevent transmission is to identify COVID-19 biomarkers that can be used to develop an efficient, accurate, noninvasive, and self-administered screening procedure. Several COVID-19 variants cause significant respiratory symptoms, and thus a voice signal may be a potential biomarker for COVID-19 infection. Aim: This study investigated the effectiveness of different phonemes and a range of voice features in differentiating people infected by COVID-19 with respiratory tract symptoms. Method: This cross-sectional, longitudinal study recorded six phonemes (i.e., /a/, /e/, /i/, /o/, /u/, and /m/) from 40 COVID-19 patients and 48 healthy subjects for 22 days. The signal features were obtained for the recordings, which were statistically analyzed and classified using Support Vector Machine (SVM). Results: The statistical analysis and SVM classification show that the voice features related to the vocal tract filtering (e.g., MFCC, VTL, and formants) and the stability of the respiratory muscles and lung volume (Intensity-SD) were the most sensitive to voice change due to COVID-19. The result also shows that the features extracted from the vowel /i/ during the first 3 days after admittance to the hospital were the most effective. The SVM classification accuracy with 18 ranked features extracted from /i/ was 93.5% (with F1 score of 94.3%). Conclusion: A measurable difference exists between the voices of people with COVID-19 and healthy people, and the phoneme /i/ shows the most pronounced difference. This supports the potential for using computerized voice analysis to detect the disease and consider it a biomarker.


I. INTRODUCTION
COVID-19 was declared a global pandemic by the World Health Organization (WHO) in March 2020 [1]. The pandemic rapidly spread to more than 200 countries, with more than 300 million confirmed cases and 5.5 million deaths by January 2022 [2]. The disease affects multiple body systems and organs [3], [4]. The main symptoms of COVID-19 are fever, dry cough, sore throat, dyspnea, fatigue, headache, and multiple organ failure in severe cases [4], [5].

The pandemic has caused enormous health, economic, and social challenges, and the effective suppression of its continued spread is dependent on efficient testing methods and strategies. The current gold standard for identifying infected people is based on molecular and serology testing. The polymerase chain reaction (PCR) test has been widely accepted as the most accurate COVID-19 test [6], [7], [8]. However, not all variants of the disease are serious, and variants such as Omicron are generally considered to have lower morbidity rates [9].

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
quality and, therefore, change the parameters in the patient's

As the research in this area is still in the preliminary stage, more studies are needed to identify a reliable COVID-19 biomarker extracted from voice features that could be implemented as operable devices or testing procedures. The above research indicates a possible biomarker in the voice parameters. However, these studies investigated a limited set of voice features, extracted only from the vowel /a/. Furthermore, the use of voice features in COVID-19 identification may lead to over-optimistic or misleading results due to demographic, subjective, and acoustic bias, as shown in the work of Han et al. [21]. To limit this bias, this study extracted voice parameters from sustained phonemes only.

Expanding on previous findings, this study investigated a wider range of features related to voice production mechanisms or organs, including the features related to air pressure production by the lung, vocal cord vibration, and voice modulation in the vocal tract (oral and nasal cavity). This study also extracted the features from a wider range of sustained phonemes to capture any possible alteration due to COVID-19 that might occur in voice production mechanisms and organs.

This study aimed to determine the most effective features that could be used as a COVID-19 biomarker. Once these features are identified, they can be used to develop a noninvasive device or testing procedure to screen people infected with COVID-19.

The HC participants were recruited randomly from people who had never been diagnosed with COVID-19, had no history of any disease related to respiration or voice production mechanism, and did not have any COVID-19 symptoms within 14 days before and after the recording.

The study protocol complied with the Helsinki Decla-

support the aim of this study, which is to develop a system that would be functional with minimum resources, so that it can also be used in less affluent societies. The 8 kHz sampling is the norm for 2G/3G phones and hence was chosen for this study. The files were transferred to the FireBase cloud database. The duration of each recording was between 3 and 15 seconds. The recording was performed in COVID-19 hospital wards while keeping the ambient noise as low as possible. The average SNR of the recordings was 27.80 dB.
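As an illustration of how an SNR figure in dB can be obtained, the sketch below assumes that a noise-only segment (for example, a pause before phonation) is available alongside the voiced segment, and takes the SNR as the ratio of their mean powers. How the SNR was actually estimated in this study is not stated here, so the function names and the synthetic signals are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # Hz, matching the 8 kHz sampling described above


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(voiced, noise):
    """SNR in dB as the power ratio of a voiced segment to a noise-only segment.

    Assumption: the noise segment is representative of the ambient noise
    present during the voiced segment.
    """
    return 10.0 * math.log10(mean_power(voiced) / mean_power(noise))


# Synthetic example: a 200 Hz sine of amplitude 1.0 (power 0.5) against
# a low-level alternating signal of amplitude 0.01 (power 1e-4).
voiced = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
noise = [0.01 if n % 2 == 0 else -0.01 for n in range(SAMPLE_RATE)]

print(f"SNR = {snr_db(voiced, noise):.2f} dB")
```

With real recordings, the voiced segment also contains noise, so this ratio slightly overestimates the true SNR at low signal levels.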

The six sustained phonemes were expected to be recorded from the CV patients once every day while hospitalized. However, due to the patients' health conditions and some technical considerations, the recording could not be properly acquired from each CV patient every day as expected. Table 2 provides the list of valid phoneme recordings from each CV patient during their stay in the hospital.

The recording of HC participants was acquired using the same Android application with a similar setting of 8 kHz and 32-bit resolution. The recording process occurred in a common room while the ambient noise was kept at the lowest possible level (mean SNR = 30.10 dB).

The formant features (F1 to F4) [25], the apparent vocal tract length [26], [27], and the 13 MFCC coefficients [28] represent the change in vocal tract formation due to COVID-19. The voice intensity is controlled by the subglottal pressure, which is controlled by the respiratory muscles and lung volume [29], and thus the intensity features were expected to represent a change in lung condition due to COVID-19.

was considered for the analysis and a p-value < 0.05 indicated that the means of the groups were significantly different.

The differences between the groups were also examined using effect size (ES) [32]. The ES between two groups of data (A and B) was calculated using Cohen's d [33]. An ES of 0.50 or above indicates a medium to large difference between the compared groups.
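A common form of Cohen's d divides the difference of the group means by the pooled standard deviation. The sketch below assumes that form, which may differ from the exact variant used in this study; the function name and the sample values are illustrative only.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d between groups a and b using the pooled sample standard deviation.

    Assumption: sample variances (n - 1 denominator) are pooled by their
    degrees of freedom. The sign of d depends on the order of the groups.
    """
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Illustrative values, not data from the study.
group_a = [2.0, 4.0, 6.0, 8.0]
group_b = [1.0, 3.0, 5.0, 7.0]
d = cohens_d(group_a, group_b)

# Apply the 0.50 threshold to the magnitude of the effect size.
is_medium_or_larger = abs(d) >= 0.50
```

Because the sign only reflects which group has the larger mean, the threshold is applied to the absolute value.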

The effectiveness of the voice features in separating CV from HC subjects was also examined based on the features' performance in a Support Vector Machine (SVM) [34] classifier. The SVM used in this work was trained with a Gaussian kernel and validated using "leave-one-subject-out" (LOSO) cross-validation. The Gaussian kernel was selected because it showed the best result compared to the other kernels.
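The key property of LOSO cross-validation is that all recordings from one subject are held out together, so the classifier is never tested on a voice it has already seen. The following is a minimal sketch of the splitting step only, with hypothetical subject identifiers; it is not the authors' implementation, and the classifier itself is left out.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the subject who produced recording i. Every recording
    of the held-out subject goes to the test fold, and all others to training.
    """
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx


# Hypothetical example: six recordings from four subjects.
subject_ids = ["CV01", "CV01", "CV02", "HC01", "HC01", "HC02"]
splits = list(loso_splits(subject_ids))

for subject, train_idx, test_idx in splits:
    print(subject, "train:", train_idx, "test:", test_idx)
```

In scikit-learn, the equivalent building blocks would be `LeaveOneGroupOut` for the splits and `SVC(kernel="rbf")` for a Gaussian-kernel SVM.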

Several combinations of voice features were selected to be used in the SVM training and validation. The accuracy, sensitivity, and selectivity were recorded as the measures of the features' effectiveness as a COVID-19 biomarker. The feature selection was based on the statistical analysis and a rank calculated by the ReliefF algorithm [35]. The ReliefF algorithm ranks the features based on k nearest hits and misses and averages their contribution to the weights of each feature. The ReliefF algorithm was implemented using MATLAB 2018b with 10 nearest neighbors (k = 10).
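The performance measures named above, together with the F1 score reported in the abstract, can all be derived from the confusion-matrix counts. The sketch below assumes that "selectivity" denotes the true-negative rate (often called specificity) and that CV is the positive class; both are interpretations, and the labels and predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive="CV"):
    """Accuracy, sensitivity, selectivity, and F1 from true and predicted labels.

    Assumptions: binary labels, both classes present in y_true, and at least
    one positive prediction (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    selectivity = tn / (tn + fp)  # true-negative rate (assumed meaning)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "selectivity": selectivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Illustrative labels: 4 CV and 4 HC recordings, one CV recording missed.
y_true = ["CV", "CV", "CV", "CV", "HC", "HC", "HC", "HC"]
y_pred = ["CV", "CV", "CV", "HC", "HC", "HC", "HC", "HC"]
metrics = classification_metrics(y_true, y_pred)
```

Under LOSO validation, the predictions from all held-out subjects would typically be pooled before computing these values.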

The result of the Anderson-Darling normality test showed that most of the features were not normally distributed, and thus the Mann-Whitney U test, a nonparametric test, was used to test for group differences in each of the features. The group differences were also examined by calculating the ES. In this analysis, a feature was considered significant if the Mann-Whitney U test p-value was equal to or less than 0.05 and the ES was 0.50 or above.

Table 4 shows that the features corresponding to fre-

This study investigated a range of voice features that were related to vocal cord vibration (jitter, shimmer, SD of pitch, HNR, and NHR), vocal tract modulation (formants, VTL, and MFCC), and lung function (intensity). In this work, the authors extracted the features from six sustained phonemes (i.e., /a/, /e/, /i/, /o/, /u/, and /m/). These phonemes were selected to examine all aspects of the voice production system.

The statistical analysis and SVM classification indicated that the voice features of sustained phonemes corresponding to vocal tract modulation (MFCC, formants, and VTL) and lung pressure stability (Intensity-SD) were sensitive to COVID-19 infection and, therefore, are better candidates for a COVID-19 biomarker than the features of vocal fold vibration (jitter, shimmer, pitch, HNR, and NHR). The results suggest that COVID-19 symptoms that affect laryngeal activity and the oral and nasal cavities create the most alteration to the voice quality of sustained phonemes. This result explains the findings of Suppakitjanusant [3], Quatieri [16], Maor [7], and Loey [20] that parameters related to frequency modulation of the vocal tract (log Mel spectrogram, formants, and scalogram) contributed significantly to the performance of the classifiers. The low to medium MFCC coefficients (c0, c3, c4, c5, c6, and c10) were the most sensitive features. These coefficients represent vocal tract impulse responses in the range of low to medium frequency [36].

Among the investigated phonemes, the features extracted from /i/ were the most effective features to distinguish COVID-19 patients from healthy subjects. A large number of features from /i/ produced a p-value of less than 0.05 and a relatively high average |ES|. The SVM classification with features extracted from /i/ produced the highest F1 score of 94.3%.

The phoneme /i/ is a cardinal vowel produced while the tongue is at a high-front position with spread lips [37], [38]. The tongue is very close to the hard palate while its sides are pressed against the teeth. The production of /i/ requires precise control of the air gap between the tongue and hard palate as well as maintaining proper lip position and shape. In contrast, the vowel /a/, which was commonly used in previous studies, is a back-open cardinal vowel that requires less precise control as long as the jaw is open wide and the tongue is at the lowest position. Any change of vocal tract muscle control due to infection, pain, or inflammation caused by COVID-19 will, therefore, affect the production of /i/ more than /a/.

The statistical analysis of features extracted from the phonemes recorded on Days 4-6 shows better separation between COVID-19 patients and healthy subjects, followed    [39].

The novelty of this study is the finding that sustained phoneme features related to frequency modulation in the vocal tract contain the most information to be used as COVID-19 biomarkers. The other significant novelty is that the features extracted from /i/ gave better differentiation between COVID-19 patients and healthy subjects. This study also indicates that the features recorded in the first 6 days gave the best results.

One limitation of this study is that it investigated a relatively small number of subjects in the hospital environment. Due to the condition of the patients, the recordings could not be taken every day from all the patients. Further study needs to be conducted with a large number of patients under a standardized recording environment and protocol. The other limitation of this study is that the recordings were taken after the patients tested positive with RT-PCR. It could be more useful if the recordings were taken from the subjects before being declared COVID-19 positive by other means.