The experience of developing Atylmo, the first Brazilian algorithm focused exclusively on integrating large volumes of data, was published in the March issue of the renowned scientific journal “IEEE Journal of Biomedical and Health Informatics”. Atylmo was created to fill a gap left by currently available data linkage algorithms, which perform poorly on large databases such as those used at Cidacs, where datasets involve millions of records.
The algorithm was developed by researchers from the Federal University of Bahia linked to Cidacs (Fiocruz Bahia) in 2013, when the methodological challenge of building the Center's first platform, the 100 Million Cohort, began. “Besides the volume of data, the complexity of our scenario arises from the absence of common key attributes in all the databases involved. This requires the use of a probabilistic approach which, in turn, requires a high level of accuracy”, the authors explain.
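To illustrate what linking records without a common key can look like, here is a minimal Python sketch of similarity-based matching. It is not Atylmo's implementation: the field names (`name`, `mother_name`, `birth_date`), the weights, the 0.9 threshold and the use of `difflib.SequenceMatcher` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; illustrative values only, not taken from the study.
WEIGHTS = {"name": 0.5, "mother_name": 0.3, "birth_date": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two strings after light normalization."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities (weights sum to 1)."""
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def link(records_a: list, records_b: list, threshold: float = 0.9) -> list:
    """Pair each record in records_a with its best-scoring candidate in records_b.

    Only pairs whose score reaches the (assumed) threshold are kept.
    Returns a list of (index_a, index_b, score) tuples.
    """
    if not records_b:
        return []
    links = []
    for i, a in enumerate(records_a):
        # Exhaustive comparison; fine for a demo, far too slow for millions of records.
        best_j, best = max(
            ((j, match_score(a, b)) for j, b in enumerate(records_b)),
            key=lambda pair: pair[1],
        )
        if best >= threshold:
            links.append((i, best_j, best))
    return links
```

Comparing every pair of records, as this sketch does, does not scale to databases with millions of entries, which is the kind of limitation the article says Atylmo was designed to overcome.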
Already in its second version, the tool delivers impressive results: linkage accuracy ranges from 93% to 97% true-positive matches. This means that of every 1,000 records integrated from different health and socioeconomic databases, more than 900 are linked correctly.
In addition to members of Cidacs, the study “On the Accuracy and Scalability of Probabilistic Data Linkage Over the Brazilian 114 Million Cohort” is authored by researchers from the Federal University of Bahia and University College London, under the leadership of Marcos Ennes Barreto, an associate researcher at the three institutions.