round table meets ÖGMBT

September 15, 2014

New and recurring challenges in biocuration
Amos Bairoch, SIB - Swiss Institute of Bioinformatics and University of Geneva

 


Biocuration is the organization and representation of biological data so as to make it accessible to both humans and computers. One of the most acute challenges in biocuration is the extraction of knowledge from literature reports. Researchers spend a great deal of time and money on experiments to obtain new, pertinent data and knowledge; they then proceed to bury the description of what they did in a scientific paper. Biocurators then need to read these papers and try to make sense of what was really done in the experiments, and of which statements are results and which are inferences. Modern text-mining methodologies address this challenge, but they are not yet capable of extracting information from figures or of distinguishing real results from inferences and prior knowledge. Another potential solution to the knowledge extraction challenge is the development and use of software tools that seamlessly create a semantically tagged version of a paper while the researcher is writing the human-readable version. Attempts have been made to develop such tools, but unfortunately journal publishers are not interested in publicizing their use.

Biocurators must standardize the information that they extract from papers. To do this they need to use ontologies and controlled vocabularies. But the use of such standardization resources (SRs) is also fraught with challenges, such as too many incompatible SRs or a lack of adequate SRs. Biocurators also need to use these SRs in a consistent manner, which is not an easy task given the complexity of some of these resources. They also need to propagate information arising from experiments carried out on orthologous or paralogous proteins. To do so, it is essential to have a clear view of the evolution of the genes to which such annotations are going to be propagated. It is also necessary to know the taxonomic scope within which an annotation can be propagated.

If a knowledge/information resource is useful to the community, it needs to be constantly updated. This leads to a funding “challenge”. Normal research funding mechanisms are not targeted toward the support of infrastructures. Big resources have managed, over the years, to attract “stable” funding through specific single- or multi-state organizations. But even such institutional funding is not very secure. In Europe, the ELIXIR initiative may help in this respect.

Biocuration is becoming an organized community thanks to the creation in 2008 of the International Society for Biocuration (ISB), which organizes annual meetings. But there are also many challenges in terms of the career paths that are open to biocurators.