Life sciences research increasingly relies on the integration of heterogeneous big health data such as imaging, medical history, -omics and environmental data. Harmonisation and integration of these data would allow researchers to answer new and important research questions. However, the majority of life sciences data is underutilised. This is mainly because the data are hard to find and interpret, and because their sharing and use are restricted by legal, administrative, political and technical barriers. To overcome these challenges, a scalable information infrastructure is needed.
EuroCAT (www.eurocat.info) is a large Interreg project that has pioneered the development of a global system for distributed learning on health data. It was the world’s first distributed data research infrastructure in the field of cancer research. The euroCAT system allows radiation oncology centres to publish non-image data on the Semantic Web and integrate these with imaging data, irrespective of language, vendor, database, etc. It has been deployed and validated at many leading cancer centres. Partner UM has developed tools for distributed learning from federated, anonymised, ontology-compliant databases.
The backbone of the system is a set of interlocking, open source tools which the UM group has brought together in an integrated suite. The suite de-identifies patient information and stores it in two data warehouses, one for image data and one for non-image data. From these warehouses the data is enriched and published on the Semantic Web, where it is available for research applications to use.
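As a rough illustration of the de-identify, enrich and publish steps described above, the following Python sketch strips direct identifiers from a record, replaces the local patient ID with a keyed hash, and emits subject-predicate-object triples of the kind that could be published on the Semantic Web. It is not euroCAT's actual tooling: the `http://example.org/` namespace, the field names and the `site_secret` key are all assumptions made for this example, and a keyed hash is pseudonymisation rather than full anonymisation.

```python
import hashlib
import hmac

# Hypothetical namespace and field names, chosen for illustration only.
BASE = "http://example.org/"
DIRECT_IDENTIFIERS = {"patient_id", "name", "date_of_birth"}


def pseudonym(patient_id: str, site_secret: str) -> str:
    """Derive a stable token from a local patient ID using a keyed hash."""
    mac = hmac.new(site_secret.encode("utf-8"),
                   patient_id.encode("utf-8"),
                   hashlib.sha256)
    return mac.hexdigest()[:16]


def deidentify(record: dict, site_secret: str) -> dict:
    """Drop direct identifiers and attach a pseudonymous key instead."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["pseudonym"] = pseudonym(record["patient_id"], site_secret)
    return clean


def to_triples(clean: dict) -> list:
    """Express a de-identified record as (subject, predicate, object) triples."""
    subject = BASE + "patient/" + clean["pseudonym"]
    return [(subject, BASE + "vocab/" + key, str(value))
            for key, value in sorted(clean.items())
            if key != "pseudonym"]


if __name__ == "__main__":
    record = {"patient_id": "12345", "name": "Jane Doe",
              "date_of_birth": "1960-01-01",
              "tumour_stage": "T2", "dose_gy": 60}
    for triple in to_triples(deidentify(record, "local-secret")):
        print(triple)
```

In a real deployment the predicates would come from an agreed ontology rather than an ad hoc vocabulary, which is what makes records from different centres comparable.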
Key unique selling points of euroCAT
• “Research comes to the data”. EuroCAT enables researchers to use anonymised metadata from big datasets from anywhere in the Netherlands without the need for their confidential data to leave the healthcare setting;
• EuroCAT facilitates and expedites scientific research projects but also allows for personal investigations and crowd-sourced research;
• EuroCAT fully engages machines in the research and analytics process over big dispersed data sources.
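
The “research comes to the data” principle can be sketched with a small distributed learning example in Python. This is a generic illustration, not the algorithm euroCAT uses: the function names, the logistic regression model and the toy data are assumptions. Each site computes a gradient on its own records and returns only that aggregate and a row count. A coordinator sums the contributions and updates the shared model, so no patient-level row leaves the site.

```python
import math


def local_gradient(weights, rows):
    """Runs inside one site: returns only a summed gradient and a row count."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        z = sum(w * x for w, x in zip(weights, features))
        pred = 1.0 / (1.0 + math.exp(-z))  # logistic prediction
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return grad, len(rows)


def federated_fit(sites, n_features, rounds=200, lr=0.5):
    """Coordinator: combines per-site aggregates, never sees individual rows."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        total = [0.0] * n_features
        count = 0
        for rows in sites:  # in practice, a remote call to each site
            grad, n = local_gradient(weights, rows)
            total = [t + g for t, g in zip(total, grad)]
            count += n
        weights = [w - lr * t / count for w, t in zip(weights, total)]
    return weights


if __name__ == "__main__":
    # Toy data: each row is ([intercept, feature], outcome).
    site_a = [([1.0, 2.0], 1), ([1.0, -1.0], 0)]
    site_b = [([1.0, 1.5], 1), ([1.0, -2.0], 0)]
    print(federated_fit([site_a, site_b], n_features=2))
```

Because the per-site gradients are simply summed, the result matches, up to floating-point rounding, what the same procedure would produce on the pooled data, which is the property that makes it unnecessary to centralise the records.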