A new method for categorizing and organizing single-cell data has been launched. It can be used to create a harmonized dataset for the study of human health and disease.
Researchers at the Wellcome Sanger Institute, University of Cambridge, EMBL’s European Bioinformatics Institute (EMBL-EBI) and colleagues developed the tool, known as CellHint. CellHint uses machine learning to unify data generated around the world, allowing it to be accessed by the wider research community, potentially leading to new discoveries.
In a new study, published today (December 21) in Cell, researchers applied CellHint to uncover unexplored connections between healthy and diseased lung cell states. They looked at eight diseases, including interstitial lung disease and chronic obstructive pulmonary disease, and showed the potential benefits of this tool. They also applied CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database of approximately 3.7 million cells.
CellHint is freely available worldwide and was created as part of the Human Cell Atlas initiative¹, which aims to map every cell type in the human body to transform the understanding of health and disease.
Single-cell genomics allows researchers to study individual cells in the human body at high resolution. A current challenge in bringing together the diverse datasets generated by single-cell research is that there is no unified system for naming and organizing cell types.
To address this, researchers from the Wellcome Sanger Institute and colleagues developed CellHint, which can unify the cell type annotations produced by independent laboratories. CellHint then arranges the harmonized cell types into a graph that shows the relationships between cell subtypes, giving a complete picture of all the cells identified across different datasets.
The team applied CellHint to existing datasets and uncovered unexplored relationships between healthy and diseased lung cells across eight diseases. It also identified cell types in the adult human hippocampus that could be of interest for future research.
The researchers also applied CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database of approximately 3.7 million cells. Each cell was annotated, meaning it was labeled with specific information such as its cell type. The team also showed how CellHint can generate a range of models for automatically annotating cells in human tissues.
Dr Chuan Xu, first author from the Wellcome Sanger Institute, said: “CellHint stands out from other tools because it makes full use of the often inconsistent but valuable cell annotation information from individual studies to achieve biologically based data integration. We are excited that with CellHint, cells from independent laboratories can be re-annotated, and researchers can use the resulting information to place each cell in different contexts beyond the original study. We hope that this tool will greatly facilitate the reuse of molecular and cellular data and information in laboratories, potentially leading to new discoveries in biology.”
Dr Sarah Teichmann, senior author from the Wellcome Sanger Institute and co-founder of the Human Cell Atlas, said: “The Human Cell Atlas creates detailed reference maps of all cells in the human body to transform our understanding of biology, health and disease, and single-cell technologies underpin this extremely ambitious project. Global collaboration and open data sharing are crucial to achieving the goal of a representative Human Cell Atlas that will benefit humanity worldwide. CellHint enables the integration and sharing of single-cell data, allowing the global research community to contribute to and benefit from ongoing research happening around the world and help advance health and healthcare.”