The ICBG Symposium committee have carefully selected keynote speakers who are leading experts in their respective fields.
The FAIR (Findable, Accessible, Interoperable, Reusable) principles laid a foundation for sharing and publishing scientific data, and are now being extended to all digital objects, including software, with the recent publication of the FAIR Principles for Research Software (FAIR4RS). One kind of software widely used in the biosciences is the computational workflow system, whose adoption has accelerated in the past few years, driven by the need for repetitive and scalable data processing, access to and exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods. Over 320 workflow systems are currently available, although a much smaller number are widely adopted. Since workflows are first-class, publishable research objects, it seems natural to apply the FAIR principles to them. But what does the FAIRness of workflows mean, and why does it matter? ELIXIR, the European Research Infrastructure for Life Science Data, the European EOSC-Life Workflow Collaboratory, Australian BioCommons and the Workflow Community Initiative, alongside major workflow players such as Galaxy, Snakemake and Nextflow, are developing an ecosystem of tools, guidelines and best practices to make bioscience workflows FAIR. In this talk I will shine a light on this work, ranging from the bigger picture of strategic directions to the practicality of daily work in the lab.
Professor Carole Goble, The University of Manchester
Carole is a Full Professor of Computer Science at the University of Manchester, UK, where she leads the e-Science group of researchers, Research Software Engineers and data stewards. She has over 30 years' experience in reproducible research, open data and method sharing, knowledge and metadata management, and computational workflows across a range of disciplines, notably the life sciences. She has developed production services for workflows, web services and data management, and has co-led digital infrastructure projects and resources.
Carole is the Joint Head of Node of ELIXIR-UK, the national node of ELIXIR, the European Research Infrastructure for Life Science Data, and leads the digital infrastructure for IBISBA, the EU Research Infrastructure for Industrial Biotechnology.
At the national level she serves on the leadership team of Health Data Research UK and is a founder of the UK's Software Sustainability Institute. Carole serves on the Board of Directors of Sage Bionetworks, the Scientific Advisory Board of the Helmholtz Metadata Collaboration, and 10+ other centres, and is the UK representative on the G7 Open Science Working Group. She has a long history of activity in computational workflows: developing workflow platforms (Taverna), public resources for sharing workflows (myExperiment, WorkflowHub) and metadata frameworks for reproducible workflows (RO-Crate, Bioschemas, CWL). She is a pioneer of Open and FAIR data and software in scholarly communication and an author of the original Nature Scientific Data article on the FAIR principles. These two threads come together in FAIR Computational Workflows, which she leads for the Workflows Community Initiative.
To a genomics researcher, a cow and a human are more similar than different. The two species have genomes of similar size and gene number, can be afflicted by similar diseases and traits, and biological processes from gene regulation to body development are highly conserved between them. These similarities make humans a good model organism for the cow, meaning that techniques and approaches developed for humans can rapidly be applied to cattle, while findings from cattle research can in turn be used to improve human health. In this talk I will discuss some of the research we are undertaking to bridge the gap between human and cattle research: how we are applying approaches developed for humans, such as graph genomes, to livestock research; how functional variants conserved across both species can potentially be used to improve both human and cattle health; and how we are trying to understand the cross-species conservation of fundamental processes such as the regulation of DNA mutations.
Dr. James Prendergast, Roslin Institute, University of Edinburgh
James completed his PhD in bioinformatics and statistical genetics at the University of Edinburgh in 2007 and, following positions at the European Bioinformatics Institute and University College Dublin, returned to Edinburgh to work first at the MRC Human Genetics Unit before joining the Roslin Institute in 2013. James's group focuses on understanding mammalian gene regulation, genome evolution, and human and animal disease genetics.
Cancer development within an individual is an evolutionary process. This has important clinical implications for cancer prevention and therapy, as well as our understanding of cancer progression and metastatic spread. In this talk, I will outline how we can exploit cancer genomic sequencing data to decipher cancer evolutionary histories and the extent of diversity within individual tumours. I will focus on lung cancer, the leading cause of cancer-related deaths worldwide. I will evaluate how tumours spread from the primary tumour to distant sites, and when this occurs during a tumour’s development. Finally, I will explore how we can use novel bioinformatics tools to shed light on the interface between the cancer cell and the immune microenvironment, and mechanisms of immune escape. I will explore how DNA sequencing data can be harnessed to identify T cells in tumour samples, and the clinical relevance of T cell infiltrate in predicting response to immunotherapy.
Dr. Nicholas McGranahan, CRUK-UCL
Dr Nicholas McGranahan completed his undergraduate degree in Natural Sciences, specializing in Evolutionary Genetics, at the University of Bath before pursuing post-graduate studies at University College London at the Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX). In 2011, Dr McGranahan joined Professor Charles Swanton's group at the CRUK London Research Institute (now the Francis Crick Institute), completing a PhD in Cancer Genomics in 2015.
Nicholas established his own research group in 2018 as a Sir Henry Dale Fellow at the UCL-CRUK Lung Cancer Centre of Excellence. His research interests centre on using bioinformatics to dissect cancer evolution. His team explores the evolutionary history of cancers by sequencing multiple regions of individual tumours. In particular, Dr McGranahan's research has focused on understanding the importance of genome doubling in tumour evolution, exploring the mutational processes shaping cancer genomes over space and time, and investigating the interface between the cancer genome and the immune microenvironment.
Often, scientific advances are made when the assumptions underlying existing models are found to be inadequate. I will focus on an example from evolutionary genomics, where the assumptions underlying two foundational concepts (the evolutionary "tree" and the interbreeding "population") have been undermined by the advent of whole-genome analysis. I will present a brief history of these concepts and their role in the development of biological thought, before describing how their inadequacies can begin to be addressed by a recent genealogical approach, primarily devised to enable efficient computer simulation. This approach, implemented in our "tree sequence" software toolkit, promises to bring together the study of evolution on different timescales. Focussing on the structure needed to capture results from different statistical and computational models has forced us to examine what these genealogies actually represent. I will argue that our genealogies describe the basic biological processes of mitosis and meiosis, and are therefore less abstract than previous descriptions of the evolutionary process, although several improvements in our approach remain to be made.
Dr. Yan Wong, University of Oxford
Yan is an evolutionary geneticist with an interest in a wide range of biological problems. After a DPhil in Plant Sciences at Oxford, he collaborated with Richard Dawkins on The Ancestor's Tale, a comprehensive history of life in reverse time. This was followed by a period as a lecturer at the University of Leeds, then as a TV and radio presenter, most notably on the BBC One show Bang Goes The Theory.
Yan currently works at Oxford University's Big Data Institute.
The ability to measure gene expression levels for individual cells (rather than pools of cells) is crucial to addressing many important biological questions, such as the study of stem cell differentiation, the detection of rare mutations in cancer, or the discovery of cellular subtypes in the brain. Single-cell transcriptome sequencing (RNA-Seq) allows the high-throughput measurement of gene expression levels for entire genomes at the resolution of single cells. RNA-Seq studies provide a great example of the range of questions one encounters in a Data Science workflow, where the data are complex in a variety of ways, there are multiple analysis steps, and drawing on rigorous statistical principles and methods is essential to derive reliable and interpretable biological results. In this talk, I will provide a survey of statistical questions related to the analysis of single-cell RNA-Seq data to investigate the differentiation of stem cells in the brain, including exploratory data analysis, dimensionality reduction, normalization, expression quantitation, cluster analysis, and the inference of cellular lineages.
Professor Sandrine Dudoit, University of California, Berkeley
Professor Sandrine Dudoit is Associate Dean for the Faculty in the Division of Computing, Data Science, and Society, Professor in the Department of Statistics, and Professor in the Division of Biostatistics, School of Public Health, at the University of California, Berkeley. Professor Dudoit's methodological research interests regard high-dimensional statistical learning and include exploratory data analysis (EDA), visualization, loss-based estimation with cross-validation (e.g., density estimation, classification, regression, model selection), and multiple hypothesis testing.
Much of her methodological work is motivated by statistical questions arising in biological research and, in particular, the design and analysis of high-throughput sequencing studies, e.g., single-cell transcriptome sequencing (RNA-Seq) for discovering novel cell types and for the study of stem cell differentiation. Her contributions include: exploratory data analysis, normalization and expression quantitation, differential expression analysis, class discovery and prediction, inference of cell lineages, and the integration of biological annotation metadata (e.g., Gene Ontology (GO) annotation). She is also interested in statistical computing and, in particular, computationally reproducible research. She is a founding core developer of the Bioconductor Project (http://www.bioconductor.org), an open-source and open-development software project for the analysis of biomedical and genomic data.
The vast majority of microbes are harmless to us, and many play essential roles in health. Others are pathogens and exert a spectrum of deleterious effects on their hosts. Until recently, infectious diseases historically represented the most common cause of death in humans, far exceeding the toll taken by wars or famines. From the dawn of humanity and throughout history, infectious diseases have shaped human evolution, demography, migrations and history. In this talk, I will discuss some aspects of our long-standing relationship with microbial pathogens and how genomics can unravel that complicated history, applying genomics to the past, present and future. For the past, I will focus on the enteric pathogen Salmonella enterica and genomes recovered from long-deceased hosts. For the present, I will talk about Salmonella enterica ser. Agona from European badgers. For the future, I will reflect on our experience of tracking the culprit of the COVID-19 pandemic, SARS-CoV-2, and the steps required to combat future pandemics.
Dr. Nabil-Fareed Alikhan, Quadram Institute
Nabil-Fareed is currently a Bioinformatics Scientific Programmer with Andrew Page at the Quadram Institute Bioscience. He was previously a Senior Research Fellow with Mark Achtman at the University of Warwick. He completed his PhD in 2015, under the supervision of Scott Beatson at the University of Queensland. He graduated from UQ with a Bachelor of Information Technology and a Bachelor of Science in 2008.
Nabil-Fareed's recent contributions include analysis and data curation at the Quadram Institute as part of the COG-UK Consortium's national effort to track COVID-19 (SARS-CoV-2) through genome sequencing. He also develops bioinformatics infrastructure within the Quadram Institute, moving towards open web platforms and cloud computing, and contributes to the infrastructure working group of the Public Health Alliance for Genomic Epidemiology (PHA4GE). His research interests include the population genetics and pathogenesis of enteric pathogens, including Salmonella and E. coli.
In recent years, we have seen an information revolution in cancer research. Even so, the clinical application of new insights gained through cancer genome sequencing remains rather limited. Genomic studies continue to generate enormous amounts of data, but without sufficient means of translating this knowledge of cancer genetics into effective ways of treating cancer patients. In this talk, I will briefly describe what can be learned from whole cancer genome profiling and how national public genomics programmes can help us move from genome to clinic. The widespread organisation of genomic and clinical data allows machine learning methods to be applied that could underpin clinical algorithms to help clinicians and genomicists analyse and interpret individual cancer genomes. During my talk, I will draw on research from my own group on testicular cancer, conducted as part of the 100,000 Genomes Project in England in partnership with the NHS. I will also discuss how we can advance the infrastructure and educational frameworks in Ireland to support a national public genomics programme and to prepare clinical genomicists for a future in which whole genome sequencing is routine practice. Finally, I will touch on the involvement of participants in this research, how personalised medicine can be offered to people with cancer, the need for diversity in genomic knowledge, and the integration of new technologies such as long-read sequencing.
Dr. Máire Ní Leathlobhair, Trinity College Dublin
Máire Ní Leathlobhair originally graduated from Trinity College Dublin with a BA in Mathematics. She completed her PhD in Biological Sciences under the supervision of Elizabeth Murchison at the University of Cambridge in 2018. Following three years as a Junior Research Fellow at the University of Oxford, Máire recently started her research group at the School of Genetics and Microbiology in Trinity College Dublin where she is HCI Assistant Professor in Biological Data Analytics. Currently, her main research interests lie in investigating the development of rare cancers.