EGI, a European landmark infrastructure for scientific computing, celebrated its 20th anniversary in 2023, marking two decades of scientific progress. This international collaboration has transformed how data-intensive science is funded and operated, uniting hundreds of research data centres into the world’s largest federation.
2023 marked the 20th anniversary of EGI operations, which serve international research communities and bind together the hundreds of data centres participating in the EGI Federation. Over those two decades, the Federation has established itself as a European landmark infrastructure for scientific computing and has transformed the way infrastructures for data-intensive science are funded and operated. Today's EGI infrastructure is the largest federation of research data centres in the world: a hyper-scale facility, made possible by national and European investments, that spans multiple regions of the world, serves more than 95,000 researchers from all scientific disciplines, and supports a research output of around 1,000 open access publications each year.
Some Background
EGI emerged in the 2000s as a response to the formidable data processing challenges posed by a visionary initiative within the High-Energy Physics (HEP) community. Recognising the pivotal role that research communities across diverse scientific domains play in driving innovation, EGI embarked on a journey that initially seemed impossible.
In its early stages, the endeavour met with scepticism. However, drawing inspiration from the collaborative ethos inherent in distributed computing within HEP, EGI forged partnerships with entities like GEANT, fostering trust and enabling data integration regardless of where in the world the data was located. This required the establishment of a secure infrastructure dedicated to data processing.
The DataGrid, DataTag, and EGEE series of projects laid the foundation for building that infrastructure, supported by the commitment and investment of the European Commission. This pivotal support facilitated significant advances within HEP and catalysed the expansion of EGI's reach to a spectrum of research communities, from structural biology to astronomy, astrophysics, and computational chemistry. Since the first accounting data became available in 2004, EGI has evolved into a dynamic federation. Today, it provides seamless cross-border access to High Throughput Computing (HTC), High-Performance Computing (HPC), and Cloud Compute resources hosted by leading research centres worldwide, embodying the spirit of collaboration and innovation that drives scientific progress.
With EGI-InSPIRE in 2014, EGI sparked an experimental breakthrough with the inception of the Cloud Federation, a pioneering effort that expanded the resource pool of cloud providers. This milestone was made possible through extensive software development carried out collaboratively with various communities, leveraging the OpenStack platform. While the Cloud Federation was a novel concept, the successful implementation of the HTC Federation had already validated its feasibility. This solid foundation paved the way for the integration of cloud resources into EGI's ecosystem and marked a new era of technological collaboration with industry and software development communities. At its core, the EGI Federation remains firmly embedded in science. Hundreds of research data centres participate, offering scientists integrated solutions for data processing, analytics, research data and computing facilities.
The success of the Cloud Federation has been profound, yielding tangible benefits across a spectrum of scientific disciplines. From addressing urgent demands in environmental sciences to facilitating breakthroughs in Humanities and Social Sciences (HSS) and structural biology, the cloud has emerged as a transformative tool for research. Notably, the journey embarked upon by the HEP community served as a pioneering pathfinder, inspiring and guiding the adoption of cloud technologies within EGI and beyond.
The Federation’s endeavour doesn’t stop here: the next step is to integrate HPC at scale to support hybrid processing workflows across the compute continuum, and to offer an integrated environment providing Cloud, HTC and HPC to all scientific communities in Europe.
What’s next?
Several challenges remain to be solved in the next decade, starting with increasing data productivity and valorising scientific data for reuse. Infrastructures are not yet designed to support data exploitation, which risks creating silos, especially for big data. As an e-infrastructure, EGI has a role in mitigating these challenges. It also has an opportunity to co-provide infrastructure at scale to both European and non-European research infrastructures, delivering data, software, and applications to scale up research. For these reasons, EGI plans to channel existing knowledge and investments to make them useful and usable across the full scientific spectrum.
The EU is working towards a coordinated and federated scenario that enables scalable access to data. This involves integrating federated functions and research projects, and also the ability to access data that is not necessarily produced by your own organisation or community. The vision is to create an international data commons where new research data can stem from other uses or projects, as well as from secondary data. EGI is becoming increasingly able to provide a distributed platform for high-throughput computing and data, as well as scientific applications, offering them as interconnected data and computing facilities. EGI works collaboratively with e-infrastructures and RIs to realise this data commons vision, with assets coming from different stakeholders in a coordinated manner. However, one challenge to this vision is that national infrastructures are not open by default, which is a barrier to data reuse. EGI is advocating for the opening of national ICT and IT infrastructures so that secondary data can be reused. Collaboration between different stakeholders is crucial, as science has no borders.
We are grateful to research communities, EGI Federation members, European research ministries, DG-CONNECT of the EU Commission, and the European Open Science Cloud initiative for making this possible.
Through its projects, EGI is committed to fostering:
Support to Data Spaces
EUCAIM will establish and deploy a pan-European digital federated infrastructure of FAIR pan-cancer anonymized images
EUreka3D aims to build the capacity of small cultural heritage institutions in digital transformation, particularly
The GREAT project, funded by the Digital Europe program, aims to establish the Green Deal