EUROPEAN EDTECH POLICY MAP
4.1. Sustainable funding for spaces and structures for testing, trialling, co-creation
4.1.5 Support real-world, diverse datasets for EdTech
Summary of suggested actions
Promote the creation, sharing, and responsible use of diverse, high-quality educational datasets that reflect the social, linguistic, and cultural diversity of Europe, ensuring that EdTech tools and AI systems are developed, tested, and evaluated under conditions representative of real-world learning environments.
Description
Robust and diverse datasets are essential for building trustworthy, effective, and equitable EdTech tools and services. At present, many education technologies, particularly those employing artificial intelligence, are trained or validated on datasets that lack representativeness, contextual diversity, or educational specificity. This leads to systems that perform unevenly across languages, socio-economic groups, and pedagogical settings, reinforcing inequities and undermining trust.
To address these challenges, Europe must invest in the development of real-world, privacy-compliant, and ethically sourced datasets that can be used for educational research, EdTech product evaluation, and algorithmic training. Such datasets should be built on principles of fairness, inclusivity, and transparency, and accompanied by strong governance mechanisms to ensure lawful use and protection of learners’ data.
Creating these datasets will require close cooperation between ministries of education, research organisations, EdTech companies, and data protection authorities. These actors should prioritise synthetic or anonymised data generation where appropriate, and establish mechanisms for secure sharing under the EU Data Governance Act and forthcoming Common European Data Spaces.
By supporting real-world, diverse datasets, Europe can strengthen the evidence base for EdTech, improve model performance, and ensure that future AI systems in education align with European values and human rights standards.
Major enabling factors
- The EU Data Governance Act (2022) and the European Data Act (2023) provide legal foundations for secure and interoperable data-sharing across sectors, including education.
- The Common European Data Spaces framework and AI-on-Demand Platform offer infrastructure for controlled access to datasets under European standards of privacy and sovereignty.
- Policymakers, educators, and researchers increasingly recognise that diverse datasets are essential for reducing algorithmic bias and ensuring fairness in AI systems.
- Universities, research institutes, and EdTech evaluation hubs already manage large datasets and can contribute expertise in ethical data management and anonymisation.
- Techniques such as federated learning, synthetic data generation, and differential privacy enable the creation of usable datasets without compromising personal data.
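As an illustration of the last point, the following minimal Python sketch shows one privacy-preserving technique, the Laplace mechanism from differential privacy, applied to a simple counting query. The function names, the sample records, and the epsilon value are illustrative assumptions and are not drawn from any specific European dataset or platform.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of the records that match the predicate.

    A counting query has sensitivity 1: adding or removing one learner
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical learner records: (home language, passed the assessment).
records = [("fi", True), ("de", False), ("fi", True), ("fr", True)]
noisy = dp_count(records, lambda r: r[0] == "fi" and r[1], epsilon=0.5)
print(f"Noisy count of Finnish-speaking learners who passed: {noisy:.2f}")
```

Smaller epsilon values add more noise and give stronger privacy. Choosing an acceptable epsilon for children's data would be a governance decision as much as a technical one.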
Major roadblocks
- Unlike health or finance, education lacks a structured system for managing and sharing datasets securely across Member States.
- Stringent GDPR requirements, particularly concerning children’s data, make it difficult to collect and share educational data for innovation purposes.
- Schools and EdTech companies often regard data as a proprietary asset rather than a public good, reducing willingness to share or pool information.
- Data collection methods vary widely across countries and systems, impeding cross-border comparison and aggregation.
- Data collection, cleaning, and maintenance are resource-intensive and often excluded from project budgets.
Suggested action: Network of European testing and evaluation environments
WHO (Potential actors)
- European Commission (DG Connect, DG EAC), in collaboration with national ministries of education, statistical agencies, and research institutions.
- Operational partners could include the AI-on-Demand Platform, European Data Innovation Board, and national education research councils.
WHAT (Goal of suggested activities)
Establish secure, ethically governed, and representative educational data infrastructures that enable the responsible use of real-world datasets for EdTech research, testing, and innovation.
HOW (Suggested activities)
- Develop a Common European Data Space for Education under the EU Data Strategy, ensuring data accessibility for research and innovation in compliance with GDPR and the AI Act.
- Fund national and regional data collection projects focused on synthetic or anonymised education datasets reflecting Europe’s linguistic and cultural diversity.
- Create data stewardship roles within ministries and research institutions to manage ethical and technical aspects of dataset curation.
- Support federated learning frameworks allowing EdTech developers to train AI models on distributed data without moving sensitive information.
- Develop European quality standards for educational datasets, ensuring documentation of provenance, representativeness, and intended use.
- Encourage cross-sector collaboration through data trusts or data cooperatives to balance innovation and privacy protection.
- Provide capacity-building and guidance for educators and administrators on data literacy, privacy, and ethical data use.
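The federated learning activity listed above can be made concrete with a small sketch of federated averaging, in which each site fits a model on its own data and shares only model parameters. This is a toy illustration under stated assumptions: the one-feature linear model, the two hypothetical school datasets, and all function names and hyperparameters are invented for the example, and production frameworks add secure aggregation and other safeguards not shown here.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature x, target y) pairs held by one site


def local_update(w: float, b: float, data: Dataset,
                 lr: float, epochs: int) -> Tuple[float, float]:
    """Run a few gradient-descent steps on one site's data; raw data stays local."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(sites: List[Dataset], rounds: int = 200,
                      lr: float = 0.05, epochs: int = 5) -> Tuple[float, float]:
    """Average locally trained parameters, weighted by each site's sample count."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        # Each site starts from the shared model and returns only (w, b).
        updates = [(local_update(w, b, data, lr, epochs), len(data)) for data in sites]
        w = sum(params[0] * n for params, n in updates) / total
        b = sum(params[1] * n for params, n in updates) / total
    return w, b


# Two hypothetical schools whose data both follow y = 2x + 1.
school_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
school_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]
w, b = federated_average([school_a, school_b])
print(f"Shared model: y = {w:.3f} * x + {b:.3f}")
```

Only the two parameters leave each school in this sketch. Parameters can still leak information about the underlying data, which is one reason federated learning is often combined with differential privacy.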
Existing steps in the right direction
EdTech Hub (Global Research Consortium)
The EdTech Hub is a global, multi-partner research and innovation consortium founded in 2019 and funded by the UK’s Foreign, Commonwealth & Development Office (FCDO), the Bill & Melinda Gates Foundation, and others. It aims to improve learning outcomes by providing rigorous, actionable evidence on the use of technology in education, particularly in low- and middle-income countries. The Hub works across Africa, Asia, and the Middle East, collecting and analysing diverse datasets on EdTech use, learning impact, and contextual effectiveness. It maintains an open-access repository of studies, data visualisations, and evaluation tools to support evidence-based decision-making.
The Hub demonstrates how real-world and diverse datasets can inform EdTech design, procurement, and policy development. Its research spans multiple educational, linguistic, and socio-economic contexts, providing insights into how tools perform under varying conditions. By making its datasets and findings openly available, the Hub contributes to a global evidence base that strengthens trust and transparency in educational technology.
Specific support required to achieve the Goal:
- Integrate EdTech Hub methodologies into European data initiatives (e.g., the Common European Data Space for Education) to ensure that datasets include diverse educational settings and user groups.
- Support collaboration between European and global research consortia, allowing mutual learning on ethical data use, open access, and diversity standards.
- Co-fund European participation in international data collaborations under Horizon Europe or Digital Europe, focusing on dataset interoperability and global comparability of EdTech evidence.
Example: European Language Data Space (ELDS)
The European Language Data Space provides a trusted infrastructure for sharing language resources, models, and data within the EU. It promotes multilingualism and inclusion by supporting the creation and exchange of datasets across all European languages. Many EdTech tools rely on natural language processing, making language data a crucial component of educational AI. Integrating education-specific corpora into the ELDS would strengthen linguistic diversity and model performance for learners across Europe.
Specific support required to achieve the Goal:
- Link any education data space with the ELDS to ensure that educational AI systems reflect Europe’s multilingual realities. Fund the collection of child-appropriate, classroom-related linguistic data with ethical safeguards and cross-sector collaboration.
- Replicate the ELDS infrastructure model to explore the sharing of education training data for EdTech systems.
Global EdTech Testbed Network
The Global EdTech Testbed Network (GETN) is an international collaboration launched in 2022. The initiative connects testbeds, living labs, and EdTech evidence hubs across multiple regions to improve knowledge exchange and standardisation in educational technology evaluation. It seeks to harmonise testing methodologies, share evidence on product impact, and enable EdTech companies and researchers to collaborate across borders.
By enabling cross-border evidence-sharing, GETN directly addresses two of the sector’s major challenges: fragmentation of evaluation standards and the limited scalability of national testbeds.
Specific support required to achieve the Goal:
Replicate this model through regional testing hubs co-funded by national governments and the European Commission. Encourage integration with universities and teacher-training institutions, and establish cross-border cooperation mechanisms (e.g. joint pilot programmes, data-sharing agreements).
Gaia-X – Federated European Data Infrastructure
Gaia-X is a European initiative launched in 2020 to create a secure, federated data infrastructure that ensures data sovereignty, interoperability, and trust across sectors. The project is coordinated by the Gaia-X European Association for Data and Cloud (AISBL), headquartered in Brussels, and supported by numerous Member States. Its architecture enables organisations to share and access data within federated “data spaces” that follow common European standards on security, transparency, and interoperability.
While most Gaia-X verticals currently focus on industry, health, mobility, and finance, the governance and technical models are transferable to the education sector. Several national hubs (including Germany, France, and Finland) have expressed interest in piloting education-related use cases, for example in credential management, student data portability, and digital learning analytics.
Gaia-X provides the architectural and governance framework needed to support a federated Education Data Space where schools, ministries, and EdTech providers can share anonymised or synthetic datasets securely and in compliance with GDPR and the Data Governance Act.
By aligning education with Gaia-X principles (federation rather than centralisation, transparency of data flows, and user control), Europe could ensure that real-world educational datasets are managed under a trustworthy and sovereign European framework.
Specific support required to achieve the Goal:
- Develop an education data space within Gaia-X, building on its federation and interoperability standards to connect national education data portals (e.g. Finland’s Vipunen or Germany’s Bildungsmonitoring).
- Engage national Gaia-X hubs and education ministries to co-create governance rules specific to educational and learner data.
- Provide EU co-funding under the Digital Europe Programme for pilot projects linking EdTech testbeds and AI evaluation hubs to Gaia-X federations.
- Encourage participation of EdTech SMEs through simplified onboarding and sandbox participation, ensuring equitable access to the federated infrastructure.
- Integrate ethical and pedagogical evaluation layers into Gaia-X data governance so that educational data use aligns with European human-centric AI and digital-rights principles.
