
Knowledge Graphs Explained: RDF, SPARQL & the Semantic Web

Knowledge graphs explained for EU data scientists: RDF triples, SPARQL, OWL ontologies and the Semantic Web, with data.europa.eu linked open data examples.

Prep4EU Insight: The EU's official open-data portal, data.europa.eu, publishes metadata for more than 1.6 million datasets as linked open data, described with DCAT-AP, and exposes a public SPARQL endpoint, making it one of the largest queryable knowledge graphs in the public sector.

What it is

Knowledge graphs are structured representations of real-world entities and the relationships between them, stored so that the connections themselves are first-class data rather than an afterthought. Where a spreadsheet records rows and a relational database records tables joined by foreign keys, a knowledge graph records facts as a web of nodes and edges: people, places, organisations, laws and datasets, plus the explicit links that bind them. This makes knowledge graphs well suited to heterogeneous, highly connected and evolving information, which is precisely the kind of data the European Union must exchange across 27 Member States and dozens of institutions.
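To make the contrast with foreign keys concrete, here is a minimal sketch that turns two relational rows into triples. It only loosely imitates the idea behind the W3C Direct Mapping (each row becomes a subject, each foreign key becomes an edge); the table names, columns and the `http://example.org/` base are hypothetical, and literals are kept as plain strings for brevity.

```python
# Hypothetical sketch: relational rows -> (subject, predicate, object) triples.
# Loosely inspired by the W3C Direct Mapping idea, not a faithful implementation.

BASE = "http://example.org/"  # hypothetical base IRI

# Two hypothetical tables; datasets.publisher_id is a foreign key to organisations.id.
organisations = [{"id": "eurostat", "name": "Eurostat"}]
datasets = [{"id": "eurostat-gdp", "title": "GDP per region", "publisher_id": "eurostat"}]


def row_to_triples(table, row, key, foreign_keys=None):
    """Turn one row into triples.

    Plain columns become literal-valued properties; columns listed in
    foreign_keys become links to the IRI of the referenced row.
    """
    foreign_keys = foreign_keys or {}
    subject = f"{BASE}{table}/{row[key]}"
    triples = []
    for column, value in row.items():
        if column == key:
            continue  # the primary key is already encoded in the subject IRI
        predicate = f"{BASE}{table}#{column}"
        if column in foreign_keys:
            # The foreign key stops being a join condition and becomes an explicit edge.
            target_table = foreign_keys[column]
            triples.append((subject, predicate, f"{BASE}{target_table}/{value}"))
        else:
            triples.append((subject, predicate, value))
    return triples


graph = []
for org in organisations:
    graph += row_to_triples("org", org, "id")
for ds in datasets:
    graph += row_to_triples("dataset", ds, "id", {"publisher_id": "org"})

for triple in graph:
    print(triple)
```

The design point is that the link from dataset to publisher is now stored as data in its own right, so it can be followed in either direction without knowing the original table layout.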

The intellectual foundation for knowledge graphs at web scale is the Semantic Web, a vision championed by the World Wide Web Consortium (W3C) in which data, not just documents, is published in a machine-readable, linkable form. The Semantic Web rests on a small, coherent stack of open standards. At its base sits the Resource Description Framework (RDF), a data model that expresses every fact as a triple. Above RDF, RDFS (RDF Schema) adds a lightweight schema vocabulary (classes, properties and subclass hierarchies), while OWL (Web Ontology Language) adds much richer logical constructs, such as class equivalence, disjointness and cardinality restrictions, for formal ontologies and automated reasoning. SKOS (Simple Knowledge Organization System) models controlled vocabularies, taxonomies and thesauri. Every resource is identified by a URI (or its internationalised form, the IRI), so that data published anywhere on the web can be linked to data published anywhere else. To query all of this, the stack provides SPARQL, the W3C query language for RDF, executed against specialised databases called triplestores (such as Apache Jena TDB/Fuseki, GraphDB, Stardog and Virtuoso).
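The payoff of global identifiers can be shown in a few lines. In this sketch, two independent publishers describe the same organisation using the same (hypothetical) IRI, so merging their data is a plain set union and the facts connect without any mapping step. The `dct:`, `foaf:` and `rdf:type` namespaces are the real vocabulary IRIs; everything under `http://example.org/` is an illustrative assumption. Strictly, RDF graph merging also requires renaming blank nodes, which this example avoids by using only IRIs and literals.

```python
# Hypothetical sketch: merging two RDF-style graphs that share an IRI.
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

EUROSTAT = "http://example.org/org/eurostat"      # hypothetical shared IRI
GDP = "http://example.org/dataset/eurostat-gdp"   # hypothetical dataset IRI

# Published by a data portal: it knows which datasets Eurostat publishes.
portal_graph = {
    (GDP, DCT + "title", "GDP per region"),
    (GDP, DCT + "publisher", EUROSTAT),
}

# Published by an organisations registry: it knows facts about Eurostat itself.
registry_graph = {
    (EUROSTAT, RDF_TYPE, FOAF + "Organization"),
    (EUROSTAT, FOAF + "name", "Eurostat"),
}

# A graph is a set of triples, so (without blank nodes) merging is set union.
merged = portal_graph | registry_graph

# Follow two hops across the two sources: dataset -> publisher -> publisher's name.
publisher = next(o for s, p, o in merged if s == GDP and p == DCT + "publisher")
name = next(o for s, p, o in merged if s == publisher and p == FOAF + "name")
print(f"{GDP} is published by {name}")
```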

How it works in practice

The atomic unit of a knowledge graph is the RDF triple: a statement of the form subject – predicate – object. The subject is the thing being described, the predicate is the relationship or property, and the object is the value or the related thing. Because each element can be a URI, triples chain together: the object of one triple becomes the subject of another, and a graph emerges. The table below shows how a few plain-language facts about an EU dataset, plus one about a person, translate into triples. The prefixes abbreviate full vocabulary URIs: dct: is Dublin Core Terms, dcat: is the W3C Data Catalog vocabulary and schema: is Schema.org.

Subject                  Predicate             Object
dataset/eurostat-gdp     dct:title             "GDP per region"
dataset/eurostat-gdp     dct:publisher         org/eurostat
dataset/eurostat-gdp     dcat:theme            theme/ECON
person/123               schema:nationality    authority/country/ITA
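The short names in the table are shorthand. The sketch below expands them to full IRIs and prints each triple in N-Triples, the simplest RDF serialisation. The `dct`, `dcat` and `schema` namespaces are the published ones; mapping the theme and country to the EU Vocabularies authority-table IRIs under `http://publications.europa.eu/resource/authority/` is our reading of the abbreviated table, and the `ex:` base for the dataset, organisation and person is hypothetical. Literal escaping is deliberately omitted.

```python
# Hypothetical sketch: expanding prefixed names and printing N-Triples.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "schema": "https://schema.org/",
    "auth": "http://publications.europa.eu/resource/authority/",  # EU authority tables
    "ex": "http://example.org/",  # hypothetical base for the other resources
}


def expand(curie):
    """Expand 'prefix:local' into a full IRI using the PREFIXES table."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


class Literal(str):
    """Marks a plain string literal, so it is written in quotes rather than as an IRI."""


triples = [
    ("ex:dataset/eurostat-gdp", "dct:title", Literal("GDP per region")),
    ("ex:dataset/eurostat-gdp", "dct:publisher", "ex:org/eurostat"),
    ("ex:dataset/eurostat-gdp", "dcat:theme", "auth:data-theme/ECON"),
    ("ex:person/123", "schema:nationality", "auth:country/ITA"),
]


def to_ntriples(s, p, o):
    """One N-Triples line: IRIs in <...>, literals in quotes (no escaping: sketch only)."""
    obj = f'"{o}"' if isinstance(o, Literal) else f"<{expand(o)}>"
    return f"<{expand(s)}> <{expand(p)}> {obj} ."


for t in triples:
    print(to_ntriples(*t))
```

Because `auth:country/ITA` expands to the same IRI wherever it is used, any other dataset that cites that IRI is talking about the same Italy.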

This is the essence of linked data: facts are described with shared URIs, so that "the publisher Eurostat" or "the country Italy" means the same thing in every system that references it. To retrieve information, you write a SPARQL query that describes a pattern of triples to match. The query PREFIX schema: <https://schema.org/> SELECT ?name WHERE { ?person a schema:Person . ?person schema:name ?name . } asks the triplestore to return the name of every resource typed as a person (the keyword a is shorthand for rdf:type). Relational systems join tables through declared foreign keys; in SPARQL, a join is simply a variable such as ?person shared between triple patterns, and property paths can follow chains of links of any length. That is why multi-hop questions ("which datasets about mobility were published by bodies located in Member States that joined after 2004?") are natural to express.
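To see why a shared variable acts as a join, here is a toy evaluator for a SPARQL basic graph pattern in pure Python. It is a minimal nested-loop sketch, not how production triplestores are implemented (they use indexes and query planning). The data, the `ex:` names and the choice to keep prefixed names as plain strings are all illustrative assumptions.

```python
# Hypothetical sketch: evaluating a SPARQL basic graph pattern by nested loops.
graph = {
    ("ex:alice", "a", "schema:Person"),
    ("ex:alice", "schema:name", "Alice"),
    ("ex:bob", "a", "schema:Person"),
    ("ex:bob", "schema:name", "Bob"),
    ("ex:eurostat", "schema:name", "Eurostat"),      # has a name, but is not a Person
    ("ex:gdp", "dct:publisher", "ex:eurostat"),
    ("ex:eurostat", "schema:location", "ex:LU"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on a conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def evaluate(patterns, graph):
    """Match patterns one by one; bindings survive only if shared variables agree."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                extended = match(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# SELECT ?name WHERE { ?person a schema:Person . ?person schema:name ?name . }
query = [("?person", "a", "schema:Person"), ("?person", "schema:name", "?name")]
names = sorted(b["?name"] for b in evaluate(query, graph))
print(names)  # ['Alice', 'Bob']

# Two hops: which datasets are published by a body, and where is that body located?
two_hop = [("?ds", "dct:publisher", "?org"), ("?org", "schema:location", "?country")]
hops = [(b["?ds"], b["?country"]) for b in evaluate(two_hop, graph)]
print(hops)  # [('ex:gdp', 'ex:LU')]
```

Eurostat is excluded from the first result because it fails the `?person a schema:Person` pattern, even though it has a `schema:name`.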

Meaning is supplied by ontologies and vocabularies: formal models of the classes and properties a domain uses. A simple hierarchy is a taxonomy; adding broader, narrower and related links produces a thesaurus expressed in SKOS; adding formal classes, properties and logical rules produces an ontology in OWL. The EU operates at every rung of this ladder of expressivity, and at scale. The Publications Office, through its EU Vocabularies service, publishes the multilingual EuroVoc thesaurus and authority tables (for countries, regions and economic activity) as SKOS/RDF. The official open-data portal data.europa.eu describes its datasets with DCAT-AP, the European application profile of the W3C DCAT vocabulary, and exposes them through SPARQL endpoints and APIs. EUR-Lex metadata on EU law is published as RDF through the Publications Office's Cellar repository, using ELI and CELEX identifiers. Together these realise the semantic layer of the European Interoperability Framework (EIF), the layer that ensures the meaning of exchanged data is understood identically by every party, not merely that the bytes arrive. Semantic interoperability, as EU practice repeatedly stresses, is the harder problem: technical connectivity alone never guarantees shared meaning.
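A thesaurus becomes useful once you can walk its hierarchy. The sketch below models a tiny SKOS-style fragment with multilingual labels and follows `skos:broader` links transitively, which is what `skos:broaderTransitive` entails. The property names follow the SKOS vocabulary; the concepts and labels are hypothetical and are not real EuroVoc entries.

```python
# Hypothetical sketch: a SKOS-style thesaurus fragment and a transitive walk.
SKOS = "http://www.w3.org/2004/02/skos/core#"

triples = {
    ("ex:transport", SKOS + "prefLabel", ("transport", "en")),
    ("ex:transport", SKOS + "prefLabel", ("Verkehr", "de")),
    ("ex:road-transport", SKOS + "broader", "ex:transport"),
    ("ex:road-transport", SKOS + "prefLabel", ("road transport", "en")),
    ("ex:cycling", SKOS + "broader", "ex:road-transport"),
    ("ex:cycling", SKOS + "prefLabel", ("cycling", "en")),
}


def broader_transitive(concept):
    """Every ancestor reachable through skos:broader, nearest first for a single chain."""
    ancestors, frontier = [], [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SKOS + "broader" and o not in ancestors:
                ancestors.append(o)
                frontier.append(o)
    return ancestors


def label(concept, lang):
    """The skos:prefLabel of a concept in the requested language, or None."""
    for s, p, o in triples:
        if s == concept and p == SKOS + "prefLabel" and o[1] == lang:
            return o[0]
    return None


print([label(c, "en") for c in broader_transitive("ex:cycling")])  # ['road transport', 'transport']
print(label("ex:transport", "de"))  # Verkehr
```

In practice this is how a search for "transport" can be widened to datasets tagged with narrower concepts such as "cycling", in any language the thesaurus labels.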


Why it matters for EU data scientists

For a data scientist working inside the EU institutions, knowledge graphs are not a niche academic topic; they are the connective tissue of cross-border digital public services. The Interoperable Europe Act has made interoperability assessments mandatory for new and modified cross-border services, and every assessment touches the semantic layer where RDF, SKOS, OWL and DCAT-AP live. When Member State A says "citizen" and Member State B says "resident", it is shared vocabularies and ontologies, not a faster API, that let their records align. Reasoning over OWL ontologies lets harmonised queries succeed even when source systems describe a concept differently, bridging vocabularies through equivalence and subclass rules without rewriting any source system. Mastery here translates directly into the ability to publish discoverable open data, build common European data spaces and make institutional datasets genuinely interoperable.
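The bridging described above can be sketched with a few forward-chaining rules. This toy reasoner applies only three entailments (instances inherit superclasses, `rdfs:subClassOf` is transitive, and `owl:equivalentClass` implies subclassing in both directions) until nothing new is derived. It is a small subset of RDFS/OWL semantics, not a full OWL reasoner, and every class and record name is hypothetical.

```python
# Hypothetical sketch: forward chaining over a few RDFS/OWL rules to a fixpoint.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIV = "owl:equivalentClass"

facts = {
    # Hypothetical alignments between national vocabularies and a shared model.
    ("a:Citizen", SUBCLASS, "eu:Person"),
    ("b:Resident", SUBCLASS, "eu:Person"),
    ("b:NaturalPerson", EQUIV, "eu:Person"),
    # Hypothetical records, each described only in its own national vocabulary.
    ("a:record1", RDF_TYPE, "a:Citizen"),
    ("b:record2", RDF_TYPE, "b:Resident"),
    ("b:record3", RDF_TYPE, "b:NaturalPerson"),
}


def infer(facts):
    """Apply the rules repeatedly until no new triple appears."""
    closure = set(facts)
    while True:
        new = set()
        for s, p, o in closure:
            if p == EQUIV:
                # Equivalent classes are subclasses of each other.
                new.add((s, SUBCLASS, o))
                new.add((o, SUBCLASS, s))
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for c, d in subclass_pairs:
            for s, p, o in closure:
                if p == RDF_TYPE and o == c:
                    new.add((s, RDF_TYPE, d))   # instances of C are instances of D
                if p == SUBCLASS and o == c:
                    new.add((s, SUBCLASS, d))   # subClassOf is transitive
        if new <= closure:
            return closure
        closure |= new


closure = infer(facts)
people = sorted(s for s, p, o in closure if p == RDF_TYPE and o == "eu:Person")
print(people)  # ['a:record1', 'b:record2', 'b:record3']
```

A single query for `eu:Person` now returns records from both Member States, even though neither source system was changed.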

This is also core examinable content for the AD7 Data Science competition (EPSO/AD/429/26), where Field-4's data architecture and semantic technologies duty area tests precisely these distinctions: RDF as data versus RDFS/OWL as schema, SPARQL versus a reasoner, triplestore versus property graph, taxonomy versus ontology. If you are preparing for the competition or simply want to work fluently with EU linked data, build your foundations with the Prep4EU AD7 Data Science study pack.
