Knowledge Graphs Explained: RDF, SPARQL & the Semantic Web

Prep4EU Insight: The EU's official open-data portal, data.europa.eu, publishes more than 1.6 million datasets as linked open data described with DCAT-AP metadata and exposes a public SPARQL endpoint, making it one of the largest queryable knowledge graphs in the public sector.

What it is

Knowledge graphs are structured representations of real-world entities and the relationships between them, stored so that the connections themselves are first-class data rather than an afterthought. Where a spreadsheet records rows and a relational database records tables joined by foreign keys, a knowledge graph records facts as a web of nodes and edges: people, places, organisations, laws, datasets — and the explicit links that bind them. This makes them ideal for heterogeneous, highly connected, and evolving information, which is precisely the kind of data the European Union must exchange across 27 Member States and dozens of institutions.

The intellectual foundation for knowledge graphs at web scale is the Semantic Web, a vision championed by the World Wide Web Consortium (W3C) in which data, not just documents, is published in a machine-readable, linkable form. The Semantic Web rests on a small, coherent stack of open standards. At its base sits the Resource Description Framework (RDF), a data model that expresses every fact as a triple. Above RDF, RDFS (RDF Schema) adds a lightweight schema vocabulary (classes, properties, and subclass and subproperty hierarchies), while OWL (Web Ontology Language) adds richer logical expressivity for formal ontologies and automated reasoning. SKOS (Simple Knowledge Organization System) models controlled vocabularies, taxonomies and thesauri. Every resource is identified by a URI (or its internationalised form, the IRI), so that data published anywhere on the web can be linked to data published anywhere else. To query all of this, the stack provides SPARQL, the W3C query language for RDF, typically executed against specialised databases called triplestores (such as Apache Jena TDB, GraphDB, Stardog and Virtuoso).

How it works in practice

The atomic unit of a knowledge graph is the RDF triple: a statement of the form subject – predicate – object. The subject is the thing being described, the predicate is the relationship or property, and the object is the value or the related thing. Because each element can be a URI, triples chain together: the object of one triple becomes the subject of another, and a graph emerges. The table below shows how a few plain-language facts about an EU dataset translate into triples.

Subject	Predicate	Object
dataset/eurostat-gdp	dct:title	"GDP per region"
dataset/eurostat-gdp	dct:publisher	org/eurostat
dataset/eurostat-gdp	dcat:theme	theme/ECON
person/123	schema:nationality	authority/country/ITA
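
As a rough illustration of how the rows above chain into a graph, the triples can be held in a plain Python set of tuples. This is a minimal in-memory sketch, not a real triplestore: the helper names objects and follow are hypothetical, and the org/eurostat foaf:name triple is an invented extra fact added only to show one triple's object becoming another triple's subject.

```python
# Minimal in-memory sketch of the triples from the table (not a real triplestore).
triples = {
    ("dataset/eurostat-gdp", "dct:title", "GDP per region"),
    ("dataset/eurostat-gdp", "dct:publisher", "org/eurostat"),
    ("dataset/eurostat-gdp", "dcat:theme", "theme/ECON"),
    ("person/123", "schema:nationality", "authority/country/ITA"),
    # Hypothetical extra fact, added only to demonstrate chaining.
    ("org/eurostat", "foaf:name", "Eurostat"),
}


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def follow(start, *predicates):
    """Follow a chain of predicates from a start node, one hop per predicate."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return sorted(frontier)


# Two hops: dataset -> publisher -> publisher's name.
publisher_names = follow("dataset/eurostat-gdp", "dct:publisher", "foaf:name")
print(publisher_names)
```

The second hop works only because org/eurostat appears as an object in one triple and as a subject in another, which is the chaining described above.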

This is the essence of linked data: facts described with shared URIs so that "the publisher Eurostat" or "the country Italy" means the same thing in every system that references it. To retrieve information, you write a SPARQL query that describes a pattern of triples to match. With the schema: prefix declared, a query such as SELECT ?name WHERE { ?person a schema:Person . ?person schema:name ?name . } asks the triplestore to return the names of every resource typed as a person; the keyword a is SPARQL shorthand for rdf:type. Where SQL needs an explicit JOIN for every table crossed, SPARQL expresses each hop as one more triple pattern sharing a variable, which is why multi-hop questions ("which datasets about mobility were published by bodies located in Member States that joined after 2004?") become natural to express.
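
To make the idea of matching a pattern of triples concrete, here is a toy matcher in plain Python that evaluates the two-pattern query above against a handful of invented facts. It is only a sketch of the general principle, not how any particular triplestore is implemented; the data and the function names match_pattern and select are hypothetical.

```python
# Toy basic-graph-pattern matcher; terms starting with "?" are variables.
# All data below is invented for illustration.
TRIPLES = [
    ("person/1", "rdf:type", "schema:Person"),
    ("person/1", "schema:name", "Ada"),
    ("person/2", "rdf:type", "schema:Person"),
    ("person/2", "schema:name", "Grace"),
    ("org/1", "rdf:type", "schema:Organization"),
    ("org/1", "schema:name", "Eurostat"),
]


def match_pattern(pattern, triple, binding):
    """Extend a binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns, triples):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Equivalent of: SELECT ?name WHERE { ?person a schema:Person . ?person schema:name ?name . }
query = [
    ("?person", "rdf:type", "schema:Person"),
    ("?person", "schema:name", "?name"),
]
names = sorted(b["?name"] for b in select(query, TRIPLES))
print(names)
```

The shared variable ?person is what links the two patterns: the organisation has a name too, but it never binds ?person in the first pattern, so it is not returned.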

Meaning is supplied by ontologies and vocabularies — formal models of the classes and properties a domain uses. A simple hierarchy is a taxonomy; adding broader, narrower and related links produces a thesaurus expressed in SKOS; adding formal classes, properties and logical rules produces an ontology in OWL. This ladder of expressivity is exactly what the EU operates at scale. The Publications Office, through its EU Vocabularies service, publishes the multilingual EuroVoc thesaurus and authority tables (for countries, regions and economic activity) as SKOS/RDF. The official open-data portal data.europa.eu describes its datasets with DCAT-AP, the European application profile of the W3C DCAT vocabulary, and exposes them through SPARQL endpoints and APIs. EUR-Lex publishes EU law as RDF with ELI and CELEX identifiers. Together these realise the semantic layer of the European Interoperability Framework (EIF) — the layer that ensures the meaning of exchanged data is understood identically by every party, not merely that the bytes arrive. Semantic interoperability, as EU practice repeatedly stresses, is the harder problem; technical connectivity alone never guarantees shared meaning.
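
The broader and narrower links of a SKOS thesaurus can be sketched as a small lookup structure. The concept identifiers below are invented and are not real EuroVoc entries; the function names are hypothetical. In SKOS itself, skos:broader is not declared transitive, and the walk up the hierarchy shown here corresponds to what skos:broaderTransitive is meant to capture.

```python
# Hypothetical mini-thesaurus: each concept maps to its skos:broader parents.
BROADER = {
    "concept/rail-transport": ["concept/land-transport"],
    "concept/road-transport": ["concept/land-transport"],
    "concept/land-transport": ["concept/transport"],
    "concept/transport": [],
}


def broader_transitive(concept):
    """Collect every ancestor reachable by repeatedly following broader links."""
    seen = set()
    stack = list(BROADER.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, []))
    return seen


def narrower(concept):
    """skos:narrower is the inverse of skos:broader, so derive it by lookup."""
    return sorted(child for child, parents in BROADER.items() if concept in parents)


print(sorted(broader_transitive("concept/rail-transport")))
print(narrower("concept/land-transport"))
```

A search for datasets tagged with the top concept could use such a walk to include everything tagged with its narrower concepts as well.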

Common points of confusion

RDF graph vs Labelled Property Graph. Both store data as nodes and edges, but they are different families. An RDF triplestore (queried with SPARQL) uses global URIs, making it the natural choice for linked open data, cross-organisation interoperability and reasoning. A property graph such as Neo4j (queried with Cypher) attaches key–value properties to nodes and edges and uses internal, local identifiers — excellent for fast traversal in internal analytics like fraud-network detection, but not built for web-scale linking. Publishing on data.europa.eu points to RDF; an internal network app points to a property graph.
RDFS vs OWL. RDFS is a lightweight schema layer (classes, properties, subclass hierarchies) that supports only simple entailments, such as inheriting a type through a subclass link. OWL is a full ontology language that adds logic (cardinality, equivalence, disjointness, inverse properties) and so enables a reasoner to infer a much wider range of implicit facts and to check an ontology for contradictions. If a task requires deriving a relationship beyond those simple hierarchies or detecting an inconsistency, that is OWL plus a reasoner, not RDFS and not a plain query.
A knowledge graph vs an ordinary graph database. Any graph database can store nodes and edges; a knowledge graph adds a layer of formal semantics — ontologies and vocabularies that give the entities and relationships agreed, machine-interpretable meaning. The graph database is the storage engine; the knowledge graph is the meaningful model it holds.
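
The difference between querying asserted facts and inferring implicit ones can be sketched with a single subclass rule applied until nothing new appears. A real reasoner applies many more rules than this; the class names under ex: and the function infer_types are hypothetical, chosen only to show a fact being derived that nobody asserted.

```python
# Hypothetical data: no asserted triple says org/eurostat is a foaf:Agent.
asserted = {
    ("org/eurostat", "rdf:type", "ex:StatisticalOffice"),
    ("ex:StatisticalOffice", "rdfs:subClassOf", "ex:PublicBody"),
    ("ex:PublicBody", "rdfs:subClassOf", "foaf:Agent"),
}


def infer_types(triples):
    """Apply the subclass rule repeatedly until no new rdf:type triple appears."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, cls in list(closed):
            if p != "rdf:type":
                continue
            for sub, p2, sup in list(closed):
                # If s is a cls and cls is a subclass of sup, then s is a sup.
                if p2 == "rdfs:subClassOf" and sub == cls:
                    new = (s, "rdf:type", sup)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


target = ("org/eurostat", "rdf:type", "foaf:Agent")
found_by_plain_lookup = target in asserted
found_after_inference = target in infer_types(asserted)
print(found_by_plain_lookup, found_after_inference)
```

The plain lookup misses the fact because it was never stated; the inference step derives it from the two subclass links.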

Why it matters for EU data scientists

For a data scientist working inside the EU institutions, knowledge graphs are not a niche academic topic — they are the connective tissue of cross-border digital public services. The Interoperable Europe Act has made interoperability assessments mandatory for new and modified cross-border services, and every assessment touches the semantic layer where RDF, SKOS, OWL and DCAT-AP live. When Member State A says "citizen" and Member State B says "resident", it is shared vocabularies and ontologies — not a faster API — that let their records align. Reasoning over OWL ontologies lets harmonised queries succeed even when source systems describe a concept differently, bridging vocabularies through equivalence and subclass rules without rewriting any source system. Mastery here translates directly into the ability to publish discoverable open data, to build common European data spaces, and to make institutional datasets genuinely interoperable.
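
As a rough sketch of how equivalence rules let one query span differently named source classes, the snippet below groups classes declared equivalent and then retrieves instances through any name in the group. Every class name, record identifier and function name here is invented for illustration, and the grouping logic is a simplified stand-in for what an OWL reasoner does with owl:equivalentClass.

```python
# Hypothetical equivalence declarations between two national vocabularies
# and a shared one (in OWL these would be owl:equivalentClass statements).
EQUIVALENT = [
    ("msA:Persona", "shared:Person"),
    ("msB:NatuerlichePerson", "shared:Person"),
]

# Hypothetical typed records from two source systems.
RECORDS = [
    ("a/001", "msA:Persona"),
    ("b/777", "msB:NatuerlichePerson"),
    ("b/778", "msB:Unternehmen"),
]


def equivalence_groups(pairs):
    """Map each class name to the full set of classes equivalent to it."""
    groups = {}
    for left, right in pairs:
        merged = groups.get(left, {left}) | groups.get(right, {right})
        for name in merged:
            groups[name] = merged
    return groups


def instances_of(cls, records, groups):
    """Return record ids typed with cls or with any class equivalent to it."""
    wanted = groups.get(cls, {cls})
    return sorted(rid for rid, rtype in records if rtype in wanted)


groups = equivalence_groups(EQUIVALENT)
print(instances_of("shared:Person", RECORDS, groups))
print(instances_of("msA:Persona", RECORDS, groups))
```

Neither source system is modified: the mapping lives alongside the data, and the same query works whichever vocabulary the caller starts from.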

This is also core examinable content for the AD7 Data Science competition (EPSO/AD/429/26), where Field-4's data architecture and semantic technologies duty area tests precisely these distinctions: RDF as data versus RDFS/OWL as schema, SPARQL versus a reasoner, triplestore versus property graph, taxonomy versus ontology. If you are preparing for the competition or simply want to work fluently with EU linked data, build your foundations with the Prep4EU AD7 Data Science study pack.
