Understanding Postgres 19 Property Graphs
Last post I was talking about Relational Deep Learning, which starts from the claim that your database schema is already a graph. Rows are nodes, foreign keys are edges, and a model can learn on that structure directly. The catch in that work is that the graph lives in Python. RelBench pulls your tables out into pandas, builds the graph in memory as a PyTorch object, and trains on it there. The database is just where the data happened to be sitting. Even you when you go look at the documentation for PyG (PyTorch Geometric), a library that makes it easy to write and train graph neural networks, the examples are for flat files or in memory.
With the new Property Graph feature in in Postgres 19, you can declare which tables are nodes and which are edges, then pattern-match over them with MATCH predicate.
I spent a while poking at Postgres property graphs on a Formula 1 dataset that’s referenced as part of the Relational Deep Learning benchmark. This post is about what property graphs in Postgres actually are, where they fit, what they compile down to, and the one thing that I learned: a single table can be a vertex and an edge at the same time.
You can check out the code I used to learn more about property graphs here: https://github.com/neovintage/relational-deep-learning-with-pg19
How I think about a relational schema as a graph
Take any normalized schema. You have dimension tables holding entities (drivers, constructors, circuits) and fact tables recording events (a race result, a qualifying session). The foreign keys wire them together. A results row has a driver_id, a race_id, and a constructor_id. Each of those is a pointer to a row in another table.
That represenation is a graph. Every row is a potential node in the node with every foreign key representing a potential edge. Your ER diagram is the schema-level version of it while the actual data is the instance-level version.
“Give me the constructor for each result” is a graph traversal that you write as results JOIN constructors ON .... SQL/PGQ does not add the graph. It adds a way to ask graph questions without spelling out the join.
What a property graph actually is
A property graph in Postgres is a named object you create over tables you already have (trimmed here to a few tables to keep it readable):
CREATE PROPERTY GRAPH f1
VERTEX TABLES (
drivers KEY (driver_id) LABEL driver
PROPERTIES (driver_id, code, nationality, dob),
races KEY (race_id) LABEL race
PROPERTIES (race_id, year, round, date),
results KEY (result_id) LABEL result
PROPERTIES (result_id, grid, position, points, status_id, date)
)
EDGE TABLES (
results_driver
SOURCE KEY (result_id) REFERENCES results (result_id)
DESTINATION KEY (driver_id) REFERENCES drivers (driver_id)
LABEL of_driver,
results_race
SOURCE KEY (result_id) REFERENCES results (result_id)
DESTINATION KEY (race_id) REFERENCES races (race_id)
LABEL in_race
);
VERTEX TABLES lists the tables whose rows are nodes. Each gets a KEY (its identity, almost always the primary key), a LABEL you’ll match on, and a PROPERTIES list of the columns you want queryable. EDGE TABLES lists the tables whose rows are connections, each wired with a SOURCE and a DESTINATION that reference vertex keys.
The important part: this moves no data. CREATE PROPERTY GRAPH is a declaration, an overlay on the foreign-key structure you already have. The rows stay in their tables. You are describing how to read them as a graph, not making a copy.
Then you query it with MATCH:
SELECT g.code, g.points
FROM GRAPH_TABLE (f1
MATCH (d IS driver)<-[IS of_driver]-(res IS result)-[IS in_race]->(ra IS race)
COLUMNS (d.code AS code, res.points AS points)
) AS g;
The pattern reads like a sentence: a driver, connected by an of_driver edge back from a result, which connects by in_race to a race. The COLUMNS clause is what gets projected out. Everything inside GRAPH_TABLE(...) is the graph query, and the outer SELECT treats the result like any other table.
There are exactly two kinds of thing in a property graph: vertices and edges. That’s the whole structural vocabulary. Labels and properties are attributes of those two kinds, not additional kinds. When you describe the graph in psql (more on that below), every element reports an “Element Kind” that is either vertex or edge, and there is no third option.
What it compiles to
A MATCH is not a separate execution engine bolted onto Postgres. It compiles to relational joins. Run EXPLAIN over that three-element pattern and you get:
Hash Join
Hash Cond: (results_race.race_id = races.race_id)
-> Hash Join
Hash Cond: (results.result_id = results_race.result_id)
-> Hash Join
Hash Cond: (results_driver.driver_id = drivers.driver_id)
-> Hash Join
Hash Cond: (results.result_id = results_driver.result_id)
...
Four hash joins over the base tables. The graph pattern I wrote turned into exactly the joins I would have written by hand, planned by the same optimizer, using the same indexes and the same statistics. The graph is syntax. If a traversal is slow, it is slow for the same reasons a join is slow, and you fix it the same way.
You poke at the graph with the same tools you use for tables, too. \dG lists property graphs. \d f1 describes one: every vertex and edge, its backing table, its kind, and for edges the source and destination vertices. \d+ f1 reconstructs the full CREATE PROPERTY GRAPH statement, including things SQL/PGQ inferred that you never wrote, like the edge key it derived from each edge table’s primary key.
Two gotchas I learned by walking into them.
First, a KEY column is not automatically a property. List driver_id as the vertex key and then run MATCH (d IS driver WHERE d.driver_id = 1), and it fails, because driver_id is not queryable unless you also put it in PROPERTIES. The key identifies the node. It does not expose it. If you want to filter or return id columns, list them in PROPERTIES explicitly.
Second, no variable-length paths. This is the big one. In a mature graph query language you can write “follow this edge one to three times” with a quantifier. Postgres 19 does not have it:
ERROR: element pattern quantifier is not supported
You write every hop explicitly. Two hops means two edge patterns in the MATCH. A path that could be any length is not something you can express here. You drop back to a recursive CTE over the underlying tables, which works fine but means leaving the property-graph syntax behind.
Mapping your schema onto vertices and edges
This is the part I actually care about, because it is where the modeling decisions live. How do the tables you already have become vertices and edges? Three cases, easiest to hardest.
Dimension tables become vertices. A drivers table, a constructors table, a circuits table. Each row is an entity with attributes and a stable primary key. These map straight across: vertex table, key is the PK, properties are the columns you care about.
Pure join tables become edges. A classic many-to-many bridge, the has_and_belongs_to_many shape if you came up through Rails, say student_courses(student_id, course_id), is already an edge wearing a table costume. Its whole job is to connect two entities. Declare it as an edge table, source referencing one side, destination the other. Its rows are the relationships. This is the case property graphs were designed for, and it is the one nobody trips on.
Here it is in a picture. The bridge table sits between two entities, one foreign key to each side, and it drops straight into an edge:
students student_courses courses
┌────────────┐ ┌────────────────┐ ┌───────────┐
│ student_id │◄──────│ student_id FK │──────►│ course_id │
│ name │ │ course_id FK │ │ title │
└────────────┘ └────────────────┘ └───────────┘
one many rows one
as a property graph, the join table IS the edge:
( student ) ──────[ enrolled_in ]──────► ( course )
vertex edge vertex
= students = student_courses = courses
Two node types, one edge type. You already have the edge table sitting in your schema. You are just telling Postgres to read it as one.
Fact tables are where it gets interesting. A results row is the hard case. It points at three things: a driver, a race, and a constructor. Three foreign keys on one row. And it carries its own data: grid position, finishing position, points, status. So what is it, a node or an edge?
An edge in SQL/PGQ is strictly binary. One source, one destination. A results row has three endpoints, so it does not fit as a single edge without throwing two of them away. And even if it had only two, an edge is a thin connector with nowhere natural to hang grid and points and status as first-class, queryable, traversable data.
So a fact table with several foreign keys and its own columns wants to be a vertex. You make results a node, and then you need somewhere for the connections to live. You then create narrow edge tables, one per foreign key, each pairing the result’s primary key with one of its foreign keys.
CREATE TABLE results_driver AS SELECT result_id, driver_id FROM results;
ALTER TABLE results_driver ADD PRIMARY KEY (result_id);
Now results is a vertex, a hub carrying the features, and results_driver, results_race, results_constructor are the edges fanning out from it. The shape is driver <- result -> race, with the result sitting in the middle as a real node you can land on, filter, and reason about. That hub shape is what you want any time the event itself is the thing of interest, which, for a fact table, it usually is.
The choice across all three cases reduces to one question: what are you asking? If the relationship is the thing (who connects to whom), it is an edge. If the row is the thing (this specific event, with its own attributes, that you want to reason about), it is a vertex. A junction table is a relationship. A fact table is an event. They map differently because they mean different things.
One table, both a vertex and an edge
One thing I did learn in my exploration is that a single table can be a vertex and an edge in the same graph, at the same time.
I had assumed each table picks a side and that you had to have join tables in between. The “vertex or edge” rule applies to each element of the graph, not to the underlying table. You can declare the same table twice, once under VERTEX TABLES and once under EDGE TABLES, as long as the two get different aliases.
So results can be a result vertex and also the source of edges built from its own columns, with no separate edge table at all:
EDGE TABLES (
results AS res_driver
SOURCE KEY (result_id) REFERENCES results (result_id)
DESTINATION KEY (driver_id) REFERENCES drivers (driver_id)
LABEL of_driver,
results AS res_race
SOURCE KEY (result_id) REFERENCES results (result_id)
DESTINATION KEY (race_id) REFERENCES races (race_id)
LABEL in_race
)
That results AS res_driver aliases the base table as an edge, using its existing result_id and driver_id columns. There is no results_driver table to build and store. I made a version of my graph this way, aliased results three times for its three foreign keys, and traversed it. The materialized edge tables were duplicating columns that already lived in the base table.
A fact table has its own columns and several foreign keys, which is what makes it neither a clean vertex nor a clean edge until you decide:
results (one fact row = one event)
┌──────────────────────────────────────┐
│ result_id PK │
│ driver_id FK -> drivers │
│ race_id FK -> races │
│ constructor_id FK -> constructors │
│ grid, points, status (its features) │
└──────────────────────────────────────┘
one `results` table, read four ways:
( driver ) ( race ) ( constructor )
▲ ▲ ▲
of_driver │ in_race │ for_constructor
└───────────────┼───────────────┘
│
( result ) the result VERTEX
results -> result vertex (holds the features)
results AS res_driver -> edge result -> driver
results AS res_race -> edge result -> race
results AS res_constr -> edge result -> constructor
Each edge alias is one source and one destination, so a three-foreign-key fact table needs three aliases, not one clever edge. You are trading three materialized tables (or views) for three aliases over one table. The aliases store nothing.
When I had two fact tables in the same graph and did not list PROPERTIES, it blew up:
ERROR: property "number" data type mismatch: double precision vs. bigint
Without a PROPERTIES list, SQL/PGQ exposes every column as a property. Both results and qualifying have a column called number, with different types, and a property of the same name has to have one type across the whole graph. So they collided. The fix is the same PROPERTIES whitelist from earlier: name the columns you want, and the conflicting ones never enter the graph. The key-is-not-a-property gotcha and this one are the same lesson arriving from two directions.
When to use Postgres Property Graphs
Property graphs are good at one specific shape of question: find the things connected to X in a particular pattern. A few that the F1 graph answers:
- Which constructors did this driver race for? One hop, driver to result to constructor.
- Who were this driver’s competitors? A path through a shared race: driver to result to race, back to another result, out to another driver. As a self-join that is a fiddly four-way thing. As a
MATCHpattern it is one readable line. - Filtered patterns: results where the grid position was outside the top ten and the driver was Italian. Structure and attributes in the same query.
The common thread is that you know the shape of the relationship in advance. You are finding, filtering, or aggregating things that match a fixed, bounded structure.
Where they fall down is everything where the shape is the unknown. “The shortest path between two drivers.” “Everything reachable within N hops.” Those need variable-length traversal, and Postgres 19 does not have it, so you are back to recursive CTEs over the base tables. Graph algorithms (PageRank, community detection, centrality) are not pattern matching and do not belong here either.
The line I would draw: a property graph answers “find the things connected to X in this shape.” It does not answer “how is X connected to Y” when you do not know the path, and it does not answer “what is structurally important in this graph.” For the first kind of question it is a genuinely nicer surface than the equivalent joins. For the second, it is the wrong tool, and the database will tell you so with a quantifier error.
Current thoughts
SQL/PGQ is a declarative overlay on the foreign-key structure you already have. It compiles to joins, rides the same optimizer, and stores nothing extra unless you choose to. The win is readability for fixed-shape traversals, plus a single named object that documents your schema as a graph. The missing feature is variable-length paths, which push the deep or open-ended traversals back into recursive SQL. If Postgres could do that, it’d be a good contender for light graph related workloads.
The big learning for me in this exercise is a table is not a vertex or an edge. A table is rows, and you decide, per graph, whether to read those rows as nodes, as connections, or as both. The fact table that is a hub vertex and an edge source at the same time is best example that the relational and graph models are really the same thing.
So if you are on Postgres 19 and someone tells you to stand up a graph database like Neo4j, check first whether you actually just want to ask graph-shaped questions of a schema you already have. If the questions are fixed-shape, declare a property graph and write the MATCH predicate. If the questions you want to ask require querying an unknown depth, you were going to write a recursive CTE anyway. Either way, the data never has to leave. As with anything, make sure you test it out to get the performance you’re looking for!