From Tables to Relationships: Visualizing Iceberg Data as a Graph

Relationships matter, especially when you’re working with complex, interconnected data in a data lakehouse. PuppyGraph is a high-performance graph compute engine that lets you query structured data as a graph, making it easy to uncover relationships and patterns without moving your data into a separate graph database. When combined with Apache Iceberg, Project Nessie for table management, and MinIO AIStore for object storage, you get a fully integrated, cloud-native stack for secure, versioned, graph-based analytics. All this functionality and the whole stack can be containerized and deployed anywhere: on-prem, in the public clouds, in colos or in data centers.
This tutorial walks you through getting started with setting up an Iceberg data lakehouse using PuppyGraph to query and visualize your data in MinIO as a graph.
What is a Graph Database?
A graph database models data as nodes and edges rather than rows and tables. This structure makes it ideal for querying relationships. Those relationships could be it's customer behavior, fraud detection, supply chain optimization, or even social behavior graphs. Graph databases come in two major flavors: RDF (Resource Description Framework) and property graphs. RDF is schema-driven and primarily used in semantic web contexts. Property graphs, on the other hand, are more flexible and intuitive, supporting arbitrary attributes on both nodes and edges.
PuppyGraph is a property graph engine. It supports Gremlin and Cypher query languages, integrates directly with tabular data sources like Iceberg, and avoids the heavyweight infrastructure typical of traditional graph databases.
Graphs without Migration
Traditionally, leveraging compute required moving data. The past was burdened by tightly coupled compute and storage which meant that if you wanted to actually do anything with your data, it had to be ingested into this or that proprietary platform. Locked in with very little chance of escape.
Modern compute, like PuppyGraph allows your to query over your data in object storage without migration. Creating a centralized source of truth, but not locking you in to this or that engine. This pattern reduces complexity, preserves data integrity and keeps the architecture future forward.
AIStore and Graphs
MinIO AIStore provides the object storage foundation for this architecture. It delivers high-throughput performance on standard hardware, which makes it a strong fit for data-intensive workloads such as graph analytics. AIStore supports both S3 Select and S3 Express APIs. In recent benchmarks, the AIStor S3 Express API delivered more than twice the throughput for standard operations, enabling fast reads directly from storage, particularly for large objects.
Using AIStore for graph infrastructure allows teams to store structured and unstructured data in the same scalable, versioned object store that supports the rest of the data lakehouse. This avoids unnecessary data movement and reduces system complexity. Everything is accessible through a single, consistent Global Console.
AIStore is built for Kubernetes. It can be deployed with standard infrastructure-as-code workflows and scales predictably across cloud, on-prem, and edge environments. This consistency is essential for organizations that require reproducibility, automation, and operational control. The below tutorial is a proof of concept of that deployability, creating an entire graph analytics infrastructure from scratch within a single Docker container that takes seconds to spin up.
Prerequisites
To get started with the tutorial, you will need Docker. You can download Docker from here.
Please ensure that docker compose is available. The installation can be verified by running: docker compose version
Getting Started
Create a docker-compose.yaml and paste in the following:
services:
spark-iceberg:
image: tabulario/spark-iceberg
container_name: spark-iceberg
networks:
iceberg_net:
depends_on:
- minio
- nessie
volumes:
- ./warehouse:/home/iceberg/warehouse
- ./notebooks:/home/iceberg/notebooks/notebooks
environment:
- AWS_ACCESS_KEY_ID=admin
- AWS_SECRET_ACCESS_KEY=password
- AWS_REGION=us-east-1
ports:
- 8888:8888
- 8080:8080
- 10000:10000
- 10001:10001
minio:
image: quay.io/minio/minio
container_name: minio
networks:
iceberg_net:
ports:
- 9000:9000
- 9001:9001
environment:
- MINIO_ROOT_USER=admin
- MINIO_ROOT_PASSWORD=password
- MINIO_REGION=us-east-1
entrypoint: >
/bin/sh -c "
minio server /data --console-address ':9001' &
sleep 5;
mc alias set myminio http://localhost:9000 admin password;
mc mb myminio/my-bucket --ignore-existing;
tail -f /dev/null"
nessie:
image: ghcr.io/projectnessie/nessie:0.104.1
container_name: nessie
networks:
iceberg_net:
ports:
- 19120:19120
environment:
- nessie.version.store.type=IN_MEMORY
- nessie.catalog.default-warehouse=warehouse
- nessie.catalog.warehouses.warehouse.location=s3://my-bucket
- nessie.catalog.service.s3.default-options.region=us-east-1
- nessie.catalog.service.s3.default-options.endpoint=http://minio:9000/
- nessie.catalog.service.s3.default-options.path-style-access=true
- nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
- nessie.catalog.secrets.access-key.name=admin
- nessie.catalog.secrets.access-key.secret=password
- nessie.server.authentication.enabled=false
puppygraph:
image: puppygraph/puppygraph:stable
container_name: puppygraph
networks:
iceberg_net:
environment:
- PUPPYGRAPH_USERNAME=puppygraph
- PUPPYGRAPH_PASSWORD=puppygraph123
ports:
- "8081:8081"
- "8182:8182"
- "7687:7687"
depends_on:
- spark-iceberg
networks:
iceberg_net:
To start up the services, run the command: docker compose up -d
.
If successful, you should get a message that looks like this:
[+] Running 5/5
✔ Network puppygraph-spark_iceberg_net Created 0.0s
✔ Container nessie Started 0.6s
✔ Container minio Started 0.6s
✔ Container spark-iceberg Star... 0.4s
✔ Container puppygraph Started 0.5s
Navigate to http://localhost:9001 to log in to MinIO with the username and password of admin:password
to see that a bucket called my-bucket
has been automatically created for you.
Create Data
Run the following command to open up a spark shell:
docker exec -it spark-iceberg spark-sql \
--conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.demo.uri=http://nessie:19120/iceberg/ \
--conf spark.sql.catalog.demo.warehouse=s3://my-bucket/ \
--conf spark.sql.catalog.demo.type=rest
Once your SQL shell is created, run the following to create and insert data into four tables: person, software, created_info, knows
CREATE TABLE person (
id VARCHAR,
name VARCHAR,
age int
);
INSERT INTO person VALUES
('v1', 'marko', 29),
('v2', 'vadas', 27),
('v4', 'josh', 32),
('v6', 'peter', 35);
CREATE TABLE software (
id VARCHAR,
name VARCHAR,
lang VARCHAR
);
INSERT INTO software VALUES
('v3', 'lop', 'java'),
('v5', 'ripple', 'java');
CREATE TABLE created_info (
id VARCHAR,
from_id VARCHAR,
to_id VARCHAR,
weight double
);
INSERT INTO created_info VALUES
('e9', 'v1', 'v3', 0.4),
('e10', 'v4', 'v5', 1.0),
('e11', 'v4', 'v3', 0.4),
('e12', 'v6', 'v3', 0.2);
CREATE TABLE knows (
id VARCHAR,
from_id VARCHAR,
to_id VARCHAR,
weight double
);
INSERT INTO knows VALUES
('e7', 'v1', 'v2', 0.5),
('e8', 'v1', 'v4', 1.0);
Add Nessie to Puppygraph
Log into PuppyGraph at http://127.0.0.1:8081/ PuppyGraph with the credentials puppygraph:puppygraph123
.
Click on Create graph schema to create a new graph schema. Fill out the following to add Nessie as a data source:
Catalog type: Apache Iceberg
Catalog name: nessie
Metastore type: iceberg-rest
RestURi: http://nessie:19120/iceberg
Warehouse - Optional: s3://my-bucket
Storage Type: S3 Compatible
Endpoint: http://minio:900
Access key - Optional: admin
Secret Key: password
Enable Path Style Access: Check
Create a Schema
Nodes
Schemas are built from nodes that consist of tables. Follow the on screen directives to create the first node for the person table. Allow PuppyGraph to select the ID.
Add the remaining tables: created, knows, and software.
Edges
Add edges to define the relationships between notes. For example, the key ID is shared between the table's person and created. Defining this relationship as an edge will create a line between the tables graphing the relationship visually.
Continue to create edge relationships between the tables.
Submit before continuing to the next steps or your selections will be lost
Craft Queries
You can use the dashboard to keep track of the nodes and edges you’ve created and use the query interface to write queries against your data.
Shutdown Docker
When you’re don experimenting, run the following commands to stop and remove everything created by docker; all the containers, networks and volumes.
docker compose down
Relationships Matter
Data lakehouses contain complex and interconnected information. Entities, events, dependencies, and metadata often span multiple tables and systems. Making these relationships available for analysis is a core responsibility of modern data architecture. By combining Apache Iceberg, Project Nessie, MinIO AIStore, and PuppyGraph, teams can expose and explore these connections without adding operational complexity or introducing new data stores.
Join us on our Slack or at hello@min.io with any questions.