Humphrey 15 hours ago

Yes — PMTiles is exactly that: a production-ready, single-file, static container for vector tiles built around HTTP range requests.

I’ve used it in production to self-host Australia-only maps on S3. We generated a single ~900 MB PMTiles file from OpenStreetMap (Australia only, up to Z14) and uploaded it to S3. Clients then fetch just the required byte ranges for each vector tile via HTTP range requests.

It’s fast, scales well, and bandwidth costs are negligible because clients only download the exact data they need.

https://docs.protomaps.com/pmtiles/

  • simonw 15 hours ago

    PMTiles is absurdly great software.

    • Humphrey 15 hours ago

      I know right! I'd never heard of HTTP Range requests until PMTiles - but gee it's an elegant solution.

      • keepamovin 5 hours ago

        Hadn't seen PMTiles before, but that matches the mental model exactly! I chose physical file sharding over Range Requests on a single db because it felt safer for 'dumb' static hosts like CF. - less risk of a single 22GB file getting stuck or cached weirdly. Maybe it would work

    • hyperbolablabla 4 hours ago

      My only gripe is that the tile metadata is stored as JSON, which I get is for compatibility reasons with existing software, but for e.g. a simple C program to implement the full spec you need to ship a JSON parser on top of the PMTiles parser itself.

      • seg_lol an hour ago

        A JSON parser is less than a thousand lines of code.

  • nextaccountic 12 hours ago

    That's neat, but.. is it just for cartographic data?

    I want something like a db with indexes

    • jtbaker 7 hours ago

      Look into using duckdb with remote http/s3 parquet files. The parquet files are organized as columnar vectors, grouped into chunks of rows. Each row group stores metadata about the set it contains that can be used to prune out data that doesn’t need to be scanned by the query engine. https://duckdb.org/docs/stable/guides/performance/indexing

      LanceDB has a similar mechanism for operating on remote vector embeddings/text search.

      It’s a fun time to be a dev in this space!

simonw 16 hours ago

There was a UK government GitHub repo that did something interesting with this kind of trick against S3 but I checked just now and the repo is a 404. Here are my notes about what it did: https://simonwillison.net/2025/Feb/7/sqlite-s3vfs/

Looks like it's still on PyPI though: https://pypi.org/project/sqlite-s3vfs/

You can see inside it with my PyPI package explorer: https://tools.simonwillison.net/zip-wheel-explorer?package=s...

__turbobrew__ 11 hours ago

gdal vsis3 dynamically fetches chunks of rasters from s3 using range requests. It is the underlying technology for several mapping systems.

There is also a file format to optimize this https://cogeo.org/

[removed] 15 hours ago
[deleted]
ericd 16 hours ago

This is somewhat related to a large dataset browsing service a friend and I worked on a while back - we made index files, and the browser ran a lightweight query planner to fetch static chunks which could be served from S3/torrents/whatever. It worked pretty well, and I think there’s a lot of potential for this style of data serving infra.

omneity 12 hours ago

I tried to implement something similar to optimize sampling semi-random documents from (very) large datasets on Huggingface, unfortunately their API doesn't support range requests well.

mootothemax 10 hours ago

This is pretty much well what is so remarkable about parquet files; not only do you get seekable data, you can fetch only the columns you want too.

I believe that there are also indexing opportunities (not necessarily via eg hive partitioning) but frankly - am kinda out of my depth pn it.