Show HN: Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java

(github.com)

44 points by mousomashakel 2 days ago

15 comments

Hey HN, I’ve built Fahmatrix, a minimal, fast Java library for working with tabular data — inspired by Python’s pandas, but designed for performance and simplicity on the JVM.

After working extensively with Python’s data stack, I often ran into limitations related to speed, especially in larger or long-running data workflows. So I built Fahmatrix from scratch to offer similar APIs for manipulating CSVs, performing summary statistics, slicing rows/columns, and more — but all in Java.

Features:

Lightweight and dependency-free

CSV/TSV import with auto-headers

Series/DataFrame structures (like pandas)

describe(), mean(), stdDev(), percentile() and more

Fast parallel operations on numeric columns

Java 17+ support

Docs: https://moustafa-nasr.github.io/Fahmatrix/ GitHub: https://github.com/moustafa-nasr/fahmatrix

I’d love feedback from the Java and data communities — especially if you’ve ever wanted a simple dataframe utility in Java without needing full-scale ML libraries.

Happy to answer any questions!

uwemaurer 2 days ago

Always great to see efforts to make working with data frames easier. Here are some similar data frame libraries for Java:

https://github.com/jtablesaw/tablesaw

https://github.com/dflib/dflib

My preferred way is just use duckdb java API. I didn't see anything better in performance/efficiency. Also a SQL query is often easier to write

  • mousomashakel 14 hours ago

    Thanks! I'm aware of those great projects. Fahmatrix aims to offer a lightweight, dependency-free alternative that’s easy to embed in any Java app. DuckDB is super impressive, especially for SQL-heavy tasks — but my goal is more about a native, fluent API for those who prefer direct Java code over SQL.

  • theanonymousone a day ago

    Yes. It has bothered me for a long time too. Maybe the best mix is a dataframe library with basic operations (column select, non-null etc), which also allows SQL for more complex stuff?

    • mousomashakel 14 hours ago

      Totally agree that SQL can be the best tool for many jobs. My goal with Fahmatrix is to serve the opposite niche: where devs want something that's Java-native, procedural, and simple without reaching for an external engine. SQL support or DSL might come later though — I see the appeal.

    • radus a day ago

      Polars and duckdb interoperate nicely and can enable this flexibility

rickette 2 days ago

Congrats on putting this out there. There isn't a de facto pandas-like library in Java like you said. But for Kotlin there is: https://github.com/Kotlin/dataframe

  • mousomashakel 14 hours ago

    Thanks so much! Yep, I’ve seen the Kotlin DataFrame lib — very elegant. Fahmatrix is meant for plain Java users who want similar capabilities without switching ecosystems. Appreciate the support!

skanga 2 days ago

What about Tablesaw, Apache Arrow? How does this compare ...

  • mousomashakel 14 hours ago

    Good question. I’ll publish benchmarks soon, but the core difference is that Fahmatrix is fully Java, no JNI, and minimalistic — ideal for small projects or environments like Android. Tablesaw and Arrow are more powerful, but heavier. Fahmatrix aims to be the “just enough” middle ground.

owlstuffing a day ago

Nice!

I’m currently using manifold-sql with duckdb for this.

  • mousomashakel 14 hours ago

    Thanks! That’s a great combo — manifold-sql + duckdb gives you strong typing with powerful SQL under the hood. Fahmatrix is aiming to complement that approach for cases where you want quick, native Java code without SQL — e.g., when building data flows or custom logic inline. Would love to hear if you’ve hit any pain points that a Java-native approach could help with.