Comment by djrj477dhsnv 4 days ago

3 replies

What exactly is a "data platform"?

We have a large Postgres instance running on a dedicated server that handles millions of users and billions of record updates and inserts per day, and when I want to run an analysis I just open up psql. I wrote some dashboards and alerting in Python that took a few hours to spin up. If we ever ran into load issues, we'd just set up some basic replication. It's all very simple and can easily scale further.

benrutter 4 days ago

Sounds like you have the benefit of a nicely designed server and good practices. A lot of companies aren't the same.

Imagine you're a big company with loads of teams/departments, multiple different types of SQL servers for data reporting, plus some Parquet data lakes, and hey, just for fun, why not a bunch of CSVs.

Getting data from all these locations becomes a full-time job, so at some point someone wants some tool/UI that lets data analysts log into a single thing and get the experience that you currently have with one Postgres server.
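
To make the "single thing" idea concrete, here is a minimal sketch, not how any real data platform is built. It uses only the Python standard library, with an in-memory SQLite database standing in for the single entry point. The table names, the CSV contents, and the helpers `build_single_entry_point` and `spend_per_customer` are all hypothetical, made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a CSV export owned by one team (made-up values).
ORDERS_CSV = "customer_id,amount\n1,20.0\n2,5.5\n1,4.5\n"


def build_single_entry_point(orders_csv: str) -> sqlite3.Connection:
    """Load a SQL-style source and a CSV source into one queryable database."""
    conn = sqlite3.connect(":memory:")

    # Source 1: a table that would normally live in some team's SQL server.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "acme"), (2, "globex")],
    )

    # Source 2: a CSV file, parsed and loaded next to the SQL table.
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    reader = csv.DictReader(io.StringIO(orders_csv))
    rows = [(int(r["customer_id"]), float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    conn.commit()
    return conn


def spend_per_customer(conn: sqlite3.Connection) -> list:
    """One query that joins data originating from both sources."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


conn = build_single_entry_point(ORDERS_CSV)
print(spend_per_customer(conn))
```

The analyst only ever sees one connection and writes one query; where each table originally came from is hidden behind the loading step. A real platform would add scheduling, permissions, and many more source types on top of this.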

I think it's not a problem of scale in the CS sense, more in the business sense, where big organisations become complex and disorganised and need abstractions on top to make them workable.

naijaboiler 3 days ago

We have Databricks at my company: 50m ARR, 150 employees, still growing at 15% YoY. We have 0 full-time data engineers (1 data scientist + 1 DB admin co-manage everything on there part-time, on top of their full-time roles). We are able to have data from about 100 transactional database tables, Zendesk, all our logs of every API call, every single event from every user in our mobile and web applications, banking data, calendar data, Google Play store data, and Apple store data, all in one place. We are a two-sided marketplace, so we can easily get a 360-degree view of our B2B customers and B2C customers, and measure employee productivity across all departments. It's that deep data understanding of our customers that powers our growth.

My team of 3 data scientists is able to support a culture of experimentation and data-informed decision making across the entire org.

And we do all that on a 30k annual spend on Databricks. That's less than 1/5 the cost of 1 software engineer. Excellent value for money if you ask me.

I really struggle to imagine being able to do that any cheaper. How else can we engineer a data hub for all of our data, manage appropriate access & permissions, run complex calculations in seconds (yes, we have replaced overnight complex calculations done by engineering teams), and join data from so many disparate sources, at a total cost (tool + labor) of <80k/yr? I double dare you to suggest or find me a cheaper option for our use case.
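
For readers checking the numbers, here is a small sketch of the arithmetic implied by the figures above. The 30k and 80k values and the 1/5 ratio are the commenter's own claims; the function names are hypothetical and nothing here is independently verified.

```python
# Figures as stated in the comment (annual, currency as given by the commenter).
DATABRICKS_SPEND = 30_000   # annual tool spend
TOTAL_BUDGET_CAP = 80_000   # stated upper bound on tool + labor


def implied_labor_cap(total_cap: int, tool_spend: int) -> int:
    """Upper bound on labor cost if tool + labor stays under the total cap."""
    return total_cap - tool_spend


def implied_engineer_cost_floor(tool_spend: int, ratio_denominator: int = 5) -> int:
    """If the tool costs less than 1/5 of one engineer, the engineer costs more than this."""
    return tool_spend * ratio_denominator


print(implied_labor_cap(TOTAL_BUDGET_CAP, DATABRICKS_SPEND))   # labor under 50000
print(implied_engineer_cost_floor(DATABRICKS_SPEND))           # engineer above 150000
```

So the claim amounts to: under 50k/yr of part-time labor on top of the tool, compared against a fully loaded engineer cost of more than 150k/yr.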

groaninvasion 3 days ago

Simple businesses don't need Databricks. One humongous Postgres instance handling operational transactions is what very simple businesses need.