Comment by busterarm
Comment by busterarm 10 hours ago
Just one of the couple dozen databases we run for our product in the dev environment alone is over 12 TB.
How could I not use the cloud?
> Just one of the couple dozen databases we run for our product in the dev environment alone is over 12 TB.
> How could I not use the cloud?
Funnily enough, one of my side projects has its (processed) primary source of truth at that exact size. Updates itself automatically every night adding a further ~18-25 million rows. Big but not _big_ data, right?
Anyway, that's running happily with instant access times (yay solid DB background) on a dedicated OVH server that's somewhere around £600/mo (+VAT) and shared with a few other projects. OVH's virtual rack tech is pretty amazing too; replicating that kind of size on the internal network is trivial.
And plenty of datacenters will be happy to give you some space in one of their racks.
Not wanting to deal with backups or HA is a decent reason to put a database in the cloud (as long as you are aware how much you are overpaying). Not having a good place to put the server is not a good reason.
12 TB fits entirely into the RAM of a 2U server (cf. Dell PowerEdge R840).
However, I think there's an implicit point in TFA; namely, that your personal and side projects are not scaling to a 12 TB database.
With that said, I do manage approximately 14 TB of storage in a RAIDZ2 at my home, for "Linux ISOs". The I/O performance is "good enough" for streaming video and BitTorrent seeding.
However, I am not sure what your latency requirements and access patterns are. If you are mostly reading from the 12 TB database and don't have specific latency requirements on writes, then I don't see why the cloud is a hard requirement? To the contrary, most cloud providers provide remarkably low IOPS in their block storage offerings. Here is an example of Oracle Cloud's block storage for 12 TB:
Max Throughput: 480 MB/s
Max IOPS: 25,000
https://docs.oracle.com/en-us/iaas/Content/Block/Concepts/bl...
Those are the kind of numbers I would expect of a budget SATA SSD, not "NVMe-based storage infrastructure". Additionally, the cost for 12 TB in this storage class is ~$500/mo. That's roughly the cost of two 14 TB hard drives in a mirror vdev on ZFS (not that this is a good idea btw).
This leads me to guess most people will prefer a managed database offering rather than deploying their own database on top of a cloud provider's block storage. But 12 TB of data in the gp3 storage class of RDS costs about $1,400/mo. That is already triple the cost of the NAS in my bedroom.
Lastly, backing up 12 TB to Backblaze B2 is about $180/mo. Given that this database is for your dev environment, I am assuming that backup requirements are simple (i.e. 1 off-site backup).
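To make the monthly figures above easier to compare, here is a minimal sketch that backs out the per-GB rate each quoted bill implies. It only uses the numbers quoted in this comment; the decimal conversion (1 TB = 1000 GB) and the function names are my own assumptions, and the results are not official price-list values.

```python
# Monthly bills quoted in the comment above, each for 12 TB of storage.
QUOTED_MONTHLY_USD = {
    "Oracle Cloud block storage": 500,
    "RDS gp3 storage": 1400,
    "Backblaze B2 backup": 180,
}

# Assumption: decimal units, 1 TB = 1000 GB.
SIZE_GB = 12 * 1000


def implied_rate_per_gb(monthly_usd, size_gb=SIZE_GB):
    """Back out the per-GB monthly rate implied by a quoted monthly bill."""
    return monthly_usd / size_gb


def implied_rates():
    """Map each quoted service to its implied $/GB-month rate."""
    return {
        name: implied_rate_per_gb(usd)
        for name, usd in QUOTED_MONTHLY_USD.items()
    }


if __name__ == "__main__":
    for name, rate in implied_rates().items():
        print(f"{name}: ${rate:.4f}/GB-month")
```

These are implied rates only; real bills also depend on provisioned IOPS, throughput, egress, and region.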
The key point, however, is that most people's side projects are unlikely to scale to a 12 TB dev environment database.
Once you're at that scale, sure, consider the cloud. But even at the largest company I worked at, a 14 TB hard drive was enough storage (and IOPS) for on-prem installs of the product. The product was an NLP-based application that automated due diligence for M&As. The storage costs were mostly full-text search indices on collections of tens of thousands of legal documents, each of which could span hundreds to thousands of pages. The backups were as simple as having a second 14 TB hard drive around and periodically checking that the data isn't corrupt.
Still missing the point. This is just one server and just in the dev environment?
How many pets do you want to be tending to? I have 10^5 servers I'm responsible for...
The quantity and methods the cloud affords me allow me to operate the same infrastructure with 1/10th as much labor.
At the extreme ends of scale this isn't a benefit, but for large companies in the middle this is the only move that makes any sense.
99% of posts I read talking about how easy and cheap it is to be in the datacenter all have a single digit number of racks worth of stuff. Often far less.
We operate physical datacenters as well. We spend multiple millions in the cloud per month. We just moved another full datacenter into the cloud and the difference in cost between the two is less than $50k/year. Running in physical DCs is really inefficient for us for a lot of annoying and insurmountable reasons. And we no longer have to deal with procurement and vendor management. My engineers can focus their energy on more valuable things.
What is this ridiculous bait and switch? First you talk about a 12 TB dev database and "How could I not use the cloud?". You rightfully get challenged on that, and then suddenly it's about the number of servers you have to manage and your team not having the energy to do that. Those two have nothing to do with each other.
Why do people think it takes "labor" to have a server up and running?
Multiple millions in the cloud per month?
You could build a room full of giant servers and pay multiple people for a year just on your monthly server bill.
Sounds more like your use case is among the 1~2% of cases where a simple server and sqlite is maybe not the correct answer.
what are you doing that you have 12TB in dev??? my startup isn't even using a TB in production and we handle multiple millions of dollars in transactions every month.
12TB is $960/month in gp3 storage alone. You can buy 12TB of NVMe storage for less than $960, and it will be orders of magnitude faster than AWS.
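A rough sketch of that arithmetic, using only the two figures quoted above: the $960/month gp3 bill and $960 as an upper bound on the one-off NVMe purchase. The decimal conversion (12 TB = 12,000 GB) and the function names are assumptions, and the comparison deliberately ignores the server itself, power, hosting, and redundancy.

```python
import math

GP3_MONTHLY_USD = 960        # monthly gp3 figure quoted in the comment
HARDWARE_ONE_OFF_USD = 960   # upper bound from the comment ("less than $960")
SIZE_GB = 12_000             # assumption: 12 TB in decimal units


def implied_gp3_rate():
    """Per-GB monthly rate implied by the quoted gp3 bill."""
    return GP3_MONTHLY_USD / SIZE_GB


def cumulative_cloud_cost(months):
    """Total gp3 storage spend after the given number of months."""
    return GP3_MONTHLY_USD * months


def months_to_break_even(hardware_usd=HARDWARE_ONE_OFF_USD,
                         monthly_usd=GP3_MONTHLY_USD):
    """Whole months of cloud storage spend needed to match the purchase."""
    return math.ceil(hardware_usd / monthly_usd)


if __name__ == "__main__":
    print(f"implied rate: ${implied_gp3_rate():.2f}/GB-month")
    print(f"one year of gp3 storage: ${cumulative_cloud_cost(12)}")
    print(f"break-even after {months_to_break_even()} month(s)")
```

On those numbers the drives pay for themselves within the first month of storage charges, before counting any of the costs the sketch leaves out.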
Your use case is the _worst_ use case for the cloud.