Comment by Bender
In my opinion, if there are no overlapping networks, or if the Infrastructure as Code understands pods, k8s and such, then /etc/hosts can speed up resolution while leaving things outside of the data-center to DNS. That makes sense, but it requires some critical thinking about how all the inter-dependencies in the data-center play together and how fail-overs are handled.
Why aren't cloud providers and FAANGs doing this already?
This probably requires that everyone touching the Infrastructure as Code is a critical thinker who fully understands the implications of mapping applications to hosts, including but not limited to applications having their own load balancing mechanisms, fail-over IP addresses, application state and ARP timeouts, and broadcast and multicast discovery. It can be done, but I would expect large companies to avoid this potential complexity trap. It might work fine in smaller companies that have only senior/principal engineers. Using /etc/hosts for boot-strapping critical infrastructure nodes required for dynamic DNS updates could still make sense in some cases. Point being, this gets really complex, and whatever is managing the Infrastructure as Code would have to be fully aware of every level of abstraction: NATs, SNATs, hair-pin routes, load balanced virtual servers and origin nodes. Some companies are so big and complex that one human cannot know the whole thing, so everyone's siloed knowledge has to be merged into this Inf as Code beast. Recursive DNS, on the other hand, only has to know the correct up-stream resolvers to use, or whether it is supposed to talk directly to the root DNS servers. This simplifies dealing with the layers upon layers of abstraction that manage their own application mapping and DNS.
Another complexity trap people get lured into is split-views, which should be avoided because they grow more complex over time and break sites when one dependency starts to interfere with another. Everyone has to learn the hard way for themselves on this topic.
My preference would be to make DNS more resilient instead. A small step in the right direction is running Unbound [1] on every node, pointed at a group of edge DNS resolvers for external IP addresses, with customized settings to retry and keep state on the fastest upstream resolving DNS nodes, caching infrastructure addresses and their state, and setting realistic min/max DNS TTL times. Dev/QA environments should also enable query logging to a tmpfs mount to help debug application misconfigurations and spot less than optimal uses of DNS within the infrastructure and application settings before anything gets to staging or production. Grab statistical data from Unbound on every node and ingest it into some form of big-data/AI web interface so questions about resolution, timing and errors can be analyzed.
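As a rough sketch of the "grab statistical data" step, something like the Python below could run on each node before shipping numbers to whatever ingests them. It assumes the stats arrive as key=value lines, which is the shape `unbound-control stats_noreset` prints. The sample values, the function names and the idea of tracking a cache hit ratio are my own illustration, not anything Unbound prescribes.

```python
# Hypothetical sample in the key=value shape printed by `unbound-control stats_noreset`.
# The numbers are made up for illustration.
SAMPLE = """\
total.num.queries=1000
total.num.cachehits=850
total.num.cachemiss=150
total.recursion.time.avg=0.042
"""


def parse_stats(text):
    """Turn key=value lines into a dict of floats, skipping anything malformed."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # non-numeric value, ignore it
    return stats


def cache_hit_ratio(stats):
    """Fraction of queries answered from cache; 0.0 when there were no queries."""
    queries = stats.get("total.num.queries", 0.0)
    if not queries:
        return 0.0
    return stats.get("total.num.cachehits", 0.0) / queries


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    print(f"cache hit ratio: {cache_hit_ratio(stats):.2f}")
    print(f"avg recursion time (s): {stats.get('total.recursion.time.avg', 0.0)}")
```

On a real node you would feed it the command output instead of SAMPLE, and a low hit ratio in Dev/QA is one hint that an application is using DNS in a less than optimal way.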
This is just my two cents based on my experience. If it seems like I was just spewing words, it is because I was watching Shane Gillis and did not want to turn it off.
[1] - https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound...
Thanks for the well thought-out response, friend :)
You made some really good points, but here is my follow-up: with /etc/hosts there is no need to complicate things. For example:
10.0.0.1 sql.app.local storage.local lb.corp.net
This line could be present on every host on every network, everywhere. The only thing that should matter, in my opinion, is that the name portion is very specific. Even if you have NAT, SNAT, etc., /etc/hosts is only relevant to the host attempting to resolve a name, and that host already knows what name to use.
So long as you have one big-and-flat /etc/hosts everywhere, you just have to make sure that whenever you change an IP for a service, the global /etc/hosts reflects that change. And of course the whole set of devops tests, reviews, etc. ensures you don't screw that up.
Back in the day, this was a really bad idea because the problem of managing /etc/hosts at scale wasn't solved. But it is just a configuration file, which is what IaC is best suited for.
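To make the "tests, reviews" part concrete, here is a minimal Python sketch of how an IaC pipeline might render the flat file from a single inventory and refuse to ship it if an address is malformed or a name is claimed by two IPs. The inventory layout, the second entry and the function name are hypothetical; a real pipeline would use whatever its IaC tool provides.

```python
import ipaddress

# Hypothetical inventory: one IP per entry, any number of names for it.
# The second entry is made up for illustration.
INVENTORY = {
    "10.0.0.1": ["sql.app.local", "storage.local", "lb.corp.net"],
    "10.0.0.2": ["cache.app.local"],
}


def render_hosts(inventory):
    """Return /etc/hosts text, or raise ValueError on a bad IP or a duplicated name."""
    seen = {}   # lower-cased name -> IP that already claimed it
    lines = []
    for ip, names in sorted(inventory.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        for name in names:
            key = name.lower()    # host names are case-insensitive
            if key in seen:
                raise ValueError(f"{name} maps to both {seen[key]} and {ip}")
            seen[key] = ip
        lines.append(f"{ip} {' '.join(names)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_hosts(INVENTORY), end="")
```

The check that matters most is the duplicate-name one: it is the "name portion needs to be very specific" rule turned into something a review pipeline can enforce.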
DNS, on the other hand, is a complex system with hierarchies, zones, different record types, aliases, TTLs, caches, and more. In a closed private network, is DNS really worth it when you have already invested in IaC?