The fix is simple: enhance the DNS server to run a health check against the host that each resource record points to, and return the record only if the check passes. For example, the administrator could specify the TCP port to connect to, as in the imaginary syntax below:
www A 1.1.1.1 80 ; return this record only if we can connect to its TCP port 80
www A 1.1.1.2 80
www A 1.1.1.3 80
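To make the idea concrete, here is a minimal sketch in Python of what such a check might look like inside the server. The names `tcp_port_open` and `healthy_records` and the one-second timeout are assumptions for illustration only, not part of any real DNS server:

```python
import socket

def tcp_port_open(ip, port, timeout=1.0):
    """Return True if a TCP connection to (ip, port) succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def healthy_records(records, check=tcp_port_open):
    """Keep only the (ip, port) records whose health check passes.

    `records` is a hypothetical in-memory form of the zone lines above,
    e.g. [("1.1.1.1", 80), ("1.1.1.2", 80), ("1.1.1.3", 80)].
    """
    return [(ip, port) for ip, port in records if check(ip, port)]
```

A real server would probably run these checks in the background on a timer and answer queries from the cached result, rather than connecting on every query.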
Of course, the health check could be more general. In that case you could use a script:
www A 1.1.1.1 web-check.sh ; return this record only if the script returns true
where the IP address would be passed to the script as an argument to check.
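A script-based check could be sketched along the same lines. This assumes that "returns true" means the script exits with status 0, the usual shell convention; the function name `script_check` and the five-second timeout are made up for this example:

```python
import subprocess

def script_check(command, ip, timeout=5.0):
    """Run the check command with ip appended as its last argument.

    `command` is a list such as ["./web-check.sh"]. An exit status of 0
    counts as healthy (an assumption of this sketch). A missing script
    or a timeout counts as unhealthy.
    """
    try:
        result = subprocess.run(
            command + [ip],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Treating a timeout as a failure is a design choice: a check script that hangs should not be able to keep a dead host in the answer.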
The same approach works for domain controllers too:
_ldap._tcp.dc._msdcs.foo.com. SRV 1.1.1.1 dc-check.sh
_ldap._tcp.dc._msdcs.foo.com. SRV 1.1.1.2 dc-check.sh
Finally, one might ask: why implement this check in the DNS server instead of in the clients? The idea is that problems should be detected as early as possible to avoid bad effects downstream. In concrete terms, if a server is down but the DNS server (the broker) still refers clients to it, every one of those clients has to detect the failure itself. If the DNS server performs the health check instead, the check is done once, in one place, on behalf of all clients, saving a lot of trouble downstream.