Comment by trod123 10 months ago

You are not wrong that this is puzzling, especially when viewed through the lens of a professional with a background in these areas (10 years).

There are many red flags that raise questions.

That said, I stopped taking them at their word years ago; this isn't the first time they've made dubious announcements following entirely preventable failures. In my mind, they really don't have any professional credibility.

People in the business of System Administration would follow basic standard practices that eliminate most of these risks.

The linked post isn't a valid post-mortem. If it were, it would contain unambiguous details of the timeline and the specifics of both the failure domains and the resolutions.

As you say, a network connector could mean any number of things. It's ambiguous, and ambiguity in technical material is used to hide or mislead most of the time, which is why professionals writing a post-mortem remove every ambiguity they can.

It is common professional practice to have a recovery playbook and a disaster recovery plan for business continuity that is tested at least every 6 months, usually quarterly. This is true of both charities and businesses.

Based on their post, they don't have one, and they don't follow this well-known industry practice. You really cannot call yourself a system administrator if you don't follow the basics of the profession.

TPOSNA (The Practice of System and Network Administration) covers these basics for those not in the profession. It's roughly two decades old now, it is well established, and ignorance of its practices isn't a valid excuse.

Professional budgets also always include an emergency fund based on these BC/DR (business continuity / disaster recovery) plans. Additionally, resilient design is common practice; single points of failure are not excusable in production failure domains, especially when zero downtime must be achieved.

Automated deployment is also standard practice, and it factors into improvements in RTO (recovery time objective) and capacity planning. Cattle, not pets.

Also, you don't ever wait on a vendor to take action. You make changes, and revert when the issue gets resolved.

The first thing I would have done is set the domain's DNS TTL to 5 minutes as soon as failures were alerted (as a precaution), and then, if needed, point DNS to a viable alternative server (either deployed temporarily or running in parallel).

Failures inevitably happen, which is why you manage the risk with a topology of load balancers and servers set up in HA groups, so that no single provider is a single point of failure.

This is so basic that any junior admin knows these things.

Outlandish workarounds only happen when you do not have a plan and you are dredging the bottom of the barrel.

eloisant 10 months ago

I worked with Thibault before he could sustain himself on Lichess donations. He's a professional software developer and sysadmin, and one of the best I've worked with.

The people behind Lichess are very much professionals; they have worked in companies before and know about everything you're writing. However, instead of building a business, they decided to run a completely free, ad-free nonprofit living off donations.

You don't get the same budget doing that as a subscription-based or ad-supported service does. That's true both for the number of people maintaining it and for the cloud costs you can afford.

If you look at their track record, uptime has been pretty good. Shit happens, but if you ask me, it's worth it to have a service like Lichess that can exist entirely on donations.

  • trod123 10 months ago

    There are many problems with what you've written here, as well as bot-like behavior in the responses, with telltale signs of vote manipulation and propaganda similar to Chinese state-run campaigns.

    We will have to disagree. You have clearly contradicted yourself in at least one way, and attempted to mislead readers in a number of other ways that I won't go into here.

    From these, I have come to the conclusion that you don't have credibility.

    The downtime would not have happened if they had followed professional practices. Even a qualified administrator coming into the outage fresh would have had a fix within 30 minutes if working at a professional level.

    Yes, shit happens, but professionals have processes in place so that common shit doesn't just happen. This was preventable.

    • OkayPhysicist 10 months ago

      What kind of Tom Clancy novel do you live in that intelligence agencies are astroturfing for free chess sites?

      • trod123 10 months ago

        I'm going to assume that your question is genuine and sincere, and not meant sarcastically.

        If you read the following books by established experts, you should be able to answer for yourself, rationally, the questions of why and how. The subject matter involves torture for thought reform, real, not fantasy. This differs from SERE training, which is geared towards resisting information extraction.

        China, by its own words (leaked internal documents), seeks the destruction of its enemies' national will. This involves an identity-based approach to torture/thought reform, which falls under the military strategy of divide and conquer. Digital attacks are cost-effective when weighed against other options.

        Anything you believe or love, or any common experience you share with other people, is fair game for inducement and then destructive interference to promote nihilism, while segmenting individuals into two groups: dissociative responses (apathetic/non-response) and psychotic-break responses.

        The items targeted include chess, along with many other things. This includes inducing struggle sessions to break people.

        If you spend the time to review the material I mentioned, you'll likely find out that a core belief of yours is untrue, that belief being that something like this is fantasy and impossible. This has a way of breaking the weak-willed, often in a psychological reversal/delusion.

        I hope you are a strong person; we need more rational people if we are to survive as a species.

        Robert Lifton, Thought Reform and the Psychology of Totalism
        Joost Meerloo, Rape of the Mind
        Robert Cialdini, Influence
        USMC Press, Political Warfare (free ebook on their website)

DiggyJohnson 10 months ago

This is so far out of line that I wonder what the background to this issue is. Lichess is not emergency dispatch software running in a 911 call center; if they have an outage, the cost is that users can't play online chess until it is fixed. Additionally, the founder of this open source project is objectively good at what he does. Exhibit A: he built and hosts a top-two online chess platform that competes well against the biggest commercial sites. How does that not lend some professional credibility?

  • trod123 10 months ago

    We will have to disagree, Kenneth.

    Your idea of "so far out of line" would include any communication you disagree with. It has no basis in rational principles or social norms/mores, and it is absurd.

    I stuck to the objective issues in my previous post, you should too before making baseless claims.

    Do some due diligence on the business entities involved, and peruse their GitHub history (the deleted parts). Get a real picture of what's going on there. You'll find many contradictions if you dig.

    The question on any critical IT professional's mind is how you can run the service given the resources claimed. Yes, he runs the top-traffic chess site, and it's done on a bespoke monolith.

    You do the napkin math, sketching it out by required component services, and it quickly becomes clear that nothing adds up. When nothing is consistent or supported, you examine your premises for contradictions and lies, which again comes back to credibility.

    (Hint: https://trufflesecurity.com/blog/anyone-can-access-deleted-a...)

high_na_euv 10 months ago

Why put in so much effort when, at worst, you have a few hours of downtime?

  • 0cf8612b2e1e 10 months ago

    As they say, each 9 of uptime increases costs by an order of magnitude. For a non profit service, a few hours of downtime seems a fine trade off vs engineering all of the “right” redundancies. All of which have their own operational costs.
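    The "order of magnitude in cost" per nine is a rule of thumb, but the downtime side of the trade-off is plain arithmetic: each extra nine cuts the allowed downtime by a factor of 10. A minimal sketch, assuming a 365-day year (the function name is just for illustration):

```python
# Downtime budget per year for a given number of "nines" of availability.
# Assumes a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime in hours/year, e.g. nines=3 means 99.9% availability."""
    unavailability = 10 ** (-nines)  # 3 nines -> 0.001 of the year may be down
    return HOURS_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** (-n))
    # Show exactly as many decimals as the availability figure needs (99, 99.9, ...).
    print(f"{availability:.{n - 2}f}% -> {downtime_hours_per_year(n):.2f} hours/year")
```

    By this arithmetic, 99.9% still allows about 8.76 hours of downtime a year, so a single outage of a few hours fits within a three-nines budget.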

koromak 10 months ago

This isn't a billion-dollar company trading on the NYSE. It's a free website to play chess.