I hope this won’t be counted as some form of self-promotion, even though I am sharing a post from my own blog.

As a tech worker in a Cloud shop, I wanted to elaborate on the many reasons why I find working with Clouds terrible, from multiple points of view.

I tried to organize my thoughts in a (relatively long) post, which covers both technical and political aspects (the two are closely related).

I am sure many people will have different perspectives, and this could also be a nice prompt for a discussion.

  • Tja@programming.dev
    6 months ago

    The only problem is that the single instance also has 20 scenarios (and keeps the 2 as well), making it more brittle.

    A well-designed system removes points of failure; disk, power and network are the obvious ones. As long as you keep it Byzantine-safe, anything you add should be redundant, so if one component fails the system still runs. Ideally you remove all of them, but if there’s a hidden one left it’s still better than “the whole thing is a single point of failure”.

    • loudwhisper@infosec.pubOP
      6 months ago

      No, it’s not true. A single system has fewer failure scenarios, because it doesn’t depend on external controllers or on anything else that makes the system distributed and that can fail, causing a failure in your system (which may or may not be tolerated).

      This is especially true from a security standpoint: complexity adds attack surface.

      Simple example: a Kubernetes cluster has more failure scenarios than a single node. With a single node you have hardware failure, misconfiguration of the node, and network failure. With a Kubernetes cluster you have all of that for each node (each with potentially less impact, although that depends on things like stateful storage, and mitigating that introduces other failure scenarios as well). On top of that, if the control plane goes up in flames your node is useless, if the etcd data gets corrupted your node is useless, and anything that happens to the resources (a bug, a misuse of the API, etc.) can break your product. You have more failure scenarios because your product depends on more components all working at the same time. This is what it means that complexity brings fragility.

      Looking at it from the security side: an instance can be accessed only via SSH, so if you are worried about compromise you have essentially one service to secure. Once you run on Kubernetes you have the CI/CD system, the Kubernetes API, the Kubernetes supply chain, etcd, and, if you are in the cloud, plenty of cloud permissions that can indirectly grant access to the control plane and to a console. Now you need to secure 5, 6 or 7 entry points to a node.

      Mind you, I am not advocating against the use of complex systems; sometimes they are necessary. But if the complexity is not fully managed and addressed, you have a more fragile system. Essentially, complexity is a necessary evil that answers some other need.
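To put rough numbers on the failure-scenario argument above, here is a minimal sketch. It assumes independent failures and models the control plane and etcd as hard dependencies, the way the comment describes them; every availability figure is hypothetical and chosen only for illustration.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a system that needs ALL of its components up at once."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Availability of a redundant group that needs at least ONE member up."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figure: each component is up 99% of the time.
node = 0.99
control_plane = 0.99
etcd = 0.99

# A single node only depends on itself.
single_node = series_availability([node])

# Three redundant nodes, but the product also needs the control plane and etcd.
cluster = series_availability([
    parallel_availability([node, node, node]),
    control_plane,
    etcd,
])

print(f"single node: {single_node:.6f}")
print(f"cluster:     {cluster:.6f}")
```

Under these assumptions the redundant nodes are almost never the problem; the non-redundant dependencies added around them are. Making the control plane and etcd redundant as well changes the result, which is exactly the extra design effort and cost being argued about in this thread.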

      This is the reason why nobody would recommend to someone who needs to run a single static website to run it on Kubernetes, for example.

      You say “a well designed system”, but designing well is harder the more complexity exists, obviously. Redundancy doesn’t always work, because redundancy needs coordination, needs processes that also depend on external components.

      In any case, I agree that you can build a robust system within Cloud! The argument I am trying to make is that:

      • you need to be aware that you are introducing complexity that needs attention and careful design if you don’t want it to result in more fragility and exposure
      • you need to spend way more money
      • you need to balance the cost with the actual benefits you are gaining

      And mind you, everything you can do in Cloud you can also do on your own, if you invest in it.

        • loudwhisper@infosec.pubOP
          6 months ago

          I am specifically saying that redundancy doesn’t solve everything magically. Redundancy means coordination, and more things that can also fail. A redundant system needs more care, more maintenance, more skills, more money. If a company decides to use something more sophisticated without the corresponding effort, it’s making things worse. If a company with a 10-person department thinks that by using Cloud it can have a system as resilient as one built by 40 people, it is wrong, because it now has a system way more complex than it can handle, even though storage can be replicated easily by clicking in the GUI.

          • Tja@programming.dev
            6 months ago

            Redundancy should be automatic. Raid5 for instance.

            Plus cloud abstracts a lot of complexity. You can have an oracle (or postgres, or mongo) DB with multi region redundancy, encryption and backups with a click. Much, much simpler for a sysadmin (or an architect) than setting the simplest mysql on a VM. Unless you’re in the business of configuring databases, your developers should focus on writing insurance risk code, or telco optimization, or whatever brings money. Same with k8s, same with Kafka, same with cdn, same with kms, same with iam, same with object storage, same with logging and monitoring…

            You can build a redundant system in a day like Legos, with much better security and higher availability (hell, higher SLAs even) than anything a team of 5 can build in a week self-managing everything.

            • loudwhisper@infosec.pubOP
              6 months ago

              > Redundancy should be automatic. Raid5 for instance.

              Yeah, it should, but something needs to implement that. I mean, when distributed systems work, redundancy is automatic, but they can also fail. We are talking about redundancy implemented in software, and software has bugs, always. I am not saying that it can’t be achieved, of course it can, but it has a cost.
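As an illustration of the "something needs to implement that" point, here is a minimal sketch of the XOR parity idea that RAID 5 style redundancy relies on. It is a toy under stated assumptions, not how any real RAID implementation is written: the block contents are hypothetical, and parity rotation across disks, striping and write ordering are all left out.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks of one stripe, one block per disk.
data = [b"disk", b"one!", b"two?"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second disk: rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Even in this toy, the rebuild only works if every surviving block and the parity are consistent with each other, which is the kind of coordination that the redundancy layer has to get right.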

              > You can have an oracle (or postgres, or mongo) DB with multi region redundancy, encryption and backups with a click.

              I know, and if you don’t understand all that complexity you can still fuck up your postgres DB in a disastrous way. That’s the whole point of this thread. Also, operators can do the same for you nowadays, but again, you need to know your systems.

              > Much, much simpler for a sysadmin (or an architect) than setting the simplest mysql on a VM.

              Of course it is. You are paying someone else for that job. Not going to argue with that. In fact, that’s what makes it boring (which I talked about in the post).

              > Unless you’re in the business of configuring databases, your developers should focus on writing insurance risk code, or telco optimization, or whatever brings money.

              This is a modern dogma that I simply disagree with. Building a cost-effective infrastructure tailored to your needs (i.e., with all you need and nothing else) does bring money: it does so by saving costs and by avoiding spending an enormous amount of resources on renting all of that, forever, scaling with your business.

              > You can build a redundant system in a day like Legos, much better security and higher availability (hell, higher SLAs even) than anything a team of 5 can build in a week self-managing everything.

              This is the marketing pitch. The reality is that companies still have huge teams, still have tons of incidents, still take a long time to deliver projects, still have security breaches, but they are spending 3, 5, 10 times as much and none of that money is capitalized.

              I guess we fundamentally disagree. I envy you for the positive experiences you must have had!

              • Tja@programming.dev
                6 months ago

                That’s my whole point from the beginning, boring is good. Boring is repeatable, boring is reliable.

                Of course they still have huge teams. The invention of the automobile made travel easier, so more travel happened.