Dynamism is Cancer

Started by CultLeader, August 01, 2021, 04:23:19 PM


CultLeader

Today I want to talk about one of the most rotten cancers found in codebases today - dynamism. Usually, whenever someone introduces dynamism into a system, things become more complex, harder to reason about, and they inevitably fail in production like the crooked buildings they are.

The theoretical reason behind this is that dynamism introduces a lack of information into the system. If things are static, i.e. known beforehand, we can test the possibilities and anticipate the outcomes. Dynamism, on the other hand, by definition introduces ongoing change into the system, which usually cannot be tested or reasoned about. And flows involving dynamic, mutating data are very complex, and needlessly so.

Let's take apart a popular example - kafka schema services. Say there are many topics in kafka and each topic has a different type. How do we ensure all elements conform to the appropriate schema and we don't get surprises? The dynamism cancer answer to that would be the confluent schema registry https://github.com/confluentinc/schema-registry . I.e. a service that has all your schemas; you post to it and hooray, muh schema is online.

Okay, how do we check from the repository, where our code is, that we interact with the queues correctly? We have to query the production schema registry. Which, needless to say, is horrible, because it can be down, and our builds become nondeterministic and fail. Another issue, of course, is that the schema registry is limited to the schema formats it supports - avro, protobuf and json. And another cancer: since this is a generic service, it takes JSON for changing its data, which is a generic and error prone format. Could we, for instance, make the schema registry check that a kafka queue can be represented as a clickhouse table? Not really - we would need to do this by hand with a complex dynamic flow querying the registry.
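
To make the contrast concrete, here is roughly what that dynamic flow forces on us - a minimal OCaml sketch, where http_get stands in for whatever HTTP client you actually use and the registry hostname is invented:

(* Sketch of the dynamic flow: a CI check that has to ask a *live*
   schema registry whether our code still matches production.
   http_get is a stand-in for a real HTTP client. *)
let http_get (url : string) : (string, string) result =
  (* imagine curl/cohttp here; it can time out, return 500, or the
     registry can simply be down - and then the build fails, not the code *)
  ignore url;
  Error "registry unreachable"

let check_subject_against_registry ~(subject : string) ~(local_schema : string) =
  let url =
    Printf.sprintf
      "http://schema-registry.prod.internal:8081/subjects/%s/versions/latest"
      subject
  in
  match http_get url with
  | Error e -> failwith ("cannot even run the check: " ^ e)
  | Ok body ->
    (* now parse JSON, dig out the "schema" field, diff it against
       local_schema... every step is stringly typed and can surprise us *)
    ignore (body, local_schema)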

Someone might say, hey, this isn't that bad. Well, such people also say it is normal to spend years on data stream projects because, oy vey, they are so complex. No, they're not. Imbeciles doing them make them complex.

What is the static answer to this? DEFINE YOUR SCHEMAS AS DATA IN YOUR CODEBASE IN YOUR META EXECUTABLE USING THE PATTERN.
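
A minimal sketch of what that looks like in OCaml - the topic names and the column types are invented for illustration, the real meta executable would carry far more detail:

(* Schemas as plain data inside the meta executable - nothing lives in a
   remote service, everything is known at compile time. *)
type column_type =
  | Int64
  | Float64
  | Utf8
  | Timestamp
  | Nested of (string * column_type) list  (* the "exotic" case *)

type column = { col_name : string; col_type : column_type; nullable : bool }

type queue_schema = {
  topic : string;
  version : int;
  columns : column list;
  mirror_to_clickhouse : bool;
}

(* The single source of truth for every kafka topic we own. *)
let all_queues : queue_schema list = [
  { topic = "user_signups"; version = 2; mirror_to_clickhouse = true;
    columns = [
      { col_name = "user_id";   col_type = Int64;     nullable = false };
      { col_name = "email";     col_type = Utf8;      nullable = false };
      { col_name = "signed_up"; col_type = Timestamp; nullable = false };
    ] };
  { topic = "page_views"; version = 1; mirror_to_clickhouse = false;
    columns = [
      { col_name = "user_id"; col_type = Int64; nullable = true };
      { col_name = "payload"; col_type = Nested [ "url", Utf8 ]; nullable = false };
    ] };
]

Everything below is then just plain functions looping over that list.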

Now let's go over these problems if we simply define our schemas as data inside the same masculine plane executable, and see what that allows us to do:
1. Builds and tests are deterministic - we have the schemas in our codebase and can check them any way we want
2. We can write code that generates code for absolutely correct, typesafe interaction with the queues, where we never parse anything ourselves or even need to be aware of the underlying data formats
3. We can enforce logical, backwards compatible schema versioning for the queues
4. We can easily check whether a queue is appropriate to be reflected into clickhouse (checks like whether the columns are too exotic) - see the sketch right after this list
5. We don't have to deal with error prone JSON plaintext - our problem is represented as data in typesafe OCaml and we avoid countless issues
6. We can generate SDKs for mobile or typescript clients to interact with our queues via generated REST services, easily and without mistakes
7. We have no complex correctness-testing flows involving multiple production services that could be down - everything is tested locally, mostly in memory, hence less need to monitor for production failures
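
Point 4, for instance, becomes a plain function over the data from the sketch above (illustrative - "exotic" here just means whatever your clickhouse reflection cannot hold as a flat column):

(* Point 4 as a plain in-memory check over all_queues. *)
let clickhouse_representable (t : column_type) : bool =
  match t with
  | Int64 | Float64 | Utf8 | Timestamp -> true
  | Nested _ -> false

let clickhouse_errors (q : queue_schema) : string list =
  if not q.mirror_to_clickhouse then []
  else
    List.filter_map
      (fun c ->
        if clickhouse_representable c.col_type then None
        else
          Some (Printf.sprintf "topic %s: column %s cannot be mirrored to clickhouse"
                  q.topic c.col_name))
      q.columns

(* Run at build time, in RAM, over every queue we have - no services,
   no network, no surprises later in production. *)
let () =
  match List.concat_map clickhouse_errors all_queues with
  | [] -> print_endline "all clickhouse mirrors are representable"
  | errors -> List.iter prerr_endline errors; exit 1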

Actually, the benefits of knowing all schemas beforehand in the pattern are countless. We can solve all these problems with just a few loops over our data, checking it to any degree of correctness we could imagine. There is nothing hid. Imagine walking in absolute darkness, not knowing at compile time what trees are around you (dynamism), versus walking when the sun is shining and you know everything beforehand at compile time. This opens a completely unexplored space of countless possibilities.
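
Point 3 from the list above is just another such loop. A sketch building on the same types, with a deliberately simplistic rule - a new version of a topic may only add nullable columns, never drop or retype existing ones:

(* Simplistic backwards-compatibility check between two versions
   of the same queue schema. *)
let compatible ~(old_schema : queue_schema) ~(new_schema : queue_schema) : bool =
  let find name cols = List.find_opt (fun c -> c.col_name = name) cols in
  let old_columns_preserved =
    List.for_all
      (fun oc ->
        match find oc.col_name new_schema.columns with
        | Some nc -> nc.col_type = oc.col_type
        | None -> false)
      old_schema.columns
  in
  let added_columns_nullable =
    List.for_all
      (fun nc ->
        match find nc.col_name old_schema.columns with
        | Some _ -> true
        | None -> nc.nullable)
      new_schema.columns
  in
  old_columns_preserved && added_columns_nullable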

Why do companies store and cherish customer data and value it so much? Because it gives great insights about the customers, which we can use to better our business. So why do we treat our code metadata, our schemas, our types like garbage? Why do we scatter our precious data so far apart that we cannot use it together as a whole to draw conclusions about our codebase itself? Why is it shoved into some third party component that we cannot control and cannot efficiently query to check whether it aligns with our codebase? And why do we only monitor such a component when it is too late, something is broken and people are sweating in production, because we could not check it beforehand?

Crapware such as the confluent schema registry shouldn't exist. Same with kubernetes yamls, which ought to be services declared in our meta executable, where we could enforce the way they interact with the system with absolute typesafety. Same with chef and its server full of error prone, non typesafe json roles, where you can put anything you want in that json and nobody checks that any recipe actually reads those attributes - you only find out in production. Dynamism is cancer: it breaks the system apart, divides data from data and cripples us so that we can no longer use our metadata in the incredibly powerful ways the pattern allows. All configuration ought to be defined in a typesafe programming language (I prefer OCaml) where we cannot represent illegal states, or where illegal states are caught when we test the data in RAM for consistency with the system.
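
To make that concrete, here is the same move applied to service configuration - a sketch with invented names, where the illegal states simply cannot be written down and the rest is caught by an in-RAM check:

(* Instead of a yaml blob where any key can hold anything, make illegal
   states impossible to express. All names here are invented. *)
type port = Port of int

let port n =
  (* smart constructor: refuses nonsense values outright *)
  if n >= 1 && n <= 65535 then Port n
  else invalid_arg (Printf.sprintf "not a valid port: %d" n)

type healthcheck =
  | Http_get of { path : string; port : port }
  | Tcp_connect of { port : port }
  (* there is no "no healthcheck" case - every service must declare one *)

type service = {
  name : string;
  replicas : int;
  listen : port;
  healthcheck : healthcheck;
}

let services = [
  { name = "billing-api"; replicas = 3; listen = port 8080;
    healthcheck = Http_get { path = "/healthz"; port = port 8080 } };
]

(* The kind of check a yaml file will never give you: runs in RAM,
   before anything is deployed anywhere. *)
let () =
  List.iter
    (fun s ->
      if s.replicas < 1 then
        failwith (Printf.sprintf "service %s would deploy zero replicas" s.name))
    services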

So, moral of the story today:
1. Run from JSONs
2. Run from YAMLs
3. Run from dynamic configurations whenever possible
4. Dynamism is cancer and breeds insane complexity into the simplest problems

Have a good one bois.