Part 6 - Assumptions

Started by CultLeader, June 04, 2021, 10:19:20 AM



I want to talk about a simple fact that permeates all of software, and all of life. It is illustrated below.

The more assumptions you make, the simpler your code can be. The fewer assumptions you make, the more complex your code becomes.

Consider two solutions for a function that reads a file into a string of bytes. For me, such a function is a telling phenomenon. Many programming languages, like Java, did not have a single function that could read all file contents into a byte string back in the day. You had to use the Apache Commons jar.

Why is that? Such functions make development much easier. Many answers on Stack Overflow suggest creating a buffered reader and then spinning a loop - why all this complexity?

The knee-jerk reaction to this is the usual saying: "well, what if the file is 100 exabytes, you couldn't fit such a file in memory". That is true, we could not read an exabyte file into memory. But how often do we do that? If we need to manage that much data in a single file, SQLite is a better option anyway. The files we usually need to read - a source file, a secret, a json file - are usually tiny. Why complicate the 99% of cases, where the file is indeed small and the function to read it can be super simple, with the needless abstraction of reading it through a stream just in case it is exabytes?

Reading a stream is a generic (there is that evil word again), low-level, complex solution. If the solution is specific to small files, then it is very simple - just a single function: read the whole file into memory and be done. Another thing: Java cannot assume the file's encoding. So you have to decode that stream of bytes, say via UTF-8, into Java's string, whose characters are ironically two bytes each. What if we eliminate that assumption too? UTF-8 is a universally supported format; if we assume it, our read-file-to-string function becomes simpler again.
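With those two assumptions in place - the file fits in memory, the bytes are UTF-8 - the whole thing collapses into a one-liner. A minimal sketch in OCaml (assuming the 4.14+ stdlib, which has In_channel):

```ocaml
(* Assumes: the file fits in memory and its bytes are valid UTF-8.
   No buffered readers, no loops - one function call. *)
let read_file_to_string (path : string) : string =
  In_channel.with_open_bin path In_channel.input_all
```

That is the entire "library". Compare that with spinning up a buffered reader and accumulating chunks in a loop for a 200-byte secret file.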

You see how we made two assumptions and both our problem and our solution became simpler?

Infra management

The current software development world is full of such complexity due to a lack of assumptions (a lack of information). For instance, take the infra management tool Chef, which I utterly disrespect - I think every company is better off writing their own in OCaml. It has folders for files and templates: every Chef cookbook has a default folder, and then it may have windows/linux/solaris and whatnot folders for templates specific to each platform. What if the company only uses one specific version of Linux? All of that becomes useless complexity. Since Chef is a generic, used up, smurfed out, whorish solution that does a kinda crappy job at everything - jack of all trades, master of none - it also brings all the complexity of every hipster company into one project for everyone to bear. Same with Kubernetes complexity.

I have heard arguments saying "Well, would you rewrite Chef yourself? Do you know how many man-hours were put into that thing?". Yes, I would. I would implement only the part I need, as below.

These are the assumptions I would make in the infrastructure:

1. Use a certain version of Debian
2. All nodes must have docker installed
3. Any component that is needed - postgres/kafka/clickhouse/prometheus, you name it - is installed via docker
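Under those three assumptions, "installing" a component degenerates into generating a docker run invocation. A sketch of the idea - the image names and ports below are hypothetical, chosen just for illustration:

```ocaml
(* Assumption: every component runs as a docker container on a debian node. *)
type component = Postgres | Kafka | Clickhouse | Prometheus

(* Hypothetical images and ports - placeholders for illustration only. *)
let docker_run_command (c : component) : string =
  let name, image, port =
    match c with
    | Postgres -> ("postgres", "postgres:15", 5432)
    | Kafka -> ("kafka", "bitnami/kafka:3.4", 9092)
    | Clickhouse -> ("clickhouse", "clickhouse/clickhouse-server:23.3", 9000)
    | Prometheus -> ("prometheus", "prom/prometheus:v2.43.0", 9090)
  in
  Printf.sprintf "docker run -d --name %s -p %d:%d %s" name port port image
```

No package repositories, no platform folders, no cookbooks - one variant type and one string to hand to the shell.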

What I don't need to implement from Chef this way in my OCaml infra management tool (which I did implement for myself, by the way):
1. Support for any OS other than Debian Linux
2. Adding certain Debian package repositories for, say, the newest postgres - just run that in docker
3. All the kitchen insanity - I'd turn component configs into data and analyze the data of the configs instead, using the pattern
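Point 3 - configs as data - can be sketched like this. The record fields below are hypothetical, but the idea is real: because the config is a typed value rather than a text template, you can analyze it before anything touches production:

```ocaml
(* A hypothetical typed config instead of a stringly-typed template. *)
type pg_config = {
  port : int;
  max_connections : int;
  shared_buffers_mb : int;
}

(* Render the typed data into the actual config file contents. *)
let render_pg_config (c : pg_config) : string =
  String.concat "\n" [
    Printf.sprintf "port = %d" c.port;
    Printf.sprintf "max_connections = %d" c.max_connections;
    Printf.sprintf "shared_buffers = %dMB" c.shared_buffers_mb;
  ]

(* Analysis over the data: catch nonsense before deployment. *)
let validate (c : pg_config) : (unit, string) result =
  if c.port < 1 || c.port > 65535 then Error "port out of range"
  else if c.max_connections <= 0 then Error "max_connections must be positive"
  else Ok ()
```

The template engine, the attribute precedence rules, the kitchen test harness - all of it replaced by a record, a render function and a plain function over data.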

What I'd improve upon:
1. Maybe implement iptables rules if I wanted to restrict who talks with what
2. Typesafe roles, unlike the raw jsons that Chef uses, with which it is very easy to make a typo that will end up in production
3. A typesafe, no-nulls-allowed function for installing a generated infra file to the filesystem
4. Systemd functions to, say, install a certain systemd unit for things like vector to collect logs
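Points 2 and 4 together, as a sketch: roles are a closed variant type (a typo is a compile error, not a production incident), and installing a systemd unit is just rendering a string and writing a file. The unit contents and path below are hypothetical, not taken from any real deployment:

```ocaml
(* Roles as a closed variant: a typo like "databsae" cannot compile. *)
type role = Database | MessageQueue | Monitoring

(* A hypothetical minimal systemd unit for a log collector like vector. *)
let vector_unit_contents : string =
  String.concat "\n" [
    "[Unit]";
    "Description=vector log collector";
    "[Service]";
    "ExecStart=/usr/bin/vector --config /etc/vector/vector.toml";
    "Restart=always";
    "[Install]";
    "WantedBy=multi-user.target";
  ]

(* Typesafe install: no nulls, labeled arguments, writes the whole file. *)
let install_unit ~(path : string) ~(contents : string) : unit =
  Out_channel.with_open_bin path
    (fun oc -> Out_channel.output_string oc contents)
```

That is the entire "systemd support" an infra tool under these assumptions needs: render a unit, write it, then `systemctl daemon-reload` and enable it.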

Infra management tools ought to be child's play. There is nothing inherently complex about them: generate some files, write them to the filesystem, maybe restart a systemd service or spin up a docker container with mounted directories. These things will never be as complex as, say, doing 3D vector math for a game, or developing a relational database or a programming language. Yet the people who write infra management tools, since their task is so simple and they want to pretend that it's not, needlessly complicate things for themselves by making what they write as generic as possible for every use case under the sun, hence dragging huge complexity into otherwise very simple software.

You see, you don't get the best and most ingenious people working on tools that write some files to the filesystem, spin up a few systemd services and run a few docker containers. Just as we discussed in an earlier post that nothing is equal, the brilliance of developers working on different problems in the industry is not equal either. The most brilliant devs are likely working on games, databases, programming languages, theorem provers and so on. That is where the smartest minds are. In web development you have much lower quality people, where brilliance is not required for generating web pages while querying a database. Same with frontend, and same with infra management tools.

Just the fact that there are really only three choices in infra management today - namely Chef, Puppet and Ansible - and that two of them are developed in Ruby and Ansible in Python, speaks volumes about how lowly esteemed, underdeveloped and uncompetitive that niche is. Infra management is the Africa of software development. Writing code that could potentially bring hundreds of production machines to a standstill in a language which you MUST RUN in order to see whether your code even works - that's a recipe for disaster. And proponents of such tools suffer from a deep mental disorder. Using raw jsons to describe roles, where you can make any sort of mistake, instead of a typesafe struct in a programming language like OCaml, where illegal cases are mostly unrepresentable, is absurd.
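To make that last point concrete, a sketch of the difference (the role names here are hypothetical): with raw json a typo is a runtime surprise on a production box; with a variant type the only place a mistake can sneak in is the single explicit boundary where untyped external input is parsed, and everything past it is checked by the compiler:

```ocaml
type role = Database | WebServer

(* The single fallible boundary: parsing untyped external input. *)
let role_of_string (s : string) : (role, string) result =
  match s with
  | "database" -> Ok Database
  | "web_server" -> Ok WebServer
  | other -> Error (Printf.sprintf "unknown role: %s" other)

(* Inside typed code, matches are exhaustive - forgetting a case is a
   compiler error, not a silent failure on a production node. *)
let describe (r : role) : string =
  match r with
  | Database -> "runs postgres in docker"
  | WebServer -> "serves http"
```

In a raw json role file, `"databsae"` happily deploys; here it is rejected at the one parse site, and a misspelled constructor anywhere else simply does not compile.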


The more assumptions you make for the specific solution you want to implement, the simpler your solution will be. You don't always have to fear the "such and such pile of crap has been developed for 10 years - we could never implement it ourselves" argument. Sure, I wouldn't implement something like Postgres on my own; it is a very powerful database that can cover anything you need. But for weekend projects like Chef, which doesn't really do anything at all besides writing files to the filesystem, it is much easier to implement a trivial solution yourself (with, say, the rock-solid typesafety of OCaml and the pattern ;) ) and only build the features that you actually need. Overall you will have very simple, specific code that is easy to maintain and does not carry all of Chef's complexity either.