Recent Posts

Pages: [1] 2 3 4

The pattern V2 / Eden DB Improvements 3

« Last post by CultLeader on April 27, 2024, 10:29:26 AM »

Sup bois, I've added extremely powerful feature into EdenDB: OCaml data modules. What this allows to do is to define data in OCaml using typesafe generated constructs.

I did this a while ago in this commit https://github.com/cultleader777/EdenDB/commit/9270c3e42ab28410e8ba3018f76b4eea0b586da4 , just didn't get around to describing it yet until now.

Motivation? Eden data language is fast to parse but will never be as powerful as standard typesafe programming language.

For instance, if I want to define 10 similar nodes in a cloud, doing it with OCaml is trivial, run a loop and you're done.

People write an utter abominations like jsonnet, or they add hacks like count variable in terraform or for_each in terraform.

In reality we just need a good programming language to define our data and we don't need to learn useless garbage like jsonnet, or how to program in terraform (which is nightmare). Terraform output is just an assembly compiler later, we don't deal with it directly.

Here's an example.

Say, we have main.edl file

Code: [Select]

DATA MODULE OCAML "some-module"

TABLE test_table {
    id INT PRIMARY KEY,
    some_text TEXT DEFAULT 'hello',
}

And we have OCaml file at some-module/bin/implementation.ml

Code: [Select]

open! Context
open! Db_types

let define_data () =
  mk_test_table ~id:123 () |> def_test_table;
  mk_test_table ~id:123 ~some_text:"overriden" () |> def_test_table;
  ()

So OCaml allows us to define table data in OCaml. We didn't write mk_test_table function (generated). define_data function is required in the generated code. There's a some_text column that has default value if unspecified. If your OCaml file implementation mismatches schema it's a compile time error.

To someone who's used to the standard implementation of compilers that do very boring stuff, linking executables from explicit outputs this might not be trivial how it works. In fact, I don't recall any compilers that do anything similar to what is done here. If codegen step is performed, it is usually performed in some Makefile task before compilation.

So, this is how it works:
1. EdenDB compiler parses all .edl sources with includes
2. We first process table schema and their interconnections, any error there (non existing foreign key column and etc.) is EdenDB compiler error
3. Now we know all the schema that will be in EdenDB, we just didn't insert any data yet and didn't check foreign keys, constraints and etc.
4. At this point we actually run OCaml codegen for the OCaml data module defined at some-module.
We generate: dune, dune-projec, context.ml, context.mli, db_types.ml, implementation.mli, main.ml and dummy implementation.ml if it doesn't exist yet.
User is responsible for modifying implementation.ml file with his data, he can have other OCaml files with utility functions and everything.
We assume that we use dune for OCaml.
5. Now we insert the EdenDB data that is defined in EDL
6. At this point, EdenDB compiler runs dune exec for data modules, reads json dump of defined data, and inserts data into EdenDB state according to the schemas
7. The rest of EdenDB checks are performed, like checking duplicate primary keys, foreign key existance on all of the data, whether it is defined in EDL, Lua or OCaml

I haven't seen any compiler do something like that so I thought it is interesting design. And I'm 99% sure, being 70k lines of code into Eden platform that this is how I will define my data at the end of the day.

So, Current eden platform pipeline:
1. Run EdenDB compiler, which runs OCaml data modules and defines data of all of our infrastructure. Basic database correctness checks are performed here
2. Eden platform analysis step, here we perform all the checks regarding our infrastructure, say if defined node ip belongs to the DC subnet, that we have 3 or 5 consul servers per DC and so on
3. Codegen step runs which compiles the rest of our project, including provisioning, postgresql migrations, nix configs, and gazillion of other stuff. There are files that user can edit manually in Eden platform in their project directory (mainly business logic of EdenDB applications).

Have a good day bois!

My Rants / ChatGPT - most overrated programming tool of all time

« Last post by CultLeader on February 17, 2024, 11:11:23 AM »

Working on Eden platform (at this point I have AWS and Google cloud talking together, and I did bunch of internal refactorings, NixOS root on zfs and etc. anyway, 5 more clouds to go until release), I thought, could ChatGPT ever be productive enough in maintaining infrastructure? Not that I haven't heard of it, I'm user of ChatGPT it as everyone else, but when I had actual problems in infra, I even tried giving it as much context as I could and it was next to useless.

In linux when running virtual machines, by default iptable rules apply to the vm guests, which should be disabled if you want your sanity. I've been debugging this for a day, and asked ChatGPT to guide me, with logs all server topolgy and contexts. For crying out loud, I've even pasted in the culprit - the ipables rules which were blocking the traffic and it didn't suggest me that iptable rules were at fault.

When I think about it, programming is extremely precise. Having never used FRRouting before to implement routing for Eden platform it was quite a pain to get networking working across clouds. For google faggity cloud I've spent a lot of time debugging why in the world L2 frame doesn't travel between the hosts only to find out google implements its own faggity andromeda abomination network which doesn't even support L2 traffic. Meaning, in google cloud you have to rely on their slow router garbage which can never be adjusted realtime in terraform and they suggest your to run your own bgp node to interact with their faggity router! All because they don't do L2 traffic. I cannot spend bunch of time dealing with faggotry of google cloud so I simply enapsulated traffic between eden platform datacenters (which could be any datacenter implementation) in IP GRE protocol so it would fly as L3 traffic.

I will not treat any cloud specially and use their special faggotry nonsense just to make nodes across clouds talk when I have public ips in every cloud, so I should be able to implement my own routing, no? Google cloud faggots...

So, anyway, when doing this complex networking, usually, a single configuration misfire and traffic is gone. There's no leeway for error. I even had to redo networking, abstract it to have attributes for datacenters and I had to write configuration simulation in tests so now that I have got to a working state I could freeze the configuration to know for a fact I didn't break anything.

Maybe I just like the pain, but I can't wait until the day emerges where all of the issues I've been dealing with simply become compile time errors in high level Eden platform data and I don't have to deal with such low level details and can focus on finally shipping suite of applications I want to build on top of Eden platform.

ChatGPT, even given all of the context, if it makes mistakes in single place things don't work. And every time I used it to generate even skeleton terraform configs I always have to run it and debug. There was not a single time where I pasted code from ChatGPT and it would just work, except for simple queries like write lua function that sums 1 to 10. Apart from these toy examples AI is next to useless.

So the only use for AI seems to be when you need to write a polite email, generate a picture, rewrite some sentences - anywhere where precision is not important. In software engineering precision is crucial.

Let me ask you one thing, would anyone use a library that does a memory leak one in a million times? No, it would be dismissed as garbage. You have to rigorously test it to make it usable in real world in production.

Some indian faggot said with a smug face "there are no programmers in 5 years". Sure, I guess there will appear a batch of people blindly copy pasting things from ChatGPT and they'll eventually still need a programmer to fix/debug the mess which they have made.

AI coding monkeys think from a very narrow short perspective. For some reason, AI fags cannot imagine producing programs any other way than there are gazillion repositories with different programming languages and you MUST parse and process all of that context before you can contribute. Someone is doing JavaScript with NodeJS? Parse and analyze that faggotry! Someone is doing Ruby, their app is full of bugs? Parse and analyze this faggotry too!

What I do with Eden platform is radically different. I'm looking down like I mentioned in another post. My data is defined in tables and columns. All columns are sequential in column store provided by EdenDB so analyzing data is blazing fast and can be parallelized to all cores. All applications are in Rust (I'll add OCaml if there is ever NATS support added, or I might expose Rust C library for OCaml to talk with NATS). So, in Eden platform you just add high level data, it is all compiled and analyzed and every application interacts correctly from day one. There is no longer need for NodeJS faggots or ChatGPT. To tell you whats wrong or to write many tedious tests. Currently, when writing app in Eden platform literally 900 lines of Rust get generated and I write 100 remaining which interact in typesafe manner with the rest of the ecosystem, making me literally 10x developer. I don't fear to interact with database/queues or other apps even without tests.

That is much more useful than some garbage generated by ChatGPT which may or may not work (usually didn't work for me). I want things to work the first time, with compile time errors knowing everything will work. I'm not interested in some probabilistic vomit I need to debug again and again and again.

Even if I'd let ChatGPT to write documentation I need to proof read it again anyway, but hey, at least it sticks to word vomit now!

Now, I still believe Eden platform will drastically reduce engineering power needed, but it will not be by producing random vomit that doesn't work - it will be code generation and compile time errors before shipping to production. That will be the big firing of Ruby and JavaScript monkeys, not ChatGPT.

Have a good day bois!

Development Tips / Feature velocity

« Last post by CultLeader on November 28, 2023, 07:21:40 AM »

Today I want to talk about pain that a lot of engineers cause to themselves.

I will tell you right now - no company at a high level gives a tinyest crap about implementation details of any component you create. Someone might say I'm saying that the sky is blue. Apparently not, because the world is infested with yaml abominations and NodeJS.

Let's pick example, there are two NodeJS services. There's a mismatch in how one service calls another and its an error. It turns into a Jira ticket.

Project manager asks what's wrong with that? Engineer says "well, one NodeJS service doesn't know schema of other service, we need to run tests". Poor project manager has to nod head and say "ok ok".

This is bullshit excuse. Your system, oh fellow engineer, who you came up with of hundreds of NodeJS microservices is an abomination and you should feel bad. You're an idiot and it is your fault. When you think about it, if both services were in same executable, in Rust, it would be just compile time error and you'd have to fix it to ship. When services are separated, the exact same error for some reason becomes okay now!

Anytime such excuse comes to light "we can't verify kubernetes deployment yaml hence it errors out know because our helm chart repo is abominable pile of yamls that doesn't work together" its the fault of idiot engineers.

Anything that gets in the way of the high level goals of adding features to make your customers happy is the fault of the engineers. You picked Ruby on Rails garbage and now suffer many production runtime errors because of that? Its your fault. Your system is a repository of crappy incoherent yamls that have no typesafe enforcement so you resort to test environments to catch such errors? It is your fault. You can't deploy new database in 5 minutes with HA, replication and backups and it doesn't interact with your app in a typesafe manner? It is your fault.

Pagan monkeys have this funny idea that every system should be open and you should be allowed to put anything in it, like dynamic type schema registration, dynamic execution hooks or whatever. "Great software is not built, it is grown!". Imbeciles. They're saying that because their mom was an open system, open to 100 different guys that is, and look at them now.

What I do with Eden platform is radically different. There are no incoherent yamls, there's a single source of truth database which can be defined in eden data language or lua (probably also with webassembly in the future if there's need). I already implemented AWS terraform code generation, which uses AWS transit gateway if you have more than one datacenter in AWS, but uses wireguard to connect to all other clouds. I'll implement 6 more clouds for the initial version and they'll all talk together via wireguard VPN with high availability and OSPF routing since day one (you can't deploy half-assed no production ready configurations in Eden platform).

You see, since I said from the very beginning, if something doesn't work in production in Eden platform its my fault as an engineer. It is much more difficult to catch all errors during compile time than just leave users swearing and sweating in production due to runtime errors caused by incorrect configurations. Now I have no excuse like today's yaml fags. So I can actually solve these problems by analyzing data in eden platform compiler which tells me within second if my data is wrong. It tells me instantly with compile time error, that I deployed all DB instances only in single datacenter, which should be scattered around datacenters in region to tolerate DC loss. It tells me instantly if query application is using is no longer correct because schema changed. It tells me instantly if one frontend app link no longer can reach other frontend app page during compilation.

So all of these just become compile time errors. So guess what happens with feature velocity? Since all of the previous excuses become compile time errors there's no reason why you couldn't implement REST call to another service in 5 minutes - compiler tells you instantly schema is changed or is incorrect because we took responsibility. Compiler will tell us instantly if prometheus alert is using non existing series. Compiler tells us instantly that we forgot to configure virtual router IP in certain datacenters and clouds where it is needed.

What an old experienced SRE had to do himself by knowing of the system now compiler tells us to do simply it has all the information together.

So what should you do for the rest of the time once you're using Eden platform (hope to release initial version next year finally)? You just develop features and everything works together from day one. And you can do work of 100 engineers alone now. I'm very excited to release this project next year as no one ever done this and it will radically change the way things are done now. Most yaml masturbating excuse engineers - you're likely to get fired soon.

Have a good day bois!

Development Tips / Looking down

« Last post by CultLeader on August 30, 2023, 05:53:50 PM »

For my thoughts are not your thoughts, neither are your ways my ways, saith the LORD. For as the heavens are higher than the earth, so are my ways higher than your ways, and my thoughts than your thoughts. - Isaiah 55:8-9
And he said unto them, Ye are from beneath; I am from above: ye are of this world; I am not of this world. - John 8:23
I am Alpha and Omega, the beginning and the ending, saith the Lord, which is, and which was, and which is to come, the Almighty. - Revelation 1:8

Let's talk about two directions software is developed in this world. Say, you develop an open source library. You're in for the world of hurt. How many different people in different contexts would try to use your library? Some idiot in golang book (yes, I've read a golang book just to hear them out, that reassured me even more that golang is trash and one day I might write dedicated post roasting golang) claims that if you want to make useful software it has to be generic. There is that evil word again.

The truth of the matter is if you make stuff generic your things are inherently limited. You are looking up. Looking up means like looking up to the sky, whether it will rain or not, you don't control that. The people that use your library are the ones from the higher context, in a specific organization looking down into your library whether it is usable in this context or not. Long story short, users of your library have all context to use it so they have much easier time to decide what to do. They don't have to use your library, for specific case they could develop their own custom component.

As soon as you put yourself into context of what you want to achieve you start looking down. You have full control of your project. Well, I guess someone could say that as I'm developing Eden platform (I'm roughly 60% there I think, developing infrastructure tool that has cloud/provisioning/versioning/logging/monitoring/ci/cd/security/applications/load balancing/databases/queues and etc. integrated together is quite a lot of work, believe me) I am imagining all the context.

In Eden platform everything is assumed, private ip scheme is assumed. Routing amongst subnets is assumed. I assume datacenters will have certain ip prefixes, like 10.17.0.0/16 for one datacenter and 10.18.0.0/16 for another datacenter. Hence the routing is simple. I assume datacenter can have up to 65k hosts, all inside /24 subnets inside datacenter like 10.17.1.0/24 for one subnet and 10.17.2.0/24 for another. I assume every datacenter has at least two wireguard gateways to connect to all other datacenters which is checked at compile time and you must fix it to ship. I use OSPF in FRRouting to route across subnets and regions and I don't need anything else.

So, developing Eden platform forced me to think in global ways of how one organization can develop everything as a single unbeatable monolith which supports infrastructure of around eight million hosts. I am thinking globally. And that required some refactorings on my part early, to figure out how regions, datacenters and interconnections between datacenters play together for all infrastructure. I was forced to go through those decisions early, not when you need to expand to multiple datacenters in production and nobody has clue on how to do that and then it takes years because of poor early decisions.

The inferior alternative to looking down is to look up, be someone's bitch. For instance, I could have used something like dnscontrol for DNS and pray they support such and such provider for all dns records. Well, that's not good enough if you want to make all assumptions and have all control. So guess what Eden platform does? That's right, we run our own DNS servers, we use rock solid BIND that stood a test of time, hence we can look down. We do our own routing. We do our own builds and rollouts with rock solid and reproducible Nix build system.

I'm not yet into this at the time, but plan is also to just generate all the terraform code needed for every datacenter. Say, you'll specify in datacenter `implementation` column that aws is the implementation and then Eden platform will generate all the terraform for all the machines in their appropriate subnets which obey Eden platform ip scheme and they'll be automatically connected via wireguard to the rest of datacenters of Eden platform, which might be on premise, google cloud azure and etc. And if more than one AWS datacenter exists they'll be connected with native AWS transit gateway instead of wireguard. We can do tricks like that just because we have all the information about our intrastructure, and don't have to limit ourselves with wireguard across datacenters.

It's a lot more work, than say, some absolutely worthless project that I don't understand why anyone would use - FlywayDB. Typical Java crap that... Executes schema migrations. And a lot of people were convinced to use this nonsense. Eden platform rolls its own schema migrations because it's very simple to do BEGIN; execute all migrations; COMMIT ourselves. Database already provides these abstractions that make it easy. However, if key value store was used it would be much more complex.

So, I do not want to release half assed trash like FlywayDB which does one thing and you need to integrate it to everything else. For Eden platform to be useful it must connect and make everything to work together from day one. Eden platform should be all that you'll need, where everything is integrated and working together since day 1, just like Apple strives to do. I want to make assumptions about all the inrastructure, hence we must take control of all the components. When we take control of all the components we are looking down and our job becomes easier, while if we were looking up, such project as Eden platform would be practically impossible.

Man is taller than a woman and looks down on a woman. Say many men use the woman, like the library we talked about. How do you call such a woman that many men looked down upon and used her? We call her a harlot, and rightfully so.

In a perfect design for which we strive, if you look up you can only look up into one direction, have only one master to whom the component will fully submit and not be influenced by any other component. No woman can ever be shared between two men, it is an abomination.

Of course, Eden platform uses third party components that are generic, used up, and because they are not aware of the certain context, suboptimal in certain places. This is just the beginning to get things up and running.

Once Eden platform is built and running in production, then we can allow ourselves to have custom, better optimized components in certain context where generic ones are limiting. But that is very far into the future.

Long story short, if you want to make your life easy, do not be like an open source fag that desperately tries to support every new itch of platform under the sun. For instance, I'll likely release Eden platform only under NixOS because I consider other operating systems a waste of time. To make your life easy, start looking down, globally, about the global context of your app so you can make assumptions and you'll avoid so much trouble down the line where most other engineers will be bogged down by meaningless indecision inducing details.

Have a good day bois.

The pattern V2 / Eden DB Improvements 2

« Last post by CultLeader on June 07, 2023, 01:16:27 PM »

Sup guys, it's been a while you lurkers that read my posts but stay silent.

I'm working on Eden platform now, and basis of that is EdenDB, to which I make changes as I need them when working on eden platform. I've made quite a few improvements since last improvements post.

Ability to refer to other element's children through REF FOREIGN CHILD syntax

The need for this arose when I needed for abstractions, like NATS or PostgreSQL instances to specify on that volumes they reside.

Say this is a server table (simplified)

Code: [Select]

TABLE server {
  hostname TEXT PRIMARY KEY,
}

It has volumes as children (simplified)

Code: [Select]

TABLE server_volume {
  volume_name TEXT PRIMARY KEY CHILD OF server,
  mountpoint TEXT,
}

So say you have server-a with defined volumes

Code: [Select]

DATA server(hostname) {
  server-a WITH server_volume {
    pgtest1, '/srv/volumes/pgtest1';
  };
  server-b WITH server_volume {
    pgtest1, '/srv/volumes/pgtest1';
  };
}

And we need to define a postgres instance of a single cluster (simplified from real schema)

Code: [Select]

TABLE db_deployment {
    deployment_name TEXT PRIMARY KEY,
}

TABLE db_deployment_instance {
    deployment_id INT PRIMARY KEY CHILD OF db_deployment,
    db_server REF FOREIGN CHILD server_volume,

    CHECK { deployment_id > 0 },
}

So now we say that we want our foo logical database, with three replicas defined, one will be master with patroni and other replica.

Code: [Select]

DATA STRUCT db_deployment [
  {
    deployment_name: foo WITH db_deployment_instance [
      {
        deployment_id: 1,
        db_server: server-a=>pgtest1,
      },
      {
        deployment_id: 2,
        db_server: server-b=>pgtest1,
      },
    ]
  }
]

Notice, how we refer to the volume of `server-a` and `server-b` where we want data to reside using `=>` operator, which means child element.
We could nest this arbitrarily deep.

So, by using foreign child syntax syntax we do two things:
1. Specify which server runs the database (parent of the volume)
2. Specify where data on server runs

And, of course, you cannot define non existing elements in EdenDB, so you cannot make a typo of specifying non existing server or volume (unlike in typical yaml hell)

Full gist is here https://gist.github.com/cultleader777/41210e58026a0bec29f7e014945e40b0

Refer to the child element with REF CHILD syntax

Another issue I had working on eden platform is that I wanted to refer to the child element from parent element.
Say, server has multiple network interfaces, how to we specify which interface exposes ssh for us to connect and provision server with?

Say this schema (simplified from original)

Code: [Select]

TABLE server {
  hostname TEXT PRIMARY KEY,
  ssh_interface REF CHILD network_interface,
}

TABLE network {
    network_name TEXT PRIMARY KEY,
    cidr TEXT,
}

TABLE network_interface {
    if_name TEXT PRIMARY KEY CHILD OF server,
    if_network REF network,
    if_ip TEXT,
    if_subnet_mask_cidr INT,
}

So, network interface is a child of server and we can refer to the child from the parent so that we know on which network interface we ssh to every machine.

Code: [Select]

DATA STRUCT network [
  {
    network_name: lan,
    cidr: '10.16.0.0/12',
  }
]

DATA server(hostname, ssh_interface) {
  server-a, eth0 WITH network_interface {
    eth0, lan, 10.17.0.10, 24;
  };
}

Also, we can refer to child of arbitrary depth with the `=>` syntax from parent.

Code: [Select]

TABLE existant_parent {
    some_key TEXT PRIMARY KEY,
    spec_child REF CHILD existant_child_2,
}

TABLE existant_child {
    some_child_key TEXT PRIMARY KEY CHILD OF existant_parent,
}

TABLE existant_child_2 {
    some_child_key_2 TEXT PRIMARY KEY CHILD OF existant_child,
}

DATA existant_parent {
    outer_val, inner_val=>henloz WITH existant_child {
        inner_val WITH existant_child_2 {
            henlo
        }
    }
}

As usual, EdenDB fails to compile if existing child elements don't exist.

Full example here https://gist.github.com/cultleader777/d4f26449d2814a30d6b34e55c5d19c76

Detached defaults with DETACHED DEFAULT syntax

I had an issue of working on Eden platform that if default is defined in the database schema then it can't be changed by the user.
This is because eden platform schema resides in one set of files and user defines data in his own set of files.

Say, server has hostname and belongs to tld in the schema file

Code: [Select]

TABLE server {
  hostname TEXT PRIMARY KEY,
  tld REF tld DETACHED DEFAULT,
  fqdn TEXT GENERATED AS { hostname .. "." .. tld },
}

TABLE tld {
  domain TEXT PRIMARY KEY,
}

Now, it would be not nice to specify to which TLD server belongs in the default. Every user of eden platform will have its own domain.
So tld element is DETACHED DEFAULT. It must be defined by the user, and can only ever be defined once. If it is not defined, or defined multiple times it is compiler error.

Code: [Select]

DEFAULTS {
  // defines default for table 'server' and column 'tld'.
  // you cannot define defaults for non existing tables
  // and column must be marked as detached default
  server.tld epl-infra.net,
}

// now we can define data with detached default
DATA STRUCT server {
  hostname: server-a
}

DATA tld {
  epl-infra.net;
}

Full example here https://gist.github.com/cultleader777/3823ccef5c22b4b086c2468ab9e2e89c

And these are the main features I needed to add to EdenDB so far while working on eden platform compiler.

See you later bois!

Development Tips / Testing assembly much?

« Last post by CultLeader on March 12, 2023, 06:13:34 PM »

This is a long overdue post about assembly. Typically, what we know about assembly language today is that those are low level instructions that fiddle very low level operations with CPU registers, memory, interrupts and etc. In this day and age nobody but compiler writers care about assembly. But even then, if you write a compiler LLVM is available to you where you emit LLVM IR (intermediate representation) instructions and LLVM generates assembly code. It could be x86, arm, PowerPC or any number of supported architectures.

Whatever new assembly language comes out specific for certain chip only very few people of programmer population needs to understand that to port it to yet another platform and all that will be available for the rest of the programmers to use. Today, 99%+ developers don't care about assembly. It is good to understand what it is and how it works, to have conceptual understanding, but to actually fiddle assembly code by hand you either need to have extremely good reason or you're just pretentious or insane.

Why am I talking about this? As I'm developing Eden platform now I need to generate a lot of code. Be it Rust code, shell scripts, nomad job files - all of them are generated now. I don't write these by hand. Most people what is generated in my framework today need to write by hand, care about it, maintain it, fiddle with it, have tests for it.

What I see as inevitable future most of the code people write today will turn into assembly automatically generated without mistakes. No more need to maintain those, you just connect high level table data in eden data language and things will just work. You will not be able to misconfigure service with a bad dns name to connect a certain postgres instance like you can in a garbage yaml hell. You will not be able to operate in DNS names - DNS names to consul services will simply be an irrelevant implementation detail. Part of our generated assembly. What you want to say instead is that deployment of this service speaks to this instance of database which has this and this schema. You don't care about implementation details of DNS.

Today virtually everyone tests this higher level assembly. Is service running? Does it open a port? Does it respond to such and such http request with such and such response?

Of course, compiler writers also have huge test suites that test assembly, certain behaviours of generated code. In the beginning of GCC compiler there were probably lots of bugs, with time they became less and less common to a point where average user encountering a compiler bug is similar to a chance of winning a lottery. Usually, once GCC or rustc runs and compiler successfully emits output, you assume that basic behaviours are correct.

But how can that be? Rustc emit executables with assembly code! You would be crazy to rely on assembly? Since there are high level rules in Rust which are enforced with tests and we now mainly trust that compiler is reasonably correct we trust Rust programs much more than Javascript programs (if you have any sanity) even though both programs are running on the same CPU and they're running same assembly instructions underneath.

With the pattern, most yamls, infrastructure configurations that you'd need to maintain by hand become assembly. Most Rust application code not written by the user becomes assembly. Consul configs become assembly. Nomad configs become assembly. Vault configs become assembly. Postgres configs become assembly. SQL migrations become assembly. NATS configs become assembly. MinIO configs become assembly. Consul policies become assembly. Vault policies become assembly. Nomad job declarations become assembly. Server nix configurations become assembly. Application build scripts become assembly. Nginx configs become assembly.

Another property of assembly is that compiler optimizations change and assembly changes but code doesn't have to. Users don't care. Now at this point for instance I use Nix to provision all servers and have reproducible builds. It is my favorite OS and package manager which will make Eden platform rock solid. But hypothetically, what happens if one day for some reason I will no longer like NixOS as server operating system? I'll just change the assembly code generated for server provisioning and VM images built (user doesn't care and doesn't need to know) to something else in one commit and that's it. Assembly will never be the most important. Assembly is swappable. Assembly is an implementation detail.

And what is the future of assembly? It will become irrelevant. Needless to say there are lots of errors in Eden platform compiler I'm working on during development, but the more and more test written, the more mature this project will become it will be like GCC and Rustc - once compiler runs, tests assumptions/consistencies/port clashes/RAM boundaries/typesafety of applications/db query correctness of your entire infrastructure of 1000 servers and 1000 applications and emits source outputs in a few seconds for your entire infrastructure - you'll just assume things will work and deploy with a smile. Without spinning up thousands of machines for testing. Without spinning up thousands of services.

And what is the future of developers that today need to maintain such assembly? Check job postings for x86 assembly programmers and find out

Have a good day bois!

Natural Philosophy / "... there should be time no longer"

« Last post by CultLeader on January 04, 2023, 05:22:49 PM »

And the angel which I saw stand upon the sea and upon the earth lifted up his hand to heaven, And sware by him that liveth for ever and ever, who created heaven, and the things that therein are, and the earth, and the things that therein are, and the sea, and the things which are therein, that there should be time no longer: - Revelation 10:5-6

I'm at the point where I'm developing eden platform, which will be the first open source version of the pattern. I'll likely be able to release the early achy breaky version as a proof of concept in half a year.

I came across an interesting thing when developing this project. Now my infrastructure and apps are all data. And I picked nomad to be the scheduler, vault storage for secrets (I run full hashicorp stack and I think kubernetes is cancer). Same thing that happened with Java and C# happened with Kubernetes and Nomad. Java is a complete and utter garbage of a language, C# had the hindsight of all the mistakes Java made and used it to create paid but much more superior product. Kubernetes also is the first Java, yaml hell and Hashicorp used its bad practices to create much better and pleasant orchestrator to work with.

When I was developing the pattern of my infrastructure I had a very specific problem I needed to solve.

Nomad application needs to access a secret that depends on database existing. If you run infrastructure the first time, neither database nor application exists. We're still in the planning stage.

So, it would be very complex if first we deploy database, then generate passwords and only then put password to vault and application can access it. Nasty chain of dependencies.

Instead what I did, logical plan of all the applications is built in memory. Application states that it will want to connect to the database X. Then code is generated that secret simply exists there in the nomad .hcl job file. Then also code is generated to provision that secret, generate random 42 character string and put it in vault. Code is also generated to provision that database user to access it.

What typically happens in companies, you need access to the database. All things happen in chronological order:
1. Database is provisioned
2. User credentials are generated
3. Credentials are provided to the user
4. He uses the database

What, I did instead, everything is happening at once. Before even application is deployed it knows it will need so and so secret to access it. Before database is deployed it knows it will need so and so user for application which doesn't even exist yet. This way, with the pattern, I see everything at once. I see all the applications before they even exist. I see all the databases before they are even provisioned, and they perfectly work together the first time.

This is a true godlike power, knowing and having all information at once in one place in this small infrastructure world I am creating.

Now all these times where God predicted something will happen in the bible, why God has infinite power or how God can nudge anything to happen in his direction make perfect sense. God sees everything at once, there is no before and after in the eyes of the LORD.

But, beloved, be not ignorant of this one thing, that one day is with the Lord as a thousand years, and a thousand years as one day. - 2 Peter 3:8

LORD has the higher ways we cannot understand:
For my thoughts are not your thoughts, neither are your ways my ways, saith the LORD. For as the heavens are higher than the earth, so are my ways higher than your ways, and my thoughts than your thoughts. - Isaiah 55:8-9

God sees everything from above we can't see:
And he said unto them, Ye are from beneath; I am from above: ye are of this world; I am not of this world. - John 8:23

We, as flesh, think only of one dimensional world. What we see locally is what we react to, can see and touch. What we perceive as time is simply our restricted vision which is disconnected from the entire view of the universe that God has. One day I will also see as God sees, know as I am known of God.

For now we see through a glass, darkly; but then face to face: now I know in part; but then shall I know even as also I am known. - 1 Corinthians 13:12

So, like I mentioned, the pattern framework that I'm building will be the first of its kind. And as I mentioned, any design worth anything at all will have to inevitably resemble divine patterns found in heaven. This framework will have unlimited power within its own respective universe.

- We will know what ports are used for what service before infrastructure is even deployed. Port conflicts will be impossible, and we won't need to spin up machines to test this.
- We will know that too many services are scheduled on one node with memory requirements that go above bounds of machine.
- We will know that 100 applications will perfectly interact with binary queues with perfect typesafety before anything is ever deployed.
- We will know that queries of every application work and hit indexes before anything is deployed.

No more spinning up cluster of machines to see if things work together, all of that will become automatically generated assembly perfectly interacting with each other.

I'm very excited about the future of eden platform, and can't wait to release this project to the world. There is nothing like this.

Happy new year by the way!

Development Tips / The infinite whac-a-mole

« Last post by CultLeader on October 01, 2022, 11:24:46 AM »

Today, there are two ways to develop software under the heaven. Namely:
- A game of infinite whac-a-mole (virtually all of the software development today)
- Quadractic improvements (the pattern)

To understand today's game of infinite whac-a-mole let's create a matrix, rows are applications and columns are features

Code: [Select]

| application | monitoring | database access | queues | logging | proxying | load balancing | .. | feature 1000th |
|-------------+------------+-----------------+--------+---------+----------+----------------+----+----------------|
| app 1       |            |                 |        |         |          |                |    |                |
| app 2       |            |                 |        |         |          |                |    |                |
| app 3       |            |                 |        |         |          |                |    |                |
| ...         |            |                 |        |         |          |                |    |                |
| app 1000th  |            |                 |        |         |          |                |    |                |

This matrix is of many applications in rows interacting with the same features. There can be very many features and very many apps. In our imaginary matrix we have 1000 features and we have a thousand apps.

Today what happens in big companies, developers are being hired to whack moles in this table for some apps or features. Developers are being thrown to at this table, for instance, to build load balancers for the apps, to build persistence for the apps, or just develop apps and talk with infra if you need some components done.

Basically, developers can cover only small parts of this table. Imagine, performance team is created and they need to expose metrics for a certain application to collect them. They need to go through specific apps and mark only one square of this table as completed. Same with many other features. Sometimes improvements are quadratic, for instance, in kubernetes cluster logging is built once and all apps logs go to one place. Or, for instance, NixOS is a quadratic improvement because it gives reproducible builds if all the apps use it. But anyway, whenever your task is "go to such and such an app and fix this bug about such and such feature not working" - you're whacking one x on this infinite table. You're doing monkey's work putting out fires forever. For instance, some database query is not tested, returns null somewhere and system doesn't work and that's extra work. Or some golang garbage app leaks memory due to having very weak abstractions (don't forget to manually put your precious defer everywhere, faggots! I'll just use rust and don't do that at all and resources will be reclaimed automatically).

With the pattern we focus only on quadratic improvements. We do things once, and cover them for every app. We prevent entire class of bugs because we use only rust and ocaml. No nil pointer exceptions and no defer garbage like in golang. We cover database access for the entire column for every single app in the infrastructure - we know that queries that apps use are valid, backwards compatible and that hit indexes (no seq scans in production). We write once to check queries that apps use and we know all applications in our infrastructure will have the same property of only using valid queries that hit indexes. We write once prometheus exports as data for every app, then code is generated for that app to increment specific global variable about a metric and it will always work the first time for every app under the sun (if the prometheus variable is not used by the app that's a compiler warning as error and cannot end up in production). We have backwards compatible structs for every app and application can only use typesafe structures to access the queues and it will always work.

With the pattern, you typically do things once per feature column and all applications benefit. How many entries this matrix has, a million? Say it is million units of work. If one developer on average does 1000 units of work in a given year then you'd need 1000 developers to maintain all of this table. However, if you go quadratic, use the pattern, and you fill column for every application at once, that's only 1000 units of work and can be done by one developer in a reasonable amount of time. You see, when I said the pattern allows one person to do the work of 1000 developers I wasn't kidding. Square root of one million is one thousand, no? How much less work you'd have if you didn't have to test every application as thoroughly with integration tests and spinning up of machines? Knowing they interact with the rest of the system with perfect typesafety? Knowing your performance counters are automatically incremented when interacting with database or with queue? Counting every byte sent to queue or database? Knowing you automatically have monitoring of a thread for 100% of CPU usage of that thread? Knowing you have preplanned autoscaling policies in place ready?

That's why I don't consider big companies seriously. You'd think that people in google are supposed to be smart but golang is an utter half-assed trash. And that is easy to explain - they have thousands of developers to play infinite whac-a-mole. Tiny brain of an average googler never had to think about how to accomplish most work with the least hands. Most people who work there are a tiny cog in a wheel that do very specific small task and that are easily replaceable. Of course, just like in weight lifting, if you go into a work every day and only do ten sets with minimal weight your muscle won't grow. The only way to grow muscle is pushing it to the limit, doing few reps until failure with heavy weights you can barely lift.

The only way to reach infinite productivity is to be in a position where you're alone and you need to perform infinite amount of work. Like God is and God alone created everything:

Fear ye not, neither be afraid: have not I told thee from that time, and have declared it? ye are even my witnesses. Is there a God beside me? yea, there is no God; I know not any. - Isaiah 44:8

God of the bible doesn't know any gods beside him. Nobody helped him. Yet he alone had to create all the heaven and the earth and the sea and all that in them is. Leftist faggots, whenever they have their pathetic small work of being a kubernetes admin, they dismiss the ideas that replace them. Everything is relative! There's no panacea! Everything has pro's and con's! Yaml's are fine! Do more integration tests! We yaml maintaining monkeys are needed, pay us our salary! We need performance team! Ruby is fine, just write more tests and have more coverage! Many divided github repositories are fine, just do more integration tests! Hire more QA testers! When with the pattern when you go quadractic you can easily fire 90% of them and do things much faster.

Eventually 99% of developers will be easily replaceable and fireable, most work done today replaced with quadratic systems of the pattern. There will be only a need for a masculine plane developers that makes all things work together and understand how all things work through and through. Not some monkey who knows specific kubernetes yamls (which in the future will all be generated without mistake anyway with the pattern) and isn't capable of anything else. The developer that will be most useful in the future will be someone who is in God's image - knowing of how everything works in the system together. The rest will be people working in feminine plane who use the system in a typesafe way, filling in the blanks in the pattern system and hence will get much lower salaries due to much lower understanding requirements to do their job. Like in customer support where the vast majority of employees are women.

Just like when God created this earth - a huge framework where out of the box there is food, water, oxygen, sun for heat and energy, fertile soil, useful animals already and etc. - we just needed to use them without necessarily knowing intricate details of how everything works together.

Have a good day, bois!

The pattern V2 / Eden DB Improvements 1

« Last post by CultLeader on August 15, 2022, 07:59:30 AM »

Sup bois, some improvements were made to the EdenDB to allow it to be much more flexible than standard SQL.

Basically, I was experimenting of how to represent domain problems as data, and I had to start with the servers.

Code: [Select]

TABLE server {
  hostname TEXT PRIMARY KEY,
}

Then I wanted to solve the infinite problem - how to check that no server has duplicate ports.

But ports are used by many abstractions, say, docker containers, systemd services and so on.
How do we ensure that different abstractions that use ports do not take duplicate port on the same server?
I kinda fell little short.

Let's see how reserved port table looks.

Code: [Select]

TABLE reserved_port {
  number INT PRIMARY KEY CHILD OF server,
}

Port is child of server. Now, say, we have docker containers. How do we refer to the reserved ports?

Before improvements shown in this post this is invalid

Code: [Select]

TABLE docker_container {
  name TEXT PRIMARY KEY CHILD OF server,
}

TABLE docker_container_port {
  port_name TEXT PRIMARY KEY CHILD OF docker_container,
  reserved_port REF reserved_port,
}

We couldn't refer to a reserved port because there's no unique key.

But then I thought, wait a minute, if both elements are ancestors of server, why the uniqueness context for port must be global?

If you think about it, the uniqueness context for such table that is child of anything ought to be the common ancestor. And if we imagine that every tables common ancestor is imaginary root table, then it makes perfect sense - everything is unique in the root table context.

This is how current topology looks visualized:

So, basically, the concept of a foreign key would be much more useful, if REF to another table would refer to the uniqueness context of the parent of the referred table.

So, having many different abstractions we can reserve ports in the single server so they would never clash and we find out about this before production when some systemd service can't start!

Let's define some fictional docker container running on some server with reserved port:

Code: [Select]

DATA server {
  epyc-1
}

DATA STRUCT docker_container {
  hostname: epyc-1,
  name: postgres
  WITH docker_container_port {
    port_name: main_port,
    reserved_port: 5432
  }
}

This so far does not work because of an error that reserved port does not exist in server:

Code: [Select]

NonExistingForeignKey { table_with_foreign_key: "docker_container_port", foreign_key_column: "reserved_port", referred_table: "reserved_port", referred_table_column: "number", key_value: "5432" }

We must reserve the port also.

Code: [Select]

DATA reserved_port {
  epyc-1, 5432
}

Now stuff compiles. And we cannot add more `reserved_port` with same value or stuff fails, if we copy reserved_port statement once again, boom:

Code: [Select]

FoundDuplicateChildPrimaryKeySet { table_name: "reserved_port", columns: "(hostname, number)", duplicate_values: "(epyc-1, 5432)" }

However, there's an issue that we would need to repeat reserved_port statement along with defining docker containers. Which leads to another improvement done to EdenDB: add rows with lua.

Basically, instead of defining this data by hand, now we can run lua code and insert to any table we want:

Code: [Select]

INCLUDE LUA {
  data('reserved_port', { hostname = 'epyc-1', number = 5432 })
}

And, of course, you must refer existing table and you must only define existing columns or compiler yiels at you.

`data` internal edendb function implementation is so trivial that I'll simply post it here:

Code: [Select]

__do_not_refer_to_this_internal_value_in_your_code_dumbo__ = {}
function data(targetTable, newRow)
    local queue = __do_not_refer_to_this_internal_value_in_your_code_dumbo__

    if queue[targetTable] == nil then
        queue[targetTable] = {}
    end

    -- we simply accumulate values to insert in lua runtime and then process
    -- them in one go in rust
    table.insert(queue[targetTable], newRow)
end

We simply have a global queue of rows to be inserted from lua and we iterate this map and insert data defined in lua in one go.

So, now, to avoid repetition we can simply define lua function that defines a docker container by inserting into all appropriate places we want:

Code: [Select]

INCLUDE LUA {
  function docker_container(hostname, containerName, portName, portValue)
    data('docker_container', {
      hostname = hostname,
      name = containerName,
    })

    data('docker_container_port', {
      hostname = hostname,
      name = containerName,
      port_name = portName,
      reserved_port = portValue
    })

    data('reserved_port', {
      hostname = hostname,
      number = portValue,
    })
  end
}

Data order doesn't matter, all checks are processed only once all data has been inserted into tables.

And we just call this function once in lua of defining postgres container, while encapsulating complexity, and deleting previous data definitions for this docker container:

Code: [Select]

INCLUDE LUA {
  docker_container('epyc-1', 'postgres', 'main_port', 5432)
}

Of course, I wrote a terse funtion as an example, in production likely there will be named map arguments with many default parameters provided internally in the function to make this more practical.

To be more practical, we include lua files for library stuff (this feature already existed):

Code: [Select]

INCLUDE LUA "lib.lua"

Full working gist can be found here https://gist.github.com/cultleader777/175fb37cb2ce1c9cb317ce2d187a926b

So, these are the two recent improvements that gave eden data language quite more power:
1. allowing foreign keys to refer to other child rows in the entity
2. defining data from lua, you can script anything you possibly want

Good day bois.

The pattern V2 / Eden Data Language: introduction

« Last post by CultLeader on August 01, 2022, 05:35:15 PM »

https://github.com/cultleader777/EdenDB

So, this is a brand new language that focuses on... Simply defining data. Like YAML, but not garbage, typesafe, no nulls or NaNs, with schemas and blazing fast to query. Compiles and emits outputs in milliseconds and relentlessly checks logical errors to disallow funny stuff in production.

Let's have a taste, shall we?

Say we have a table of servers

Code: [Select]

TABLE server {
  hostname TEXT PRIMARY KEY,
  ram_mb INT,
}

They can have disks, as children

Code: [Select]

TABLE disks {
  disk_id TEXT PRIMARY KEY CHILD OF server,
  size_bytes INT,
}

Looks similar to SQL, who needs another SQL dialect? Ok, let's see how we can define server with disks, in three ways:

Code: [Select]

DATA server {
  my-precious-epyc1, 4096;
  my-precious-epyc2, 8192;
}

As you can see, much less verbose and straight to the point as opposed to

Code: [Select]

INSERT INTO ... statement.

There is no concept of insert statement because data is immutable.

Let's add some disks 1TB and 1.5TB disks that are children to server

Code: [Select]

DATA disks(hostname, disk_id, size_bytes) {
  my-precious-epyc1, root-disk, 1000000000000;
  my-precious-epyc2, root-disk, 1500000000000;
}

The column order

Code: [Select]

(hostname, disk_id, size_bytes) is explicit here but as we've seen with server default tuple order is assumed if ommited.

This is rather verbose of adding separate rows for every server we want into separate place. Also, we redundantly repeat disk hostname.

Could we have everything in one place, so that it would look structured?

We can, with the WITH statement, let's define the same but in more concise way:

Code: [Select]

DATA server {
  my-precious-epyc1, 4096 WITH disks {
    root-disk, 1000000000000;
  },
  my-precious-epyc2, 8192 WITH disks {
    root-disk, 1500000000000;
  },
}

And here's the magic. We can define child or foreign key elements with WITH keyword and key of parent element will travel to children. There can be many levels of children and WITH statement can also be nested arbitrarily deep. There's a bunch of logic to check for errors, you can read test suite.

But long story short, this is not the garbage of `.yaml` where you can write anything we want, then you run tests and pray it will work. All fields are checked of being appropriate type, all fields must be defined, no null values are allowed and all the stuff like that is checked. Consider this typesafety as strong as defining struct in Rust or OCaml and knowing it will never have nulls or funny values.

Also, there's another variation of `WITH` so that our elements begin to look completely like traditional nested maps in programming languages, this statement is also equivalent:

Code: [Select]

DATA STRUCT server [
  {
    hostname: my-precious-epyc1, ram_mb: 4096 WITH disks {
        disk_id: root-disk,
        size_bytes: 1000000000000,
    },
  },
  {
    hostname: my-precious-epyc1, ram_mb: 4096 WITH disks [{
        disk_id: root-disk,
        size_bytes: 1500000000000,
    }]
  }
]

So, we represented the same data in three ways and all this will end up as columns for us to analyze.

Aight, what can we do, how about our data correctness?

We can write SQL (thanks embedded sqlite) to prove anything we want about our data with good ol' SQL.

Let's write a proof that we don't have smaller disks than 10GB.

Code: [Select]

PROOF "no disks exist less than 10 gigabytes" NONE EXIST OF disks {
  SELECT rowid
  FROM disks
  WHERE size_bytes < 10000000000
}

How do we prove we have nothing of certain kind? By searching for it and finding none. This is important, because if this proof in eden data language fails, it will return rowid of the offending rows which can be printed out to user so he could figure out where to fix his data.

Okay, SQL proof might be a little overkill on this, we have also column checks (thanks embedded lua).

Here's how we could have defined the disks table to have same thing checked with lua:

Code: [Select]

TABLE disks {
  disk_id TEXT PRIMARY KEY CHILD OF server,
  size_bytes INT,

  CHECK { size_bytes >= 10000000000 }
}

Also, lets have a lua computed column for size in megabytes of the disk:

Code: [Select]

TABLE disks {
  disk_id TEXT PRIMARY KEY CHILD OF server,
  size_bytes INT,
  size_mb INT GENERATED AS { size_bytes / 1000000 },

  CHECK { size_bytes >= 10000000000 }
}

Lua snippet must return an integer or compiler yells at you. No nulls are ever allowed to be returned by computed columns either.

But wait, there's more!

Datalog is the most elegant query language I've ever seen my life. But for datalog to be of any practical use at all, we cannot have any queries that refer to non existing facts or rules. I've found a decent datalog implementation for rust https://crates.io/crates/asdi , unfortunately, it does not support `NOT` statements or arithmetic operators of less, more and etc. So, I'll show two datalog proofs against our data, one that works today (even though it is meaningless) and one that would be practical and would work as soon as asdi starts supporting less than operator.

Code: [Select]

PROOF "no disks exist less than 10 gigabytes, doesn't work today" NONE EXIST OF disks DATALOG {
  OUTPUT(Offender) :- t_disks__size_mb(Size, Offender), Size < 10000.
}

PROOF "just check that no disks with name 'doofus' exists, works" NONE EXIST OF disks DATALOG {
  OUTPUT(Offender) :- t_disks__disk_id("doofus", Offender).
}

As you can see, our computed column size_mb is available in datalog to query.

But wait, there's more! EdenDB exports typesafe Rust or OCaml code to work with. Basically, once everything is verified we can generate typesafe code and analyze our data with a programming language if we really want to perform some super advanced analysis of our data.

Let's run this:

Code: [Select]

edendb example.edl --rust-output-directory .

On my computer this compilation literally took 6 milliseconds (thanks rust!). It should never appear in a profiler in a compilation process, unlike most compilers these days...

Let's see the rust source that got compiled!

Code: [Select]

// Test db content
const DB_BYTES: &[u8] = include_bytes!("edb_data.bin");
lazy_static!{
    pub static ref DB: Database = Database::deserialize(DB_BYTES).unwrap();
}

// Table row pointer types
#[derive(Copy, Clone, Debug, serde::Deserialize)]
pub struct TableRowPointerServer(usize);

#[derive(Copy, Clone, Debug, serde::Deserialize)]
pub struct TableRowPointerDisks(usize);


// Table struct types
#[derive(Debug)]
pub struct TableRowServer {
    pub hostname: ::std::string::String,
    pub ram_mb: i64,
    pub children_disks: Vec<TableRowPointerDisks>,
}

#[derive(Debug)]
pub struct TableRowDisks {
    pub disk_id: ::std::string::String,
    pub size_bytes: i64,
    pub size_mb: i64,
    pub parent: TableRowPointerServer,
}


// Table definitions
pub struct TableDefinitionServer {
    rows: Vec<TableRowServer>,
    c_hostname: Vec<::std::string::String>,
    c_ram_mb: Vec<i64>,
    c_children_disks: Vec<Vec<TableRowPointerDisks>>,
}

pub struct TableDefinitionDisks {
    rows: Vec<TableRowDisks>,
    c_disk_id: Vec<::std::string::String>,
    c_size_bytes: Vec<i64>,
    c_size_mb: Vec<i64>,
    c_parent: Vec<TableRowPointerServer>,
}


// Database definition
pub struct Database {
    server: TableDefinitionServer,
    disks: TableDefinitionDisks,
}

// Database implementation
impl Database {
    pub fn server(&self) -> &TableDefinitionServer {
        &self.server
    }

    pub fn disks(&self) -> &TableDefinitionDisks {
        &self.disks
    }

    pub fn deserialize(compressed: &[u8]) -> Result<Database, Box<dyn ::std::error::Error>> {
        // boring deserialization stuff
        ...
    }
}

// Table definition implementations
impl TableDefinitionServer {
    pub fn len(&self) -> usize {
        self.rows.len()
    }

    pub fn rows_iter(&self) -> impl ::std::iter::Iterator<Item = TableRowPointerServer> {
        (0..self.rows.len()).map(|idx| {
            TableRowPointerServer(idx)
        })
    }

    pub fn row(&self, ptr: TableRowPointerServer) -> &TableRowServer {
        &self.rows[ptr.0]
    }

    pub fn c_hostname(&self, ptr: TableRowPointerServer) -> &::std::string::String {
        &self.c_hostname[ptr.0]
    }

    pub fn c_ram_mb(&self, ptr: TableRowPointerServer) -> i64 {
        self.c_ram_mb[ptr.0]
    }

    pub fn c_children_disks(&self, ptr: TableRowPointerServer) -> &[TableRowPointerDisks] {
        &self.c_children_disks[ptr.0]
    }

}

impl TableDefinitionDisks {
    pub fn len(&self) -> usize {
        self.rows.len()
    }

    pub fn rows_iter(&self) -> impl ::std::iter::Iterator<Item = TableRowPointerDisks> {
        (0..self.rows.len()).map(|idx| {
            TableRowPointerDisks(idx)
        })
    }

    pub fn row(&self, ptr: TableRowPointerDisks) -> &TableRowDisks {
        &self.rows[ptr.0]
    }

    pub fn c_disk_id(&self, ptr: TableRowPointerDisks) -> &::std::string::String {
        &self.c_disk_id[ptr.0]
    }

    pub fn c_size_bytes(&self, ptr: TableRowPointerDisks) -> i64 {
        self.c_size_bytes[ptr.0]
    }

    pub fn c_size_mb(&self, ptr: TableRowPointerDisks) -> i64 {
        self.c_size_mb[ptr.0]
    }

    pub fn c_parent(&self, ptr: TableRowPointerDisks) -> TableRowPointerServer {
        self.c_parent[ptr.0]
    }

}

To get rows from table we can only get them through TableRowPointer special type for every single table. This type is basically wrapped usize and is a zero cost abstraction in rust. But, if you were to use raw usize you'd have a chance of passing TableRowPointerDisks to the server table, which is impossible here. TableRowPointer is just an index to arrays of our column or row stored data.

There are only two ways we can get TableRowPointer for every table:
1. We do a seq scan through entire table to get valid pointers
2. Pointers to other tables may exist as a column in a table. For instance, we can get generated .parent from disk row, or children_disks column generated for server table

Database is immutable and can never be mutated. User in rust can only ever get immutable references to columns. Meaning, you can run thousands of proofs in parallel against this data without ever worrying about locks/consistency and etc. Once your database is compiled it is a done deal and it will never change.

So we don't have nasty garbage in our api like, does this row exist? I better return an Option! No, if you ever have a TableRowPointer value it will ALWAYS point to a valid row in a one and only unique table. You can clone that pointer for free, store it in million other places and it will always point to the same column in the same table.

Of course, we should prefer using column api for performance, getting entire table row is added for convenience.

Deserialization assumes certain order in the include_bytes!() binary data and does not maintain data about tables already read/etc - we just assume that we wrote things in certain order in edendb and read them back in the same order. Also, binary data for rust is compressed with lz4 compression.

OCaml api is analogous and gives the same guarantees. Immutable data, separate types for table pointers for each table and so on.

Okay, let's do something practical with this, let's prove that no intel disks are never next to some cheap crucial disks to not ever humiliate them like that!

We'll change schema a little bit, let's add, what is effectively, a disk manufacturer enum:

Code: [Select]

TABLE disk_manufacturer {
  model TEXT PRIMARY KEY,
}

DATA EXCLUSIVE disk_manufacturer {
  intel;
  crucial;
}

Code: [Select]

DATA EXCLUSIVE says, that this data can be only defined once. Otherwise, we could keep filling up this enum with many statements in different places, this effectively makes this table an enum.

Also, let's add our disks to contain the make column with this data:

Code: [Select]

TABLE disks {
  disk_id TEXT PRIMARY KEY CHILD OF server,
  size_bytes INT,
  size_mb INT GENERATED AS { size_bytes / 1000000 },
  make REF disk_manufacturer,

  CHECK { size_bytes >= 10000000000 }
}

Code: [Select]

REF disk_manufacturer means this column has to refer by primary key to a single row in disk_manufacturer, you can never add invalid disk manufacturer value in the disk table or compiler yells.

Now our data is redefined with disk manufacturer:

Code: [Select]

DATA STRUCT server [
  {
    hostname: my-precious-epyc1, ram_mb: 4096 WITH disks {
        disk_id: root-disk,
        size_bytes: 1000000000000,
        make: intel,
    },
  },
  {
    hostname: my-precious-epyc2, ram_mb: 8192 WITH disks [{
        disk_id: root-disk,
        size_bytes: 1500000000000,
        make: intel,
    },{
        disk_id: data-disk,
        size_bytes: 1200000000000,
        make: crucial,
    }]
  }
]

So, how would the rust source look to prove that no intel disk is next to a cheap crucial disk? There ya go, full main.rs file:

Code: [Select]

#[macro_use]
extern crate lazy_static;

#[allow(dead_code)]
mod database;

fn main() {
    let db = &database::DB;
    db.disks().rows_iter().filter(|i| {
        // we only care about intel disks
        db.disk_manufacturer().c_model(db.disks().c_make(*i)) == "intel"
    }).for_each(|intel_disk| {
        let parent = db.disks().c_parent(intel_disk);
        let children_disks = db.server().c_children_disks(parent);

        for pair in children_disks.windows(2) {
            match (
                db.disk_manufacturer().c_model(db.disks().c_make(pair[0])).as_str(),
                db.disk_manufacturer().c_model(db.disks().c_make(pair[1])).as_str(),
            ) {
                ("intel", "crucial") | ("crucial", "intel") => {
                    panic!("Are you kidding me bro?")
                }
                _ => {}
            }
        }
    });
}

Do you miss endless matching of some/none options to check if some value by key exists in the table or endless unwraps? Me neither. We just get all the values directly because we know they will always be valid given we have table row pointers which can only ever be valid.

Full example project can be found here

So, basically, let's recap all the high level tools EdenDB gives to be successful working and building assumptions about your data:

1. SQL proofs
2. Lua checks/computed columns
3. Datalog (so far meh)
4. Analyze data directly in Rust or OCaml if all else fails

Basically, 4 ways to analyze the same data from four different languages and all data is statically typed and super fast/column based in the end.

Also, there are these features I didn't mention:
- include multiple files syntax
- include lua files
- unique constraints
- sqlite materialized views
- countless checks for correctness (check enum count in errors.rs file)

To see all features of EdenDB check out the tests, because I usually try to test every feature thoroughly.

And more will be added as I see fit.

What can I more say? Having such foundations, time to build the most productive open source web development platform of all time 😜

Pages: [1] 2 3 4

My Coding Cult

News:

Recent Posts

The pattern V2 / Eden DB Improvements 3

My Rants / ChatGPT - most overrated programming tool of all time

Development Tips / Feature velocity

Development Tips / Looking down

The pattern V2 / Eden DB Improvements 2

Development Tips / Testing assembly much?

Natural Philosophy / "... there should be time no longer"

Development Tips / The infinite whac-a-mole

The pattern V2 / Eden DB Improvements 1

The pattern V2 / Eden Data Language: introduction