Neither if nor while; neither map nor reduce
Relational algebra as an alternative to imperative and functional programming.
Introduction
I’ve been advocating for relational algebra for over 20 years now. For example, I wrote Relations as First-Class Citizen more than 10 years ago. More recently, together with my colleague Felix, we launched https://www.relational-algebra.dev/ to document the Bmg library used in the Klaro Cards backend.
Relational algebra is often categorized as "only for databases." No one seems to realize that it's a standalone tool, which can be applied in regular programming with excellent results: less verbosity, more productivity, and greater correctness. That’s what I aim to demonstrate here.
The example
We’ll build upon the example from the previous post but make it a bit more complex. We want to compute the columns of the Kanban "organized by month," but also the groups of cards to display in each column. The user should be able to show or hide empty columns.
Before continuing to read, I invite you to take a moment for yourself. How would you implement this feature?
- It’s simple: group blog posts by month
- But not simplistic: we need all the months between two dates, even those without blog posts
You can revisit the previous solution which computed the necessary columns. In this blog post, I’m not going to improve the TypeScript algorithm given at the end of the article. Instead, I’ll walk through how to use relational algebra step by step for a solution with no loops, no ifs, no maps, and no reduce.
A Data-Driven approach
One thing that might surprise you—because it’s unusual in object-oriented programming—is that we’re going to follow a resolutely data-oriented, or informational, approach. In other words, I won’t be bothering with classes or domain objects like Article
or KanbanColumn
. These classes and objects are implementation details (if that statement stings a bit, see the last section of this article).
Our problem and its solution are illustrated as follows.
As input, we have blog articles in a Relation
:
+----+------------------+----------------------------------+
| id | publication_date | title |
+----+------------------+----------------------------------+
| 1 | 2025-01-02 | Out of the Tar Pit |
| 2 | 2025-01-17 | Relations as First-Class Citizen |
| 3 | 2025-03-11 | Bmg, a Relational Algebra |
+----+------------------+----------------------------------+
That is, in JavaScript—or here, Ruby:
articles = [
{ id: 1, publication_date: "2025-01-02", title: "Out of the Tar Pit" },
{ id: 2, publication_date: "2025-01-17", title: "Relations as First-Class Citizen" },
{ id: 3, publication_date: "2025-03-11", title: "Bmg, a Relational Algebra" }
]
And the result we’re aiming for should look like the Relation
below (it contains "sub"-Relations as values, which is perfectly allowed by the theory).
This Relation captures the following information:
- In January, two articles were published: "Out of the Tar Pit" on the 2nd and "Relations as First-Class Citizen" on the 17th;
- February saw no publications;
- March had one publication: "Bmg, a Relational Algebra" on the 11th.
Or in summary: In month [month], the listed [articles] were published.
In the following, I will always include this kind of “informational semantics” as comments. It is essential to the approach: it allows us to reason about what we want, but also to document the solution.
// In month [month], the listed [articles] were published
+------------+--------------------------------------------------------------+
| month | articles |
+------------+--------------------------------------------------------------+
| 2025-01-01 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 1 | 2025-01-02 | Out of the Tar Pit | |
| | | 2 | 2025-01-17 | Relations as First-Class Citizen | |
| | +----+------------------+----------------------------------+ |
| |
| 2025-02-01 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | +----+------------------+----------------------------------+ |
| |
| 2025-03-01 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 3 | 2025-03-11 | Bmg, a Relational Algebra | |
| | +----+------------------+----------------------------------+ |
+------------+--------------------------------------------------------------+
That is:
result = [
{
month: "2025-01-01",
articles: [
{ id: 1, publication_date: "2025-01-02", title: "Out of the Tar Pit" },
{ id: 2, publication_date: "2025-01-17", title: "Relations as First-Class Citizen" }
]
},
{
month: "2025-02-01",
articles: [
]
},
{
month: "2025-03-01",
articles: [
{ id: 3, publication_date: "2025-03-11", title: "Bmg, a Relational Algebra" }
]
}
]
Implementation overview
The solution fits in just a few lines. You can also find the full code on GitHub. Don’t worry—we’ll take a detailed look at everything in the following sections.
# 1. Compute the month of each article
monthly_articles = articles.extend({
month: ->(t) { t[:publication_date].beginning_of_month },
})
# 2. Find min and max month
minmax = monthly_articles.summarize([], {
:min => Bmg::Summarizer.min(:month),
:max => Bmg::Summarizer.max(:month),
}).one
# 3. Generate all months of interest
all_months = Bmg::Relation.generate(minmax[:min], minmax[:max], step: ->(d){ d.next_month }, as: :month)
# 4. Find, for each month, the articles published
articles_per_month = all_months.image(monthly_articles, :articles, [:month])
# 5. Remove empty months
result = articles_per_month.exclude(->(t) { t[:articles].empty? })
As the code suggests, we break the problem down into subproblems:
- Compute the publication month for each article, using
extend
- Find the first and last month, using
summarize
- Generate the full list of months to consider, using
generate
- Attach to each month its list of articles, using
image
- Optional: remove empty months, using
restrict
orexclude
Although I use the term "list" below, we will only be using Relations and the operators mentioned. Each takes a Relation as input and produces a Relation as output. I will show all intermediate results.
Compute the month of each publication
The extend
operator extends its input relation with one or more computed attributes. Here, we simply compute the start date of the month using ActiveRecord#beginning_of_month
, which now appears as a new attribute called month
:
monthly_articles = articles.extend({
month: ->(t) { t[:publication_date].beginning_of_month },
})
// The publication [id] was published in the month of [month]
+----+------------------+----------------------------------+------------+
| id | publication_date | title | month |
+----+------------------+----------------------------------+------------+
| 1 | 2025-01-02 | Out of the Tar Pit | 2025-01-01 |
| 2 | 2025-01-17 | Relations as First-Class Citizen | 2025-01-01 |
| 3 | 2025-03-11 | Bmg, a Relational Algebra | 2025-03-01 |
+----+------------------+----------------------------------+------------+
Find the first and last months
The summarize
operator allows us to find the two months that will form the boundaries of the Kanban board. It’s somewhat equivalent to SELECT MIN(...) GROUP BY ...
in SQL
: it lets us “summarize” the information represented in a relation. Here, we'll use it with an empty set of grouping attributes ([]
) and specify that we want to compute the smallest (min
) and largest (max
) month:
minmax_relation = monthly_articles.summarize([], {
:min => Bmg::Summarizer.min(:month),
:max => Bmg::Summarizer.max(:month),
})
The result is as follows, with its informational semantics in the comment:
// The first month in which an article was published is [min] and the last is [max]
// (each represented by its first day)
+------------+------------+
| min | max |
+------------+------------+
| 2025-01-01 | 2025-03-01 |
+------------+------------+
Generate the full List of months to consider
Here we will apply the same technique as in the previous blog post, the equivalent of generate_series
in PostgreSQL:
minmax = minmax_relation.one
months = Bmg::Relation.generate(minmax[:min], minmax[:max], step: ->(d){ d.next_month }, as: :month)
The first line uses one
, a non-relational operator that extracts the one and only record from a relation.
{ min: '2025-01-01', max: '2025-03-01' }
The generate
operator generates a relation (without taking another relation as input). Here we give it our two min and max dates and an increment function, and it generates the following relation:
// The month [month] lies between two months in which articles were published
+------------+
| month |
+------------+
| 2025-01-01 |
| 2025-02-01 |
| 2025-03-01 |
+------------+
Attach to each month its list of articles
For this final step, we use a somewhat magical operator, image
. This operator extends its input relation—the "list" of months—with a new attribute articles
. This attribute will be the sub-relation obtained by filtering the articles that share the same value for the common attribute (:month
). It is similar to a LEFT JOIN + ARRAY_AGG
in PostgreSQL:
articles_per_month = months.image(monthly_articles, :articles, [:month])
// The month [month], the listed [articles] were published
+------------+--------------------------------------------------------------+
| month | articles |
+------------+--------------------------------------------------------------+
| 2025-01 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 1 | 2025-01-02 | Out of the Tar Pit | |
| | | 2 | 2025-01-17 | Relations as First-Class Citizen | |
| | +----+------------------+----------------------------------+ |
| |
| 2025-02 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | +----+------------------+----------------------------------+ |
| |
| 2025-03 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 3 | 2025-03-11 | Bmg, a Relational Algebra | |
| | +----+------------------+----------------------------------+ |
+------------+--------------------------------------------------------------+
Another way to understand image
is to see it as a shortcut for extend + restrict
(restrict
is introduced in the next section):
articles_per_month = months.extend({
articles: ->(t) { monthly_articles.restrict(month: t[:month]) }
})
Hide or show empty months
If you want to hide empty months, you can use the restrict
result = articles_per_month.restrict(->(t) { !t[:articles].empty? })
Or more elegantly, exclude
, its opposite:
result = articles_per_month.exclude(->(t) { t[:articles].empty? })
// The month [month], the listed [articles] were published
+------------+--------------------------------------------------------------+
| month | articles |
+------------+--------------------------------------------------------------+
| 2025-01 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 1 | 2025-01-02 | Out of the Tar Pit | |
| | | 2 | 2025-01-17 | Relations as First-Class Citizen | |
| | +----+------------------+----------------------------------+ |
| |
| 2025-03 | +----+------------------+----------------------------------+ |
| | | id | publication_date | title | |
| | +----+------------------+----------------------------------+ |
| | | 3 | 2025-03-11 | Bmg, a Relational Algebra | |
| | +----+------------------+----------------------------------+ |
+------------+--------------------------------------------------------------+
An alternative, a bit less naive, is to avoid the effort of generating all months with generate
if you are going to remove them afterward. The next section offers an alternative along these lines.
Going further
The previous section showed how to apply pure relational algebra to the problem of grouping articles by month. In this section, we broaden the scope: how to encapsulate the whole thing into a function, and how to test it?
The complete program
If you want a function that takes the articles and returns the desired grouping, with or without empty months, you can encapsulate all the code like this:
def group_articles_per_month(articles, hide_empty_columns = false)
monthly_articles = articles.extend({
month: ->(t) { t[:publication_date].beginning_of_month },
})
all_months = if hide_empty_columns
monthly_articles.project([:month])
else
minmax = monthly_articles.summarize([], {
:min => Bmg::Summarizer.min(:month),
:max => Bmg::Summarizer.max(:month),
}).one
Bmg::Relation.generate(minmax[:min], minmax[:max], step: ->(d){ d.next_month }, as: :month)
end
all_months.image(monthly_articles, :articles, [:month])
end
In this variant, the project
operator is used to extract from the original relation the months with at least one published article. This operator never returns duplicates, so the month of January appears only once:
// monthly_articles.project([:month])
// The month [month] has at least one article published
+------------+
| month |
+------------+
| 2025-01-01 |
| 2025-03-01 |
+------------+
Where do the data come from?
The advantage of this approach is that the source of the articles doesn’t really matter. It can be a literal (for example, in a unit test):
articles = Bmg::Relation.new([...])
result = group_articles_per_month(articles)
A JSON file will also do, if you convert the publication date into an instance of the Date
class. The transform
operator allows applying a transformation to all values of an attribute. Bmg supports using a class that has a parse
method:
articles = Bmg::Relation
.json_file("articles.json")
.transform(publication_date: Date)
# shortcut for .transform(publication_date: ->(str) { Date.parse(str) })
result = group_articles_per_month(articles)
Finally, relations can obviously come from an SQL database:
database = Sequel.connect('sqlite://blog.sqlite3')
articles = Bmg.sequel(:articles, database)
result = group_articles_per_month(articles)
Testing the solution
In the previous post I already talked about post-conditions in the specification, which guided the testing effort. In this section, I propose to review them, still using relational algebra.
This testing effort is somewhat over the top, since it partly involves deconstructing what we just built. Still, it remains a nice exercise in algebra. We'll start by helping ourselves a bit for what follows by adding the next month for each row:
result = group_articles_per_month(articles).extend({
:next_month => ->(t) { t[:month].next_month }
})
Testing a single case
Based on the example, a first test allows us to intuitively verify that we have the desired results. If we simply count the results, we can take the following approach:
assert_equal(
result.count,
3
)
assert_equal(
result
.transform({
:articles => :count,
:month => :to_s
})
.y_by_x(:articles, :month),
{
"2025-01-01" => 2,
"2025-02-01" => 0,
"2025-03-01" => 1
}
)
In this example, the y_by_x
method allows us to leave the relational world and obtain a mapping between the month names and the number of publications for each.
Complete colution
The solution is complete if all the original articles are present in the final result. In other words, none are missing:
assert_empty (
articles
.not_matching(result.ungroup(:articles), [:id])
)
We use the not_matching
operator, which is equivalent to WHERE NOT EXISTS
in SQL
with an equality clause on the id.
Correct solution
The solution is correct if the publication date of each article is indeed in the indicated month. That is, there is no article that violates this constraint:
assert_empty (
result
.ungroup(:articles)
.exclude(Predicate.between(:publication_date, :month, :next_month))
)
We use the ungroup
operator which somewhat reverses what image
did earlier and returns a "flat" Relation.
Continuous solution
The solution is continuous if all months are represented between the min
month and the max
month. In other words, if every month is also a following month, except for the first:
assert_empty (
result
.exclude(month: minmax[:min])
.not_matching(
result
.project([:next_month])
.rename(next_month: :month),
[:month]
)
)
The project
and rename
operators allow us to compare the results to themselves. This is probably the least intuitive condition. Think of it as testing the following assertion: except for the first, there is no month that is not the following month of another.
Minimal solution
Finally, the solution is minimal if the first and last months have at least one article.
assert_empty (
result
.restrict(Predicate.in(:month, minmax.values))
.restrict(Predicate.empty(:articles))
)
Conclusion
I have presented here a relational approach to solve a simple but non-trivial data grouping problem. The solution is original in the sense that it does not use any imperative programming constructs (if
, repeat
, while
, until
, foreach
) nor functional programming constructs (map
, reduce
, fold
), nor recursion. The relational approach is both powerful, elegant, and concise. It allows implementing the solution as well as testing the specification in terms of post-conditions.
I hope you enjoyed this article and that it showed you something you did not know before :-) You can find the full code on GitHub.
The Bmg library is used here and offers relational algebra to Ruby programmers. It is inspired by a more theoretical algebra Tutorial D, proposed by Chris Date & Hugh Darwen in their books on the relational model, and implemented in the tool Rel. If you like this approach and/or Bmg
, know that I recently started a Typescript implementation, to which you are invited to contribute on GitHub: https://github.com/enspirit/bmg.js
Finally, if you appreciate this work, the best way to support us is to use Klaro Cards.
Appendix: Classes and Objects, an implementation detail?
If you are an object-oriented enthusiast, you might be wondering: why not introduce any class like Article
, Month
, List<Article>
, or KanbanColumn<Article>
to represent the domain concepts?
The "resolutely data-oriented" approach introduced here reflects what "flows" from one architectural component to another: information. Indeed, frontend and backend do not exchange classes and objects. They exchange data that capture useful information.
It is in this sense that I dared to say that classes and objects are an implementation detail: one can build software where frontend and backend collaborate perfectly without ever sharing classes and objects. These latter are not essential. However, one cannot create this software without sharing the information. That, for sure, is essential.
This also recalls Codd’s principle of essentiality from the Great Debate. But that is a topic for another blog post.