For a few years now I have been in the business of counting, enumerating and axiomatizing combinatorial objects using SAT solvers. Not every type of object is suitable for this; it needs to have a definition which can be easily translated to implications among boolean variables (more precisely, a conjunction of variables implying a disjunction of variables). Once an axiomatization is available, SAT solvers can be used to check if a given object satisfies the axioms or to count or list all extensions of a given partially defined object.
SAT solvers are truly marvellous pieces of software which get better and better every year, thanks to regular competitions organized by the research community. I have applied this scheme extensively to semigraphoids, gaussoids and variants of matroids. Although most of these computations have gone unused and unpublished, there is great value in the general idea of using SAT solvers to search, count and enumerate combinatorial objects.
Recently I looked over our preprint on Sign patterns of principal minors of real symmetric matrices and noticed that the number of admissible sign patterns on 6 items was only computed using the probabilistic model counter GANAK (code). Since the result carries uncertainty, we decided not to include it in our submission of the sequence to the OEIS. We tried the old exact model counter sharpSAT (code) on this problem and it ran out of memory. This prompted me to check out recent developments in model counting software and see if there are any new exact model counters which are strong enough for this problem. I found sharpSAT-TD (code) from the University of Helsinki which won some first prizes in the model counting competition earlier this decade. It confirmed GANAK’s probabilistic count of in just half a minute. I was very impressed and so I wanted to try it on something else.
For different reasons but at the same time, I was looking at antimatroids. Like the structures I was experienced with, antimatroids have a simple propositional axiomatization in terms of feasible sets:
The first one can be written as , where the brackets denote a boolean variable corresponding to the set they enclose. The second axiom one is . These axioms fit perfectly into the SAT solver framework and give a relatively small and straightforward boolean formula.
With these two axioms, I kept getting the wrong count, compared to the OEIS A119770. Turns out this is fixed by requiring also:
Perhaps this is omitted from or hidden in the descriptions I have looked at but I think it is reasonable and makes my code yield the correct results for all numbers known at the OEIS.
The OEIS data is generated by a hand-crafted C++ program due to Przemysław Uznański and Michał Bartoszkiewicz. This program, crucially, enumerates antimatroids and can even reduce them modulo isomorphy (yielding A224913). This only works up to items and the OEIS entry contains a provocative line:
The next item (for n=8) should be roughly 2^78 and seems hopeless without more mathematics.
While I have to agree that is definitely a lot for a counting task, I would not call it hopeless. The loss of hope relies on the assumption that a program countig antimatroids needs to enumerate them. Model counters do not do that. If a satisfying assignment of the formula is reached using only out of variables. Then in fact satisfying assignments have been found. In this way, model counters are not bound to perform a number of steps which is bounded from below by the number of models to count.
To demonstrate this, let me show you also a program which counts forknesses per their definition in F. Matúš: On patterns of conditional independences and covariance signs among binary variables. A forkness is a ternary relation on a ground set which satisfies the following closure properties:
Note that by this axiomatization, the set of forknesses is closed under intersection and thus forms a meet-semilattice.
Here is the program:
#!/usr/bin/env perl
use Modern::Perl 2018;
use CInet::ManySAT;
my $n = shift // die 'need ground set size';
my @N = 1 .. $n;
sub getvar {
my ($i, $j, $k) = @_;
1 + ($i-1) + ($j-1)*$n + ($k-1)*($n**2)
}
# Definition of forkness
my $solver = CInet::ManySAT->new;
for my $i (@N) {
for my $j (@N) {
for my $k (@N) {
$solver->add([ -getvar($i,$j,$k), -getvar($i,$k,$j), getvar($j,$k,$j) ]);
$solver->add([ -getvar($i,$j,$k), getvar($i,$j,$j) ]);
$solver->add([ -getvar($i,$j,$k), getvar($j,$k,$k) ]);
$solver->add([ -getvar($i,$j,$k), getvar($k,$i,$i) ]);
$solver->add([ -getvar($i,$j,$i), getvar($j,$i,$j) ]);
$solver->add([ -getvar($i,$j,$i), -getvar($j,$k,$j), getvar($i,$k,$i) ]);
$solver->add([ -getvar($i,$k,$j), getvar($j,$k,$i) ]);
}
}
}
say $solver->count(td => 10, cs => 10_000);
I have incorporated sharpSAT-TD into my CInet::ManySAT
module so that I can use it conveniently in this script, without producing external CNF files.
$ for i in {2..8}; do time perl forknesses.pl $i; done
6 # real 0m10.078s
56 # real 0m10.049s
15026 # real 0m10.054s
1746994454 # real 0m10.083s
1235642810043131384 # real 0m10.164s
40822119528659637193235146998172 # real 0m14.519s
374299843632760183014518932671883409448485124695664 # real 4m41.598s
I managed to also get the number of 9-forknesses but it took long enough that I had to use a different machine. Here is the result and the timings of computing 7- and 8-forknesses for comparison:
$ for f in fork*.cnf; do time ./sharpSAT -decot 600 -tmpdir . -decow 10 -cs 9000 "$f" | tail -7; done
40822119528659637193235146998172 # real 10m9.299s
374299843632760183014518932671883409448485124695664 # real 29m24.513s
7237131063359733682672812567239149471890797777405500679038631641915376539420 # real 5077m26.470s
This last one took about 3.5 days but that machine is generally slower than my laptop (from which I got the first set of timings). The numbers of forknesses are non-obvious and way beyond . The last one is around . By “non-obvious” I mean that they are not just powers of two or such things where the structure is easy to detect and exploit. In fact, they seem to have some large prime factors. And yet, they can still be computed by a SAT solver in a reasonable amount of time.
These numbers are now recorded on OEIS sequence A382658.
If you want to try the hardest ones for yourself, here are the CNF files:
I did not manage to compute the number of 8-antimatroids using sharpSAT-TD, but after some encouragement by Henrik Winther I decided to share the above little success I had with sharpSAT-TD and a few challenges. I should mention that I ran the largest problems in each category on the compute server of the math department at UiT for a straight month without getting an answer. But perhaps in a few years, SAT solvers will have advanced enough to complete some of these tasks.
Gaussoids: gaussoids5.cnf is an easy benchmark problem: , solved in 14s of which 10 were spent in preprocessing by flowcutter
. Open for gaussoids6.cnf.
LUBF gaussoids: lubf8.cnf was computed years ago by enumeration; the number of models is . sharpSAT-TD confirms this in 57m. The formula here is quite big and the reading and preprocessing took already 2 minutes. sharpSAT-TD also gave me the next number for lubf9.cnf after 34h40m. This computation took long enough that I decided to run it on the UiT math compute server, which is generally a bit slower than my laptop. Since this number is still quite small, it is possible that an AllSAT solver might get it faster. Open for lubf10.cnf.
Semigraphoids: semigr5.cnf is known to have models, solved in 8m30s. Open for semigr6.cnf.
Orientations of the uniform Lagrangian matroid: signs6.cnf is known to have models, solved in 24s. Still open for signs7.cnf. Since I expected that one to finish, I also prepared signs8.cnf.
Antimatroids: I managed to replicate the OEIS number for anti7.cnf, which is , in 117m26s, just under two hours on my laptop. Open for anti8.cnf.
These are just some objects I am familiar with. I am sure that there are more combinatorial structures of interest which can be modeled by boolean formulas and counted by SAT solvers. If you have some suggestions or if you compute any of these numbers or are interested in setting up a large-scale computation to really tackle one of these open problems, do let me know!
Changelog: