What version should I use?

I use the top of the tree. We really don’t break the build often, and we fix a lot of bugs. The latest release is 3.8, and the top-of-tree is 3.9-SNAPSHOT. You can download releases from SourceForge, but I recommend checking out the code and switching to the matching tag if you want a particular release. Building it yourself is good practice: you’ll have the code loaded into your IDE that way as well, so you can poke around, insert print statements, debug, etc.

How do I get support?

There is a discussion board on SourceForge, although if you were to post on StackOverflow so I could write answers and get upvotes, that would be nice, too.

How do I submit a bug?

File it here: https://sourceforge.net/p/lemur/bugs/new/ and please add the tag galago if you can, since the bug tracker is shared with the rest of the Lemur project.

How do I get started?

A post-doc from our lab and I put some documentation together a while back for a hackathon. You can read a fairly recent version in her Medium post: Galago - the secret documentation

Are the tests passing?

I have a cron job that mirrors Galago to GitHub so that, while we’re changing things, the tests run in a clean environment at least once an hour, thanks to Travis CI, which reports the build status.

The GitHub mirror will not accept pull requests. All development is really done in Mercurial on SourceForge, as part of the Lemur project, but it’s nice to have a backup sometimes.

Where’s the documentation for X?

The best documentation is the tests. They don’t get out of date, since they are compiled and run. They tend to show the whole workflow too, as they build an index and run a query against it.

How do I find the tests for X?

  1. Get a Java IDE:
    • IntelliJ IDEA is my favorite, and it is free for educational use.
    • NetBeans is free, and good because it supports Galago out of the box.
    • Eclipse works too, but you’ll need to install the m2eclipse plugin.
  2. Search by class name. All the test classes match the pattern *Test, so they’re easy to pick out. LocalRetrievalTest::testSimple has some code that runs some query-likelihood queries against a hand-crafted posting list.
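
If you want a feel for what a query-likelihood query over a hand-crafted posting list looks like before you open the real test, here is a minimal, standalone sketch. It does not use Galago’s API and is not the code in LocalRetrievalTest: the class name QueryLikelihoodSketch, its methods, the Dirichlet smoothing, the 0.5 back-off for unseen terms, and the mu value are all my assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy query-likelihood scorer over hand-built postings. Not Galago's API.
class QueryLikelihoodSketch {
    // term -> (docId -> term frequency): the hand-crafted posting lists
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // docId -> document length in tokens
    private final Map<Integer, Integer> docLengths = new HashMap<>();
    private long collectionLength = 0;

    // Tokenize on whitespace and record term frequencies for this document.
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        docLengths.put(docId, tokens.length);
        collectionLength += tokens.length;
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Log-probability of the query under the document's Dirichlet-smoothed language model.
    double score(String query, int docId, double mu) {
        int docLength = docLengths.getOrDefault(docId, 0);
        double logProb = 0.0;
        for (String term : query.toLowerCase().split("\\s+")) {
            Map<Integer, Integer> plist =
                    postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap());
            int tf = plist.getOrDefault(docId, 0);
            long cf = 0;
            for (int f : plist.values()) {
                cf += f; // collection frequency = sum of per-document counts
            }
            // Assumption: pretend unseen terms occur 0.5 times so the log stays finite.
            double pCollection = Math.max(cf, 0.5) / collectionLength;
            logProb += Math.log((tf + mu * pCollection) / (docLength + mu));
        }
        return logProb;
    }

    public static void main(String[] args) {
        QueryLikelihoodSketch index = new QueryLikelihoodSketch();
        index.addDocument(1, "the quick brown fox");
        index.addDocument(2, "the lazy dog sleeps");
        double mu = 1500.0; // illustrative smoothing parameter
        for (int docId = 1; docId <= 2; docId++) {
            System.out.println(docId + "\t" + index.score("quick fox", docId, mu));
        }
    }
}
```

The scores are log-probabilities, so they are negative, and higher is better. Document 1 contains both query terms, so it should outrank document 2. The real test is still the place to see how Galago itself does this.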

How do I use the command line tool?

Well, if you want to index gov2 and you have a lot of threads, check out dotgov2.sh, our script from the IR-Reproducibility challenge.

What is drmaa.jar and why do I need it?

If you’re trying to build Galago, you might get stuck on this dependency. Run scripts/installib.sh to install it. This jar, although not available on Maven Central, allows Galago to communicate with Sun Grid Engine job schedulers, which is only relevant if you have a cluster with qsub. You can use all of Galago without it.

But Java’s a terrible language, it’s so verbose.

If you like JavaScript/Node.js, the new JavaScript engine in Java 8 is called Nashorn. If you like ML or OCaml, I recommend Kotlin over Scala because Kotlin generates more efficient code and, last I looked, had better IDE support. If you’re a Lisp person, there’s a great community around Clojure, and its Java interop is the best aside from Kotlin’s.