Our Milestones for the next 6 months

Posted on Mar 5, 2021

This was part of our first status post, but I think it makes sense to have it separate.

We are planning the following milestones:

1. wrap the cli thing in a flask service

Since we started out with a preexisting CLI prototype, which just fetches search results from several hosting backends, our first milestone is about extending it into a Flask web service.

We want it to be able to run search queries against several backends: GitLab, Gitea, GitHub, Bitbucket, maybe SourceHut - and whatever we encounter on our way. :)
Also, we need a first frontend version, which I am sure we will tweak all the time. We need to figure out some sorting algorithm that orders, and maybe filters, these search results in some sane way. That's also something we can improve over time.
We need a basic project setup: a nice dev environment, test cases, a versioning system, and so on.
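To make the "query several backends, then sort" idea a bit more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the `Result` fields, the stub functions `search_gitea` and `search_gitlab` (stand-ins for real API clients), and the scoring rule are hypothetical, not what HubGrep actually does. A Flask view could call something like `search_all` and render the list.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    backend: str      # which hosting backend returned this hit
    name: str         # repository name
    description: str  # short description, may be empty


# Hypothetical stand-ins for real API clients: each takes a query string
# and returns the hits from one hosting backend.
def search_gitea(query: str) -> List[Result]:
    return [Result("gitea", "grep-tools", "small grep helpers")]


def search_gitlab(query: str) -> List[Result]:
    return [Result("gitlab", "hubgrep", "search across code hosters")]


BACKENDS: Dict[str, Callable[[str], List[Result]]] = {
    "gitea": search_gitea,
    "gitlab": search_gitlab,
}


def score(result: Result, query: str) -> int:
    """Naive relevance: name matches count more than description matches."""
    q = query.lower()
    points = 0
    if q == result.name.lower():
        points += 3
    elif q in result.name.lower():
        points += 2
    if q in result.description.lower():
        points += 1
    return points


def search_all(query: str) -> List[Result]:
    """Query every backend in parallel, drop non-matches, sort best first."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        batches = list(pool.map(lambda fn: fn(query), BACKENDS.values()))
    merged = [r for batch in batches for r in batch]
    relevant = [r for r in merged if score(r, query) > 0]
    return sorted(relevant, key=lambda r: score(r, query), reverse=True)
```

The scoring here is deliberately dumb; the point is only that filtering and ordering live in one small function that can be swapped out later.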

TL;DR: basically, it's the essential stuff to make a first working version.

The plan is to have this set up by the end of the month.

2. set up documentation

The second milestone is about making good documentation for the project. People should be able to run their own instance of HubGrep if they want to, and at least find their way around well enough to make their own changes to the source code - or to open issue tickets. :)

3. crawlers

We don't want this project to be just a search proxy when it's done.
We want our own search index, and for that we need crawlers for all the different APIs, plus a backend that collects the results.

After this milestone we will have a database containing meta information from all kinds of hosting backends. We want to offer it for download (maybe others have nice ideas about what to do with it), and it is the basis for generating an index for our own search engine.
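As a rough idea of what one crawler plus the collecting backend could look like, here is a small sketch that walks a paginated API and stores repository meta information in SQLite. The table layout, the `crawl` function, and `fake_fetch_page` (a stand-in for a real HTTP call against a hoster's API) are all assumptions made up for this example; the real storage could look completely different.

```python
import sqlite3
from typing import Callable, Dict, List

Page = List[Dict[str, str]]


def fake_fetch_page(page: int) -> Page:
    """Hypothetical stand-in for one paginated API call; an empty page means done."""
    data = {
        1: [{"name": "alpha", "url": "https://example.org/alpha"},
            {"name": "beta", "url": "https://example.org/beta"}],
        2: [{"name": "gamma", "url": "https://example.org/gamma"}],
    }
    return data.get(page, [])


def crawl(backend: str, fetch_page: Callable[[int], Page],
          db: sqlite3.Connection) -> int:
    """Walk pages until an empty one comes back and upsert every repo.

    Returns the number of repositories seen in this run.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "backend TEXT, name TEXT, url TEXT PRIMARY KEY)"
    )
    seen = 0
    page = 1
    while True:
        repos = fetch_page(page)
        if not repos:
            break
        # INSERT OR REPLACE keeps re-crawls from creating duplicate rows.
        db.executemany(
            "INSERT OR REPLACE INTO repos (backend, name, url) VALUES (?, ?, ?)",
            [(backend, r["name"], r["url"]) for r in repos],
        )
        seen += len(repos)
        page += 1
    db.commit()
    return seen
```

Calling `crawl("example", fake_fetch_page, sqlite3.connect(":memory:"))` fills the table from both pages; running it a second time against the same connection leaves the row count unchanged, because the URL is the primary key.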

We expect this (and the next) milestone to take the largest part of our time, so we will try to get to this point as early as possible.

4. integrate collected stuff

Right now we have no idea how large the resulting dataset will be. But when we are at this point, we need to figure out a way to make it full-text searchable.

This is, in large part, a research task for us. Since we want this service to be usable without hundreds of gigabytes of RAM, it's possible that we end up with a mix: searching our own database for all the smaller instances, and using the “old” search proxy backend for big instances like GitHub or Bitbucket - but we will figure that out when we get there.
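The mixed approach could be as simple as a routing decision per hoster. The sketch below only illustrates that idea: the `PROXIED_HOSTERS` set, both search functions, and the instance name `my-gitea` are hypothetical placeholders, and the real decision might depend on dataset size rather than a fixed list.

```python
from typing import Callable, List

# Assumption: hosters too big to index ourselves get proxied to their own search.
PROXIED_HOSTERS = {"github", "bitbucket"}

SearchFn = Callable[[str, str], List[str]]


def local_index_search(hoster: str, query: str) -> List[str]:
    """Placeholder for a lookup in our own search index."""
    return [f"{hoster}:index:{query}"]


def proxy_search(hoster: str, query: str) -> List[str]:
    """Placeholder for forwarding the query to the hoster's search API."""
    return [f"{hoster}:proxy:{query}"]


def pick_strategy(hoster: str) -> SearchFn:
    """Big hosters go through the proxy, everything else through our index."""
    if hoster.lower() in PROXIED_HOSTERS:
        return proxy_search
    return local_index_search


def hybrid_search(hosters: List[str], query: str) -> List[str]:
    """Collect results from every hoster using the strategy picked for it."""
    results: List[str] = []
    for hoster in hosters:
        results.extend(pick_strategy(hoster)(hoster, query))
    return results
```

For example, `hybrid_search(["my-gitea", "GitHub"], "grep")` sends the first hoster to the local index and the second one to the proxy.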

The goal of this milestone is to have our own search index integrated into HubGrep.

5. (nice to have)

This is not a real milestone. We will probably have a lot of feature ideas on our way: this is to collect whatever comes to mind but is beyond scope :)