Status update CW29

Posted on Jul 26, 2021

RIP to our Raspberry Pi 4!

Long live… some random Intel NUC 6 that Jan had lying around!

OK, we’re not really sure the Raspi is dead; it just got hot and bothered and threw a “not doing any work” fit. So we put it in a drawer and found a replacement machine to use until we decide on our cloud-hosting solution. Though we’re pretty confident about the specs that HubGrep requires, we want concrete confirmation first.

done

  • refactored across the system to improve crawling overall
  • improved export/import of crawled repositories to Sphinx
  • started integrating Prometheus (initial setup)

challenges and what we’re doing

To follow up on the last update: there were bugs we didn’t consider, just as prophesied! Some were the kind where you go “well, that was stupid” but that still take time to find and reproduce. Another kind showed up somewhere around 20 million public GitHub repositories in, when one of them turned out to contain a null byte in its description. Null bytes, of course, are not allowed in the Postgres string fields we use, so that’s a crash. At this point we really need a better overview than gut feelings and logs, so Jan connected HubGrep to Prometheus, and we can now see some nice graphs of how our crawling is going!
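
For the curious, here is a minimal sketch of one way to guard against the null-byte crash: strip NUL characters from every string in a crawled record before it goes anywhere near the database. This is an illustration, not necessarily how HubGrep handles it; the `strip_nul` helper and the `repo` dictionary with its field names are made up for the example.

```python
from typing import Any

NUL = "\x00"  # the character Postgres refuses to store in text fields


def strip_nul(value: Any) -> Any:
    """Recursively remove NUL characters from all strings in nested data.

    Hypothetical helper for illustration; dicts and lists are walked,
    anything that is not a string is returned unchanged.
    """
    if isinstance(value, str):
        return value.replace(NUL, "")
    if isinstance(value, dict):
        return {key: strip_nul(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    return value


# Made-up crawler result with a NUL hiding in the description.
repo = {
    "name": "example/repo",
    "description": "hello\x00world",
    "topics": ["search\x00", "crawler"],
    "stars": 3,
}

clean = strip_nul(repo)
print(clean["description"])
```

Cleaning at the point where crawler results are turned into rows keeps the fix in one place, at the cost of silently altering the original text. Whether that trade-off is acceptable depends on how faithfully the descriptions need to be preserved.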