Posts

  • Importing CSV files with SQLite

    GitHub offers a very superficial view of how GitHub Actions runners are spending their minutes on private repositories. Currently, the only way to get detailed information about it is via the Get usage report button on the project/organization billing page. The problem is that the generated report is a CSV file, shifting the responsibility of filtering and visualizing the data to the user. While it’s true that most users of this report are used to dealing with CSV files, be they developers or accountants expert in handling spreadsheets, this is definitely not the most user-friendly way of offering insights into billing data.

    When facing this issue, at first I thought about using harelba/q to query the CSV files directly from the command line. The problem is that q isn’t that straightforward to install, as apparently it is not available via apt nor pip, nor is one able to easily change the data once it’s imported, like in a regular database. The first time around I resorted to creating a database on PostgreSQL and importing the CSV file into it, but I never remember the CSV import syntax afterwards, and it still requires a daemon running just for that. I kept thinking that there should be a simpler way: what if I used SQLite for that?

    To avoid having to CAST() each TEXT column whenever working with dates or numbers, the following schema.sql can be used:

    CREATE TABLE billing (
      date DATE,
      product TEXT,
      repository TEXT,
      quantity NUMERIC,
      unity TEXT,
      price NUMERIC,
      workflow TEXT,
      notes TEXT
    );
    

    After that, it’s possible to import the CSV file with the sqlite3 CLI tool. The --skip 1 argument to the .import command is needed to avoid importing the CSV header as data, given that SQLite considers it to be a regular row when the table already exists:

    $ sqlite3 github.db
    SQLite version 3.36.0 2021-06-18 18:58:49
    Enter ".help" for usage hints.
    sqlite> .read schema.sql
    sqlite> .mode csv
    sqlite> .import --skip 1 c2860a05_2021-12-23_01.csv billing
    sqlite> SELECT COUNT(*) FROM billing;
    1834
    

    Now it’s easy to dig into the billing data. For a better presentation, .mode column can be enabled to both show the column names and align their output. We can, for instance, find out which workflows consumed the most minutes in the last week and their respective repositories:

    sqlite> .mode column
    sqlite> SELECT date, repository, workflow, quantity FROM billing WHERE date > date('now', '-7 days') AND product = 'actions' ORDER BY quantity DESC LIMIT 5;
    date        repository         workflow                              quantity
    ----------  -----------------  ------------------------------------  --------
    2021-12-21  contoso/api        .github/workflows/main.yml            392
    2021-12-18  contoso/terraform  .github/workflows/staging-images.yml  361
    2021-12-22  contoso/api        .github/workflows/main.yml            226
    2021-12-21  contoso/api        .github/workflows/qa.yml              185
    2021-12-20  contoso/api        .github/workflows/main.yml            140
    

    Another useful example of what can be fetched is the cost per repository in the last week, summing the cost of all their workflows. An UPDATE statement is required to apply a small data fix first, given that the CSV contains a dollar sign ($) in every row of the price column, which needs to be dropped:

    sqlite> UPDATE billing SET price = REPLACE(price, '$', '');
    sqlite> SELECT repository, SUM(quantity) * price AS amount FROM billing WHERE date > date('now', '-7 days') AND product = 'actions' GROUP BY repository;
    repository          amount
    ------------------  ------
    contoso/api         11.68
    contoso/public-web  0.128
    contoso/status      1.184
    contoso/terraform   2.92
    contoso/webapp      0.6
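
    The query above multiplies the total quantity by a single price value per group, which is fine as long as all the summed rows share the same per-minute price. If the report mixes rows with different prices (Linux, Windows and macOS runners are billed at different rates, for instance), a safer variation multiplies quantity and price row by row before summing:

    sqlite> SELECT repository, ROUND(SUM(quantity * price), 3) AS amount FROM billing WHERE date > date('now', '-7 days') AND product = 'actions' GROUP BY repository;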
    

    Not as intuitive as a web page where one can just click around to filter and sort a report, but definitely doable. As a side note, one cool aspect of SQLite is that it doesn’t require a file database to be used. If started as sqlite3, with no arguments, all of its storage needs are handled entirely in memory. This makes it even more interesting for data exploration cases like these, offering all of its query capabilities without ever persisting data to disk.
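
    For instance, the same import flow shown earlier works against a throwaway in-memory database - everything vanishes once the session ends. It looks roughly like this:

    $ sqlite3
    SQLite version 3.36.0 2021-06-18 18:58:49
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite> .read schema.sql
    sqlite> .mode csv
    sqlite> .import --skip 1 c2860a05_2021-12-23_01.csv billing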

  • Configuring firewalld on Debian Bullseye

    After doing a clean Debian 11 (Bullseye) installation on a new machine, the next step after installing basic CLI tools and disabling SSH root/password logins was to configure its firewall. It’s easy to imagine how big my surprise was when I found out that the iptables command wasn’t available. While it has been known for at least 5 years that this was going to happen, it still took me some time to let the idea of its deprecation sink in and actually digest the situation. I scratched my head a bit wondering if the day I would be obliged to learn how to use nftables had finally come.

    While looking for some guidance on the current best practices for managing firewall rules, I found the article “What to expect in Debian 11 Bullseye for nftables/iptables”, which explains the situation in a straightforward way. The article ends up suggesting that firewalld is supposed to be the default firewall rules wrapper/manager - something that was news to me. I never met the author while actively working on Debian, but I do know he’s the maintainer of multiple firewall-related packages in the distribution and also works on the netfilter project itself. Based on these credentials, I took the advice knowing it came from someone who knows what they are doing.

    A fun fact is that the iptables package is actually a dependency of firewalld on Debian Bullseye. This should not be the case in future releases. After installing it, I went for the simplest goal ever: block all incoming connections while allowing SSH (and preferably Mosh, if possible). Before making any changes, I tried to familiarize myself with the basic commands. I won’t repeat what multiple other sources already say, so I suggest this Digital Ocean article that explains firewalld concepts, like zones and rules persistency.
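
    For the record, getting it in place is a single apt call away:

    $ sudo apt install firewalld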

    In summary, what one needs to understand is that there are multiple “zones” within firewalld, each of which can have a different set of rules. In order to simplify the setup, I checked what the default zone was, added the network interface to it and defined the needed rules there. No need for further granularity in this use case. Here, the default zone is the one named public:

    $ sudo firewall-cmd --get-default-zone
    public
    $ sudo firewall-cmd --list-all
    public
      target: default
      icmp-block-inversion: no
      interfaces:
      sources:
      services: dhcpv6-client ssh
      ports:
      protocols:
      forward: no
      masquerade: no
      forward-ports:
      source-ports:
      icmp-blocks:
      rich rules:
    

    Knowing that, it was quite simple to associate the internet-connected network interface with it and update the list of allowed services. dhcpv6-client is going to be removed because this machine isn’t on an IPv6-enabled network:

    $ sudo firewall-cmd --change-interface eth0
    success
    $ sudo firewall-cmd --add-service mosh
    success
    $ sudo firewall-cmd --remove-service dhcpv6-client
    success
    

    It’s important to execute sudo firewall-cmd --runtime-to-permanent after confirming the rules were defined as expected, otherwise they will be lost on service/machine restarts:

    $ sudo firewall-cmd --list-all
    public (active)
      target: default
      icmp-block-inversion: no
      interfaces: eth0
      sources:
      services: mosh ssh
      ports:
      protocols:
      forward: no
      masquerade: no
      forward-ports:
      source-ports:
      icmp-blocks:
      rich rules:
    $ sudo firewall-cmd --runtime-to-permanent
    success
    

    A side effect of the target: default setting is that it REJECTs packets by default, instead of DROPing them. This informs the client that the connection was actively rejected, rather than silently dropping the packets - and the latter might be preferable. It’s confusing why it’s called default instead of REJECT, and it’s also not clear whether it’s actually possible to change the default behavior. In any case, it’s possible to explicitly change it:

    $ sudo firewall-cmd --set-target DROP --permanent
    success
    $ sudo firewall-cmd --reload
    success
    

    The --set-target option requires the --permanent flag, but it doesn’t apply the changes instantly, requiring them to be reloaded.

    An implication of dropping everything is that ICMP packets are blocked as well, preventing the machine from answering ping requests. The way this can be configured is a bit confusing, given that the logic is flipped: one has to enable icmp-block-inversion and then add an ICMP block for echo-request, which, with the inversion in place, actually allows it through:

    $ sudo firewall-cmd --add-icmp-block-inversion
    success
    $ sudo firewall-cmd --add-icmp-block echo-request
    success
    

    The result will look like this, always remembering to persist the changes:

    $ sudo firewall-cmd --list-all
    public (active)
      target: DROP
      icmp-block-inversion: yes
      interfaces: eth0
      sources:
      services: mosh ssh
      ports:
      protocols:
      forward: no
      masquerade: no
      forward-ports:
      source-ports:
      icmp-blocks: echo-request
      rich rules:
    $ sudo firewall-cmd --runtime-to-permanent
    success
    

    For someone who hadn’t used firewalld before, I can say it was OK to use for this simple use case. There was no need to learn the syntax of nft commands nor that of nftables rules, and it worked quite well in the end. The process of unblocking ICMP ping requests is a bit cumbersome with its flipped logic and could have been made simpler, but it’s still doable. All in all, I’m happy with the solution and will look into how to use it, for instance, in a non-interactive way with Ansible.

  • Exporting Prometheus metrics from Go

    Exporting Prometheus metrics is quite straightforward, especially from a Go application - Prometheus is a Go project, after all - as long as you know the basics of the process. The first step is to understand that Prometheus is not just a monitoring system, but also a time series database. So in order to collect metrics with it, there are three components involved: an application exporting its metrics in the Prometheus format, a scraper that grabs these metrics at pre-defined intervals, and a time series database that stores them for later consumption - usually Prometheus itself, but it’s possible to use other storage backends. The focus here is the first component, the metrics export process.

    With that in mind, the next step is to decide which type is more suitable for the metric to be exported. The Prometheus documentation gives a nice explanation of the four offered types (Counter, Gauge, Histogram and Summary). What’s important to understand is that a metric is basically a name (like job_queue_size), possibly associated with labels (like {type="email"}), that will have a numeric value associated with it (like 10). When scraped, these values are associated with the collection time, which makes it possible, for instance, to later plot them in a graph. Different types of metrics offer different facilities to collect the data.

    Then, there’s a need to decide when metrics will be observed. The short answer is “synchronously, at collection time”: the application shouldn’t observe metrics in the background and serve the last collected values when scraped; the scrape request itself should trigger the metrics observation - it doesn’t matter that this process isn’t instantaneous. The long answer is that it depends: when monitoring events, like HTTP requests or jobs processed in a queue, metrics are observed at event time and only collected later, when scraped.
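
    As a quick sketch of the “synchronously, at collection time” approach, client_golang offers GaugeFunc, whose callback runs on every scrape. The jobQueueSize() function below is a hypothetical stand-in for whatever call returns the current queue length:

    package main
    
    import (
      "log"
      "net/http"
    
      "github.com/prometheus/client_golang/prometheus"
      "github.com/prometheus/client_golang/prometheus/promhttp"
    )
    
    // jobQueueSize is a hypothetical stand-in for the real queue length lookup.
    func jobQueueSize() int { return 10 }
    
    func main() {
      // The callback is invoked on every scrape of /metrics, so the reported
      // value is always fresh - no background observation needed.
      queueSize := prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{
          Name: "job_queue_size",
          Help: "Current number of jobs waiting in the queue",
        },
        func() float64 { return float64(jobQueueSize()) },
      )
      prometheus.MustRegister(queueSize)
    
      http.Handle("/metrics", promhttp.Handler())
      log.Fatal(http.ListenAndServe(":8080", nil))
    }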

    The following example will illustrate how metrics can be observed at event time:

    package main
    
    import (
      "io"
      "log"
      "net/http"
    
      "github.com/gorilla/mux"
      "github.com/prometheus/client_golang/prometheus"
      "github.com/prometheus/client_golang/prometheus/promhttp"
    )
    
    var httpRequestsTotal = prometheus.NewCounter(
      prometheus.CounterOpts{
        Name:        "http_requests_total",
        Help:        "Total number of HTTP requests",
        ConstLabels: prometheus.Labels{"server": "api"},
      },
    )
    
    func HealthCheck(w http.ResponseWriter, r *http.Request) {
      httpRequestsTotal.Inc()
      w.WriteHeader(http.StatusOK)
      io.WriteString(w, "OK")
    }
    
    func main() {
      prometheus.MustRegister(httpRequestsTotal)
    
      r := mux.NewRouter()
      r.HandleFunc("/healthcheck", HealthCheck)
      r.Handle("/metrics", promhttp.Handler())
    
      addr := ":8080"
      srv := &http.Server{
        Addr:    addr,
        Handler: r,
      }
      log.Print("Starting server at ", addr)
      log.Fatal(srv.ListenAndServe())
    }
    

    There’s a single Counter metric called http_requests_total (the “total” suffix is a naming convention) with a constant label {server="api"}. The HealthCheck() HTTP handler itself calls the Inc() method responsible for incrementing this counter, but in a real-life application that would preferably be done in an HTTP middleware. It’s important not to forget to register the metrics variable with the prometheus library itself, otherwise it won’t show up in the collection.

    Let’s see how it works using xh, an HTTPie clone written in Rust:

    $ xh localhost:8080/metrics | grep http_requests_total
    # HELP http_requests_total Total number of HTTP requests
    # TYPE http_requests_total counter
    http_requests_total{server="api"} 0
    
    $ xh localhost:8080/healthcheck
    HTTP/1.1 200 OK
    content-length: 2
    content-type: text/plain; charset=utf-8
    date: Sat, 14 Aug 2021 12:26:03 GMT
    
    OK
    
    $ xh localhost:8080/metrics | grep http_requests_total
    # HELP http_requests_total Total number of HTTP requests
    # TYPE http_requests_total counter
    http_requests_total{server="api"} 1
    

    This is cool, but since the metric only has constant labels, the measurement isn’t very granular. With a small modification we can use dynamic labels to store this counter per route and HTTP method:

    diff --git a/main.go b/main.go
    index 5d6079a..53249b1 100644
    --- a/main.go
    +++ b/main.go
    @@ -10,16 +10,17 @@ import (
            "github.com/prometheus/client_golang/prometheus/promhttp"
     )
    
    -var httpRequestsTotal = prometheus.NewCounter(
    +var httpRequestsTotal = prometheus.NewCounterVec(
            prometheus.CounterOpts{
                    Name:        "http_requests_total",
                    Help:        "Total number of HTTP requests",
                    ConstLabels: prometheus.Labels{"server": "api"},
            },
    +       []string{"route", "method"},
     )
    
     func HealthCheck(w http.ResponseWriter, r *http.Request) {
    -       httpRequestsTotal.Inc()
    +       httpRequestsTotal.WithLabelValues("/healthcheck", r.Method).Inc()
            w.WriteHeader(http.StatusOK)
            io.WriteString(w, "OK")
     }
    

    Again, in a real-life application it’s better to let the route be discovered at runtime instead of hard-coding its value within the handler (a middleware sketch for that is included at the end of this post). The result will look like:

    $ xh localhost:8080/metrics | grep http_requests_total
    # HELP http_requests_total Total number of HTTP requests
    # TYPE http_requests_total counter
    http_requests_total{route="/healthcheck",method="GET",server="api"} 1
    

    The key here is to understand that the counter vector doesn’t mean that multiple values will be stored in the same metric. What it does is use different label values to create a multi-dimensional metric, where each label combination is an element of the vector.
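
    As for discovering the route at runtime, here is a minimal sketch of what such a middleware could look like with gorilla/mux, reusing the httpRequestsTotal counter vector from the diff above. It would be attached with r.Use(PrometheusMiddleware) in main(), with the Inc() call removed from the handler itself:

    // PrometheusMiddleware increments the counter for every request, using the
    // route template registered in the router instead of a hard-coded path.
    func PrometheusMiddleware(next http.Handler) http.Handler {
      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        route := "unknown"
        if current := mux.CurrentRoute(r); current != nil {
          if tmpl, err := current.GetPathTemplate(); err == nil {
            route = tmpl
          }
        }
        httpRequestsTotal.WithLabelValues(route, r.Method).Inc()
        next.ServeHTTP(w, r)
      })
    }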

  • RAID on the Ubuntu Server Live installer

    My first contact with Ubuntu was in 2006, a little after the first Long-Term Support (LTS) version, 6.06 (Dapper Drake), was out. Although it still feels like yesterday, 15 years is a heck of a long time. Things were a bit different back then, as the Canonical LTS offer was about 3 years on desktop and 5 years on server releases - instead of 5 years for both, as it stands to this date. They even sent free CDs to anyone in the world, shipping included, from 2005 to 2011, when the initiative was ended. This may sound silly now, but downloading a CD over a 56k dial-up connection (which was still a thing in multiple parts of the world) used to take over a day. Even ADSL connections were not that much faster, as the most common ones were around 256-300 Kbps.

    It took me a few more years to use Linux on a desktop, which I did around the end of 2012, although I had been using it on my servers at least since 2010 - the year I started to grab cheap VPS offers from LowEndBox. By 2013 I started to work with Herberth Amaral (who is also one of the most competent professionals I know), where Ubuntu was being used on the servers instead of Debian - the latter being the Linux distribution I was used to. That didn’t make a huge difference, as both are quite similar once you set their desktop UIs aside, but I still opted for Debian on my own machines.

    This trend continued when I started to contribute to the Debian Project in 2014, using a Debian server as my primary development machine. But, except for this server, which I still have 7 years later, almost every other server I had or company I worked for used Ubuntu - except for one employer that used CentOS. So by the end of last year, when I realized that this machine hadn’t been getting security updates for almost six months since Debian Stretch support ended, I started to wonder why not just install Ubuntu on it. By doing that, I could forget about this machine for 5 more years, until the LTS support ends.

    To be fair, saying that I use Ubuntu on “almost every other server” is an understatement. Ubuntu is my go-to OS on almost every kind of computing environment I use - except for my desktop, which has been a macOS since 2016. Ubuntu is the OS I pick when starting a virtual machine with vagrant up, an EC2 instance on AWS, or when I want to try something quick with docker run (although I frequently use Alpine Linux in this last case). So opting for it on a server that is going to run for at least a few more years felt like a natural choice to me - at least until I faced their new server installer.

    To give a bit of context, being a Debian-based distribution, Ubuntu used the regular Debian Installer for its server releases until the 18.04 (Bionic Beaver) LTS release, when it introduced the Ubuntu Server Live Installer. The new installer didn’t work for me, as at the time it didn’t support non-standard setups like RAID and encrypted LVM. This wasn’t a big deal, as it was quite easy to find ISOs with the classic installer, so I ignored the issue for a while. The old setup offered the features I needed, and my expectation was that it was only a matter of time for the new installer to be mature enough to properly replace the former.

    Last year the new Ubuntu 20.04 (Focal Fossa) LTS release came out, and its developers considered the installer transition to be complete. The release notes mention the features I missed, so I thought it would be a good idea to try it out. Let’s see how a RAID-0 installation pans out:

    Wait, what?! What do you mean by If you put all disks into RAIDs or LVM VGs, there will be nowhere to put the boot partition? GRUB has supported booting from RAID devices at least since 2008, so I guess it’s reasonable to expect that a Linux distribution installer won’t complain about that 13 years later. To make sure I wasn’t crazy or being betrayed by my own memory, I tried the same on a Debian Buster installation:

    No complaints, no error messages. The installation went all the way and booted fine in the end. “Something is odd”, I thought. Comparing the two partitioning summaries, I noticed that the Debian one uses partitions as the base for the RAID setup, while the Ubuntu one relies on the entire disks. I went back to the Ubuntu installer and tried to follow similar steps. The problem now is that if the Add GPT Partition option is used for both devices, it creates two partitions on the first disk and only one on the second. So I dropped to a shell from the Live Server installer with ALT + F2 and manually created empty partitions on both disks with fdisk (cfdisk will do fine as well; a rough sketch of this step is at the end of this post). After a reboot, I tried again:

    Well, the complaint went away. But after creating the RAID array, opting to format the newly created device as ext4 and choosing / as its mount point, the Done button was still grayed out. Looking at the top of the screen again, the Mount a filesystem at / item was gone, so the last one left to fulfill was Select a boot disk. Clicking on one of the disks and selecting the option Use As Boot Device did the trick.
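
    For reference, the manual partitioning step mentioned earlier boils down to something like the sketch below, with /dev/sda and /dev/sdb standing in for whatever the actual devices are; each single-letter command is typed at the fdisk prompt, and lsblk helps confirm the device names first:

    $ lsblk
    $ fdisk /dev/sda
    Command (m for help): g    # create a new, empty GPT partition table
    Command (m for help): n    # add a new partition, accepting the defaults
    Command (m for help): w    # write the changes and quit
    $ fdisk /dev/sdb
    (repeat the same steps for the second disk)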

  • Slack threads are one honking great idea -- let's use more of those!

    Slack, one of the world’s most popular business communication platforms, launched the threaded conversations feature over 3 years ago. To this day, there are still people who don’t use threads, be it out of inertia, personal taste, or because they don’t understand their purpose. The goal of this article is to show that the effortless action of clicking the Start/View thread button when answering a message improves not only the life of your future self but also does a huge favor to all your colleagues.

    Threaded messages are not just a communication style. They rely on one single pillar to improve the chat tool’s usability: reducing distractions by giving freedom to its users. Freedom in the sense that they can choose which conversation streams to follow, without having to leave or mute channels - things that may not be wanted or even feasible. And there are many nice side effects to get in return. Let’s go through some Slack messaging characteristics to better understand their implications.

    There’s no need to send multiple separate messages

    Which of the following examples is easier to understand:

    A bunch of messages sent separately?

    Or a single message containing all the needed information:

    It’s important to notice that the second example directly benefits the usage of threads: the messages that originate from it are not scattered around. Also, if more information needs to be appended, the message can be edited (depending on the Workspace settings). That’s not just aesthetically pleasing - the main issue is that…

    Every message sent to a channel is a potential disruption

    A channel message may not result in a notification sent to your cellphone or desktop browser, but there are a couple of implications. First, there’s the “unread messages” indicator, where the tab favicon turns white. This icon alone can catch someone else’s attention, making them wonder whether their assistance is needed or not. Second, everybody has to catch up with all channel messages when they return after being away from the chat. By using threads, the number of channel messages is reduced, making it easier for people to skim through the unread ones and choose what they need to follow.

    Be careful when using the “also send to #channel” option

    There’s an option to also send the message to the channel when replying to a thread. It should be used with care, for the reasons mentioned above: it will generate a channel message, which comes with all its implications. It’s fine to use it, for instance, when sending a reminder to a thread that was started a while ago and needs attention from people who might not have seen it. Selecting this option just to “make a point”, showing what you are answering to people who might not be interested in the thread, may sound condescending and should be avoided.

    A thread is a group of related messages

    The main goal of using threads - grouping related messages - facilitates a few use cases. A thread can be, for instance, a support request from another team. After the issue is solved, one can tag it with a checkmark emoji indicating that it was concluded.

    This can help either someone taking over the next shift to understand whether any action is needed, or an interested third party to figure out whether the problem was properly answered/addressed without going through all the messages. Without a thread, it’s hard - impossible in high-traffic channels - to even figure out where the conversation ended.

    Threads improve message history significantly

    Another situation greatly improved by threads is going through the message history, which is especially useful in the paid - and unlimited - Slack version. Either by using the search or following a link, when the relevant thread is found, all the information is in there: the parent message containing the whole context, and all the discussion properly indicating where it started and where it ended. The true value of that can be easily seen, for instance, when a link to a discussion is attached to a ticket in the issue tracker and accessed months later.

    Closing thoughts

    Threads were invented with a noble goal: to make text-based communication more efficient. Even if it might be tempting to take a shortcut and start typing a response as soon as you see a message in a channel, remember that clicking on the Start/View thread button is a small step for you, but a giant leap for the whole chatting experience. By doing that, the quality of life of everyone who might be involved in a Slack conversation, either at that exact point in time or long in the future, will be greatly improved.