GitHub offers a very superficial view of how GitHub Actions runners are spending their minutes on private repositories. Currently, the only way to get detailed information about it is via the
Get usage report button in the project/organization billing page. The only problem is that the generated report is a CSV file, shifting the responsibility of filtering and visualizing data to the user. While it’s true that most of the users of this report are used to deal with CSV files, be them developers or accountants experts in handling spreadsheets, this is definitely not the most user-friendly way of offering insights into billing data.
When facing this issue, at first I thought about using harelba/q to query the CSV files directly in the command line. The problem is that
q isn’t that straightforward to install, as apparently it is not available via
pip, nor one is able to easily change the data once it’s imported, like in a regular database. In the first time I resorted to create a database on PostgreSQL and import the CSV file into it, but after that I never remember the CSV import syntax and it still requires a daemon running just for that. I kept thinking that there should be a simpler way: what if I use SQLite for that?
In order to not have to
TEXT column whenever working with dates or numbers, the following
schema.sql can be used:
CREATE TABLE billing ( date DATE, product TEXT, repository TEXT, quantity NUMERIC, unity TEXT, price NUMERIC, workflow TEXT, notes TEXT );
After that, it’s possible to import the CSV file with the
sqlite3 CLI tool. The
--skip 1 argument to the
.import command is needed to avoid importing the CSV header as data, given that SQLite considers it to be a regular row when the table already exists:
$ sqlite3 github.db SQLite version 3.36.0 2021-06-18 18:58:49 Enter ".help" for usage hints. sqlite> .read schema.sql sqlite> .mode csv sqlite> .import --skip 1 c2860a05_2021-12-23_01.csv billing sqlite> SELECT COUNT(*) FROM billing; 1834
Now it’s easy to dig into the billing data. In order to have a better presentation,
.mode column can be enabled to both show the column names and align their output. We can, for instance, find out which workflows consumed most minutes in the last week and their respective repositories:
sqlite> .mode column sqlite> SELECT date, repository, workflow, quantity FROM billing WHERE date > date('now', '-7 days') AND product = 'actions' ORDER BY quantity DESC LIMIT 5; date repository workflow quantity ---------- ----------------- ------------------------------------ -------- 2021-12-21 contoso/api .github/workflows/main.yml 392 2021-12-18 contoso/terraform .github/workflows/staging-images.yml 361 2021-12-22 contoso/api .github/workflows/main.yml 226 2021-12-21 contoso/api .github/workflows/qa.yml 185 2021-12-20 contoso/api .github/workflows/main.yml 140
Another important example of the data that can be fetched is the cost per repository in the last week, summing the cost of all their workflows. An
UPDATE statement is required to apply a small data fix, given that the CSV contains a dollar sign
$ in the rows of the
price column that needs to be dropped:
sqlite> UPDATE billing SET price = REPLACE(price, '$', ''); sqlite> SELECT repository, SUM(quantity) * price AS amount FROM billing WHERE date > date('now', '-7 days') AND product = 'actions' GROUP BY repository; repository amount ------------------ ------ contoso/api 11.68 contoso/public-web 0.128 contoso/status 1.184 contoso/terraform 2.92 contoso/webapp 0.6
Not intuitive as a web page where one can just click around to filter and sort a report, but definitely doable. As a side note, one cool aspect of SQLite is that it doesn’t require a file database do be used. If started as
sqlite3, with no arguments, all of it’s storage needs are handled entirely in memory. This makes it even more interesting for data exploration cases like these, offering all of its queries capabilities without ever persisting data to disk.