It’s been 5 months since the last status update. This post contains short notes on the projects I’ve been working on since then (2021-12-15 to 2022-05-15, 151 days).

Radio and DSP

libhackrf bindings for Python

My preferred method for getting samples in and out of SDRs is to just pipe them in the shell. But I thought it might be nice to use libhackrf, the official library to interface with the HackRF One.

I used ctypes to write a thin wrapper around libhackrf. This doesn’t depend on CPython internals, so it also works on other Python implementations like PyPy. And since ctypes is part of the Python standard library, my libhackrf wrapper doesn’t have any Python dependencies.

The functionality is exposed in two ways: a low-level API and a high-level API. The low-level API has the same function names as the C API, and works in exactly the same way. The only work done is to wrap the functions and perform any type conversions that might be necessary between Python and C.

The high-level API builds on top of the low-level one to implement the same functionality in a more idiomatic way using Python classes.
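To illustrate the two layers, here is a minimal sketch, not the actual wrapper. hackrf_init and hackrf_exit are real libhackrf functions that return 0 on success; load_libhackrf and HackRFSession are hypothetical names made up for this example.

```python
import ctypes
import ctypes.util


def load_libhackrf():
    """Low-level layer: load the shared library and declare C signatures.

    Returns None when libhackrf is not installed on this machine.
    """
    path = ctypes.util.find_library("hackrf")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    # int hackrf_init(void); int hackrf_exit(void);  (0 means success)
    lib.hackrf_init.argtypes = []
    lib.hackrf_init.restype = ctypes.c_int
    lib.hackrf_exit.argtypes = []
    lib.hackrf_exit.restype = ctypes.c_int
    return lib


class HackRFSession:
    """High-level layer (hypothetical): library init/exit as a context manager."""

    def __init__(self, lib):
        self._lib = lib

    def __enter__(self):
        status = self._lib.hackrf_init()
        if status != 0:
            raise RuntimeError(f"hackrf_init failed with code {status}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lib.hackrf_exit()
        return False  # never swallow exceptions from the with-block
```

With the library installed, usage would look like lib = load_libhackrf() followed by "with HackRFSession(lib): ...". The low-level calls stay available on lib for anyone who prefers the C-style API.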

WWVB modulator

WWVB is a radio station in Colorado, US, operated by NIST. It transmits a very powerful signal that contains the current time. It can be used by radio clocks to automatically update their time without needing an internet connection.

The format of the transmission is extremely simple, and is documented in great detail on both the Wikipedia page and at the NIST website under WWVB time code format.

I wrote a modulator for the WWVB signal that reads a bit string describing the current date and time, and produces samples that can be transmitted over an SDR or audio. It produces the same result when encoding example bit strings, and sounds just like the actual transmitter on 60 kHz.
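The amplitude-modulated part of the time code can be sketched in a few lines. This is a simplified illustration, not my actual modulator: at the start of every second the carrier power drops by 17 dB, and it is restored after 0.2 s for a 0 bit, 0.5 s for a 1 bit and 0.8 s for a marker. The "0"/"1"/"M" symbol alphabet and the function names are assumptions made for this example, and the newer phase-modulated time code is ignored.

```python
import math

# Amplitude ratio for a 17 dB power reduction.
REDUCED = 10 ** (-17 / 20)

# Seconds of reduced power at the start of each one-second symbol.
LOW_DURATION = {"0": 0.2, "1": 0.5, "M": 0.8}


def wwvb_envelope(symbols, sample_rate):
    """Return one amplitude value per sample for a string of symbols."""
    envelope = []
    for symbol in symbols:
        low = round(LOW_DURATION[symbol] * sample_rate)
        envelope.extend([REDUCED] * low)
        envelope.extend([1.0] * (sample_rate - low))
    return envelope


def modulate(envelope, carrier_hz, sample_rate):
    """Multiply the envelope with a sine carrier (60 kHz for the real station)."""
    step = 2 * math.pi * carrier_hz / sample_rate
    return [a * math.sin(step * n) for n, a in enumerate(envelope)]
```

For example, modulate(wwvb_envelope("M01", 192000), 60000, 192000) gives three seconds of samples. Over audio you would need a lower carrier frequency, since 60 kHz is above what common sound card sample rates can represent.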

As a next step, I want to write a demodulator for WWVB and try to use it to sync my time instead of using NTP or httptime.

numpydoc wrapper

A large number of the tools and libraries I write in Python are contained in a single file. I document them in the same file with docstrings and comments explaining everything. For someone reading the source code, everything is sufficiently documented.

But a lot of the time, I want to generate the project documentation or some API reference as an HTML file so that I can host it on my website. And if the entire code is organized in a single file, I don’t want to split the documentation to another file. Not only is that a chore; it’s also another place to update after touching the code, and may cause the documentation to be outdated if it is not updated at the same time as the code.

The docstring style I use is numpydoc. It is supported by Sphinx, the de facto documentation tool for Python. But Sphinx expects a certain way of arranging your content. Aside from your code, you need a config file and one or more RST files that contain the actual documentation text.

Since I didn’t want to deal with all that for each project, I wrote a small wrapper for Sphinx. It copies your code file into a temporary directory, creates the config files and other directory structures Sphinx needs, extracts the RST text from code comments and builds it all into an HTML document. Thanks to this tool, my documentation will be a lot higher quality now. The biggest roadblock for me publishing more proper documentation has been lifted.
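The general idea can be sketched like this. It is an approximation under assumptions, not the tool itself: it relies on the autodoc and numpydoc Sphinx extensions to pull in the docstrings instead of extracting RST from comments, and the function names are made up for this example. Building requires the sphinx and numpydoc packages to be installed.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def prepare_sphinx_project(module_path, workdir):
    """Copy a single-file module into workdir and write the files Sphinx expects."""
    module_path = Path(module_path)
    workdir = Path(workdir)
    name = module_path.stem
    shutil.copy(module_path, workdir / module_path.name)
    # Minimal config: make the module importable and enable numpydoc parsing.
    (workdir / "conf.py").write_text(
        "import os, sys\n"
        "sys.path.insert(0, os.path.abspath('.'))\n"
        f"project = {name!r}\n"
        "extensions = ['sphinx.ext.autodoc', 'numpydoc']\n"
    )
    # One RST page that asks autodoc to document every member of the module.
    title = f"{name} documentation"
    (workdir / "index.rst").write_text(
        f"{title}\n{'=' * len(title)}\n\n"
        f".. automodule:: {name}\n   :members:\n"
    )
    return workdir


def build_html(module_path, out_dir):
    """Build HTML docs for one Python file into out_dir."""
    with tempfile.TemporaryDirectory() as tmp:
        src = prepare_sphinx_project(module_path, tmp)
        subprocess.run(
            [sys.executable, "-m", "sphinx", "-b", "html", str(src), str(out_dir)],
            check=True,
        )
```

Calling build_html("mytool.py", "docs_html") would then leave the rendered pages in docs_html while the temporary Sphinx project is thrown away.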

shinGETsu node

shinGETsu, or 新月 (New Moon), is a P2P anonymous BBS from Japan. It federates posts between the nodes using HTTP requests. The protocol itself is really simple, and has reference implementations in Python and Golang.

To get a picture of the landscape, I first wrote a tool that discovers which nodes are peered with each other, and then drew a map of the whole network with Graphviz.
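The mapping step boils down to a breadth-first crawl followed by writing a Graphviz DOT file. Below is a minimal sketch under assumptions: get_peers is a hypothetical callback that returns the peers of a node (in practice it would make the HTTP request to that node), and the function names are invented for this example.

```python
from collections import deque


def crawl(seed, get_peers):
    """Breadth-first crawl from seed; returns a set of (node, peer) edges."""
    edges = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for peer in get_peers(node):
            edges.add((node, peer))
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return edges


def to_dot(edges):
    """Render the edges as a directed graph in Graphviz DOT syntax."""
    lines = ["digraph shingetsu {"]
    for src, dst in sorted(edges):
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Saving the output of to_dot to map.dot and running "dot -Tsvg map.dot -o map.svg" produces the picture.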

I implemented the shinGETsu protocol and wrote a node in Python. After running it for a while, I could verify (using the previously mentioned mapping script) that it was indeed peered with a lot of other nodes and could spread naturally on the network without sending explicit peering requests.

The node stores its data in ClickHouse. For now, only posts generated by me are stored in the database, but I’m planning to add support for fetching post discussions from other nodes.

Aside from checking if other nodes are peering with my node, I’ve also confirmed that my posts and threads are propagating to other nodes on the network and persist there even after I shut down my node.

Cryptography

There isn’t a lot of cryptography content for this post, but there are still a few things as it’s been a long time since the last update.

Hash avalanche diagrams

In the status update for September 2021, I mentioned working on a framework to build and evaluate integer hash functions and bit mixers.

The tool I made back then could only analyze the bias and diffusion of 64-bit to 64-bit mappings. The analysis also ran on a single core, so you’d either be waiting longer between iterations or working with less accurate data.

I’ve rewritten the tool to support arbitrary input and output sizes, even non-matching ones. You can have a 64x64 mapping like before, but you can also analyze compression functions like 64x32.
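To make the idea concrete, here is a small single-process sketch of an avalanche measurement for arbitrary input and output widths. It is an illustration with made-up names, not the tool itself. Entry [i][j] of the result estimates how often flipping input bit i flips output bit j; a good mixer is close to 0.5 everywhere.

```python
import random


def avalanche_matrix(mixer, in_bits, out_bits, samples=1000, seed=0):
    """Estimate P(output bit j flips | input bit i flips) for every (i, j)."""
    rng = random.Random(seed)
    out_mask = (1 << out_bits) - 1
    counts = [[0] * out_bits for _ in range(in_bits)]
    for _ in range(samples):
        x = rng.getrandbits(in_bits)
        base = mixer(x) & out_mask
        for i in range(in_bits):
            # Flip one input bit and see which output bits changed.
            diff = base ^ (mixer(x ^ (1 << i)) & out_mask)
            for j in range(out_bits):
                counts[i][j] += (diff >> j) & 1
    return [[c / samples for c in row] for row in counts]


def xor_fold(x):
    """A deliberately bad 64x32 'compression function' used as an example."""
    return (x ^ (x >> 32)) & 0xFFFFFFFF
```

Running avalanche_matrix(xor_fold, 64, 32) gives a matrix containing only zeros and ones, which would show up in a diagram as two sharp diagonal lines instead of a uniform grey.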

Additionally, the tool now executes mixers on all cores, either getting you the same number of samples faster, or getting you 8x the number of samples in the same time on an 8-core machine.

The new tool has already helped me visualize the behaviour of different parts of hashes and ciphers, and to build my own mixers for specialized purposes. The shortened feedback loop is really handy.

Unbalanced Feistel networks

On the internet there is a huge amount of information on implementing balanced Feistel networks, but comparatively little about unbalanced Feistel networks. To get familiar with them, I read some papers and made a sample one that is easy to understand. In the future, it should be a lot easier for me to make one since I have a good reference now.
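For readers who have not seen one, here is a toy unbalanced Feistel network. It is a sketch written for this post under my own assumptions, not the sample mentioned above and certainly not a secure cipher. The block is split into a small target part and a larger source part; each round XORs a hash of the source into the target and then rotates the two parts. The SHA-256 based round function and all names are illustrative choices.

```python
import hashlib


def _round_function(source, round_key, out_bits):
    """Hash the source part down to out_bits (source must fit in 128 bits)."""
    data = round_key + source.to_bytes(16, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") & ((1 << out_bits) - 1)


def encrypt(block, round_keys, target_bits, source_bits):
    """Encrypt a (target_bits + source_bits)-bit integer block."""
    source_mask = (1 << source_bits) - 1
    for key in round_keys:
        target = block >> source_bits          # top target_bits
        source = block & source_mask           # bottom source_bits
        target ^= _round_function(source, key, target_bits)
        block = (source << target_bits) | target  # rotate the parts
    return block


def decrypt(block, round_keys, target_bits, source_bits):
    """Undo encrypt() by running the rounds backwards."""
    target_mask = (1 << target_bits) - 1
    for key in reversed(round_keys):
        target = block & target_mask           # bottom target_bits
        source = block >> target_bits          # top source_bits
        target ^= _round_function(source, key, target_bits)
        block = (target << source_bits) | source
    return block
```

With target_bits=3 and source_bits=5 this is a permutation of the 8-bit values, which is easy to check exhaustively. As in the balanced case, the round function itself never needs to be invertible.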

Database drivers

I needed to interface with different databases for some projects, so I ended up writing some database drivers in Python.

ClickHouse driver

I wrote a ClickHouse client in Python called miniclickhouse. It uses the HTTP API of ClickHouse to implement querying. You can fetch and insert data; rows are read and written as Python dictionaries and encoded as JSON before being sent to the server.
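The approach can be sketched roughly as follows. This is not the miniclickhouse API; the function names are hypothetical, and the example uses urllib from the standard library instead of requests. It relies on the JSONEachRow format of ClickHouse, which is one JSON object per line. The table name is interpolated without any escaping, so it must come from trusted code.

```python
import json
import urllib.parse
import urllib.request


def encode_rows(rows):
    """Serialize dictionaries as JSONEachRow: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows).encode("utf-8")


def decode_rows(body):
    """Parse a JSONEachRow response body back into dictionaries."""
    return [json.loads(line) for line in body.decode("utf-8").split("\n") if line]


def insert(base_url, table, rows):
    """Insert rows; the query goes in the URL and the data in the POST body."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = base_url + "/?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, data=encode_rows(rows), method="POST")
    with urllib.request.urlopen(request) as response:
        response.read()


def select(base_url, query):
    """Run a SELECT and return the result rows as dictionaries."""
    body = (query + " FORMAT JSONEachRow").encode("utf-8")
    request = urllib.request.Request(base_url + "/", data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return decode_rows(response.read())
```

With a local server on the default HTTP port, base_url would be "http://localhost:8123".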

For now it depends on requests, but I’m planning to make it completely dependency-free in the future.

cdb driver

cdb, or constant database, is an on-disk key-value store file format designed by D. J. Bernstein. It allows looking up values by string keys using hash tables.

I wrote a small Python library to look up keys in cdb files without reading the whole thing into memory. For now, there is no support for creating new files or modifying existing ones. I might implement write support in the future if I need it.
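The file format is simple enough that a lookup fits in a short function. The sketch below follows the published cdb layout (a 2048-byte header with 256 pairs of table position and slot count, followed by records and hash tables), but it is a simplified illustration with names chosen for this post, not the library itself. It returns only the first value stored for a key.

```python
import struct


def cdb_hash(key):
    """The cdb hash: start at 5381, then h = (h * 33) xor byte, modulo 2**32."""
    h = 5381
    for byte in key:
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFF
    return h


def cdb_get(f, key):
    """Look up key (bytes) in an open binary cdb file using seeks only."""
    h = cdb_hash(key)
    # The low 8 bits of the hash select one of the 256 header entries.
    f.seek((h & 255) * 8)
    table_pos, table_len = struct.unpack("<II", f.read(8))
    if table_len == 0:
        return None
    start = (h >> 8) % table_len
    for i in range(table_len):
        slot = (start + i) % table_len  # linear probing
        f.seek(table_pos + slot * 8)
        slot_hash, record_pos = struct.unpack("<II", f.read(8))
        if record_pos == 0:
            return None  # empty slot: the key is not present
        if slot_hash == h:
            f.seek(record_pos)
            key_len, data_len = struct.unpack("<II", f.read(8))
            if f.read(key_len) == key:
                return f.read(data_len)
    return None
```

Usage would be along the lines of opening the file with open(path, "rb") and calling cdb_get(f, b"some key"). Only a handful of small reads happen per lookup, no matter how large the file is.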

Machine learning

I want to get better at machine learning and write some notes and tools about good workflows. Towards this goal, I started playing around with Kaggle competitions. I haven’t won one yet, but I can accumulate experience as I participate in more competitions.

Minetest proxy

In online gaming, there is a concept of “Anarchy Servers”. These are servers where cheating and hacks are expected and encouraged, and they have a wildly different play style from normal servers.

Since I didn’t want to hack the game client, or keep a fork up-to-date with game updates, I decided to write a proxy for the game protocol instead. The Minetest protocol is based on UDP.

With this proxy, I am able to perform hacks without touching a single line of the game client code. An additional benefit is that you can connect to the proxy with mobile clients, enabling you to have the same tools on the go.
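The skeleton of such a proxy is a small UDP relay with hooks for inspecting or rewriting packets. The sketch below is generic and contains nothing specific to the Minetest protocol; the class name and the hook parameters are assumptions for illustration, and it only handles a single client at a time.

```python
import socket


class UdpRelay:
    """Forward datagrams between one client and one server, with rewrite hooks."""

    def __init__(self, server_addr, on_client=None, on_server=None):
        # server_addr must be an (ip, port) tuple with a literal IP address,
        # because it is compared against the source address of each packet.
        self.server_addr = server_addr
        self.client_addr = None
        self.on_client = on_client or (lambda data: data)
        self.on_server = on_server or (lambda data: data)

    def route(self, data, src):
        """Return (payload, destination) for a packet, or None to drop it."""
        if src == self.server_addr:
            if self.client_addr is None:
                return None  # no client has connected yet
            return self.on_server(data), self.client_addr
        self.client_addr = src  # remember where to send replies
        return self.on_client(data), self.server_addr

    def serve(self, listen_addr):
        """Blocking loop: receive on one socket and forward in both directions."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(listen_addr)
        while True:
            data, src = sock.recvfrom(65535)
            routed = self.route(data, src)
            if routed is not None:
                sock.sendto(*routed)
```

The game client then connects to the address the relay listens on instead of the real server, and any modification lives in the two hook functions.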

Twitch and YouTube

I used to make programming livestreams on WatchPeopleCode, YouTube and Twitch around 2015 and 2016. I guess I had too much free time in high school, and after 2016 when I went to university I stopped doing this.

In 2022 I started doing occasional livestreams and putting the recordings on YouTube again. Below are the projects I’ve done live.

Browser fingerprinting with font enumeration

On a previous stream that was not recorded, I started to build the skeleton of a JavaScript project to query various bits of client-side information and hash them in order to create an ID that will be unique without needing cookies.

On this stream, I extended that project by adding code to try a list of common and uncommon fonts and detect if they were installed on the system. The code also detected if someone had changed the default font of their browser.

Don’t worry, I didn’t turn into an evil ad-tech stalker. In fact, I have my own browser extension to plug some of those fingerprinting holes. It is good to make easy-to-understand fingerprinting tools as a way to evaluate browsers, encourage them to plug some of the holes, and to test my anti-fingerprinting extension.

Real-time network activity graph

This project aimed to create a GUI graph that updated based on the current network usage. I used SDL for the graphics, tshark for capturing the packets, and C89 as the language to implement all of this.

The end result was a really smooth graph that scrolled and showed the amount of network activity over time. It looked really nice and would make a good widget.

If I went over the project again, I would probably replace tshark with tcpdump, as that is a more common dependency.

Thanos tool that makes half of your files disappear

This one was recommended to me by a friend, because I wanted to stream something but I didn’t have any bite-sized projects at the moment. The challenge was to “make a tool that Thanos-snaps half of your files out of existence”.

To accomplish this, I made a C89 program that recursively listed all the files under the current directory, stored them in a custom dynamic array, and randomly deleted half of them using a custom random number generator.
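The same logic is much shorter in Python. The sketch below is not a port of the C89 program: it uses the standard random module instead of a custom random number generator, the function names are invented here, and it defaults to a dry run so that nothing is deleted by accident.

```python
import os
import random


def list_files(root):
    """Recursively collect the paths of all files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def pick_half(paths, rng=None):
    """Choose half of the paths (rounded down) uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(paths, len(paths) // 2)


def snap(root, dry_run=True, rng=None):
    """Delete half of the files under root; only prints them when dry_run is set."""
    victims = pick_half(list_files(root), rng)
    for path in victims:
        if dry_run:
            print("would delete", path)
        else:
            os.remove(path)
    return victims
```

Calling snap(".") lists what would disappear; snap(".", dry_run=False) actually does it, so try it on a scratch directory first.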

You can find the end result here.

Algorithmic trading

Contrary to the September 2021 status update, this time there is actually some progress regarding algorithmic trading.

I’ve written a full data pipeline that ingests stock, fund and forex data from different paid and free sources. This data is then stored in ClickHouse, partitioned by the date of price information.

The data is then analyzed in various ways, including graphing the Sharpe and Sortino ratios of different stocks and my current portfolio.
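For reference, both ratios can be computed from a list of periodic returns as in the sketch below. It uses one common set of definitions and is an illustration, not the code from my pipeline: the Sharpe ratio divides the mean excess return by its sample standard deviation, while the Sortino ratio divides by the downside deviation, which only counts returns below the target. The results are per period and not annualized, and the Sortino function raises ZeroDivisionError when no return falls below the target.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation below target."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside
```

Because the Sortino ratio ignores upside volatility, a stock with occasional large gains looks better under it than under the Sharpe ratio, which is why graphing both is useful.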

I also wrote a crude portfolio optimizer, and the portfolios it generates seem sensible. I have tried some of the recommendations, and they had better returns than expected.

I started writing a backtesting tool for various trading strategies, but it is not complete yet and the output is very human-unfriendly. Perhaps by the next status update I’ll have some pretty graphs to show.