Hackaday

Me, hacking challenging Javascript at work: “Damn, I need a holiday.”

Dan sits at a laptop in a hotel bar, a view of Barcelona out of the window behind him, a beer bottle alongside him.

Me, hacking challenging Javascript on sabbatical: “Ah, so relaxing.”

×

Note #24625

This morning’s actual breakfast order from the 7-year-old: “A sesame seed bagel with honey, unless there aren’t any sesame seed bagels, in which case a plain bagel with honey on one half and jam on the other half, unless there aren’t any plain bagels, in which case a cinnamon and raisin bagel with JimJams on one half and Biscoff on the other half.”

Some day, this boy will make a great LISP programmer. 😂

×

Calculating the Ideal “Sex and the City” Polycule

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I’ve never been even remotely into Sex and the City. But I can’t help but love that this developer was so invested in the characters and their relationships that when he asked himself “couldn’t all this drama and heartache have been simplified if these characters were willing to consider polyamorous relationships rather than serial monogamy?”1, he did the maths to optimise his hypothetical fanfic polycule:

Juan Pablo Sarmiento

As if his talk at !!Con 2024 wasn’t cool enough, he open-sourced the whole thing, so you’re free to try the calculator online for yourself or expand upon or adapt it to your heart’s content. Perhaps you disagree with his assessment of the relative relationship characteristics of the characters2: tweak them and see what the result is!

Or maybe Sex and the City isn’t your thing at all? Well adapt it for whatever your fandom is! How I Met Your Mother, Dawson’s CreekMamma Mia and The L-Word were all crying out for polyamory to come and “fix” them3.

Perhaps if you’re feeling especially brave you’ll put yourself and your circles of friends, lovers, metamours, or whatever into the algorithm and see who it matches up. You never know, maybe there’s a love connection you’ve missed! (Just be ready for the possibility that it’ll tell you that you’re doing your love life “wrong”!)

Footnotes

1 This is a question I routinely find myself asking of every TV show that presents a love triangle as a fait accompli resulting from an even moderately-complex who’s-attracted-to-whom.

2 Clearly somebody does, based on his commit “against his will” that increases Carrie and Big’s validatesOthers scores and reduces Big’s prioritizesKindness.

3 I was especially disappointed with the otherwise-excellent The L-Word, which did have a go at an ethical non-monogamy storyline but bungled the “ethical” at every hurdle while simultaneously reinforcing the “insatiable bisexual” stereotype. Boo! Anyway: maybe on my next re-watch I’ll feed some numbers into Juan’s algorithm and see what comes out…

AI posts on social media are the chicken nuggets of human interaction

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Perhaps inspired by my resharing of Thomas‘s thoughts about the biggest problem in AI (tl;dr: he thinks it’s nomenclature; I agree that’s a problem but I don’t know if it’s the biggest issue), Ruth posted some thoughts to LinkedIn that I think are quite well-put:

I was going to write about something else but since LinkedIn suggested I should get AI to do it for me, here’s where I currently stand on GenAI.

As a person working in computing, I view it as a tool that is being treated as a silver bullet and is probably self-limiting in its current form. By design, it produces average code. Most companies prior to having access to cheap average code would have said they wanted good code. Since the average code produced by the tools is being fed back into those tools, mathematically this can’t lead anywhere good in terms of quality.

However, as a manager in tech I’m really alarmed by it. If we have tools to write code that is ok but needs a lot of double checking, we might be tempted to stop hiring people at that level. There already aren’t enough jobs for entry level programmers to feed the talent pipeline, and this is likely to make it worse. I’m not sure where the next generation of great programmers are supposed to come from if we move to an ecosystem where the junior roles are replaced by Copilot.

I think there’s a lot of potential for targeted tools to speed up productivity. I just don’t think GenAI is where they should come from.

This is an excellent explanation of no fewer than four of the big problems with “AI” as we’re seeing it marketed today:

  1. It produces mediocre output, (more on that below!)
  2. It’s a snake that eats its own tail,
  3. It’s treated as a silver bullet, and
  4. By pricing out certain types of low-tier knowledge work, it damages the pipeline for training higher-tiers of those knowledge workers (e.g. if we outsource all first-level tech support to chatbots, where will the next generation of third-level tech support come from, if they can’t work their way up the ranks, learning as they go?)

Let’s stop and take a deeper look at the “mediocre output” claim. Ruth’s right, but if you don’t already understand why generative AI does this, it’s worth a little bit of consideration about the reason for it… and the consequences of it:

Mathematically-speaking, that’s exactly what you would expect for something that is literally statistically averaging content, but that still comes as a surprise to people.

Bear in mind, of course, that there are plenty of topics in which the average person is less-knowledgable than the average of the content that was made available to the model. For example, I know next to noting about fertiliser application in large-scale agriculture. ChatGPT has doubtless ingested a lot of literature about it, and if I ask it what fertiliser I should use for a field of black beans in silty soil in the UK, it delivers me a confident-sounding answer:

Who knows if this answer is right, of course! If the answer mattered to me – because I was about to drill my field – I’d have to do my own research to check, by which point I might as well have just done the research in the first place. If all I cared about was a quick sense-check to an answer I already knew, and it didn’t matter too much, this might be okay output. (It’s pretty verbose and repeats itself a lot, like it’s learned how to talk from YouTube tutorials: I’m surprised it didn’t finish by exhorting me to like and subscribe!)

When LLMs produce exceptional output (I use the term exceptional in the sense of unusual and not-average, not to mean “good”), it appears more-creative and interesting but is even more-likely to be riddled with fanciful hallucinations.

There’s a fine line in getting the creativity dial set just right, and even when you do there’s no guarantee of accuracy, but the way in which many chatbots are told to talk makes them sound authoritative on basically every subject. When you know it’s lying, that’s easy. But people don’t always use LLMs for subjects they’re knowledgeable about!

I asked ChatGPT to define five words for me. Two (“quantifiable” and “ethology”) are real words that somebody might have trouble with. One (“no cap”) is a slang term. One (“erinaceiophobia” is a logically-sound construction from the Latin name for the biological family that hedgehogs belong to and the Greek suffix that’s applied to irrational fears). ChatGPT came up with perfectly reasonable definitions of all of these. But it also confidently defined “undercontrastised”, a word I made up and which I can’t find used anywhere at all!

In my example above, a more-useful robot would have stated that it didn’t know the answer to the question rather than, y’know, lying. But the nature of the statistical models used by LLMs means that they can’t know what they don’t know: they don’t have a “known unknowns” space.

Regarding the “damages the training pipeline”: I’m undecided on whether or not I agree with Ruth. She might be on to something there, but I’m not sure. Needs more thought before I commit to an opinion on that one.

Ruth followed-up to say:

Oh, and an addendum to this – as a human, I find the proliferation of AI tools in spaces that are all about creating connections with other humans deeply concerning. I saw a lot of job applications through Otta at my previous role, and they were all kind of the same – I had no sense of the person behind the averaged out CV I was looking at. We already have a huge problem with people presenting inauthentic versions of themselves on social media which makes it harder to have genuine interactions, smoothing off the rough edges of real people to get something glossy and processed is only going to make this worse.

AI posts on social media are the chicken nuggets of human interaction and I’d rather have something real every time.

Emphasis mine… because that’s a fantastic metaphor. Content generated where a generative AI is trying to “look human” are so-often bland, flat, and unexciting: a mass-produced most-basic form of social sustenance. So yeah: chicken nuggets.

Photo of chicken nuggets with
Ironically, I might’ve gotten a better picture here if I’d asked AI to draw this for me, because I couldn’t find any really unappetising-looking McDonalds-grade chicken nuggets on the stock photography site I used.
× × ×

Roll Your Own Antispam

Three Rings operates a Web contact form to help people get in touch with us: the idea is that it provides a quick and easy way to reach out if you’re a charity who might be able to make use of the system, a user who’s having difficulty with the features of the software, or maybe a potential new volunteer willing to give your time to the project.

But then the volume of spam it received increased dramatically. We don’t want our support team volunteers to spend all their time categorising spam: even if it doesn’t take long, it’s demoralising. So what could we do?

Clearly-spammy message shown in a ticket management system.
It’s clearly spam, but if it takes you 2 seconds to categorise it and there are 30 in your Inbox, that’s still a drag.

Our conventional antispam tools are configured pretty liberally: we don’t want to reject a contact from a legitimate user just because their message hits lots of scammy keywords (e.g. if a user’s having difficulty logging in and has copy-pasted all of the error messages they received, that can look a lot like a password reset spoofing scam to a spam filter). And we don’t want to add a CAPTCHA, because not only do those create a barrier to humans – while not necessarily reducing spam very much, nowadays – they’re often terrible for accessibility, privacy, or both.

But it didn’t take much analysis to spot some patterns unique to our contact form and the questions it asks that might provide an opportunity. For example, we discovered that spam messages would more-often-than-average:

  • Fill in both the “name” and (optional) “Three Rings username” field with the same value. While it’s cetainly possible for Three Rings users to have a login username that’s identical to their name, it’s very rare. But automated form-fillers seem to disproportionately pair-up these two fields.
  • Fill the phone number field with a known-fake phone number or a non-internationalised phone number from a country in which we currently support no charities. Legitimate non-UK contacts tend to put international-format phone numbers into this optional field, if they fill it at all. Spammers often put NANP (North American Numbering Plan) numbers.
  • Include many links in the body of the message. A few links, especially if they’re to our services (e.g. when people are asking for help) is not-uncommon in legitimate messages. Many links, few of which point to our servers, almost certainly means spam.
  • Choose the first option for the choose -one question “how can we help you?” Of course real humans sometimes pick this option too, but spammers almost always choose it.

None of these characteristics alone, or any of the half dozen or so others we analysed (including invisible checks like honeypots and IP-based geofencing), are reason to suspect a message of being spam. But taken together, they’re almost a sure thing.

To begin with, we assigned scores to each characteristic and automated the tagging of messages in our ticketing system with these scores. At this point, we didn’t do anything to block such messages: we were just collecting data. Over time, this allowed us to find a safe “threshold” score above which a message was certainly spam.

Even when a message fails our customised spam checks, we only ‘soft-block’ it: telling the user their message was rejected and providing suggestions on working around that or emailing us conventionally. Our experience shows that the spammers aren’t willing to work to overcome this additional hurdle, but on the very rare ocassion a human hits them, they are.

Once we’d found our threshold we were able to engage a soft-block of submissions that exceeded it, and immediately the volume of spam making it to the ticketing system dropped considerably. Under 70 lines of PHP code (which sadly I can’t share with you) and we reduced our spam rate by over 80% while having, as far as we can see, no impact on the false-positive rate.

Where conventional antispam solutions weren’t quite cutting it, implementing a few rules specific to our particular use-case made all the difference. Sometimes you’ve just got to roll your sleeves up and look at the actual data you do/don’t want, and adapt your filters accordingly.

× ×

So… I’m A Podcast

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

Observant readers might have noticed that some of my recent blog posts – like the one about special roads, my idea for pressure-cooking tea, and the one looking at the history of window tax in two countries1 – are also available as podcast.

Why?

Like my occasional video content, this isn’t designed to replace any of my blogging: it’s just a different medium for those that might prefer it.

For some stories, I guess that audio might be a better way to find out what I’ve been thinking about. Just like how the vlog version of my post about my favourite video game Easter Egg might be preferable because video as a medium is better suited to demonstrating a computer game, perhaps audio’s the right medium for some of the things I write about, too?

But as much as not, it’s just a continuation of my efforts to explore different media over which a WordPress blog can be delivered2. Also, y’know, my ongoing effort to do what I’m bad at in the hope that I might get better at a wider diversity of skills.

How?

Let’s start by understanding what a “podcast” actually is. It is, in essence, just an RSS feed (something you might have heard me talk about before…) with audio enclosures – basically, “attachments” – on each item. The idea was spearheaded by Dave Winer back in 2001 as a way of subscribing to rich media like audio or videos in such a way that slow Internet connections could pre-download content so you didn’t have to wait for it to buffer.3

Mapping of wp-admin metadata fields to parts of a podcast feed.
Podcasts are pretty simple, even after you’ve bent over backwards to add all of the metadata that Apple Podcasts (formerly iTunes) expects to see. I looked at a couple of WordPress plugins that claimed to be able to do the work for me, but eventually decided it was simple enough to just add some custom metadata fields that could then be included in my feeds and tweak my theme code a little.

Here’s what I had to do to add podcasting capability to my theme:

The tag

I use a post tag, dancast, to represent posts with accompanying podcast content4. This way, I can add all the podcast-specific metadata only if the user requests the feed of that tag, and leave my regular feeds untampered . This means that you don’t get the podcast enclosures in the regular subscription; that might not be what everybody would want, but it suits me to serve podcasts only to people who explicitly ask for them.

It also means that I’m able to use a template, tag-dancast.php, in my theme to generate a customised page for listing podcast episodes.

The feed

Okay, onto the code (which I’ve open-sourced over here). I’ve use a series of standard WordPress hooks to add the functionality I need. The important bits are:

  1. rss2_item – to add the <enclosure>, <itunes:duration>, <itunes:image>, and <itunes:explicit> elements to the feed, when requesting a feed with my nominated tag. Only <enclosure> is strictly required, but appeasing Apple Podcasts is worthwhile too. These are lifted directly from the post metadata.
  2. the_excerpt_rss – I have another piece of post metadata in which I can add a description of the podcast (in practice, a list of chapter times); this hook swaps out the existing excerpt for my custom one in podcast feeds.
  3. rss_enclosure – some podcast syndication platforms and players can’t cope with RSS feeds in which an item has multiple enclosures, so as a safety precaution I strip out any enclosures that WordPress has already added (e.g. the featured image).
  4. the_content_feed – my RSS feed usually contains the full text of every post, because I don’t like feeds that try to force you to go to the original web page5 and I don’t want to impose that on others. But for the podcast feed, the text content of the post is somewhat redundant so I drop it.
  5. rss2_ns – of critical importance of course is adding the relevant namespaces to your XML declaration. I use the itunes namespace, which provides the widest compatibility for specifying metadata, but I also use the newer podcast namespace, which has growing compatibility and provides some modern features, most of which I don’t use except specifying a license. There’s no harm in supporting both.
  6. rss2_head – here’s where I put in the metadata for the podcast as a whole: license, category, type, and so on. Some of these fields are effectively essential for best support.

You’re welcome, of course, to lift any of all of the code for your own purposes. WordPress makes a perfectly reasonable platform for podcasting-alongside-blogging, in my experience.

What?

Finally, there’s the question of what to podcast about.

My intention is to use podcasting as an alternative medium to my traditional blog posts. But not every blog post is suitable for conversion into a podcast! Ones that rely on images (like my post about dithering) aren’t a great choice. Ones that have lots of code that you might like to copy-and-paste are especially unsuitable.

Dan, a microphone in front of him, smiles at the camera.
You’re listening to Radio Dan. 100% Dan, 100% of the time.(Also I suppose you might be able to hear my dog snoring in the background…)

Also: sometimes I just can’t be bothered. It’s already some level of effort to write a blog post; it’s like an extra 25% effort on top of that to record, edit, and upload a podcast version of it.

That’s not nothing, so I’ve tended to reserve podcasts for blog posts that I think have a sort-of eccentric “general interest” vibe to them. When I learn something new and feel the need to write a thousand words about it… that’s the kind of content that makes it into a podcast episode.

Which is why I’ve been calling the endeavour “a podcast nobody asked for, about things only Dan Q cares about”. I’m capable of getting nerdsniped easily and can quickly find my way down a rabbit hole of learning. My podcast is, I guess, just a way of sharing my passion for trivial deep dives with the rest of the world.

My episodes are probably shorter than most podcasts: my longest so far is around fifteen minutes, but my shortest is only two and a half minutes and most are about seven. They’re meant to be a bite-size alternative to reading a post for people who prefer to put things in their ears than into their eyes.

Anyway: if you’re not listening already, you can subscribe from here or in your favourite podcasting app. Or you can just follow my blog as normal and look for a streamable copy of podcasts at the top of selected posts (like this one!).

Footnotes

1 I’ve also retroactively recorded a few older ones. Have a look/listen!

2 As well as Web-based non-textual content like audio (podcasts) and video (vlogs), my blog is wholly or partially available over a variety of more-exotic protocols: did you find me yet on Gemini (gemini://danq.me/), Spartan (spartan://danq.me/), Gopher (gopher://danq.me/), and even Finger (finger://danq.me/, or run e.g. finger blog@danq.me from your command line)? Most of these are powered by my very own tool CapsulePress, and I’m itching to try a few more… how about a WordPress blog that’s accessible over FTP, NNTP, or DNS? I’m not even kidding when I say I’ve got ideas for these…

3 Nowadays, we have specialised media decoder co-processors which reduce the size of media files. But more-importantly, today’s high-speed always-on Internet connections mean that you probably rarely need to make a conscious choice between streaming or downloading.

4 I actually intended to change the tag to podcast when I went-live, but then I forgot, and now I can’t be bothered to change it. It’s only for my convenience, after all!

5 I’m very grateful that my favourite feed reader makes it possible to, for example, use a CSS selector to specify the page content it should pre-download for you! It means I get to spend more time in my feed reader.

× × ×

Where Should Visual Programming Go?

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Level 3: Diagrams are code

This is what the endgame should be IMO. Some things are better represented as text. Some are best understood visually. We should mix and match what works best on a case-by-case basis. Don’t try to visualize simple code. Don’t try to write code where a diagram is better.

One of the attempts was Luna. They tried dual representation: everything is code and diagram at the same time, and you can switch between the two:

But this way, you are not only getting benefits of both ways, you are also constrained by both text and visual media at the same time. You can’t do stuff that’s hard to visualize (loops, recursions, abstractions) AND you can’t do stuff that’s hard to code.

Interesting thoughts from Niki (and from Sebastian Bensusan) on how diagrams and code might someday be intertwined as first class citizens (but not in the gross ways you might have come across in the past when people have tried to sell you on “visual programming”).

As Niki wrote about what he calls levels 2 and 3 of the concept – in which diagrams and code are intrinsically linked I found myself thinking about Twine, a programming language (or framework? or tool?… not sure how best to describe or define it!) intended for making interactive “choose your own adventure”-style hypertext fiction.

Screenshot showing the Twine 2 IDE, with a story map alongside a scene description.

Twine’s sort-of a level 2 implementation of visual programming: the code (scene descriptions) is mostly what’s responsible for feeding the diagram. But that’s not entirely true: it’s possible to create new nodes in your story graph in a completely visual way, and then dip into them to edit their contents and imply how they link to others.

It’s possible that the IF engine community – who are working to lower the barriers to programming in order to improve accessibility to people who are fiction authors first, developers second – are ahead of the curve in the area of visual programming. Consider for example how Inform’s automated test framework graphs the permutations you (or your human testers) try, and allow you to “bless” (turn into assertions) the results so that regression testing becomes visually automated affair:

Inform 7's IDE, showing regression testing using the visual tree of the sample game Onyx.

× × ×

The Elegance of the ASCII Table

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

If you’ve been a programmer or programming-adjacent nerd1 for a while, you’ll have doubtless come across an ASCII table.

An ASCII table is useful. But did you know it’s also beautiful and elegant.

Frames from the scene in The Martian where Mark Watney discovers Beth Johanssen's ASCII table.
Even non-programmer-adjacent nerds may have a cultural awareness of ASCII thanks to books and films like The Martian2.
ASCII‘s still very-much around; even if you’re transmitting modern Unicode3 the most-popular encoding format UTF-8 is specifically-designed to be backwards-compatible with ASCII! If you decoded this page as ASCII you’d get the gist of it… so long as you ignored the garbage characters at the end of this sentence! 😁

History

ASCII was initially standardised in X3.4-1963 (which just rolls off the tongue, doesn’t it?) which assigned meanings to 100 of the potential 128 codepoints presented by a 7-bit4 binary representation: that is, binary values 0000000 through 1111111:

Scan of a X3.4-1963 ASCII table.
Notably absent characters in this first implementation include… the entire lowercase alphabet! There’s also a few quirks that modern ASCII fans might spot, like the curious “up” and “left” arrows at the bottom of column 101____ and the ACK and ESC control codes in column 111____.

If you’ve already guessed where I’m going with this, you might be interested to look at the X3.4-1963 table and see that yes, many of the same elegant design choices I’ll be talking about later already existed back in 1963. That’s really cool!

Table

In case you’re not yet intimately familiar with it, let’s take a look at an ASCII table. I’ve colour-coded some of the bits I think are most-beautiful:

ASCII table with Decimal, Hex, and Character columns.That table only shows decimal and hexadecimal values for each character, but we’re going to need some binary too, to really appreciate some of the things that make ASCII sublime and clever.

Control codes

The first 32 “characters” (and, arguably, the final one) aren’t things that you can see, but commands sent between machines to provide additional instructions. You might be familiar with carriage return (0D) and line feed (0A) which mean “go back to the beginning of this line” and “advance to the next line”, respectively5. Many of the others don’t see widespread use any more – they were designed for very different kinds of computer systems than we routinely use today – but they’re all still there.

32 is a power of two, which means that you’d rightly expect these control codes to mathematically share a particular “pattern” in their binary representation with one another, distinct from the rest of the table. And they do! All of the control codes follow the pattern 00_____: that is, they begin with two zeroes. So when you’re reading 7-bit ASCII6, if it starts with 00, it’s a non-printing character. Otherwise it’s a printing character.

Not only does this pattern make it easy for humans to read (and, with it, makes the code less-arbitrary and more-beautiful); it also helps if you’re an ancient slow computer system comparing one bit of information at a time. In this case, you can use a decision tree to make shortcuts.

Two rolls of punched paper tape.
That there’s one exception in the control codes: DEL is the last character in the table, represented by the binary number 1111111. This is a historical throwback to paper tape, where the keyboard would punch some permutation of seven holes to represent the ones and zeros of each character. You can’t delete holes once they’ve been punched, so the only way to mark a character as invalid was to rewind the tape and punch out all the holes in that position: i.e. all 1s.

Space

The first printing character is space; it’s an invisible character, but it’s still one that has meaning to humans, so it’s not a control character (this sounds obvious today, but it was actually the source of some semantic argument when the ASCII standard was first being discussed).

Putting it numerically before any other printing character was a very carefully-considered and deliberate choice. The reason: sorting. For a computer to sort a list (of files, strings, or whatever) it’s easiest if it can do so numerically, using the same character conversion table as it uses for all other purposes7. The space character must naturally come before other characters, or else John Smith won’t appear before Johnny Five in a computer-sorted list as you’d expect him to.

Being the first printing character, space also enjoys a beautiful and memorable binary representation that a human can easily recognise: 0100000.

Numbers

The position of the Arabic numbers 0-9 is no coincidence, either. Their position means that they start with zero at the nice round binary value 0110000 (and similarly round hex value 30) and continue sequentially, giving:

Binary Hex Decimal digit (character)
011 0000 30 0
011 0001 31 1
011 0010 32 2
011 0011 33 3
011 0100 34 4
011 0101 35 5
011 0110 36 6
011 0111 37 7
011 1000 38 8
011 1001 39 9

The last four digits of the binary are a representation of the value of the decimal digit depicted. And the last digit of the hexadecimal representation is the decimal digit. That’s just brilliant!

If you’re using this post as a way to teach yourself to “read” binary-formatted ASCII in your head, the rule to take away here is: if it begins 011, treat the remainder as a binary representation of an actual number. You’ll probably be right: if the number you get is above 9, it’s probably some kind of punctuation instead.

Shifted Numbers

Subtract 0010000 from each of the numbers and you get the shifted numbers. The first one’s occupied by the space character already, which is a shame, but for the rest of them, the characters are what you get if you press the shift key and that number key at the same time.

“No it’s not!” I hear you cry. Okay, you’re probably right. I’m using a 105-key ISO/UK QWERTY keyboard and… only four of the nine digits 1-9 have their shifted variants properly represented in ASCII.

That, I’m afraid, is because ASCII was based not on modern computer keyboards but on the shifted positions of a Remington No. 2 mechanical typewriter – whose shifted layout was the closest compromise we could find as a standard at the time, I imagine. But hey, you got to learn something about typewriters today, if that’s any consolation.

A Remington Portable No. 3 typewriter.
Bonus fun fact: early mechanical typewriters omitted a number 1: it was expected that you’d use the letter I. That’s fine for printed work, but not much help for computer-readable data.

Letters

Like the numbers, the letters get a pattern. After the @-symbol at 1000000, the uppercase letters all begin 10, followed by the binary representation of their position in the alphabet. 1 = A = 1000001, 2 = B = 1000010, and so on up to 26 = Z = 1011010. If you can learn the numbers of the positions of the letters in the alphabet, and you can count in binary, you now know enough to be able to read any ASCII uppercase letter that’s been encoded as binary8.

And once you know the uppercase letters, the lowercase ones are easy too. Their position in the table means that they’re all exactly 0100000 higher than the uppercase variants; i.e. all the lowercase letters begin 11! 1 = a = 1100001, 2 = b = 1100010, and 26 = z = 1111010.

If you’re wondering why the uppercase letters come first, the answer again is sorting: also the fact that the first implementation of ASCII, which we saw above, was put together before it was certain that computer systems would need separate character codes for upper and lowercase letters (you could conceive of an alternative implementation that instead sent control codes to instruct the recipient to switch case, for example). Given the ways in which the technology is now used, I’m glad they eventually made the decision they did.

Beauty

There’s a strange and subtle charm to ASCII. Given that we all use it (or things derived from it) literally all the time in our modern lives and our everyday devices, it’s easy to think of it as just some arbitrary encoding.

But the choices made in deciding what streams of ones and zeroes would represent which characters expose a refined logic. It’s aesthetically pleasing, and littered with historical artefacts that teach us a hidden history of computing. And it’s built atop patterns that are sufficiently sophisticated to facilitate powerful processing while being coherent enough for a human to memorise, learn, and understand.

Footnotes

1 Programming-adjacent? Yeah. For example, geocachers who’ve ever had to decode a puzzle-geocache where the coordinates were presented in binary (by which I mean: a binary representation of ASCII) are “programming-adjacent nerds” for the purposes of this discussion.

2 In both the book and the film, Mark Watney divides a circle around the recovered Pathfinder lander into segments corresponding to hexadecimal digits 0 through F to allow the rotation of its camera (by operators on Earth) to transmit pairs of 4-bit words. Two 4-bit words makes an 8-bit byte that he can decode as ASCII, thereby effecting a means to re-establish communication with Earth.

3 Y’know, so that you can type all those emoji you love so much.

4 ASCII is often thought of as an 8-bit code, but it’s not: it’s 7-bit. That’s why virtually every ASCII message you see starts every octet with a zero. 8-bits is a convenient number for transmission purposes (thanks mostly to being a power of two), but early 8-bit systems would be far more-likely to use the 8th bit as a parity check, to help detect transmission errors. Of course, there’s also nothing to say you can’t just transmit a stream of 7-bit characters back to back!

5 Back when data was sent to teletype printers these two characters had a distinct different meaning, and sometimes they were so slow at returning their heads to the left-hand-side of the paper that you’d also need to send a few null bytes e.g. 0D 0A 00 00 00 00 to make sure that the print head had gotten settled into the right place before you sent more data: printers didn’t have memory buffers at this point! For compatibility with teletypes, early minicomputers followed the same carriage return plus line feed convention, even when outputting text to screens. Then to maintain backwards compatibility with those systems, the next generation of computers would also use both a carriage return and a line feed character to mean “next line”. And so, in the modern day, many computer systems (including Windows most of the time, and many Internet protocols) still continue to use the combination of a carriage return and a line feed character every time they want to say “next line”; a redundancy build for a chain of backwards-compatibility that ceased to be relevant decades ago but which remains with us forever as part of our digital heritage.

6 Got 8 binary digits in front of you? The first digit is probably zero. Drop it. Now you’ve got 7-bit ASCII. Sorted.

7 I’m hugely grateful to section 13.8 of Coded Character Sets, History and Development by Charles E. Mackenzie (1980), the entire text of which is available freely online, for helping me to understand the importance of the position of the space character within the ASCII character set. While most of what I’ve written in this blog post were things I already knew, I’d never fully grasped its significance of the space character’s location until today!

8 I’m sure you know this already, but in case you’re one of today’s lucky 10,000 to discover that the reason we call the majuscule and minuscule letters “uppercase” and “lowercase”, respectively, dates to 19th century printing, when moveable type would be stored in a box (a “type case”) corresponding to its character type. The “upper” case was where the capital letters would typically be stored.

× × × × ×

Building a Secret Cabinet

We’ve recently had the attics of our house converted, and I moved my bedroom up to one of the newly-constructed rooms.

To make the space my own, I did a little light carpentry up there: starting with a necessary reshaping of the doors, then moving on to shelving and eventually… a secret cabinet!

I’d love to tell you about how I built it: but first, a disclaimer! I am a software engineer, and with good reason. Letting me near a soldering iron is ill-advised. Letting me use a table saw is tempting fate.

Letting me teach you anything about how you should use a soldering iron or a table saw is, frankly, asking for trouble.

A spirit level on an unfinished shelf, under a window and beneath an uncarpeted floor.
Knowing that I’d been short on shelf space in my old bedroom, I started work on fitting shelves for my new bedroom before the carpet had even arrived.

Building a secret cabinet wasn’t part of my plan, but came about naturally after I got started. I’d bought a stack of pine planks and – making use of Ruth’s table saw – cut them to squarely fit beneath each of the two dormer windows1. While sanding and oiling the wood I realised that I had quite a selection of similarly-sized offcuts and found myself wondering if I could find a use for them.

Dan drinks a 0% alcohol beer in front of three upright planks of wood.
The hardest part of sanding and oiling wood on the hottest day of the year is all the beer breaks you have to take. Such a drag.

I figured I had enough lumber to insert a small cabinet into one of the bookshelves, and that got me thinking… what about if it were a secret cabinet, disguised as books unless you knew where to look. Or to go one step further: what if it had some kind of electronic locking mechanism that could be triggered from somewhere else in the room2.

Magpie decal 'perched' on a light switch.
There are other ways in which I’ve made my new room distinctly-“mine” – like the pair of magpies – but probably the secret cabinet is the most-distinctive.

Not wanting to destroy a stack of real books, which is the traditional way to get a collection of book spines for the purpose of decorating a “fake bookshelf” panel3, I looked online and discovered the company that made the fake book spines used at the shop of my former employer. They looked ideal: carefully shaped and painted panels with either an old-school or contemporary look.

Buuut, they don’t seem to be well-equipped for short runs and are doubtless pricey, so I looked elsewhere and found the eBay presence of Beatty Lockey Antiques in Loewstof. They’d acquired a stack of them second-hand from the set of Netflix’s The School for Good and Evil.4

(By the way: at time of writing they’ve still got a few panels left, if you want to make your own…)

I absolutely must sing the praises of Brad at Beatty Lockey Antiques who, after the first delivery of fake book fronts was partially-damaged in transit, was super quick about helping me find the closest-available equivalent (I’d already measured-up based on the one I’d thought I was getting) and sent a replacement.

The cabinet is just a few bits of wood glued together and reinforced with L-shaped corner braces, with a trio of thin strips – made from leftover architrave board – attached using small brass hinges. The fake book fronts are stuck to the strips using double-sided mounting tape left over from installing a bathroom mirror. A simple magnetic clasp holds the door shut when pushed closed5, and the hinges are inclined to “want” the door to stand half-open, which means it only needs a gentle push away from the magnetic catch to swing it open.

Circuit diagram showing a Raspberry Pi Zero W connected to two relays, each connecting 12V DC to a latch solenoid.
The wiring is uncomplicated enough that even I – a self-confessed software engineer – could manage it. Note the separate power supply: those solenoids can draw a full 1 amp in a “surge” that’s enough to give a little Raspberry Pi Zero a Bad Day if you try to power it directly from the computer (there might be some capacitor-based black magic that I don’t understand that could have made this easier, I suppose)!

I mounted a Raspberry Pi Zero W into a rear corner inside the cabinet6, and wired it up via a relay to what was sold to me as a “large push-pull solenoid”, then began experimenting with the position in which I’d need to mount it to allow it to “kick” open the door, against the force of the magnetic clasp7.

This was, amazingly, the hardest part of the whole project! Putting the solenoid too close to the door didn’t work: it couldn’t “push” it from a standing start. Too far away, and the natural give of the door took the strain without pushing it open. Just the right distance, and the latch had picked up enough momentum that its weight “kicked” the door away from the magnet and followed-through to ensure that it kept moving.

A second solenoid, mounted inside the top of the cabinet, slides into the “loop” part of a large bolt fitting, allowing the cabinet to be electronically “locked”.

A Raspberry Pi Zero, relay, and solenoid assembly on the bottom outside edge of the inside of the cabinet.
I seriously must’ve spent about an hour getting the position of that little “kicker” in the bottom right just right.

Next up came the software. I started with a very simple Python program8 that would run a webserver and, on particular requests, open the lock solenoid and push with the “kicker” solenoid.

#!/usr/bin/python
#
# a basic sample implementation of a web interface for a secret cabinet
#
# setup:
#   sudo apt install -y python3-flask
#   wget https://github.com/sbcshop/Zero-Relay/blob/master/pizero_2relay.py
#
# running:
#   sudo flask --app web run --host=0.0.0.0 --port 80

from flask import Flask, redirect, url_for
import pizero_2relay as pizero
from time import sleep

# set up pizero_2relay with the two relays attached to this Pi Zero:
r1 = pizero.relay("R1") # The "kicker" relay
r2 = pizero.relay("R2") # The "locking bolt" relay

app = Flask(__name__)

# GET / - nothing here
@app.route("/")
def index():
  return "Nothing to see here."

# GET /relay - show a page with "open" and "lock" links
@app.route("/relay")
def relay():
  return "<html><head><meta name='viewport' content='width=device-width, initial-scale=1'></head><body><ul><li><a href='/relay/open'>Open</a></li><li><a href='/relay/lock'>Lock</a></li></ul>"

# GET /relay/open - open the secret cabinet then return to /relay
# This ought to be a POST request in your implementation, and you probably
# want to add some security e.g. a 
@app.route("/relay/open")
def open():
  # Retract the lock:
  r2.off()
  sleep(0.5)
  # Fire the kicker twice:
  r1.on()
  sleep(0.25)
  r1.off()
  sleep(0.25)
  r1.on()
  sleep(0.25)
  r1.off()
  # Redirect back:
  return redirect(url_for('relay'))

@app.route("/relay/lock")
def lock():
  # Engage the lock:
  r2.on()
  return redirect(url_for('relay'))
Don’t use this code as-is on any kind of open network, obviously. Follow the comments for some tips on what you’ll need to change.

Once I had something I could trigger from a web browser or with curl, I could start experimenting with trigger mechanisms. I had a few ideas (and prototyped a couple of them), including:

  • A mercury tilt switch behind a different book, so you pull it to release the cabinet in the style of a classic movie secret door.
  • A microphone that listens for a specific pattern of knocks on a nearby surface.
I had far too much fun playing about with crappy prototypes.
  • An RFID reader mounted underneath another surface, and a tag on the underside of an ornament: moving the ornament to the “right” place on the surface triggers the cabinet (elsewhere in the room).
  • The current design, shown in the video above, where a code9 is transmitted to the cabinet for verification.

I think I’m happy with what I’ve got going on with it now. And it’s been a good opportunity to improve my carpentry, electronics, and Python.

Footnotes

1 The two dormer windows, wouldn’t you guarantee it, were significantly different widths despite each housing a window of the same width. Such are the quirks of extending a building that the previous occupier had previously half-heartedly tried to extend, I guess.

2 Why yes, I am a big fan of escape rooms. Why do you ask?

3 For one thing, I live with JTA, and I’m confident that he’d somehow be able to hear the silent screams of whatever trashy novels I opted to sacrifice for the good of the project.

4 As a bonus, my 10-year-old is a big fan of the book series that inspired the film (and a more-muted fan of the film itself) and she was ever-so excited at my project using real-life parts of the set of the movie… that she’s asked me to make a similar secret cabinet for her, when we get around to redecorating her room later in the year!

5 If I did it again, I might consider using a low-powered electromagnetic lock to hold the door shut. In this design, I used a permanent magnet and a pair of latch solenoids: one to operate a bolt, the second to “kick” the door open against the pull of the magnet, and… it feels a little clumsier than a magnetic lock might’ve.

6 That double-sided mounting tape really came in handy for this project!

7 Props to vlogger Technology Connections, one of whose excellent videos on the functionality of 1970s pinball tables – maybe this one? – taught me what a latch solenoid was in the first place, last year, which probably saved me from the embarrassment of trying to do this kind of thing with, I don’t know, a stepper motor or something.

8 I’m not a big fan of Python normally, but the people who made my relays had some up with a convenience library for them that was written in it, so I figured it would do.

9 Obviously the code isn’t A-B; I changed it temporarily for the video.

× × × × ×

Reversing Sandwiches

Fixing Sandwiches

RotatingSandwiches.com is a website showcasing animated GIF files of rotating sandwiches1. But it’s got a problem: 2 of the 51 sandwiches rotate the wrong way. So I’ve fixed it:

The Eggplant Parm Sub is one of two sandwiches whose rotation doesn’t match the rest.

My fix is available as a userscript on GreasyFork, so you can use your favourite userscript manager2 to install it and the rotation will be fixed for you too. Here’s the code (it’s pretty simple):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
    // ==UserScript==
    // @name        Standardise sandwich rotation on rotatingsandwiches.com
    // @namespace   rotatingsandwiches.com.danq.me
    // @match       https://rotatingsandwiches.com/*
    // @grant       GM_addStyle
    // @version     1.0
    // @author      Dan Q <https://danq.me/>
    // @license     The Unlicense / Public Domain
    // @description Some sandwiches on rotatingsandwiches.com rotate in the opposite direction to the majority. 😡 Let's fix that.
    // ==/UserScript==
     
    GM_addStyle('.q23-image-216, .q23-image-217 { transform: scaleX(-1); }');

Unless you’re especially agitated by irregular sandwich rotation, this is perhaps the most-pointless userscript ever created. So why did I go to the trouble?

Fixing Websites

Obviously, I’m telling you this as a vehicle to talk about userscripts in general and why you should be using them.

Userscripts fix and improve websites for you. Whether you want to force Reddit to stay “old Reddit” as long as possible,  make Geocaching.com’s maps more-powerful, show Twitter images uncropped by default, re-add the “cached” link to Google search results, show prices on shopping sites in terms of hours-of-your-life of work they “cost”, or just automatically click Netflix’s “yes, I’m still here, keep playing” button for maximum binge-mode, there’s a script for you. They’re like tiny browser plugins.

Screenshot from Pinterest showing many kittens, not logged-in.
Want to get endless-scroll in Pinterest without getting nagged to make a Pinterest account? There’s a userscript for that.

But the real magic is being able to remix the web your way. With just a little bit of CSS or JavaScript experience you can stop complaining that a website’s design has changed in some way you don’t like or that some functionality you use isn’t as powerful or convenient as you’d like and you can fix it.

A website I used disables scrolling until all their (tracking, advertising, etc.) JavaScript loads, and my privacy blocker blocks those files: I could cave and disable my browser’s privacy tools… but it was almost as fast to add setInterval(()=>document.body.style.overflow='', 200); to a userscript and now it’s fixed.

Don’t want a Sports section on your BBC News homepage (not just the RSS feed!)? document.querySelector('a[href="/sport"]').closest('main > div').remove(). Sorted.

I’m a huge fan of building your own tools to “scratch your own itch”. Userscripts are a highly accessible introduction to doing so that even beginner programmers can get on board with and start getting value from. More-advanced scripts can do immensely clever and powerful things, but even if you just use them to apply a few light CSS touches to your favourite websites, that’s still a win.

Footnotes

1 Remember when a website’s domain name used to be connected to what it was for? RotatingSandwiches.com does.

2 I favour ViolentMonkey.

×

My Geo*ing Limits

I thought it might be fun to try to map the limits of my geocaching/geohashing. That is, to draw the smallest possible convex polygon that surrounds all of the geocaches I’ve found and geohashpoints I’ve successfully visited.

Mathematically, such a shape is a convex hull – the smallest polygon encircling a set of points without concavity. Here’s how I made it:

1. Extract all the longitude/latitude pairs for every successful geocaching find and geohashpoint expedition. I keep them in my blog database, so I was able to use some SQL to fetch them:

SELECT DISTINCT coord_lon.meta_value lon, coord_lat.meta_value lat
FROM wp_posts
LEFT JOIN wp_postmeta expedition_result ON wp_posts.ID = expedition_result.post_id AND expedition_result.meta_key = 'checkin_type'
LEFT JOIN wp_postmeta coord_lat ON wp_posts.ID = coord_lat.post_id AND coord_lat.meta_key = 'checkin_latitude'
LEFT JOIN wp_postmeta coord_lon ON wp_posts.ID = coord_lon.post_id AND coord_lon.meta_key = 'checkin_longitude'
LEFT JOIN wp_term_relationships ON wp_posts.ID = wp_term_relationships.object_id
LEFT JOIN wp_term_taxonomy ON wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id
LEFT JOIN wp_terms ON wp_term_taxonomy.term_id = wp_terms.term_id
WHERE wp_posts.post_type = 'post' AND wp_posts.post_status = 'publish'
AND wp_term_taxonomy.taxonomy = 'kind'
AND wp_terms.slug = 'checkin'
AND expedition_result.meta_value IN ('Found it', 'found', 'coordinates reached', 'Attended');

2. Next, I determine the convex hull of these points. There are an interesting variety of algorithms for this so I adapted the Monotone Chain approach (there are convenient implementations in many languages). The algorithm seems pretty efficient, although that doesn’t matter much to me because I’m caching the results for a fortnight.

I watched way too many animations of different convex hull algorithms before selecting this one… pretty-much arbitrarily.

3. Finally, I push the hull coordinates into Geoapify, who provide mapping services to me. My full source code is available.

An up-to-date (well, no-more than two weeks outdated) version of the map appears on my geo* stats page. I don’t often get to go caching/hashing outside the bounds already-depicted, but I’m excited to try to find opportunities to push the boundaries outwards as I continue to explore the world!

(I could, I suppose, try to draw a second larger area of places I’ve visited: the difference between the smaller and larger areas would represent all of the opportunities I’d missed to find a hashpoint!)

× ×

Quickly Solving JigsawExplorer Puzzles

Background

I was contacted this week by a geocacher called Dominik who, like me, loves geocaching…. but hates it when the coordinates for a cache are hidden behind a virtual jigsaw puzzle.

A popular online jigsaw tool used by lazy geocache owners is Jigidi: I’ve come up with several techniques for bypassing their puzzles or at least making them easier.

Dominik had been looking at a geocache hidden last week in Eastern France and had discovered that it used JigsawExplorer, not Jigidi, to conceal the coordinates. Let’s take a look…

Unsolved approx. 1000 piece jigsaw puzzle.
Not just any puzzle; the geocache used an ~1000 piece puzzle! Ugh!

I experimented with a few ways to work-around the jigsaw, e.g. dramatically increasing the “snap range” so dragging a piece any distance would result in it jumping to a neighbour, and extracting original image URLs from localStorage. All were good, but none were perfect.

For a while, making pieces “snap” at any range seemed to be the best hacky workaround.

Then I realised that – unlike Jigidi, where there can be a congratulatory “completion message” (with e.g. geocache coordinates in) – in JigsawExplorer the prize is seeing the completed jigsaw.

Dialog box reading "This puzzle's box top preview is disabled for added challenge."
You can click a button to see the “box” of a jigsaw, but this can be disabled by the image uploader.

Let’s work on attacking that bit of functionality. After all: if we can bypass the “added challenge” we’ll be able to see the finished jigsaw and, therefore, the geocache coordinates. Like this:

Hackaround

Here’s how it’s done. Or keep reading if you just want to follow the instructions!
  1. Open a jigsaw and try the “box cover” button at the top. If you get the message “This puzzle’s box top preview is disabled for added challenge.”, carry on.
  2. Open your browser’s debug tools (F12) and navigate to the Sources tab.
  3. Find the jigex-prog.js file. Right-click and select Override Content (or Add Script Override).
  4. In the overridden version of the file, search for the string – e&&e.customMystery?tt.msgbox("This puzzle's box top preview is disabled for added challenge."): – this code checks if the puzzle has the “custom mystery” setting switched on and if so shows the message, otherwise (after the :) shows the box cover.
  5. Carefully delete that entire string. It’ll probably appear twice.
  6. Reload the page. Now the “box cover” button will work.

The moral, as always, might be: don’t put functionality into the client-side JavaScript if you don’t want the user to be able to bypass it.

Or maybe the moral is: if you’re going to make a puzzle geocache, put some work in and do something clever, original, and ideally with fieldwork rather than yet another low-effort “upload a picture and choose the highest number of jigsaw pieces to cut it into from the dropdown”.

× ×

The Stupidest CSS

Yesterday, I wrote the stupidest bit of CSS of my entire career.

Two new visual elements and one textual one will make it clear where a product’s placement in the marketplace is sponsored.

Owners of online shops powered by WooCommerce can optionally “connect” their stores back to Woo.com. This enables them to manage their subscriptions to any extensions they use to enhance their store1. They can also browse a marketplace of additional extensions they might like to consider, which is somewhat-tailored to them based on e.g. their geographical location2

In the future, we’ll be adding sponsored products to the marketplace listing, but we want to be transparent about it so yesterday I was working on some code that would determine from the appropriate API whether an extension was sponsored and then style it differently to make this clear. I took a look at the proposal from the designer attached to the project, which called for

  1. the word “Sponsored” to appear alongside the name of the extension’s developer,
  2. a stripe at the top in the brand colour of the extension, and
  3. a strange green blob alongside it

That third thing seemed like an odd choice, but I figured that probably I just didn’t have the design or marketing expertise to understand it, and I diligently wrote some appropriate code.3

I even attached to my PR a video demonstrating how my code reviewers could test it without spoofing actual sponsored extensions.

After some minor tweaks, my change was approved. The designer even swung by and gave it a thumbs-up. All I needed to do was wait for the automated end-to-end tests to complete, and I’d be able to add it to WooCommerce ready to be included in the next-but-one release. Nice.

In the meantime, I got started on my next bit of work. This one also included some design work by the same designer, and wouldn’t you know it… this one also had a little green blob on it?

I’m almost embarrassed to admit that my first thought was that this must be part of some wider design strategy to put little green blobs everywhere.

Then it hit me. The blobs weren’t part of the design at all, but the designer’s way of saying “look at this bit, it’s important!”. Whoops!

So I got to rush over to my (already-approved, somehow!) changeset and rip out the offending CSS: the stupidest bit of CSS of my entire career. Not bad code per se, but reasonable code resulting from a damn-stupid misinterpretation of a designer’s wishes. Brilliant.

Footnotes

1 WooCommerce extensions serve loads of different purposes, like handling bookings and reservations and integrating with parcel tracking services.

2 There’s no point us suggesting an extension that helps you calculate Royal Mail shipping rates if you’re not shipping from the UK, for example!

3 A fun side-effect of working on open-source software is that my silly mistake gets immortalised somewhere where you can go and see it any time you like!

× ×

BBC News… without the crap

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

Lately my BBC News feed has caused me some annoyance and frustration.

But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:

  1. The feed now re-publishes a story if it gets re-promoted to the front pagebut with a different <guid> (it appears to get a #0 after it when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, any by the time I get to reading them the same exact story might appear in my reader multiple times. Ugh.
  2. They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If you do, that’s fine, but I don’t, and I’d rather filter this content out.

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
rss = Nokogiri::XML(open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the <guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader oughta be smart enough to compensate for and ignore that: mine certainly is!)

You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

×