A rant on FOSS/linux accessibility

Let’s start this with some nerd cred so we get some proper context. I loves me some open source operating systems. I’ve been running linux (debian then slackware then rock linux then gentoo etc etc up through to arch) and freebsd for an eternity on my desktops and laptops and servers. Yes, I’ve installed slackware from floppies. I started off with windowmaker as my window manager, then e16 (mandrake got me a job and I’ve slept on his couch though we’ve not spoken recently), then wmii and i3 (I do love a good tiling window manager) all the way up to hyprland. I also have used xfce for years, mostly on my most recent giant monitors, and gnome when I was forced. So when I say I have experience with open source on the desktop, I have some history here.

I have also run Apple operating systems from the Macintosh SE up to darwin and the macbook pro in my lap right now. I’ve also been a microsoft fan-person (RIP win8 and windows phone).

So when the other nerds find me for saying what I’m about to say, all of this preamble exists so maybe at least one of them will shut the fuck up.

But let’s get down to brass tacks. In the last year I’ve had new and interesting disabilities show up in my life. For instance, I’ve spent the last week and will probably spend at least the next week not really being able to speak. (I can physically do the thing but it is a wildly bad idea to try.) I also have auditory processing issues that make video captions a necessity. That sort of thing.

For all of my difficulties that would affect my computer use, my Apple kit has had a solution (ignoring keyboards). I have earbuds from beats (aka apple) that work transparently with everything and help with noise cancellation and voice focus. If a video lacks captions, apple has a “live captions” app that will try to generate captions in real time. This includes humans speaking near me. Apple has “live speech” which is text to speech but it also attempts inflection and tone. On my phone, I press a button three times and it asks if I want live speech or captions and then does the thing I want. And it Just Works. On my macbook, I have two menu bar buttons. And they Just Work. In both cases, it took zero time to set up and start seeing results. I did obviously spend time finding the proper voice for live speech (zoe “premium” edition) but I used the basic starter voice for a day or so.

To provide way too much personal information, I set up live speech on my iphone while 50% sedated, crying, and bleeding so I could communicate with my partner. The most lucid memory I have from that car ride was the elation of being able to communicate properly with the people around me, to get my immediate needs met, without having to use my face.

I pulled my linux laptop off the basement couch table a moment ago and it occurred to me: can I trivially do either live captions or live speech on a 2024 linux distribution? TLDR: no, no I can’t. Even on a distro like ubuntu, which is about as corporate as it gets, these are not really options. I can get gnome, I think, to read some text in a flat, monotone, constant-speed computer voice with no inflection or life. I cannot get a live-captions type deal at all.

And even if I could. If it existed, it would require that I run gnome on ubuntu or something else very specific. It would only work in that one desktop environment where the one user with similar issues and lots of time sat down to make it work. I would not be able to have any choice in my operating environment because I want this one thing. And there’s little chance it would work on freebsd or a not-linux system. (Though I have great confidence the openbsd folks would help me make it work on openbsd.)

Look, I’m not even going to talk about bluetooth headphones, ok? Can we just take it as read that bluetooth is a shit show in open source operating systems?

Can we also skip the conversation about integration (or the total fucking lack thereof) between open source desktops and any mobile device made by anyone in 2024? Ok good.

Apple sucks. macOS is a boring “this would be an ios candy environment but we couldn’t find beige candy” sort of thing. I can rant for days on why macOS and I are not friends. But when I realized I had a new-to-me accessibility need like live captions or speech, they were waiting for me in an accessibility menu. And the options are plumbed throughout the system. Hell, I was watching tiktoks moments ago using live captions so I could watch videos from creators who can’t find that captioning button.

I don’t like running macOS. I want my tiling window managers and sparkly nerd toys. I want to have live video from a zoo as my frigging desktop “wallpaper”. I want the infinite customization options and custom personalized environments. I miss that a lot.

But I can’t. I just can’t. Because whenever I realize a new disability in myself or an accommodation that would make my life easier, usually that option is already built into my mac. I spent more time just now looking up text to speech options on Linux than it took me to find and optimize apple’s version and have a long conversation with my partner (who then also set up their tts and caption software [I think they went with Evan “premium”]).

I understand the why underlying this problem. Apple has more money than my little brain can properly understand. It has enough money in the bank in cash to prevent medical bankruptcy in the USA for all of time. And to house everyone. And to feed everyone….. sigh… anyway, when Apple wants to solve a problem like text to speech, they look in the couch cushions, find a spare hundred million dollars, and either write code or buy code to solve that problem. When they wanted a dozen or more high quality voices with inflection possibilities, they bought those voices with the change they found in the lint catcher in the dryer.

The FOSS desktop world does not and will not have access to money like that. We cannot buy our way out of the problem. (And, to be fair to all parties, Apple would leverage that free-money FOSS solution as the basis for the Apple thing. yay capitalism!) But no one at Canonical is seriously saying “let’s go spend a hundred million dollars on text to speech for Ubuntu desktop”.

We’re out here on our own, waiting for someone with similar needs but enough time, personal privilege, and mental fortitude to build a solution, publish it for free, and deal with the FOSS world’s super fun times to keep the project alive. Without access to that capital and those resources, FOSS will always be left behind in the human user scenarios.