Tagged “blog”

RSS feed added to blog

So thanks to the RSS Plugin for 11ty I now have these posts available via RSS! The feed is linked in the footer and is available at /feed.xml.

I intend to do the same thing to the reviews page just as soon as I finish up my headphone reviews page. Which I'll definitely do, one of these days....

24 Nov 2022 by cthos

blog updates

I've fallen down a super deep Fediverse rabbit hole. It feels kinda like for the first time I can play around with stuff and not have my Extra Life videos just living on Twitch or Youtube. As a result, I've put up my own PeerTube instance up on vid.cthos.dev and am slowly moving the Extra Life videos there. They're going to stay up on Youtube as well, but here's a much better place to move things. Going to see if I can also make copies of my old talks, but odds are good I don't have the rights to reproduce them. In any case, here's a copy of my Sucker for Love run!

25 Nov 2022 by cthos

blog updates video

Steam Deck vs Ayaneo 2

tl;dr - If you're most people, get a Steam Deck. If you've got similar use cases to me Ayaneo 2 might be for you.

I don't think I've ever posted about this before, but I wanted to take a bit of time to talk about one of my pretty-obvious hobbies. Despite having way too many side projects, and working too many hours in the week, I still like to try and find some time to play video games.

After working 50-60 hours a week, I like to unwind - but I don't really have a great place to play console games - my PS 5 is at the same desk where I work, and at the end of the day I really like to get away from all of that. I solved this problem with a gaming laptop (13" Razer Blade - love that thing), but it still required a fair amount of effort - keeping a controller near it, having a lap desk so it didn't burn my legs, etc.

Enter the Steam Deck. I preordered one the moment it became available (well, like an hour in because the servers were crashing) and then waited well over a year for it to arrive. But when it arrived.... oh boy was that thing awesome. The controller layout is top-notch, it's comfortable to hold (if a bit heavy), and since I was playing on the couch - the laughably short battery life wasn't an issue. It solved basically all of my problems... except for one major one.

You see, dear readers, every year I do Extra Life - and every year I need to stream for a full 24 hours. While concievably I could do this on the Steam Deck - the amount of CPU things like OBS take up would be pushing the Deck's hardware to the limit. That, coupled with the screen being a bit sub-par in terms of color saturation and resolution, I started to look for alternatives.

Specifically alternatives that support an eGPU.

Enter the Ayaneo 2. Now, for those of you who don't know about Ayaneo - they're a small Chinese company who (for a good while) had a reputation for the best support of the not-Valve handhelds (because no one is going to match Valve in this space - they're simply the largest player), with a reputation for quality. So, I figured I'd give the 2's Indiegogo campaign a go¹.

Long story short, they had a number of issues with that campaign and a lot of folks were having issues with either delivery, faulty hardware, support, or the like. I was not one of those people - I had zero issues with the 2 when it arrived. Let me tell you, I love this thing. The screen is beautiful, it's comfortable to hold (though the Deck has the superior inputs), and after a lot of tinkering and driver updates - I got it to work with my eGPU. Problem solved, I will be able to stream this year without resorting to hijacking what is now my Fiancee's gaming laptop and have a solution for when I need to play those games that have a bit higher graphical demands.

Deck Battle, iPad for scale

So, how do the two consoles stack up to each other?

Things the Deck does Better

Input: The back buttons are amazing and really really useful. Especially when you remap one of them to "right stick depress" - makes toggling run in a bunch of games much easier. Also, trackpads are exceptionally useful for certain games.
Comfort: It's just slightly nicer to hold for longer periods due to where the sticks are placed vs. the d-pad.
Software: The Software experience on SteamOS is really tight. It just works^tm. That cannot be said about the Win 11 experience on Ayaneo - and Ayaspace is semi-necessary for changing TDP and mouse support, but it's clunky.

Things the Ayaneo 2 does better

Screen: The screen is beautiful, and higher resolution. Even when dropping the resolution on AAA games to get better performance, it just looks better.
Performance: You can set the TDP on this thing fairly high, higher than the Deck, and at higher TDPs you get better performance for your trouble.
eGPU support: The Deck doesn't have a USB-4 port so it cannot do eGPUs even if it wanted to, so.... here we are.

All in all, they're both great hardware and the Ayaneo 2 is working great for me - but for most people, I think the Steam Deck is going to be the way to go -

1: Full disclosure, I'd already gotten an Air from them at this point, and enjoyed it. The air is my travel console.

22 Apr 2023 by cthos

blog video-games fun

Reworking the Blog

So! The time has come once again to rework the blog. I'll be doing a lot more writing in a couple of different places.

First off, for what I'll be working on professionally, head on over to this Cthonic Studios Post where I outline several of the TTRPG and Software projects I'll be doing in 2024 (I'm super excited about all of this).

Cthonicstudios.com is where I'll be doing all of my "professional" blogging, complete with musings on all the philosophy of TTRPG stuff I've been thinking about, as well as updates on the games I'll be making.

Alextheward.com by contrast will be the place where I do a lot of musing about tech, personal updates, and the like. I'll likely link back and forth from one site to another - but one thing's for sure, I'll be doing a lot more blogging on both sites in the very near future.

Likewise, I'll be setting up some mailing lists - if you'd like to get updates from me it's relatively easy to subscribe by creating an account on Cthonicstudios, but there will also be a separate mailing list for here.

Hope you all stay tuned for what's next, because I'm going to do my darndest to make 2024 a fun and transformative year.

P.S. I may decide to migrate this site over to either a different host (away from Netlify), and I may also move it from 11ty to Ghost (but I also love 11ty - but I enjoy the editorial experience of Ghost and the nice features it enables). So we'll see. For now, I rest.

17 Jan 2024 by cthos

blog misc

Handheld update time

I've made a new review about the Asus ROG Ally over in the reviews section, but due to an artifact of how I set up the RSS feed, you won't know about it unless I make a stub here. So! Here's the stub.

24 Jan 2024 by cthos

reviews gaming tech blog

Passkeys: your friendly password replacement

One of my absolute favorite topics to rant about continually is identity and access management (specifically authentication). One of my friends prompted me to write about this after needing some questions about passkeys answered.

Anyhow, one of the most thrilling innovations in identity managements in recent years that is finally seeing wider adoption is Passkeys. Put simply, Passkeys replace the need for passwords by shunting the authentication checks to a combination of something you have (a phone, or security key) with something you are (biometrics) to authenticate to a website or a service. No longer do you need to use a password manager to log in securely to a site, or remember a secret phrase, so long as you have your device and you remain you - you can access your content.

They’re unphishable, because you can’t just hand them over - you physically need to be in possession of the device. You can’t forget them. There’s nothing to remember. Stealing an entire database of public keys gets you basically nothing, you can’t really use those public keys to impersonate anyone - and because they’re website-specific, they’re immune to credential stuffing. They’re (usually) behind biometrics - so even if your device is stolen, the thief also must worry about faking your face or fingerprint. It makes the cost / time investment in stealing credentials much harder.

The way it does this is all behind-the-scenes and the end user doesn’t need to really understand how any of it works, but today we’re going to dive into that just a bit so you can peer behind the curtain and understand the limitations and risks of bad implementations.

So how does this passkey thing work anyhow?

Easy! Public Key Cryptography! … wait that’s not easy? Right, okay, let’s start from the beginning. Passkeys are based on the concept of public key cryptography, in which a system generates a pair of keys on a device - a public key and a private key. The public key can encrypt data so that only the private key can decrypt it, and vice versa. Because the private key never leaves the device, you can hand out the public key so that folks can encrypt data so that only you (the device that holds the private key) can decrypt it. You can also create signatures which simply prove possession of a key without encrypting the data.

Passkeys, using a protocol called WebAuthN, take advantage of this by exchanging public keys back and forth so that when you register / log in, that you have your private key becomes your authentication proof.

The registration flow for a “brand new account” essentially works like this (I’ll be leaving out a fair amount of detail for the sake of understanding - if you want the full picture, the WebAuthN spec, Google, and Apple all have wonderfully detailed guides for implementing Relying Parties).

You visit a website and click “register”.
The website prompts you for some sort of identifier (most commonly email address, but could be username or anything, really) - or generates one for you.
Your device prompts you to create a new passkey, and has you biometrically authenticate (face/touch ID, on iOS).
Your device generates a new public / private key pair specific to that website’s domain name and sends the public key to the website.
The website stores your public key along with your identifier for future use; you are now registered.

This flow can also work for an existing legacy username/password account to add a new passkey to that old account - the only difference is the website already knows what account to associate the passkey to.

For login, the operation is very similar, but there’s an additional challenge mechanism in place.

You visit the website, enter your username/email/whatever to identify which account to log into (the browser can also store this for you, so it knows you already).
The website issues a challenge back to your device, basically: “I need you to prove you have this private key by using it to sign back this data I’m sending you”.
Your device prompts you to use the stored passkey for this website and has you biometrically authenticate.
Your device creates an attestation - which is the response to that challenge, but signed using the private key.
The website uses the public key to check that the assertion was signed correctly. If it was, then you are now logged in! If it wasn’t, the site rejects your credentials.

If you’re interested in the technical details, but do not want to read the spec, I recommend this guide on WebAuthN.guide, or the Google implementation Sandbox.

How is this different than using FaceID in an App?

It’s really not all that different - many of the same mechanisms that Apple uses for in-app FaceID also apply to Passkeys, they’re built on very similar tech.

Same deal for Google’s biometric implementations, a lot of the same plumbing powers the overall ecosystem.

Cool, so uh, what happens if I drop my phone in the lake?

You may have noticed that there’s a potential problem if the identity is tied to a single device - how do you log in if you’ve lost the device? There are a couple of ways to handle this on both the device side and the provider / website side.

Apple and Google handle this by synching that private key to your cloud account on either platform - encrypted by the master password on your account. That way, you can use the same passkey on any device you own in the same ecosystem, protected by the same biometrics. For 95% of people, this is sufficient. Even if you lose your phone, so long as you can still get into Google / iCloud your credentials are safe. This means you still need one password - for your cloud account, but you can also protect that via something like a Yubikey stored in a lockbox or something.

On the provider side, they can offer you the ability to enroll multiple passkeys, so that you can keep one as backup on a physical security key. They can also offer robust backup options - currently this is typically 10 or so random “break glass” codes that you can print off and keep in a safe somewhere. They also still have the option of offering traditional SMS recovery, but your security is only as good as the weakest recovery method - and SMS is extremely insecure.

For those extremely paranoid, you can buy hardware authenticators and use those to generate passkeys, but you are responsible for backups at that point - you want to keep one somewhere safe in the event you drop your token in a gorge.

Is this really the future of authentication?

I think so. To put it extremely bluntly, passwords are awful. They can be stolen. You can forget them. We’ve been trying for years to get the less tech-savvy among us to adopt (and pay for!) password managers to create strong random passwords so that perhaps we can prevent credential stuffing. That hasn’t worked, because the barrier to entry is high and confusing.

The barrier to entry for passkeys is basically zero: “Do you want to let FaceID / TouchID manage this login for you?”. For most people, it will “just work”, and for the rest of us, our password managers or hardware tokens are for us to manage.

I don’t know if passwords will ever fully go away, but I sure hope that passkeys take out a huge chunk of them, and soon.

29 Jan 2024 by cthos

blog IAM authentication passkeys

Let's do some "Game Development"

Shutterstock vector art of some hands typing in front of a monitor showing an animation editor
_{Kit8.net @ Shutterstock #1498637465}

The other day, a friend asked me a complicated question wrapped in what seems like a simple question: “How long would it take to make some simple mini-games that you could slap on a web page?”

My answer, as many seasoned developers will be familiar with (and of course it’s coming out of the architect’s mouth) was “It depends on the game and the developer. I could probably churn out something pretty good in a few days, but it’d take someone more junior longer”.

This set off the part of my brain that really wants to test out just how fast I could do a simple game in engines that I’m not familiar with. Thus, I decided to ignore my other responsibilities and do that instead! Mostly kidding about the responsibilities thing, I’m ahead of my writing goals for Nix Noctis, so I had a couple of hours to spare (and a free evening).

The Challege

How long would it take to make a simple “Simon” game in GDevelop: a “no code” game engine? I wanted to start with GDevelop for a few reasons:

I wanted to simulate the experience someone with little coding experience would encounter. Knowing that I’ve internalized some concepts that would make the experience easier (or, in some cases, harder) for me.
You can run the editor in a web browser, and that’s bonkers.
I want to “pretend” to be a beginner; only use basic and easily searchable features.
1. I mean, I am a beginner at game dev, but I do understand several the concepts.

With those things in mind, I set out to remake Simon. Here were my design constraints:

Four Arrows, controllable by clicking them or via the keyboard.
Configurable number of “beeps” in the sequence
- The game would not increase by one each time (though it easily could)
Three Difficulty levels which control the speed that the sequence is shown
Bonus points for sound effects

Headlong into GDevelop

REMEMBER! I’m a beginner at GDevelop. You’re likely going to see something and say “hey that’s dumb, you should have done it another way”. Yes. Exactly.

The editor experience in GDevelop is actually really nice, especially when you can just get into it via clicking into a browser. I found adding elements to the page very intuitive. Sprites are simple, and adding animation states to them is effortless. Creating the overall UI took me probably 20 to 30 minutes to iteratively build out a structure I was happy with, it was fast. Another fun thing I discovered was that they have [JFXR] built into the editor, and that was a delight.

Screenshot of GDevelop's UI Editor

What was not so quick was wiring up the game logic to the elements on the page. I’ve looked at some GDevelop tutorials before, and if you’re treading a path that’s covered by one of their “Extensions”, you’re going to have a great time. A 2D platformer will be a breeze because you can simply attach a behavior to the sprites in your game and go. There are a bunch of tutorials on making a parallax background for really cool looking level design. Simple!

What is not so simple is if you fall outside those behaviors and need to start interacting with the “no code” editor. On one hand, the no code editor is nice! The events and conditionals are intuitive if you’re approaching it in certain ways. They even let you add JS directly if you know what you’re doing (though they recommend against it). On the other hand, I can see this getting quickly messy. In my limited experience with the engine, I could not find a good way to reuse code blocks. This will come up later.

Sidebar, dear reader, I believe this is where they would say “you should make a custom behavior to control this”. I’m not sure a beginner would think to do this, but I thought about it and said, “I’ll just duplicate the blocks”.

Screenshot of GDevelop's Code editor

As I worked through this process, I ran into a number of weird stumbling blocks that slowed down my progress while I tried different things.

Things that were surprisingly straightforward

Variables

GDevelop has many layers of variables, Global, Scene, Instance, and so forth. Easy to understand, fairly easy to access and edit.

Setting up the Simon Sequence

Were I making this game in pure JS, generating the sequence would be a pretty “simple” one-liner (It’s not that simple, but hey, spread operator + map is fun! I’d expand this in a real program to be easier to understand):

// Fill a new array of size number_of_beeps with a digit between 0 and 3 to represent arrow directions
let sequence = [...Array(number_of_beeps)].map(() => Math.floor(Math.random() * 4));

Generating the sequence was remarkably easy once I figured out how to do loops in GDevelop; they’re hidden in a right-click menu (or an interface item), but the “Repeat x Times” condition is precisely what we needed.

Screenshot of the Repeat x Times condition

Likewise, doing the animation of the arrows was pretty direct. All you need to do is change the animation of the arrow, use a “wait” command, and then turn it back. Easy!

Screenshot of GDevelop's editor

Turns out it’s not actually that easy. The engine (as near as I can tell) is using timeouts under the hood which means they’re not fully blocking execution of other tasks in sibling blocks while this is happening. Which means…

Wait for x Seconds is weird / doesn’t work right

Okay, so when you’re playing Simon, the device will beep at you, light up for a non-zero number of seconds, and then dim. It should do that for each light. If you’ve never experienced this wonder of my childhood, watch this explanatory video from a random store:

Now that we know how that works, we want to emulate that in GDevelop. The first thing I tried was to simply put the “Wait” block at the end of a For...In Loop. Yeah, remember what I said about timeouts? Those don’t block. The loop would just continue and totally ignore the wait. I think that’s a major pitfall for new devs, they’re not going to understand the nuance of how those wait commands function under the hood.

The second thing I tried is the “Repeat Every X Seconds” extension to do the same thing. I couldn’t get it to even fire, and I still don’t know why.

Anyhow, I settled on using a timer to do the dirty work. Here’s how our “Play the sequence loop” wound up looking at the end:

Screenshot of Play Sequence Loop

Conditionals / Keyboard Input Combined with Mouse Clicks

There’s another thing I could not figure out. I wanted to have both mouse clicks and keyboard input control the “guess” portion of the code, so I sensibly (imo) attempted to combine those into a single conditional. That wound up being…very weird, and there’s still a bug related to it.

First off, there are AND and OR conditionals. It took me a bit to find them, but they do exist. So, with a single OR conditional and a nested AND conditional, I set out with this:

Screenshot of the Nested Conditionals

This mostly works. However, for whatever reason, if you use the keyboard to input your guess, the arrow animation does not play. I do not know why. It works if you click. I can prove the conditional triggers in either case. It’s just that the animation does. Not. Work. Maybe one day I’ll figure it out, but I chose not to.

GDevelop Wrap Up

Struggling past some of those hurdles, it took me about 5 hours to meet my original design goals in GDevelop. Not terrible for not knowing anything about GDevelop besides that it exists.

Here’s a video of the final product:

Okay! We’re done here. Post over.

Nah, you all know me, I couldn’t stop there.

We're doing it again in Godot!

After I’d done the initial experiment, I was curious if I could work faster in a game engine that has real scripting support.

The short answer is “yes”. It took me about 2.5 hours to complete the same task in Godot (with some visual discrepancies). I think the primary speed gain was from the fact that the actual game logic was much more intuitive to me and the ability to wire up “signals” from one element to the main script code made it much faster to do some of the tasks I was fighting in GDevelop.

Godot also has an await keyword which blocks execution like you’d expect it to, which is outstanding.

I did run into one major issue that I had to do a fair amount of research to solve:

AnimatedSprite2D “click” tracking is surprisingly difficult

The only issue I had was that when I needed to determine if the user had clicked on an arrow, I had to jump through some interesting hoops to detect if the mouse was over the arrow’s bounding box.

While regular sprites have a helper function get_rect() which allow you to figure out its Rect2D dimensions, AnimatedSprite2Dvery much do not (you have to first dig into the animations property, and then grab the frame it’s currently on, and then you have to get its coordinates and make your own Rect2D. Gosh, I’d have loved a helper function there).

I think the expectation is you’d have a KinematicBody2D wrapping the element, but as the arrows are essentially UI, that didn’t make any sense to me. I’ll need to dig a bit further into how Godot expects you to build a UI to do all of that, but hey, I got it working relatively quickly.

Theming

Changing the text of everything in the scene was really bizarre due to how it’s abstracted via a “Theme” object that you attach to all the UI elements? Still haven’t quite figured that out. It was really easy in GDevelop. Not so much in Godot.

Godot Wrap Up

Yeah, so, I liked working in Godot more because it was easier to make the behaviors work, and I was getting exhausted at the clunkiness of the visual editor. Here’s the final product:

Conclusion

For me, working in both of these engines for fun was a positive experience and I can see myself using GDevelop for some quick prototyping, but personally, I like Godot’s approach to the actual scripting portions of the engine. Because I have a lot of software development experience, it’s much easier for me to just write a few lines of code over having to navigate the quirks of the interface.

I think GDevelop is perfectly serviceable, though. It looks like everything in the engine does have a JS equivalent, so you really could just write JS if you wanted to. If they exposed that more cleanly, I think it’d be pretty great for many 2D needs.

But I’m not a game dev, this is just me tinkering around and giving some impressions. Go try them out for yourself, they’re both easy to get started with!

21 Feb 2024 by cthos

development game-dev tech blog

Own Your Content

Shutterstock vector art of some computer magic
_{Andrey Suslov @ Shutterstock #1199480788}

We've got a problem decades in the making

Not too long ago Anil Dash wrote a piece for Rolling Stone titled “The Internet Is About To Get Weird Again” and it’s been living rent free in my mind ever since I read it. As the weeks drag on, more and more content is being slurped up by the big tech companies in their ever-growing thirst for human created and curated content to feed generative AI systems. Automattic recently announced that they’re entering deals to pump content from Tumblr and Wordpress.com to OpenAI. Reddit, too, has entered into a deal with Google. If you want a startling list, go have a look at Vox’s article on the subject. Almost every single one of these deals is “opt-out” rather than “opt-in” because they are counting on people to not opt-out, and they know that the percentage of users who would opt-in without some sort of compensation is minimal.

Lest you think this is a rant about feeding the AI hype machine, it’s not (though you may get one of those soon enough). This is more of a lament from the last several decades of big social medial companies first convincing us that they are the best way to reach and maintain an audience (by being the intermediary) and then taking the content that countless creators have written for them and then disconnecting the creator from their audience.

Every bit of content you’ve created on these platforms, whether it’s a comment or a blog post for your friends or audience is being monetized without offering you anything in return (except the privilege of feeding the social media company, I guess). Even worse, getting your stuff back out of some of these platforms is becoming increasingly difficult. I’ve seen many of my communities move entirely to Discord, using their forums feature. However, unlike traditional website forums you cannot get your forums back out of Discord. There’s no way to backup or restore that content.

I’ve personally witnessed a community lose all of its backup content due to a leaked token and an upset spammer. It was tragic and I still mourn (but hey, we’re still there).

In one way, this is the culmination of monetizing views. As Ed Citron argues in Software has Eaten the Media, the trend from many of these social media companies has been “more views good, content doesn’t matter”. We’ve seen this show before, Google has been in a shadow war with SEO optimizers for over a decade, and they might have lost. The “pivot to video” Facebook pushed was a massive lie, and we collectively fell for it.

We can turn it around!

So what do we do about this? One thing I’m excited to see that Mr. Dash rightly points out is that there’s a renewed trend of being more directly connected to the folks that are consuming your content.

Own a blog! Link to it! Block AI bots from reading it if you’re so inclined. Use social media to link to it! Don’t write your screed directly on LinkedIn - Don’t give them that content. Own it. Do what you want to with it. Monetize it however you want, or not at all! Own it. Scott Hanselman said this well over a decade ago. Own it!

Recently, there was a Substack Exodus after they were caught gleefully profiting off of literal Nazis. Many folks decided to go to self-hosted Ghost instead of letting another company control the decision making. Molly White of Citation Needed (who does a lovely recap of Crypto nonsense) even wrote about how she did it. Wresting control away from centralized stacks and back to the web of the 90s is definitely my jam.

Speaking of Decentralization, we’ve also got Mastodon and Bluesky that have federation protocols (Bluesky just opened up AT to beta, which is pretty cool) allowing you to run your own single-user account instances but still interact with an audience (which is what I do).

Right, anyhow, this rant is brought to you by the hope that we’re standing on the edge of reclaiming some of what the weird web lost to social media companies of yore.

Edit: 03-10-2024: Turns out there's a name for this concept! POSSE: https://indieweb.org/POSSE. My internet pal Zach has a fun 1-minute intro on the concept: https://www.youtube.com/watch?v=X3SrZuH00GQ&t=835s. Go watch it!

01 Mar 2024 by cthos

rants tech blog

How I’m approaching Generative AI

The Duality of AI
_{Lidiia Lohinova @ Shutterstock #2425460383}

Also known as the “Plausible Sentence Generator” and “Art Approximator”
This post is only about Generative AI. There are plenty of other Machine Learning models, and some of them are really useful, we’re not talking about those today.

I feel like every single day I see some new startup or post about how Generative AI is the future of everything and how we’re right on the cusp of Artificial General Intelligence (AGI) and soon everything from writing to art to music to making pop tarts will be controlled by this amazing new technology.

In other words, this is the biggest tech hype cycle I’ve personally witnessed. Blockchain, NFTs, and the like come close (remember companies adding “blockchain” to their products just to get investment in the last bubble?) and maybe the dotcom bubble, but I think this “AI” cycle is even bigger than them all.

There are a lot of reasons for that, which I’m going to get into as part of this … probably very lengthy post about Generative AI in general, where I find it useful, where I don’t, where my personal ethics land on the various elements of GenAI (and I’ll be sure to treat LLMs and Diffusion models differently). So, by the end of this, if I’ve done my job right, you’re going to understand a bit more about why I think there’s a lot of hype and not a lot of substance here — and how we’re going to do a lot of damage in the meantime.

Part 1, What are we Talking About Here?

Never you worry friends, I’m going to link to a lot of sources for this one.

If you’ve been living deep in a cave with no access to the news you might not have heard about Generative AI. If you are one of those people and are reading this, I envy you - please take me with you. I’m going to go ahead and define AI for the purposes of this article because the industry has gone and overloaded the term “AI” once again.

I’m going to be very constrained to “Generative AI”, also known as “GenAI”, of two categories: Large Language Models (LLMs) and Diffusion Models (like Dall-E and Stable Diffusion). The way they work is a little bit different, but the way they are used is similar. You give them a “prompt” and they give you some output. For the former, this is text and for the latter this is an image (or video, in the case of Sora). Sometimes we slap them together. Sometimes we slap them together 6 times.

Common Questions about Generative AI

How does an LLM work?

Examples of LLMs: ChatGPT, Claude, Gemini (they might rename it again after this post goes live because Google gonna Google).

I’m going to take my best crack at summarizing how this works, but I’ll link to more in-depth resources at the end of the section. In its most basic terms, an LLM takes the prompt that you entered and then it uses statistical analysis to predict the next “token” in the sequence. So, if you give it the sentence “Cats are excellent”, the LLM might have correlated “hunters” as the next token in the sequence as statistically 60% likely. The word “pets” might be 20%. And so on. It’s essentially “autocomplete with a ton of data fed to it”.

Sidebar, a token is not necessarily a full word. It could be a “.”, it could be a syllable, it could be a suffix, and so on. But for the purposes of the example you can think of as words.

What the LLM does that makes it “magical” and able to generate “novel” text is that sometimes it won’t pick the statistically most likely next token. It’ll pick a different one (based on its Temperature, Top-P, and Bottom-P parameters), which then sends it down a different path (because the token chain is now different). This is what enables it to give you a Haiku about your grandma. It’s also what makes it generate “alternative facts”. Also known as “hallucinations”.

This is a feature.

You see, the LLM has no concept of what a “fact” is. It only “understands” statistical associations between the words that have been fed to it as part of its dataset. So, when it makes up court cases, or claims public figures have died when they’re very much still alive, this is what’s happening. OpenAI, Microsoft, and others are attempting to rein this in with various techniques (which I’ll cover later), but ultimately the “bullshit generation” is a core function of how an LLM works.

This is a problem if you want an LLM to be useful as a search engine, or in any domain that relies on factual information, because invariably it will make fictions up by design. Remember that, because it’s going to come up over and over again.

How does a Diffusion Model Work?

I don’t understand diffusion models as well as I understand language models, much like I understand the craft of writing more than I do art, so this is going to be a little “fuzzier”

Examples of Diffusion Models: Dall-E 3 (Bing Image Creator), Stable Diffusion, Midjourney

Basically, a Diffusion model is the answer to the question “what happens if you train a neural network on tagged images and then introduce progressively more random noise”. The process works (massively simplified) like this:

The model is given an image labeled “cat”
A bit of random noise (or static) is introduced into the image.
Do Step 2 over and over again until the image is totally unrecognizable as a cat.
Congrats! You now now how to make a “Cat” into random noise.

But the question then becomes “can we reverse the process?”. Turns out, yes, you can. To get a from a prompt of “Give me an image that looks like a cat” the diffusion model will do the process in essentially reverse:

We generate an image that is nothing but random noise.
The model uses its training data to “remove” that random noise, just a bit
Repeat step 2 over and over again
Finally, you have an image that looks something akin to a cat

Now, on this other side, your model might not have generated a great cat. It doesn’t know what a cat is. So, it asks another model: “Hey, is this an acceptable cat?”. Said model will either say “nope, try again”, or it will respond with “heck yes! That’s a cat, do more like that”.

This is Reinforcement Learning - this is going to come up again later.

So, at it’s most “basic” representation the things that are making “AI Art” are essentially random noise de-noiserators. Which, at a technical level is super cool! Who would have thought you could give a model random noise garbage and get a semi-coherent image out of the other end?

These things are energy efficient and cheap to run, right?

I mean, it’s $20/mo for an OpenAI ChatGPT pro subscription, how expensive could it be?

My friends, this whole industry is propped up by a massive amount of speculative VC / Private Equity funding. OpenAI is nowhere near profitable. Their burn rate is enormous (partly due to server costs, but also training foundational models is expensive). Sam Altman is seeking $7 Trillion dollars for AI chips. Moore’s law is Dead, so we can’t count on the cost of compute getting ever smaller.

Let’s also talk about the environmental impact of some of these larger models. Training them requires a lot of water. Using them uses way less water (well, as much as running a power-hungry GPU would require), but the overall lifecycle of a GenAI large foundational model isn’t exactly sustainable in the world of impending climate crisis.

One thing that’s also interesting is there are a number of smaller, useable-ish models that can run on commodity hardware. I’m going to talk about those later.

So what’s the business model here?

I think part of what’s fueling the hype here is only a few companies on the planet can currently field and develop these large foundational models, and no research institutions currently can. If you can roll out “AI” to every person who uses a computer, your potential and addressable markets are enormous.

Because there are only a few players in the space, they’re essentially doing what Amazon is an expert at: Subsidize the product to levels that are unsustainable (that $20/mo, for example) and then jack up the price later once you’ve got a captive market that has no choice in the matter anymore.

Go have a watch of John Stewart’s interview with FTC Chair Lina Khan, it’s a good one and touches on this near the end.

We’re already seeing them capture a lot of market here, too, because a ton of startups are building features which simply ask you, the audience, to provide an OpenAI API key. Or, they subsidize the API access cost to OpenAI through other subscription fees. Ultimately, a very small number of players under the hood control access and cost…. which is going to be very very “fun” for a lot of businesses later.

I do think OpenAI is chasing AGI…for some definition of AGI; but I don’t think it’s likely they’re going to get there with LLMs. I think they think that they’ll get there, but they’re now chasing profit. They’re incentivized to say they’ve got AGI even if they don’t.

Cool! So we modeled this on the human brain?

I’m getting pretty sick of hearing this one, because the concept of a computer Neural Network is pretty neat but every time someone says “and this is how a human brain works” it drives me a little bit closer throwing my laptop in a river.

It’s not. Artificial Neural Networks (ANNs) were invented in the late 1960s, and were modeled after a portion of how we thought our brains might work at the time. Since then, we’ve made advances with things like Convolutional Neural Networks (CNNs) starting in the 1980s, and most recently Transformers (this is what ChatGPT uses). None of these ANN models actually model what the human brain is actually doing. We don’t actually understand how the human brain works in the first place, and the entire field of neuroscience is constantly making discoveries.

Did Transformer architecture stumble upon how the human brain works? Unlikely, but, hey, who knows. Let’s throw trillions of dollars at the problem until we get sentient clippy.

Look, I could get into a lengthy discussion about whether free will exists or not but I’m gonna spare you that one.

Wikipedia covers this better than I could, so go have a read on the history of ANNs.

AI will not just keep getting better the more data we put in

Couple of things here, it’s really hard to model how well a generative AI tool is doing on benchmarks. Pay attention to the various studies that have released (peer reviewing OpenAI’s studies has been hard, turns out). You’re not getting linear growth with more data. You’re not getting exponential growth (which I suspect is what the investors are wanting).

You’re getting small incremental improvements simply from the adding more data. There are some things that the AI companies that are doing to improve performance for certain queries (this is human reinforcement, as well as some “safety” models and mechanisms) - but the idea that you just keep feeding a foundational model more data and it suddenly becomes much better is a logical fallacy and there’s not a lot of evidence for it.

It's a computer, so it can't be biased!

Oh, oh gods, no. It can be very biased. It was trained on content curated from the internet.

I cannot do any better description than this Bloomberg Article - it's amazing, but it covers how image generators tend to racially code professions.

But we can fix that right?

I'm skeptical that is even possible. Google Tried and wound up making "Racially diverse WWII German soldiers". AKA Nazis.

Right, so how did they train these things?

The shortest answer is “a whole bunch of copyrighted content that a non-profit scraped from the internet”. The longer answer is “we don’t actually fully know because OpenAI will not disclose what’s in their datasets”.

One of the datasets, by the way, is Common Crawl - you can block its scraper if you desire. That dataset is available for anyone to download.

If you’re an artist that had a publicly accessible site, art on Deviantart, or really anywhere else one of the bots can scrape, your art has probably been used to train one of these models. Now, they didn’t train the models on “the entire internet”, Common Crawl’s dataset is around 90 TB compressed, and most of that is…. Well, garbage. You don’t want that going into a model. Either way, it’s a lot of data.

If you were a company who wanted to get billions of dollars in investment by hyping up your machine learning model, you might say “this is just how a human learns to do art! They look at art, and they use that as inspiration! Exactly the same.”

I don’t buy that. An algorithm isn’t learning, it’s taking pieces of its training set and reproducing it like a facsimile. It’s not making anything new.

I struggle with this a bit too. One of my favorite art series is Marcel Duchamp’s Readymades - because it makes you question “what is art, really?”. Does putting a urinal on its side make it art? For me, yes, because Duchamp making you question the art is the art. Is “Hey Midjourney give me Batman if he were a rodeo clown” art? Nah.

Thus, OpenAI is willing to go to court to make a fair use argument in order to continue to concentrate the research dollars in their pockets and they’re willing to spend the lobbying dollars to ask forgiveness rather than waiting to ask permission. There’s a decent chance they’ll succeed. They’ll have profited off of all of our labor, but are they contributing back in a meaningful way?

Let’s explore.

Part 2, or “how useful are these things actually”?

Recycling truck
_{Paul Vasarhelyi @ Shutterstock #78378802}

Let’s start with LLMs, which the AI companies claim to be a replacement for writing of all sorts or (in the case of Microsoft) the cusp of a brilliant Artificial General Intelligence which will solve climate change (yeahhhhh no).

Remember above how LLMs take statistically likely tokens and start spitting them out in an attempt to “complete” what you’ve put into the prompt? How are the AI companies suggesting we use this best?

Well, the top things I see being pushed boil down to:

Replace your Developers with the AI that can do the grunt work for you
Generate a bunch of text from some data, like a sales report or other thing you need “summarized”
Replace search engines, because they all kind of suck now.
Writing assistant of all kinds (or, if you’re an aspiring grifter, Book generation machine)
Make API calls by giving the LLM the ability to execute code.
Chatbots! Clippy has Risen again!

There are countless others that rely on the Illusion that LLMs can think, but we’re going to stay away from those. We’re talking about what I think is useful here.

The Elephant in the Software Community: Do you need developers?

Okay, there are so many ways I can refute this claim it’s hard to pick the best one. First off, “prompt engineering” has emerged as a prime job, and it’s really just typing various things into the LLM to try and get the best results (again, manipulating the statistics engine into giving you output you want. Non-deterministic output). That is essentially a development job, you’re using natural language to try to get the machine to do what you want. Because it has a propensity to not do that, though, it’s not the same as a programming language where it does exactly what you tell it to, every time. Devs write bugs, to be sure, but what the code says is what you’re going to get out the other end. With a carefully crafted prompt you will probably get what you want out the other end, but not always (this is a feature, remember?)

The folks who are financially motivated to sell you ever increasing complexity engines are incentivized to tell you that you can cut costs and just let the LLM do the “boring stuff” leaving your most high-value workers free to do more important work.

And you know what, because these LLMs were trained on a bunch of structured code, yeah, you probably can get it to semi-reliably produce working code. It’s pretty decent at that, turns out. You can get it to “explain” some code to you and it’ll do an okay (but often subtly wrong) job. You can feed it some code, tell it to make modifications, or write tests, and it’ll do it.

Even if it’s wrong, we’ve built up a lot of tooling over the years to catch mistakes. Paired with a solid IDE, you can find errors in the LLMs code more readily than just reading it yourself. Neat!

I actually tried this recently when revamping the GW2 Assistant app. I’ll be doing a post on my experiment doing this soonish, but in the meantime let me summarize my thoughts (which are actually the second point):

An experienced developer knows when the LLM has produced unsustainable or dangerous code, and if they’re on guard for that and critically examine the output they probably will be more efficient than they were before.

Inexperienced developers will not be able to do that due to unfamiliarity and will likely just let the code go if it “works”. If it doesn’t work, they’re liable to get stuck for far longer than pair programming with a human.

Devin, the AI Agent that claims to be the first AI software engineer looks pretty impressive! Time for all software devs to take up pottery or something. I want you to keep an eye on those demos and what the human is typing into the prompt engine. One thing I noticed in the headline demo is that the engineer had to tell Devin 3 or 4 times (I kinda lost count) that it was using the wrong model and “be sure to use the right model”. There were also several occasions where he had to nudge it using specialized knowledge that the average person is simply not going to have. Really, go check it out.

Okay, so, we’re safe for a little bit right?

Well….no. I’m going to link to an article by Baldur Bjarnason: The one about the web developer job market. It’s pretty depressing, but it also summarizes my feelings well. Regardless of the merits of these AI systems (and I have a sneaking suspicion that the bubble’s going to pop sooner rather than later due to the intensity of the hype), CTOs and CEOs that are focused on cutting costs are going to reduce headcount as a money-saving measure, especially in industries that view software as a Cost Center. Hell, if Jensen Huang says we don’t need to train developers, we can be assured that the career is dead.

I think this is a long-term tactical mistake for a few reasons:

I think a lot of the hype is smoke-and-mirrors, and there’s no guarantee that it’s going to be orders-of-magnitude better.
We’ll make our developer talent pool much smaller, and have little to no environment for Juniors to learn and grow, aside from using AI assistants to do work.
Once the cost of using AI tools increases, we’ll be scrambling to either rehire devs at deflated cost, or we’re going to try and wrangle less power hungry models into doing more development things.

Neat.

Summarization / Data Crunching Engine

This LLM wish fulfillment strategy is essentially “I don’t have time to crunch this data myself, can I get the AI to do it for me and extract only the most important bits”. The shortest answer is “maybe to some degree of accuracy”. If you feed it a document, for example, and ask it to summarize - odds are decent that it’ll both give you a relatively accurate summary (because you’ve increased the odds that it’ll produce the tokens you want to see in said document) that will also contain degrees of factual errors. Sometimes there will be zero factual errors. Sometimes there will be many. Whether those are important or not depends entirely on the context.

Knowing the difference would require you to read the whole document and decide for yourself. But we’re here to save time and be more productive, remember? So you’re not going to do that, you’re going to trust that the LLM has accurately summarized the data in the text you’re giving it.

BTW, by itself an LLM can’t do math. OpenAI is trying to overcome this limitation by allowing it to run Python code or connect to Wolfram Alpha but there are still some interesting quirks.

So, you trust that info, and you take it to a board presentation. You’re showcasing your summarized data and it clearly shows that your Star Wars Action Figure sales have skyrocketed. Problem is you’re an oil and gas company and you do not sell Star Wars action figures. Next thing you know, you’re looking like an idiot in front of the board of directors and they’re asking for your resignation. Or, worse, the Judge is asking you to produce the case law your LLM fabricated, and now you’re being disbarred. Neat!

Remember, the making shit up is a feature, not a bug.

But wait! We can technology our way out of this problem! We’ll have the LLM search its dataset to fact check itself! Dear reader, this is Retrieval Augmented Generation (RAG). Based on nothing but my own observations, the most common technique I’ve seen for this is doing a search first for the prompt, injecting those results into the context window, and then having it cite its sources. That can increase the accuracy by nudging the statistics in the right direction by giving it more text. Problem is, it doesn’t always work. Sometimes it’ll still cite fake resources. You can pile more and more stuff on top (like doing another check to see if the text from the source appears in the summary) in an ever-increasing race to keep the LLM honest but ultimately:

LLMs have no connection to “truth” or “fact” - all the text they generate are functionally equivalent based on statistics

RAG and Semantic Search are related concepts - you might use a semantic search engine (which attempts to search on what the user meant, not necessarily what they asked) to retrieve the documents you inject into the system.

The other technique we really need to talk about briefly is Reinforcement Learning from Human Feedback (RLHF). This is “we have the algorithm produce a thing, have a human rate it, and then use that human feedback to retrain / refine the model”.

Two major problems with this:

It only works on the topics you decide to do it on, namely “stuff that winds up in the news and we pinky swear to ‘fix’ it”.
It’s done by an army of underpaid contractors.

You’d be surprised just how much of our AI infrastructure is actually Mechanical Turks. Take Amazon Just-walk-out, for example.

What we wind up doing is just making the toolchain ever more complicated trying to get spicy autocomplete to stop making up “facts”, and it might have just not been worth the effort in the first place.

But that’s harder to do these days because:

Search Engines kind of suck now

Google’s been fighting a loosing battle against “SEO Optimized” garbage sites for well over a decade at this point. Trying to get relevant search results amidst the detritus and paid search results has gotten harder over time. So, some companies have thought “hey! Generative AI can help with this - just ask the bot (see point #6) your question and it’ll give you the information directly”.

Cool, well, this has a couple of direct impacts, even if it works. Remember those hallucinations? They tend to sneak in places where they’re hard to notice, and its corpus of data is really skewed towards English language results. So, still potentially disconnected from reality (but usually augmented via RAG), but how would you know? It’s replaced your search engine - so are you going to now take the extra time to go to the primary source? Nah.

Buuuut, because Generative AI can generate even more of this SEO garbage at a record place (usually in an effort to get ad revenue) we’re going to see more and more of the garbage web showing up in search. What happens if we’re using RAG on the general internet? Well, it’s an Ouroboros of garbage, or, as some folks theorize, Model Collapse.

The other issue is that if people just take the results the chat bot gives them and do not visit those primary sources, ad revenue and traffic to the primary sources will go down. This disincentivizes those sources from writing more content. The Generative AI needs content to live. Maybe it’ll starve itself. I dunno.

But it’ll help me elevate my writing and be a really good author right?

I’ve been too cynical this whole time. I’m going to give this one a “maybe”. If you’re using it to augment your own writing, having it rephrase certain passages, or call out to you where there are grammar mistakes, or any of that “kind” of idea more power to you.

I don’t do any of that for two reasons, one is practical, the other highlights where I think there’s an ethical line:

I’m not comfortable having a computer wholesale rewrite what I’ve done. I’d rather be shown places that can improve, see some other examples, and then rewrite it myself.
There’s a pretty good chance that the content it regurgitates is copyrighted, and we’re still years out from knowing the legal precedent.

The AI industry has come up with a nice word for “model regurgitates the training data verbatum”. Where we might call it “plagiarism” they call it “overfitting”.

Look, I don’t want to be a moral purist here, but my preferred workflow is to write the thing, do an editing pass myself, and then toss the whole thing into a grammar checker because my stupid brain freaking loves commas. Like, really, really, loves them. Comma.

I do this with a particular tool: Pro Writing Aid. It’s got a bunch of nice reports which will do things like “highlight every phrase I’ve repeated in this piece” so that I can see them and then decide what to do with them. Same deal with the grammar. I ignore its suggestions frequently because if I don’t, the piece will lose my “voice” - and you’ll be able to tell.

They, like everyone else, have started injecting Gen AI stuff into their product, but for me it’s been absolutely useless. The rephrase feature hits the same bad points I mentioned earlier. They’ve also got a “critique” function which always issues the same tired platitudes (gotta try it to understand it, folks).

This raises another interesting point about the people investing in Generative AI heavily. One of those companies is Microsoft. A company who makes a word processor. The parent of clippy themselves. They could have integrated better grammar tools into their product. They could have invested more in “please show me all the places where I repeated the word ‘bagel’”. They didn’t do this.

That makes me think that they didn’t see the business case in “writing assistants”, and why Clippy died a slow death.

Suddenly, though, they have a thing that can approximate human writing and suddenly there’s a case and a demand for “let this thing help you write”. I feel like they’re grasping at use cases here. We stumbled upon this thing, it’s definitely the “future”, but we don’t…quite….know….how.

“Busywork Generators”

I want to take a second here to talk about a lot of what I’m seeing in the business world’s potential use cases. “Use this to summarize meetings!” or “Use this to write a long email from short content” or “Here, help make a presentation”.

After all one third of meetings are pointless, and could be an email! I want to also contend that many emails are pointless.

Essentially what you’re seeing is a “hey, busywork sucks, let’s automate the busywork”. Instead of doing that, why not just…not do the busywork? If you can’t be bothered to write the thing, does it actually have any value?

I’m not talking about documentation, which is often very important (and should be curated rather than generated), but all those little things that you didn’t really need to say.

If you’re going to type a bulleted list into an LLM to generate an email, and the person on the other end is going to just use an LLM to summarize, lossily, I might add, why didn’t you just send the bulleted list?

You’re making more work for yourself. Just… don’t do that?

Let’s give it the ability to make API Calls

Right, so one of the fun things OpenAI has done for some of their GPT-4 products is to give it the ability to make function calls, so that you can have it do things like:

Book a flight
Ask what the next Guild Wars 2 World Boss is
Call your coffee maker and make it start
Get the latest news
Tie your shoes (not really)

And so on. Anything you can make a function call out to, you can have the LLM do!

It does this by being fed a function signature, so it “knows” how to structure the function call, and then runs it through an interpreter to actually make the call (cause that seems safe).

Here’s the…minor problem. It can still hallucinate when it makes that API call. So, say you have a function that looks like this: buyMeAFlight(destination, maxBudget) and you say to the chatbot “Hey, buy me a flight to Rio under $200”. What the LLM might do is this: buyMeAFlight("Rio de Janeiro", 20000). Congrats, unless you have it confirm what you’re doing you just bought a flight that’s well over your budget.

Now, like all other Generative AI things, there are techniques you can use to increase the accuracy. Making just the perfect prompt, having it repeat output back to you, asking “are you sure”, telling it that it’s a character on star trek. You know, normal stuff.

Alternatively you could just... use something deterministic, like, I don’t know, a web form or any of the existing chat agent software we already had.

Sidebar: Apparently OpenAI has introduced a “deterministic” mode in beta, where you provide a seed to the conversation to get it to reliably reproduce the same text every time. Are you convinced this is a random number generator yet?

Agents, or, I wanna talk to a thing

So the “killer” application we’ve come up with, over and over, is “let’s type our question in natural language and it does a thing.” I honestly don’t understand this on a personal level - because I don’t really like talking to chatbots. I don’t want to say “Please book me a flight on Friday to New York” and then forget about it. I want to have control over when I’m going to fly.

Do large swaths of people want executive assistants to do important things like cross-country travel?

Not coincidentally, I really struggle with that kind of delegation and have never really made use of an executive assistant personally.

We’ve decided that the best interface for doing work is “ask the chatbot to do things for you” in the agent format. This is exactly the premise of the Rabbit R1 and the Humane Ai Pin. Why use your phone when you can shout into a thing strapped to you and it’ll do…whatever you ask. Perhaps it’ll shout trivia answers at you.

But guess what, my phone can already do that. Siri’s existed for years and like, I hardly use it. It’s not because it’s not useful. It’s because I can do what I want without shouting at it. In public. For some reason.

Agents as Accessibility

We do need to talk about accessibility. One of the things that AI agents would be legitimately useful for is for those folks who cannot access interfaces normally whether that’s situationally (driving a car), or temporarily / permanently (blind, disabled).

If we can use LLMs to get better accessibility tech that is reliable, I’m all for it. Problem is that the companies pushing the technology have a mixed track record on doing accessibility work, and I’m concerned that we’ve decided that LLMs being able to generate text means we can abdicate responsibility for doing actual accessibility work.

Like many other things in the space, we’ve decided that “AI” is magic, and will make things accessible without having to do the work. I mean, no. That’s not how it works.

Remember back to the beginning of this article where I talked about other Machine Learning Models? I think that’s the space where we’re going to make more accessibility advances, like the Atom Limb which uses a non-generative model to interpret individual muscle signals.

Wrapping up the LLM Section

Still with me?

If I had to summarize my thoughts on all of the above it’s that we’ve stumbled upon something really cool - we’ve got an algorithm that can create convincing looking text.

The companies that have the resources to push this tech seem to be scrambling for the killer use-case. Many companies are clamoring for things that let them reduce labor costs. Those two things are going to result in bad outcomes for everyone.

I don’t think there’s a silver bullet use case here. There are better tools already for every use case I’ve seen put forward (with some minor exceptions), but we’re shoving LLMs into everything because that’s where the money is. We’re chasing a super-intelligent god that can “solve the climate crisis for us” by making the climate crisis worse in the meantime.

If you were holding NVDA stock, something something TO THE MOON. They’ve been making bank off of every bubble that needs GPUs to function.

This feels exactly like the Blockchain and Web3 bubbles. Lots of hype, not a lot of substance. We’re tying ourselves in knots to get it to not “hallucinate”, but like I’ve repeated over and over again in this piece the bullshit is a feature, not a bug. I recommend reading this piece by Cory Doctorow: What Kind of Bubble is AI? It’ll give you warm fuzzies. But it won’t.

So about those Diffusion Models

Midjourney, Sora, all those things that can fake voices and make music. We’ve got a big category of things that are, charitably “art generators”, but more realistically “plagiarism engines”.

This section is going to be a lot shorter. Let me summarize my feelings:

If you’re using one of these things for personal reasons, making character art for your home D&D game, or other things that you’re not trying to profit from - go for it. I don’t care. I’d rather you not give these companies money but I don’t have moral authority here.
- I’ve used it for this too! I’m not exempt from this statement.
If you’re using AI “art” in a commercial product, you don’t have an ethical defense here (but we’ll talk about business risk in a sec). The majority of these models were trained on copyrighted content without consent and the humans who put the work in are not compensated for it.

I personally don’t find all of the existing AI creations that inspiring, other than how neat it is we’ve gotten a neural network to approximate images in its training set. Some of the things it spits out are “cool” and “workable” but I just don’t like it.

Hey, I do empathize with the diffusion models a bit though. Hands are hard.

The Copyright problem

As I mentioned earlier in the post, as far as we can tell, the art diffusion models were trained on publicly viewable, but still copyrighted content.

If for some reason you’re a business and you’re reading this post: That’s a lot of business risk you’d be shouldering. There are multiple different lawsuits happening right now, many of them on different lines, and we don’t actually know how that’s going to go. Relatedly, AI Art is not copyrightable, so that’s…probably a problem for your business especially if you’re making a book or other art-heavy product. At best you can do is treat it like stock art, where you don’t own the exclusive rights to the art, and you’re hoping you don’t get slapped with liability in the future.

So, if you’re using an AI Art model in your commercial work, these are all things you have to worry about.

This is where, and I cannot believe I am saying this, I think Adobe is playing it smart. They’ve trained Firefly on Art they’ve licensed from their Adobe Stock art platform and are (marginally) compensating artists for the privilege. They have also gone so far as to offer to guarantee legal assistance to enterprise customers. If you’re a risk averse business, that’s a pretty sweet deal (and less ethically concerning - though the artists are getting pennies).

The rest of them? You’re carrying that risk on your business.

But what if your business happens to be “crime”?

Purpose-Built Fraud Engines

AI companies seem hell bent on both automating the act of creation (devaluing artistry and creativity in the process) and also making it startlingly easy for fraudsters to do their thing.

Lemme just… link some articles.

So, you create things that can A) Clone a person’s voice, B) Imitate their Likeness, and C) Make them say whatever you want.

WHAT THE HELL DID YOU THINK WAS GOING TO HAPPEN? WHAT PUBLIC GOOD DOES THAT SERVE?

I dunno about y’all, but I’m okay not practicing Digital Necromancy (just regular, artisanal necromancy).

The commercial business for these categories of generative AI are flat out fraud engines. OF COURSE criminals are going to use this to defraud people and influence elections. You’ve made their lives so much easier.

Hey, I guess we can take solace in the fact that the fraud mills can do this with fewer employees now. Neat.

But Netflix Canceled My Favorite Show and I want to Revive it

This is also known as the “democratizing art” argument. First thing I would like to point out is that art is already democratized? It’s a skill. That you can learn. All you need to do is put in the time. It’s not a mystical talent that only a select few possess.

Artists are not gurus who live in the woods and produce art from nothing, and the rest of us are mere drones who are incapable of making art.

So in this case “democratization” really means “can make things without putting in the effort”. The result of that winds up being about as tepid as you might imagine.

Now, there’s a question in there of if a person is having to work all the time to simply live, won’t this enable them to “make art”? There’s another way to fix that, by reducing the amount of work they need to do to simply exist, but nah, we’re gonna automate the fun parts.

Hey, awesome artists who are making things with AI tools but are using it as a process augmenter - all good. I’m not talking to you. I’m talking to the Willy Wonka Fraud Experience “entrepreneurs”

But you know what, I’m not even really that concerned with people who want to make stuff on their own that is for their own enjoyment. I don’t think the result is going to be very good, and I’d rather have more people creating good stuff than fewer, but hey more power to ya.

Another aside: I really do not want to verbally talk to NPCs in games. I play single-player games to not talk to people. I don’t want to be subjected to that in the name of “more realistic background dialog”.

It’s just not going to work out like you think it will. What’ll actually happen is:

Companies already don’t value artistry

The primary thing I suspect is going to happen with the AI “art” is back to the cost-cutting efforts. Where you might have used stock art before, or a junior artist, you’re going to replace that with Dall-E.

For marketing efforts, that’s not an immediate impact. Marketing content is designed to be churned out quickly, and shotgunned into people’s feeds in an effort to get you to buy something or to feel some way, etc. I don’t think those campaigns are going to be as effective as the best campaigns ever, but eh, we’ll see I guess.

The most concerning uses are going to be the media companies that are going to replace assets in video games and movies. Fewer employees, lower budgets, and … dare I say … lower quality.

You see, diffusion models don’t let you tweak them, yet (although, who knows, maybe if we start doing deterministic “seeds” again we’ll get somewhere with how Sora functions). They also have a propensity to not give you what you asked for, so, yeah, let’s spend billions of dollars trying to fix that.

So, at risk of trying to predict the future (which I’m definitely bad at), I think we’re going to gut a swath of creatives, devalue their work, and then realize that “oh, no one wants this generative crap”. We’ll rehire the artists at lower rates and we’ll consolidate capital into the hands of a few people.

Meanwhile, we’ve eliminated the positions where people would traditionally learn skills, so you won’t be able to have a career.

Because…

You don’t live in a Utopia

We live in a society that requires money to function. Most of us sell our labor for the money we need to live.

The goal of companies is to replace as many of their workers as they can with these flawed AI tools. Especially in the environment we find ourselves in where VC money needs to make a return on investment now that the Zero Interest Rate Phenomenon (ZIRP) is finished.

Now, “every time technology has displaced jobs we’ve made more jobs” is the common adage. And, generally, that’s true. However, the main fear here isn’t that we won’t be working, it’s that it’ll have a deflationary impact on wages and increase income inequality. CEO compensation compared to average salary has increased by 1460% since the 70’s, after all.

What I think is different about previous technological advances (but hey, the Luddites were facing similar social problems) is that we’re in a situation where the amount of capital is being invested in the hands of a few companies, and only a very few of them have the resources to control this new technology. I don’t think they’re altruists.

This is not a post-scarcity Star Trek future we’re living in. I wish we were. I’m sorry.

Alright, give me something more uplifting

Right. Uh. I’m not sure I can, but here are some things I’d like to see:

There are a number of really small models that can run on commodity hardware, and with enough tuning you can get them to give you comparable results to what you’re getting on some of the larger models. Those don’t require an ocean of water to train or use, and run locally.
- Check out Faraday or Jan.ai if you want to play around.
We’re going to see more AI chips, that’s inevitable, but the non-generative models are going to benefit from that too. There’s a lot of interesting work happening out there.
- I’m also pretty cool with DLSS and RSR for upscaling video game graphics for lower-powered hardware. That’s great.

I honestly hope I’m wrong and that the fantastical claims about AI solving climate change are real… but the odds of that are really bad.

Wrapping it Up

This is the longest post I think I’ve ever written, we’re well over 7,000 words. I have so many thoughts it’s hard to make them coherent.

Perhaps unsurprisingly, I’ve not used Generative AI (or AI of any kind) for this post. I’ve barely even edited it. You’re getting the entire stream of consciousness word vomit from me.

Call me a doomer if you want, but hey, if you do, I’ve got a Blockchain to sell you.

Further Fun Reading / Watching

03 Apr 2024 by cthos

rants tech AI blog