
Own Your Content

Shutterstock vector art of some computer magic
Andrey Suslov @ Shutterstock #1199480788

Not too long ago Anil Dash wrote a piece for Rolling Stone titled “The Internet Is About To Get Weird Again” and it’s been living rent-free in my mind ever since I read it. As the weeks drag on, more and more content is being slurped up by the big tech companies in their ever-growing thirst for human-created and curated content to feed generative AI systems. Automattic recently announced that they’re entering deals to pump content from Tumblr and WordPress.com to OpenAI. Reddit, too, has entered into a deal with Google. If you want a startling list, go have a look at Vox’s article on the subject. Almost every single one of these deals is “opt-out” rather than “opt-in” because they are counting on people to not opt out, and they know that the percentage of users who would opt in without some sort of compensation is minimal.

Lest you think this is a rant about feeding the AI hype machine, it’s not (though you may get one of those soon enough). This is more of a lament about the last several decades of big social media companies first convincing us that they are the best way to reach and maintain an audience (by being the intermediary), then taking the content that countless creators have written for them, and then disconnecting those creators from their audiences.

Every bit of content you’ve created on these platforms, whether it’s a comment or a blog post for your friends or audience, is being monetized without offering you anything in return (except the privilege of feeding the social media company, I guess). Even worse, getting your stuff back out of some of these platforms is becoming increasingly difficult. I’ve seen many of my communities move entirely to Discord, using their forums feature. However, unlike traditional website forums, you cannot get your forums back out of Discord. There’s no way to back up or restore that content.

I’ve personally witnessed a community lose all of its backup content due to a leaked token and an upset spammer. It was tragic and I still mourn (but hey, we’re still there).

In one way, this is the culmination of monetizing views. As Ed Zitron argues in “Software has Eaten the Media”, the trend from many of these social media companies has been “more views good, content doesn’t matter”. We’ve seen this show before: Google has been in a shadow war with SEO optimizers for over a decade, and they might have lost. The “pivot to video” Facebook pushed was a massive lie, and we collectively fell for it.

So what do we do about this? One thing I’m excited about, which Mr. Dash rightly points out, is that there’s a renewed trend of being more directly connected to the folks who are consuming your content.

Own a blog! Link to it! Block AI bots from reading it if you’re so inclined. Use social media to link to it! Don’t write your screed directly on LinkedIn - Don’t give them that content. Own it. Do what you want to with it. Monetize it however you want, or not at all! Own it. Scott Hanselman said this well over a decade ago. Own it!

Recently, there was a Substack Exodus after they were caught gleefully profiting off of literal Nazis. Many folks decided to go to self-hosted Ghost instead of letting another company control the decision making. Molly White of Citation Needed (who does a lovely recap of Crypto nonsense) even wrote about how she did it. Wresting control away from centralized stacks and back to the web of the 90s is definitely my jam.

Speaking of decentralization, we’ve also got Mastodon and Bluesky, which have federation protocols (Bluesky just opened up the AT Protocol to beta, which is pretty cool) allowing you to run your own single-user instance but still interact with an audience (which is what I do).

Right, anyhow, this rant is brought to you by the hope that we’re standing on the edge of reclaiming some of what the weird web lost to social media companies of yore.

Edit: 03-10-2024: Turns out there's a name for this concept! POSSE: https://indieweb.org/POSSE. My internet pal Zach has a fun 1-minute intro on the concept: https://www.youtube.com/watch?v=X3SrZuH00GQ&t=835s. Go watch it!

How I’m approaching Generative AI

The Duality of AI
Lidiia Lohinova @ Shutterstock #2425460383

Also known as the “Plausible Sentence Generator” and “Art Approximator”

This post is only about Generative AI. There are plenty of other Machine Learning models, and some of them are really useful, but we’re not talking about those today.

I feel like every single day I see some new startup or post about how Generative AI is the future of everything and how we’re right on the cusp of Artificial General Intelligence (AGI) and soon everything from writing to art to music to making pop tarts will be controlled by this amazing new technology.

In other words, this is the biggest tech hype cycle I’ve personally witnessed. Blockchain, NFTs, and the like come close (remember companies adding “blockchain” to their products just to get investment in the last bubble?) and maybe the dotcom bubble, but I think this “AI” cycle is even bigger than them all.

There are a lot of reasons for that, which I’m going to get into as part of this … probably very lengthy post about Generative AI in general, where I find it useful, where I don’t, where my personal ethics land on the various elements of GenAI (and I’ll be sure to treat LLMs and Diffusion models differently). So, by the end of this, if I’ve done my job right, you’re going to understand a bit more about why I think there’s a lot of hype and not a lot of substance here — and how we’re going to do a lot of damage in the meantime.

Never you worry friends, I’m going to link to a lot of sources for this one.

If you’ve been living deep in a cave with no access to the news you might not have heard about Generative AI. If you are one of those people and are reading this, I envy you - please take me with you. I’m going to go ahead and define AI for the purposes of this article because the industry has gone and overloaded the term “AI” once again.

I’m going to constrain myself to “Generative AI”, also known as “GenAI”, of two categories: Large Language Models (LLMs) and Diffusion Models (like Dall-E and Stable Diffusion). The way they work is a little bit different, but the way they are used is similar. You give them a “prompt” and they give you some output. For the former, this is text; for the latter, this is an image (or video, in the case of Sora). Sometimes we slap them together. Sometimes we slap them together 6 times.

Examples of LLMs: ChatGPT, Claude, Gemini (they might rename it again after this post goes live because Google gonna Google).

I’m going to take my best crack at summarizing how this works, but I’ll link to more in-depth resources at the end of the section. In its most basic terms, an LLM takes the prompt that you entered and then uses statistical analysis to predict the next “token” in the sequence. So, if you give it the sentence “Cats are excellent”, the LLM might rate “hunters” as the next token in the sequence at 60% likely. The word “pets” might be 20%. And so on. It’s essentially “autocomplete with a ton of data fed to it”.

Sidebar: a token is not necessarily a full word. It could be a “.”, it could be a syllable, it could be a suffix, and so on. But for the purposes of the example you can think of them as words.

What the LLM does that makes it “magical” and able to generate “novel” text is that sometimes it won’t pick the statistically most likely next token. It’ll pick a different one (based on its Temperature, Top-P, and Top-K parameters), which then sends it down a different path (because the token chain is now different). This is what enables it to give you a Haiku about your grandma. It’s also what makes it generate “alternative facts”, also known as “hallucinations”.
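
To make the “spicy autocomplete” idea concrete, here’s a minimal sketch of that sampling step - made-up probabilities and plain Python, not any real model’s API:

    import numpy as np

    def sample_next_token(probs, temperature=0.8, top_p=0.9, rng=np.random.default_rng()):
        """Pick the next token from a predicted probability distribution."""
        tokens = list(probs.keys())
        p = np.array([probs[t] for t in tokens], dtype=float)

        # Temperature reshapes the distribution: <1 favors likely tokens, >1 flattens it.
        p = p ** (1.0 / temperature)
        p /= p.sum()

        # Top-p (nucleus) sampling: keep only the smallest set of tokens whose
        # cumulative probability reaches top_p, then renormalize.
        order = np.argsort(p)[::-1]
        cumulative = np.cumsum(p[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        p_kept = p[keep] / p[keep].sum()

        return tokens[rng.choice(keep, p=p_kept)]

    # The "Cats are excellent ..." example from above: most of the time you get
    # "hunters", but sometimes the sampler wanders down a less likely path.
    print(sample_next_token({"hunters": 0.6, "pets": 0.2, "climbers": 0.1, "liars": 0.1}))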

This is a feature.

You see, the LLM has no concept of what a “fact” is. It only “understands” statistical associations between the words that have been fed to it as part of its dataset. So, when it makes up court cases, or claims public figures have died when they’re very much still alive, this is what’s happening. OpenAI, Microsoft, and others are attempting to rein this in with various techniques (which I’ll cover later), but ultimately the “bullshit generation” is a core function of how an LLM works.

This is a problem if you want an LLM to be useful as a search engine, or in any domain that relies on factual information, because invariably it will make fictions up by design. Remember that, because it’s going to come up over and over again.

  1. Stephen Wolfram talks about how ChatGPT works.
  2. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

I don’t understand diffusion models as well as I understand language models (much like I understand the craft of writing more than I do art), so this is going to be a little “fuzzier”.

Examples of Diffusion Models: Dall-E 3 (Bing Image Creator), Stable Diffusion, Midjourney

Basically, a Diffusion model is the answer to the question “what happens if you train a neural network on tagged images and then introduce progressively more random noise?” The process works (massively simplified) like this:

  1. The model is given an image labeled “cat”
  2. A bit of random noise (or static) is introduced into the image.
  3. Do Step 2 over and over again until the image is totally unrecognizable as a cat.
  4. Congrats! You now know how to make a “Cat” into random noise.

But the question then becomes “can we reverse the process?”. Turns out, yes, you can. To get from a prompt of “Give me an image that looks like a cat” to a picture, the diffusion model essentially runs the process in reverse (there’s a rough sketch of both loops after this list):

  1. We generate an image that is nothing but random noise.
  2. The model uses its training data to “remove” that random noise, just a bit
  3. Repeat step 2 over and over again
  4. Finally, you have an image that looks something akin to a cat
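
Here’s a massively simplified sketch of both loops, assuming a hypothetical model.predict_noise() call for the denoiser (that name is made up, and I’m ignoring how the text prompt gets wired in):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)   # noise schedule: how much static to add at each step
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def add_noise(x0, t):
        """Training direction: jump straight to step t of the 'turn the cat into static' process."""
        eps = rng.normal(size=x0.shape)
        return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps, eps

    def generate(model, shape=(64, 64, 3)):
        """Generation direction: start from pure noise and peel it off step by step."""
        x = rng.normal(size=shape)                               # 1. nothing but random noise
        for t in reversed(range(T)):
            eps_hat = model.predict_noise(x, t)                  # 2. hypothetical denoiser call
            x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
            if t > 0:
                x += np.sqrt(betas[t]) * rng.normal(size=shape)  # 3. keep a little randomness
        return x                                                 # 4. something cat-shaped, hopefully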

Now, on this other side, your model might not have generated a great cat. It doesn’t know what a cat is. So, it asks another model: “Hey, is this an acceptable cat?” Said model will either say “nope, try again”, or it will respond with “heck yes! That’s a cat. Do more like that”.

This is Reinforcement Learning - this is going to come up again later.

So, at its most “basic” representation, the things that are making “AI Art” are essentially random noise de-noiserators. Which, at a technical level, is super cool! Who would have thought you could give a model random noise garbage and get a semi-coherent image out of the other end?

  1. Step by Step Visual Introduction to Diffusion Models by Kemal Erdem
  2. How Diffusion Models Work by Leo Isikdogan (video)

These things are energy efficient and cheap to run, right?


I mean, it’s $20/mo for an OpenAI ChatGPT pro subscription, how expensive could it be?

My friends, this whole industry is propped up by a massive amount of speculative VC / Private Equity funding. OpenAI is nowhere near profitable. Their burn rate is enormous (partly due to server costs, but also because training foundational models is expensive). Sam Altman is seeking $7 trillion for AI chips. Moore’s Law is dead, so we can’t count on the cost of compute getting ever smaller.

Let’s also talk about the environmental impact of some of these larger models. Training them requires a lot of water. Using them requires far less (well, about as much as running a power-hungry GPU would), but the overall lifecycle of a large foundational GenAI model isn’t exactly sustainable in a world of impending climate crises.

One thing that’s also interesting is that there are a number of smaller, usable-ish models that can run on commodity hardware. I’m going to talk about those later.

I think part of what’s fueling the hype here is only a few companies on the planet can currently field and develop these large foundational models, and no research institutions currently can. If you can roll out “AI” to every person who uses a computer, your potential and addressable markets are enormous.

Because there are only a few players in the space, they’re essentially doing what Amazon is an expert at: Subsidize the product to levels that are unsustainable (that $20/mo, for example) and then jack up the price later once you’ve got a captive market that has no choice in the matter anymore.

Go have a watch of Jon Stewart’s interview with FTC Chair Lina Khan; it’s a good one and touches on this near the end.

We’re already seeing them capture a lot of market here, too, because a ton of startups are building features which simply ask you, the audience, to provide an OpenAI API key. Or, they subsidize the API access cost to OpenAI through other subscription fees. Ultimately, a very small number of players under the hood control access and cost…. which is going to be very very “fun” for a lot of businesses later.

I do think OpenAI is chasing AGI…for some definition of AGI; but I don’t think it’s likely they’re going to get there with LLMs. I think they think that they’ll get there, but they’re now chasing profit. They’re incentivized to say they’ve got AGI even if they don’t.

Cool! So we modeled this on the human brain?


I’m getting pretty sick of hearing this one, because the concept of a computer Neural Network is pretty neat but every time someone says “and this is how a human brain works” it drives me a little bit closer to throwing my laptop in a river.

It’s not. Artificial Neural Networks (ANNs) date back to the 1940s and ’50s, and were modeled after a portion of how we thought our brains might work at the time. Since then, we’ve made advances with things like Convolutional Neural Networks (CNNs) starting in the 1980s, and most recently Transformers (this is what ChatGPT uses). None of these ANN models actually replicate what the human brain is doing. We don’t understand how the human brain works in the first place, and the entire field of neuroscience is constantly making discoveries.

Did Transformer architecture stumble upon how the human brain works? Unlikely, but, hey, who knows. Let’s throw trillions of dollars at the problem until we get sentient clippy.

Look, I could get into a lengthy discussion about whether free will exists but I’m gonna spare you that one.

Wikipedia covers this better than I could, so go have a read on the history of ANNs.

AI will not just keep getting better the more data we put in


A couple of things here: it’s really hard to measure how well a generative AI tool is doing on benchmarks. Pay attention to the various studies that have been released (peer reviewing OpenAI’s studies has been hard, turns out). You’re not getting linear growth with more data. You’re not getting exponential growth (which I suspect is what the investors are wanting).

You’re getting small incremental improvements simply from adding more data. There are some things that the AI companies are doing to improve performance for certain queries (human reinforcement, as well as some “safety” models and mechanisms) - but the idea that you just keep feeding a foundational model more data and it suddenly becomes much better is a logical fallacy, and there’s not a lot of evidence for it.

But it’s not biased, right? Oh, oh gods, no. It can be very biased. It was trained on content curated from the internet.

I cannot do a better job describing this than this Bloomberg article - it's amazing, and it covers how image generators tend to racially code professions.

Can't they just train the bias out? I'm skeptical that's even possible. Google tried and wound up making "Racially diverse WWII German soldiers". AKA Nazis.

Right, so how did they train these things?


The shortest answer is “a whole bunch of copyrighted content that a non-profit scraped from the internet”. The longer answer is “we don’t actually fully know because OpenAI will not disclose what’s in their datasets”.

One of the datasets, by the way, is Common Crawl - you can block its scraper if you desire. That dataset is available for anyone to download.
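
If you want to do that, Common Crawl’s crawler announces itself as “CCBot”, so a robots.txt along these lines should do it (OpenAI’s GPTBot can be blocked the same way):

    User-agent: CCBot
    Disallow: /

    User-agent: GPTBot
    Disallow: /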

If you’re an artist who had a publicly accessible site, art on DeviantArt, or really anything else one of the bots can scrape, your art has probably been used to train one of these models. Now, they didn’t train the models on “the entire internet”: Common Crawl’s dataset is around 90 TB compressed, and most of that is… well, garbage. You don’t want that going into a model. Either way, it’s a lot of data.

If you were a company who wanted to get billions of dollars in investment by hyping up your machine learning model, you might say “this is just how a human learns to do art! They look at art, and they use that as inspiration! Exactly the same.”

I don’t buy that. An algorithm isn’t learning, it’s taking pieces of its training set and reproducing it like a facsimile. It’s not making anything new.

I struggle with this a bit too. One of my favorite art series is Marcel Duchamp’s Readymades - because it makes you question “what is art, really?”. Does putting a urinal on its side make it art? For me, yes, because Duchamp making you question the art is the art. Is “Hey Midjourney give me Batman if he were a rodeo clown” art? Nah.

Thus, OpenAI is willing to go to court to make a fair use argument in order to continue to concentrate the research dollars in their pockets and they’re willing to spend the lobbying dollars to ask forgiveness rather than waiting to ask permission. There’s a decent chance they’ll succeed. They’ll have profited off of all of our labor, but are they contributing back in a meaningful way?

Let’s explore.

Part 2, or “how useful are these things actually”?


Recycling truck
Paul Vasarhelyi @ Shutterstock #78378802

Let’s start with LLMs, which the AI companies claim to be a replacement for writing of all sorts or (in the case of Microsoft) the cusp of a brilliant Artificial General Intelligence which will solve climate change (yeahhhhh no).

Remember above how LLMs take statistically likely tokens and start spitting them out in an attempt to “complete” what you’ve put into the prompt? How are the AI companies suggesting we use this best?

Well, the top things I see being pushed boil down to:

  1. Replace your Developers with the AI that can do the grunt work for you
  2. Generate a bunch of text from some data, like a sales report or other thing you need “summarized”
  3. Replace search engines, because they all kind of suck now.
  4. Writing assistants of all kinds (or, if you’re an aspiring grifter, Book generation machine)
  5. Make API calls by giving the LLM the ability to execute code.
  6. Chatbots! Clippy has Risen again!

There are countless others that rely on the Illusion that LLMs can think, but we’re going to stay away from those. We’re talking about what I think is useful here.

The Elephant in the Software Community: Do you need developers?


Okay, there are so many ways I can refute this claim it’s hard to pick the best one. First off, “prompt engineering” has emerged as a prime job, and it’s really just typing various things into the LLM to try and get the best results (again, manipulating the statistics engine into giving you output you want. Non-deterministic output). That is essentially a development job; you’re using natural language to try to get the machine to do what you want. Because it has a propensity to not do that, though, it’s not the same as a programming language where it does exactly what you tell it to, every time. Devs write bugs, to be sure, but what the code says is what you’re going to get out the other end. With a carefully crafted prompt you will probably get what you want out the other end, but not always (this is a feature, remember?)

The folks who are financially motivated to sell you ever increasingly complex engines are incentivized to tell you that you can cut costs and just let the LLM do the “boring stuff” leaving your most high-value workers free to do more important work.

And you know what, because these LLMs were trained on a bunch of structured code, yeah, you probably can get it to semi-reliably produce working code. It’s pretty decent at that, turns out. You can get it to “explain” some code to you and it’ll do an okay (but often subtly wrong) job. You can feed it some code, tell it to make modifications, or write tests, and it’ll do it.

Even if it’s wrong, we’ve built up a lot of tooling over the years to catch mistakes. Paired with a solid IDE, you can find errors in the LLM’s code more readily than just reading it yourself. Neat!

I actually tried this recently when revamping the GW2 Assistant app. I’ll be doing a post on my experiment doing this soonish, but in the meantime let me summarize my thoughts (the second point is the important one):

An experienced developer knows when the LLM has produced unsustainable or dangerous code, and if they’re on guard for that and critically examine the output they probably will be more efficient than they were before.

Inexperienced developers will not be able to do that due to unfamiliarity and will likely just let the code go if it “works”. If it doesn’t work, they’re liable to get stuck for far longer than pair programming with a human.

Devin, the AI Agent that claims to be the first AI software engineer, looks pretty impressive! Time for all software devs to take up pottery or something. I want you to keep an eye on those demos and what the human is typing into the prompt engine. One thing I noticed in the headline demo is that the engineer had to tell Devin 3 or 4 times (I kinda lost count) that it was using the wrong model and to “be sure to use the right model”. There were also several occasions where he had to nudge it using specialized knowledge that the average person is simply not going to have. Really, go check it out.

Okay, so, we’re safe for a little bit right?

Well….no. I’m going to link to an article by Baldur Bjarnason: The one about the web developer job market. It’s pretty depressing, but it also summarizes my feelings well. Regardless of the merits of these AI systems (and I have a sneaking suspicion that the bubble’s going to pop sooner rather than later due to the intensity of the hype), CTOs and CEOs that are focused on cutting costs are going to reduce headcount as a money-saving measure, especially in industries that view software as a Cost Center. Hell, if Jensen Huang says we don’t need to train developers, we can be assured that the career is dead.

I think this is a long-term tactical mistake for a few reasons:

  1. I think a lot of the hype is smoke-and-mirrors, and there’s no guarantee that it’s going to be orders-of-magnitude better.
  2. We’ll make our developer talent pool much smaller, and have little to no environment for Juniors to learn and grow, aside from using AI assistants to do work.
  3. Once the cost of using AI tools increases, we’ll be scrambling to either rehire devs at deflated cost, or we’re going to try and wrangle less power hungry models into doing more development things.

Neat.

This LLM wish-fulfillment strategy - the summarization one - is essentially “I don’t have time to crunch this data myself, can I get the AI to do it for me and extract only the most important bits”. The shortest answer is “maybe, to some degree of accuracy”. If you feed it a document, for example, and ask it to summarize, odds are decent that it’ll give you a relatively accurate summary (because you’ve increased the odds that it’ll produce the tokens you want to see from said document) that also contains some degree of factual error. Sometimes there will be zero factual errors. Sometimes there will be many. Whether those are important or not depends entirely on the context.

Knowing the difference would require you to read the whole document and decide for yourself. But we’re here to save time and be more productive, remember? You’re not going to do that. You’re going to trust that the LLM has accurately summarized the data in the text you’re giving it.

BTW, by itself an LLM can’t do math. OpenAI is trying to overcome this limitation by allowing it to run Python code or connect to Wolfram Alpha but there are still some interesting quirks.

So, you trust that info, and you take it to a board presentation. You’re showcasing your summarized data and it clearly shows that your Star Wars Action Figure sales have skyrocketed. Problem is you’re an oil and gas company and you do not sell Star Wars action figures. Next thing you know, you’re looking like an idiot in front of the board of directors and they’re asking for your resignation. Or, worse, the Judge is asking you to produce the case law your LLM fabricated, and now you’re being disbarred. Neat!

Remember, the making shit up is a feature, not a bug.

But wait! We can technology our way out of this problem! We’ll have the LLM search its dataset to fact check itself! Dear reader, this is Retrieval Augmented Generation (RAG). Based on nothing but my own observations, the most common technique I’ve seen for this is doing a search first for the prompt, injecting those results into the context window, and then having it cite its sources. That can increase the accuracy by nudging the statistics in the right direction by giving it more text. Problem is, it doesn’t always work. Sometimes it’ll still cite fake resources. You can pile more and more stuff on top (like doing another check to see if the text from the source appears in the summary) in an ever-increasing race to keep the LLM honest but ultimately:

LLMs have no connection to “truth” or “fact” - all the text they generate is functionally equivalent, based on statistics

RAG and Semantic Search are related concepts - you might use a semantic search engine (which attempts to search on what the user meant, not necessarily what they asked) to retrieve the documents you inject into the system.
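
For what it’s worth, here’s a rough sketch of that RAG flow - the search and llm callables are hypothetical stand-ins, not any particular library’s API:

    def answer_with_rag(question, search, llm, top_k=3):
        # 1. Retrieve: run a (possibly semantic) search for the user's question.
        documents = search(question, limit=top_k)

        # 2. Augment: stuff the retrieved text into the prompt / context window.
        context = "\n\n".join(f"[{i + 1}] {doc.text}" for i, doc in enumerate(documents))
        prompt = (
            "Answer the question using ONLY the sources below and cite them by number.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )

        # 3. Generate: the LLM is still a statistics engine; the retrieved text nudges it
        #    toward the right tokens, it doesn't give it a concept of "truth".
        return llm(prompt)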

The other technique we really need to talk about briefly is Reinforcement Learning from Human Feedback (RLHF). This is “we have the algorithm produce a thing, have a human rate it, and then use that human feedback to retrain / refine the model”.

Two major problems with this:

  1. It only works on the topics you decide to do it on, namely “stuff that winds up in the news and we pinky swear to ‘fix’ it”.
  2. It’s done by an army of underpaid contractors.

You’d be surprised just how much of our AI infrastructure is actually Mechanical Turks. Take Amazon’s Just Walk Out, for example.

What we wind up doing is just making the toolchain ever more complicated trying to get spicy autocomplete to stop making up “facts”, and it might have just not been worth the effort in the first place.

But that’s harder to do these days because:

Google’s been fighting a losing battle against “SEO Optimized” garbage sites for well over a decade at this point. Trying to get relevant search results amidst the detritus and paid search results has gotten harder over time. So, some companies have thought “hey! Generative AI can help with this - just ask the bot (see point #6) your question and it’ll give you the information directly”.

Cool, well, this has a couple of direct impacts, even if it works. Remember those hallucinations? They tend to sneak in places where they’re hard to notice, and its corpus of data is really skewed towards English language results. So, still potentially disconnected from reality (but usually augmented via RAG), but how would you know? It’s replaced your search engine - so are you going to now take the extra time to go to the primary source? Nah.

Buuuut, because Generative AI can generate even more of this SEO garbage at a record pace (usually in an effort to get ad revenue) we’re going to see more and more of the garbage web showing up in search. What happens if we’re using RAG on the general internet? Well, it’s an Ouroboros of garbage, or, as some folks theorize, Model Collapse.

The other issue is that if people just take the results the chat bot gives them and do not visit those primary sources, ad revenue and traffic to the primary sources will go down. This disincentivizes those sources from writing more content. The Generative AI needs content to live. Maybe it’ll starve itself. I dunno.

But it’ll help me elevate my writing and be a really good author right?


I’ve been too cynical this whole time. I’m going to give this one a “maybe”. If you’re using it to augment your own writing - having it rephrase certain passages, or call out where there are grammar mistakes, or any of that “kind” of idea - more power to you.

I don’t do any of that for two reasons, one is practical, the other highlights where I think there’s an ethical line:

  1. I’m not comfortable having a computer wholesale rewrite what I’ve done. I’d rather be shown places that can improve, see some other examples, and then rewrite it myself.
  2. There’s a pretty good chance that the content it regurgitates is copyrighted, and we’re still years out from knowing the legal precedent.

The AI industry has come up with a nice word for “model regurgitates the training data verbatim”. Where we might call it “plagiarism” they call it “overfitting”.

Look, I don’t want to be a moral purist here, but my preferred workflow is to write the thing, do an editing pass myself, and then toss the whole thing into a grammar checker because my stupid brain freaking loves commas. Like, really, really, loves them. Comma.

I do this with a particular tool: Pro Writing Aid. It’s got a bunch of nice reports which will do things like “highlight every phrase I’ve repeated in this piece” so that I can see them and then decide what to do with them. Same deal with the grammar. I ignore its suggestions frequently because if I don’t, the piece will lose my “voice” - and you’ll be able to tell.

They, like everyone else, have started injecting Gen AI stuff into their product, but for me it’s been absolutely useless. The rephrase feature hits the same bad points I mentioned earlier. They’ve also got a “critique” function which always issues the same tired platitudes (gotta try it to understand it, folks).

This raises another interesting point about the people investing in Generative AI heavily. One of those companies is Microsoft. A company who makes a word processor. The parent of clippy themselves. They could have integrated better grammar tools into their product. They could have invested more in “please show me all the places where I repeated the word ‘bagel’”. They didn’t do this.

That makes me think they didn’t see the business case in “writing assistants”, and that’s why Clippy died a slow death.

Suddenly, though, they have a thing that can approximate human writing and suddenly there’s a case and a demand for “let this thing help you write”. I feel like they’re grasping at use cases here. We stumbled upon this thing, it’s definitely the “future”, but we don’t…quite….know….how.

I want to take a second here to talk about a lot of what I’m seeing in the business world’s potential use cases. “Use this to summarize meetings!” or “Use this to write a long email from short content” or “Here, help make a presentation”.

After all, one third of meetings are pointless and could be an email! I want to also contend that many emails are pointless.

Essentially what you’re seeing is a “hey, busywork sucks, let’s automate the busywork”. Instead of doing that, why not just…not do the busywork? If you can’t be bothered to write the thing, does it actually have any value?

I’m not talking about documentation, which is often very important (and should be curated rather than generated), but all those little things that you didn’t really need to say.

If you’re going to type a bulleted list into an LLM to generate an email, and the person on the other end is going to just use an LLM to summarize it (lossily, I might add), why didn’t you just send the bulleted list?

You’re making more work for yourself. Just… don’t do that?

Let’s give it the ability to make API Calls


Right, so one of the fun things OpenAI has done for some of their GPT-4 products is to give it the ability to make function calls, so that you can have it do things like:

  • Book a flight
  • Ask what the next Guild Wars 2 World Boss is
  • Call your coffee maker and make it start
  • Get the latest news
  • Tie your shoes (not really)

And so on. Anything you can make a function call out to, you can have the LLM do!

It does this by being fed a function signature, so it “knows” how to structure the function call, and then runs it through an interpreter to actually make the call (cause that seems safe).

Here’s the…minor problem. It can still hallucinate when it makes that API call. So, say you have a function that looks like this: buyMeAFlight(destination, maxBudget) and you say to the chatbot “Hey, buy me a flight to Rio under $200”. What the LLM might do is this: buyMeAFlight("Rio de Janeiro", 20000). Congrats, unless you have it confirm what you’re doing you just bought a flight that’s well over your budget.
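
Here’s a sketch of the guard rail you’d want around that. The schema loosely follows the JSON-schema style that function-calling APIs use, but buyMeAFlight and everything else here is just the hypothetical example from above, not anyone’s real API:

    import json

    # The function "signature" the model is fed so it knows how to structure the call.
    FLIGHT_TOOL = {
        "name": "buyMeAFlight",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string"},
                "maxBudget": {"type": "number", "description": "in dollars"},
            },
            "required": ["destination", "maxBudget"],
        },
    }

    def handle_tool_call(arguments_json, users_actual_budget):
        """The model emits its arguments as JSON text; validate before doing anything real."""
        args = json.loads(arguments_json)
        if args["maxBudget"] > users_actual_budget:
            # Catches the buyMeAFlight("Rio de Janeiro", 20000) case.
            raise ValueError(f"Refusing to book: {args['maxBudget']} is over {users_actual_budget}")
        return args  # only now would you hand these to the real booking code

    # handle_tool_call('{"destination": "Rio de Janeiro", "maxBudget": 20000}', 200)
    # raises an error instead of cheerfully booking a $20,000 flight.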

Now, like all other Generative AI things, there are techniques you can use to increase the accuracy. Making just the perfect prompt, having it repeat output back to you, asking “are you sure”, telling it that it’s a character on star trek. You know, normal stuff.

Alternatively you could just... use something deterministic, like, I don’t know, a web form or any of the existing chat agent software we already had.

Sidebar: Apparently OpenAI has introduced a “deterministic” mode in beta, where you provide a seed to the conversation to get it to reliably reproduce the same text every time. Are you convinced this is a random number generator yet?
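
If you’re curious, that looks roughly like this with the OpenAI Python client (the seed parameter is a beta feature, so treat this as a sketch that may shift under you):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku about my grandma."}],
        seed=12345,     # same seed + same parameters should give (mostly) the same text back
        temperature=0,
    )
    print(response.choices[0].message.content)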

And on that note, let’s talk about our obsession with chatbots.

So the “killer” application we’ve come up with, over and over, is “let’s type our question in natural language and it does a thing.” I honestly don’t understand this on a personal level - because I don’t really like talking to chatbots. I don’t want to say “Please book me a flight on Friday to New York” and then forget about it. I want to have control over when I’m going to fly.

Do large swaths of people want executive assistants to do important things like cross-country travel?

Not coincidentally, I really struggle with that kind of delegation and have never really made use of an executive assistant personally.

We’ve decided that the best interface for doing work is “ask the chatbot to do things for you” in the agent format. This is exactly the premise of the Rabbit R1 and the Humane Ai Pin. Why use your phone when you can shout into a thing strapped to you and it’ll do…whatever you ask. Perhaps it’ll shout trivia answers at you.

But guess what, my phone can already do that. Siri’s existed for years and like, I hardly use it. It’s not because it’s not useful. It’s because I can do what I want without shouting at it. In public. For some reason.

We do need to talk about accessibility. One of the things that AI agents would be legitimately useful for is for those folks who cannot access interfaces normally whether that’s situationally (driving a car), or temporarily / permanently (blind, disabled).

If we can use LLMs to get better accessibility tech that is reliable, I’m all for it. Problem is that the companies pushing the technology have a mixed track record on doing accessibility work, and I’m concerned that we’ve decided that LLMs being able to generate text means we can abdicate responsibility for doing actual accessibility work.

Like many other things in the space, we’ve decided that “AI” is magic, and will make things accessible without having to do the work. I mean, no. That’s not how it works.

Remember back to the beginning of this article where I talked about other Machine Learning Models? I think that’s the space where we’re going to make more accessibility advances, like the Atom Limb which uses a non-generative model to interpret individual muscle signals.

Still with me?

If I had to summarize my thoughts on all of the above it’s that we’ve stumbled upon something really cool - we’ve got an algorithm that can create convincing looking text.

The companies that have the resources to push this tech seem to be scrambling for the killer use-case. Many companies are clamoring for things that let them reduce labor costs. Those two things are going to result in bad outcomes for everyone.

I don’t think there’s a silver bullet use case here. There are better tools already for every use case I’ve seen put forward (with some minor exceptions), but we’re shoving LLMs into everything because that’s where the money is. We’re chasing a super-intelligent god that can “solve the climate crisis for us” by making the climate crisis worse in the meantime.

If you were holding NVDA stock, something something TO THE MOON. They’ve been making bank off of every bubble that needs GPUs to function.

This feels exactly like the Blockchain and Web3 bubbles. Lots of hype, not a lot of substance. We’re tying ourselves in knots to get it to not “hallucinate”, but like I’ve repeated over and over again in this piece the bullshit is a feature, not a bug. I recommend reading this piece by Cory Doctorow: What Kind of Bubble is AI? It’ll give you warm fuzzies. But it won’t.

Midjourney, Sora, all those things that can fake voices and make music: we’ve got a big category of things that are, charitably, “art generators”, but more realistically “plagiarism engines”.

This section is going to be a lot shorter. Let me summarize my feelings:

  • If you’re using one of these things for personal reasons, making character art for your home D&D game, or other things that you’re not trying to profit from - go for it. I don’t care. I’d rather you not give these companies money but I don’t have moral authority here.
    • I’ve used it for this too! I’m not exempt from this statement.
  • If you’re using AI “art” in a commercial product, you don’t have an ethical defense here (but we’ll talk about business risk in a sec). The majority of these models were trained on copyrighted content without consent and the humans who put the work in are not compensated for it.

I personally don’t find all of the existing AI creations that inspiring, other than how neat it is we’ve gotten a neural network to approximate images in its training set. Some of the things it spits out are “cool” and “workable” but I just don’t like it.

Hey, I do empathize with the diffusion models a bit though. Hands are hard.

As I mentioned earlier in the post, as far as we can tell, the art diffusion models were trained on publicly viewable, but still copyrighted content.

If for some reason you’re a business and you’re reading this post: that’s a lot of business risk you’d be shouldering. There are multiple lawsuits happening right now, many of them arguing along different lines, and we don’t actually know how they’re going to go. Relatedly, AI Art is not copyrightable, so that’s… probably a problem for your business, especially if you’re making a book or other art-heavy product. The best you can do is treat it like stock art, where you don’t own the exclusive rights to the art, and you’re hoping you don’t get slapped with liability in the future.

So, if you’re using an AI Art model in your commercial work, these are all things you have to worry about.

This is where, and I cannot believe I am saying this, I think Adobe is playing it smart. They’ve trained Firefly on Art they’ve licensed from their Adobe Stock art platform and are (marginally) compensating artists for the privilege. They have also gone so far as to offer to guarantee legal assistance to enterprise customers. If you’re a risk averse business, that’s a pretty sweet deal (and less ethically concerning - though the artists are getting pennies).

The rest of them? You’re carrying that risk on your business.

But what if your business happens to be “crime”?

AI companies seem hell bent on both automating the act of creation (devaluing artistry and creativity in the process) and also making it startlingly easy for fraudsters to do their thing.

Lemme just… link some articles.

So, you create things that can A) Clone a person’s voice, B) Imitate their Likeness, and C) Make them say whatever you want.

WHAT THE HELL DID YOU THINK WAS GOING TO HAPPEN? WHAT PUBLIC GOOD DOES THAT SERVE?

I dunno about y’all, but I’m okay not practicing Digital Necromancy (just regular, artisanal necromancy).

The commercial businesses for these categories of generative AI are flat-out fraud engines. OF COURSE criminals are going to use this to defraud people and influence elections. You’ve made their lives so much easier.

Hey, I guess we can take solace in the fact that the fraud mills can do this with fewer employees now. Neat.

But Netflix Canceled My Favorite Show and I want to Revive it


This is also known as the “democratizing art” argument. First thing I would like to point out is that art is already democratized? It’s a skill. That you can learn. All you need to do is put in the time. It’s not a mystical talent that only a select few possess.

Artists are not gurus who live in the woods and produce art from nothing, and the rest of us are mere drones who are incapable of making art.

So in this case “democratization” really means “can make things without putting in the effort”. The result of that winds up being about as tepid as you might imagine.

Now, there’s a question in there: if a person has to work all the time simply to live, won’t this enable them to “make art”? There’s another way to fix that - reducing the amount of work they need to do to simply exist - but nah, we’re gonna automate the fun parts.

Hey, awesome artists who are making things with AI tools but are using them as a process augmenter - all good. I’m not talking to you. I’m talking to the Willy Wonka Fraud Experience “entrepreneurs”.

But you know what, I’m not even really that concerned with people who want to make stuff on their own that is for their own enjoyment. I don’t think the result is going to be very good, and I’d rather have more people creating good stuff than fewer, but hey more power to ya.

Another aside: I really do not want to verbally talk to NPCs in games. I play single-player games to not talk to people. I don’t want to be subjected to that in the name of “more realistic background dialog”.

It’s just not going to work out like you think it will. What’ll actually happen is:

The primary thing I suspect is going to happen with the AI “art” is back to the cost-cutting efforts. Where you might have used stock art before, or a junior artist, you’re going to replace that with Dall-E.

For marketing efforts, that’s not an immediate impact. Marketing content is designed to be churned out quickly, and shotgunned into people’s feeds in an effort to get you to buy something or to feel some way, etc. I don’t think those campaigns are going to be as effective as the best campaigns ever, but eh, we’ll see I guess.

The most concerning uses are going to be the media companies that are going to replace assets in video games and movies. Fewer employees, lower budgets, and … dare I say … lower quality.

You see, diffusion models don’t let you tweak them, yet (although, who knows, maybe if we start doing deterministic “seeds” again we’ll get somewhere with how Sora functions). They also have a propensity to not give you what you asked for, so, yeah, let’s spend billions of dollars trying to fix that.

So, at risk of trying to predict the future (which I’m definitely bad at), I think we’re going to gut a swath of creatives, devalue their work, and then realize that “oh, no one wants this generative crap”. We’ll rehire the artists at lower rates and we’ll consolidate capital into the hands of a few people.

Meanwhile, we’ve eliminated the positions where people would traditionally learn skills, so you won’t be able to have a career.

Because…

We live in a society that requires money to function. Most of us sell our labor for the money we need to live.

The goal of companies is to replace as many of their workers as they can with these flawed AI tools. Especially in the environment we find ourselves in, where VC money needs to make a return on investment now that the Zero Interest Rate Policy (ZIRP) era is finished.

Now, “every time technology has displaced jobs we’ve made more jobs” is the common adage. And, generally, that’s true. However, the main fear here isn’t that we won’t be working; it’s that it’ll have a deflationary impact on wages and increase income inequality. CEO compensation compared to average salary has increased by 1,460% since the ’70s, after all.

What I think is different from previous technological advances (but hey, the Luddites were facing similar social problems) is that we’re in a situation where an enormous amount of capital is concentrated in the hands of a few companies, and only a very few of them have the resources to control this new technology. I don’t think they’re altruists.

This is not a post-scarcity Star Trek future we’re living in. I wish we were. I’m sorry.

Right. Uh. Can I end on a hopeful note? I’m not sure I can, but here are some things I’d like to see:

  • There are a number of really small models that can run on commodity hardware, and with enough tuning you can get them to give you results comparable to what you’re getting from some of the larger models. Those don’t require an ocean of water to train or use, and they run locally (there’s a quick sketch of trying one out after this list).
  • We’re going to see more AI chips, that’s inevitable, but the non-generative models are going to benefit from that too. There’s a lot of interesting work happening out there.
    • I’m also pretty cool with DLSS and RSR for upscaling video game graphics for lower-powered hardware. That’s great.
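
As a taste of what “small and local” looks like, here’s a minimal sketch using Hugging Face’s transformers library (assumes pip install transformers torch; microsoft/phi-2 is one example of a small model that runs on a reasonably beefy consumer machine):

    from transformers import pipeline

    # Downloads the model once, then everything runs locally - no API, no ocean of water.
    generator = pipeline("text-generation", model="microsoft/phi-2")
    print(generator("Cats are excellent", max_new_tokens=20)[0]["generated_text"])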

I honestly hope I’m wrong and that the fantastical claims about AI solving climate change are real… but the odds of that are really bad.

This is the longest post I think I’ve ever written, we’re well over 7,000 words. I have so many thoughts it’s hard to make them coherent.

Perhaps unsurprisingly, I’ve not used Generative AI (or AI of any kind) for this post. I’ve barely even edited it. You’re getting the entire stream of consciousness word vomit from me.

Call me a doomer if you want, but hey, if you do, I’ve got a Blockchain to sell you.

Microsoft’s Copilot+ Recall is a Horrible Idea

Shutterstock evil cube
80's Child @ Shutterstock #1235192320

…and you should disable it. Or not buy a Copilot+ PC.

I’m going to cite every source I’ve got on this, the story out of Microsoft is changing and being “clarified” as this goes on, so I’ll do my best to keep this updated and as accurate as possible.

Update - June 13th, 2024: It just keeps getting better and better. Ars Technica says Microsoft is in full damage control mode with just 2 days to go until rollout.

Update - June 7th, 2024: The Verge is now reporting that Microsoft is going to make some changes after the uproar. I'm not sure this matters, I don't think Microsoft has anyone's trust that they won't make changes in the future to get at all that juicy user-generated data. But good on them for doing something.

Update - June 2nd, 2024: Oh good, Nvidia is partnering to enable this on even more computers. It's going to be basically everywhere soon whether you want it or not. Nice. Apropos of nothing I've moved my Framework to running Bazzite. It's working great so far. Haven't tried an eGPU yet.

Update - May 31st, 2024: Kevin Beaumont has posted a lengthy Q/A-style post about this on his blog, which is very good: Stealing everything you’ve ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster. You should go give that a read. The one thing I'll call out: I think the wording implies it's also acting as a keylogger. I don't think it necessarily is, but it is screenshotting often enough that whatever you type is going to wind up in a screenshot.

Okay, here it is again. Another post about Generative AI. Or, rather, "bad ideas brought about by the generative AI hype cycle". Really, I would rather not be spending more time writing about all of this, but it just keeps finding me somehow. I’m so tired.

Anyway, let's talk about Copilot+ Recall (apparently it’s just “Recall”, but that’s really difficult to web search, so I’m going to put Copilot+ in front of it). The idea behind it is that everything you do on your computer will be screenshotted every {n} seconds, processed by the Neural Processing Unit (NPU), and stored somewhere on your computer. It'll do Optical Character Recognition (OCR) on these screenshots and some other magical "AI goodness" to enable you to later query for anything you did within the past {n} months (configurable, based on how much storage you want to use). Here's the official Microsoft documentation on the feature.

Sidebar: apparently this can be enabled on non-Copilot+ computers, but you have to go out of your way to do it.

You know what else records and stores everything you do on your computer? A rootkit. That's right, this thing that Microsoft is installing and apparently enabling by default on new Copilot+ machines, is behaving exactly like a computer virus wants to.

During setup of your new Copilot+ PC, and for each new user, you're informed about Recall and given the option to manage your Recall and snapshots preferences. If selected, Recall settings will open where you can stop saving snapshots, add filters, or further customize your experience before continuing to use Windows 11. If you continue with the default selections, saving snapshots will be turned on (emphasis mine). — Microsoft

How helpful!

Microsoft's main "defense" has thus far been "but it only stays on your computer and never leaves the network". This totally ignores the fact that a virus that wants to gain access to that data could just grab it itself, regardless of Microsoft's wishes.

This thing will be an incandescent target for hackers. You'd think that Microsoft would know this, but it seems like in their rush to "win" the AI hype race they've cut some corners (which is unsurprising given their recent track record on security).

Recall's security is entirely based on “but it stays local”


But “stays local” is not the same as “secure”. The only fool-proof security is to not store the thing in the first place, but here we are.

Let’s look at some of the corners they’ve cut.

First off, according to Kevin Beaumont, the NPU takes the text it extracts from the images and stores it in a user-readable SQLite database. This is very convenient for searching! It is also very convenient for any malicious process that happens to be running as your user to ship off somewhere.
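
To make the threat model concrete: the path and table name below are hypothetical placeholders (not the real schema), but the point is that anything already running as your user can read whatever Recall has extracted with a few lines like these:

    import sqlite3
    from pathlib import Path

    db_path = Path.home() / "AppData" / "Local" / "ExampleRecallStore" / "recall.db"  # placeholder path

    conn = sqlite3.connect(db_path)
    for row in conn.execute("SELECT * FROM captured_text"):  # hypothetical table name
        print(row)  # a real piece of malware would quietly upload this instead
    conn.close()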

Guess what else it does (or rather doesn't do): obfuscate passwords or other sensitive information:

"Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers. That data may be in snapshots that are stored on your device, especially when sites do not follow standard internet protocols like cloaking password entry." — Techradar

Riiiight, so I'm guessing it's not going to obfuscate things like, I dunno, your bank's website? Possibly showing your full account details?

Are there… any other things you do on your computer that you'd rather not be stored for however long Recall wants to store the screenshots? Nothing? Anyhow, maybe you can take some small solace in the fact that apparently Microsoft does know how to process content and won't store DRM'd things:

"Recall also does not take snapshots of certain kinds of content, including InPrivate web browsing sessions in Microsoft Edge. It treats material protected with digital rights management (DRM) similarly; like other Windows apps such as the Snipping Tool, Recall will not store DRM content." — From that same Techradar article

… Cool. Now, Microsoft claims the following about other browsers:

Recall won’t save any content from your private browsing activity when you’re using Microsoft Edge, Firefox, Opera, Google Chrome, or other Chromium-based browsers.

Which was not the case when they first announced the idea, near as I can tell. So I guess they walked back the “this only works in Edge” bit.

Microsoft Promises you can Manually exclude apps


Users can pause, stop, or delete captured content and can exclude specific apps or websites. — Ars Technica Article

But you’ve put the burden on the user to A) Know this is happening and B) Actually manage to catch everything they don’t want included.

Microsoft says, “Trust us, we’ve got Secure Core and Pluton processors!”


The security protecting your Recall content is the same for any content you have on your device. Microsoft provides many built-in security features from the chip to the cloud to protect Recall content alongside other files and apps on your Windows device.
Secured-core PC: all Copilot+ PCs will be Secured-core PCs. This feature is the highest security standard for Windows 11 devices to be included on consumer PCs. For more information, see Secured-core PCs.
Microsoft Pluton security processor will be included by default on Copilot+ PCs. For more information, see Microsoft Pluton. — Microsoft

Microsoft, buddy. None of those things matter if you trick the user into running something in user space because you’ve granted the user access to that database.

None of those things matter if someone with access to the computer wants to go looking back through your history forever.

By the way, if you go look at the Pluton processor, it doesn’t mention a word about Windows Home edition, so I dunno if every Copilot+ machine is going to come with a Windows Pro license or what?

But surely these things are encrypted on the device using BitLocker? Right?


And on that previous note, only if you have a Business or Pro license?

This one is a little tricky because Microsoft is apparently being a little vague, but by all accounts it looks like Home users get to have unencrypted screenshots just sitting on their laptop. Fun!

Sidebar: To see those screenshots, the user needs to be able to decrypt them so…. Again… virus. Running as the user. Can see them.

Maybe all Copilot+ machines come with Windows Pro? I just dunno.

Who else might have access to your computer?


Many other people who are better able to speak to this problem have pointed out that this is ripe for abuse by abusive partners.

In fact, Recall seems to only work best in a one-device-per-person world. Though Microsoft explained that its Copilot+ PCs will only record Recall snapshots to specific device accounts, plenty of people share devices and accounts. For the domestic abuse survivor who is forced to share an account with their abuser, for the victim of theft who—like many people—used a weak device passcode that can easily be cracked, and for the teenager who questions their identity on the family computer, Recall could be more of a burden than a benefit. — Malwarebytes

So, yeah. That’s… great.

Will employers turn this on for work machines? Some of them definitely will. Look, your corporate computer is already watching what you do, for good reason. A corporation needs to know if their systems are being used for nefarious, illegal, or other things that will be a problem for them (exfiltrating product secrets, for example).

This goes to a whole other level. On one hand, this is the ultimate forensic “we need to figure out what happened” tool. On the other hand, it makes the risk of leaving a laptop in the airport more risky than it already is. It increases the damage a successful malware deployment can do.

It also increases the amount of stuff that you potentially have to store for legal holds. Many industries have an obligation to hold on to certain things for extended periods of time. Recall is liable to include information that would fall under those regulatory holds so, congrats, now IT also has to implement an archival process for all of those across their entire fleet.

I guess that’s a long-winded way to say “depends on the company, but please for the love of all that is holy do not do personal stuff on your work’s laptop”.

I, personally, have no idea why anyone would trust Microsoft to keep this feature secure, and I'm pretty sure we're going to see a sharp uptick in attacks targeting it as it rolls out.

Likewise, it would not surprise me if at some point in the future an update to “send select metadata to Microsoft” pops up because the AI race is fueled by data and the allure of all that distributed data is strong.

I can see how this could be useful. I’ve frequently wanted to find something amorphous that I wasn’t able to readily find…. But the downsides far outweigh that benefit for me. I suggest you strongly weigh the risks vs. the benefits and don’t use this thing.

Apple Intelligence is also not great

Apple Intelligence in a rainbow color

Update June 17th, 2024 - Ed Zitron has a pretty great take on this wherein he argues that the ChatGPT integration is a mere footnote and it doesn't seem like Apple thinks it's going to be useful.

Well here we are again. Yet another giant company has decided to put Generative AI front and center, embedding it inside the operating system. I'm starting to think this is the most cursed timeline possible (not really, but the simulation is getting real weird).

I'm going to get to that, but first I want to take a minute and talk about that WWDC keynote, because it had some other things that I liked, plus a lot more "AI/Machine Learning" than they actually let on.

Look, right out of the gate, Apple announced a ton of things for each one of their operating systems. They did the thing that Apple always does: refine some feature someone else has had for years and announce it like it was both their idea and revolutionary. They also did something I thought was smart: when a feature is powered by machine learning, they didn't mention that and just talked about what the feature could do. Brilliant! There are a lot of relevant, applicable things ML can do for you that don't stray into the generative category.

These very handy features include:

  • Categorizing your emails into tabs, a feature Gmail has had since 2013. This is almost certainly using some kind of classifier model to do the work. It'll also likely be a bit of a hot mess (but that's okay).

  • Showing you relevant bits of trivia about the movie or show you're watching, a feature Amazon has had since 2018ish (Prime Video X-Ray)?

  • Taking 2D images and making them into "spatial photos" (giving them depth) using an ML algorithm (I dunno if there's anything comparable to this one)

  • Shake your head while wearing headphones to decline a call from your Gam Gam (using an ML algorithm to classify a nod vs a shake), a feature some other buds did in 2021.

  • Tapping your fingers to do actions on your watch, which again is using a classifier model to understand what gesture you just made.

Sidebar, did anyone else catch the dig at Google Chrome in the Safari announcements? They basically said "Safari is a browser where private mode is actually private" - which is an amazing throwback to this revelation.

By not actually saying anything about ML in those announcements, Apple kept the focus on how these things actually make your life easier rather than on feeding the current hype.

Hands down, the star of the show is the Apple Calculator App for iPadOS. Why the hell the calculator has never been available on an iPad before has been the subject of many a blog post and article.

But the absolute coolest thing that was announced was the handwriting scratchpad. Simply write your equations out, and the app will do the math for you. It has variables! It can do algebra and trigonometry! It'll add up the items in a list you just wrote down.

You know how it's doing that? Optical Character Recognition (OCR), which is another ML algorithm.

Relatedly, that's also how it's doing the "your handwriting is bad and we're gonna make it look less bad but still recognizably like your handwriting" trick.

I was vibing with the keynote for the majority of the presentation. Useful features, minimal risk, finally a calculator. I was a bit disappointed that they DIDN'T FIX STAGE MANAGER, but otherwise it was a good start.

And then Craig walked out to tell us all about… Apple Intelligence, immediately triggering my gag reflex.

Apple Intelligence is the same thing as Recall


...but from a company with a better track record on security.

First things first, I want to say that Apple did a good job of putting privacy and security first. It's clear that they want to position themselves as the "privacy alternative" to Microsoft, and they've done a reasonable job of that over the years. Not perfect, by any stretch, but reasonable. Keep that in mind as we have a little chat about this.

Apple Intelligence, according to Apple, is a bunch of models running primarily locally on your Apple device, provided you've got a strong enough chip (more on that in a second). If the local models determine they can't do a task for you (it's unknown how that decision gets made, probably on a per-feature basis), it'll farm the task out to the cloud using "Private Cloud Compute". That's something Apple just cooked up, and they've published a lot of info about how they plan to do it. They're even opening it up to 3rd party security researchers. Neat!

I'm just a security enthusiast, not a professional (though I do love a good HackFu), so I'm going to leave the "does this do what they say it does" analysis to others. What I will mention is that Apple's reputation for security is miles better than Microsoft's, so I suspect the general public is more inclined to believe them when they say something.

It's also going to do Recall-esque things across your entire device, because it has deep access to all the data on your device. It won't be taking screenshots every few seconds because it doesn't have to. It already has deep, system-level access to aggregate everything you're doing. They've been doing this for a while now; for example, if someone sends you a picture in iMessage, you can also see that same picture surfaced in Photos. They've been categorizing people, places, and things in your photos for a long while.

Now, because of the deep integration of all of those things, they can semantically search and gobble all of that up at once. Yay!

Look, Apple does have some goodwill left, enough to earn the benefit of the doubt on "we're not going to pillage this juicy training source and we pinkie promise it's more secure than Microsoft", but this is the same thing.

These local and cloud models will apparently do all sorts of great things:

  • Make Siri more useful! (maybe)

  • Summarize your emails!

  • Summarize and prioritize your boundless notifications!

  • Summarize your text messages!

  • Semantic search across your entire device (just like Recall!)

  • Semantic Search inside of videos! (Possibly getting the subject very wrong)

  • Write a bedtime story for your child!

  • Check your grammar for you! (I'm not sure why you'd use an LLM for that, as I've mentioned before)

    • RIP Grammarly?

  • Make an emoji of your friend, without their consent, and send it to them!

  • Make a custom emoji of whatever you want!

  • ...Generate "art" locally, I guess?

Basically, it's exactly the same stuff that all the other LLM / diffusion model companies want you to do, smashed together with the idea behind Recall.

Let's take a moment and look at some of the examples they gave


This, for me, is one of the key things that illustrates just how "we're grasping for ideas" the LLM/diffusion crowd is.

Enhance your notes with hallucinatory diffusion


Notes about Indian Architecture with a nice sketch about said architecture

The one that really got me, and that I thought was absolutely ridiculous, was an example of a student learning about architecture. They'd drawn a fantastic sketch of a building that they were learning about. BUT WAIT! That image is just a sketch. What if we had a machine hallucinate a copy of it? The demo goes on to show how a generative model takes the sketch and makes it into a picture.

Who the hell wants to do that? First off, it was a great sketch, and it didn't need to be "enhanced". Second, you're learning about something that presumably... exists. That there are real photos of. Why would you waste the electricity to make a brand new image that may or may not look like what you wanted, when you could... I don't know, find an actual photo of the thing? Wikipedia has one, even! I just... ugh.

It's one thing to want to generate an image of a fictional character that doesn't exist and a whole other thing to want to generate an image of things that absolutely do already exist.

I don't have a lot of energy to talk about this one, but if you thought memoji were weird, at least you had control over that. This time, because Apple Intelligence "knows your friends" you can now make a memoji for them whether they like it or not. Doing...who knows what? Can I type in whatever prompt I want?

I do not like this. Please do not send me any of these. I might block you.

Relatedly, you're going to be able to run contextual diffusion models right there in your conversations to "really express what you're feeling". I honestly do not understand why anyone would want to do this. I don't get it. Please do not explain it to me. I don't want to know.

Anyhow, those are just two examples of "I don't think they've got a killer use case for this so we're going to guess at what people do with their phones". The real problem is the giant OpenAI-shaped elephant in the room.

All the Privacy in the world stops when you send stuff to OpenAI


The other "major announcement" that Apple made at WWDC was their partnership with OpenAI.

Somehow - it wasn't exactly clear during the keynote - if a local model determines that it can't do a thing for you, and the private cloud models can't either, it may prompt you to send your query to OpenAI. This might just be isolated to Siri, or it could be everywhere? I'm not sure.

It's not exactly a "groundbreaking" integration. It's basically the same thing as any other OpenAI API wrapper, but at the OS level (Yikes).

The major problem with this is that you may trust Apple not to leak your data everywhere, but you sure as hell shouldn't extend the same trust to OpenAI. Apple assures us that OpenAI isn't retaining that data, but OpenAI has been caught lying repeatedly. You cannot trust OpenAI to do what it says it's going to, and shipping an integration with OpenAI right in Apple's operating systems is a major breach of trust.

I don't like this one bit.

I hope that there's a way to disable it.

Notably absent from Apple's announcements is "what happens if the thing hallucinates". All we got was a single line at the bottom of a screenshot saying to "Check important info for mistakes". Awesome. We're going to have LLMs summarize everything on your device, and we're not going to mention that it's liable to get those things wrong. Cool.

I expect, especially with locally running models, you're going to get more hallucinations, not fewer, but Apple seems to think the opposite. Guess we'll find out.

Well, here's a silver lining. I have an iPhone 14, which will not support Apple Intelligence (excellent). So I guess I'm just never going to buy a newer iPhone? My iPad and MacBook do support these features, so... I'm going to wait until I hear about how much of it I can opt out of before I update to the next version of the OS.

It's possible I'll be sitting on an old OS forever.

Or I'll go live in the woods, I guess. I do not want any of this, and it's being foisted upon me because our tech overlords think it's what investors want. The enshittification will continue until morale improves.

Taking /e/OS for a Test Drive

Pixel 6 Running /e/OS on my desk

Right, so here we are again, taking just another step in my journey to find a way to opt out of the “shove generative AI into everything” trend in my personal life. In my last post, on the Apple Intelligence announcement, I ended on a note of “well, I guess I’ll just not upgrade”. That’s a workable strategy for a while (and I intend to use it for as long as I can), but in the long term it might not be viable.

Look, I’m not totally opposed to running an LLM or something on device, but I want it to be on my terms, not baked into every interaction with the operating system. Apple seems best poised to allow that kind of behavior (which is funny given their track record on customizations), while Google and Microsoft are hell bent on shoving this everywhere in search of the next big thing.

Thus, I spent a week running an experiment with one of the many projects based on the Android Open Source Project (AOSP), the open source part of the Android project sans the Google extras. I figure the LLM stuff may never make it into the open source side of things, since Google sees it as a moneymaker (or do they?), so it seems like a reasonably safe bet for a while.

In order to simulate what my life would be like in a world where I try to get away from the big players while still holding a smart phone (a dumbphone or Lightphone are always options), I set some rules for myself.

  1. No signing into Google services on the phone. They shall not touch the datas.

  2. No signing into Apple services either. It’s not as big a problem, but I do use Apple Music all the time.

  3. Stay as close to the OS’s preferences as possible.

That’s pretty much it, but I added a bit of a bonus goal while I was at it. I paired a CMF Watch Pro and used the CMF Buds Pro as my daily drivers for the same time period. I wanted to go with a bit of a “budget premium” experience to challenge myself on the “do I really need the best stuff” question.

As an aside, my primary watch is an Apple Watch 6 and I see no reason to upgrade.

The gear list for this experiment is:

  • Pixel 6 Pro (Used, from Backmarket)

  • CMF Watch Pro (Amazon)

  • CMF Buds Pro (Amazon)

There are a lot of different flavors of AOSP projects out there for various needs. Lineage OS is probably the most common one out there, and Graphene OS is the extreme privacy / security focused version, but I went with /e/OS because I was already familiar with it and it offers a very interesting sync capability (which I’ll get to at the end of this post).

/e/OS is itself a Lineage fork, by the by

Now, I could have gone further afield to a Linux-based phone distro (and I may give that a go in the future), but that’s a big jump I didn’t want to take - and I wanted to try to preserve the “app ecosystem” that I’m used to. Which is a big advantage of /e/OS: it ships with MicroG (which they maintain) and an app store that can pull APKs from the Google Play store anonymously (thus fulfilling rule #1 while still taking advantage of the Play store ecosystem). Jumping to only using progressive web apps (PWAs) and open source apps was a bit too far of a stretch for my daily driving.

Now that I’d picked an OS, I needed to pick a phone that would both be compatible and, ideally, not create more eWaste in the process. My original plan had been to use an old Galaxy Note 10 that we had lying around, but unfortunately Samsung US phones have locked bootloaders that are a pain to unlock. So I went on Backmarket and found a Pixel 6 Pro, unlocked and in good condition. I did this for both price and compatibility reasons; before you set out on this endeavor yourself, you’ll want to read the supported device list for your chosen OS carefully.

Unlocked from Google is an important distinction here. The ones the carriers sell may not let you unlock the bootloader at all, leaving you stuck with stock Android.

The phone arrived safely and was in pretty good condition. There were a few scuffs, but nothing a thin screen protector couldn’t hide (sparing me the madness of noticing it all the time).

Alright! Let’s go!

/e/OS also has an “Easy installer” that supports some newer devices, which is very fancy and easy. Unfortunately, a Pixel 6 Pro is not one of the supported devices.

Okay, so this part is pretty straightforward, but it requires a few steps and you need a separate computer to do it. That guide is here. At a high level, the process is (a rough sketch of the commands follows the list):

  1. Download 2 files from the /e/OS site

  2. Ensure Android SDK tools are installed on your computer.

  3. Use adb / fastboot to reboot into recovery mode.

  4. Use fastboot to unlock the bootloader.

  5. Flash the files from step 1 onto the phone.

  6. Reboot and enjoy your new OS.
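Here’s that rough sketch in Python, mostly so I don’t fat-finger a fastboot command. It’s a sketch under assumptions, not the official guide: it assumes the Android SDK platform-tools (adb / fastboot) are on your PATH, the image file name below is a placeholder, and the exact flashing step varies by device (some use a provided flash script, some flash partitions individually), so follow the /e/OS documentation for yours.

```python
# Rough sketch of the high-level steps above. Assumes adb / fastboot from the
# Android SDK platform-tools are on PATH and USB debugging is already enabled.
# "e-latest-raven.zip" is a placeholder name; use the files the /e/OS guide
# for your device actually tells you to download, and follow its exact steps.
import subprocess

def run(*cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("adb", "devices")                  # sanity check: the phone shows up
run("adb", "reboot", "bootloader")     # drop into fastboot mode
run("fastboot", "flashing", "unlock")  # confirm on the phone; this wipes all data!
run("fastboot", "update", "e-latest-raven.zip")  # placeholder; per-device step differs
run("fastboot", "reboot")
```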

Okay, we need to take a quick second and talk about unlocking the bootloader for those who are unfamiliar. The bootloader tells your phone how to load Android, and it is locked to an official version of Android so that you can be sure that the OS running on your phone is what you and the manufacturer expect it to be. It’s one of the things that prevent hackers with physical access to the device from doing nefarious things.

Now, unlocking the bootloader is what allows you to flash a different OS onto the device, but it also breaks that integrity guarantee, meaning that if a threat actor grabs your phone and has a few minutes, they can do some shenanigans like pulling all the data off of the phone.

Some devices support relocking the bootloader. The Pixel 6 Pro is not one of those devices. This can still be fiddly even if it is supported.

/e/OS takes a defensive step against this by encrypting the data at rest with your password (or PIN), so that if that were to happen, at least it’s behind an encryption wall. Now, they could also potentially load malware onto it, which then hijacks your password and conceivably sends it somewhere, but those are the risks we take.

At its core, /e/OS is just the stock Android 13 experience without the google services built in. Instead, it ships with MicroG, which is an open source implementation of most of the Google APIs.

What this means is that most Android apps, even the ones that rely on Google Play services, will still work. (SafetyNet also works in some contexts, though I don’t have any apps that use it.) You can control / enable / disable MicroG at your leisure, but it is necessary for push notifications to work. In particular, push notifications must still register with Google’s services because that’s just how push notifications have to work - there are no alternative push servers. There are some apps that allow polling for notifications, but if you want them in realtime, this is your only option (that I am aware of). These are still anonymous, as documented here.

The default Launcher (Bliss) is a lot like older iOS before the app drawer existed - namely, there is no app drawer. All installed apps just appear on the home screen. Also, you can’t place widgets anywhere except the leftmost screen. I’m not the biggest fan of this experience, but I stuck with it for this experiment. If I were to do this long term, I’m liable to install an alternative launcher (I like Nova).

A stock Android phone without any apps installed will not be very useful unless you’re going for a minimalist kind of thing, but that’s not me; I want apps. You do have the option of only installing open source apps, but that’s also not going to get me to a fully functional phone (look, I use Discord a lot).

App Lounge has a very handy feature that let me keep my first rule intact: you can use an anonymous Google account to download apps from the Play Store. This requires you to additionally trust the /e/ Foundation not to tamper with the apps as they’re being installed, but the app is open source so it can be audited, and it wouldn’t behoove the foundation to slip in some back doors. Anyway, using one of any number of anonymous accounts they maintain, you can essentially side load apps from the official Play Store, which means I can install Discord without too much trouble.

The other neat feature they’ve baked in is “Advanced Privacy”, which essentially blocks ad trackers at the OS level and gives you a handy report on every tracker it’s blocked and which app the request originated from. This includes stuff like the usual Firebase analytics, all the way down to some less common ones like Qualtrics.

I’m pretty sure they’re doing this purely on the domain of the request and using a block list to make the determination, which means some stuff is probably slipping by, but it’s pretty fun to see that DoorDash is extremely leaky.
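If I’m right about that, the mechanism is probably something like the sketch below: check each outgoing hostname (and its parent domains) against a block list. The entries here are made up for illustration - I don’t know exactly which list /e/OS actually ships - which is also why some stuff probably slips by.

```python
# A guess at roughly how a domain-based tracker blocker works: take the
# hostname of each outgoing request and check it, plus its parent domains,
# against a block list. The entries below are made up for illustration.
BLOCKLIST = {
    "firebase-settings.crashlytics.com",
    "app-measurement.com",
    "qualtrics.com",
}

def is_blocked(hostname: str) -> bool:
    parts = hostname.lower().split(".")
    # Check "a.b.c.com", then "b.c.com", then "c.com" against the list.
    return any(".".join(parts[i:]) in BLOCKLIST for i in range(len(parts)))

print(is_blocked("siteintercept.qualtrics.com"))  # True
print(is_blocked("example.org"))                  # False
```

Anything the list doesn’t name, or anything a tracker routes through a first-party domain, sails right through - which is the usual weakness of this approach.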

Likewise, the Aurora store gives most apps a privacy score out of 10. You’ll see a lot of them with a 0/10 on the privacy list, because of all the tracking.

Let’s talk about Murena Cloud (formerly /e/Cloud)


So one of the other really neat things about /e/OS that I wanted to mention is their cloud service, which is integrated directly into the OS and lets you sync files, notes, contacts, email, and the like, just as you would if you were heavily invested in one of the big ecosystems. Murena.io hosts this cloud service, which gives you 1 GB of storage for free and charges a reasonable fee for upgrades.

But we’re not in this to simply trade one cloud provider for another, oh no. The coolest part about the cloud software is that it’s essentially a specialized fork of Nextcloud, and it’s open sourced. That means, theoretically, you can run the whole set of cloud services on a server you control, deeply integrated with your mobile operating system. That’s pretty neat.

Now, you can already do this with vanilla Nextcloud and the various Nextcloud apps, but you’ll lack the built in integration. It’s not a big hassle to just use Nextcloud, but it is really appealing that built-in integration supports private servers.

The other thing that is cool is that murena.io also just works as a Nextcloud server, so you can do things like use the Nextcloud Feeds app to sync up with RSS feeds you put up in murena.io. Or use the Nextcloud files app to sync your files if you really want to. Lots of flexibility there to host or not host what you want.
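As a quick illustration of what “just works as a Nextcloud server” means in practice, the standard Nextcloud WebDAV endpoint should respond the same way on murena.io as on a self-hosted instance. This is a hedged sketch: the username, app password, and file are placeholders, and I’m assuming murena.io keeps the stock Nextcloud URL layout.

```python
# Because murena.io behaves like a stock Nextcloud server, the standard
# WebDAV files endpoint should work against it. Sketch only - the account
# name, app password, and file path below are placeholders.
import requests

BASE = "https://murena.io/remote.php/dav/files/your-user"  # stock Nextcloud WebDAV path
AUTH = ("your-user", "app-password-from-settings")          # placeholder credentials

# Upload a file to the root of your cloud storage.
with open("notes.txt", "rb") as f:
    r = requests.put(f"{BASE}/notes.txt", data=f, auth=AUTH)
r.raise_for_status()

# List what's there with a WebDAV PROPFIND.
r = requests.request("PROPFIND", BASE, auth=AUTH, headers={"Depth": "1"})
print(r.status_code)  # 207 Multi-Status on success
```

The same two calls pointed at your own server is the whole self-hosting story, which is what makes the built-in integration appealing.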

Now that the experiment is over and I’m back in the corporate embrace of Apple, what did I like about the experience?

Overall, I was really impressed at how smoothly everything went… for the most part. I could install all the apps I needed, and the PWA support for other major apps (like Starbucks) meant I didn’t need to install as many apps as I have before on other Android phones.

I was also blissfully free of the “TRY GEMINI!” pop-ups that have started appearing on other Android phones (like my Motorola G5 Power) or whatever Samsung is calling theirs (which also just appeared on my wife’s Galaxy S22).

I was concerned that I wouldn’t be able to install an eSIM and get it working, but I was pleasantly surprised that it worked without any issues as well.

The only major issue I ran into whilst doing this experiment is that I make heavy use of Apple Pay on the Apple Watch, specifically to store my transit card. Tapping in and getting on transit is an essential part of my day, and it was sorely lacking on the /e/OS build. Payments are probably never going to work on the phone directly (because of how payment industry infrastructure works - it requires direct partnerships).

There are some workarounds. I decided to see if I could get my old Galaxy Watch Active 2 to pair and verify a card via Samsung Pay on the watch, and I’m happy to report that after some wrangling, I got it working. The only major hurdle is that I had to install the Edge browser and make it the default so that Samsung would let me log into the Galaxy Wear app.

Now, this is a Tizen based watch, not a WearOS watch, so I have no idea how a WearOS watch is going to function, but reports on the forums don’t look very promising. If I wind up trying it out, I’ll write a follow up post.

The forums have had some success with Garmin Watch / Garmin Pay, but the banks that Garmin Pay supports are pretty limited in the US.

Either way, I got the Watch Active 2 to verify a card, and I was able to tap and pay at a Petco shortly after verifying it. So, uh, success!

Overall, I think this was a pretty successful experiment, and should I be unhappy with my ability to disable the OpenAI integration in iOS 18, I can at least switch phones if I need to - but given that I can just stay on 17 for an indefinite period of time, I’m unlikely to make the jump just yet.

I'll update this post with a link when I've got the review of the CMF watch up!