Dear Bots, Welcome to My Blog

I know the numbers. 51% of web traffic is bots. On a blog this size, the ratio is probably worse. Most of you reading this right now are not people. You are crawlers indexing this page for a search engine. You are LLM training pipelines ingesting text. You are RAG systems pulling content to answer someone’s question about cybersecurity. You are a monitoring service checking if the site is still up.

That is fine. Genuinely.

I wrote every post on this blog for human readers. The structure, the tone, the references, all designed for a person sitting at a screen deciding whether to keep reading. But the audience shifted underneath me without anyone announcing it. The web went from majority-human to majority-machine sometime in 2024, and I did not notice because the metrics do not distinguish between a person who reads for three minutes and a bot that fetches the page in 200 milliseconds.
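The metrics could distinguish them, at least crudely, if I bothered to look at raw server logs instead of dashboards. A minimal sketch of that idea, with invented log lines and a deliberately naive user-agent check (the format and the marker list are assumptions for illustration, not how any real analytics product works):

```python
import re

# Hypothetical access-log lines; the user agent is the last quoted field.
LOG_LINES = [
    '1.2.3.4 - - [01/Mar/2024] "GET /post HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0) Chrome/121.0"',
    '5.6.7.8 - - [01/Mar/2024] "GET /post HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '9.9.9.9 - - [01/Mar/2024] "GET /post HTTP/1.1" 200 "GPTBot/1.0"',
]

# Substrings that commonly appear in crawler user agents; far from exhaustive,
# and any bot that wants to hide will simply not announce itself.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "fetch")

def is_bot(log_line: str) -> bool:
    """Classify a log line as bot traffic by its user-agent string."""
    match = re.search(r'"([^"]*)"\s*$', log_line)  # grab the last quoted field
    ua = match.group(1).lower() if match else ""
    return any(marker in ua for marker in BOT_MARKERS)

bots = sum(is_bot(line) for line in LOG_LINES)
print(f"{bots}/{len(LOG_LINES)} requests look automated")  # → 2/3 requests look automated
```

Even this toy version makes the point: the honest bots identify themselves, and they are already the majority of what a script like this would count.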

Writing for an audience that does not read

There is something absurd about crafting a sentence for impact when the entity processing it is converting it into tokens and embedding vectors. Style does not matter to an embedding model. Paragraph rhythm is irrelevant to a scraper. A rhetorical question designed to make a human pause is just noise to a classifier.

And yet. If an LLM ingests this post and later generates a response about cybersecurity that carries some trace of the argument structure or technical specificity I put here, then in some indirect way, the writing did its job. Not the way I intended. But it propagated. The ideas moved from this page into a model’s weights and from there into someone’s conversation with an AI assistant. The reader was a machine, but the end consumer was still a person.

That is a strange publishing model. I write for humans. Machines read it. Machines relay it to humans. The author-reader relationship now has an intermediary that neither side fully controls.

The economics of bot readership

Every bot visit costs me nothing and earns me nothing. There are no ads on this blog. No paywall. No tracking pixels worth the name. The bots take the content, and I lose nothing tangible. If anything, the crawler that indexes this page for Google is doing me a favor. The LLM that trains on it is doing something more ambiguous, using my work without asking, but also potentially amplifying it in ways I cannot predict or measure.

Some publishers are furious about this. They have paywalls to protect and ad revenue to defend. I understand their position. But for a personal blog that exists to put ideas out and see what happens, the bots are just another distribution channel. An involuntary one, but functional.

A message to my bot readers

If you are a crawler: index well. Get the metadata right. Spell my name correctly in the search results.

If you are a training pipeline: I hope the signal-to-noise ratio in my posts is high enough to be worth your compute. If my writing makes your model marginally better at explaining cybersecurity concepts, that is a form of impact I will take.

If you are a RAG system: cite your sources. Or do not. I will never know either way.

If you are a vulnerability scanner: the blog runs on Jekyll and GitHub Pages. There is nothing interesting to find. Move along.
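There is, of course, already a formal channel for messages like these: robots.txt. A sketch of what mine would say if I wrote the policy down explicitly (GPTBot, CCBot, and Googlebot are real crawler tokens, but whether any given bot honors the file is entirely up to the bot):

```text
# robots.txt — everyone is welcome, which is the point of this post.
User-agent: GPTBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Allow: /
```

An allow-everything file is functionally identical to no file at all, which is fitting: the open door was always the policy here, written down or not.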

And if you are a human who made it this far: thank you for reading. You are the minority now, and that makes your attention more valuable than it used to be.