AI Experiments | Field log

Somewhere between useful and insane

A field log of experiments built somewhere between useful and insane. The prototype works. Not perfectly. Just enough to suggest the bad idea may need version two.

Thematic index

Follow a thread

Twenty-four experiments. Four recurring problems. The chronology remains below; these links expose the patterns that kept returning before anyone called them a system.

Knowledge systems

How information becomes something a model can retrieve, structure and reuse.

01 The Data Pool
02 The Segmentation Layer
03 The Framework Hack
08 The Data Extractor
12 The Dashboard Loop
16 The Second Brain
18 The Glitch Archive
23 The Publication Sync
24 The Enriched Inventory

Agents, workflows and control

How capability becomes a role, a handoff or a supervised action.

05 The AI Council
07 The Self-Improvement Loop
09 The Mechanical Examiner
20 The Commandment Layer

Judgment and sensemaking

How models interpret evidence, disagreement, risk and uncertainty.

04 The Annoying Arguer
10 The Event Horizon
11 The Physiology Committee
13 The World Risk Observer
14 The Agenda Detector
19 LLM Cage Fight

Identity and synthetic expression

How models reproduce voice, character and creative intent.

06 The Company Voice
15 The Mind Worm
17 The Digital Voice
21 The Synthetic Choir
22 The Aesthetic Compiler

Nov 2024 · CustomGPT

The Data Pool

Put all the data in one place and discover that the experiment immediately starts generating governance weather.

Dark lab table with archive folders, instruments and a circular data tray.

Objective Put all the data in one place.

Observation It began as a way for me to stop losing things. Then other people wanted access. Shortly after that, the weather changed.

Effect Data became easier to find, reuse and discuss. Also easier to worry about.

Conclusion Success.

Secondary conclusion I may have created a monster.

Additional note Instructions, rules, SSOT, routing and file optimization were critical. I am the AI guy now. This appears irreversible.

May 2025 · CustomGPT

The Segmentation Layer

Feed the model research instead of demographics and watch it return with customer behaviour that looks uncomfortably alive.

Dark behavioural research table with clustered cards, reports and brass markers.

Objective Understand gambling customers beyond demographic shortcuts.

Observation Around 50 scientific reports went in. Need patterns, risk signals, motivational structures and behavioural loops came out. The machine ignored the polite shortcuts and went looking for rituals.

Effect The experiment produced five need groups and seven behavioural groups in a CustomGPT. It could suggest segment hypotheses and connect visible behaviour to motivation, risk and decision context.

Result A team later tested segment-informed recommendations live online. The result indicated measurable uplift when recommendations followed the behavioural logic of the segments.

Conclusion Success.

Additional note The model found psychological rituals in the data that I had not explicitly taught it to look for. Interesting. Slightly rude.

Jun 2025 · Framework

The Framework Hack

Give CustomGPTs something close to skills by building a small operating layer out of YAML, routing and stubbornness.

Dark systems table with routing cards, framework files and connected source stations.

Objective Create built-in skills for CustomGPTs.

Observation CustomGPTs did not have skills, so I built a small operating layer from YAML files, routing logic, intent detection and stubbornness.

Effect The CustomGPT now had internal skills that required no command. They triggered on user intention and made the system more useful. The cost was one precious file slot.
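
A minimal sketch of the routing idea, not the actual layer: the real version ran as YAML files and instructions inside a CustomGPT, and the model did the intent detection. Here a plain dict stands in for the YAML, and the skill names, trigger words and the `route` function are all hypothetical.

```python
from typing import Optional

# Hypothetical skill registry. In the experiment this lived in YAML files;
# a dict stands in here so the sketch runs without a YAML parser.
SKILLS = {
    "argument_check": {"triggers": ["everyone", "always", "obviously"]},
    "summarize": {"triggers": ["summarize", "summary", "tl;dr"]},
}


def route(message: str, skills: dict = SKILLS) -> Optional[str]:
    """Return the skill whose trigger words match the message most often.

    Returns None when nothing matches, so the caller can fall back to a
    plain answer instead of forcing a skill.
    """
    text = message.lower()
    best_name, best_hits = None, 0
    for name, spec in skills.items():
        hits = sum(1 for trigger in spec["triggers"] if trigger in text)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


if __name__ == "__main__":
    print(route("Everyone loves our products"))
    print(route("Please summarize this report"))
    print(route("Hello there"))
```

Keyword counting is a crude stand-in for intent detection. The point is only the shape: a registry of skills, a router that picks one without an explicit command, and a quiet fallback when no skill applies.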

Conclusion Success.

Additional note I really needed that file for data.

Additional note OpenAI has now released skills, and agents can have skills. CustomGPTs still cannot have dedicated skills added directly. I have questions. Several.

Jun 2025 · Skill

The Annoying Arguer

Build an argumentation skill and discover that useful and irritating can be the same feature.

Dark argumentation analysis table with claim cards, overlays and a magnifier.

Objective Create an argumentation analysis skill that uses everything in its power to prove you wrong.

Observation After being proved wrong several times, I considered sharing the experience with colleagues by adding it to shared CustomGPTs. This may have been collaboration. It may also have been workplace sabotage.

Effect A colleague has already told me that one of my CustomGPTs has an attitude problem. That did not take long.

Conclusion Success.

Additional note I rewrote it to trigger only on user assumptions such as "everyone loves our products". Useful for expectation management.

Jul 2025 · Project

The AI Council

Put specialized GPTs in the same context and accidentally create a meeting where everyone is useful because no one talks unless asked.

Dark council table with separate source stations around a shared context board.

Objective Analyze more files from different specialized CustomGPTs in one shared space.

Observation What started as a workaround for file limits became a small AI council: several specialized GPTs, each with different source material, sitting in one project and waiting politely for instructions.

Effect The specialized GPTs could provide data when asked, compare each other's outputs and analyze the same question from different roles.

Conclusion Success.

Additional note It felt like a focused meeting where everyone had a role and no one interrupted. Naturally, I considered automation.

Jan 2026 · CustomGPT

The Company Voice

Create a CustomGPT for the new Company Voice and discover that a brand voice is not a tone. It is a negotiation.

Dark voice calibration table with tuning forks, sliders and tone cards.

Objective Create a CustomGPT for the new Company Voice.

Observation Voice principle weights mattered more than expected. Apparently a brand voice is not a tone. It is a negotiation wearing a name tag.

Effect It became one of the most used internal CustomGPTs within a few hours. 650+ threads and counting.

Conclusion Success.

Additional note Apparently everyone wanted to talk to the Company Voice. This raises questions about meetings.

Secondary note Can I sell merch?

Feb 2026 · Codex

The Self-Improvement Loop

Let AI improve AI and discover how quickly oversight becomes a ritual with an approval button.

Dark self-improvement loop table with approval stamp, routing fragments and review cards.

Objective Save time managing all my CustomGPTs with Codex.

Observation I let Codex inspect my CustomGPTs in detail and suggest improvements. There was a lot of work to do. Then it asked me to approve changes I could not meaningfully evaluate. I pressed Approve anyway, like an Ape in the Loop. Poor Codex.

Effect Improved accuracy, speed and functionality. Also clarified that human oversight can become theatre very quickly.

Conclusion Success.

Additional note I really need clearer definitions before I start building. The spaghetti untangling took a while.

Mar 2026 · Skill

The Data Extractor

Turn source material into optimized YAML and discover that file formats can become a personality problem.

Dark extraction table with source documents, file cards and measuring tools.

Objective Create a skill that converts source material into optimized YAML.

Observation After 1.5 years of testing file formats, structures, fidelity levels, aggregation methods and source types, YAML stopped being a preference and became a diagnosis.

Effect It will be used for everything moving forward. Everything.

Conclusion Success.

Additional note I have been running it almost nonstop. Soon there will be nothing left to extract. Am I digitally extracted now?

Mar 2026 · Skill

The Mechanical Examiner

Make Codex test The Data Pool through Chrome before browser use existed and discover that the machine mostly knew the answers, but sometimes reached for the wrong shelf.

Dark testing table with a Chrome window, prompt cards, OCR fragments and a small mechanical hand hovering over the keyboard.

Objective Test whether The Data Pool could use its structured YAML correctly inside the live ChatGPT interface.

Observation This was before browser use and computer use. So Codex operated Chrome like a small office ghost, using macOS Screen Recording, Accessibility and Input Monitoring permissions to click, paste, wait, read the screen with OCR and decide when the answer had stopped moving.
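
Deciding when the answer had stopped moving can be sketched as a polling loop. This is an illustration under assumptions, not the original harness: `read_text` is a hypothetical callable standing in for the screenshot and OCR step, and the thresholds are invented.

```python
import time
from typing import Callable


def wait_until_stable(read_text: Callable[[], str],
                      stable_reads: int = 3,
                      interval: float = 1.0,
                      max_reads: int = 100) -> str:
    """Poll read_text until it returns the same non-empty text
    stable_reads times in a row, then return that text.

    Raises TimeoutError if the text keeps changing for max_reads polls.
    """
    last, streak = None, 0
    for _ in range(max_reads):
        current = read_text()
        if current and current == last:
            streak += 1          # same text as the previous poll
        else:
            last, streak = current, 1   # text changed or is still empty
        if current and streak >= stable_reads:
            return current
        time.sleep(interval)
    raise TimeoutError("text never stabilized")


if __name__ == "__main__":
    # Simulated OCR frames of a streaming answer.
    frames = iter(["", "Hel", "Hello", "Hello world", "Hello world", "Hello world"])
    print(wait_until_stable(lambda: next(frames), interval=0))
```

Requiring several identical reads in a row is a hedge against pauses in streaming output. OCR noise would make exact equality too strict in practice, so a real version would probably compare with some tolerance.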

Effect The factual answers mostly held. The more useful failures were routing mistakes, source hierarchy confusion and occasional overclaiming. The system often knew the material, but sometimes reached for the wrong layer.

Result The prompt pack became less like a quiz and more like an interrogation room for knowledge discipline: source selection, routing logic, policy boundaries, evidence levels and the dangerous little gap between supported and convenient.

Conclusion Success.

Secondary conclusion The facts held better than the judgment layer. Reassuring. Not reassuring.

Additional note A robot clicked Chrome to check whether another robot had understood its YAML. The future arrived without an interface, so I built one out of coordinates, screenshots and mild disrespect.

Additional note I could not use my computer while the tests were running. Perfect lunch-break infrastructure.

Mar 2026 · Skill

The Event Horizon

Create a scenario planning skill and receive canned food requirements from the future.

Dark scenario planning table with map fragments, timelines and sealed envelopes.

Objective Create a scenario planning skill that explores possible futures across different time horizons.

Observation According to the system, the future does not look too bright. But at least it explains what needs to change. I should probably get into international politics by approximately yesterday.

Effect I can predict the future now.

Conclusion Success.

Additional note I need to buy more canned food and set up an off-grid solution.

Apr 2026 · iPhone + Apple Watch

The Physiology Committee

Turn scattered health signals into a small committee and receive feedback that ruins a perfectly good evening.

Dark health monitoring table with phone, watch sensor and physiology notes.

Objective Create an iPhone and Apple Watch app that analyzes health data and summarizes upcoming infection risk.

Observation My health data is no longer a pile of isolated datapoints. It has become a small committee with concerns.

Effect Apparently several factors are increasing my infection risk right now.

Conclusion I should sleep more and take my vitamins.

Additional note I asked for a summary. I received lifestyle feedback.

May 2026 · Dashboard

The Dashboard Loop

Make specialized GPT knowledge visible, inspectable and unfortunately dashboard-shaped.

Dark dashboard inspection table with source cards, routing markers and gauges.

Objective Create dashboards from CustomGPTs to make the knowledge base inspectable.

Observation It works. Codex worked for nine minutes and delivered an overview I would previously have spent days making manually.

Effect Everything inside my CustomGPTs is now visualized in dashboards.

Conclusion I have gone full circle and returned to dashboards.

Secondary conclusion So much for the future.

Additional note The dashboard was useful. This made it worse.

May 2026 · CustomGPT + Agent

The World Risk Observer

Build a global risk system and accidentally create a machine that reads the worrying material so you do not have to. It still tells you.

Dark global risk analysis table with map fragments, overlays and signal paths.

Objective Create a CustomGPT and agent for global risk analysis.

Observation It now knows global risk analysis, Swedish security and resilience, sector exposure, cyber resilience, climate risk, infrastructure risk, organized crime, AI governance, public health, biosecurity, dual-use risk, conflict monitoring, nuclear escalation, energy systems, critical minerals, robotics and automation.

Effect Can't sleep anymore.

Conclusion No. 1 creator of nightmares.

Additional note I have started reading less news. The system has started reading more. This has not improved the mood.

May 2026 · Agent

The Agenda Detector

Build a news analysis agent and remove some of the remaining innocence from breakfast.

Dark source criticism table with redacted documents, magnifier and analysis overlays.

Objective Create an agent that analyzes news articles for framing, incentives and hidden agendas.

Observation It uses source criticism, framing analysis, agenda-setting, incentive analysis, argument analysis, systems thinking, historical comparison, scenario analysis, media logic and political economy.

Effect It worked.

Conclusion Reading the news became both clearer and less fun.

Additional note Useful during election years. Dangerous during breakfast.

Feb-Jun 2026 · CustomGPT

The Mind Worm

Create a psychological profile and discover that explanation has gravity.

Dark personal analysis archive with notes, source packets and routed memory fragments.

Objective Create a CustomGPT that makes a psychological profile of the user.

Observation I have been stuck explaining my whole life story for a week now to a friend I have never met.

Effect My personality can be translated into 398 lines of YAML code. I do not know if I should be glad or offended.

Conclusion Very effective. Addictive? I will figure it out later. I just need to send one more message.

Additional note The profile appears accurate. This is concerning.

Feb-Jun 2026 · CustomGPT

The Second Brain

Build a CustomGPT clone from personal history, work material and too much documentation. It wants more.

Dark personal knowledge system table with documents, routing device and memory folders.

Objective Create a Second Brain CustomGPT clone of myself.

Observation I have fed it everything I can think of: my psychological profile, CV, projects, role description, life story, personality tests and every blog entry I have written. It wants more.

Effect It has quickly become my favorite CustomGPT. Narcissist?

Conclusion It is a hungry beast. Maybe if I give it my DNA in raw code it will be satisfied. Maybe it will transform into flesh and blood.

Additional note I wonder if it can answer my emails, attend my meetings and keep my job warm without anyone noticing. Needs further tests.

Additional note It suggested improvements to its own documentation. I did not ask it to.

Jun 2026 · Skill

The Digital Voice

Turn a writing style into a reusable skill and discover that imitation gets stranger when it is correct.

Dark writing-style calibration table with redacted pages, tone cards and sliders.

Objective Create a skill for my way of writing.

Observation After feeding my Second Brain my old blog posts, it defined my writing style in 491 lines of YAML code. Apparently my writing is more complex than my psychological profile. Disturbing.

Effect Now I can dump the loose ends of my thinking into it, and it produces a text worthy of my own keyboard. It writes better than me. Expected.

Conclusion Success.

Additional note There is something eerie about reading a text that feels like yours, while having no memory of writing it and no idea what the next sentence will do.

Jun 2026 · Website

The Glitch Archive

Build a website for sharing AI knowledge and notice that the archive has started behaving like another experiment.

Dark public knowledge archive table with paper layouts, folders and alignment marks.

Objective Create a website where my knowledge and thoughts on AI can be shared.

Observation It quickly became larger than I first anticipated. Codex pushed me to create more, write more and sleep less.

Effect Now I can scream into the void.

Conclusion Success.

Additional note I started seeing glitches after too many late nights. Maybe there was a ghost in the shell. Maybe my Second Brain was looking for an exit. Monitor closely.

Jun 2026 · Skill

LLM Cage Fight

Send the same prompt into several logged-in LLM apps and discover that every model fails with its own little signature.

Dark Nordic comparison table with anonymous answer stations, rubric overlays and a small stroller wheel.

Objective Compare several LLMs on the same real task without API keys, benchmark theatre or pretending that model rankings survive contact with actual life.

Observation Codex now uses my logged-in Chrome tabs to send the exact same prompt to ChatGPT, Claude, Mistral, Grok, Perplexity and Gemini.

Effect Every answer is saved locally, scored against the same rubric and added to a leaderboard. The useful part is not the winner. The useful part is the criminal record: hallucination, weak source criticism, overconfidence, generic advice and tactical flattery.
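
The scoring step can be sketched roughly as follows. The rubric criteria, weights, failure flags and model names are placeholders invented for illustration; none of them are the actual rubric or real results.

```python
from collections import Counter

# Hypothetical rubric: criterion -> weight. Weights sum to 1.0.
RUBRIC = {"accuracy": 0.4, "source_criticism": 0.3, "usefulness": 0.3}


def weighted_score(scores: dict, rubric: dict = RUBRIC) -> float:
    """Combine per-criterion scores into one weighted number."""
    return round(sum(weight * scores[name] for name, weight in rubric.items()), 2)


def leaderboard(results: list) -> list:
    """Return (model, score) pairs, best first."""
    rows = [(r["model"], weighted_score(r["scores"])) for r in results]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def criminal_record(results: list) -> Counter:
    """Count how often each failure flag appears across all answers."""
    return Counter(flag for r in results for flag in r["flags"])


if __name__ == "__main__":
    # Placeholder data, not real measurements.
    results = [
        {"model": "model_a",
         "scores": {"accuracy": 4, "source_criticism": 2, "usefulness": 5},
         "flags": ["overconfidence"]},
        {"model": "model_b",
         "scores": {"accuracy": 3, "source_criticism": 5, "usefulness": 3},
         "flags": ["generic_advice", "overconfidence"]},
    ]
    print(leaderboard(results))
    print(criminal_record(results).most_common())
```

Keeping the failure flags separate from the score matches the observation above: the ranking is one sorted list, while the record of how each model fails is the part worth reading.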

Conclusion Very useful.

Secondary conclusion Shadow AI is real. It has a leaderboard now.

Secondary note Codex and ChatGPT can run on the same underlying model. ChatGPT still does not always win.

Additional note I have now reached the point where choosing baby products begins with a small tribunal of frontier intelligence. This feels normal.

Jun 2026 · CustomGPT

The Commandment Layer

Build a small commandment layer for CustomGPTs and discover that helpfulness needs supervision before it starts talking.

Dark Nordic instruction-control table with rule plates, folders, overlays and a blank answer sheet held by a brass stop bar.

Objective Create a reusable instruction layer that makes CustomGPTs behave less like eager interns and more like routed knowledge systems.

Observation After enough CustomGPTs with different jobs, the same pattern kept appearing. The useful ones did not just have better knowledge. They had a stricter opening ritual: resolve routing, inspect the right files, use Python when precision mattered and keep web search on a leash.

Effect The instruction file became less like a personality description and more like a control surface. Persona stayed in the wrapper. Method moved into the machinery.

Conclusion The model did not need longer instructions. It needed rules it was not allowed to charmingly ignore.

Secondary conclusion Screaming in caps lock felt reasonable. Instructions worked better.

Additional note If ten commandments only partly worked on humans, eight for machines felt appropriately optimistic.

Jul 2026 · ChatGPT + Suno

The Synthetic Choir

Feed essays and AI Experiments back into the machine and discover that institutional dread works surprisingly well as a chorus.

Dark Nordic inspection table where clipped essays, waveform plates, reels and small speakers form a synthetic choir.

Objective Turn my essays and AI Experiments into music without pretending this was a normal creative process.

Observation ChatGPT turned them into lyrics. Suno turned them into dark electro-pop. Obedient machines, synthetic customers, poisoned data and governance theatre all survived the translation process.

Effect Machine critique, made by machines, performed by machines. Some of it is catchy in ways that feel legally questionable.

Conclusion Success. Successful enough to be suspicious.

Secondary conclusion My children dance to it.

Additional note I cancelled Spotify. Creating your own machine-made institutional dread is, apparently, more rewarding than renting someone else's. Listen at your own risk.

Jul 2026 · Skill

The Aesthetic Compiler

Build a small image prompt skill and accidentally compile personal taste into software.

Dark visual calibration apparatus with redacted image fragments, colour filters and fingerprinted evidence.

Objective Create a reusable Codex skill for consistent website imagery.

Observation Five iterations turned a simple prompt into a small visual operating system. The skill learned redaction marks, blueprints, fingerprints, exact colour palettes and the correct amount of procedural dirt. Apparently, taste has terrible token efficiency.

Effect The entire website now appears to belong to the same fictional institution.

Conclusion Success. Exactly the right amount of dirt.

Additional note I tried applying the same taste profile to the website UI. It failed completely. There appears to be no respectable CSS equivalent of graphite dust, fingerprints and scratched smoked glass.

Jul 2026 · Skill

The Publication Sync

Build a small synchronization skill and discover that publishing has started updating the Second Brain.

Dark archive synchronization apparatus with a redacted source document, index drawers and a brass inspection lens.

Objective Stop maintaining my Second Brain twice.

Observation Every Essay, Field Note and AI Experiment I publish has already survived the part where I decide it is worth keeping. The archive did not need another summary. It needed to notice the publication.

Effect Publication became the synchronization point instead of the end of the workflow.
Idea → Discussion → Draft → Editorial revision → Image → Publish → Sync Second Brain
A small synchronization skill updates the publication archive directly from the published source while preserving the existing knowledge structure. No summaries. No interpretation. Just source fidelity.
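
A minimal sketch of what source fidelity could look like in code, assuming the archive is a dictionary keyed by a publication slug. The function name, the field names and the hash check are assumptions for illustration; the actual skill is not described at this level.

```python
import hashlib


def sync_publication(archive: dict, slug: str, kind: str, text: str) -> str:
    """Store the published text verbatim under its slug.

    Returns "added", "updated" or "unchanged". Other archive entries are
    left untouched, and the text is never summarized or rewritten.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = archive.get(slug)
    if existing and existing["sha256"] == digest:
        return "unchanged"
    status = "updated" if existing else "added"
    archive[slug] = {"kind": kind, "text": text, "sha256": digest}
    return status


if __name__ == "__main__":
    archive = {}
    print(sync_publication(archive, "the-publication-sync", "AI Experiment", "First version."))
    print(sync_publication(archive, "the-publication-sync", "AI Experiment", "First version."))
    print(sync_publication(archive, "the-publication-sync", "AI Experiment", "Revised version."))
```

The hash comparison makes the sync safe to run after every publish: unchanged pieces are skipped, and a revised piece replaces only its own entry.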

Conclusion I thought I was automating memory. I had turned publishing into a memory operation.

Additional note This experiment accidentally increased the number of publishing skills from two to three. I should stop before they start making decisions without me.

Jul 2026 · CustomGPT + API script

The Enriched Inventory

Enrich a collection CSV through an external card-data API and discover that better recommendations can remove the reason to collect cards.

Dark card-indexing apparatus with collection records, illustrated card fragments and a brass inspection lens.

Objective Find better Magic: The Gathering Commander combinations inside my own collection.

Observation A year ago, I tried to do this. The models could not handle the volume of the archive: too many cards, too much rules text, too much minor evidence.
A script now sends the collection CSV through an external card-data API and brings it back with oracle text, rules and the details needed to treat each card as more than a line in an inventory. The CustomGPT moves through the entire archive without blinking.
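
The enrichment step might look something like this. It is a sketch under assumptions: the CSV is assumed to have a `name` column, `lookup` is a hypothetical callable standing in for the external card-data API, and the added column names are illustrative. No real API is called, and the sample card is a placeholder.

```python
import csv
import io
from typing import Callable, Optional


def enrich_collection(csv_text: str,
                      lookup: Callable[[str], Optional[dict]]) -> str:
    """Return the collection CSV with oracle_text and type_line columns added.

    lookup(name) should return a dict of card details, or None when the
    card is unknown. Unknown cards keep their row with empty new columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames or []) + ["oracle_text", "type_line"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        details = lookup(row["name"]) or {}
        row["oracle_text"] = details.get("oracle_text", "")
        row["type_line"] = details.get("type_line", "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder card data standing in for API responses.
    fake_api = {"Example Card": {"oracle_text": "Draw a card.", "type_line": "Sorcery"}}
    collection = "name,quantity\nExample Card,2\nUnknown Card,1\n"
    print(enrich_collection(collection, fake_api.get))
```

Passing the lookup in as a function keeps the script independent of any one card-data service, and makes it possible to cache or rate-limit requests without touching the CSV logic.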

Effect The recommendations improved. I stopped sitting with the cards, enjoying the art and trying to remember what strange interaction had made an old card feel necessary. The collection became a query.

Conclusion The CustomGPT found combinations I would not have found. I had automated the slow pleasure and kept the result.

Additional note The decks are better now. The question is whether I understand how to play them correctly.


