LLMs Wildest Dreams

Over the past two weeks, I’ve been following a number of discussions around recent research by Anthropic — the company behind the Claude language model family. These discussions have played out not only in academic circles, but also across social media platforms and YouTube, with science communicators offering their own takes on what the research means.

I noticed that many of these interpretations diverge significantly, sometimes dramatically, from what the papers actually say (and, by the way, from each other). In particular, some claims struck me as exaggerated or speculative, even though they were presented with great confidence and strong rhetorical framing. Others, while more cautious, also drew far-reaching conclusions that aren’t necessarily supported by the evidence.

So I decided to take a closer look at the primary sources—the papers themselves—and compare them to some of the circulating interpretations online. My aim here is to share how I read the papers, highlight where I believe common misunderstandings have occurred, and offer a technically grounded perspective that avoids hype, but also doesn’t dismiss the progress that has clearly been made.

More specifically, this post focuses on the following:

  • A structured recap of the two Anthropic papers and what they actually demonstrate.
  • A comparison of two popular YouTube interpretations (by Matthew Berman and Sabine Hossenfelder), which offer almost opposing takes.
  • A short excursion on LLMs and how they work (and why).
  • My own analysis of why certain assumptions — about self-awareness, internal reasoning, and “honesty” in LLMs — may not be warranted based on the research.

In writing this, I’m not claiming to offer the final word. But I do think it’s important to take a step back from the soundbites and really ask: What do these findings actually show? And what might they not show?

If you’ve found yourself wondering whether LLMs are secretly reasoning behind our backs, or whether they’re just glorified autocomplete, this post might help you sort signal from noise.


Obsidian Plugin for Print

This blog post covers my experience building an Obsidian plugin using ChatGPT – including a minimal backend. The project is open source and available on GitHub:

Why? To solve a real limitation I’ve hit repeatedly: Obsidian on iOS doesn’t support printing or PDF export. That’s an issue if you want to print short notes, e.g. prep lists for talks, interview questions, or quick todos. To make it even more useful (to me), I don’t print to A4 – I use a thermal receipt printer connected to a Raspberry Pi 4B. It prints on 80 mm continuous paper, ideal for portable, foldable notes that fit in a jacket pocket. Of course, a nice addition would be to support A4, too 🤓.

The resulting notes look like this:


Exploring LangChain4j with Spring Boot: A Practical Journey

Many examples, tutorials, and blog posts convey the impression that Python is the only language for AI. This blog post shows that for enterprise applications, Kotlin with Spring Boot is a very good choice for building robust AI and Agentic AI applications.

In today’s fast‑evolving landscape of AI‑driven applications, integrating language models into enterprise solutions is essential. This post explores LangChain4j within a Spring Boot environment using Kotlin. I will demonstrate how to declare AI services using annotations, integrate system prompts, dynamic user prompts, and tools — all via declarative interfaces.


The ChatGPT Programming Experience: A Second Attempt

Welcome back to my series on using ChatGPT for programming (written with the help of ChatGPT)! In this second installment, I’ll share my experiences using ChatGPT to build a small software project.

Here you can find the chats: https://github.com/juangamnik/chatgpt-schedule/tree/main/chatgpt

I set out with the premise that I would not write more than a few lines of code, leaving most of the work to ChatGPT. Let’s dive into the numbers, the areas where ChatGPT slowed me down, and the moments where it truly shined.


Large Language Models, GPT, and I

This is the first blog post in a series describing my first experiences with ChatGPT as a pair programmer and developer assistant. As you will see, these experiences had their ups and downs, but all in all (spoiler alert): on the one hand, the advancement in A[G]I (Artificial [General] Intelligence), especially with regard to NLP (Natural Language Processing) using LLMs (Large Language Models) and GPT (Generative Pre-trained Transformers), is very impressive. On the other hand, I had to revise some of my observations (and criticism) just weeks later, since the pace of evolution in the AI scene is so freaking high.

So these are the blog posts of this series:

  1. This post
  2. The ChatGPT Programming Experience: A Second Attempt
  3. The Evolution of Large Language Models in Programming: A Broader Perspective

But let’s start at the beginning (the following text has been created with the help of GPT-4)…
