Recent
Anthropic acquires Bun, the JavaScript runtime →
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf] →
Proxmox internals: from pmxcfs to the VM lifecycle
Last updated: 301125
I have been exploring Proxmox VE to understand how it functions internally: specifically, how it manages objects, how the API interacts with the storage subsystem, and where the “state” of the cluster actually lives.
Instead of relying on the web …
Cheating the Reaper in Go (Arena Allocators) →
It’s a repost, actually. But worth reading again.
Proxmox VE 9.1 is out. The most important part of this release is definitely the support for OCI images, which moves Proxmox closer and closer to a Docker-style workload. But what does this actually mean?
To run containers in Proxmox before this release, you first had to create an LXC container, install Docker inside it, and only then could you finally run your container. Now Proxmox lets you query the registry, pull the image you need, unpack it, and convert it into a disk image that LXC can use; in effect, OCI images become templates for LXC containers. So it isn’t native support in the strict sense: Proxmox continues to use LXC, and this is simply a convenience feature for generating an LXC container from an OCI image.
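I haven’t looked at the actual implementation, but conceptually the conversion is just “apply the OCI layer tarballs in order and you get a rootfs that LXC can boot”. A rough Python sketch of that idea, using only generic OCI image-layout conventions (nothing here is Proxmox code, and real code must also handle whiteouts, compression variants, and config metadata):

import json
import tarfile
from pathlib import Path

def unpack_oci_image(image_dir: str, rootfs_dir: str) -> None:
    """Unpack a local OCI image layout into a plain rootfs directory,
    which is roughly what an OCI-to-LXC-template conversion has to do."""
    image = Path(image_dir)
    rootfs = Path(rootfs_dir)
    rootfs.mkdir(parents=True, exist_ok=True)

    # index.json points at the image manifest blob.
    index = json.loads((image / "index.json").read_text())
    algo, digest = index["manifests"][0]["digest"].split(":")
    manifest = json.loads((image / "blobs" / algo / digest).read_text())

    # Layers are (usually gzipped) tarballs; applying them in order
    # yields the container's final filesystem.
    for layer in manifest["layers"]:
        algo, digest = layer["digest"].split(":")
        with tarfile.open(image / "blobs" / algo / digest) as tar:
            tar.extractall(rootfs)

# unpack_oci_image("./nginx-oci", "./ct-rootfs")  # the rootfs can then back an LXC container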
What I’m wondering is how the update of these images is handled. I’ve read some posts and watched a few videos, but it seems Proxmox hasn’t gone quite this far yet.
NATS as broker for my home server →

Just playing with my new side project.
Scaling HNSWs →
Rationale: Or why am I bothering to rewrite nanomsg? →
Libxev for Noobs
Lately, I’ve been working on a Zig project in my spare time. I needed a solution for managing async, and I chose libxev for this. I think it’s an excellent library, and I admire mitchellh’s work. While studying the library, I had the idea of gathering some basic …
Kafka is fast -- I'll use Postgres. HN thread →
Waiting for Zig's new Async I/O
Zig will introduce its new async soon. It is still under active development. While we wait for it, you can already build async code in Zig today; you just don’t get language keywords for it. Some readers will already know how to do it, others won’t. Since I love …
At-Least-Once
When we talk about reliable message delivery, we’re really talking about what happens when something goes wrong: a crash, a timeout, … The broker, producer, and consumer all participate in this contract. So we can see that reliability is the combination of how the …
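The practical upshot, at least in my own experiments, is “acknowledge only after the work is done, and let the broker redeliver anything unacknowledged”. A minimal consumer-side sketch with a hypothetical broker client (receive/ack/nack/dead_letter are assumed methods, not any specific library’s API):

import time

def consume_at_least_once(broker, handler, max_retries=5):
    # At-least-once: ack only after successful processing, so a crash between
    # processing and ack causes a redelivery (a duplicate), never a lost message.
    while True:
        msg = broker.receive()              # blocks, or returns None if the queue is empty
        if msg is None:
            time.sleep(0.1)
            continue
        try:
            handler(msg.body)               # do the actual work first...
            broker.ack(msg.id)              # ...then acknowledge
        except Exception:
            # No ack: the broker redelivers after its visibility timeout,
            # or we fail fast with an explicit nack / dead-letter.
            if msg.delivery_count >= max_retries:
                broker.dead_letter(msg.id)
            else:
                broker.nack(msg.id)

The duplicate deliveries this allows are exactly why the consumer side usually needs to be idempotent.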
Message Brokers
A message broker is middleware that receives, stores, and forwards messages between producers (senders) and consumers (receivers), enabling them to communicate asynchronously and independently of each other’s availability, location, or implementation details. Instead of …
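As a toy illustration of that decoupling, a broker is essentially a set of named queues sitting between the two sides. A deliberately minimal in-memory sketch (nothing like a production broker: no persistence, no acknowledgements):

import queue
import threading

class TinyBroker:
    """Toy in-memory broker: producers and consumers only know topic names,
    not each other, and the broker buffers messages until they are consumed."""
    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _topic(self, name):
        with self._lock:
            return self._topics.setdefault(name, queue.Queue())

    def publish(self, topic, message):
        self._topic(topic).put(message)        # stored until someone consumes it

    def consume(self, topic, timeout=1.0):
        try:
            return self._topic(topic).get(timeout=timeout)
        except queue.Empty:
            return None

broker = TinyBroker()
broker.publish("orders", "order-42")           # producer side
print(broker.consume("orders"))                # consumer side, possibly much later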
An Open Letter to Everyone I've Butted Heads With →
'The G in GPU is for Graphics damnit!': Adventures in Triton Kernels, Profiling, Parallelism and More →
Heap-overflowing Llama.cpp to RCE →
GLM 4.6. It’s Z.ai’s new flagship model. It is an update to GLM-4.5 and reaches near parity with Claude Sonnet 4.
The major improvement is the longer context window, expanded from 128K to 200K tokens. It also shows significant improvements in coding: Z.ai aims to make it the best agentic coding model.
It is available for Claude Code, Opencode, Cline, and many others. Their Lite Plan, at $3/month, provides up to 120 prompts every 5 hours, which should be about 3x the usage quota of the Claude Pro plan. I’m gonna try it today.
As I did for previous LLM releases on this blog, I won’t report the benchmarks, but you can consult them in the official blog announcement.
Zig Builds Are Getting Faster →
About DeepSeek Sparse Attention
DeepSeek Sparse Attention: some considerations after reading the paper.
DeepSeek-V3.2-Exp is defined as an experimental sparse-attention model. Its architecture is the same as DeepSeek-V3.1-Terminus, except for the introduction of DeepSeek Sparse Attention (DSA). The …
A lot of interesting stuff; I don’t know where to start!
I’ll start with DeepSeek, since I’m a stan.
DeepSeek-V3.2-Exp. It’s a new model built on V3.1-Terminus, and it introduces DeepSeek Sparse Attention (DSA), a new architecture module that enables fine-grained sparse attention by selecting the top-k key-value entries for each query using efficient FP8 operations and a lightning indexer. It was built by training five RL-specialized models (math, competitive programming, agentic coding, logical reasoning, and agentic search) using GRPO, then distilling them into the final 685B-parameter model. It comes with a 6-page paper that isn’t very specific, but it really seems they are quietly cooking and have figured something out. Anyway, roughly 10x cheaper inference at 128k tokens, with API prices cut by more than 50%. They insist on it being experimental. It matches V3.1-Terminus on most benchmarks, but shows slight degradation on reasoning-heavy tasks like GPQA due to generating fewer reasoning tokens. It seems they cracked cheap, long context for LLMs. I’ll try to write more on the paper when I have time.
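To make the DSA idea concrete, here is a rough numpy sketch of the mechanism as I read it: a cheap indexer scores the keys for each query, only the top-k survive, and full attention runs over that subset. Everything is simplified (single head, no FP8, made-up shapes), so treat it as an illustration, not the paper’s algorithm:

import numpy as np

def sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    # q, k, v: (T, d) for a single head; idx_q, idx_k: (T, d_idx) cheap
    # indexer projections (much smaller, FP8 in the real model).
    T, d = q.shape
    out = np.zeros_like(v)

    # 1. Lightweight indexer scores every (query, key) pair.
    index_scores = idx_q @ idx_k.T                      # (T, T)

    for t in range(T):
        # 2. Causal top-k selection: keep only the best keys up to position t.
        k_sel = np.argsort(index_scores[t, :t + 1])[-top_k:]

        # 3. Ordinary softmax attention, but only over the selected keys.
        logits = q[t] @ k[k_sel].T / np.sqrt(d)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[t] = w @ v[k_sel]
    return out

# The expensive attention is now O(top_k) per query instead of O(T); the full
# T x T work is pushed into the much cheaper indexer, which is where the
# long-context savings come from.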
Claude 4.5 Sonnet – boring stuff? Anthropic made a bold statement about their latest model: the world’s best coding model. I’m not a fan of benchmarks; it just seems to be superior to Opus 4.1, Sonnet 4, GPT-5-Codex, GPT-5, and Gemini 2.5 Pro on a lot of them. My personal highlights: extended thinking mode, which allows sustained focus on multi-step tasks for over 30 hours, and the training data, a proprietary dataset mix including public internet data (up to July 2025). Post-training uses RL from human and AI feedback.
Bonus: LoRA without regret. New blog post from Thinking Machines that experimentally shows that LoRA fine-tuning matches full fine-tuning’s sample and compute efficiency for post-training on smaller datasets when using high ranks and applying it to all layers (especially MLPs). Worth reading.
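For anyone who hasn’t looked at LoRA closely, the whole trick fits in a few lines: freeze the pretrained W and learn a low-rank update BA, so you train r*(d_in + d_out) parameters instead of d_in*d_out. A minimal numpy sketch of the definition (not the Thinking Machines experimental setup):

import numpy as np

class LoRALinear:
    """Frozen dense weight W plus a trainable low-rank update (alpha / r) * B @ A."""
    def __init__(self, d_in, d_out, r=16, alpha=32):
        rng = np.random.default_rng(0)
        self.W = rng.normal(0, 0.02, (d_out, d_in))  # pretrained weight, frozen
        self.A = rng.normal(0, 0.02, (r, d_in))      # trainable
        self.B = np.zeros((d_out, r))                # trainable, zero-init so the
                                                     # update is a no-op at the start
        self.scale = alpha / r

    def __call__(self, x):                           # x: (batch, d_in)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=4096, d_out=4096, r=16)
print(layer(np.ones((2, 4096))).shape)               # (2, 4096)
# Trainable parameters: 16 * (4096 + 4096) = 131,072 vs ~16.7M for the full matrix.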
SimpleFold: Folding proteins is simpler than you think →
(via). The GitHub repository accompanies the research paper, SimpleFold: Folding Proteins is Simpler than You Think (arXiv 2025).
Qwen3-Max has been released by the Qwen team. It’s their largest and most advanced large language model to date, and it competes against GPT-5 and Grok 4.
The base model has over 1 trillion parameters and was pretrained on 36 trillion tokens. Its architecture seems to follow that of the other Qwen3-series models: a highly optimized MoE design that activates only a subset of parameters per inference (see the sketch below). This is something we’ve already seen with the Qwen3-Next models, from which I think it also inherits the context window.
The thinking variant, Qwen3-Max-Thinking, is equipped with tool use, and they say it’s deployed in “heavy mode”. It’s unclear to me what they mean by that: perhaps they give it far more computational resources than the non-thinking variant.
They are taking the core architecture and maxxioptimizing it to reduce costs and improve efficiency. It’s impressive to me.
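“Activates only a subset of parameters per inference” just means a small router picks a few experts per token. A toy numpy sketch of top-k routing (expert counts and shapes are made up, nothing here is Qwen-specific):

import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    # x: (tokens, d); gate_w: (d, n_experts); experts: list of callables (d,) -> (d,).
    logits = x @ gate_w                                   # router scores per token
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = np.argsort(probs[t])[-top_k:]            # best experts for this token
        weights = probs[t, chosen] / probs[t, chosen].sum()
        for w, e in zip(weights, chosen):
            out[t] += w * experts[e](x[t])                # only top_k experts run
    return out

# With, say, 128 experts and top_k=2, each token touches under 2% of the expert
# FFN parameters, which is how a >1T-parameter model can stay cheap per token.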
In the last 12 hours, Qwen has released:
- Qwen3-Max
- Qwen3-VL-235B-A22B: most powerful vision-language model in the series
- Upgrade to Qwen3-Coder: improved terminal tasks, safer code gen
- Qwen3Guard: safety moderation series for real-time AI content filtering
- Personal AI Travel Designer: new feature in Qwen Chat for personalized trip planning
- Qwen3-LiveTranslate-Flash: low-latency live translation model for real-time audio/text
While Qwen is continuing to optimize and release new models, I’ll wait for DeepSeek. I’m convinced they are cooking.
Go has added Valgrind support. While reading the commit, I saw this:
Instead of adding the Valgrind headers to the tree, and using cgo to call the various Valgrind client request macros, we just add an assembly function which emits the necessary instructions to trigger client requests.
This is super interesting. Let’s have a quick look at the code:
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:build valgrind && linux
#include "textflag.h"
// Instead of using cgo and using the Valgrind macros, we just emit the special client request
// assembly ourselves. The client request mechanism is basically the same across all architectures,
// with the notable difference being the special preamble that lets Valgrind know we want to do
// a client request.
//
// The form of the VALGRIND_DO_CLIENT_REQUEST macro assembly can be found in the valgrind/valgrind.h
// header file [0].
//
// [0] https://sourceware.org/git/?p=valgrind.git;a=blob;f=include/valgrind.h.in;h=f1710924aa7372e7b7e2abfbf7366a2286e33d2d;hb=HEAD
// func valgrindClientRequest(uintptr, uintptr, uintptr, uintptr, uintptr, uintptr) (ret uintptr)
TEXT runtime·valgrindClientRequest(SB), NOSPLIT, $0-56
// Load the address of the first of the (contiguous) arguments into AX.
LEAQ args+0(FP), AX
// Zero DX, since some requests may not populate it.
XORL DX, DX
// Emit the special preamble.
ROLQ $3, DI; ROLQ $13, DI
ROLQ $61, DI; ROLQ $51, DI
// "Execute" the client request.
XCHGQ BX, BX
// Copy the result out of DX.
MOVQ DX, ret+48(FP)
RET
This is the amd64 assembly for the Valgrind client request. This asm emits the exact instruction sequence that Valgrind’s macro VALGRIND_DO_CLIENT_REQUEST would have produced in C, just without cgo.
On arm64, the same idea is implemented with different registers and the AArch64 “marker” Valgrind looks for.
It’s nice that they do everything within the language itself, even when relying on assembly. Some reasons I can imagine for doing it this way: avoiding cgo and keeping the runtime pure Go, but most importantly, control.
It’s really interesting to me that the Go team decided to follow this route. Also, I’m not a fan of cgo.
A lot of activity from the Qwen team, and DeepSeek is starting to come out of its temporary stealth mode. Here is a summary of the updates from them that I found most interesting.
Qwen models
- Qwen3-Omni, a 30B multimodal model that supports text, audio, images, and video. It has an MoE-based Thinker–Talker design with AuT pretraining for strong general representations, plus a multi-codebook design that drives latency to a minimum. It seems to think a lot! The BF16 Instruct model is 78.85 GB. This model replaces the previous Qwen2.5-Omni.
- Qwen3-Next-80B-A3B-Instruct-FP8 and Qwen3-Next-80B-A3B-Thinking-FP8: official FP8-quantized versions.
DeepSeek updates to V3.1
DeepSeek released DeepSeek-V3.1-Terminus, an updated version of their V3.1 model. What’s improved? Language consistency: fewer CN/EN mix-ups and no more random characters. Agentic tool use also seems improved.
Exciting times.
xAI releases Grok 4 Fast. It uses the same architecture as Grok 4 but incorporates efficiency improvements from training data and reinforcement learning. It supports a 2-million-token context window, and its unified weights handle both chain-of-thought reasoning and direct responses, controlled by prompts.
Also, it’s cheap! 47x cheaper than Grok 4, and I think cheaper than average, at $0.20 per million input tokens and $0.50 per million output tokens. These prices refer to requests under 128k tokens.
Sj.h: A tiny little JSON parsing library in ~150 lines of C99 →
TernFS: an exabyte scale, multi-region distributed filesystem →
Sharing for the love of distributed filesystems.
KDE is now my favorite desktop →
I agree with OP. KDE is nice, snappy, and simple. Over the last few years it has focused on bug fixing and improvements rather than introducing major redesigns, and I appreciate that. Unfortunately, I can’t say the same for macOS.