vLLM v0.16.0: Throughput Scheduling and a WebSocket Realtime API
Date: February 24, 2026 Source: vLLM Release Notes Release Context: This is a version upgrade. vLLM v0.16.0 is the latest release of the popular open-source inference server. The WebSocket Realtime API is a new feature that mirrors the functionality of OpenAI’s Realtime API, providing a self-hosted […]
Matthew Aberham
Matthew Aberham is a solutions architect and full-stack engineer focused on building scalable web platforms and intuitive front-end experiences. He works at the intersection of performance engineering, interface design, and applied AI systems.
Blogs from this Author
LLM Concept Vectors: MIT Research on Steering AI Behavior
Date: February 23, 2026 Source: Science Researchers from MIT and UC San Diego published a paper in Science describing LLM concept vectors and a new algorithm called the Recursive Feature Machine (RFM) that can extract these concept vectors from large language models. Essentially, these are patterns of neural activity corresponding to specific ideas or behaviors. […]
Anthropic Accuses DeepSeek of Distillation Attacks on Claude
Date: February 23, 2026 Source: Anthropic Blog Anthropic published a detailed post revealing what it describes as a distillation attack at industrial scale, accusing three Chinese AI labs (DeepSeek, Moonshot AI/Kimi, and MiniMax) of systematically extracting Claude’s capabilities. According to Anthropic, the labs created over 24,000 fraudulent accounts and generated more than 16 million exchanges […]
Minimax M2: Innovative Reasoning Strategy from Open-Source Model Showing Big Results
In the fast-paced world of artificial intelligence, a new open-source model from Chinese AI firm Minimax is making a significant impact. Released in late October 2025, Minimax M2 has rapidly gained acclaim for its innovative approach to reasoning, impressive performance, and cost-effectiveness, positioning it as a formidable competitor to established proprietary models. A New Architecture for a […]
Chandra OCR: The BEST in Open-Source AI Document Parsing
In the specialized field of Optical Character Recognition (OCR), a new open-source model from Datalab is setting a new benchmark for accuracy and versatility. Chandra OCR, released in October 2025, has rapidly ascended to the top of the leaderboards, outperforming even proprietary giants like GPT-4o and Gemini Pro on key benchmarks. Beyond Simple Text Extraction Chandra is not […]
Request Hedging: Accelerate Your App by Racing Duplicate Calls
Users notice slow requests; even if 99% finish quickly, that 1% “long-tail” latency can make your app feel sluggish. Request hedging solves this by speculatively firing a duplicate request after a short delay and racing the two, beating out outliers before they ever impact the UI. Why the slowest 1% of requests matter The time it takes […]
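The hedging pattern this post describes can be sketched in a few lines. This is a minimal illustration, not the post's implementation: `hedged` and `hedgeDelayMs` are names invented here, and a production version would also want cancellation of the losing request.

```typescript
// Minimal request-hedging sketch: fire a duplicate of the request if the
// first hasn't resolved within `hedgeDelayMs`, and take whichever wins.
async function hedged<T>(fetchFn: () => Promise<T>, hedgeDelayMs: number): Promise<T> {
  const primary = fetchFn();
  const backup = new Promise<T>((resolve, reject) => {
    // Only issue the hedge if the primary is still in flight after the delay.
    const timer = setTimeout(() => fetchFn().then(resolve, reject), hedgeDelayMs);
    // If the primary settles first, cancel the pending hedge.
    primary.finally(() => clearTimeout(timer)).catch(() => {});
  });
  return Promise.race([primary, backup]);
}
```

The delay is typically set near the request's p95 latency, so the duplicate fires only for the slow tail rather than doubling total traffic.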
Tool‑Augmented RAG Chatbot: GPT‑4, pgVector & Next.js
This is Part 3 of a three-part series (links at the bottom). In Part Two, we moved from concept to execution by building the foundation of a Retrieval‑Augmented Generation (RAG) system. We set up a Postgres database with pgvector, defined a schema, wrote a script to embed and chunk text, and validated vector search with cosine similarity. In […]
Postgres RAG Stack: Embedding, Chunking & Vector Search
This is Part 2 of a three-part series (links at the bottom). The GitHub repo can be checked out here. Postgres RAG Stack brings together Postgres, pgVector, and TypeScript to power fast, semantic search. In Part One, we covered the theory behind semantic search: how embeddings convert meaning into vectors, how vector databases and indexes enable […]
Vector Search Embeddings and Retrieval-Augmented Generation
This is Part 1 of a three-part series (links at the bottom). Traditional search engines and databases match based on keywords. These systems are fine when you’re looking for an exact or partial string match but fail when the goal is to find content that’s conceptually similar, not just textually identical. Vector search bridges this […]
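The distinction this teaser draws, keyword matching versus conceptual similarity, comes down to comparing embedding vectors. A standalone sketch of cosine similarity (the measure the series validates against), using made-up low-dimensional vectors in place of real embeddings:

```typescript
// Cosine similarity measures the angle between two vectors: conceptually
// similar embeddings score near 1 even when they share no keywords.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the series itself this comparison happens inside Postgres via pgvector's distance operators; the function above just makes the underlying math concrete.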