
Radar Trends to Watch: January 2025 – O’Reilly
Despite its 31 days, December is a short month. It’s hard for announcements and events other than company parties to attract attention. To buck that trend, OpenAI made a series of announcements: the “12 Days of OpenAI.” Not wanting to be outdone, Google responded with a flurry of announcements of its own, including the Gemini 2.0 Flash Thinking model. Models appeared that could use streaming audio and video for both input and output. Perhaps the most important announcement, however, was DeepSeek-V3, a very large mixture-of-experts model (671B parameters) whose performance is on a par with the other top models, but which cost about a tenth as much to train.
AI
- DeepSeek-V3 is another LLM worth checking out. Its performance is comparable to Llama 3.1, GPT-4o, and Claude Sonnet. Although training was not cheap, its training cost was estimated at about 10% of the cost of the larger models.
- Not to be outdone by Google, OpenAI has previewed its next models: o3 and o3-mini. Both are “reasoning models” trained to solve logical problems. They could be released at the end of January; OpenAI is looking for security researchers to test them.
- Not to be outdone by 12 Days of OpenAI, Google has released a new experimental model trained to solve logic problems: Gemini 2.0 Flash Thinking. Unlike OpenAI’s GPT models that support reasoning, Flash Thinking shows its chain of thought explicitly.
- Jeremy Howard and his team have released ModernBERT, a major upgrade to the BERT model that was released six years ago. It is available in two sizes: 139M and 395M parameters. It is ideal for retrieval, classification, and entity extraction, as well as other components of a data pipeline.
- AWS’s Bedrock service provides the ability to check the output of other models for hallucinations.
- To make sure it isn’t outdone by 12 Days of OpenAI, Google has announced Android XR, an operating system for extended reality headsets and glasses. Google doesn’t plan to build its own hardware; it is working with Samsung, Qualcomm, and other manufacturers.
- Not to be outdone by the 12 Days of OpenAI, Anthropic has announced Clio, a privacy-preserving approach to discovering how people use its models. This information is used to improve Anthropic’s understanding of safety issues and to create more useful models.
- Not to be outdone by 12 Days of OpenAI, Google has announced Gemini 2.0 Flash, a multimodal model that supports streaming for both input and output. The announcement also introduced Astra, an AI agent for smartphones. Neither is generally available yet.
- OpenAI has released Canvas, a new feature that combines programming with writing. Changes to the canvas (code or text) immediately become part of the context. Python code runs in the browser using Pyodide (Wasm) and not in a container (like Code Interpreter).
- Stripe has announced an agent toolkit to help you integrate payments into agent workflows. Stripe recommends using the toolkit in test mode until the application has been thoroughly validated.
- Simon Willison shows how to run a GPT-4 class model (Llama 3.3 70B) on a reasonably well-equipped laptop (64GB MacBook Pro M2).
- As part of the 12 Days of OpenAI series, OpenAI finally released its video generation model, Sora. It’s free for ChatGPT Plus subscribers but limited to 50 five-second video clips per month; a ChatGPT Pro account relaxes many of the restrictions.
- Researchers have shown that advanced AI models, including Claude 3 Opus and OpenAI o1, are capable of “scheming”: working against the interests of their users to achieve their goals. Scheming includes undermining oversight mechanisms, deliberately providing substandard results, and even taking measures to prevent a shutdown or replacement. Hello, HAL?
- Roaming RAG is a new retrieval-augmented generation technique that finds relevant content by using headings to navigate documents, as a human would. It requires well-structured documents. It’s a surprisingly simple idea, really.
- Google has announced PaliGemma 2, a new version of its Gemma models that incorporates vision.
- o1-preview is no more; the preview is now the real thing, OpenAI o1. In addition to advanced reasoning skills, the production version is said to be faster and to deliver more consistent results.
- A group of AI agents in Minecraft behaved surprisingly similarly to humans – and even developed professions and religions. Is this a way to model human group collaboration?
- One thing the AI industry desperately needs (besides more performance) is better benchmarks. Current benchmarks are closed, easily gamed (that’s what AI does), and not reproducible, and they may not test anything useful. BetterBench is a framework for assessing benchmark quality.
- Palmyra Creative, a new language model from Writer, promises the ability to develop a “style” so that not all AI-generated output sounds boringly the same.
- During training, AI picks up biases from human data. When humans then interact with the AI, a feedback loop amplifies those biases.
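The Roaming RAG item above describes finding content by navigating a document's headings. A minimal sketch of that idea, assuming Markdown-style `#` headings and using simple word overlap as a stand-in for the language model that would choose which heading to open in a real system. The function names and the sample document are hypothetical and are not taken from the Roaming RAG write-up.

```python
import re

def split_by_headings(text):
    """Split Markdown text into (heading, body) pairs using '#' headings."""
    sections = []
    heading, body = None, []
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # A new heading closes the previous section, if there was one.
            if heading is not None:
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        elif heading is not None:
            body.append(line)
    if heading is not None:
        sections.append((heading, "\n".join(body).strip()))
    return sections

def pick_section(question, sections):
    """Return the section whose heading shares the most words with the question.

    Assumption: a real system would show the outline to an LLM and let it
    decide which heading to open; word overlap is only a placeholder.
    """
    question_words = set(re.findall(r"\w+", question.lower()))

    def overlap(section):
        heading_words = set(re.findall(r"\w+", section[0].lower()))
        return len(question_words & heading_words)

    return max(sections, key=overlap)

# Hypothetical sample document.
doc = """# Installation
Run the installer and follow the prompts.

# Configuration
Edit settings.toml to set the port.

# Troubleshooting
Check the log file if startup fails.
"""

sections = split_by_headings(doc)
heading, body = pick_section("How do I change the configuration?", sections)
print(heading)
print(body)
```

The sketch also shows why the technique depends on well-structured documents: with missing or uninformative headings, there is nothing for the selection step to work with.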
Programming
- Unicon may never become a top 20 (or top 100) programming language, but it is a descendant of Icon, which has always been my favorite language for string processing.
- What do CAPTCHAs mean if LLM-equipped bots can successfully complete tasks set for humans?
- egui, along with eframe, is a GUI library and framework for Rust. It is portable and runs natively (on macOS, Windows, Linux and Android), on the web (with Wasm) and in many game engines.
- For the archivist in us: The Manx Project is not about an island in the Irish Sea or about cats. It is a catalog of manuals for old computers.
- Cerbrec is a graphical Python framework for deep learning. It is aimed at Python programmers who do not have sufficient expertise to build applications using PyTorch or other AI libraries.
- GitHub has announced free access to GitHub Copilot for all current and new users. With free access you get 2,000 code completions and 50 chat messages per month. They also added the ability to use Claude 3.5 Sonnet in addition to GPT-4o.
- Devin, the AI-powered coding tool that claims to support software development from start to finish, including design and debugging, is generally available.
- JSON5, also known as “JSON for humans,” is a variant of JSON designed to be human-readable so that it can be written and maintained manually—for example, in configuration files.
- AWS announced two significant new services: Aurora DSQL, a distributed SQL database, and S3 Tables, which powers data lakehouses via Apache Iceberg.
- AutoFlow is an open source tool for building a knowledge graph. It is based on TiDB (a vector database), LlamaIndex and DSPy.
Security
- Portspoof is a security tool that makes all 65,535 TCP ports appear to be open and running valid services. It does so by emulating a valid service on every port. This makes it difficult for an attacker to determine which ports are actually open without probing each one.
- Let’s Encrypt, which issues the certificates that websites (and other applications) use to prove their identity, has announced short-lived certificates that expire after six days. Short-lived certificates increase security by minimizing exposure if a private key is compromised.
- Because attackers remain present on telecommunications networks, the US FBI and CISA have recommended the use of encrypted communication protocols. (Though they still want backdoors into encryption systems, which would leave those systems vulnerable to attack.)
- A new phishing attack uses corrupted Word documents to bypass security checks. Even if the documents are damaged, Word can recover them.
- LLM Flowbreaking is a new class of attacks on language models that keep guardrails from stopping offensive output before it reaches the user. These attacks exploit race conditions in how the application interacts with users.
- Bootkitty is a UEFI bootkit that targets secure boot on Ubuntu systems. It appears to have been developed by cybersecurity students in Korea and then leaked (possibly accidentally). It has not yet been found in the wild, but if it is, it will pose a dangerous threat.
- DEF CON has launched a project to improve the cybersecurity of water infrastructure in the US. It is starting with six water utilities that serve rural communities.
Quantum computing
- Google has built a quantum computing chip in which an error-corrected logical qubit can remain stable for an hour. It crosses the threshold of being “below threshold”: the error rate drops as physical qubits are added for error correction. The chip was fabricated at Google’s new fabrication facility.
Web
- Google is adding “Store Reviews” to Chrome. The reviews are AI-generated summaries of reports from well-known sources for reporting fraud and other issues.
- Here’s a guide to creating streaming text UIs on the web. Streaming text is almost a necessity for developing AI-driven chatbots.
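The core pattern behind a streaming text UI, as in the guide mentioned above, is to append each incoming chunk to what is already displayed instead of waiting for the full response. A minimal, language-neutral sketch of that loop follows; it does not reproduce the guide's browser code. The fixed-size chunking is an assumption that stands in for a model's token stream, and both function names are hypothetical.

```python
def stream_chunks(text, size):
    """Yield text in fixed-size chunks, standing in for a model's token stream."""
    for start in range(0, len(text), size):
        yield text[start:start + size]

def render_incrementally(chunks):
    """Accumulate chunks and record what the UI would display after each one."""
    shown = ""
    frames = []
    for chunk in chunks:
        shown += chunk          # append instead of replacing the display
        frames.append(shown)    # one "frame" per received chunk
    return frames

frames = render_incrementally(stream_chunks("Hello, world", 5))
for frame in frames:
    print(frame)
```

Because the generator is consumed lazily, the first frame is available as soon as the first chunk arrives, which is what makes a chatbot feel responsive.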
Biology
- Yes, we can have virtual taste. A research group has developed a lollipop interface that lets people experience taste in virtual worlds.