Introducing Distill CLI: An efficient Rust-based media summarization tool
A few weeks ago I wrote about a project our team was working on called Distill, a simple application that summarizes and extracts important details from our daily meetings. At the end of that post I promised you a CLI version written in Rust. After some code reviews from Rustaceans at Amazon and a little polish, today I am ready to release the Distill command line interface.
After building from source, simply pass Distill CLI a media file and select the S3 bucket you want to save the file to. Today, Distill supports outputting summaries as Word documents, text files, and printing directly to the terminal (default). You’ll find it’s easily extensible – my team (OCTO) already uses it to export summaries of our team meetings directly to Slack (and is working on support for Markdown).
Tinkering is a great way to learn and be curious
The way we build has changed quite a bit since I started working with distributed systems. Today, compute, storage, databases, and networking are all available on demand. As developers, our focus has shifted to ever-faster innovation, and over time, tinkering at the system level has become a lost art. But tinkering is as important today as it ever was. I vividly remember the hours I spent tinkering with BSD 2.8 to get it running on PDP-11s, and that cemented my never-ending love of operating system software. Tinkering offers us a chance to really get to know our systems. To experiment with new languages, frameworks, and tools. To look for efficiencies big and small. To find inspiration. And that’s exactly what happened with Distill.
We rewrote one of our Lambda functions in Rust and found that cold starts were 12x faster and memory usage decreased by 73%. Before I knew it, I started thinking about how I could make the whole process more efficient for my use case.
The original proof of concept stored media files, transcripts, and summaries in S3, but since I run the CLI locally, I realized I could store the transcripts and summaries in memory and save myself some writes to S3. I also wanted an easy way to upload media and monitor the summarization process without leaving the command line, so I cobbled together a simple UI that provides status updates and notifies me when something fails. The original showed what was possible, left room for tinkering, and was the blueprint I used to write the Distill CLI in Rust.
I encourage you to try it and let me know if you find any bugs or edge cases, or have ideas for improvement.
Builders choose Rust
As technologists, we have a responsibility to build sustainably. And this is where I really see Rust’s potential. With its emphasis on performance, memory safety, and concurrency, it offers a real opportunity to reduce compute and maintenance costs. Its memory safety guarantees eliminate obscure bugs that plague C and C++ projects and reduce crashes without impacting performance. Its concurrency model enforces strict compile-time checks, prevents data races, and makes the most of multi-core processors. And while compile errors can be damn annoying in the moment, fewer developers chasing bugs and more time focused on innovation is always a good thing. That’s why it’s become a go-to language for builders who thrive on solving problems at unprecedented scale.
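To make the compile-time concurrency checks concrete, here is a minimal sketch using only the Rust standard library. It is not taken from the Distill CLI; the function name `parallel_count` is hypothetical and chosen purely for illustration. Several threads increment a shared counter, and the compiler only accepts the program because the counter is wrapped in `Arc<Mutex<_>>`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawns `threads` workers that each add 1 to a
/// shared counter `increments` times, then returns the final total.
fn parallel_count(threads: usize, increments: usize) -> usize {
    // Arc provides shared ownership across threads; Mutex guards mutation.
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::with_capacity(threads);

    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                // The lock must be held to touch the value inside the Mutex.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    // Wait for every worker to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

If you tried to have the threads mutate a plain local `usize` instead, the program would be rejected at compile time rather than producing a wrong total at run time. That is the kind of bug you no longer have to chase.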
Since 2018, we’ve increasingly used Rust for critical workloads across various services such as S3, EC2, DynamoDB, Lambda, Fargate, and Nitro, especially in scenarios where hardware costs are expected to dominate over time. In his guest post last year, Andy Warfield wrote a bit about ShardStore, the bottom layer of S3’s storage stack that manages data on each individual disk. He explained that Rust was chosen for its type safety and structured language support, which help identify bugs sooner, and described how the team wrote libraries to extend that type safety to on-disk structures. If you haven’t done so yet, I encourage you to read the post and the SOSP paper.
This trend is reflected throughout the industry. Discord moved their Read States service from Go to Rust to fix large latency spikes caused by garbage collection. The service is now 10x faster, and its worst tail latencies have been reduced almost 100x. Likewise, Figma rewrote performance-sensitive parts of its multiplayer service in Rust and has seen significant performance improvements on the server side, such as a six-fold reduction in average CPU usage per machine.
The point is, if cost and sustainability are important to you, there is no reason not to consider Rust.
Rust is hard…
Rust has a reputation for being a difficult language to learn, and I don’t deny that there is a learning curve. It will take some time to get comfortable with the borrow checker, and you will wrestle with the compiler. It’s like writing a PRFAQ for a new idea at Amazon. There’s a lot of friction in the beginning, which is sometimes difficult when all you really want to do is jump into the IDE and start building. But once you’re on the other side, there’s huge potential to gain speed. Remember, the cost of building a system, service, or application is nothing compared to the cost of running it, so the way you build should be under constant review.
But you don’t have to just take my word for it. Earlier this year, The Register published findings from Google showing that their Rust teams were twice as productive as teams using C++, and that a team of the same size using Rust instead of Go was just as productive and produced more correct code. There are no bonus points for increasing headcount to address preventable problems.
Final thoughts
I want to be clear: this is not a call to rewrite everything in Rust. Just as monoliths are not dinosaurs, there is no one programming language that dominates all others, and not every application has the same business or technical requirements. It’s about using the right tool for the right job. That means challenging the status quo and constantly looking for ways to incrementally optimize your systems: tinkering with things and measuring what happens. Something as simple as changing the library you use to serialize and deserialize JSON from the Python standard library to orjson may be all you need to speed up your app, reduce your memory footprint, and cut costs in the process.
If you take nothing else away from this post, I encourage you to actively seek efficiency in all aspects of your work. Tinker. Measure. Because everything has a price, and cost is a pretty good indicator of a sustainable system.
Now start building!
Special thanks to AWS Rustaceans Niko Matsakis and Grant Gurvis for their code reviews and feedback during the development of the Distill CLI.