OS Libraries Building on Hamilton

An overview of recent libraries that are built on top of/for Hamilton

Sep 06, 2024

Let’s take a step back to reflect. When we open-sourced Hamilton, we did not really know what to expect. All we thought was that people would find it valuable. While we were right about that, the deluge of innovative new use-cases, feature requests, publicity, etc… that followed completely blew us away. In this post we’re going to be talking about a few of the recent additions to the Hamilton ecosystem from our community.

First, however, let’s go over the basics of Hamilton.

Hamilton

Hamilton is a lightweight Python library for building dataflows (any computation requiring data) out of Python functions. It can run wherever Python does, and is widely used for anything from simple statistical modeling (this post) to more complex ML, to online RAG in web services and more complex ingestion pipelines for AI systems.

The core concept is simple – you write each data transformation step as a single Python function, with the following rules:

The name of the function corresponds to the output variable it computes.
The parameter names (and types) correspond to inputs. These can be either passed-in parameters or names of other upstream functions.

This approach allows you to represent your data in ways that correspond closely to code, are naturally self-documenting, and portable across infrastructure.
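As a minimal sketch of the two rules above, here are two functions written in this style. The names (`spend`, `signups`, `spend_per_signup`, `avg_spend_per_signup`) are hypothetical, chosen only for illustration, and plain lists stand in for whatever data type you would actually use. Nothing in the block imports Hamilton: the functions are ordinary Python, which is part of what makes them easy to read and unit test.

```python
from typing import List


def spend_per_signup(spend: List[float], signups: List[int]) -> List[float]:
    """Cost per signup.

    `spend` and `signups` are not defined as functions here,
    so they would have to be passed in as inputs.
    """
    return [s / n for s, n in zip(spend, signups)]


def avg_spend_per_signup(spend_per_signup: List[float]) -> float:
    """Mean cost per signup.

    The parameter name matches the function above, which is how
    the dependency between the two steps is declared.
    """
    return sum(spend_per_signup) / len(spend_per_signup)


# Because these are plain functions, each step can be called directly.
per_signup = spend_per_signup(spend=[10.0, 20.0, 30.0], signups=[1, 2, 3])
average = avg_spend_per_signup(per_signup)
```

Here the wiring is done by hand for the sake of the example; in practice the framework reads the names and connects the steps for you.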

After writing your functions (assets), you write a driver that executes them. In most cases this is a simple import/run (specifying the assets you want computed and letting the framework do the rest), but the driver also provides options to customize execution.
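To make "letting the framework do the rest" concrete, the sketch below is a toy resolver written with only the standard library. It is not Hamilton's implementation or API (in Hamilton itself this role is played by the `hamilton.driver` module). `toy_execute`, `doubled`, and `doubled_plus_one` are hypothetical names, used purely to illustrate how requested outputs can be resolved by matching parameter names to function names or passed-in inputs.

```python
import inspect
from typing import Any, Callable, Dict, List


def doubled(x: int) -> int:
    """`x` has no matching function, so it must come from the inputs."""
    return x * 2


def doubled_plus_one(doubled: int) -> int:
    """Depends on `doubled` simply by naming it as a parameter."""
    return doubled + 1


def toy_execute(
    functions: List[Callable[..., Any]],
    outputs: List[str],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Compute the requested outputs, resolving dependencies by name."""
    by_name = {fn.__name__: fn for fn in functions}
    computed: Dict[str, Any] = dict(inputs)  # everything known so far

    def resolve(name: str) -> Any:
        if name in computed:
            return computed[name]
        if name not in by_name:
            raise KeyError(f"'{name}' is neither an input nor a function")
        fn = by_name[name]
        # Every parameter name is itself something to resolve first.
        kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
        computed[name] = fn(**kwargs)
        return computed[name]

    return {name: resolve(name) for name in outputs}


result = toy_execute(
    functions=[doubled, doubled_plus_one],
    outputs=["doubled_plus_one"],
    inputs={"x": 3},
)
```

Only the functions needed for the requested outputs are ever called. This toy version has no cycle detection and skips everything a real framework layers on top, such as type checking, visualization, and customized execution.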

Hamilton has a notebook integration that we’ll be using in our post, allowing you to define your modules in a cell and reference them in drivers later. You can do this with the %%cell_to_module command, which will update a variable with a module pointer and plot the dataflow defined by the module.

Once you define your module, you will have access to the module pointer as a variable with the name of the module, allowing you to build the driver, just as we did above. See this post for more information on how the integration works.

Ecosystem Additions

Hypster + Hypernodes

Gilad Rubin recently open-sourced his library hypster, a lightweight configuration system for interacting with, iterating on, and managing AI/ML workflows. Together with Hypernodes, it forms a frontend of sorts for Hamilton, enabling quick iteration and narrowing the dev → prod gap. While hypster is independent of Hamilton, it draws from the same concepts and can be used to configure and manage Hamilton DAGs. Hypernodes, while still in its early days, allows you to use Hamilton DAGs within Hypster configurations. Gilad recently spoke at the Hamilton community meetup about his work, sharing how he used these tools to build and iterate on a powerful, extensible RAG Q/A system. Have a listen; there’s a lot to learn! He’s really paving the way on leveraging hierarchical DAGs (Hamilton DAGs as black-box steps in his pipeline) to lower the cognitive burden of developing complex workflows.

Gilad writes about the philosophy behind these libraries and more in his recent post 5 Pillars for a Hyper-Optimized AI Workflow.

FlowerPower

Volker Lorrmann released FlowerPower, a lightweight orchestration library built to run Hamilton workflows! It leverages APScheduler, allowing you to configure Hamilton workflows and launch/run them on a cadence. Together with the Hamilton UI, you can now run, track, and iterate on your pipelines in a distributed setting.

Wren AI

Wren AI is leveraging Hamilton + async mode to power their core product. You can read their original post on how they used Hamilton to scale through 1500+ concurrent requests here, as well as our co-post with them on new async features and upgrades, Async Dataflows in Hamilton, by Elijah ben Izzy, DAGWorks Inc., and Howard (July 8, 2024).

We’re really excited about this — while there are a lot of new text-to-SQL tools out there, Wren’s OS implementation is rising above the rest!

DSP Decision Engine

Capitec open-sourced their decision engine library, spockflow — meant to make automating decision trees easy and traceable. They built it using Hamilton for traceability, creating both custom decorators *and* wrapping Hamilton execution/visibility.

Specifying and executing decision engines is easy with the @Tree.condition decorator, which dynamically constructs a Hamilton DAG representing a decision tree.

You can get started using the library by visiting their repository.

NaturF Model

Researchers out of Oak Ridge National Laboratory and Pacific Northwest National Laboratory published the NaturF (Neighborhood Adaptive Tissues for Urban Resilience Futures) model in the Journal of Open Source Software (JOSS). The model computes 132 building parameters, using building topology to measure the impact of existing/proposed developments on urban microclimates. NaturF produces parameters that the WRF (Weather Research and Forecasting) model can ingest.

Under the hood, NaturF leverages Hamilton to compute the parameters — this makes it easier to visualize, trace/debug, and recompute specific portions of the DAG. You can see the graph that produces parameters here — it’s part of the paper!

You can read through the documentation here. NaturF also uses GeoPandas for geographic computation, together with the GeoPandas extension for Hamilton.

And more…

Along with all the exciting libraries outlined above, we are constantly accepting user contributions, promoting blog posts, and sharing out new libraries. Reach out if you have built, or want to build, on top of Hamilton!

We want to hear from you!

If you’re excited by any of this, or have strong opinions, drop by our Slack channel / leave some comments here! Some resources to help you get started:

📣 join our Hamilton community on Slack — need help with Hamilton? Ask here.

⭐ give Hamilton a star on GitHub

📝 open an issue if you find any bugs/want new features

We recently launched Burr to create LLM agents and applications.
