How Postman Fixed a Missing Layer in Their Data Stack – Atlan

When I joined Postman’s data team a little over a year ago, our data was largely a mystery to me. Every day, I’d post questions on Slack like “Where can I find our MAU (monthly active users)?” Someone would tell me where to get it, but as I dug further, I’d find MAU data in other places. And sometimes the different places contradicted each other.

Over time, I learned how to navigate Postman’s wealth of data: which tables had different versions of the same data, or different filters, or sync issues. But this didn’t stop with me. As the data team scaled nearly fivefold in a single year, this issue came up again and again with each new team member.

At the time, Postman’s data system was fairly simple. We had a set of data tables, and knowledge about those tables lived in the heads of our early data team members. This worked when the company and its data were small, but it couldn’t keep up as we started to grow exponentially.

Postman today has hundreds of team members distributed across four continents, and more than 17 million users from 500,000 companies using our API platform.

From the start, Postman Co-founder and CTO Ankit Sobti wanted to make sure that data was democratized. He used to say that it’s difficult for a data team to sit and churn out insights day in and day out. Instead, he staunchly believes, everyone in the company should be able to access our data and gain insights from it. This became especially important in 2020, when Postman continued to scale while going fully remote during the COVID-19 pandemic.

To address this issue, the data team and I decided to take on Postman’s data system as a project last year. Our goal was to make Postman’s data easier to access and understand, both for new hires within the data team and for people across the company.

Modernizing and democratizing a large-scale data system is a big challenge, and we’re definitely not the only company trying to crack it. So, in the hope that our experience may help others dealing with the same challenges, I want to share how we went about this project, what worked and what didn’t, and what we’ve learned so far.

Where we started: the challenges of Postman’s data stack

At Postman, we’ve implemented a modern data stack. Data engineers bring data into Redshift, Amazon’s cloud data warehouse. Then our analysts transform the data with dbt, our SQL-based transformation tool, and create dashboards and Explores on Looker.

Currently, we have about 170 active users per week on Looker. That’s a lot for a company of around 400 people, but it still falls short of our goal of everyone being able to use our data.

One of the main issues we were facing was the lack of consistency when providing context around data, which made context the missing layer in our data stack. As Postman grew, it became difficult for everyone to understand and, more importantly, trust our data.

We were creating dashboards and visualizations based on requests from across the company: whatever people needed, we designed. However, the metrics on these dashboards often overlapped, so we had inadvertently created different versions of the same metrics. When it wasn’t clear how each metric was different, we lost people’s trust. (And as the saying goes: building trust is hard, but losing it is easy. It just takes one mistake.)

The data team’s Slack channel was filling up with questions from other teams asking us things like “Where is this data?” and “What data should I use?”

Our experienced data analysts spent hours each week fielding these questions from new hires. Of course, this was frustrating for all involved. But we also realized there was a larger problem: it would be a disaster if any of our analysts left the company, since so much knowledge was stored in their heads.

In our data team’s sprint retrospectives, we realized that Postman’s data system needed help, so we embarked on a project to democratize our data and fix discoverability. Our goal was to create more time for our team and more trust within the company.

Solution #1: Documenting our data with Confluence

Instead of embarking on a massive overhaul of our data system, we decided to start smaller, implement a solution quickly, and see what we could learn from it. Postman was already using Atlassian, so we started by creating a Confluence doc.

Before, all of our data questions and answers were stored in Slack. Slack can be hard to navigate and search, so people were asking the same questions over and over. It’s easy enough to answer one or two questions on Slack, but 20 or 100? It’s just not scalable.

Going forward, our goal was to make our new Confluence doc a single, searchable source of truth.

Whenever something came up multiple times on Slack, we put it on Confluence. For example, when someone asked, “How do you calculate MAU?” we added the table and calculations to the doc. When multiple people asked us for the same metrics, we also added those stats and charts.
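
As an illustration of the kind of calculation such a doc might record, here is a minimal Python sketch. It assumes one common definition of MAU (distinct users with at least one event in a calendar month); the event data, the `is_internal` flag, and the function name are hypothetical and are not Postman’s actual schema or formula. It also shows how a single undocumented filter can make two “MAU” numbers disagree.

```python
from datetime import date

# Hypothetical activity events: (user_id, activity_date, is_internal_user).
events = [
    ("u1", date(2021, 3, 2), False),
    ("u1", date(2021, 3, 15), False),  # same user, counted once
    ("u2", date(2021, 3, 9), False),
    ("u3", date(2021, 3, 20), True),   # an internal/test account
    ("u4", date(2021, 4, 1), False),   # active in a different month
]


def monthly_active_users(events, year, month, include_internal=True):
    """Count distinct users with at least one event in the given calendar month."""
    users = {
        user_id
        for user_id, day, is_internal in events
        if day.year == year
        and day.month == month
        and (include_internal or not is_internal)
    }
    return len(users)


# Two plausible "MAU" figures for March 2021 that differ only by one filter.
mau_all = monthly_active_users(events, 2021, 3)
mau_external = monthly_active_users(events, 2021, 3, include_internal=False)
print(mau_all, mau_external)
```

Writing down both the source table and the exact filters next to the number is what lets a reader tell which of the two figures they are looking at.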

Solution #2: Creating a data dictionary with Google Sheets

Our Confluence doc was a good start, but like Slack, a single doc just couldn’t scale as quickly as we were. Our next idea was to create a data dictionary in Google Sheets.

This seemed fairly simple. We first got all our table, schema, and column names in one place. Then, for a few sprints, we assigned everyone in our data team to document five tables each. Each person set aside a couple of hours to write down everything they knew about their data tables in the Google Sheet.

We also included reviews in this process. After each person documented their tables, someone else in the data team would read through their work. If it seemed clear, they’d say it was good to go.
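
The document-then-review process described above can be sketched roughly as follows. This is a minimal illustration in Python, not the sheet we actually used: the schema, table, and column names are made up, and the `owner` and `reviewed_by` fields are assumed columns for tracking who documented and who signed off on each entry.

```python
import csv
import io

# Hypothetical (schema, table, column) names, e.g. exported from a warehouse's
# metadata. These are illustrative, not real Postman tables.
columns = [
    ("analytics", "users", "user_id"),
    ("analytics", "users", "signup_date"),
    ("analytics", "events", "user_id"),
    ("analytics", "events", "event_time"),
]

FIELDS = ["schema", "table", "column", "description", "owner", "reviewed_by"]


def build_dictionary_rows(columns):
    """Return one blank dictionary entry per column, sorted for stable output."""
    return [
        {"schema": s, "table": t, "column": c,
         "description": "", "owner": "", "reviewed_by": ""}
        for s, t, c in sorted(columns)
    ]


def undocumented(rows):
    """List entries that still lack a description or a reviewer sign-off."""
    return [r for r in rows if not r["description"] or not r["reviewed_by"]]


rows = build_dictionary_rows(columns)

# One team member documents an entry and a second person reviews it.
rows[0]["description"] = "Timestamp at which the event was recorded"
rows[0]["owner"] = "analyst_a"
rows[0]["reviewed_by"] = "analyst_b"

# Serialize to CSV text that could be imported into a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(len(undocumented(rows)), "entries still need documentation or review")
```

A helper like `undocumented` makes the remaining work visible, but it cannot judge whether a description is accurate, which is the gap the review step was meant to cover.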

It was a good idea, but we ran into challenges executing it:

  • Low-quality documentation: At the time, our data team had nearly 20 people in it, but only three or four of them had been with Postman for more than a year. These veteran team members couldn’t document all of our data, so everyone chipped in. However, some of the people who were documenting our data didn’t actually know much about it. They hadn’t set up the data, and they weren’t the owners of the data tables. Our newer team members would add whatever they understood, but it didn’t always give a complete picture of the data table.
  • Trouble with scale: We had nearly 20 data team members trying to work on the doc. With that many people writing, editing, and commenting at the same time, it quickly became too much to handle in a Google Sheet. And that was just the data team. We wanted to eventually open the data dictionary to the entire company, but we couldn’t figure out how to keep our documentation secure and tamper-proof with hundreds of users.

Solution #3: Implementing a pre-built data workspace with Atlan

After trying to build our own solution twice, we started to look for a pre-existing product that we could adopt. That’s when we found Atlan, a modern data workspace, which seemed like a clear solution for our data-discovery problems.

On Atlan, we’ve been able to catalog and document all of our data, and its catalog acts as a single source of truth for our data. The catalog includes multiple levels of permissions for different types of users inside and outside of the data team, so everyone can search for and access data without having to message the data team.

The result? Everyone is able to find the right data for their use case, and the data is consistent for everyone who accesses it. The clearest outcome is that everyone is finally talking about the same numbers, which helps us rebuild trust in our data. If someone says that our growth is 5%, it’s 5%.

Our new data workspace has been a success for us because of its clean interface and powerful functionality: documentation, ownership and usage information, and data discovery.

Moving beyond data discovery

At Postman, we needed to address issues around data discovery and context because they became major problems as we scaled. But as we’ve solved these problems, we’ve realized that we inadvertently set ourselves up to tackle bigger data challenges.

For example, now that we’ve set up a system to track our data, we can use it to understand our data lineage, including where each data asset comes from and how the assets are all connected to each other.

Data lineage can be really useful for a couple of reasons. First, understanding how our data is connected helps us solve our daily bugs and issues faster.

Second, we’ve found that having a single source of truth for our data helps our data team as we grow. Every time we change something or add something new, it’s important to check how it will affect everything else in our data system. Instead of posting a question on Slack, we can check data lineage and find everything we need to change or update.
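
To make the impact check concrete, here is a minimal Python sketch of walking a lineage graph to find everything downstream of a changed asset. It is an illustration under stated assumptions, not how Atlan implements lineage: the asset names and the `lineage` mapping are hypothetical, and the graph is assumed to be acyclic.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets built directly from it.
lineage = {
    "raw.events": ["staging.events_clean"],
    "staging.events_clean": ["marts.mau", "marts.sessions"],
    "marts.mau": ["dashboard.growth"],
    "marts.sessions": ["dashboard.growth", "dashboard.engagement"],
}


def downstream_assets(lineage, start):
    """Breadth-first walk returning every asset that depends on `start`.

    Assets are listed in the order they are first reached, nearest first.
    """
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that may need a check after changing the cleaned events table.
impacted = downstream_assets(lineage, "staging.events_clean")
print(impacted)
```

The breadth-first order puts the directly dependent tables ahead of the dashboards built on them, which is usually the order in which you would want to verify a change.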

Lineage is just one avenue that we’ve seen open up as we’ve improved the way we document and catalog our data. We’re also taking steps toward maintaining more consistent data descriptions across tools, improving our data quality, and more.

In the long run, we see work on improving data discovery as a foundation for democratizing data across the company. Having a reliable data foundation, where people can find and understand all our data, opens the possibility of having everyone participate in analyzing data. This will enable our entire company to become more data-aware and data-driven, which is the goal for any leading company today.

What we learned from the last year

As I think back on our efforts to improve our data stack, a few learnings stick out, both things that we managed to get right the first time around and things that I wish we had done differently:

  • Start small and build to bigger solutions: When you’re taking on a big data challenge, it’s easy to jump to thinking about equally big solutions that take a lot of resources or money. However, starting with smaller, quicker solutions helped us understand more about what works and doesn’t work for us. Then, when we ventured into a more comprehensive, paid solution, we knew exactly what features we were looking for.
  • Pay attention to scale: Our first two solutions were solid ideas, but they didn’t keep up with our team and data as we scaled. That’s why it’s important to think about how a data product will scale as your company and data scale. Will it keep up if your data grows by 100x in a year? Can it handle a large number of users, all with different needs and access levels?
  • One improvement unlocks others: Fixing one part of your data stack lays the foundation for bigger improvements. Data cataloging and documentation aren’t particularly “fun,” but they’re essential for anything else you may want to do with your data. After all, you can’t tackle ML, AI, and the other latest data buzzwords if you don’t understand your data.

This article was originally published on the Postman Blog. The author, Prudhvi Vasa, is the Analytics Chief at Postman.
