Unleashing Cloud Insights with Steampipe

Tanawit Aeabsakul

Follow

Published in

Life@LINE MAN Wongnai

6 min readApr 9, 2024

--

In the rapid evolution of our system, keeping pace with its growth and complexity presents an ongoing challenge. One of the most pressing issues is automating the comprehensive understanding of our entire infrastructure to ensure standardized operations. This task is not just important but unavoidable.

Imagine if there were a way to unify all this data, simplifying its management and setting the stage for expansion into other critical areas like compliance, security, cost optimization, and automated reporting and alerting.

In this article, we delve into the complexities of infrastructure management, exploring the challenges and offering insights into a solution that can transform the way you manage your infrastructure by using Steampipe!

Hello everyone, My name is Tanawit Aeabsakul. I am currently a Senior Software Engineer of the platform team at LINE MAN Wongnai (LMWN). Today I will share my experience with using Steampipe, which is a tool that help us solve these challenges and make understanding cloud infrastructure easier. Let’s dive in and see how Steampipe can help us solve this problem.

First of all, what is a Steampipe? 🤔

Steampipe is a tool that allows you to query, organize, and access data from different sources using SQL. It essentially acts as a universal data interface, making it easier to work with diverse data sets.

Imagine you have a bunch of data sources — like cloud providers, databases, or APIs — all with different ways to access and query them. Steampipe lets you use a familiar language, SQL, to connect to and query these sources in a standardized way.

Instead of learning new querying languages or dealing with different interfaces for each source, you can use Steampipe to write SQL queries across all your data sources. This makes managing and analyzing your data more efficient and consistent, regardless of where it’s stored or how it’s accessed.

Steampipe interactive client — Photo from steampipe.io

How Steampipe work? ⚙️

Steampipe leverages several key technologies and architectural concepts to provide its functionality

Steampipe Overview Architecture — Photo from steampipe.io

PostgreSQL’s FDW

Steampipe leverages PostgreSQL’s FDW to connect and query external data sources. FDW allows PostgreSQL to access data stored outside the database, treating them like local tables. Steampipe uses this feature to connect to various data sources, translating SQL queries into source-specific queries. Therefore, from the client’s point of view, Steampipe is just a postgres database.

Plugin Architecture

Steampipe’s Plugin Architecture allows for the creation of plugins to connect to different data sources. Each plugin translates SQL queries into the specific syntax required by the data source. Plugins can also define additional functions or capabilities to enhance query capabilities.

Caching and Optimization

Steampipe uses caching to improve query performance and reduce the load on external data sources. It caches query results to avoid repeated queries for the same data. Additionally, Steampipe optimizes queries by pushing filters and aggregations down to the datasource when possible, reducing the amount of data transferred.

Execution Flow

When a query is submitted, Steampipe’s core engine analyzes it to determine which plugin to use. It then passes the query to the appropriate plugin, which translates it into the source-specific query language. The plugin executes the query against the data source and returns the results to Steampipe, which then presents them to the user.

What have we tried to use it for? 🧪

1. Dependency vulnerability compliance

Example result from Dependency vulnerability compliance query

We utilize Steampipe to assess the number of vulnerable dependencies in each project, examining their behavior and vulnerability risk levels to maintain software security and protection. Any potential exploits or violations can be identified and addressed by the responsible team.

2. Open source license compliance

Example result from Open source license compliance query

We leverage Steampipe to verify the open-source licenses of dependencies in each project, ensuring compliance with licensing requirements and preventing legal issues related to open-source software use. Any non-compliant instances can be assigned to the responsible team for resolution.

3. Try using ready-made mods that are ready to use, such as

The AWS Insights Mod provides interactive dashboards for visualizing cloud intelligence and security metrics. It assists in reporting on AWS resource configurations, visualizing relationships, and aggregating metrics to enhance understanding of your cloud infrastructure. It can help answer questions such as:

What are the relationships between this resource and others?
Is this resource publicly accessible?
Is encryption enabled, and which keys are used for encryption?
Is versioning enabled?
What networking ingress and egress rules are associated with this resource?

AWS Insights Mod — Photo from https://github.com/turbot/steampipe-mod-aws-insights

So how do things that aren’t ready-made like Dependency vulnerability compliance and Open source license compliance work? 🔍

Both of them work mainly using these 3 plugins.

Gitlab plugin: Contains all Gitlab project information.
Google Sheet plugin: Provides information about the relationship between Gitlab projects and the teams managing them.
Dependency Track plugin: Offers information about dependencies used in each project

While Gitlab and Google Sheets plugins are readily available from the Steampipe hub, the Dependency track plugin is one we developed in-house.

The function of the Dependency track plugin is to fetch data from a system called Dependency-track (Dependency-Track is an intelligent Component Analysis platform that allows organizations to identify and reduce risk in the software supply chain).

Dependency track dashboard — Photo from github.com/DependencyTrack/dependency-track

In our Dependency-track there will be information stored on what dependencies each Gitlab project has as well as details of each Dependency such as version being used, license, risk score, vulnerabilities (Some people may wonder how this information in each project gets to the Dependency track. Let me summarize it briefly: Every day there is a cron job that scans all recently changed projects to find any dependencies in each project and uploads those dependencies [called SBOM] in CycloneDX format to the Dependency-track).

Dependency track integration — Photo from github.com/DependencyTrack/dependency-track

When we have all the information We can join these data together to create a Dashboard or Alert as desired.

As mentioned earlier, “From the client’s point of view, Steampipe is just a PostgreSQL database” which means we can use anything with a PostgreSQL driver to connect to it. Just like we use Grafana to connect with Steampipe to make a dashboard or alert.

Dependency vulnerability compliance panel in our Grafana dashboard

Open source license compliance panel in our Grafana dashboard

Notification of dependency vulnerabilities compliance in our Slack monitoring channel

Conclusion

These are just a few of the things we’ve started with Steampipe, and I think there’s a lot more we can do to keep our eyes and ears open so we can make the product exceed our intended standards. If you have an interesting idea or are facing a problem that you think Steampipe can help with, don’t hesitate to share it with us and If you’re interested in tackling challenging problems and making a broad impact, join us on this journey at LINE MAN Wongnai 👉 https://careers.lmwn.com/development.

Thank you for spending your valuable time reading this article. Goodbye 👋