Magic Data Apps with Snowflake, Streamlit, and DuckDB

This pattern lets you create high-traffic apps available to the world, hosted on Streamlit Cloud. Play with and prepare your data in Snowflake and Streamlit, then serve the results in production out of static Parquet files.

Image generated by DALL-E 2, thanks Charly Wargnier

Snowflake acquired Streamlit for a good reason: Streamlit adds magic to the process of creating a data app. Connecting Streamlit to Snowflake is straightforward, and my teammate Dash Desai showed how to play with Snowpark for Python (or just use the Streamlit Snowflake connector).
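For reference, here's a minimal sketch of that direct connection, assuming you've stored your Snowflake credentials under a [snowflake] section in Streamlit's secrets (names and query are placeholders):

import streamlit as st
import snowflake.connector

# Connect using the credentials stored in Streamlit secrets
conn = snowflake.connector.connect(**st.secrets["snowflake"])

# Cache query results for 10 minutes so the page doesn't re-query on every load
@st.experimental_memo(ttl=600)
def run_query(query):
    with conn.cursor() as cur:
        cur.execute(query)
        return cur.fetchall()

st.write(run_query("select current_version()"))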

Some data apps don’t need a full-fledged database backend, only a way to quickly serve analytical results out of static, pre-optimized data. If you want to release a data app with high expected traffic, no writes, and low personalization or security requirements, this pattern might be for you.

In summary, the pattern in this post will show you how easy it is to:

  • Prepare data in Snowflake, and export it as Parquet files to cloud storage buckets.
  • Read these Parquet files in Streamlit Cloud, from your private cloud storage bucket.
  • Query these Parquet files in Streamlit, using DuckDB.

Step 1: Create a storage bucket

For this post, I chose Google Cloud Storage as it lives close to the current Streamlit Cloud servers. My Snowflake account lives in AWS, but it can easily export files to GCS. Streamlit can also read files out of S3, but GCS was my choice here.

Once we create a GCS bucket, we need to grant Snowflake write permissions and Streamlit Cloud read permissions on it. Those are the next two steps.

Step 2: Export data from Snowflake to your bucket

Creating a GCS integration in Snowflake is straightforward:

use role accountadmin;

create storage integration fhoffa_lit -- arbitrary name
  type = external_stage
  storage_provider = gcs
  enabled = true
  storage_allowed_locations = ('gcs://fhoffa-lit/');

desc storage integration fhoffa_lit;

That desc storage integration will show you the ID of a GCP service account that Snowflake creates for you. Now you only need to tell GCP to give that service account write privileges on your bucket.
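If you'd rather grab that value with SQL than scroll through the DESC output, here's a small sketch using RESULT_SCAN (for GCS integrations the property to look for is STORAGE_GCP_SERVICE_ACCOUNT):

desc storage integration fhoffa_lit;

-- Pull out just the service account email to authorize in GCP
select "property_value"
from table(result_scan(last_query_id()))
where "property" = 'STORAGE_GCP_SERVICE_ACCOUNT';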

Find your bucket and “edit access”:

Edit access on your GCS bucket

Add the id of the service account that Snowflake provisioned with “Storage Object Admin” permissions (or less if desired):

Add the id of the service account provisioned by Snowflake

With this, we can export tables or the results of any query through that storage integration in Snowflake. First create a stage and confirm that you can list the files already in the bucket:

create stage fhoffa_lit_stage
  url = 'gcs://fhoffa-lit/'
  storage_integration = fhoffa_lit;

list @fhoffa_lit_stage;

Exporting the results of a query to GCS as a Parquet file is now straightforward:

copy into @fhoffa_lit_stage/out/parquet_ 
from (
select 1 a, 2 b, 3 c
)
file_format = (type = 'parquet')
header = true;
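The same works for a whole table; here's a sketch, where my_table is a placeholder and overwrite = true replaces any previous export under that prefix:

copy into @fhoffa_lit_stage/out/my_table_
from my_table
file_format = (type = 'parquet')
header = true
overwrite = true;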

Step 3: Read in Streamlit from your bucket

Now that we have a Parquet file in GCS, let's download it from within Streamlit Cloud. First, create a service account in GCP and share its key with Streamlit Cloud using the secrets manager:

Give some secrets to Streamlit to read from GCS

I know I’ve used this word several times already, but now reading a CSV out of GCS into Streamlit is… straightforward:

import streamlit as st
from google.oauth2 import service_account
from google.cloud import storage

# Build a GCS client from the service account stored in Streamlit secrets
credentials = service_account.Credentials.from_service_account_info(
    st.secrets["gcp_service_account"]
)
client = storage.Client(credentials=credentials)

bucket = "fhoffa-lit"

# Cache the download for 10 minutes so repeat visits don't re-fetch the file
@st.experimental_memo(ttl=600)
def read_file(bucket_name, file_path):
    bucket = client.bucket(bucket_name)
    return bucket.blob(file_path).download_as_string().decode("utf-8")

content = read_file(bucket, "myfile.csv")
content
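If you'd rather show that CSV as a table than as raw text, here's a quick sketch (pandas already comes along with Streamlit, so it adds no extra requirement):

import io
import pandas as pd

# Parse the downloaded CSV text and render it as an interactive table
df = pd.read_csv(io.StringIO(content))
st.dataframe(df)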

Step 4: Read Parquet in Streamlit with DuckDB

Notice that in the example above I read a CSV — not a Parquet file. This is because a Parquet file needs more parsing.

First, instead of just reading the file, take it out of GCS and into Streamlit Cloud local storage:

# Cache the download so the Parquet file lands in local storage only once per TTL
@st.experimental_memo(ttl=60000)
def download_to_local_file(bucket_name, file_path, destination):
    bucket = client.bucket(bucket_name)
    return bucket.blob(file_path).download_to_filename(destination)

download_to_local_file(
    bucket, "out/parquet__0_0_0.snappy.parquet", "temp.parquet")

And now you can query that local parquet file with DuckDB:

import duckdb

# Query the local Parquet file with an in-memory DuckDB database
con = duckdb.connect(database=':memory:')
con.execute("SELECT * FROM read_parquet('temp.parquet')")
rows = con.fetchall()
rows
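DuckDB can also hand the results back as a pandas DataFrame, which is handier for display. A sketch that continues from the snippet above:

import streamlit as st

# Same query, fetched straight into a DataFrame and rendered as a table
df = con.execute("SELECT * FROM read_parquet('temp.parquet')").df()
st.dataframe(df)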

Requirements?

Ahh, these are the contents of the requirements.txt that Streamlit Cloud will install for you:

duckdb
google-cloud-storage

That’s it

That’s it. With this pattern you can:

  1. Analyze data in Snowflake.
  2. Tell Snowflake to export Parquet files into GCS.
  3. Have Streamlit Cloud copy the Parquet files from GCS into its local storage.
  4. Query Parquet files in Streamlit Cloud with DuckDB (on the server side).

Find the code and minimal sample app here:

If you haven’t tried Streamlit Cloud, follow this tutorial to deploy your first app.

Stay tuned to see what I’m building with this pattern!

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn. And subscribe to reddit.com/r/snowflake for the most interesting Snowflake news.
