getindata/kedro-snowflake

Kedro Snowflake Pipelines plugin

About

This plugin lets you run full Kedro pipelines in Snowflake. It currently supports:

  • Kedro starter, to get you up to speed fast
  • automatically creating Snowflake Stored Procedures from Kedro nodes (using Snowpark SDK)
  • translating a Kedro pipeline into a Snowflake task graph
  • running a Kedro pipeline fully within Snowflake, without any external system
  • using Kedro's official SnowparkTableDataSet
  • automatically storing intermediate data as Transient Tables (if Snowpark's DataFrames are used)
  • (New!) MLflow integration with Snowflake with examples in Snowflights Kedro starter
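
To illustrate what "translating a Kedro pipeline into a Snowflake task graph" can mean, here is a minimal sketch. It is not the plugin's actual implementation: the function name `task_statements`, the `_PROC` suffix for stored procedures, and the naming scheme `<prefix>_<node>` are assumptions made for this example. It only relies on Snowflake's `CREATE TASK ... AFTER ...` syntax, where `AFTER` lists the tasks that must finish first, and it assumes the dependency graph is acyclic, as Kedro pipelines are.

```python
from typing import Dict, List


def task_statements(
    dependencies: Dict[str, List[str]], prefix: str, warehouse: str
) -> List[str]:
    """Return one CREATE TASK statement per node, parents before children.

    `dependencies` maps a node name to the names of the nodes it depends on.
    The naming scheme and the *_PROC procedure names are hypothetical.
    """
    ordered: List[str] = []
    visited = set()

    def visit(node: str) -> None:
        # Depth-first walk: emit every parent before the node itself.
        if node in visited:
            return
        visited.add(node)
        for parent in sorted(dependencies.get(node, [])):
            visit(parent)
        ordered.append(node)

    for node in sorted(dependencies):
        visit(node)

    statements = []
    for node in ordered:
        name = f"{prefix}_{node}".upper()
        parents = sorted(dependencies.get(node, []))
        after = ""
        if parents:
            # AFTER names the predecessor tasks in the task graph.
            after = " AFTER " + ", ".join(f"{prefix}_{p}".upper() for p in parents)
        statements.append(
            f"CREATE OR REPLACE TASK {name} WAREHOUSE = {warehouse}{after} "
            f"AS CALL {name}_PROC()"
        )
    return statements


if __name__ == "__main__":
    example = {
        "preprocess": [],
        "train": ["preprocess"],
        "evaluate": ["train", "preprocess"],
    }
    for statement in task_statements(example, prefix="default", warehouse="COMPUTE_WH"):
        print(statement)
```

The `prefix` argument mirrors the idea of a pipeline name used as a Snowflake task prefix; the real plugin may name tasks and procedures differently.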

Documentation

For detailed documentation refer to https://kedro-snowflake.readthedocs.io/

Usage

With starter

  1. Install the plugin

    pip install "kedro-snowflake>=0.1.0" 
  2. Create a new project with our Kedro starter ❄️ Snowflights 🚀:

    kedro new --starter=snowflights --checkout=master
    Then answer the interactive prompts:
    Project Name
    ============
    Please enter a human readable name for your new project.
    Spaces, hyphens, and underscores are allowed.
     [Snowflights]: 
    
    Snowflake Account
    =================
    Please enter the name of your Snowflake account.
    This is the part of the URL before .snowflakecomputing.com
     []: abc-123
    
    Snowflake User
    ==============
    Please enter the name of your Snowflake user.
     []: user2137
    
    Snowflake Warehouse
    ===================
    Please enter the name of your Snowflake warehouse.
     []: compute-wh
    
    Snowflake Database
    ==================
    Please enter the name of your Snowflake database.
     [DEMO]: 
    
    Snowflake Schema
    ================
    Please enter the name of your Snowflake schema.
     [DEMO]: 
    
    Snowflake Password Environment Variable
    =======================================
    Please enter the name of the environment variable that contains your Snowflake password.
    Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
     [SNOWFLAKE_PASSWORD]:       
    
    Pipeline Name Used As A Snowflake Task Prefix
    =============================================
    
     [default]:
    
    Enable Mlflow Integration (See Documentation For The Configuration Instructions)
    ================================================================================
    
     [False]: 
    
    The project name 'Snowflights' has been applied to: 
    - The project title in /tmp/snowflights/README.md
    - The folder created for your project in /tmp/snowflights
    - The project's python package in /tmp/snowflights/src/snowflights
    
  3. Run the project

    cd snowflights
    kedro snowflake run --wait-for-completion
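
The starter prompts ask for an account name (the part of the URL before .snowflakecomputing.com) and for the name of an environment variable holding the password. The sketch below shows how those answers could be turned into connection parameters. The helper names `account_from_url` and `connection_parameters` are hypothetical and are not part of the plugin; the dictionary keys follow the ones Snowpark's `Session.builder.configs(...)` accepts, but the configuration the plugin itself generates may look different.

```python
import os
from typing import Dict
from urllib.parse import urlparse

SUFFIX = ".snowflakecomputing.com"


def account_from_url(url: str) -> str:
    """Extract the account name: the host part before .snowflakecomputing.com."""
    # urlparse only fills `hostname` when a scheme is present; fall back to raw input.
    host = urlparse(url).hostname or url
    if not host.endswith(SUFFIX):
        raise ValueError(f"not a Snowflake URL: {url!r}")
    return host[: -len(SUFFIX)]


def connection_parameters(
    account: str,
    user: str,
    warehouse: str,
    database: str = "DEMO",
    schema: str = "DEMO",
    password_env: str = "SNOWFLAKE_PASSWORD",
) -> Dict[str, str]:
    """Build a parameter dict, reading the password from an environment variable."""
    password = os.environ.get(password_env)
    if password is None:
        raise RuntimeError(f"environment variable {password_env} is not set")
    # Assumption: this dict could be handed to
    # snowflake.snowpark.Session.builder.configs(params).create()
    return {
        "account": account,
        "user": user,
        "password": password,
        "warehouse": warehouse,
        "database": database,
        "schema": schema,
    }
```

For example, `account_from_url("https://abc-123.snowflakecomputing.com")` yields the same `abc-123` value entered at the Snowflake Account prompt. Keeping the password in an environment variable means it never has to be written into the project files.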

In an existing Kedro project

  1. Install the plugin
    pip install "kedro-snowflake>=0.1.0" 
  2. Initialize the plugin
    kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>
  3. Run the project
    kedro snowflake run --wait-for-completion
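
Because `kedro snowflake init` takes six positional arguments, their order is easy to mix up when scripting the setup. The following sketch assembles the command with keyword arguments and keeps the order shown above. `build_init_command` is a hypothetical helper, and it assumes that `<PASSWORD_FROM_ENV>` is the name of the environment variable holding the password, as in the starter prompts.

```python
import shlex
from typing import List


def build_init_command(
    *,
    account: str,
    user: str,
    password_env: str,
    database: str,
    schema: str,
    warehouse: str,
) -> List[str]:
    """Return the argv list for `kedro snowflake init` in its positional order."""
    return [
        "kedro", "snowflake", "init",
        account, user, password_env, database, schema, warehouse,
    ]


if __name__ == "__main__":
    command = build_init_command(
        account="abc-123",
        user="user2137",
        password_env="SNOWFLAKE_PASSWORD",
        database="DEMO",
        schema="DEMO",
        warehouse="compute-wh",
    )
    # Print a shell-safe rendering; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```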

Kedro pipeline in Snowflake Tasks

[Image: Kedro Snowflake Plugin]

Execution:

[Image: Kedro Snowflake Plugin CLI]