Blog

Writing from our team. The latest news, insights, and resources.

How to earn rewards by sharing the knowledge!

Referring a friend to something you genuinely believe in is one of the simplest yet most powerful ways to create opportunities. With that in mind, we’re excited to introduce the Data Engineer Academy Referral Program, a way to reward you for sharing the benefits of industry-leading data engineering training with the people you know. We designed...

By: Chris Garzon | November 25, 2024 | 8 mins read

How to host a website on AWS EC2

In today’s digital world, both individuals and businesses need a reliable website, and choosing a trustworthy hosting provider is an important step in building one. Amazon Web Services (AWS) EC2 provides a robust, scalable infrastructure for hosting websites, making it a strong option for your hosting needs. Step-by-step instructions for how to host...

By: Ninad Magdum | June 17, 2023 | 13 mins read

Data Governance in the Cloud: Lake Formation, Unity Catalog, Purview, and Snowflake Horizon

Cloud data governance is the mix of access rules, metadata, classification, lineage, and audit controls that keep cloud data safe and usable. Lake Formation, Unity Catalog, Microsoft Purview, and Snowflake Horizon all help with that job, but they solve it from different starting points. The best choice usually depends on where your data already lives...

By: Chris Garzon | June 8, 2026 | 9 mins read

BigQuery Cost Guardrails for Data Engineers: Slots, Partitions, and Query Limits

Data engineers can control BigQuery spend by combining slot management, partition pruning, and query limits. That’s the core of BigQuery cost optimization. The goal isn’t to block useful analysis; it’s to stop surprise bills, runaway joins, and scheduled jobs that read far more data than expected. Modern teams run ad hoc SQL, BI refreshes, dbt...

By: Chris Garzon | June 8, 2026 | 8 mins read

Microsoft Fabric vs Synapse for Data Engineers in 2026

Microsoft Fabric is usually the better choice for new data engineering projects in 2026. Synapse still makes sense when you already have Azure SQL pools, pipelines, and permissions working in production. In the Microsoft Fabric vs Synapse decision, the real tradeoffs are speed, cost control, governance, and how much legacy work your team can safely...

By: Chris Garzon | June 7, 2026 | 9 mins read

Snowflake Dynamic Tables Explained for Data Engineers

Snowflake dynamic tables are a serverless way to keep transformed data fresh without wiring up a pile of scheduled jobs. You define the result you want, set a freshness target, and Snowflake manages refreshes for you. For data engineers, Snowflake dynamic tables mean simpler pipelines, less orchestration, and cleaner incremental updates. That matters when you’re...

By: Chris Garzon | June 7, 2026 | 9 mins read

AWS Glue vs Lambda vs Step Functions for ETL: Which Should You Use?

AWS Glue is best for large batch ETL. Lambda is best for small event-driven transforms. Step Functions is best when your pipeline has many steps, retries, or branches. If you’re comparing AWS Glue, Lambda, and Step Functions for ETL, the right choice comes down to data size, workflow complexity, cost, and how much control your...

By: Chris Garzon | June 6, 2026 | 8 mins read

Databricks vs Snowflake for Data Engineers: Jobs, Cost, and Architecture

In the Databricks vs Snowflake choice, Databricks usually wins for raw data pipelines, Spark-heavy processing, and machine learning support. Snowflake often wins for fast SQL analytics, cleaner warehouse workflows, and lower day-to-day platform effort. That doesn’t make one “better” in every case. The right pick depends on the jobs your team handles, how your data...

By: Chris Garzon | June 5, 2026 | 8 mins read

Semantic Layer for Data Engineers: Metrics, Models, and BI Consistency

A semantic layer in data engineering gives you one shared place to define metrics, business logic, and model relationships, so BI tools show the same numbers everywhere. When two dashboards disagree on revenue or active users, the problem is usually not the chart. The problem is inconsistent logic. That mismatch creates copied SQL, repeated reviews,...

By: Chris Garzon | June 4, 2026 | 9 mins read

Partitioning and Clustering in Warehouses: Performance Without Guesswork

Partitioning and clustering help a warehouse scan less data, which usually means faster queries and lower cost. In plain terms, warehouse partitioning and clustering are table layout choices that improve pruning, not magic fixes for bad SQL or weak models. That matters when dashboards slow down, fact tables keep growing, and cloud bills rise with...

By: Chris Garzon | June 4, 2026 | 9 mins read

SQL MERGE for Data Engineers: Upserts, CDC, and Idempotent Pipelines

SQL MERGE matches incoming rows to existing rows and then updates, inserts, or deletes them in one statement. Data engineers use it to write upsert logic, finish CDC loads, and make repeat runs safe. In data engineering, SQL MERGE helps you keep warehouse tables current without chaining together separate update and insert jobs. It also...

By: Chris Garzon | June 3, 2026 | 10 mins read

Data Quality Tests in SQL: Nulls, Duplicates, Ranges, and Referential Integrity

Data quality tests in SQL help you catch bad rows early, before they break dashboards, audits, or machine learning work. The four checks that matter most are nulls, duplicates, range rules, and referential integrity. They work well in Snowflake, BigQuery, Redshift, and Postgres because the logic stays close to the tables. One null customer_id can...

By: Chris Garzon | June 2, 2026 | 9 mins read

Slowly Changing Dimensions Type 2 with SQL and dbt

A slowly changing dimension type 2 keeps the full history of a dimension row. When a tracked value changes, you close the old row and insert a new one instead of overwriting the past. That matters in analytics because you often need to know what was true on a given date. SQL handles the change...

By: Chris Garzon | June 1, 2026 | 9 mins read

Incremental Data Models in dbt: Append, Merge, and Snapshot Strategies

dbt incremental models load only new or changed rows, so you don’t rebuild a full table on every run. That makes pipelines faster, lowers warehouse cost, and helps large tables stay fresh. In practice, most teams choose between three patterns: append for immutable data, merge for rows that change, and snapshots for history. The right...

By: Chris Garzon | May 31, 2026 | 9 mins read

Common Mistakes in a Snowflake Real-Time Project

Most Snowflake real-time projects fail for a simple reason: teams move too fast, skip planning, and treat streaming data like batch data with shorter timing. That works in a demo. It falls apart in production, where late events, duplicates, bad timestamps, and recovery gaps show up fast. If you’re building one of these pipelines, you...

By: Chris Garzon | May 30, 2026 | 9 mins read

CDC Pipelines Explained: Debezium, Kafka, and Warehouse MERGE Patterns

A CDC pipeline captures row changes in a source database, publishes those changes as events, and applies them to a warehouse table. Instead of reloading full tables, it moves only inserts, updates, and deletes. If you’re learning CDC pipelines for data engineering, this is one of the clearest patterns to understand because it shows how modern...

By: Chris Garzon | May 29, 2026 | 10 mins read