02/02/2021
Data warehouse: a foundation for business intelligence
A data warehouse is a repository that stores current and historical data from disparate sources. It’s a key component of a data analytics architecture that creates an environment for decision support, analytics, business intelligence, and data mining.
A data warehouse holds data from multiple sources, including internal databases and SaaS platforms. After the data has been loaded, it can be cleansed, transformed, catalogued, and checked for quality. It is then ready for analytics dashboards, reporting, machine learning, and other analytical workloads.
Historically, businesses used ETL (extract, transform, load) tools to pipe data into expensive on-premises data warehouse systems. Because these systems had limited capacity, business users needed to do as much preparation work as possible before loading data into them.
Today, cloud-based data warehouses, including Amazon Redshift, Microsoft Azure SQL Data Warehouse (since rebranded as Azure Synapse Analytics), Google BigQuery, and Snowflake, offer flexible infrastructure whose processing and storage capacity can scale quickly to match an organization’s data needs. As a result, more and more organizations skip preload transformations. Instead, they load raw data first and run transformations inside the warehouse afterward, a pattern known as ELT (extract, load, transform). Because the raw data stays in the warehouse, business users can transform it at any time for any particular use case.
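The ELT pattern can be sketched with Python’s built-in sqlite3 module standing in for a cloud warehouse. This is a minimal illustration, not how any particular product works. The `raw_orders` feed, the table names, the sample rows, and the cleansing rules are all hypothetical. A real warehouse such as Redshift, BigQuery, or Snowflake would run similar SQL at far larger scale.

```python
import sqlite3

# Hypothetical raw records extracted from a source system (illustrative data only).
RAW_ORDERS = [
    ("1001", " alice@example.com ", "19.99", "2021-01-15"),
    ("1002", "BOB@EXAMPLE.COM", "5.00", "2021-01-16"),
    ("1003", None, "12.50", "2021-01-16"),
]


def load_raw(conn: sqlite3.Connection) -> None:
    """Extract + Load: land the records as-is, storing every column as text."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, email TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform_in_warehouse(conn: sqlite3.Connection) -> None:
    """Transform: cleanse and type the data with SQL inside the database, after loading."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(email))        AS email,
               CAST(amount AS REAL)      AS amount,
               DATE(order_date)          AS order_date
        FROM raw_orders
        WHERE email IS NOT NULL          -- simple quality rule: drop rows missing an email
        """
    )


conn = sqlite3.connect(":memory:")
load_raw(conn)
transform_in_warehouse(conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

The sketch keeps `raw_orders` after the transformation runs. That is the point of ELT: another team could later build a different table from the same raw data without re-extracting it from the source.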
Data warehouses vs. data lakes vs. data marts
Although a data warehouse is an effective and useful way to store data for business analytics, it’s best suited for structured data defined by a schema.
By contrast, a data lake can hold both structured and unstructured data, so in addition to sources defined by schemas, it can hold raw data such as log files, internet clickstream records, images, or social media posts.
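One way to picture this difference is often called schema-on-read. A data lake accepts raw records without enforcing a schema, and structure is imposed only when someone reads the data. A warehouse, by contrast, checks data against its schema when the data is written. The Python sketch below illustrates the idea with hypothetical clickstream events stored as JSON strings. The event fields and the `read_page_views` helper are invented for the example and do not come from any specific data lake product.

```python
import json

# Hypothetical raw clickstream events as they might land in a data lake:
# stored as-is, with no schema enforced at write time.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ts": "2021-02-01T10:00:00"}',
    '{"user": "u2", "page": "/pricing", "ts": "2021-02-01T10:05:00", "referrer": "ad"}',
    "not valid json",
]


def read_page_views(lines):
    """Schema-on-read: parse each raw line and keep only the fields this analysis needs."""
    views = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable records are skipped at read time, not rejected at write time
        views.append((event.get("user"), event.get("page")))
    return views


print(read_page_views(RAW_EVENTS))
```

The second event carries an extra `referrer` field and the third is not JSON at all. Both were stored without complaint. This particular reader simply ignores the extra field and skips the malformed line.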
A data mart is similar to a data warehouse, but holds data for one specific department or line of business, such as sales or finance. A data warehouse can feed data to a data mart, or a data mart can feed a data warehouse.
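The warehouse-to-mart direction can be sketched in a few lines of SQL, again using Python’s sqlite3 module. To keep the example small, the warehouse table and the mart live in one in-memory database. In practice a mart might be a separate schema or system. The `transactions` and `sales_mart` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide table in the warehouse (illustrative data only).
conn.execute("CREATE TABLE transactions (dept TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("sales", "east", 100.0),
        ("sales", "west", 250.0),
        ("finance", "east", 75.0),
        ("sales", "east", 50.0),
    ],
)

# Warehouse -> mart: build a smaller, sales-only table summarized for that department.
conn.execute(
    """
    CREATE TABLE sales_mart AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS num_transactions
    FROM transactions
    WHERE dept = 'sales'
    GROUP BY region
    """
)

print(conn.execute("SELECT * FROM sales_mart ORDER BY region").fetchall())
```

The sales team queries only `sales_mart`, which holds just its department’s rows, pre-aggregated by region. The full `transactions` table remains in the warehouse for other uses.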
Data warehouses, data lakes, and data marts serve different roles, and many businesses use all three side by side.