Data Lake
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
The term data lake is often associated with Hadoop-oriented object storage. In such a scenario, an organization's data is first loaded into the Hadoop platform, and business analytics and data mining tools are then applied to the data where it resides, on the commodity computers that make up the Hadoop cluster's nodes. Like big data, the term data lake is sometimes disparaged as simply a marketing label for a product that supports Hadoop. Increasingly, however, the term is accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried.
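The description above (flat storage, a unique identifier per element, metadata tags, and a schema applied only when the data is queried) can be illustrated with a small in-memory sketch. It is not how Hadoop or any particular product works; the `DataLake` class, its `put`/`query`/`read` methods, and the tag names are hypothetical and chosen only for illustration.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Hypothetical in-memory sketch: a flat map of id -> (raw bytes, tags)."""
    objects: dict = field(default_factory=dict)

    def put(self, raw: bytes, **tags: str) -> str:
        # Each element gets a unique identifier; no folder hierarchy is used.
        object_id = str(uuid.uuid4())
        self.objects[object_id] = (raw, dict(tags))
        return object_id

    def query(self, **wanted: str) -> list:
        # Return the IDs of elements whose metadata tags match every requested tag.
        return [
            oid for oid, (_, tags) in self.objects.items()
            if all(tags.get(k) == v for k, v in wanted.items())
        ]

    def read(self, object_id: str, parse) -> object:
        # Schema-on-read: raw bytes are interpreted only when they are read.
        raw, _ = self.objects[object_id]
        return parse(raw)


# Raw data in different native formats lands in the same flat store.
lake = DataLake()
lake.put(b'{"order": 1, "total": 9.5}', source="web", kind="order")
lake.put(b"2019-03-12,sensor-7,21.4", source="iot", kind="reading")
lake.put(b'{"order": 2, "total": 20.0}', source="web", kind="order")

# A business question arises: query by tag, then parse only the relevant subset.
order_ids = lake.query(kind="order")
totals = [lake.read(oid, json.loads)["total"] for oid in order_ids]
print(sum(totals))  # 29.5
```

Note that nothing about the JSON structure is declared when the orders are stored; the caller decides how to parse them (here with `json.loads`) at query time, which is the "schema not defined until the data is queried" idea in miniature.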
Data lake vs. data warehouse
Data lakes and data warehouses are both used for storing big data, but each approach has its own uses. Typically, a data warehouse is a relational database housed on an enterprise mainframe server or in the cloud. The data stored in a warehouse is extracted from various online transaction processing (OLTP) applications to support business analytics (BA) queries and data marts for specific internal business groups, such as sales or inventory teams.
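The warehouse load path described here, where data is extracted from OLTP applications into a relational table with a predefined schema and then queried for business analytics, can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. The table layout, column names, and sample rows are hypothetical; a production warehouse would use a dedicated database server or cloud service and a real ETL pipeline rather than an in-memory SQLite database.

```python
import sqlite3

# Hypothetical rows as they might come out of an OLTP order-entry application.
oltp_orders = [
    {"id": 1, "region": "east", "amount_cents": 950},
    {"id": 2, "region": "west", "amount_cents": 2000},
    {"id": 3, "region": "east", "amount_cents": 1250},
]

# The warehouse schema is defined before any data is loaded (schema-on-write).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)

# Extract and transform: convert cents to currency units to fit the schema.
rows = [(o["id"], o["region"], o["amount_cents"] / 100) for o in oltp_orders]

# Load the transformed rows into the relational table.
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
warehouse.commit()

# A typical BA query, e.g. feeding a sales data mart: revenue per region.
totals = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
)
print(totals)  # {'east': 22.0, 'west': 20.0}
```

In contrast to a data lake, a record that does not fit the `sales` table (for example, one missing a region) would be rejected at load time by the `NOT NULL` constraint rather than stored as-is.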
Data warehouses are useful when there is a massive amount of data from operational systems that needs to be readily available for analysis. Because the data in a lake is often uncurated and can originate from sources outside of the company's operational systems, lakes are not a good fit for the average business analytics user.
March is the official month for:
National Breast Implant Awareness Month
Asset Management Awareness Month
Endometriosis Awareness Month
Irish-American Heritage Month
Multiple Sclerosis Awareness Month
National Caffeine Awareness Month
National Brain Injury Awareness Month
National Celery Month
National Cerebral Palsy Awareness Month
National Cheerleading Safety Month
National Craft Month
National Credit Education Month
National Flour Month
National Frozen Food Month
National Kidney Month
National Noodle Month
National Nutrition Month
National Peanut Month
National Sauce Month
National Trisomy Awareness Month
National Umbrella Month
National Women’s History Month
National Colorectal Cancer Awareness Month
National Music in Our Schools Month
National Social Work Month
www.amazon.com/author/paulbabicki
Tuesday, March 12, 2019
Tabula Rosa Systems
Technical Term - Data Lake