Building an Analytics Platform

Fundamentals of Designing an Analytics Platform

Common Requirements

When developing a data pipeline, you will need to think about a number of common questions:

• Is real-time processing required?
• Do different segments of your dataset have different timeliness requirements?
• What are the durability and scalability requirements?

Apache Hadoop & HDFS

[Figure: pipeline stages FILESTORE, RAW, UNIFY, REFINE, CONSUME. Component labels: Marathon; HDFS/HBase/Oozie; Hive/Sqoop/Flume; Kafka + Spark Streaming; Spark Streaming + MLlib; SQL scripts; catalog; consumption tools; analytics; value-added data; data layer. Processing modes: Batch Processing, Managed Processing. Cross-cutting layers: Governance, Metadata Management, Consumption, Compliance.]

Requirements

• Align approximately with circles of users (ease-of-use blocks for each doc).
• Provide more real-time offerings with easier access.

Starting Point

Maintain the durability and trust of data.

Analytics-Driven Pipeline

Data demands high availability, scalability, and security capabilities. Designs frequently involve:

• Batching and input media; unique data design
• Data cleansing and ETL; a data quality and validation services layer

Core Characteristics of a Pipeline

The flexibility of new scheduling solutions extends to, for example, automated scaling, resource management, and monitoring.

Proplayers have open-source job divisions for operation and maintenance reporting. - Kirti

It is crucial to remain focused on the data SLAs defining the source SQL map.

Common patterns include:

• Managed data lake solutions
• AWS Lake Formation or Looker/Tableau
• Focusing on data virtualization by carefully determining specific usage preferences
• Looking for opportunities to tune performance
• Adding capability media and choosing a generic queue or a Kafka stream cluster topology

These choices are driven by specific business value, scope, and story potential.

Fault Tolerance Overhead Requirements (Ro) & Storage Strategy

The bespoke nature of managing:

• Data volume and data reliability
• The relative costs of your alternatives
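
The questions under Common Requirements (whether real-time processing is needed, and whether segments of the dataset differ in timeliness) can be read as a routing decision per segment. The following is a minimal Python sketch of that idea, not a prescribed design: the `Segment` class, the five-minute `STREAMING_CUTOFF`, and the segment names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Segment:
    """A slice of the dataset with its own freshness target (hypothetical model)."""
    name: str
    max_staleness: timedelta  # how old the data may be when it is consumed


# Assumed cutoff: segments that must be fresher than this go to streaming.
STREAMING_CUTOFF = timedelta(minutes=5)


def choose_path(segment: Segment, cutoff: timedelta = STREAMING_CUTOFF) -> str:
    """Return "streaming" for tight freshness targets, "batch" otherwise."""
    return "streaming" if segment.max_staleness < cutoff else "batch"


def plan(segments: list[Segment]) -> dict[str, list[str]]:
    """Group segment names by the processing path chosen for each."""
    routes: dict[str, list[str]] = {"streaming": [], "batch": []}
    for seg in segments:
        routes[choose_path(seg)].append(seg.name)
    return routes


# Illustrative segments only; the names and targets are made up.
segments = [
    Segment("clickstream", timedelta(seconds=30)),
    Segment("daily_sales", timedelta(hours=24)),
    Segment("fraud_signals", timedelta(seconds=5)),
]
routes = plan(segments)
print(routes)
```

In a stack like the one in the figure, the "streaming" group might map to Kafka + Spark Streaming and the "batch" group to scheduled jobs over HDFS, but that mapping is an assumption and depends on the durability and scalability requirements of each segment.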