

Your next ETL pipeline will be serverless
Jul 4, 2025
Poonam Pratik Patel, Director at The Line Tech UK and AWS Community Builder, dives into serverless ETL implementation. She explains how serverless architectures can streamline data processing while keeping the data accurate. The discussion covers practical strategies for data validation and partitioning, and how AWS services such as AWS Glue and AWS Lambda fit together in a pipeline. Poonam also highlights how AI and ML are transforming data pipelines, making them more efficient and scalable.
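Partitioning comes up as one of the practical strategies. The layout below is an assumption for illustration, not necessarily the one discussed in the episode. A common convention is Hive-style `key=value` prefixes in S3 object keys, which AWS Glue crawlers can register as partition columns. The helper `partitioned_key` and the `source`/`year`/`month`/`day` partition names are hypothetical.

```python
from datetime import date


def partitioned_key(prefix: str, source: str, event_date: date, filename: str) -> str:
    """Build a Hive-style S3 key: <prefix>/source=<source>/year=YYYY/month=MM/day=DD/<filename>."""
    return (
        f"{prefix.rstrip('/')}/"
        f"source={source}/"
        f"year={event_date.year:04d}/"
        f"month={event_date.month:02d}/"
        f"day={event_date.day:02d}/"
        f"{filename}"
    )


# Example: a hypothetical daily upload from one source
print(partitioned_key("raw/sales", "london", date(2025, 7, 4), "orders.csv"))
# raw/sales/source=london/year=2025/month=07/day=04/orders.csv
```

Partitioning by date and source lets query engines such as Amazon Athena prune partitions and scan only the prefixes a query actually needs.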
AI Snips
Clean Data Crucial for Decisions
- Businesses need clean, correct data to make the right decisions and grow effectively.
- Manual data correction is impractical and error-prone with millions of data points.
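The alternative to manual correction is to encode the checks as rules and apply them to every record. This is a minimal sketch under stated assumptions: the field names (`order_id`, `amount`, `currency`) and the rules themselves are hypothetical examples, not the schema used in the episode.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical schema and rules, for illustration only
REQUIRED_FIELDS = ("order_id", "amount", "currency")
ALLOWED_CURRENCIES = ("GBP", "EUR", "USD")


def is_blank(value: Any) -> bool:
    """Treat missing values and empty strings as blank."""
    return value is None or value == ""


def validate_record(record: Record) -> list[str]:
    """Return every rule violation; an empty list means the record is valid."""
    errors = [f"missing {field}" for field in REQUIRED_FIELDS if is_blank(record.get(field))]

    amount = record.get("amount")
    if not is_blank(amount):
        try:
            if float(amount) < 0:
                errors.append("negative amount")
        except (TypeError, ValueError):
            errors.append(f"non-numeric amount: {amount!r}")

    currency = record.get("currency")
    if not is_blank(currency) and currency not in ALLOWED_CURRENCIES:
        errors.append(f"unknown currency: {currency!r}")

    return errors


def split_records(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Separate valid records from invalid ones, keeping the reasons each record failed."""
    valid: list[Record] = []
    invalid: list[tuple[Record, list[str]]] = []
    for record in records:
        problems = validate_record(record)
        if problems:
            invalid.append((record, problems))
        else:
            valid.append(record)
    return valid, invalid


valid, invalid = split_records([
    {"order_id": "A1", "amount": "19.99", "currency": "GBP"},
    {"order_id": "A2", "amount": "-5", "currency": "XYZ"},
    {"order_id": "", "amount": "abc", "currency": "EUR"},
])
print(len(valid), len(invalid))  # 1 2
print(invalid[0][1])  # ['negative amount', "unknown currency: 'XYZ'"]
```

Collecting all violations per record, rather than stopping at the first, makes the rejected data easier to fix at the source.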
Embrace Serverless for ETL
- Use AWS serverless services such as AWS Lambda and AWS Step Functions so the ETL pipeline has no servers to provision, patch, or scale.
- Focus on defining data flow and validation, not on backend infrastructure sizing or maintenance.
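With Step Functions, "defining the data flow" largely means writing a state machine in Amazon States Language (ASL). The Python sketch below builds a minimal definition as a dictionary. The state names, the Lambda function names (`validate-file`, `process-file`, `quarantine-file`) and the assumption that the validation Lambda returns `{"valid": true|false}` are all hypothetical; `arn:aws:states:::lambda:invoke` is the Step Functions integration for invoking Lambda.

```python
import json
from typing import Any, Optional

LAMBDA_INVOKE = "arn:aws:states:::lambda:invoke"  # Step Functions' optimized Lambda integration


def lambda_task(function_name: str, next_state: Optional[str] = None, **extra: Any) -> dict[str, Any]:
    """A Task state that invokes a Lambda function with the current state input as its payload."""
    state: dict[str, Any] = {
        "Type": "Task",
        "Resource": LAMBDA_INVOKE,
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        **extra,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_etl_definition() -> dict[str, Any]:
    """Validate the uploaded file, then either process it or quarantine it."""
    return {
        "Comment": "Hypothetical serverless ETL: validate, then process or quarantine",
        "StartAt": "ValidateFile",
        "States": {
            "ValidateFile": lambda_task(
                "validate-file",  # hypothetical function name
                "IsValid",
                # Keep only the Lambda's boolean verdict and attach it to the state input
                ResultSelector={"valid.$": "$.Payload.valid"},
                ResultPath="$.validation",
            ),
            "IsValid": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.validation.valid", "BooleanEquals": True, "Next": "ProcessFile"}
                ],
                "Default": "QuarantineFile",
            },
            "ProcessFile": lambda_task("process-file"),  # hypothetical function name
            "QuarantineFile": lambda_task("quarantine-file"),  # hypothetical function name
        },
    }


# The JSON string is what would be supplied as the state machine definition
print(json.dumps(build_etl_definition(), indent=2))
```

Generating the definition in code keeps it reviewable and testable; the same JSON could equally be written by hand or in an infrastructure-as-code template.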
Pipeline Orchestrated by Step Functions
- Data from each branch is collected into a single S3 bucket, which triggers a Step Functions workflow.
- The workflow invokes Lambda functions that validate and process data, moving invalid files to a separate bucket.
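S3 has no native "move" operation, so moving an invalid file to a separate bucket is a copy followed by a delete. This minimal sketch assumes the S3 client is passed in, so the logic can be exercised without AWS; in a deployed function it would be a boto3 S3 client, whose `copy_object` and `delete_object` methods take the keyword arguments used here. The bucket name `etl-quarantine`, the event shape `{"bucket", "key", "valid"}` and the function names are hypothetical.

```python
from typing import Any, Protocol

QUARANTINE_BUCKET = "etl-quarantine"  # hypothetical bucket for invalid files


class S3Client(Protocol):
    """The subset of the boto3 S3 client this sketch uses."""

    def copy_object(self, **kwargs: Any) -> Any: ...

    def delete_object(self, **kwargs: Any) -> Any: ...


def move_object(s3: S3Client, src_bucket: str, key: str, dst_bucket: str) -> None:
    """Copy the object to the destination bucket, then delete the original."""
    s3.copy_object(Bucket=dst_bucket, Key=key, CopySource={"Bucket": src_bucket, "Key": key})
    s3.delete_object(Bucket=src_bucket, Key=key)


def route_file(event: dict[str, Any], s3: S3Client) -> dict[str, Any]:
    """Leave valid files in place; move invalid ones to the quarantine bucket."""
    bucket, key = event["bucket"], event["key"]
    if event["valid"]:
        return {"bucket": bucket, "key": key, "status": "ready"}
    move_object(s3, bucket, key, QUARANTINE_BUCKET)
    return {"bucket": QUARANTINE_BUCKET, "key": key, "status": "quarantined"}


# In a deployed Lambda the handler might be:
#     def handler(event, context):
#         return route_file(event, boto3.client("s3"))
# Locally, a fake client records the calls instead of touching AWS.
class FakeS3:
    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def copy_object(self, **kwargs: Any) -> None:
        self.calls.append(("copy_object", kwargs))

    def delete_object(self, **kwargs: Any) -> None:
        self.calls.append(("delete_object", kwargs))


fake = FakeS3()
print(route_file({"bucket": "etl-landing", "key": "branch-7/orders.csv", "valid": False}, fake))
print([name for name, _ in fake.calls])  # ['copy_object', 'delete_object']
```

One design caveat: a single `copy_object` call handles objects up to 5 GB, so larger files would need a multipart copy, for example via the boto3 client's managed `copy` method.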