
Data Engineering Podcast
An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications
Aug 22, 2022
01:06:20
User-Facing Analytics Examples
- Real-time analytics power user-facing features like recommendations and logistics tracking.
- Examples include e-commerce recommendations and live delivery updates.
Internal Operational Analytics Examples
- Internal operational analytics use real-time data for decision-making.
- Examples include fraud detection and flight rerouting.
Real-Time Adoption Flywheel
- Adoption of real-time analytics and improvements in the underlying technology reinforce each other in a flywheel effect.
- Each drives the other in a continuous loop.
Introduction
00:00 • 2min
How Did You Become a Product Officer at Rockset?
01:45 • 2min
The Main Use Cases for Logistics and Delivery Tracking
03:31 • 3min
Real Time Analytics
06:48 • 3min
Scalability in the Real Time World?
09:35 • 5min
Is There a Place for Batch or Real Time for Analytics?
14:45 • 3min
Are You Using a Warehouse?
17:27 • 4min
How to Choose the Right Real Time Platform?
21:21 • 6min
Managing the Cloud on Prem or in the Cloud?
26:54 • 3min
Select Star Data Discovery Platform - The Biggest Challenge With Modern Data Systems
30:09 • 6min
Do You Have a Batch or a Real Time Environment?
35:54 • 4min
Continuity of Queries, and Not Just Data
40:21 • 4min
Data Quality Matters More Than Ever
44:20 • 3min
Data Engineering Podcast - Ascend Data Automation Cloud
47:24 • 5min
Using Real Time Tracking for Heavy Construction
52:01 • 2min
The Biggest Lesson You've Learned From Working in Real Time Analytics
54:20 • 3min
Why Rockset Is the Wrong Choice?
57:29 • 3min
Rockset - What's Next?
01:00:13 • 2min
What's the Biggest Gap in the Data Management Ecosystem?
01:02:13 • 4min
Summary
Data has permeated every aspect of our lives and the products that we interact with. As a result, end users and customers have come to expect interactions and updates with services and analytics to be fast and up to date. In this episode Shruti Bhat gives her view on the state of the ecosystem for real-time data and the work that she and her team at Rockset are doing to make it easier for engineers to build those experiences.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust. Sifflet solves this problem by acting as an overseeing layer to the data stack – observing data and ensuring it’s reliable from ingestion all the way to consumption. Whether the data is in transit or at rest, Sifflet can detect data quality anomalies, assess business impact, identify the root cause, and alert data teams on their preferred channels, all thanks to 50+ quality checks, extensive column-level lineage, and 20+ connectors across the data stack. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses. Listeners of the podcast will get $2000 to use as platform credits when signing up to use Sifflet. Sifflet also offers a 2-week free trial. Find out more at dataengineeringpodcast.com/sifflet today!
- The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% of respondents reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today I’m interviewing Shruti Bhat about the growth of real-time data applications and the systems required to support them
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what is driving the adoption of real-time analytics?
- Architectural patterns for real-time analytics
- Sources of latency in the path from data creation to end-user
- End-user/customer expectations for time to insight
- Differing expectations between internal and external consumers
- Scales of data that are reasonable for real-time vs. batch
- What are the most interesting, innovative, or unexpected ways that you have seen real-time architectures implemented?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Rockset?
- When is Rockset the wrong choice?
- What do you have planned for the future of Rockset?
Contact Info
- @shrutibhat on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Rockset
- Embedded Analytics
- Confluent
- Kafka
- AWS Kinesis
- Lambda Architecture
- Data Observability
- Data Mesh
- DynamoDB Streams
- MongoDB Change Streams
- Bigeye
- Monte Carlo Data
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
