

Data Archives - Software Engineering Daily
Databases and data engineering episodes of Software Engineering Daily

Jun 16, 2021 • 56min
Blissfully: Comprehensive IT Management with Aaron White
Delivering SaaS products involves a lot more than just building the product. SaaS management involves customer relationship management, licensing, renewals, maintaining software visibility, and the general management of the technology portfolio.
The company Blissfully helps businesses manage their SaaS products from within a complete IT platform with organization, automation, and security built in. The Blissfully platform offers a system of record for creating and maintaining a single source of truth for technology, a workflows and automations feature for defining and executing consistent IT processes, an IT collaboration feature, and a security and compliance feature. These features come together to form a comprehensive IT management platform.
In this episode we talk with Aaron White, a Founder and CTO at Blissfully. Aaron was previously a Co-Founder and Board Member of Price Intelligently, and Vice President at Venrock before that.
Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Jun 15, 2021 • 41min
Stemma: Understanding Big Data with Mark Grover
Amundsen was started at Lyft and is the leading open-source data catalog with the fastest-growing community and the most integrations. Amundsen enables you to search your entire organization by text search, see automated and curated metadata, share context with coworkers, and learn from others by seeing the most common queries on a table or frequently used data.
Powered by Amundsen, the company Stemma is a fully managed data catalog that bridges the gap between data producers and data consumers. Stemma adds features to Amundsen like showing meaningful data to individual users, adding metadata to data automatically, and documenting data on the fly. Stemma integrates with all the major data sources like Snowflake, Redshift, Google BigQuery, and Apache Airflow.
In this episode we talk to Mark Grover, Founder at Stemma. Mark co-created Amundsen and authored the book Hadoop Application Architectures. He was an engineer at Cloudera before joining Lyft as a Product Manager.

May 27, 2021 • 48min
Data Exploration with a New Python Library with Doris Lee
Data exploration uses visualization to understand what is in a dataset and the characteristics of the data. Data scientists explore data to understand things like customer behavior and resource utilization. Some common programming languages used for data exploration are Python, R, and MATLAB.
Doris Jung-Lin Lee is currently a Graduate Research Assistant at the University of California, Berkeley, where she is earning a PhD in Information Management and Systems. Doris also did her undergrad at Berkeley, studying physics and astrophysics. She is currently developing Lux, a Python library for accelerating and simplifying the process of data exploration. Her research and work with Lux aim to make data science more intuitive and accessible to end users. In this episode Doris joins us to discuss data exploration and her research and development of Lux.

May 25, 2021 • 58min
Firebolt: Data Warehouses with Eldad Farkash
Cloud data warehouses are databases hosted in cloud environments. They provide typical benefits of the cloud like flexible data access, scalability, and performance.
The company Firebolt provides a cloud data warehouse built for modern data environments. It decouples storage and compute to operate on top of existing data lakes like S3. It delivers orders-of-magnitude faster performance from gigabyte to petabyte scale by using a columnar data structure, vectorized processing, just-in-time query compilation, and continuously aggregated indexing. Firebolt scales with data lakes by processing queries across clusters of nodes in parallel, providing consistently fast processing and granular control over resources.
In this episode we talk with Eldad Farkash, Co-Founder and CEO of Firebolt. Eldad was previously a Venture Partner at Angular Ventures and a Founder, CTO and Board Member at Sisense before that. We discuss big data, data warehouses, and the unique benefits offered by Firebolt.

May 20, 2021 • 54min
Preset: Visualizing Big Data with Srini Kadamati
Apache Superset is an open-source, fast, lightweight, and modern data exploration and visualization platform. It can connect to any SQL-based data source through SQLAlchemy at petabyte scale. Its architecture is highly scalable and it ships with a wide array of visualizations.
The company Preset provides a powerful, easy-to-use data exploration and visualization platform powered by Apache Superset. Preset enables team members with little to no programming experience to build interactive visualizations and dashboards with a no-code viz builder and SQL editor. It works directly on top of popular cloud data warehouses and leading data engines. Preset delivers all the data visualization power of Apache Superset through its complete, easy-to-consume, enterprise-ready platform.
In this episode we talk with Srini Kadamati, Senior Data Scientist / Developer Advocate at Preset. Previously Srini worked as Head of Product at Dataquest.io and as a Data Scientist at Radius Intelligence before that. He is also a Committer to Apache Superset. We discuss data visualization, the power of big data, and Preset.

May 17, 2021 • 44min
ClickHouse: Data Warehousing with Robert Hodges
Columnar databases store and retrieve data by column rather than by row. Each block of data in a columnar database stores up to three times as many records as row-based storage. This means you can read data with a third of the power needed for row-based storage, among other advantages.
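To make the row versus column distinction concrete, here is a minimal Python sketch. It is not how ClickHouse or any real engine stores data; the table, field names, and helper functions are all hypothetical and only illustrate why an aggregate over one column is cheaper when that column is stored contiguously.

```python
# Row-oriented layout: each record is stored together (hypothetical sample data).
rows = [
    {"id": 1, "country": "US", "amount": 20.0},
    {"id": 2, "country": "DE", "amount": 35.5},
    {"id": 3, "country": "US", "amount": 12.5},
]


def to_columns(records):
    """Pivot a non-empty list of records into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def total_amount_rows(records):
    # Has to walk every record to pull out a single field.
    return sum(record["amount"] for record in records)


def total_amount_columns(table):
    # Reads only the one column the query needs.
    return sum(table["amount"])


# Column-oriented layout: each column is stored together.
columns = to_columns(rows)
```

Both functions return the same total; the difference is how much data each one has to visit to compute it.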
The company Altinity is the leading enterprise provider for ClickHouse, an open-source column-store analytic database, which it also offers as a fully managed service through Altinity.Cloud. Altinity bills only for the compute, storage, and support that are used. It provides enterprise support for analytic applications, including query tuning, Kafka support, and help with ClickHouse bugs, and its ClickHouse clusters run with out-of-the-box security and privacy.
In this episode we talk with Robert Hodges, CEO at Altinity. Before becoming CEO at Altinity, Robert worked as a Senior Staff Engineer at VMware and was the CEO of Continuent before that. We discuss databases and data warehousing, ClickHouse, and how Altinity helps customers create enterprise analytic applications.

May 13, 2021 • 52min
Apache Hudi: Large Scale Data Systems with Vinoth Chandar
Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. This framework more efficiently manages business requirements like data lifecycle and improves data quality. Some common use cases for Hudi are record-level insert, update, and delete; simplified file management and near real-time data access; and simplified CDC data pipeline development (aws.amazon.com).
In this episode we speak to Vinoth Chandar, VP of Apache Hudi. Vinoth is the creator of the Hudi project at Uber. He continues to lead its evolution at the Apache Software Foundation. Previously he was a Principal Engineer at Confluent, and a Sr Staff Engineer/Manager at Uber before that. We discuss building large-scale distributed systems and data systems.

May 12, 2021 • 45min
Akita: Application Programming Interfaces with Jean Yang
An Application Programming Interface, API for short, is the connector between two applications. For example, a user interface that needs user data will call an endpoint, like a special URL, with request parameters and receive the data back if the request is valid. Modern applications rely on APIs to send data back and forth to each other and save, edit, delete, or retrieve data in databases. The number of APIs used in a single application is growing due to the increase of microservices and distributed architectures. Understanding how your applications use APIs can increase their efficiency and stability and make debugging easier.
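The request and response cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URL, the `USERS` store, and both functions are hypothetical, and the "endpoint" is a plain function rather than a real web server, so nothing here touches the network.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical in-memory data store standing in for a database.
USERS = {"42": {"id": "42", "name": "Ada"}}


def build_request_url(base_url, params):
    """Client side: attach request parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def handle_get_user(url):
    """Server side (hypothetical handler): return (status_code, body)."""
    query = parse_qs(urlparse(url).query)
    user_ids = query.get("id")
    if not user_ids:
        # The request is invalid: the required parameter is missing.
        return 400, {"error": "missing required parameter: id"}
    user = USERS.get(user_ids[0])
    if user is None:
        return 404, {"error": "user not found"}
    return 200, user


url = build_request_url("https://api.example.com/users", {"id": "42"})
status, body = handle_get_user(url)
```

A valid request gets the user record back with status 200, while a request with a missing or unknown `id` gets an error status instead of data.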
The company Akita observes the structure of programs to visualize, map, and manage API behavior. By monitoring the APIs in your applications, Akita can catch code changes that may break production applications. While this work is normally labor-intensive, Akita automates it by analyzing the API traffic. They check the observed behaviors against intended specs and contracts to provide clear oversight on all activity. This information can then be turned into maps that help you document and version your APIs across your entire service ecosystem.
In this episode we talk with Jean Yang, Founder and CEO of Akita Software. Jean was previously an assistant professor at Carnegie Mellon University and a postdoctoral researcher at Harvard Medical School before that. We discuss modern APIs, their role in applications, and how Akita Software makes understanding and building APIs easier for developers.

May 11, 2021 • 53min
Nextmv: Optimization in Fluid Work Environments with Carolyn Mooney
The traveling salesman problem is a classic challenge of finding the shortest and most efficient route for a person to take given a list of destinations. This is one of many real-world optimization problems that companies encounter. How should they schedule product distribution, or promote product bundles, or define sales territories? The answers to these questions constantly change because business environments constantly change.
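As a small illustration of the traveling salesman problem, the Python sketch below tries every possible visiting order and keeps the shortest round trip. It is a brute-force approach that is only practical for a handful of stops, and it is not how Nextmv's tools work; the function names and coordinates are hypothetical.

```python
from itertools import permutations
from math import dist


def route_length(route):
    """Total straight-line distance along consecutive points of a route."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))


def shortest_tour(start, stops):
    """Try every visiting order and return (length, route) for the best one.

    The tour leaves from `start`, visits every stop once, and returns to `start`.
    The number of orders grows factorially, so this only suits tiny inputs.
    """
    best = None
    for order in permutations(stops):
        route = [start, *order, start]
        length = route_length(route)
        if best is None or length < best[0]:
            best = (length, route)
    return best


# Hypothetical depot and delivery stops on a 4 x 3 rectangle.
length, route = shortest_tour((0, 0), [(0, 3), (4, 3), (4, 0)])
```

Because the number of possible orders explodes as stops are added, real-world systems rely on heuristics and specialized solvers rather than exhaustive search.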
The company Nextmv helps solve these problems with production-ready, commercial tools for solving optimization problems and simulating models with real company data. Their tool Hop encodes optimization strategies for dynamic environments. Hop can be deployed to routing, scheduling, and assignment problems in multiple industries like on-demand delivery, e-commerce, and IT infrastructure management. Their tool Dash is a commercial-grade simulation engine that provides an environment to “A/B test” models online with real data.
In this episode we talk to Carolyn Mooney, CEO at Nextmv. Carolyn was previously a Lead Systems Engineer at Grubhub, and a Decision System Analyst at Zoomer before that. We discuss optimization problems throughout different industries, machine learning strategies for solving them, and go into detail about how Nextmv helps companies become more profitable and efficient.


