Creating an ETL process with Apache Cassandra

Handling big data and its storage these days isn't just feasible, it's a must.

import pandas as pd
from cassandra.cluster import Cluster  # pip install cassandra-driver

Losing customers is not an option. Today we have a ton of devices gathering and sending data. The benefit of using a #NoSQL database like Cassandra (a wide-column store) is that developers don't need to constantly maintain and adjust entities, migrations, and schema changes on existing products. Companies and products move in an agile environment where requirements are constantly changing; NoSQL lets us absorb those changes quickly.


The Business Case 💼


The application covers the following case: we have deployed a music app that collects usage data, which is stored in a local text file. From that file we know which songs each user listens to and which membership tier they are on (at a high level).
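The extract step might look something like this: read the raw event file into a pandas DataFrame and keep only the fields the business teams care about. This is a minimal sketch; the column names and sample rows are assumptions for illustration, not the app's real schema.

```python
import io

import pandas as pd

# Hypothetical event log contents; in the real pipeline this would be
# pd.read_csv("path/to/event_data.txt") against the app's local text file.
raw = io.StringIO(
    "user_id,song_title,artist,membership\n"
    "u1,Hey Jude,The Beatles,free\n"
    "u2,Clocks,Coldplay,paid\n"
)
events = pd.read_csv(raw)

# Transform: keep only the columns needed to answer
# "which songs does a user listen to, and on which membership?"
listens = events[["user_id", "song_title", "membership"]]
print(listens.shape)  # (2, 3)
```

From here the DataFrame rows are ready to be loaded into Cassandra.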

"Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data. Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB)." — https://cassandra.apache.org/

My job was to extract, transform, and load (ETL) this data into a system where business teams could bring their requirements and derive answers from the data.
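The load step can be sketched as below. The keyspace, table, and column names (`music_app`, `song_plays`, etc.) are assumptions for illustration; the insert-statement builder runs standalone, while `load_rows` assumes a reachable Cassandra node via the DataStax `cassandra-driver` package.

```python
try:
    from cassandra.cluster import Cluster  # pip install cassandra-driver
except ImportError:
    Cluster = None  # driver not installed; build_insert still works on its own


def build_insert(table, columns):
    """Build a parameterized CQL INSERT for the given table and columns."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def load_rows(rows, keyspace="music_app", host="127.0.0.1"):
    """Insert (user_id, song_title, membership) tuples into Cassandra.

    Assumes a local single-node cluster and an existing keyspace/table.
    """
    cluster = Cluster([host])
    session = cluster.connect(keyspace)
    cql = build_insert("song_plays", ["user_id", "song_title", "membership"])
    for row in rows:
        session.execute(cql, row)
    cluster.shutdown()


print(build_insert("song_plays", ["user_id", "song_title", "membership"]))
```

Parameterized statements (`%s` placeholders) let the driver handle value encoding instead of hand-formatting CQL strings.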


🔗 Jupyter Notebook ETL Process