Creating an ETL process with Apache Cassandra

Handling big data and its storage these days isn't just feasible, it's a must.

import pandas as pd
from cassandra.cluster import Cluster  # pip install cassandra-driver

Losing customers is not an option. Today we have a ton of devices gathering and sending data. The benefit of using a #NoSQL database like Cassandra (a wide-column store) is that developers don't need to constantly maintain and adjust entities, migrations, and schema changes on existing products. Companies and products move in an agile environment where requirements are constantly changing; NoSQL lets us absorb those changes quickly.


The Business Case 💼


The application covers the following case: we have deployed a music app that collects usage data, which is stored in a local text file. From that file we know which songs each user listens to and which membership tier they are on (at a high level).
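The extract step might look something like this: read the raw event file into a pandas DataFrame and keep only the fields the business teams care about. This is a minimal sketch; the column names and sample rows are assumptions for illustration, not the app's real schema.

```python
import io

import pandas as pd

# Hypothetical event log contents; in the real pipeline this would be
# pd.read_csv("path/to/event_data.txt") against the app's local text file.
raw = io.StringIO(
    "user_id,song_title,artist,membership\n"
    "u1,Hey Jude,The Beatles,free\n"
    "u2,Clocks,Coldplay,paid\n"
)
events = pd.read_csv(raw)

# Transform: keep only the columns needed to answer
# "which songs does a user listen to, and on which membership?"
listens = events[["user_id", "song_title", "membership"]]
print(listens.shape)  # (2, 3)
```

From here the DataFrame rows are ready to be loaded into Cassandra.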

"Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data. Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB)." — https://cassandra.apache.org/

My job was to extract, transform, and load (ETL) this data into a system where business teams could bring their requirements and derive answers from the data.
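The load step can be sketched as below. The keyspace, table, and column names (`music_app`, `song_plays`, etc.) are assumptions for illustration; the insert-statement builder runs standalone, while `load_rows` assumes a reachable Cassandra node via the DataStax `cassandra-driver` package.

```python
try:
    from cassandra.cluster import Cluster  # pip install cassandra-driver
except ImportError:
    Cluster = None  # driver not installed; build_insert still works on its own


def build_insert(table, columns):
    """Build a parameterized CQL INSERT for the given table and columns."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def load_rows(rows, keyspace="music_app", host="127.0.0.1"):
    """Insert (user_id, song_title, membership) tuples into Cassandra.

    Assumes a local single-node cluster and an existing keyspace/table.
    """
    cluster = Cluster([host])
    session = cluster.connect(keyspace)
    cql = build_insert("song_plays", ["user_id", "song_title", "membership"])
    for row in rows:
        session.execute(cql, row)
    cluster.shutdown()


print(build_insert("song_plays", ["user_id", "song_title", "membership"]))
```

Parameterized statements (`%s` placeholders) let the driver handle value encoding instead of hand-formatting CQL strings.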


🔗 Jupyter Notebook ETL Process