
Creating an ETL process with Apache Cassandra

Handling big data and its storage is no longer just feasible; it's a must.

import pandas
import cassandra

Losing customers is not an option. Today the world is full of devices that gather and send data. The benefit of using a NoSQL database is that developers don't need to maintain and adjust entities and migrations every time an existing product changes. Companies and products move in an agile environment where requirements change constantly; NoSQL lets us respond to those requirements quickly.

The Business Case 💼

This project covers the following case: we have deployed a music app that collects data and stores it in a local text file. From that data we know which songs each user listens to and, at a high level, which membership they are on.

"Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB)."

My job was to extract, transform, and load this data into a system where business teams could bring their requirements and get answers from the data.
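To make the three steps concrete, here is a minimal sketch of the extract, transform, and load flow. It is not the notebook's actual code: the column names (user_id, level, song, artist), the song_plays table, and the RecordingSession stand-in are all assumptions made for illustration, so the example runs without a Cassandra cluster.

```python
import csv
import io

# Hypothetical event-log layout; the real text file's columns may differ.
SAMPLE_LOG = """user_id,level,song,artist
1,free,Song A,Artist X
2,paid,Song B,Artist Y
1,free,,Artist Z
"""

# Hypothetical target table. The %s placeholders follow the style the
# DataStax Python driver accepts for simple (non-prepared) statements.
INSERT_CQL = (
    "INSERT INTO song_plays (user_id, level, song, artist) "
    "VALUES (%s, %s, %s, %s)"
)


def extract(text_file):
    """Extract: read raw rows from a CSV-formatted text file object."""
    return list(csv.DictReader(text_file))


def transform(rows):
    """Transform: drop rows without a song title and cast user_id to int."""
    cleaned = []
    for row in rows:
        if not row["song"]:
            continue
        cleaned.append(
            (int(row["user_id"]), row["level"], row["song"], row["artist"])
        )
    return cleaned


def load(session, records):
    """Load: insert each record through an object exposing execute(query, params)."""
    for record in records:
        session.execute(INSERT_CQL, record)
    return len(records)


class RecordingSession:
    """Stand-in for a Cassandra session; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


session = RecordingSession()
records = transform(extract(io.StringIO(SAMPLE_LOG)))
loaded = load(session, records)
```

Against a real cluster, the stand-in could be swapped for a session created with Cluster([...]).connect(...) from cassandra.cluster, assuming the keyspace and the song_plays table already exist. Keeping load() dependent only on execute() makes the transform logic easy to check before any data reaches the database.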

🔗 Jupyter Notebook ETL Process
