Parse Big Data with Swift TabularData Framework 💿

idelfonsog2
Sep 19, 2022
1 min read

Updated: Sep 1, 2023

If you're in a Data Scientist role at a company, you've likely heard of, worked with, or discussed Big Data and the plethora of tools available for Extraction, Loading, and Transformation. Often, this involves using popular coding languages and frameworks to interpret raw data.

A DataFrame is a type that holds tabular data in rows and typed columns, and it lets software developers manipulate raw data through its instance methods. I'm pleased that Swift's TabularData framework has adopted conventions similar to those of pandas and NumPy.
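To make "manipulate raw data using its instance functions" concrete, here is a minimal sketch that assumes a platform where TabularData is available (macOS 12 or iOS 15 and later). The `station` and `minutes` columns and their values are made up for illustration.

```swift
import TabularData

// Hypothetical sample data; the column names and values are invented.
let stations: [String] = ["Clark St", "State St", "Clark St"]
let minutes: [Int] = [12, 45, 7]

// Build a DataFrame one typed column at a time.
var rides = DataFrame()
rides.append(column: Column(name: "station", contents: stations))
rides.append(column: Column(name: "minutes", contents: minutes))

// Keep only the rows where the ride lasted longer than 10 minutes.
// The closure receives an optional because a cell may be missing.
let longRides = rides.filter(on: "minutes", Int.self) { ($0 ?? 0) > 10 }

// Return a new frame sorted by duration, longest ride first.
let byDuration = rides.sorted(on: "minutes", order: .descending)

print(longRides)
print(byDuration)
```

Filtering returns a slice of the original frame rather than a copy, which is part of why this style can stay cheap on larger tables.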

If you're a software developer or programmer, you probably know that Swift is a renowned programming language primarily used for developing software applications for the iPhone, Apple Watch, and Mac. Over the past five years, its availability has expanded to other platforms and operating systems, including Windows, Ubuntu, CentOS, and Amazon Linux.

Back to the TabularData framework!

Why do we need a framework instead of SQL (Structured Query Language)? It boils down to performance and potentially reducing context-switching for developers.

This:

dataFrame.description(options: formattingOptions)

is different from this:

SELECT * FROM TABLE_NAME

The former operates on raw, unstructured data loaded into memory, while the latter queries structured data stored in a database table. This distinction brings about differences in patterns, speed 🐢, and costs 💰.
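For context on the Swift call above, here is a rough sketch of how `formattingOptions` might be built before being passed to `description(options:)`. The sample frame and the specific widths and row count are assumptions chosen for illustration, not values from my project.

```swift
import TabularData

// Hypothetical two-row frame standing in for a much larger data set.
let stations: [String] = ["Clark St", "State St"]
let minutes: [Int] = [12, 45]

var dataFrame = DataFrame()
dataFrame.append(column: Column(name: "station", contents: stations))
dataFrame.append(column: Column(name: "minutes", contents: minutes))

// Control how much of the frame is rendered as text.
// These particular limits are arbitrary example values.
let formattingOptions = FormattingOptions(
    maximumLineWidth: 120,      // characters per printed line
    maximumCellWidth: 20,       // characters per cell before truncation
    maximumRowCount: 10,        // rows printed before the output is cut off
    includesColumnTypes: false  // hide the type annotation under each header
)

let summary = dataFrame.description(options: formattingOptions)
print(summary)
```

Unlike `SELECT *`, this only renders what is already in memory, so the row limit affects the printed text and not how much data was loaded.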

If you're curious about the differences mentioned, check out my article on how I achieved this using AWS Redshift and Spark.

The TabularData Framework enables Swift developers to work with Big Data on users' devices or servers.

2022 City Of Chicago Divvy Bike Trip Data

In the above snapshot, I mapped the date from a `String` type to a `Date` type so that I could work with it in my Swift application.
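One way to do that kind of mapping is sketched below. It is not necessarily the exact code behind the snapshot: the `ride_id` and `started_at` column names, the sample rows, the timestamp format, and the time zone are all assumptions made for the example.

```swift
import Foundation
import TabularData

// Hypothetical two-row sample shaped like a trip export.
// Column names and the timestamp format are assumptions.
let csv = """
ride_id,started_at
A1,2022-01-13 11:59:47
B2,2022-01-10 08:41:56
"""

// Read both columns as plain text so the conversion below is explicit.
var trips = try DataFrame(
    csvData: Data(csv.utf8),
    types: ["ride_id": .string, "started_at": .string]
)

// A fixed-format formatter; en_US_POSIX avoids locale-dependent parsing.
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US_POSIX")
formatter.timeZone = TimeZone(identifier: "America/Chicago")
formatter.dateFormat = "yyyy-MM-dd HH:mm:ss"

// Replace each String with a Date; text that fails to parse becomes nil.
trips.transformColumn("started_at") { (text: String) -> Date? in
    formatter.date(from: text)
}

print(trips)
```

After the transform, the column can be read back with `trips["started_at", Date.self]` and used with the usual `Date` APIs, such as comparing or grouping rides by day.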
