List of Posts

Here you can find ideas, concepts, and other things that I have learned and want to share with you!

Partitioning and Bucketing


Summary: In this post, we will explore partitioning and bucketing, two optimization techniques for efficient data storage and processing.

Calculate the Initial Number of Partitions in Spark



Summary: In this post, we will explore how Spark internally calculates the initial number of partitions.

Spark Architecture

Summary: In this post, I share some notes and thoughts that I have recently collected about the Spark architecture.

Inside of the Spark RDD


Summary: In this post, we will go inside the Spark RDD and its execution plan.

Data Processing

Summary: In this post, we'll dive into the world of data processing, exploring concepts such as stream processing, batch processing, real-time processing, and micro-batching.

Data Engineering Lifecycle

Summary: A brief explanation of the data engineering lifecycle, based on the model presented in the book Fundamentals of Data Engineering. Here we explain the five main stages and the six undercurrents that are part of this cycle.