This project implements an ETL (Extract, Transform, Load) pipeline in Python using DuckDB to process and analyze log records (in JSON format). The system extracts the data, calculates usage and ...
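The extract, transform, and load stages described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses `sqlite3` from the Python standard library as a stand-in for DuckDB so that it runs without extra dependencies, and the record fields (`consumer`, `service`, `latency_ms`) and the per-consumer request count are assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical log records; the field names are assumptions for illustration.
RAW_LOGS = [
    '{"consumer": "alice", "service": "search", "latency_ms": 120}',
    '{"consumer": "bob", "service": "search", "latency_ms": 80}',
    '{"consumer": "alice", "service": "billing", "latency_ms": 200}',
]


def extract(lines):
    """Extract: parse each JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: keep only the columns the report needs, as tuples."""
    return [(r["consumer"], r["service"], r["latency_ms"]) for r in records]


def load(conn, rows):
    """Load: write the rows into a table (sqlite3 stands in for DuckDB here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(consumer TEXT, service TEXT, latency_ms INTEGER)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
    conn.commit()


def usage_by_consumer(conn):
    """Count requests per consumer, ordered by consumer name."""
    cur = conn.execute(
        "SELECT consumer, COUNT(*) FROM logs GROUP BY consumer ORDER BY consumer"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_LOGS)))
report = usage_by_consumer(conn)
print(report)
```

Keeping each stage in its own function makes it straightforward to replace the storage layer with DuckDB while leaving the extract and transform steps unchanged.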
arquivos_teste_dados_bus2/
├── bases-de-dados/ # Input CSV files
│ ├── produtos.csv
│ ├── vendas.csv
│ └── empregados.csv
├── create-user-database/ # User and database creation
│ └── ...
Abstract: This study aims to increase ETL process efficiency and reduce processing time by applying the Change Data Capture (CDC) method in a distributed system using the Hadoop Distributed File System ...
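To illustrate why CDC can shorten ETL runs, here is a minimal sketch of one common CDC approach: comparing two snapshots keyed by primary key. It is not the study's implementation. The function name `capture_changes` and the sample rows are hypothetical, and the snapshots are in-memory dictionaries rather than files on HDFS. Only the rows reported as inserted, updated, or deleted would need to flow through the rest of the pipeline.

```python
def capture_changes(previous, current):
    """Diff two snapshots (dicts mapping primary key -> row) into change sets."""
    # Keys present only in the new snapshot are inserts.
    inserts = {k: v for k, v in current.items() if k not in previous}
    # Keys present in both snapshots with a different row are updates.
    updates = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    # Keys present only in the old snapshot are deletes.
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes


# Hypothetical snapshots of a small table, keyed by row id.
old_snapshot = {1: "a", 2: "b", 3: "c"}
new_snapshot = {1: "a", 2: "B", 4: "d"}

inserts, updates, deletes = capture_changes(old_snapshot, new_snapshot)
print(inserts, updates, deletes)
```

Row 1 is unchanged in both snapshots, so it appears in none of the three change sets and would be skipped by a downstream load.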