Data Ingestion with AWS Glue

In this section, we will perform the following steps:

  • Configure role permissions for the resources we use.
  • Create Data Catalog tables from our cleaned dataset with an AWS Glue crawler.
  • Transform the CSV dataset into Apache Parquet format using AWS Glue jobs.
  • Create Data Catalog tables for the data converted to Apache Parquet format.
  • Check the schema information.

Our goal is to prepare the data for querying with Amazon Athena.
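The steps above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch under stated assumptions, not the workshop's own code: it only builds the request arguments for the IAM trust policy, the crawlers, and the Glue job, so it runs without AWS access. All crawler, job, database, bucket, and script names you pass in are hypothetical placeholders. The attached managed policy AWSGlueServiceRole covers the Glue service itself, but read/write access to your S3 buckets still has to be granted separately.

```python
import json

# AWS managed policy granting the permissions the Glue service itself needs.
# S3 access to your own data buckets must be granted separately.
GLUE_SERVICE_ROLE_POLICY = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy() -> str:
    """Trust policy (JSON string) that lets AWS Glue assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Keyword arguments for glue.create_crawler over a single S3 prefix.

    Used twice: once for the cleaned CSV data, once for the Parquet output.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def job_request(name: str, role_arn: str, script_location: str) -> dict:
    """Keyword arguments for glue.create_job defining a Spark ETL job.

    The CSV-to-Parquet PySpark script itself lives at script_location (not shown).
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumption: pick the Glue version your account uses
    }
```

With credentials configured, the role would be created with `iam.create_role(RoleName=..., AssumeRolePolicyDocument=glue_trust_policy())` and `iam.attach_role_policy(RoleName=..., PolicyArn=GLUE_SERVICE_ROLE_POLICY)`. Each dictionary can then be unpacked into the matching Glue client call, for example `create_crawler(**crawler_request(...))` followed by `start_crawler(Name=...)`, and `create_job(**job_request(...))` followed by `start_job_run(JobName=...)`. Crawler runs and job runs are asynchronous, so the Parquet crawler should only be started after the job run has succeeded.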

Contents

  1. Configure role for AWS Glue
  2. Create Data Catalog
  3. Transform to Parquet
  4. Transform to Parquet (continued)
  5. Create New Data Catalog
  6. Check Schema Information
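For the final step, checking the schema information, the tables the crawlers create can be read back with boto3's `glue.get_table`. The sketch below is an illustration, not the workshop's method: `column_types` flattens a `get_table` response into a name-to-type map (including partition keys), and `schema_diff` compares the CSV table with the Parquet table. The database and table names are hypothetical; only `fetch_columns` needs boto3 and AWS credentials.

```python
def column_types(get_table_response: dict) -> dict:
    """Map column name -> type from a glue.get_table response.

    Partition keys are stored separately from regular columns, so both are merged.
    """
    table = get_table_response["Table"]
    columns = table["StorageDescriptor"]["Columns"] + table.get("PartitionKeys", [])
    return {col["Name"]: col["Type"] for col in columns}


def schema_diff(before: dict, after: dict) -> dict:
    """Compare two {name: type} maps; report added, removed and retyped columns."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "retyped": {
            name: (before[name], after[name])
            for name in sorted(before.keys() & after.keys())
            if before[name] != after[name]
        },
    }


def fetch_columns(database: str, table: str) -> dict:
    """Read one table's schema from the Data Catalog (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    response = boto3.client("glue").get_table(DatabaseName=database, Name=table)
    return column_types(response)
```

For example, `schema_diff(fetch_columns("my_db", "csv_table"), fetch_columns("my_db", "parquet_table"))`, with your own names substituted. Empty `added` and `removed` lists mean both crawlers found the same columns; entries under `retyped` are not necessarily errors, since the Glue job may cast columns on purpose.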