+++
title = "Download Dataset"
date = 2021
weight = 2
chapter = false
pre = "<b>1.2 </b>"
+++

#### Download Dataset

1. Download the dataset from [Kaggle](https://www.kaggle.com/).
   + In this workshop, we will use the [Airbnb listings dataset](https://www.kaggle.com/mysarahmadbhat/airbnb-listings-reviews?select=Airbnb+Data).

2. After downloading to your personal computer, organize your data as shown below:

       raw
       ├── TABLE-NAME-1
       │   ├── LOAD00000001.csv
       │   └── LOAD00000002.csv
       └── TABLE-NAME-2
           ├── LOAD00000001.csv
           └── LOAD00000002.csv

   + Specifically for the Airbnb listings dataset, structure the data as follows:

         raw
         ├── listings
         │   └── LOAD00000001.csv
         └── reviews
             └── LOAD00000001.csv

   + You can also download the pre-structured dataset: raw.zip (103657 KB).
   + Structure your data in Amazon S3 so that each table has its own folder containing all of that table's data files.
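The folder layout above can also be produced with a short script instead of by hand. The following is a minimal sketch, not part of the workshop material: the function name `organize` and its arguments are hypothetical, and it assumes you already know which downloaded CSV files belong to which table.

```python
import shutil
from pathlib import Path


def organize(table_files: dict[str, list[Path]], raw_root: Path) -> list[Path]:
    """Copy each table's CSV files to raw_root/<table>/LOAD0000000N.csv.

    table_files maps a table name (for example "listings") to the CSV
    files that belong to it. This mapping is an assumption of the sketch;
    adjust it to the files you actually downloaded.
    """
    written = []
    for table, files in table_files.items():
        table_dir = raw_root / table
        table_dir.mkdir(parents=True, exist_ok=True)
        # Number the files from 1, zero-padded to 8 digits after "LOAD".
        for index, source in enumerate(files, start=1):
            target = table_dir / f"LOAD{index:08d}.csv"
            shutil.copyfile(source, target)
            written.append(target)
    return written
```

For the Airbnb dataset, a call might look like `organize({"listings": [Path("Listings.csv")], "reviews": [Path("Reviews.csv")]}, Path("raw"))`, where the two source file names are assumptions about what the Kaggle download contains.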

3. In the Cloud9 interface, click File.
   + Click Upload Local Files.

   [Image: Datalake]

4. Drag and drop the structured Dataset folder into the designated area.
   + The Dataset folder is uploaded to Cloud9 as shown below.

   [Image: Datalake]

In this step, you downloaded the dataset and uploaded it to the Cloud9 instance. Next, we will check the encoding to ensure that the data is encoded in UTF-8.
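Before or after uploading, it can help to confirm that the folder really follows the one-folder-per-table layout described earlier. The sketch below is one possible check, not a workshop requirement: the function name `check_layout` is hypothetical, and the `LOAD` plus eight digits file-name pattern is inferred from the example tree rather than stated as a rule.

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the example tree: LOAD00000001.csv
LOAD_FILE = re.compile(r"LOAD\d{8}\.csv")


def check_layout(raw_root: Path) -> list[str]:
    """Return a list of layout problems under raw_root (empty if none)."""
    problems = []
    for entry in sorted(raw_root.iterdir()):
        # Every top-level entry should be a table folder.
        if not entry.is_dir():
            problems.append(f"unexpected file at top level: {entry.name}")
            continue
        items = sorted(entry.iterdir())
        if not items:
            problems.append(f"table folder is empty: {entry.name}")
        # Every item in a table folder should be a LOAD########.csv file.
        for item in items:
            if not (item.is_file() and LOAD_FILE.fullmatch(item.name)):
                problems.append(f"unexpected entry: {entry.name}/{item.name}")
    return problems
```

Running `check_layout(Path("raw"))` from the directory that contains `raw` returns an empty list when every table folder holds only files matching the assumed pattern.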