FKP Institite of AI
AWS - Data Engineering


Introduction

  • Difference between ETL and ELT
  • What is DBT
  • What is DBT Cloud
  • Essential steps in the development of data pipelines
  • Need for ELT templatization
  • Pre-requisites, installing DBT, and configuring a DBT project
  • Discussion on supported databases
  • Creating a Snowflake account and creating the needed database objects
  • Introduction to Jinja (see the sketch after this list)
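
For a first taste of Jinja outside of DBT, the snippet below renders a templated SQL statement with the standalone jinja2 Python package. The table name, column names and date are made-up placeholders; DBT itself applies the same templating idea through ref(), source() and macros.

from jinja2 import Template

# A templated SQL statement: anything inside {{ }} is substituted at render time
query_template = Template("""
select vin, max(speed) as max_speed
from {{ source_table }}
where recorded_at >= '{{ start_date }}'
group by vin
""")

# Render the template with concrete values (placeholder values for illustration)
rendered_sql = query_template.render(source_table="raw.vehicles",
                                     start_date="2023-01-01")
print(rendered_sql)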


Kinesis

  • AWS Kinesis and its sub-services
    • Data Firehose
      • Create a Firehose client using boto3
    • Data Streams (see the Data Streams sketch at the end of this section)
      • List the existing Data Streams
      • Delete a Data Stream
      • Create a Data Stream

 

import boto3
import pandas as pd
# Load Data
data_url = "https://assets.datacamp.com/production/repositories/5668/datasets/6bba555e0e42ae31d1d634256679db718cfb8d76/vehicles.csv"

records = pd.read_csv(data_url).sample(100)

# Create a firehose client
firehose = boto3.client('firehose',
                        aws_access_key_id="None",
                        aws_secret_access_key="None",
                        region_name='us-east-1',
                        endpoint_url="http://localhost:4573")


# Create an S3 client

s3 = boto3.client('s3',
                  aws_access_key_id="None",
                  aws_secret_access_key="None",
                  region_name='us-east-1',
                  endpoint_url="http://localhost:4572")


# Create s3 bucket
s3.create_bucket(Bucket='sd-vehicle-data')

# Create a Firehose delivery stream
res = firehose.create_delivery_stream(
    DeliveryStreamName="gps-delivery-stream",
    DeliveryStreamType="DirectPut",
    # Specify the S3 bucket, which is our destination
    S3DestinationConfiguration={
        "BucketARN": "arn:aws:s3:::sd-vehicle-data",
        "RoleARN": "arn:aws:iam::0000000:role/firehoseDeliveryRole"
    })
# Print the stream ARN
print("Firehose Stream ARN is: {}".format(res['DeliveryStreamARN']))

for idx, row in records.iterrows():
    payload = ' '.join(str(value) for value in row)
    payload = payload + "\n"
    print("Sending payload: {}".format(payload))
    res = firehose.put_record(
        DeliveryStreamName='gps-delivery-stream',
        Record={'Data': payload})
    print("Record Id is: {}".format(res['RecordId']))

objects = s3.list_objects(Bucket='sd-vehicle-data')['Contents']

# Read the delivered objects back from S3 and combine them into one DataFrame
dfs = []
for obj in objects:
    data_file = s3.get_object(Bucket='sd-vehicle-data', Key=obj['Key'])
    dfs.append(pd.read_csv(data_file['Body'], delimiter=" ",
                           names=["record_id", "timestamp", "vin", "lon", "lat", "speed"]))
data = pd.concat(dfs)
print(data.groupby(['vin'])['speed'].max())


    • Data Analytics
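
To mirror the Data Streams bullets above, here is a minimal boto3 sketch that creates, lists and deletes a Kinesis Data Stream. It assumes the same local-endpoint setup as the Firehose example; the stream name and the localhost:4568 port are placeholders for whatever your local Kinesis emulator exposes.

import boto3

# Create a Kinesis Data Streams client (local endpoint assumed, as with Firehose above)
kinesis = boto3.client('kinesis',
                       aws_access_key_id="None",
                       aws_secret_access_key="None",
                       region_name='us-east-1',
                       endpoint_url="http://localhost:4568")

# Create a Data Stream with a single shard
kinesis.create_stream(StreamName='gps-data-stream', ShardCount=1)

# Wait until the stream is ACTIVE before using it
kinesis.get_waiter('stream_exists').wait(StreamName='gps-data-stream')

# List the existing Data Streams
print(kinesis.list_streams()['StreamNames'])

# Put a single record (Data must be bytes; PartitionKey decides the shard)
kinesis.put_record(StreamName='gps-data-stream',
                   Data=b'vin-123 77.6 12.9 54\n',
                   PartitionKey='vin-123')

# Delete the Data Stream when it is no longer needed
kinesis.delete_stream(StreamName='gps-data-stream')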


Introduction to Lambda & Step Functions

  • What is a Lambda function and when to use Lambda functions
  • Limitations of Lambda functions
  • AWS Lambda features
  • What is a Step function and when to use it
  • Features of Step functions
  • Creating a Lambda function from the blueprint
  • Adding CloudWatch events and alerts
  • Creating an alarm (see the sketch after this list)
  • Using the AWS CLI to create a Lambda function
  • Using the AWS CLI to create CloudWatch events and alerts
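
As a companion to the alarm bullet above, the sketch below uses boto3 to create a CloudWatch alarm on a Lambda function's Errors metric. The function name, alarm name and SNS topic ARN are placeholders, not values from the course.

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Alarm whenever the (hypothetical) my-etl-function reports one or more errors
# in a 5-minute window; notifications go to a placeholder SNS topic
cloudwatch.put_metric_alarm(
    AlarmName='my-etl-function-errors',
    Namespace='AWS/Lambda',
    MetricName='Errors',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'my-etl-function'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:000000000000:etl-alerts'])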


Creating an AWS free account and launching an EC2 instance

  • Selecting the AWS Linux free tier t2.micro instance (a boto3 equivalent is sketched after this list)
  • Create a role and attach the AmazonS3FullAccess policy
  • Start working in the EC2 instance over SSH, for example:
  • ssh -i "C:\Users\farid\Downloads\Farid_EC2_Keys.pem" ec2-user@ec2-13-233-115-179.ap-south-1.compute.amazonaws.com
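
The same launch can also be scripted with boto3 instead of the console. In this sketch the AMI ID, key pair name and instance profile name are placeholders to be replaced with your own values.

import boto3

ec2 = boto3.client('ec2', region_name='ap-south-1')

# Launch one free-tier t2.micro instance (AMI ID, key name and instance
# profile below are illustrative placeholders)
response = ec2.run_instances(
    ImageId='ami-xxxxxxxxxxxxxxxxx',
    InstanceType='t2.micro',
    MinCount=1,
    MaxCount=1,
    KeyName='Farid_EC2_Keys',
    IamInstanceProfile={'Name': 's3-full-access-profile'})

print(response['Instances'][0]['InstanceId'])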


Creating and using the virtual environment

  • Creating a dependencies/requirements file (etl_requirments.txt) for Lambda to use, as shown below
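
A possible etl_requirments.txt, assuming the pipeline only needs boto3 and pandas (the exact package list is an assumption, not taken from the course):

# Packages installed into the virtual environment and later into the Lambda layer
boto3
pandas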


Working with Lambda

  • What is a Lambda layer?
  • Create a Lambda layer (see the sketch after this list)
  • Creating a Lambda function and adding layers
  • Adding the code to the Lambda function and testing.
  • Creating rules 
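
A minimal boto3 sketch of publishing a layer and attaching it to a new function. The zip file names, role ARN, runtime and function name are placeholders; the zips are assumed to have been built beforehand (for example from the requirements file above).

import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

# Publish the dependencies zip as a layer (layer.zip is a placeholder name)
layer = lambda_client.publish_layer_version(
    LayerName='etl-dependencies',
    Content={'ZipFile': open('layer.zip', 'rb').read()},
    CompatibleRuntimes=['python3.9'])

# Create the function and attach the layer (role ARN and function zip are placeholders)
lambda_client.create_function(
    FunctionName='my-etl-function',
    Runtime='python3.9',
    Role='arn:aws:iam::000000000000:role/lambda-etl-role',
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': open('function.zip', 'rb').read()},
    Layers=[layer['LayerVersionArn']],
    Timeout=60)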



Performing data testing

  • Advantages of the Test feature
  • Working with basic tests and the tests folder
  • Unique, not_null, and relationships (references) tests (see the example after this list)
  • Running the DBT tests
  • Writing custom tests
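
For reference, generic tests like these are declared in a YAML file inside the models folder. The model and column names below (stg_vehicles, vin, vehicle_dim) are placeholders, not the course's actual models.

version: 2

models:
  - name: stg_vehicles
    columns:
      - name: vin
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('vehicle_dim')
              field: vin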


Working with DBT Cloud



