Mastering AWS Glue Unit Testing for PySpark Jobs with Pytest

4,516 views

DataEng Uncomplicated

9 months ago

This video is a step-by-step guide to writing unit tests for the functions in a PySpark job that runs on the AWS Glue service. It covers how to build sample datasets and use them to check that the Glue job's transformations do what we expect.
Buy me a Coffee - www.buymeacoffee.com/dataengu
Tutorial Links:
Configure Docker with AWS Glue Jobs - • Develop AWS Glue Jobs ...
#aws, #awsglue, #pytest, #pyspark

Comments: 15
@thepravinbtech 8 months ago
Hi DataEng, your knowledge of AWS and your way of teaching are excellent. Could you please share videos on a CI/CD pipeline to deploy Glue jobs to production?
@DataEngUncomplicated 8 months ago
Thanks for the kind words! Yes, this was actually going to be one of my next videos: how to deploy a Glue job with Terraform.
@0777deep 3 months ago
Thanks!
@harshadk4264 3 months ago
Do you use the Factory Design pattern?
@Angleito 3 months ago
How do you add third-party Python libraries?
@DataEngUncomplicated 3 months ago
I don't know an elegant way to do this, but you can go into the Docker container and install the Python libraries you need directly.
@joseluisvega3237 8 months ago
I've been looking to develop some unit tests with pytest, but I would like to mock everything related to the Glue environment. I've been trying to do it through MonkeyPatch, but the problem is that when I convert the DynamicFrame to a DataFrame, it also expects a full mock of the DataFrame and its functions. Any experience with that?
@DataEngUncomplicated 8 months ago
Hi, can you explain how your approach differs from how I created the unit tests in the video? If you design each function to do one particular thing, it is much easier to write unit tests for it.
@joseluisvega3237 8 months ago
The approach is to be able to run the unit tests without a Glue environment: no Docker image, purely local development (my laptop), mocking GlueContext and DynamicFrame. The tests would use mocks of these instances, so there's no interaction with AWS Glue at all.
@DataEngUncomplicated 8 months ago
Yeah, I don't know how you can achieve this. The environment where you run the Glue jobs needs to have the Python libraries installed so you can execute the code. The way I set it up, I am doing 100% local development, but Glue runs in a Docker container. If you can't use Docker, you need to install and set up Spark directly on your local machine. I tried to do this following the documentation, but it was messy and I couldn't get it to work in the end.
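One partial option for the question in this thread, offered as a rough sketch rather than the method shown in the video: mock only the Glue boundary with the standard library's `unittest.mock`, and treat the DataFrame as an opaque object that the test never inspects. The functions `load_dataframe` and `run_job`, and the names `sales_db` and `orders`, are hypothetical. This only checks the wiring around `create_dynamic_frame.from_catalog` and `toDF()`; the transformation logic itself would still need a real Spark session (for example the Docker setup described above) to be tested meaningfully.

```python
from unittest.mock import MagicMock


def load_dataframe(glue_context, database, table_name):
    """Glue boundary (hypothetical helper): read a catalog table as a DataFrame."""
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    return dynamic_frame.toDF()


def run_job(glue_context, database, table_name, transform):
    """Wire the Glue read to a transformation that is passed in by the caller."""
    return transform(load_dataframe(glue_context, database, table_name))


def test_run_job_reads_catalog_and_applies_transform():
    # Stand-ins for GlueContext and the DataFrame; no Glue or Spark is imported.
    glue_context = MagicMock()
    fake_df = MagicMock(name="DataFrame")
    from_catalog = glue_context.create_dynamic_frame.from_catalog
    from_catalog.return_value.toDF.return_value = fake_df

    # The transformation is a mock too, so no DataFrame method is ever called.
    transform = MagicMock(return_value="transformed")

    result = run_job(glue_context, "sales_db", "orders", transform)

    from_catalog.assert_called_once_with(database="sales_db", table_name="orders")
    transform.assert_called_once_with(fake_df)
    assert result == "transformed"
```

Because the transformation is injected, the mock never has to imitate DataFrame methods, which is the part that became unmanageable with MonkeyPatch. The trade-off is that this test says nothing about whether the transformation is correct.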
@renyang2320 3 months ago
Your function-based job is quite straightforward. Would you consider organizing your Glue job in a Python class?
@DataEngUncomplicated 3 months ago
I made the script just for this KZfaq video. Sure, things could be organized into classes if it makes sense.
@kckc1289 3 months ago
How would you recommend handling local development and organization, and then uploading to AWS, for scripts with multiple files?
@kckc1289 3 months ago
Do you have a GitHub repo for this pytest example?
@DataEngUncomplicated 3 months ago
Hey, check out my videos on local development for AWS Glue. I covered topics like using interactive sessions, PyCharm, and VS Code with a Docker container running AWS Glue. To upload them, I recommend managing them with IaC, using Terraform or CDK.