AlexIoannides/pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

Languages: Python, Shell
Topics: python, data-science, spark, etl, pyspark, data-engineering, etl-pipeline, etl-job

Stars and forks statistics for the AlexIoannides/pyspark-example-project repository. As of 28 Apr 2024, this repository has 1268 stars and 604 forks.

PySpark Example Project

This document is designed to be read in parallel with the code in the pyspark-template-project repository. Together, these constitute what we consider to be a 'best practices' approach to writing ETL jobs using Apache Spark and its Python ('PySpark') APIs. This project addresses the following topics: how to structure ETL code in such a way that it can be easily tested and debugged; how to pass configuration parameters to a PySpark job; how to handle dependencies on other modules...
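
The first two topics above (testable structure and configuration passing) can be illustrated with a minimal sketch. This is not the repository's code: it uses plain Python lists of dicts as a stand-in for Spark DataFrames so it runs without a cluster, and every name in it (extract, transform, load, run_job, steps_per_floor) is a hypothetical chosen for illustration. The idea it tries to show is that keeping the transform step free of I/O lets it be tested on its own, with parameters supplied from a JSON configuration rather than hard-coded.

```python
import json


def extract(source_rows):
    """Extract step: return raw records (stand-in for reading a DataFrame)."""
    return list(source_rows)


def transform(rows, steps_per_floor):
    """Transform step: a pure function of its inputs, with no I/O.

    Because it only maps input rows to output rows, it can be unit
    tested directly with small hand-written inputs.
    """
    return [
        {**row, "steps_to_desk": row["floor"] * steps_per_floor}
        for row in rows
    ]


def load(rows, sink):
    """Load step: append results to a sink (stand-in for writing output)."""
    sink.extend(rows)


def run_job(config_json, source_rows, sink):
    """Wire the three steps together, reading parameters from JSON config."""
    config = json.loads(config_json)  # hypothetical config format
    rows = extract(source_rows)
    result = transform(rows, config["steps_per_floor"])
    load(result, sink)


if __name__ == "__main__":
    output = []
    run_job('{"steps_per_floor": 21}', [{"id": 1, "floor": 2}], output)
    print(output)
```

In a real PySpark job the same shape would presumably apply, with the transform function taking and returning DataFrames; that mapping is an assumption here, not something the excerpt above spells out.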