Responsibilities
Create and maintain an optimal data pipeline architecture; assemble large, complex data sets that meet
functional and non-functional requirements.
Design the right schema to support the functional requirements and consumption patterns.
Design and build production data pipelines from ingestion to consumption.
Build the data marts and data warehouses required for optimal extraction, transformation, and
loading of data from a wide variety of data sources.
Create the necessary preprocessing and postprocessing for various forms of data for training/retraining
and inference ingestion as required.
Create data visualization and business intelligence tools that give stakeholders and data scientists the
necessary business/solution insights.
Identify, design, and implement internal process improvements: automating manual data processes,
optimizing data delivery, etc.
Ensure our data is separated and secure across national boundaries through multiple data centers and
AWS regions.
Requirements
You should have a bachelor’s or master’s degree in Computer Science, Information Technology, or
another quantitative field.
You should have at least 5 years of experience working as a data engineer supporting large data transformation
initiatives related to machine learning, including experience building and optimizing pipelines and data
sets.
Strong analytic skills related to working with unstructured datasets.
Experience with AWS cloud services: EC2, EMR, RDS, Redshift, S3, Athena and familiarity with
various log formats from AWS.
Experience with object-oriented and functional scripting languages: Python, PySpark, Java, C++, etc.
General
Experience with DBeaver, AWS Glue ETL, AWS Glue crawlers, AWS Lambda, the Glue Data Catalog, and
AWS Glue Studio.
Experience with big data tools: Hadoop, Spark, Kafka, etc.
Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Experience with stream-processing systems: Storm, Spark Streaming, etc.
You should be a good team player, committed to the success of the team and the overall project.