AWS Data Engineer

Rivago Infotech Inc • Los Angeles Metropolitan Area
Visa Sponsorship

Job Description


H-1B workable

Nearby relocation only



Job Title: AWS Databricks Data Engineer

Location: Los Angeles, CA (Hybrid)

Hire type: FTE / CTH (full-time or contract-to-hire)

Implementation partner: **********

End client: Confidential

Interview mode: Video/Virtual


Job Description:

We are seeking a highly skilled AWS Data Engineer with strong expertise in SQL, Python, PySpark, Data Warehousing, and Cloud-based ETL to join our data engineering team. The ideal candidate will design, implement, and optimize large-scale data pipelines, ensuring scalability, reliability, and high performance. This role requires close collaboration with cross-functional teams and business stakeholders to deliver modern, efficient data solutions.


Key Responsibilities

1. Data Pipeline Development

  • Build and maintain scalable ETL/ELT pipelines using Databricks on AWS.
  • Leverage PySpark/Spark and SQL to transform and process large, complex datasets.
  • Integrate data from multiple sources including S3, relational/non-relational databases, and AWS-native services.
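
For illustration, a minimal sketch of the kind of Databricks ETL step these bullets describe; the S3 landing path, column names, and target table are hypothetical placeholders, not details from this posting:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Read raw JSON events from an S3 landing zone (path is illustrative).
raw = spark.read.json("s3://example-landing-bucket/events/")

# Light transformation: normalize timestamps and drop malformed rows.
cleaned = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .dropna(subset=["event_id", "event_ts"])
)

# Write to a Delta table so downstream consumers get ACID guarantees.
cleaned.write.format("delta").mode("append").saveAsTable("raw_events")
```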

2. Collaboration & Analysis

  • Partner with downstream teams to prepare data for dashboards, analytics, and BI tools.
  • Work closely with business stakeholders to understand requirements and deliver tailored, high‑quality data solutions.

3. Performance & Optimization

  • Optimize Databricks workloads for cost, performance, and efficient compute utilization.
  • Monitor and troubleshoot pipelines to ensure reliability, accuracy, and SLA adherence.
  • Apply query optimization, Spark tuning, and shuffle minimization best practices when handling tens of millions of rows.
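
For context, a hedged sketch of the shuffle-minimization patterns referenced above (adaptive query execution, broadcast joins, early filtering); the table and column names are invented for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Adaptive Query Execution lets Spark coalesce shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

facts = spark.table("silver.transactions")  # tens of millions of rows
dims = spark.table("silver.merchants")      # small dimension table

# Broadcasting the small side replaces a shuffle join with a map-side join.
joined = facts.join(F.broadcast(dims), "merchant_id")

# Filter early so partition pruning cuts the data scanned before later stages.
recent = joined.where(F.col("txn_date") >= "2024-01-01")
```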

4. Governance & Security

  • Implement and manage data governance, access control, and security policies using Unity Catalog.
  • Ensure compliance with organizational and regulatory data‑handling standards.
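
As a rough illustration of what Unity Catalog access control involves, the following Databricks SQL statements (run in a Unity Catalog-enabled workspace) show the general shape; the catalog, schema, table, and group names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant read access on a curated gold table to an analyst group.
spark.sql("GRANT SELECT ON TABLE main.gold.daily_revenue TO `analysts`")

# Revoke direct access to raw data so consumers go through curated layers.
spark.sql("REVOKE ALL PRIVILEGES ON SCHEMA main.raw FROM `analysts`")
```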

5. Deployment & DevOps

  • Use Databricks Asset Bundles for deployment of jobs, notebooks, and configuration across environments.
  • Maintain effective version control of Databricks artifacts using GitLab or similar tools.
  • Use CI/CD pipelines to support automated deployments and environment setups.
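
A small sketch of a CI/CD step driving Databricks Asset Bundles, assuming the modern `databricks` CLI is installed and authenticated in the job environment; the `staging` target name is illustrative:

```python
import subprocess

# Validate the bundle config first, then deploy it to the target environment.
for cmd in (
    ["databricks", "bundle", "validate"],
    ["databricks", "bundle", "deploy", "--target", "staging"],
):
    subprocess.run(cmd, check=True)  # fail the CI job on any non-zero exit
```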


Technical Skills (Required)

  • Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse Architecture, Table Triggers, Workflows, Delta Live Tables pipelines, Databricks Runtime, etc.).
  • Proven ability to implement robust PySpark solutions.
  • Hands‑on experience with Databricks Workflows & orchestration.
  • Solid knowledge of Medallion Architecture (Bronze/Silver/Gold); an illustrative sketch follows this list.
  • Significant experience designing or rebuilding batch‑heavy data pipelines.
  • Strong background in query optimization, performance tuning, and Spark shuffle optimization.
  • Ability to handle and process tens of millions of records efficiently.
  • Familiarity with Genie enablement concepts (understanding required; deep experience optional).
  • Experience with CI/CD, environment setup, and Git-based development workflows.
  • Solid understanding of AWS cloud, including:
      • IAM
      • Networking fundamentals
      • Storage integration (S3, Glue Catalog, etc.)
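
A compact sketch of the Medallion (Bronze/Silver/Gold) flow mentioned in the skills list above; the source path, table names, dedup key, and aggregate columns are all illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: ingest source data as-is, preserving raw fidelity.
bronze = spark.read.json("s3://example-landing-bucket/orders/")
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleaned, deduplicated, conformed records.
silver = (
    spark.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregates ready for BI tools.
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_value")
```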


Preferred Experience

  • Experience with Databricks Runtime configurations and advanced features.
  • Knowledge of streaming frameworks such as Spark Structured Streaming (see the sketch after this list).
  • Experience developing real-time or near real-time data solutions.
  • Exposure to GitLab pipelines or similar CI/CD systems.
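
For reference, a minimal Spark Structured Streaming sketch of the near-real-time pattern mentioned above, assuming a Databricks cluster where Delta is available; the source path, schema, checkpoint location, and target table are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Continuously pick up new JSON files as they land in the source directory.
stream = (
    spark.readStream
    .format("json")
    .schema("event_id STRING, event_ts TIMESTAMP, payload STRING")
    .load("s3://example-landing-bucket/stream/")
)

# Append to a Delta table; the checkpoint gives exactly-once progress tracking.
query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-landing-bucket/_checkpoints/stream/")
    .toTable("bronze.events_stream")
)
```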


Certifications (Optional)

  • Databricks Certified Data Engineer Associate / Professional
  • AWS Data Engineer or AWS Solutions Architect certification

