
Data Engineer

Results-driven Data Engineer with 3+ years of experience designing, building, and optimizing scalable data pipelines, ETL/ELT workflows, and Enterprise Data Warehouses (EDW). Proficient in big data processing, distributed computing, and cloud-native architectures, with hands-on experience in AWS, Snowflake, Apache Spark, and dbt. Strong background in SQL optimization, data modeling, workflow orchestration, and security compliance (GDPR, HIPAA, SOC 2). Adept at automating data governance, monitoring, and cost-efficient cloud workflows to deliver high-performance, reliable, and secure data infrastructure for business intelligence and analytics.


Experience: 4 years

Yearly salary: $100,000

Hourly rate: $50

Nationality: 🇮🇳 India

Residency: 🇺🇸 United States


Experience

Associate Consultant
Oracle
2022 - 2023
Directed cross-functional collaboration to transition international banking branches (New York, London, Dubai) into live production, streamlining workflows for 500+ employees and achieving a 20% boost in service delivery speed.
Performed quantitative analysis and problem-solving that produced strategic insights supporting 3 major banking projects, optimizing existing trading strategies and contributing to a 15% reduction in operational costs.
Designed and automated 15+ banking reports with tools such as Oracle Analytics Server and BI Publisher, boosting data efficiency by 30%, reducing workload by 25%, and enhancing decision-making.
Data Engineer
DXC Technology
2021 - 2022
Collaborated with business stakeholders to gather data pipeline requirements, defining ETL workflows and data models to process 10+ TB daily using Apache Spark (PySpark) and Hadoop HDFS, improving customer behavior analysis for 30% better targeting.
Developed and optimized PySpark ETL scripts, reducing transformation latency by 40% while ensuring 99.9% data accuracy via schema validation, deduplication, and cleansing, improving data consistency for downstream analytics and business intelligence.
Automated 1,000+ daily/hourly batch jobs using Apache Airflow, implementing failure recovery and retry mechanisms, reducing manual intervention by 90% and ensuring 99.5% on-time data availability for analytics and reporting teams.
Optimized data storage and retrieval by implementing Hadoop HDFS partitioning and compression (Snappy, Gzip), cutting storage costs by 25% while improving read/write speeds and accelerating processing for high-volume analytical workloads.
Designed and maintained Apache Hive external tables, enabling petabyte-scale distributed SQL querying, reducing ad-hoc query execution time by 50%, and empowering business teams with structured, self-serve data insights for decision-making.
Integrated Azure Data Lake Storage and Databricks, leveraging distributed computing, caching, and parallel processing to accelerate transformations by 3x and improve cloud-based data workflows, collaboration, and scalability.
Implemented real-time monitoring and security using the Apache Airflow UI and Log4j, reducing MTTR by 60% while enforcing SSL/TLS encryption and RBAC controls to ensure compliance with enterprise security and data governance policies.

Skills

Data Science
English