Python Developer Intern - Web Scraping and Data Acquisition

tony, llc • United States
Remote

Job Description


Unpaid Internship for Python Developer


Company: Shop ONline New York

Location: Fully Remote

Duration: 3 months

Number of Hours Per Week: 20


About Shop ONline New York:

We are an e-commerce company preparing for launch. To power pricing intelligence, catalog quality, and market insights, we need reliable data pipelines. This internship is for a web scraping and data acquisition specialist who can build robust collectors, normalize results to a schema, and deliver clean datasets to stakeholders.


Role Overview:


You will design, build, and maintain scrapers and API clients that collect structured data from approved sources. Your work includes pagination, authentication, retries, logging, and error handling, followed by normalization into a defined schema and export to Excel/Google Sheets. You will collaborate with analytics, merchandising, and engineering to ensure datasets are accurate, timely, and usable.
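The core loop described above (fetch with retries and backoff, then parse) can be sketched roughly as follows. The URL, CSS selectors, and sample markup are hypothetical, and the sketch assumes the requests and BeautifulSoup libraries named in this posting:

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch(url, session, retries=3, backoff=2.0):
    """GET a page, retrying with exponential backoff on transient failures."""
    for attempt in range(retries):
        try:
            resp = session.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)


def parse_products(html):
    """Extract name/price pairs from a listing page (selectors are illustrative)."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"name": item.select_one(".name").get_text(strip=True),
         "price": item.select_one(".price").get_text(strip=True)}
        for item in soup.select(".product")
    ]


# Parsing works the same on a saved page as on a live fetch:
sample = """
<div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">$4.50</span></div>
"""
rows = parse_products(sample)
print(rows)
```

Keeping fetching and parsing in separate functions makes the parser testable against saved HTML without any network access.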


Key Responsibilities:


  • Build scrapers using requests and BeautifulSoup; use Selenium or Playwright only when browser automation is required.
  • Integrate REST APIs; handle pagination, rate limits, and authentication (API keys, OAuth) reliably.
  • Implement robust controls: retries with backoff, structured logging, exception handling, and idempotent runs.
  • Normalize raw data to a defined schema and maintain a clear data dictionary with field types and definitions.
  • Deduplicate records, validate types, and add metadata (source, timestamps, run status) for auditability.
  • Export cleaned datasets to CSV/XLSX and Google Sheets; prepare concise data notes for recipients.
  • Schedule and version jobs (cron/GitHub Actions), maintain requirements.txt/poetry, and document setup and usage.
  • Follow legal, policy, and ethical guidelines: respect robots.txt, terms of service, privacy, and compliance standards.
  • Collaborate with stakeholders to refine requirements and adjust schemas as business needs evolve.
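The normalize, deduplicate, annotate, and export responsibilities above might look roughly like this in pandas. The schema, field names, and source label are illustrative, not part of the actual pipeline:

```python
from datetime import datetime, timezone

import pandas as pd

# Raw records as a scraper might emit them (field names are illustrative).
raw = pd.DataFrame([
    {"name": " Widget ", "price": "$9.99", "sku": "A1"},
    {"name": "Widget",   "price": "$9.99", "sku": "A1"},   # duplicate row
    {"name": "Gadget",   "price": "$4.50", "sku": "B2"},
])


def normalize(df, source):
    """Coerce types, deduplicate on a key, and attach audit metadata."""
    out = df.copy()
    out["name"] = out["name"].str.strip()                      # tidy strings
    out["price"] = out["price"].str.lstrip("$").astype(float)  # enforce types
    out = out.drop_duplicates(subset="sku")                    # dedupe records
    out["source"] = source                                     # auditability:
    out["scraped_at"] = datetime.now(timezone.utc).isoformat() # source + timestamp
    return out


clean = normalize(raw, source="example-source")
clean.to_csv("products.csv", index=False)  # .to_excel(...) covers the XLSX case
print(clean[["sku", "price"]])
```

The metadata columns (`source`, `scraped_at`) are what make a run auditable after the fact, as the responsibilities list calls out.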


Qualifications:


  • Experience building scrapers with requests and BeautifulSoup; ability to use Selenium or Playwright when necessary.
  • Strong pandas skills for cleaning, joining, reshaping, validation, and export to Excel/Google Sheets.
  • Comfortable integrating REST APIs with pagination and authentication flows.
  • Demonstrated ability to design resilient scripts with retries, logging, and error handling.
  • Experience normalizing data into a predefined schema and maintaining a data dictionary.
  • Git proficiency and familiarity with virtual environments and dependency management.
  • Understanding of ethical scraping practices, robots.txt, and rate limiting.
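The robots.txt understanding asked for above can be made concrete with the standard library's urllib.robotparser. The rules below are a made-up example; in practice you would load the live file with `set_url(...)` and `read()`:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only.
rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check each URL before fetching it, and honor the crawl delay between requests.
print(rp.can_fetch("my-scraper", "https://example.com/products"))   # True
print(rp.can_fetch("my-scraper", "https://example.com/private/x"))  # False
print(rp.crawl_delay("my-scraper"))                                 # 5
```

Gating every fetch on `can_fetch` and sleeping for `crawl_delay` between requests covers the two ethical-scraping basics the qualifications list names.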


Nice to Have:


  • Use of tenacity/logging libraries, Pydantic for validation, or Prefect/Airflow for orchestration.
  • Basic SQL for loading data into a warehouse; experience with Google Sheets API.
  • Proxy management and captcha handling where permitted and appropriate.
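Pydantic validation, one of the nice-to-haves above, catches bad records before they reach the export step. A minimal sketch, assuming a hypothetical `Product` schema:

```python
from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    sku: str
    name: str
    price: float  # strings like "9.99" are coerced; non-numeric values are rejected


def validate_rows(rows):
    """Split raw dicts into validated models and rejected rows."""
    good, bad = [], []
    for row in rows:
        try:
            good.append(Product(**row))
        except ValidationError:
            bad.append(row)
    return good, bad


good, bad = validate_rows([
    {"sku": "A1", "name": "Widget", "price": "9.99"},
    {"sku": "B2", "name": "Gadget", "price": "not a number"},
])
print(len(good), len(bad))
```

Collecting rejected rows instead of crashing lets a scheduled run finish and report which records need attention.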


Exceptional Internship Benefits:


  • Ship real data pipelines used by leadership for decisions.
  • Mentorship and code reviews from experienced engineers.
  • A portfolio of production-grade scrapers, API clients, and normalized datasets.
  • Certificate of completion and, based on performance, a detailed letter of recommendation.


