Unpaid Internship for Python Developer
Company: Shop ONline New York
Location: Fully Remote
Duration: 3 months
Number of Hours Per Week: 20
About Shop ONline New York:
We are an e-commerce company preparing for launch. To power pricing intelligence, catalog quality, and market insights, we need reliable data pipelines. This internship is for a web scraping and data acquisition specialist who can build robust collectors, normalize results to a defined schema, and deliver clean datasets to stakeholders.
Role Overview:
You will design, build, and maintain scrapers and API clients that collect structured data from approved sources. Your work includes pagination, authentication, retries, logging, and error handling, followed by normalization into a defined schema and export to Excel/Google Sheets. You will collaborate with analytics, merchandising, and engineering to ensure datasets are accurate, timely, and usable.
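The pagination, retries, and logging described above can be sketched as a small collector loop. This is a minimal illustration using only the standard library; the `fetch_page` callable and the `{"items": ..., "next_cursor": ...}` page shape are assumptions for the example, not a real API contract.

```python
import time
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("collector")

def fetch_all_pages(fetch_page, max_retries=3, backoff=1.0):
    """Collect every page from a paginated source.

    `fetch_page(cursor)` is expected to return a dict like
    {"items": [...], "next_cursor": str | None} (hypothetical shape).
    Transient failures are retried with exponential backoff.
    """
    items, cursor = [], None
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(cursor)
                break
            except Exception as exc:
                wait = backoff * (2 ** attempt)  # 1s, 2s, 4s, ...
                log.warning("fetch failed (%s); retrying in %.1fs", exc, wait)
                time.sleep(wait)
        else:
            raise RuntimeError("page fetch failed after retries")
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:  # no more pages
            return items
```

In a real scraper, `fetch_page` would wrap a `requests.Session` call with authentication headers; keeping it injected like this makes the pagination and retry logic easy to unit-test without network access.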
Key Responsibilities:
- Build scrapers using requests and BeautifulSoup; use Selenium or Playwright only when browser automation is required.
- Integrate REST APIs; handle pagination, rate limits, and authentication (API keys, OAuth) reliably.
- Implement robust controls: retries with backoff, structured logging, exception handling, and idempotent runs.
- Normalize raw data to a defined schema and maintain a clear data dictionary with field types and definitions.
- Deduplicate records, validate types, and add metadata (source, timestamps, run status) for auditability.
- Export cleaned datasets to CSV/XLSX and Google Sheets; prepare concise data notes for recipients.
- Schedule and version jobs (cron/GitHub Actions), maintain requirements.txt/poetry, and document setup and usage.
- Follow legal, policy, and ethical guidelines: respect robots.txt, terms of service, privacy, and compliance standards.
- Collaborate with stakeholders to refine requirements and adjust schemas as business needs evolve.
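The normalization, deduplication, and metadata responsibilities above might look like the following sketch. The `SCHEMA` mapping and field names (`sku`, `title`, `price`) are hypothetical stand-ins for whatever the data dictionary defines.

```python
from datetime import datetime, timezone

# Hypothetical target schema: field name -> type coercion callable
SCHEMA = {"sku": str, "title": str, "price": float}

def normalize(raw_records, source):
    """Coerce raw records to SCHEMA, drop invalid rows and duplicates,
    and stamp each surviving record with audit metadata."""
    seen, out = set(), []
    run_ts = datetime.now(timezone.utc).isoformat()
    for raw in raw_records:
        try:
            rec = {field: cast(raw[field]) for field, cast in SCHEMA.items()}
        except (KeyError, ValueError, TypeError):
            continue  # reject records that fail type validation
        if rec["sku"] in seen:
            continue  # deduplicate on the natural key
        seen.add(rec["sku"])
        rec.update({"source": source, "collected_at": run_ts, "run_status": "ok"})
        out.append(rec)
    return out
```

Keeping the schema as data (rather than hard-coded field handling) makes it straightforward to adjust as stakeholder requirements evolve.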
Qualifications:
- Experience building scrapers with requests and BeautifulSoup; ability to use Selenium or Playwright when necessary.
- Strong pandas skills for cleaning, joining, reshaping, validation, and export to Excel/Google Sheets.
- Comfortable integrating REST APIs with pagination and authentication flows.
- Demonstrated ability to design resilient scripts with retries, logging, and error handling.
- Experience normalizing data into a predefined schema and maintaining a data dictionary.
- Git proficiency and familiarity with virtual environments and dependency management.
- Understanding of ethical scraping practices, robots.txt, and rate limiting.
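A typical pandas cleaning pass covering the skills above, coercing types, dropping invalid and duplicate rows, and exporting, could be sketched like this. The column names are illustrative; the CSV export stands in for the XLSX/Google Sheets step.

```python
import io
import pandas as pd

def clean_and_export(raw_rows):
    """Minimal cleaning pass: coerce price to numeric, drop rows that
    fail coercion, deduplicate on sku, and export to CSV text."""
    df = pd.DataFrame(raw_rows)
    df["price"] = pd.to_numeric(df["price"], errors="coerce")  # bad values -> NaN
    df = df.dropna(subset=["price"]).drop_duplicates(subset=["sku"])
    buf = io.StringIO()
    df.to_csv(buf, index=False)  # swap for df.to_excel(...) when XLSX is required
    return df, buf.getvalue()
```
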
Nice to Have:
- Use of tenacity/logging libraries, Pydantic for validation, or Prefect/Airflow for orchestration.
- Basic SQL for loading data into a warehouse; experience with Google Sheets API.
- Proxy management and captcha handling where permitted and appropriate.
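For the "basic SQL for loading data" item above, an idempotent load into a staging table can be demonstrated with the standard-library `sqlite3` module (a real warehouse would use its own driver, but the upsert pattern carries over). The `products` table and its columns are hypothetical.

```python
import sqlite3

def load_rows(conn, rows):
    """Idempotent load: upsert scraped rows keyed by sku, so re-running
    the same job does not create duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products (sku, title, price) VALUES (:sku, :title, :price) "
        "ON CONFLICT(sku) DO UPDATE SET title=excluded.title, price=excluded.price",
        rows,
    )
    conn.commit()
```

Because the load is an upsert on the natural key, the job can safely be retried or re-scheduled, which matches the "idempotent runs" requirement in the responsibilities.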
Exceptional Internship Benefits:
- Ship real data pipelines used by leadership for decisions.
- Mentorship and code reviews from experienced engineers.
- A portfolio of production-grade scrapers, API clients, and normalized datasets.
- Certificate of completion and, based on performance, a detailed letter of recommendation.