Founding Software Engineer

NYC / Remote | Full-time | $150k-$275k + 0.5%-3% equity

Build the web data infrastructure that AI agents need. Instead of forcing agents onto outdated tools built for humans, build bespoke search, crawl, and indexing systems that work for agents at scale.

About Zipf.AI

We're building web data infrastructure for interacting with external information. The existing options were built decades ago for humans clicking on 10 blue links. Today, AI agents need something different: control, precision, state, and the ability to tune every knob.

Our founding team has built search and LLM systems at Microsoft Bing, Snowflake, Neeva, Walmart, Qualtrics, and Mendel.AI. We have published dozens of papers with thousands of citations, hold dozens of patents, and have shipped systems handling billions of queries daily.

Now we're creating reliable infrastructure for search, crawl, and workflow — with all the controls and customization businesses actually need.

We are not building yet another hype-chasing company that works its employees to the bone. We are a deeply curious and genuine group of people. While most of us could talk shop at all hours of the day, we strongly encourage everyone to get healthy doses of reality.

The Role

We are building an entirely novel web data layer. There is simply no better learning opportunity.

We are ready to hit the ground running and delight our customers. We need software engineers who are scrappy and curious.

We are obsessed with metrics of true quality. We are thrilled to be building completely outside the gravitational pull of ad dollars.

Choose your own job title. Grow into the Head of Product, Head of Modeling Quality, or Head of Forward Deployed Engineering.

What You'll Build

Reality: you'll work on whatever needs doing. We are racing to delight customers and get real, durable revenue on the books. Here's what we're shipping in the next 6 months:

Crawling & Workflows

• Intelligent, scalable web crawlers (Playwright, Puppeteer)
• Pagination, sitemap generation, link discovery
• JavaScript rendering, rate limiting, anti-bot handling
• Content extraction pipelines with quality validation

Search Infrastructure

• Mixture-of-indexes L0 Search Architecture
• Session-based search with state
• Dynamic Multi-Stage Search Architecture
• Hybrid retrieval (BM25, dense vectors, sparse-dense fusion)

Quality & Modeling

• Query classification and routing (PyTorch, transformers)
• Relevance modeling (InfoNCE vs BCE loss functions)
• Embedding models and reranking systems
• Understanding when we're serving good vs bad results

Infrastructure & Operations

• Inference optimization (vLLM, PISA, TRT-LLM)
• API reliability, monitoring, scaling (PostgreSQL, Redis)
• Prototype to production infrastructure
• Customer debugging and optimization

What We're Looking For

The Intersection

You sit somewhere at the intersection of:

Infrastructure builders who design systems that scale and don't fall over.
Experience with: Distributed systems, PostgreSQL, Redis, AWS/GCP, API design, monitoring & observability

Modelers who understand ML/AI and can improve retrieval quality.
Experience with: PyTorch, transformers, embedding models, loss functions (InfoNCE, BCE), inference optimization (vLLM, TRT-LLM, PISA)

Forward-deployed engineers who talk to customers and ship what they need.
Experience with: Customer-facing technical work, debugging production issues, rapid prototyping, turning feedback into features

Customer Obsession

• Genuinely care about solving customer problems, not just elegant code
• Willing to do whatever customers need - even if it's "not your job"
• Take ownership of outcomes, not just outputs

Technical Fundamentals

We care more about how you think than your specific tech stack. That said, you should have:

• Strong fundamentals in distributed systems and web infrastructure
• Experience with search/retrieval, web scraping, ML systems, or high-scale backend
• Comfort with Python, TypeScript, or C++ (ideally multiple)
• Familiarity with modern ML tooling (PyTorch, transformers, inference optimization) OR deep systems experience
• Willingness to write code that may not exist in a few months — iteration over perfection

Early-Stage Mindset

• We're pre-PMF. You'll build things that get thrown away and may crack as we scale. You're excited by that.
• Default to action and iteration over lengthy planning
• Comfortable with ambiguity and changing priorities
• Want ownership and impact more than a defined role

Our 6-Month Goals (You'll Help Us Get There)

1. Product: Launch Search + Crawl APIs publicly, with real customers using them in production, 5B+ documents in the index, and 200M+ documents crawled per day.
2. Customers: 5 major case studies with custom micro web indexes solving their specific business problems, driving $X0,000/month in revenue.
3. Proof: Demonstrate that our mixture-of-indexes + session-based approach actually delivers better results for agent use cases.

The Reality Check

What Makes This Hard

• Building genuinely novel infrastructure (no established patterns)
• Context switching between crawling bugs, modeling, and customer calls
• Fast pace - things break, priorities shift
• Early-stage compensation (running lean)

What Makes This Rewarding

• Real technical challenges building search from scratch
• Huge impact as one of the first engineers
• Learn from a team that's built billion-scale systems
• Early-stage equity (at a very modest, hype-free valuation) that can easily 10x in the next two years

Tech Stack Deep Dive

This is our current stack, but we're pragmatic — if something better fits the problem, we'll use it.

Backend & Data

• Python 3.11+ - Core services
• FastAPI / Flask - API layer
• PostgreSQL 15+ - Primary datastore
• Redis 7+ - Caching & queues
• Supabase - Backend services

ML & AI

• PyTorch 2.0+ - Model training
• Transformers - Hugging Face ecosystem
• vLLM - Fast inference serving
• TRT-LLM / PISA - Optimization
• Claude / GPT-4 - LLM APIs

Frontend & Infra

• Next.js 14+ - React framework
• TypeScript - Type safety
• Tailwind CSS - Styling
• AWS - Cloud infrastructure
• Docker - Containerization

Crawling & Scraping

• Playwright - Browser automation
• Puppeteer - Headless Chrome
• BeautifulSoup / lxml - HTML parsing
• Scrapy - Crawling framework
• Proxy management - Anti-bot handling

Search & Retrieval

• BM25 - Lexical search
• Dense embeddings - Semantic search
• FAISS / Milvus - Vector indexes
• Elasticsearch - Full-text search
• Custom rankers - Hybrid retrieval

Ops & Monitoring

• GitHub Actions - CI/CD
• Vercel - Frontend hosting
• Sentry - Error tracking
• Prometheus / Grafana - Metrics
• PostHog - Product analytics

Compensation

We believe in transparent, fair compensation. Salary and equity scale with experience and impact potential.

Early Career (0-2 years)

Entry

$150k-$200k + 1.0-2.0% equity

Strong fundamentals, eager to learn, excited about early-stage chaos

Mid-Level (3-7 years)

Core

$200k-$250k + 1.5-2.5% equity

Proven ability to ship, comfortable with ambiguity, can own entire features

Senior+ (8+ years)

Leadership

$250k-$275k + 2.0-3.0% equity

Deep expertise in search/ML/infra, can architect systems, comfortable leading and mentoring

Note: Equity is at a very modest, hype-free valuation. We raised at reasonable terms and aren't playing the Silicon Valley valuation inflation game. Your equity has real potential to 10x+ in the next 2-3 years as we hit product-market fit and scale.

Details

Location

Flexible, with a strong preference for being in NYC a few days per month. Remote-first culture, but we believe some in-person time builds better teams. Quarterly offsites to focus and recharge.

Benefits

We take care of our team: unlimited vacation (honor system), company-wide holiday closures (Christmas week + Fourth of July week), flexible hybrid schedule, comprehensive medical/dental, 401(k) match, generous parental leave (6 months birthing / 3 months non-birthing), conference travel support, and team offsites twice a year.

Ready to Apply?

Send us your resume, a note about why you're interested, and something you've built that you're proud of.

We care about what you can do and how you think, not credentials.

Apply Now

Questions?

"Is this more ML or infra?"

Yes. Both. And product. And ops. That's the point.

"What's the tech stack?"

Python, TypeScript, C++, PostgreSQL, Redis, Next.js, AWS. For ML: PyTorch, transformers, vLLM, TRT-LLM. For crawling: Playwright, Puppeteer. We care more about picking the right tool than religious adherence to specific tech.

"Do I need a PhD?"

No. Some of our employees have advanced degrees, but we care about your ability to ship and solve problems.

"What if I haven't done search before?"

That's fine. If you've built scalable systems or worked on ML infrastructure, you'll figure it out.

Zipf is an equal opportunity employer. We value diversity and are committed to creating an inclusive environment for all employees.