Founding Software Engineer
Build the web data infrastructure that AI agents need. Instead of forcing agents onto outdated tools built for humans, build bespoke search, crawl, and indexing systems that work for agents at scale.
About Zipf.AI
We're building web data infrastructure for interacting with external information. The existing options were built decades ago for humans clicking on 10 blue links. Today, AI agents need something different: control, precision, state, and the ability to tune every knob.
Our founding team has built search and LLM systems at Microsoft Bing, Snowflake, Neeva, Walmart, Qualtrics, and Mendel.AI. We've published dozens of papers with thousands of citations, hold dozens of patents, and have shipped systems handling billions of queries daily.
Now we're creating reliable infrastructure for search, crawl, and workflow — with all the controls and customization businesses actually need.
We are not building yet another hype-chasing company that works its employees to the bone. We are a deeply curious and genuine group of people. While most of us could talk shop at all hours of the day, we strongly encourage everyone to get healthy doses of reality.
The Role
We are building an entirely novel web data layer. There is simply no better learning opportunity.
We are ready to hit the ground running and delight our customers. We need software engineers who are scrappy and curious.
We are obsessed with metrics of true quality. We are thrilled to be building completely outside the gravitational pull of ad-dollars.
Choose your own job title. Grow into the Head of Product, Head of Modeling Quality, or Head of Forward Deployed Engineering.
What You'll Build
Reality: you'll work on whatever needs doing. We are racing to delight customers and get real, durable revenue on the books. Here's what we're shipping in the next 6 months:
Crawling & Workflows
- • Intelligent, scalable web crawlers (Playwright, Puppeteer)
- • Pagination, sitemap generation, link discovery
- • JavaScript rendering, rate limiting, anti-bot handling
- • Content extraction pipelines with quality validation
Search Infrastructure
- • Mixture-of-indexes L0 Search Architecture
- • Session-based search with state
- • Dynamic Multi-Stage Search Architecture
- • Hybrid retrieval (BM25, dense vectors, sparse-dense fusion)
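To give a flavor of the sparse-dense fusion item above, here is a minimal sketch using reciprocal rank fusion (RRF), one common way to merge a lexical ranking with an embedding ranking. It is an illustration only, not a description of our production ranker: the function name, the document IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical lexical (BM25) results
dense_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical dense-vector results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, and because RRF only looks at ranks it avoids having to calibrate BM25 scores against cosine similarities.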
Quality & Modeling
- • Query classification and routing (PyTorch, transformers)
- • Relevance modeling (InfoNCE vs BCE loss functions)
- • Embedding models and reranking systems
- • Understanding when we're serving good vs bad results
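For readers less familiar with the two loss functions named above, the sketch below contrasts them in plain Python. It is a hedged illustration, not our training code: the function names, the temperature value, and the example scores are assumptions. InfoNCE scores the positive document against the negatives for the same query (a softmax over candidates), while BCE judges each query-document pair independently.

```python
import math


def info_nce_loss(pos_score, neg_scores, temperature=0.05):
    """Contrastive loss: softmax cross-entropy with the positive as the target."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def bce_loss(scores, labels):
    """Pointwise loss: mean binary cross-entropy over raw scores (logits)."""
    total = 0.0
    for s, y in zip(scores, labels):
        # softplus(s) - y*s equals -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))]
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += softplus - y * s
    return total / len(scores)


# Hypothetical similarity scores for one query: one relevant, two irrelevant docs.
pos, negs = 0.8, [0.3, 0.1]
print(info_nce_loss(pos, negs))
print(bce_loss([pos] + negs, [1, 0, 0]))
```

The practical difference is that InfoNCE only cares about the positive outranking its negatives, whereas BCE pushes every score toward an absolute relevant or not-relevant decision.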
Infrastructure & Operations
- • Inference optimization (vLLM, PISA, TRT-LLM)
- • API reliability, monitoring, scaling (PostgreSQL, Redis)
- • Prototype to production infrastructure
- • Customer debugging and optimization
What We're Looking For
The Intersection
You sit somewhere at the intersection of:
- Infrastructure builders who design systems that scale and don't fall over.
Experience with: Distributed systems, PostgreSQL, Redis, AWS/GCP, API design, monitoring & observability
- Modelers who understand ML/AI and can improve retrieval quality
Experience with: PyTorch, transformers, embedding models, loss functions (InfoNCE, BCE), inference optimization (vLLM, TRT-LLM, PISA)
- Forward-deployed engineers who talk to customers and ship what they need
Experience with: Customer-facing technical work, debugging production issues, rapid prototyping, turning feedback into features
Customer Obsession
- • Genuinely care about solving customer problems, not just elegant code
- • Willing to do whatever customers need - even if it's "not your job"
- • Take ownership of outcomes, not just outputs
Technical Fundamentals
We care more about how you think than your specific tech stack. That said, you should have:
- • Strong fundamentals in distributed systems and web infrastructure
- • Experience with search/retrieval, web scraping, ML systems, or high-scale backend
- • Comfort with Python, TypeScript, or C++ (ideally multiple)
- • Familiarity with modern ML tooling (PyTorch, transformers, inference optimization) OR deep systems experience
- • Willingness to write code that may not exist in a few months — iteration over perfection
Early-Stage Mindset
- • We're pre-PMF. You'll build things that get thrown away and may crack as we scale. You're excited by that.
- • Default to action and iteration over lengthy planning
- • Comfortable with ambiguity and changing priorities
- • Want ownership and impact more than a defined role
Our 6-Month Goals (You'll Help Us Get There)
- 1. Product: Launch Search + Crawl APIs publicly, with real customers using them in production, 5B+ documents in the index, and 200M+ documents crawled per day.
- 2. Customers: 5 major case studies with custom micro web indexes solving customers' specific business problems, driving $X0,000/month in revenue.
- 3. Proof: Demonstrate that our mixture-of-indexes + session-based approach actually delivers better results for agent use cases.
The Reality Check
What Makes This Hard
- • Building genuinely novel infrastructure (no established patterns)
- • Context switching between crawling bugs, modeling, and customer calls
- • Fast pace - things break, priorities shift
- • Early-stage compensation (running lean)
What Makes This Rewarding
- • Real technical challenges building search from scratch
- • Huge impact as one of the first engineers
- • Learn from a team that's built billion-scale systems
- • Early-stage equity (at a very modest, hype-free valuation) that has real potential to 10x in the next two years
Tech Stack Deep Dive
This is our current stack, but we're pragmatic — if something better fits the problem, we'll use it.
Backend & Data
- • Python 3.11+ - Core services
- • FastAPI / Flask - API layer
- • PostgreSQL 15+ - Primary datastore
- • Redis 7+ - Caching & queues
- • Supabase - Backend services
ML & AI
- • PyTorch 2.0+ - Model training
- • Transformers - Hugging Face ecosystem
- • vLLM - Fast inference serving
- • TRT-LLM / PISA - Optimization
- • Claude / GPT-4 - LLM APIs
Frontend & Infra
- • Next.js 14+ - React framework
- • TypeScript - Type safety
- • Tailwind CSS - Styling
- • AWS - Cloud infrastructure
- • Docker - Containerization
Crawling & Scraping
- • Playwright - Browser automation
- • Puppeteer - Headless Chrome
- • BeautifulSoup / lxml - HTML parsing
- • Scrapy - Crawling framework
- • Proxy management - Anti-bot handling
Search & Retrieval
- • BM25 - Lexical search
- • Dense embeddings - Semantic search
- • FAISS / Milvus - Vector indexes
- • Elasticsearch - Full-text search
- • Custom rankers - Hybrid retrieval
Ops & Monitoring
- • GitHub Actions - CI/CD
- • Vercel - Frontend hosting
- • Sentry - Error tracking
- • Prometheus / Grafana - Metrics
- • PostHog - Product analytics
Compensation
We believe in transparent, fair compensation. Salary and equity scale with experience and impact potential.
Early Career (0-2 years)
Entry: Strong fundamentals, eager to learn, excited about early-stage chaos
Mid-Level (3-7 years)
Core: Proven ability to ship, comfortable with ambiguity, can own entire features
Senior+ (8+ years)
Leadership: Deep expertise in search/ML/infra, can architect systems, comfortable leading and mentoring
Note: Equity is at a very modest, hype-free valuation. We raised at reasonable terms and aren't playing the Silicon Valley valuation inflation game. Your equity has real potential to 10x+ in the next 2-3 years as we hit product-market fit and scale.
Details
Location
Flexible, with a strong preference for being in NYC a few days per month. Our culture is remote-first, but we believe some in-person time builds better teams. Quarterly offsites to focus and recharge.
Benefits
We take care of our team: unlimited vacation (honor system), company-wide holiday closures (Christmas week + Fourth of July week), flexible hybrid schedule, comprehensive medical/dental, 401(k) match, generous parental leave (6 months birthing / 3 months non-birthing), conference travel support, and team offsites twice a year.
Ready to Apply?
Send us your resume, a note about why you're interested, and something you've built that you're proud of.
We care about what you can do and how you think, not credentials.
Questions?
"Is this more ML or infra?"
Yes. Both. And product. And ops. That's the point.
"What's the tech stack?"
Python, TypeScript, C++, PostgreSQL, Redis, Next.js, AWS. For ML: PyTorch, transformers, vLLM, TRT-LLM. For crawling: Playwright, Puppeteer. We care more about picking the right tool than religious adherence to specific tech.
"Do I need a PhD?"
No. Some of our employees have advanced degrees, but we care about ability to ship and solve problems.
"What if I haven't done search before?"
That's fine. If you've built scalable systems or worked on ML infrastructure, you'll figure it out.
Zipf is an equal opportunity employer. We value diversity and are committed to creating an inclusive environment for all employees.