Table of Contents
- Why We Needed to Leave Shopify Search & Filtering Behind
- Our Approach to Technology Selection
- Why Not Shopify Apps or Elasticsearch?
- Why Typesense Was the Right Fit
- The Architecture: Two Codebases Working Together
- Part 1: Backend Pipeline: Extracting Data from Shopify
- Part 2: Backend Pipeline: Data Transformations
- Part 3: Typesense Schema Design: Thinking About Query Patterns
- Part 4: The Sync Pipeline Workflow
- Part 5: Replacing Shopify's UI
- Potential Improvements
- Conclusion
A deep dive into the thought process and implementation logic behind building lightning-fast search and filtering for 10,000+ artworks using Typesense.
Why We Needed to Leave Shopify Search & Filtering Behind
MeMeraki.com is an e-commerce platform for handcrafted Indian artwork, working with master artisans across the country to bring traditional art forms online. Our catalog spans more than 10,000 original artworks, making it one of the largest collections of Indian arts and crafts available for purchase on the internet.
In 2024, while aggressively scaling this catalog, we ran into an unexpected and silent failure. Search and filtering abruptly disappeared from our most important collection page: https://www.memeraki.com/collections/paintings
Many first-time visitors either land here organically from Google or choose to browse this collection first to get a sense of the length and breadth of the artworks we cover.
Nothing had changed on our end. There were no recent theme deployments, no configuration updates, and no new features toggled on. After an hour of confusion and panic, we identified the root cause. Sometime during that day, our catalog size had breached the 5,000 product threshold. We had hit a hard, fundamental limitation of Shopify: once a collection exceeds 5,000 items, the platform natively drops all sorting and filtering capabilities on that page.
This was not a minor usability regression. It broke the core discovery experience. Customers could no longer filter artworks by artist, artform, or price. Sorting by newest arrivals or price was unavailable. Users were left to scroll through thousands of products in a single unstructured list.
At the same time, Shopify’s native search was already showing its limits. Our product taxonomy is deeply layered. Each artwork carries attributes such as artform (Madhubani, Warli, or Pichwai, for example), artist name, region of origin, color palette, and thematic context. The search engine struggled with typos, failed to match close variants, and could not consistently rank results by relevance across these dimensions.
We decided that since we needed an external solution for collection filtering anyway, we might as well replace the entire search experience with something purpose-built.
Our Approach to Technology Selection
Before diving into specific tools, it's worth explaining how we evaluate technology at MeMeraki. As a two-person team with limited budget and time, we can't afford to chase shiny objects or spend months on implementation. Every technology decision is filtered through a set of principles shaped by our constraints.
The Reality of a Small Team
Most technical blog posts come from companies with dedicated infrastructure teams, DevOps engineers, and months of runway for experimentation. That's not us. Our two-person team (as of this writing) has to split its time between maintaining what’s already out there and engineering the new features we want to add.
This constraint is actually clarifying. It forces us to be ruthless about what matters and honest about what we can maintain.
Our Selection Criteria
When evaluating any technology, we ask five questions:
1. Can we be productive in days, not weeks?
Learning curves matter when you're resource-constrained. If a tool requires extensive training, complex configuration, or deep expertise to use effectively, it's probably not for us, regardless of how powerful it is. We prioritize tools with sensible defaults, clear documentation, and quick time-to-value.
2. Does it solve our specific problem without excess complexity?
Enterprise tools often bundle features we'll never use. That complexity isn't free – it's more surface area for bugs, more documentation to wade through, more cognitive overhead. We prefer focused tools that do one thing well over platforms that do everything adequately.
3. What's the total cost of ownership?
Sticker price is just the beginning. We factor in: time spent on setup and maintenance, operational burden (monitoring, upgrades, incident response), pricing predictability at scale, and hidden costs like egress fees or per-operation charges. A "free" tool that requires a dedicated server and weekly maintenance isn't actually free.
4. What happens if we need to leave?
Vendor lock-in is a real risk for small teams. If a service shuts down, raises prices dramatically, or degrades in quality, can we migrate away without rebuilding from scratch? We favor tools with open-source alternatives, standard data formats, and portable patterns.
5. Does it integrate with what we already have?
We're not greenfield. We run on Shopify, use specific frontend patterns, and have existing workflows. Any new tool needs to fit into this ecosystem without requiring us to rebuild everything around it.
Why Not Shopify Apps or Elasticsearch?
Before settling on Typesense, we evaluated the obvious alternatives. Each had deal-breakers for our specific situation.
PS: Shoutout to Sharoon Thomas from fulfil.io for turning us onto Typesense.
Shopify Search Apps
The Shopify App Store has dozens of search enhancement apps. We evaluated several, but they all share fundamental limitations:
- Invasive template overrides. Most of the apps we audited work by replacing the native collection and search page templates with their own proprietary code. Adopting that architecture would have meant giving up our custom UI enhancements and severely limiting future frontend iterations. Like all Shopify apps, they also inject third-party JavaScript directly into the theme, which increases bundle size and adds external network calls whose load sequencing we cannot control.
- No control over indexing logic. Our data model relies heavily on product metadata stored outside the default product fields. We needed precise control over the indexing payload: ingest specific custom data points and ignore irrelevant standard fields. The apps we evaluated lacked this level of granular control.
- Unpredictable pricing and vendor lock-in. Truth be told, after seeing hundreds of Shopify apps over the last three years, I find that pricing rarely aligns with the underlying infrastructure costs or feature depth. To me, the pricing models appear inflated to exploit the limited set of options in the ecosystem and the predominantly non-technical merchant base. Personally, I refuse to pay for anything that can be replaced by a few Python scripts and a database. You’ll be surprised by how many apps fail this criterion. :)
Elasticsearch / OpenSearch
Elasticsearch is the industry standard for search. It was the first product we looked at. Here's why we didn't go that route:
- Operational burden. Requires managing clusters, shards, replicas, JVM tuning, and version upgrades even with managed services.
- Steep learning curve. The Query DSL is powerful but verbose; we estimated weeks before being productive with faceted search.
- No built-in typo tolerance. Fuzzy matching requires explicit configuration for edit distances, field selection, and relevance tuning.
- Overkill for 10k documents. The infrastructure is designed for billions of documents; even the smallest production deployment would have carried significant costs. It would’ve been like driving a tank to buy groceries.
Algolia
Algolia was our lead choice before we discovered Typesense. Typesense is a direct competitor to Algolia with similar features. We chose Typesense for a few reasons:
- Open source. We always default to open source. Although we chose to host on Typesense Cloud, Typesense can also be self-hosted, which gives us leverage on pricing and protects us against vendor lock-in. The open-source ecosystem was also a practical advantage during implementation: we leaned on the community forums to troubleshoot and to get feedback on our schema config while we were building the solution.
- Cost. Algolia charges per search operation (including each autocomplete keystroke); Typesense was significantly cheaper at our traffic levels.
- Simpler feature set. Algolia does include some features we’d love to have, such as an existing UI library, A/B testing, personalization, and analytics. But we could get much of that functionality back by integrating with tools we already use. For analytics and experimentation, we integrate with PostHog. For UI parity, Typesense already had an adapter for InstantSearch.js, allowing us to achieve comparable frontend behaviour.
Why Typesense Was the Right Fit
Typesense hit the sweet spot for our needs:
- Typo tolerance out of the box with no configuration required
- Faceted search was designed into the core
- InstantSearch.js adapter gave us access to battle-tested UI components
- Simple schema definition that maps directly to our data model
- Predictable pricing
- Managed cloud. We prefer open-source software paired with the project's own managed cloud; it lets us avoid DevOps overhead while supporting the project.
- Fast enough for our scale (sub-50ms queries on 10k documents)
The Architecture: Two Codebases Working Together
Our solution required decoupling the direct dependency between the Shopify Admin API and the Shopify theme layer, and introducing an intermediary search service in between, all while maintaining absolute visual parity within the frontend user experience.

The Backend Pipeline runs as a Python application, separate from Shopify. Its job is to periodically fetch all our products, collections, articles, and page data from Shopify, transform it into a search-optimised format, and push it to Typesense Cloud. Think of it as a scheduled sync job that keeps Typesense up-to-date with whatever's in Shopify.
The Frontend lives inside our Shopify theme as Liquid templates and JavaScript. When a customer visits a collection page or uses the search bar, the frontend now communicates directly with the Typesense Search API instead of relying on Shopify’s native search or Liquid collection rendering.
We also repurposed our existing HTML and CSS as structural scaffolding within InstantSearch.js, dynamically hydrating product grids with results from Typesense. This allowed us to preserve visual consistency, maintain our established design system, and avoid a disruptive frontend rewrite while completely replacing the underlying search engine.
Part 1: Backend Pipeline: Extracting Data from Shopify
GraphQL and Rate Limits
We use Shopify's GraphQL Admin API to fetch products with all their metafields and collection memberships in a single request. To handle Shopify's aggressive rate limits, our HTTP client uses connection pooling and exponential backoff with retry logic, respecting Retry-After headers so the pipeline runs unattended.
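The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not our production client: `request_with_backoff`, the `send` callable, and the `(status, headers, body)` tuple are hypothetical names, and the sketch only handles a `Retry-After` value given in seconds.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status_code, headers, body) as returned by a hypothetical transport function.
Response = Tuple[int, Mapping[str, str], str]


def _parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return Retry-After as seconds, or None if absent or not numeric."""
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None  # HTTP-date form is not handled in this sketch


def request_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it succeeds, backing off on 429 and 5xx responses."""
    last_status = None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 and not 500 <= status < 600:
            return status, headers, body
        last_status = status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = _parse_retry_after(headers.get("Retry-After"))
        if delay is None:
            # Exponential backoff with a little jitter to avoid lockstep retries.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay * 0.1)
        sleep(delay)
    raise RuntimeError(
        f"giving up after {max_attempts} attempts (last status {last_status})"
    )
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; in a real client, `send` would wrap a pooled HTTP session.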
The Collection-First Approach
Instead of fetching all products and then figuring out which collections they belong to, we do it the other way around: we iterate through every collection and fetch its products.

We do this to preserve the manual sort order that merchandisers set in Shopify. When you manually arrange products in a Shopify collection, that order is only available when you query products through that collection. If you fetch products directly, you lose this ordering information entirely.
So our pipeline:
- Iterates through all collections one-by-one.
- For each collection, fetches its products in the order they're sorted
- Tracks the position (1st, 2nd, 3rd, etc.) of each product within that collection
The Multi-Collection Challenge
In Shopify, a single product can appear in multiple collections. When we encounter a product we've already seen (because it appeared in an earlier collection), we merge the new collection information into the existing product record rather than creating a duplicate. The product ends up with an array of all its collection memberships and a mapping of collection IDs to sort positions.

This merge logic is critical. Without it, we'd either have duplicate products (bad for search) or lose collection membership data (bad for filtering).
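A minimal sketch of the collection-first walk and the merge step, using in-memory data instead of live API calls. The function name `merge_collections` and the `collections` and `sort_order` keys are assumptions chosen for illustration; the real pipeline fetches each collection's products from Shopify.

```python
from typing import Dict, Iterable, List, Tuple


def merge_collections(
    collections: Iterable[Tuple[str, List[dict]]],
) -> Dict[str, dict]:
    """Build one record per product from (collection_id, ordered products) pairs."""
    merged: Dict[str, dict] = {}
    for collection_id, products in collections:
        # enumerate(start=1) records the 1-based position within this collection.
        for position, product in enumerate(products, start=1):
            record = merged.get(product["id"])
            if record is None:
                # First sighting: copy the product and add the tracking fields.
                record = {**product, "collections": [], "sort_order": {}}
                merged[product["id"]] = record
            if collection_id not in record["sort_order"]:
                record["collections"].append(collection_id)
            record["sort_order"][collection_id] = position
    return merged
```

A product that appears in two collections ends up as a single record whose `sort_order` holds one position per collection, which is exactly the shape the sort logic later relies on.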
Part 2: Backend Pipeline: Data Transformations

Shopify's data structure is optimised for e-commerce operations, not for full-text search. Several transformations were necessary for compatibility with Typesense as well as optimising for search:
Metafields are nested and inconsistent: We use different data types to store different data points. Sometimes it's a simple string, sometimes it's a JSON array, sometimes it's a Python-style list literal. We built parsing helpers that try JSON parsing first, fall back to Python literal parsing, and gracefully return empty values if both fail.
IDs are Global ID URIs: Shopify returns IDs like gid://shopify/Product/123456789. For Typesense, we only need the numeric part, since each document type (product, collection, blog, and article) gets its own index (a Typesense collection) anyway.
Text needs normalisation: Product titles and descriptions sometimes contain Unicode characters and HTML elements that can cause search inconsistencies. We normalise all text to ASCII using the Unidecode library, so "Räjästhäni" becomes "Rajasthani" for more reliable search matching.
Dates need to be Unix timestamps: Typesense sorts and filters dates as integers. We convert all ISO 8601 date strings to Unix timestamps.
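The four transformations above could look roughly like this. All function names are hypothetical. One deliberate substitution: the text normaliser uses the standard library's `unicodedata` as a stand-in for Unidecode so the sketch has no dependencies; unlike Unidecode, it drops characters that have no ASCII decomposition rather than transliterating them.

```python
import ast
import html
import json
import re
import unicodedata
from datetime import datetime, timezone

_TAG_RE = re.compile(r"<[^>]+>")


def parse_list_metafield(raw):
    """Try JSON first, then a Python literal; return [] if both fail."""
    if not raw:
        return []
    for parser in (json.loads, ast.literal_eval):
        try:
            value = parser(raw)
        except (ValueError, SyntaxError, TypeError):
            continue
        if value is None:
            return []
        return value if isinstance(value, list) else [value]
    return []


def numeric_id(gid: str) -> str:
    """Keep only the trailing numeric part of a Shopify Global ID."""
    return gid.rsplit("/", 1)[-1]


def normalise_text(text: str) -> str:
    """Strip HTML tags, decode entities, and fold accented characters to ASCII."""
    text = html.unescape(_TAG_RE.sub(" ", text))
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_text.split())  # collapse runs of whitespace


def to_unix_timestamp(iso_string: str) -> int:
    """Convert an ISO 8601 string to a Unix timestamp in seconds."""
    parsed = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

For example, `parse_list_metafield("['Red', 'Blue']")` falls through JSON parsing (single quotes are invalid JSON) and succeeds on the Python-literal path.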
Designing the Sort Order Structure
The collection-specific sort order was one of our trickiest design decisions. We needed each product to store its position in every collection it belonged to. At query time, we then needed to dynamically sort by the correct position field based on the collection being viewed.
When we were implementing this, Typesense v26.0 had recently introduced support for indexing nested properties. We went through multiple iterations to ensure the nested structure was correctly indexed and sortable. We had to get feedback from the Typesense team during the process, since the documentation wasn’t very detailed at that time.
We ultimately chose to store sort orders as an object with collection IDs as keys and positions as values. This structure is more verbose than an array, but it lets Typesense sort by a specific collection's order through a dynamic field reference. Conceptually, it looks like:
sort_order: {
  "123456789": 1,
  "987654321": 14
}
This structure lets Typesense sort on a field reference derived from the current collection ID. So, when a user views the Madhubani collection, results are sorted by the integer stored under that collection's ID.
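As an illustration of that dynamic field reference, the search parameters for a collection page might be assembled like this. The function name and the `collections` field are assumptions; `sort_order.<id>` uses Typesense's dotted path syntax for nested fields, and whether a given nested path is sortable depends on how the schema declares it.

```python
def collection_search_params(collection_id: str, page: int = 1, per_page: int = 24) -> dict:
    """Build Typesense search parameters for one collection's default order."""
    if not collection_id.isdigit():
        # Guard against injecting arbitrary text into filter_by / sort_by.
        raise ValueError(f"expected a numeric collection ID, got {collection_id!r}")
    return {
        "q": "*",  # match everything; the filter narrows to the collection
        "filter_by": f"collections:={collection_id}",
        "sort_by": f"sort_order.{collection_id}:asc",
        "page": page,
        "per_page": per_page,
    }
```

Viewing the collection with ID 123456789 would therefore send `sort_by=sort_order.123456789:asc`.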
Part 3: Typesense Schema Design: Thinking About Query Patterns

When designing the Typesense schema, we worked backwards from how users would interact with search and filters:
- Facetable fields are those users filter by. We made artform, artist, vendor, colors, themes, regions, and price facetable. Typesense returns facet counts alongside each query's results, so showing "Madhubani (234)" next to a filter option is effectively instant.
- Sortable fields are those users sort by. We enabled sorting on title (A-Z), price (low-high), publish date (newest), and the sort order object (collection default).
- Indexed fields are those Typesense searches through. We index title, description, artist, and artform for full-text search. When someone searches "blue peacock painting", Typesense looks through these fields.
- Non-indexed fields are returned in results but are not searchable. Image URLs and dimensions fall here; searching by image URL makes no sense.
- Optional fields handle missing data gracefully. Not every product has an artist name or color palette assigned, so these fields are marked optional.
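To make these categories concrete, here is a hedged sketch of what such a schema could look like as a Python dict. Field names such as `published_at`, `image_url`, and `dimensions` are assumptions, and the exact declaration needed to make the nested `sort_order` values sortable may vary between Typesense versions, so treat this as a starting point rather than our exact schema.

```python
import json

products_schema = {
    "name": "products",
    "enable_nested_fields": True,  # needed for the sort_order object
    "fields": [
        # Searchable text; string fields need sort=True to be sortable.
        {"name": "title", "type": "string", "sort": True},
        {"name": "description", "type": "string", "optional": True},
        # Facetable attributes; optional because not every product has them.
        {"name": "artform", "type": "string", "facet": True, "optional": True},
        {"name": "artist", "type": "string", "facet": True, "optional": True},
        {"name": "vendor", "type": "string", "facet": True, "optional": True},
        {"name": "colors", "type": "string[]", "facet": True, "optional": True},
        {"name": "themes", "type": "string[]", "facet": True, "optional": True},
        {"name": "regions", "type": "string[]", "facet": True, "optional": True},
        # Numeric fields are sortable by default.
        {"name": "price", "type": "float", "facet": True},
        {"name": "published_at", "type": "int64"},
        # Collection membership and per-collection positions.
        {"name": "collections", "type": "string[]", "facet": True},
        {"name": "sort_order", "type": "object", "optional": True},
        # Returned with results but never searched; index=False requires optional.
        {"name": "image_url", "type": "string", "index": False, "optional": True},
        {"name": "dimensions", "type": "string", "index": False, "optional": True},
    ],
}


def field_names(schema: dict, flag: str) -> list:
    """List the names of fields that have the given boolean flag set to True."""
    return [f["name"] for f in schema["fields"] if f.get(flag) is True]


schema_json = json.dumps(products_schema)  # the body a create-collection call would send
```

`field_names(products_schema, "facet")` gives a quick way to check that the facet list matches the filters shown in the sidebar.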
Multiple Indices for Different Content Types
Rather than cramming everything into one index, we created separate indices for products, collections, articles, and pages. Each has its own schema optimised for that content type.

This separation matters because:
- Products have prices and inventory; articles don't
- Collections have product counts; pages don't
- Search relevance weights differ (title matters more for articles than products)
On the search page, we query all four indices simultaneously and show results grouped by type.
Note: Later, based on user behaviour data and product decisions, we decided to drop articles and pages from our search results.
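Querying several indices at once maps onto Typesense's multi-search endpoint, which takes a list of searches and returns results in the same order. The sketch below only builds the request body and regroups a response; the index names match the four listed above, while the `query_by` field lists and function names are assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical per-index search fields.
QUERY_BY: Dict[str, str] = {
    "products": "title,artist,artform,description",
    "collections": "title,description",
    "articles": "title,body",
    "pages": "title,body",
}


def build_multi_search(query: str, per_page: int = 5) -> dict:
    """Build a multi-search request body with one search per index."""
    return {
        "searches": [
            {"collection": name, "q": query, "query_by": fields, "per_page": per_page}
            for name, fields in QUERY_BY.items()
        ]
    }


def group_results(names: Sequence[str], response: dict) -> Dict[str, List[dict]]:
    """Map each index name to its documents, relying on response ordering."""
    grouped: Dict[str, List[dict]] = {}
    for name, result in zip(names, response.get("results", [])):
        grouped[name] = [hit["document"] for hit in result.get("hits", [])]
    return grouped
```

Dropping articles and pages from search, as we later did, is then a matter of removing two entries from `QUERY_BY`.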
Part 4: The Sync Pipeline Workflow

Separation of Concerns
The pipeline is split into distinct steps that can run independently:
Step 1 - Download from Shopify: Fetches all data and stores it as individual JSON files (one per product, one per collection). This creates a local snapshot of the Shopify catalog.
Step 2 - Compile to JSONL: Combines individual JSON files into bulk-import-friendly JSONL format (one JSON object per line). This step can be skipped if the JSON files haven't changed.
Step 3 - Import to Typesense: Pushes the JSONL files to Typesense using upsert semantics (insert if new, update if exists). This is idempotent; running it twice produces the same result.
This separation means we can re-import to Typesense without re-downloading from Shopify (useful if we change the schema), or we can download new data without importing it (useful for inspection).
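Step 2 is simple enough to sketch in a few lines. The function name and directory layout here are assumptions; the idea is just to read every per-document JSON file and write one compact JSON object per line.

```python
import json
from pathlib import Path


def compile_jsonl(source_dir: Path, output_file: Path) -> int:
    """Combine every *.json file in source_dir into one JSONL file.

    Returns the number of documents written.
    """
    count = 0
    with output_file.open("w", encoding="utf-8") as out:
        # Sorted order keeps the output stable between runs.
        for path in sorted(source_dir.glob("*.json")):
            document = json.loads(path.read_text(encoding="utf-8"))
            # json.dumps without indent keeps each document on a single line.
            out.write(json.dumps(document, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Because the glob only matches `*.json`, the `.jsonl` output can live in the same directory without being picked up on the next run.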
Chunked Imports for Large Datasets
Typesense can accept bulk imports, but sending 10,000 products in one request is not ideal. We chunk the import into batches of 5,000 documents. Each chunk is imported separately, and the script tracks successes and failures per chunk.
If a batch fails, we log the errors and continue with the next batch. This partial-failure tolerance means a few malformed documents don't stop the entire import.
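The chunking and partial-failure handling could be sketched as below. `import_batch` is a hypothetical callable that sends one batch and returns one result dict per document, in the `{"success": ...}` shape that Typesense's import endpoint reports; with the official Python client it would presumably wrap an upsert import call, but that wiring is left out here.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_in_chunks(
    documents: Sequence[dict],
    import_batch: Callable[[Sequence[dict]], List[dict]],
    chunk_size: int = 5000,
) -> dict:
    """Import documents batch by batch, recording failures without stopping."""
    summary = {"succeeded": 0, "failed": 0, "errors": []}
    for index, batch in enumerate(chunked(documents, chunk_size)):
        try:
            results = import_batch(batch)
        except Exception as exc:  # whole batch failed, e.g. a network error
            summary["failed"] += len(batch)
            summary["errors"].append(f"chunk {index}: {exc}")
            continue
        for result in results:
            if result.get("success"):
                summary["succeeded"] += 1
            else:
                summary["failed"] += 1
                summary["errors"].append(f"chunk {index}: {result.get('error')}")
    return summary
```

The returned summary is what gets logged at the end of a run, so a handful of malformed documents shows up as errors without aborting the import.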
Upsert for Idempotency
We use upsert mode for all imports. If a product already exists in Typesense (matched by ID), it's updated. If it's new, it's inserted.
Part 5: Replacing Shopify's UI
For the frontend layer, we leveraged InstantSearch.js for collection pages and Autocomplete.js for the global search experience.
Although these libraries are developed by Algolia, Typesense provides official adapters that translate their widget APIs into Typesense-compatible search queries.
Since we had already invested in custom UI components within our Shopify theme, we did not adopt the default widget styling. Instead, we refactored our existing HTML and Liquid markup into JavaScript templates compatible with InstantSearch’s rendering model.
Search Configuration
The search client is configured with:
- Multiple Typesense nodes for high availability (if one node is slow, it falls back to others)
- A nearest node for geographic optimisation (routes to the closest node)
- A read-only API key that's safe to expose in client-side JavaScript
- Typo tolerance, so "Madhubanni" finds "Madhubani"
- Field weights that prioritise title matches over description matches
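In the theme this configuration lives in JavaScript, but its shape is easiest to show as plain data. The sketch below mirrors it in Python to match the pipeline examples; the hostnames, weights, and timeout are placeholders, while `query_by_weights` and `num_typos` are standard Typesense search parameters.

```python
def build_client_config(search_only_key: str) -> dict:
    """Client settings: a nearest node plus fallback nodes and a read-only key."""
    def node(host: str) -> dict:
        return {"host": host, "port": 443, "protocol": "https"}

    return {
        "api_key": search_only_key,  # search-only key, safe to expose client-side
        "nearest_node": node("example.a1.typesense.net"),  # placeholder hostname
        "nodes": [
            node("example-1.a1.typesense.net"),
            node("example-2.a1.typesense.net"),
            node("example-3.a1.typesense.net"),
        ],
        "connection_timeout_seconds": 2,
    }


SEARCH_PARAMETERS = {
    "query_by": "title,artist,artform,description",
    # One weight per query_by field; title matches count the most.
    "query_by_weights": "4,3,2,1",
    "num_typos": 2,  # tolerate up to two typos, so "Madhubanni" finds "Madhubani"
}
```

The key passed in should be scoped to search operations only, since anything shipped to the browser is public.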
Collection Pages: Filtering Without Shopify
The collection page is where we most dramatically diverge from Shopify's native behaviour. The Liquid template renders an empty container with placeholder skeletons. Then JavaScript takes over:

- It reads the current collection handle from the page
- It queries Typesense for products in the matching collection.
- It applies any active filters (from URL parameters or user clicks)
- It sorts by the collection-specific sort order by default
- It renders product cards into the container
- It populates the sidebar with filter options and counts
The entire filtering and sorting system is powered by Typesense + InstantSearch. Shopify's native collection filtering is never invoked.
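Under the hood, the active filters end up as a single Typesense `filter_by` expression. InstantSearch and the adapter generate this for us, but a hand-rolled equivalent, shown in Python for illustration with assumed field names, makes the logic visible:

```python
from typing import Dict, List, Optional, Tuple


def build_filter_by(
    collection_id: str,
    facets: Dict[str, List[str]],
    price_range: Optional[Tuple[int, int]] = None,
) -> str:
    """Combine collection membership, facet selections, and a price range."""
    clauses = [f"collections:={collection_id}"]
    for field, values in facets.items():
        if not values:
            continue  # no selection for this facet
        # Backticks let values contain commas or spaces; strip stray backticks.
        escaped = ",".join(f"`{value.replace('`', '')}`" for value in values)
        clauses.append(f"{field}:=[{escaped}]")
    if price_range is not None:
        low, high = price_range
        clauses.append(f"price:[{low}..{high}]")
    # && means every clause must match.
    return " && ".join(clauses)
```

Values within one facet are OR-ed (any selected artform matches), while different facets are AND-ed together, which is the behaviour shoppers expect from a filter sidebar.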
Skeleton Loading for Perceived Performance
While Typesense is fast (typically under 50ms), there's still a moment between page load and results appearing. We show grey placeholder boxes shaped like product cards during this time. This skeleton loading pattern makes the page feel faster than showing a blank space or spinner.
URL State Synchronisation
InstantSearch.js can sync filter state to the URL. This is good for SEO. Plus, when a user filters by "Madhubani" and price range "₹5000-₹10000", the URL updates to reflect this. They can share this filtered view with others, or bookmark it for later. Refreshing the page restores their filters.
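The essential property here is a lossless round trip between filter state and the query string. InstantSearch.js handles this through its routing option with its own URL format; the Python sketch below uses a simplified, hypothetical format purely to illustrate the round trip.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode


def state_to_query(state: Dict[str, List[str]]) -> str:
    """Serialise filter state into a query string, one pair per selected value."""
    pairs = [(field, value) for field in sorted(state) for value in state[field]]
    return urlencode(pairs)


def query_to_state(query: str) -> Dict[str, List[str]]:
    """Parse a query string back into filter state."""
    return parse_qs(query)
```

Restoring filters on refresh is then just `query_to_state` applied to the current URL's query string.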
Potential Improvements
While our current implementation serves us well, there are several enhancements we're considering for future iterations:
- Real-time sync via webhooks - For now, since we’re adding at most ~100 new SKUs a month, a scheduled job does the trick. Eventually, we’ll want to replace batch jobs with Shopify webhooks for near-instant updates.
- Multi-language support, synonyms, and curated results - Support local languages, map spelling variations, boost seasonal products, and pin results.
- Personalised ranking - Use browsing history to surface relevant artforms per user
- Image-based search - Visual similarity search using vector embeddings.
Conclusion
In hindsight, replacing Shopify’s native search and filtering with a custom Typesense integration instead of installing an app was the right decision for us.
In exchange for a few focused weeks of development, we gained materially faster and smarter search, reliable faceting at scale, and precise control over indexing, ranking, and merchandising logic.
The complexity lay in the details of preserving collection sort orders, merging multi-collection products, standardising metafield data, and integrating smoothly with an existing Shopify theme.
Shopify remains exceptionally good at what it is designed to do. It abstracts payments, infrastructure, security, and core commerce primitives without requiring self-hosting or custom platform engineering. For any Shopify store hitting the 5,000-product collection limit or needing superior discovery capabilities, this blueprint offers a pragmatic solution. It provides a gradual path toward a decoupled frontend architecture without prematurely abandoning the robust core of the underlying stack.
