Philosophy
The ideas behind BigDataSEO.com and RIBA.
Architecture Over Tactics
Most SEO advice focuses on tactics: optimize your title tags, build more backlinks, write longer content. These tactics work — at small scale. But they become irrelevant when the primary constraint on your organic performance is that search engines can't physically reach most of your pages.
At large scale, architecture is the tactic. The structure of your site — how pages are organized, linked, and made discoverable — determines whether your content can even participate in search. Everything else is downstream.
Mathematics, Not Opinions
RIBA is built on a mathematical foundation: the square-root bucketing formula. This isn't a heuristic or a best practice — it's a provable relationship between dataset size, bucket count, and tree depth that minimizes crawl distance while maximizing browse page value.
When someone asks "how many browse pages should I create?", the answer isn't "it depends" — it's √N. When they ask "what's the maximum depth?", the answer is log_k(N), where k is the bucket size. These are computable, testable answers.
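A minimal sketch of these two formulas in Python. The function names and the 1,000,000-item example are illustrative, not part of the RIBA specification; integer arithmetic is used to avoid floating-point edge cases.

```python
import math

def browse_page_count(n_items: int) -> int:
    """Browse pages to create for n_items leaf pages: ceil(sqrt(N))."""
    root = math.isqrt(n_items)
    return root if root * root == n_items else root + 1

def max_depth(n_items: int, bucket_size: int) -> int:
    """Maximum tree depth, ceil(log_k(N)), where each browse page
    links to at most bucket_size children. Computed by repeated
    multiplication so the result is exact for any integer inputs."""
    depth, reach = 0, 1
    while reach < n_items:
        reach *= bucket_size
        depth += 1
    return depth

# Hypothetical catalog: 1,000,000 item pages, 100 links per browse page.
print(browse_page_count(1_000_000))   # 1000 browse pages
print(max_depth(1_000_000, 100))      # depth 3
```

Any page is then reachable from the root in at most log_k(N) hops, which is the crawl-distance bound the formula is minimizing.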
Crawl Efficiency as a First Principle
Search engines are bounded systems. They have finite crawl budgets, finite processing capacity, and finite index space. Your site competes for these finite resources against every other site on the internet.
Crawl efficiency — the ratio of useful crawl events to total crawl events — is therefore the most important metric for large-scale SEO. A site where 90% of Googlebot's requests produce unique, indexable content will outperform a site where only 30% do, all else being equal.
Every architectural decision should be evaluated through this lens: does this make my site more or less crawl-efficient?
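As an illustration, crawl efficiency is straightforward to compute from server-log tallies of crawler requests. The event labels and counts below are hypothetical, not a RIBA-defined taxonomy:

```python
from collections import Counter

# Hypothetical one-day tally of Googlebot requests, grouped by outcome.
crawl_events = Counter({
    "unique_indexable": 9_000,  # useful: unique, indexable content served
    "duplicate": 600,           # wasted: parameter/session duplicates
    "redirect": 250,            # wasted: each hop consumes a crawl event
    "error_4xx_5xx": 150,       # wasted: dead ends
})

useful = crawl_events["unique_indexable"]
total = sum(crawl_events.values())
efficiency = useful / total

print(f"crawl efficiency: {efficiency:.0%}")  # crawl efficiency: 90%
```

Tracking this ratio over time shows whether an architectural change moved the site toward or away from crawl efficiency.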
Browse Pages Are Not Doorway Pages
A common objection to RIBA is that browse pages are "doorway pages" — thin intermediary pages created solely for search engines. This misunderstanding comes from conflating two very different things.
Doorway pages are low-quality pages with no unique content, created to rank for specific queries and funnel users to a different destination. RIBA browse pages are genuine navigation aids with unique, structured content: curated item listings, facet summaries, aggregate statistics, and rich metadata. They serve users and crawlers equally.
The test is simple: if you removed search engines from the equation entirely, would the page still be useful for human navigation? RIBA browse pages pass this test.
Open Specification
RIBA is published as an open specification, not a proprietary methodology. Anyone can implement it, audit against it, extend it, or critique it. The RIBA Registry tracks public implementations, both verified and self-reported.
This openness is deliberate. Crawl architecture patterns are too important to be locked behind consulting retainers or enterprise software licenses. The specification should be available to every engineer building a large-scale website.
Scale Is the Differentiator
BigDataSEO.com exists because scale creates qualitatively different problems, not just quantitatively bigger ones. A site with 100 pages and a site with 10 million pages don't have the same problems at different magnitudes — they have fundamentally different problems that require fundamentally different solutions.
This site is built for people working on the latter. If your site has fewer than 10,000 pages, conventional SEO wisdom will serve you well. If it has more, welcome — you're in the right place.