Official Google answers to common SEO questions

Question:

What is the best way to deal with very large sitemap.xml files (e.g. more than 1,000,000 pages)?

Google's Matt Cutts answers:

(Embedded video: Official Google Answer)

Summary of the answer:

The summary of the answer is available to paid members.
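
As general background (this is not the paid summary referenced above): the Sitemaps protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, so a site with more than 1,000,000 pages normally splits its URLs across many sitemap files and ties them together with a sitemap index file. The Python sketch below illustrates that splitting; the example.com domain, file names, and generated URL list are hypothetical and not taken from this page.

# Minimal sketch: split a large URL list into sitemap files of at most
# 50,000 URLs each and write a sitemap index that references them.
# Domain, file names, and URLs are hypothetical placeholders.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL cap in the Sitemaps protocol

def write_sitemaps(urls, base="https://www.example.com"):
    """Write sitemap-1.xml, sitemap-2.xml, ... plus sitemap-index.xml."""
    sitemap_names = []
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        chunk = urls[start:start + MAX_URLS_PER_SITEMAP]
        name = f"sitemap-{start // MAX_URLS_PER_SITEMAP + 1}.xml"
        sitemap_names.append(name)
        with open(name, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in chunk:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")

    # The index file is what gets submitted or referenced from robots.txt.
    with open("sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for name in sitemap_names:
            f.write(f"  <sitemap><loc>{base}/{name}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")

if __name__ == "__main__":
    # Hypothetical run: 120,000 pages -> 3 sitemap files plus 1 index;
    # the same code scales to sites with well over 1,000,000 pages.
    urls = [f"https://www.example.com/page/{n}" for n in range(120_000)]
    write_sitemaps(urls)

The resulting index file (or the individual sitemap files) can then be submitted in Webmaster Tools or referenced from robots.txt with a Sitemap: directive.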

Related questions:

The Sitemap.xml file states there are 10,000 URLs, but only 1,500 have been indexed. After numerous crawls it does not appear Google is going to index these additional detail pages. What can I do to get Google to index my unique and current detail pages? (556 views)
If Google crawls 1,000 pages/day, Googlebot crawling many duplicate-content pages may slow down indexing of a large site. In that scenario, do you recommend blocking duplicates using robots.txt, or is using meta robots noindex,nofollow a better alternative? (525 views; see the robots.txt sketch after this list)
Should a "Sale Page" be listed in a robots.txt file to avoid duplicate content? (477 views)
Can I use robots.txt to optimize Googlebot's crawl? (381 views)
What are Google's plans for indexing the deep web? (467 views)
A question about unintended duplicate content: if an online shop can be reached through several TLDs (like .de, .at, .ch) and the only difference is the currency (and, necessarily, the checkout process), does Google consider this duplicate content? (415 views)
Can Google provide a way to mark a section of our pages as less important for Google to index or snippet? (516 views)
If we were to syndicate my written content (entire articles) to multiple domains, would we be able to use the imminent cross-domain <link rel="canonical"> tag to confirm which site we would like Google to index for a given piece of content? (430 views)
How can Googlebot crawl and index pages that don't have any links to them on my website? (511 views)
We still have old content in the index. We block it via robots.txt, use 404s, and delete it via Webmaster Tools, but Google still keeps it. What can we do to quickly remove this content from the index? (464 views)
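
As a general illustration for the robots.txt questions above (not a summary of any of the linked answers): robots.txt keeps Googlebot from crawling a URL at all, while a meta robots noindex tag must be crawlable to be seen and then keeps the page out of the index. Below is a minimal sketch using Python's standard urllib.robotparser; the Disallow rules and URLs are hypothetical.

# Minimal sketch: check which URLs a given robots.txt would block for Googlebot.
# The rules and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /duplicate-listings/

Sitemap: https://www.example.com/sitemap-index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://www.example.com/product/123",
    "https://www.example.com/duplicate-listings/product-123",
]:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)

Because a URL disallowed in robots.txt is never fetched, Googlebot cannot see a noindex tag on that page; which mechanism fits depends on whether the goal is to save crawl capacity or to keep the URLs out of the index entirely.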