Site Crawl, Day 1: Where Do You Start?

Posted by Dr-Pete

When you're faced with the many thousands of potential issues a large site can have, where do you start? This is the question we tried to tackle when we rebuilt Site Crawl. The answer depends almost entirely on your site and can require deep knowledge of its history and goals, but I'd like to outline a process that can help you cut through the noise and get started.

Simplistic can be dangerous

Previously, we at Moz tried to label every issue as either high, medium, or low priority. This simplistic approach can be appealing, even comforting, and you may be wondering why we moved away from it. This was a very conscious decision, and it boils down to a couple of problems.

First, prioritization depends a lot on your intent. Misinterpreting your intent can lead to bad advice that ranges from confusing to outright catastrophic. Let's say, for example, that we hired a brand-new SEO at Moz and they saw the following issue count pop up:

Almost 35,000 NOINDEX tags?! WHAT ABOUT THE CHILDREN?!!

If that new SEO then rushed to remove those tags, they'd be doing a lot of damage, not realizing that the vast majority of those directives are intentional. We can make our systems smarter, but they can't read your mind, so we want to be cautious about false alarms.

Second, bucketing issues by priority doesn't do much to help you understand the nature of those problems or how to go about fixing them. We now categorize Site Crawl issues into one of five descriptive types:

  • Critical Crawler Issues
  • Crawler Warnings
  • Redirect Issues
  • Metadata Issues
  • Content Issues

Categorizing by type allows you to be more tactical. The issues in our new "Redirect" category, for example, are going to have much more in common, which means they potentially have common fixes. Ultimately, helping you find problems is just step one. We want to do a better job at helping you fix them.

1. Start with Critical Crawler Issues

That's not to say everything is subjective. Some problems block crawlers (not just ours, but search engines) from getting to your pages at all. We've grouped these "Critical Crawler Issues" into our first category, and they currently include 5XX errors, 4XX errors, and redirects to 4XX. If you have a sudden uptick in 5XX errors, you need to know, and almost no one intentionally redirects to a 404.

You'll see Critical Crawler Issues highlighted throughout the Site Crawl interface:

Look for the red alert icon to spot critical issues quickly. Address these problems first. If a page can't be crawled, then every other crawler issue is moot.
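If you want to spot-check a handful of URLs for these same failure modes outside of any crawler, a short script will do it. This is a minimal sketch, not part of Site Crawl; the URL list is a placeholder and it assumes the Python `requests` library is available:

```python
import requests

# Placeholder list of URLs to spot-check; substitute your own.
urls = [
    "https://example.com/",
    "https://example.com/old-page",
]

for url in urls:
    try:
        # Follow redirects so we can see where a chain finally lands.
        response = requests.get(url, allow_redirects=True, timeout=10)
    except requests.RequestException as exc:
        print(f"{url}: request failed ({exc})")
        continue

    status = response.status_code
    if status >= 500:
        print(f"{url}: 5XX error ({status})")
    elif response.history and status >= 400:
        # The URL redirected, but the final destination is broken.
        print(f"{url}: redirects to a {status}")
    elif status >= 400:
        print(f"{url}: 4XX error ({status})")
```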

2. Balance issues with prevalence

When it comes to solving your technical SEO issues, we also have to balance severity with quantity. Knowing nothing else about your site, I would say that a 404 error is probably worth addressing before duplicate content — but what if you have eleven 404s and 17,843 duplicate pages? Your priorities suddenly look very different.

At the bottom of the Site Crawl home, check out "Moz Recommends Fixing":

We've already done some of the math for you, weighting urgency by how prevalent the issue is. This does require some assumptions about prioritization, but if your time is limited, we hope it at least gives you a quick starting point to solve a couple of critical issues.
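One way to think about that math yourself: give each issue type a rough severity weight and multiply it by the number of affected pages. The weights and counts below are illustrative guesses, not Moz's actual scoring:

```python
# Illustrative severity weights (not Moz's actual formula).
SEVERITY = {
    "5xx_error": 10,
    "4xx_error": 8,
    "redirect_to_4xx": 8,
    "redirect_chain": 4,
    "duplicate_content": 3,
    "missing_meta_description": 2,
}

# Hypothetical issue counts from a crawl.
counts = {
    "4xx_error": 11,
    "duplicate_content": 17843,
    "missing_meta_description": 916,
}

# Rank issue types by severity weighted by prevalence.
scores = {issue: SEVERITY[issue] * count for issue, count in counts.items()}
for issue, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{issue}: {score}")
```

With numbers like these, the sheer volume of duplicate content outweighs a handful of 404s, which is exactly the kind of trade-off the "Moz Recommends Fixing" box is trying to surface.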

3. Solve multi-page issues

There's another advantage to tackling issues with high counts. In many cases, you might be able to solve issues on hundreds (or even thousands) of pages with a single fix. This is where a more tactical approach can save you a lot of time and money.

Let's say, for example, that I want to dig into my 916 pages on Moz.com missing meta descriptions. I immediately notice that some of these pages are blog post categories. So, I filter by URL:

I can quickly see that these pages account for 392 of my missing descriptions — a whopping 43% of them. If I'm concerned about this problem, then it's likely that I could solve it with a fairly simple CMS page, wiping out hundreds of issues with a few lines of code.
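What that "simple CMS page" looks like depends entirely on your platform, but the shape of the fix is the same everywhere: build the description from data the category template already has. A hypothetical sketch (the function and field names are made up for illustration):

```python
def category_meta_description(category_name, post_count):
    """Build a meta description from data the category template already has."""
    return (
        f"Browse {post_count} posts about {category_name} on the Moz Blog, "
        f"with tips, research, and how-tos from the SEO community."
    )

# One template change like this covers every category page at once.
print(category_meta_description("Technical SEO", 120))
```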

In the near future, we hope to do some of this analysis for you, but if filtering isn't doing the job, you can also export any list of issues to CSV. Then, pivot and filter to your heart's content.
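If you take the CSV route, a couple of lines of pandas can do the same grouping as the in-app filter. A sketch assuming a hypothetical export file with `URL` and `Issue Type` columns (your actual column names may differ):

```python
import pandas as pd

# Hypothetical export file; adjust the path and column names to your CSV.
issues = pd.read_csv("site_crawl_issues.csv")

# How many pages are affected by each issue type?
print(issues["Issue Type"].value_counts())

# Filter to one issue and look for URL patterns worth a single fix.
missing = issues[issues["Issue Type"] == "Missing Meta Description"]
print(missing[missing["URL"].str.contains("/blog/category/")].shape[0])
```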

4. Dive into pages by PA & crawl depth

If you can't easily spot clear patterns, or if you've solved some of those big issues, what next? Fixing thousands of problems one URL at a time is only worthwhile if you know those URLs are important.

Fortunately, you can now sort by Page Authority (PA) and Crawl Depth in Site Crawl. PA is our own internal metric of ranking ability (primarily powered by link equity), and Crawl Depth is the distance of a page from the homepage:

Here, I can see that there's a redirect chain in one of our MozBar URLs, which is a very high-authority page. That's probably one worth fixing, even if it isn't part of an obvious, larger group.
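To confirm a chain like that, and see exactly how many hops it takes, you can walk the redirects yourself. A minimal sketch using `requests`; the URL is a placeholder, not the actual MozBar page:

```python
import requests

def redirect_chain(url):
    """Return the hops a URL passes through and the final status code."""
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = [r.url for r in response.history] + [response.url]
    return hops, response.status_code

# Placeholder URL; substitute the page flagged in Site Crawl.
hops, final_status = redirect_chain("https://example.com/some-tool-page")
print(f"{len(hops) - 1} redirect(s), final status {final_status}")
for hop in hops:
    print("  ->", hop)
```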

5. Watch for spikes in new issues

Finally, as time goes on, you'll also want to be alert to new issues, especially if they appear in large numbers. This could indicate a sudden and potentially damaging change. Site Crawl now makes tracking new issues easy, including alert icons, graphs, and a quick summary of new issues by category:

Any crawl is going to uncover some new pages (the content machine never rests), but if you're suddenly seeing hundreds of new issues of a single type, it's important to dig in quickly and make sure nothing's wrong. In a perfect world, the SEO team would always know what changes other people and teams made to the site, but we all know it's not a perfect world.
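If you also keep your own crawl history, a rough version of that alerting is easy to script: compare the latest per-issue counts to the previous crawl and flag big jumps. A sketch with hypothetical numbers and an arbitrary threshold:

```python
# Hypothetical issue counts from two consecutive crawls.
previous = {"4xx_error": 12, "redirect_chain": 40, "missing_meta_description": 916}
latest = {"4xx_error": 240, "redirect_chain": 41, "missing_meta_description": 920}

SPIKE_THRESHOLD = 50  # arbitrary: flag any issue type that grew by 50+ pages

for issue, count in latest.items():
    delta = count - previous.get(issue, 0)
    if delta >= SPIKE_THRESHOLD:
        print(f"Spike in {issue}: +{delta} new issues since the last crawl")
```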

I hope this gives you at least a few ideas for how to quickly dive into your site's technical SEO issues. If you're an existing customer, you already have access to Moz's new Site Crawl and all of the features discussed in this post.


