
How to crawl a staging server

Crawling a staging server is a must before a website goes live. It allows you to make sure no SEO problems have cropped up before the big Go Live.
Have you ever worked in a development center? If not, here is a rundown of what it can feel like: herding cats that all walk on keyboards. My daily routine used to consist of launching redesigned, migrated and brand-new websites, and fixing old ones. None of them went live before I had checked the staging server for potential disasters.

What types of disasters can you face when you don’t crawl a staging server?

• No proper tagging on the website
• Website can’t be crawled
• 404 errors
• Content problems
• Broken links, images, etc.

What should you look for when crawling?

This list is not exhaustive, but I would focus on:

• Image ALTs
• Content hierarchy (are any H1s missing?)
• ALL THE THINGS!…that can be broken
• Testing the redirection plan
• Meta titles (missing, too long, too short)
• Meta descriptions (missing, too long, too short)
• Anything that can’t be crawled for some reason
• Funky redirects
• Hreflang (if your site is in multiple languages)
• Canonicals
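Several of the on-page items above can be spot-checked on a single staging page with Python's standard library. The sketch below is a minimal illustration, not how Screaming Frog works internally: it flags a missing or badly sized title and meta description, missing H1s, images without ALT text and a missing canonical. The length limits in `TITLE_RANGE` and `DESCRIPTION_RANGE` are assumed rules of thumb, not official limits; adjust them to your own guidelines.

```python
from html.parser import HTMLParser

# Assumed rule-of-thumb character limits; not official search engine limits.
TITLE_RANGE = (30, 60)
DESCRIPTION_RANGE = (70, 160)


class PageAudit(HTMLParser):
    """Collects the checklist elements from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.meta_description = None
        self.h1_count = 0
        self.images_missing_alt = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None, hence the `or ""` guards
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.meta_description = a.get("content") or ""
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not (a.get("alt") or "").strip():
            self.images_missing_alt.append(a.get("src") or "")
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_page(html: str) -> list[str]:
    """Return a list of human-readable issues found on the page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    issues = []
    title = parser.title.strip()
    if not title:
        issues.append("missing title")
    elif not TITLE_RANGE[0] <= len(title) <= TITLE_RANGE[1]:
        issues.append(f"title length {len(title)}")
    description = (parser.meta_description or "").strip()
    if not description:
        issues.append("missing meta description")
    elif not DESCRIPTION_RANGE[0] <= len(description) <= DESCRIPTION_RANGE[1]:
        issues.append(f"meta description length {len(description)}")
    if parser.h1_count == 0:
        issues.append("missing H1")
    for src in parser.images_missing_alt:
        issues.append(f"image without ALT: {src}")
    if parser.canonical is None:
        issues.append("missing canonical")
    return issues
```

A real crawler runs checks like these on every URL it discovers; this sketch only covers one page you have already downloaded.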
[Image: “None shall pass!” (visual by Kaise on DeviantArt)]

Normally, a staging server is protected. There are security measures in place to keep people (yes, that includes you) from crawling it.

How To Crawl A Staging Server

There are multiple ways to get over the security hump and crawl a staging server. Whichever one you use, I recommend talking to the appropriate powers that be within the company first and letting them know that you need to crawl the staging server.

1. Basic Authentication – i.e. password required

Screaming Frog provides a nice option: you enter the login details, and it starts crawling away. Other tools that offer the same option will work too.
Go to the “Spider Configuration” option in Screaming Frog and select “Request Authentication”:

[Screenshot: Screaming Frog “Request Authentication” option]
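If you want to confirm that the credentials work before launching a full crawl, a quick check from Python is enough. This is a minimal sketch using only the standard library: HTTP Basic authentication sends `username:password`, base64-encoded, in the `Authorization` header, and here the header is sent up front instead of waiting for the server's 401 challenge. The URL and credentials in the usage line are placeholders.

```python
import base64
import urllib.error
import urllib.request


def build_basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that sends HTTP Basic credentials up front."""
    # Basic auth is "username:password", base64-encoded, after the word "Basic".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_status(url: str, username: str, password: str, timeout: float = 10.0) -> int:
    """Request one staging URL and return the HTTP status code it answers with."""
    request = build_basic_auth_request(url, username, password)
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 401 means the credentials were rejected; 404, 500, etc. are page problems.
        return err.code
```

Usage (placeholder values): `fetch_status("https://staging.example.com/", "user", "pass")`. A 401 here means your crawler will not get in with those credentials either.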

2. VPN Access

If your staging server is behind a firewall, you can request VPN access. Once you have it, crawling the staging server should be a breeze…if your crawling tools run on your own network, so that their traffic goes through the VPN. If they don’t (a cloud-based crawler, for example), it’s going to be difficult.

3. Whitelist An IP Address

If your staging platform redirects you to the right server after login, a crawler’s login option won’t be useful to you: basic authentication fails because of the redirect.
What can you do? Well, the heading gives you a massive hint: get someone to whitelist your IP address for the staging server.
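To see what you are asking for, here is a minimal sketch of what an IP whitelist check amounts to on the server side, using Python's standard `ipaddress` module. The networks in `ALLOWED_NETWORKS` are hypothetical, taken from the reserved documentation ranges; in practice the check usually lives in the web server or firewall configuration, not in application code. The IP you hand over must be the one your crawler actually connects from.

```python
import ipaddress

# Hypothetical allowlist for the staging server (documentation IP ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # stands in for an office network
    ipaddress.ip_network("198.51.100.7/32"),  # stands in for a single crawler machine
]


def is_whitelisted(client_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 are mixed.
    return any(address in network for network in networks)
```

If your crawler reaches the server over IPv6 and only an IPv4 address was whitelisted, the check fails, which is worth ruling out when access is still denied.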

4. Create A Custom User Agent

Googlebot, Slurp and all the other search engine bots are not the only ones that get to have fun crawling! Screaming Frog lets you create custom user agents, and you can ask for your own user agent to be whitelisted so that you can crawl the staging server.
Note that this Screaming Frog option is not included in the free version of the software.
To create your own user agent in Screaming Frog, go to “User-Agent Configuration” and enter the user agent you need:

[Screenshot: Screaming Frog custom user agent configuration]
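Before handing the agreed user agent to Screaming Frog, you can check that the staging server actually lets it through. The sketch below is a minimal illustration with Python's standard library; `CUSTOM_UA` is a hypothetical string, so use whatever value you agree on with the people who maintain the whitelist. The server-side function only illustrates the kind of exact-match rule an administrator might configure; it is an assumption, not a description of any particular server.

```python
import urllib.error
import urllib.request

# Hypothetical user-agent string agreed with the staging server's maintainers.
CUSTOM_UA = "AcmeStagingAudit/1.0"


def build_ua_request(url: str, user_agent: str = CUSTOM_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the custom user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_with_ua(url: str, user_agent: str = CUSTOM_UA, timeout: float = 10.0) -> int:
    """Return the HTTP status code the staging server answers with."""
    try:
        with urllib.request.urlopen(build_ua_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 if the user agent is not whitelisted


# Server side (hypothetical): an exact-match user-agent whitelist.
def ua_is_whitelisted(user_agent_header: str, allowed=(CUSTOM_UA,)) -> bool:
    return user_agent_header in allowed
```

Keep in mind that anyone can send any user-agent string, so a user-agent whitelist is a convenience, not strong protection; some teams pair it with an IP whitelist.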

5. Old-School Method

If the staging server simply can’t be accessed from the outside, you may have to visit the office! This can be tedious when you are not an in-house resource; if you are, it’s expected of you.