Download SEO factors via SeoTools for Excel

This week I had the need to use SeoTools for Excel and gained a new appreciation for this free resource.

Basically I wanted to download a site's page titles and H1 headers into Excel in order to perform further analysis. First I needed to download a URL list of all the pages of the site. To achieve this I used ScrapeBox, which has a wonderful feature that scrapes all the major search engines. In ScrapeBox this is what I did:

1)      Use the command

2)      Choose to scrape all the search engines: Google, Bing and AOL

3)      Harvest the URLs, then remove duplicate URLs to get a clean list

This gave a clean list of URLs that I simply pasted into a spreadsheet before running the title function in SeoTools for Excel. I got a result like this:
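For readers who prefer scripting, the same title/H1/H2 lookup can be sketched in Python using only the standard library. This is a rough, simplified stand-in for what SeoTools' HtmlTitle/HtmlH1/HtmlH2 functions do; the sample HTML string is invented for illustration.

```python
# Collect the <title>, first <h1> and first <h2> of a page - a rough
# equivalent of SeoTools' HtmlTitle/HtmlH1/HtmlH2 lookups.
from html.parser import HTMLParser

class FactorParser(HTMLParser):
    """Records the text of the title and the first H1/H2 encountered."""
    def __init__(self):
        super().__init__()
        self.factors = {"title": "", "h1": "", "h2": ""}
        self._current = None

    def handle_starttag(self, tag, attrs):
        # Only capture a tag we track, and only its first occurrence
        if tag in self.factors and not self.factors[tag]:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.factors[self._current] += data.strip()

def extract_factors(html):
    parser = FactorParser()
    parser.feed(html)
    return parser.factors

# Invented sample page for demonstration
sample = ("<html><head><title>Bespoke Shoes</title></head>"
          "<body><h1>Hand crafted footwear</h1><h2>Related items</h2></body></html>")
print(extract_factors(sample))
# {'title': 'Bespoke Shoes', 'h1': 'Hand crafted footwear', 'h2': 'Related items'}
```

In practice you would fetch each URL from the ScrapeBox list and feed its HTML through `extract_factors` to build the same Title/H1/H2 table.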

  • Title: “Special Footwear & Orthotics | Bespoke Shoes | Orthotics Central London – Free consultation”; H1: “Beautifully HAND crafted specialist Footwear & Orthotics in Central London”
  • Title: “Accident Assessment Claims – Anthony Andrews Master Shoemaker”; H1: “Accident Assessment Claims”; H2: “Related Items”
  • Title: “Adaptions, Build-ups and Raises by Special Footwear & Orthotics, Central London W1”; H1: “Adaptations, Build-ups and Raises”; H2: “related items”
  • Title: “Uncategorized Archives – Anthony Andrews Master Shoemaker”


The complete list of SeoTools for Excel functions is here, but below are the main ones I found useful:

=GooglePageRank([URL]) – Gets the Google PageRank of a URL

=GoogleIndexCount([URL]) – Gets an approximation of the number of pages of a domain indexed by Google (the equivalent to “”)

=WhoIs(string domain) – Gets the WhoIs record for a domain

=IsDomainRegistered([DOMAIN]) – Returns “true” if a domain is registered. WhoIs servers are configured in SeoTools.config.xml

=FacebookLikes([URL]) – Gets the total number of Facebook Likes of a URL (same count as in a badge)

=AlexaReach([URL]) – Gets the Alexa Traffic Reach Rank

=ResolveIp([URL]) – Resolves the IP address of the domain in a URL

=HttpStatus([URL]) – Gets the HTTP status code and its description. Also retrieves the Location header (useful for debugging redirects)

=ResponseTime([URL]) – Returns the number of milliseconds it takes for a URL to load (cached)

=LinkCount([URL]) – Shows the number of links on a page (cached)

=HtmlTitle([URL]) – Shows the HTML title of a page (cached)

=HtmlMetaDescription([URL]) – Shows the HTML meta description of a page (cached)

=HtmlH1([URL]) – Shows the first HTML H1 element on a page (cached)

=HtmlH2([URL]) – Shows the first HTML H2 element on a page (cached)

=HtmlH3([URL]) – Shows the first HTML H3 element on a page (cached)

=XPathOnUrl([URL]; [XPATH]) – Fetches the URL and returns the result of the XPath expression

Related posts:

SEO tools for excel examples by distilled

Excel for SEO by distilled

SEO Site Audit Case Study –

Domain Statistics

The site is a 15-year-old domain with good domain authority and a Google PageRank of 5 for the homepage. The main stats are as follows:

  • The domain is hosted in the UK.
  • The domain’s age is 15 years, 2 months.
  • Domain Google PageRank: 5
  • Alexa Rank: 10355
  • Compete.com Rank: 128960
  • Compete.com Traffic: 11823
  • DMOZ Listing: Yes
  • Yahoo! Directory Listing: Yes
  • rel=”canonical” has been implemented on all pages
  • 0 pages have duplicate rel=”canonical” code
  • 0 pages have conflicting character encoding

Content and Structure Crawl Stats

I looked at the main SEO factors of page titles, meta descriptions, content, image alt text, the sitemap, robots.txt and URL structure. Below are the most important metrics, the best practices and what was found following a crawl.


Server Response Codes

Crawlability affects the way the search engines crawl and index the site. You want to see a high number of 2xx server responses, a few 301/302 redirects and even fewer error pages indicated by 4xx and 5xx codes. A crawl test produced 3885 pages with the following outcomes:

  • 2719 (70%) pages are good to crawl (their server response code is 200)
  • 9 (0.23%) pages redirect to other pages (their server response code is 301)
  • 690 (18%) pages redirect to other pages (their server response code is 302)
  • 1 page returns a “not found” error (server response code 404)
  • 1 page has a server error (server response code 500)

301 redirects pass link and page authority value, but 302s are meant for temporary redirects. We would need to investigate the high number of 302 server responses to determine whether they are being used correctly.
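The bucketing above is easy to reproduce yourself once you have a list of status codes from a crawl. The sketch below groups codes into their 2xx/3xx/4xx/5xx classes; the code list simply reconstructs the counts reported in this audit for illustration.

```python
# Summarise crawl results by HTTP status class (2xx, 3xx, 4xx, 5xx).
from collections import Counter

def summarise(status_codes):
    buckets = Counter()
    for code in status_codes:
        # Integer-divide by 100 to get the status class, e.g. 301 -> "3xx"
        buckets[f"{code // 100}xx"] += 1
    return buckets

# Illustrative list matching the audit: 2719 OK pages, 9 permanent
# redirects, 690 temporary redirects, one 404 and one 500.
codes = [200] * 2719 + [301] * 9 + [302] * 690 + [404, 500]
summary = summarise(codes)
print(summary["2xx"], summary["3xx"], summary["4xx"], summary["5xx"])
# 2719 699 1 1
```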

Load Speed

Google has stated load speed could be a factor in ranking similar sites.  The PageSpeed Score indicates how much faster a page could be. A high score indicates little room for improvement, while a lower score indicates more room for improvement.

Running the site through Google's PageSpeed tool gave an overall PageSpeed Score of 77 (out of 100), which is fast enough. However, a crawl of the site produced a warning that 2727 pages are too large (> 100 kB), which can affect the load speed of the page and site. The main improvement recommendations are to leverage browser caching, combine images into CSS sprites and specify a cache validator.

Duplicate Content

Some product descriptions have been copied from elsewhere, possibly the manufacturers’ catalogues. An example is the Carrera Kraken Mountain Bike page, where duplicate content with other websites was detected.
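A crude way to screen your own pages for copied descriptions is a text-similarity ratio; the sketch below uses Python's difflib as a simple stand-in for the duplicate-content checks a crawler performs. The product snippets are invented examples, not text from the site.

```python
# Flag likely-duplicate product descriptions with a similarity ratio.
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0..1 ratio; values near 1 suggest copied text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

ours = "The Kraken mountain bike has front suspension and 24 gears."
theirs = "The Kraken mountain bike has front suspension and 24 gears."
rewritten = "A 24-gear hardtail with front suspension, built for trails."

print(similarity(ours, theirs) > 0.9)      # likely duplicate
print(similarity(ours, rewritten) > 0.9)   # original enough
```

Rewriting a copied description until its ratio against the source drops well below a threshold like 0.9 is one practical way to act on this audit finding.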

Information Architecture

Information architecture affects the way search engine robots navigate the site and, crucially, also the user experience. Here we look at how easily information flows through the Halfords site.

  • Site navigation: The main departments are represented by main categories from homepage with dropdown features detailing sections. This gives a good user experience and is search engine friendly.
  • Labelling and naming conventions: The category labelling is user friendly and keyword rich and without being over optimised.
  • Directory structures: The directory structure goes from main navigation to individual products in 3 layers or less which is what is recommended.


It is important to use a URL that includes a keyword, effectively describes its corresponding content and is not much longer than 115 characters. The URLs on the site are not search engine or human friendly, for example:


It is better to implement a URL rewrite to get a URL like “”.
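As a sketch of how such a rewrite could be implemented on an Apache server, a mod_rewrite rule can map a keyword-rich path onto the real script URL. The paths and parameter names below are hypothetical, invented for illustration rather than taken from the site.

```apache
# Hypothetical example: serve /bikes/carrera-kraken style URLs by
# internally rewriting them to a /product.php?slug=... style script URL.
RewriteEngine On
# [L] stops further rules; [QSA] keeps any existing query string
RewriteRule ^bikes/([a-z0-9-]+)/?$ /product.php?slug=$1 [L,QSA]
```

The old query-string URLs should then 301-redirect to the new friendly URLs so existing link value is preserved.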

Page titles

Some page titles are not optimised – e.g. the hybrid bikes category shows “Halfords | Hybrid Bikes | Best Hybrid Bike | Hybrid Bikes Reviews”. The word “reviews” should be substituted with a “buy” term to better describe the page.

Competitor Benchmark Report

Halfords effectively competes in three markets: bikes, car accessories and camping. I benchmarked the bikes and car accessories keywords against the main competitors in each industry, comparing against the competing site with the highest keyword visibility in each market.

The keywords I used were selected by scraping the respective parts of the site using Google keyword tool. The resulting keywords for bikes and car parts can be seen in appendix 1.

Keywords Visibility Reports

In general Halfords has more visibility for the keywords in Google and Bing. The keywords visibility report shows Halfords has 59 keywords in the top 10 (first page) of Google and Bing, compared to 36 for the competitor.


Keywords visibility report

The visibility report for 136 keywords related to car accessories shows Halfords has 102 keywords in the top 10 (first page) of Google and Bing, compared to 37 for the competitor.


Keywords visibility report - car maintenance

Backlink History

The historical backlink report is used to demonstrate the velocity and diversity of link acquisition. It should ideally show a steady rise in both the number of links and linking domains over time.

Backlink profiles – vs.

The historical backlink profile shows Halfords has had a higher velocity of links since October 2010, from a higher diversity of domains, than its competitor.


The chart below shows that the competitor's rate of link acquisition and number of linking domains have also increased significantly since October 2010. It is noticeable, however, that the competitor has surpassed Halfords in its diversity of linking domains, which is a positive aspect of their profile.


Anchor Text Distribution

It is good to have some keyword-rich anchor text links without overdoing it, which can result in a Google Penguin penalty. A good rule of thumb is that 75% of anchor texts should be brand terms and the rest a mixture of money-term keywords, so a high percentage of brand and other anchor text is a good indication of a healthy backlink profile. Halfords has a healthy distribution of anchor texts, with only 25% being the keyword-rich “bikes”. The majority of links (67%) are internal, which do not pass as much value.


MajesticSEO Anchor text diversity

The pie chart below shows the anchor text report for the car parts page, which has far more keyword-rich anchor texts such as “car parts” (54%) than its competitor. This could be considered over-optimised under the Penguin update, which suggests only about 20-25% of anchor texts should be keyword-rich money terms.

MajesticSEO Anchor text diversity car parts
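The 20-25% rule of thumb above is easy to check once you have anchor-text counts exported from a tool like MajesticSEO. The sketch below computes the keyword-rich ("money term") share from a table of anchors; the anchor texts and counts are invented for illustration.

```python
# Compute what percentage of a backlink profile's anchor texts are
# keyword-rich "money" terms rather than brand/other anchors.
def money_share(anchors, money_terms):
    total = sum(anchors.values())
    money = sum(n for text, n in anchors.items() if text in money_terms)
    return round(100 * money / total)

# Invented example counts: mostly brand anchors, some money terms
anchors = {"halfords": 600, "www.halfords.com": 150, "bikes": 200, "car parts": 50}
share = money_share(anchors, {"bikes", "car parts"})
print(share, "% keyword-rich")  # 25 % keyword-rich
if share > 25:
    print("Possibly over-optimised under Penguin")
```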

Page Metrics

A high page authority indicates that a page is likely to rank well for given keywords, regardless of content.

The comparison shows Evans has a lower page authority than Halfords. Although Evans has more external links, Halfords has more links in total, more links to the root domain and more linking C-blocks (diversity). Halfords also has a better social profile in terms of likes and shares on Facebook, Twitter and Google +1. Evans has more external followed links, which is a positive vote in their favour.

Open site explorer site metrics

The car maintenance comparison shows one site has a higher page authority, possibly as a result of more links in total. This is despite the other having more followed links, more diversity in linking (links to the root domain and C-blocks) and more social shares. The SEOmoz authority metrics (page MozRank and MozTrust) are also better for the site with the stronger link profile.

Open site explorer site metrics car parts

Keywords – Bikes & Car Maintenance/Accessories

Adwords Keywords list

SEO review

I carried out a very brief review of the site's pages; below are some highlighted potential SEO improvements, the reasoning why each needs to be done and some of the benefits.


Good practices observed

  • There is good use of H1s, H2s and image alt tags
  • There is good use of breadcrumbs
  • The page titles are SEO friendly
  • There is internal linking to related content but poor contextual linking

URL Structure

Some URLs are too long (e.g. ).

Best practice is to “try and stick to 3 – 5 words as research shows that by sticking to short URLs you get both better rankings and better click-through.” Read more: best practices for URL structure.


Internal contextual linking

There is little or no internal contextual linking in some articles. An increase in contextual linking might offer site visitors a better user experience and improve search engine ranking of affected pages.

Article Images

Most articles lack images, which could be used to emphasise keywords via alt tags and improve the user experience.

Social/community engagement

The social share buttons should appear at the end of an article rather than at the side of the page; this could prompt readers to share more content.

The retired Google Buzz button is still showing and should be replaced with a Google+ button.


Duplicate content

Some pages are duplicate content, so they may not be indexed by the search engines. For example, one page is a duplicate of the homepage; whereas a similar non-duplicate page is indexed, the duplicate does not appear to be, as per the screenshots below. I would therefore recommend writing original content for the page to ensure it ranks.



Google Webmaster Guidelines – Back to Basics

Webmaster guidelines are the basic rules that Google provides to make it easier to crawl your website. In an internet world that is full of experts it is always good to review the guidelines every now and then.

Google webmaster guidelines are split into three levels as follows:

1)      Technical Guidelines

2)      Quality guidelines.

3)      Design and content guidelines.

We take a brief look at each:

Technical Guidelines

Google Webmaster tools Technical Guidelines

You should know that Googlebot works on written text, so JavaScript, frames, Flash and session IDs will restrict the amount of content it can index.

You should make use of the robots.txt file to tell Googlebot about any content you don’t want indexed.

If you use a content management system (CMS), make sure it produces pretty links rather than URLs with long IDs that do not describe the page.

Make sure there is not much preventing pages from loading fast, as load times are now a (small) ranking factor.

Quality Guidelines

Google Webmaster tools Quality Guidelines

These guidelines are aimed at preventing you from tricking search engines. For instance, you should avoid serving different content to users and to search engines. Google also gives you an outlet if you believe another site is abusing these guidelines and flying under the radar: you can report it through the dedicated Google webmaster tools spam report. It is all about the main ethos of Google: giving users a great experience by providing unique content that adds value to surfers and the internet in general.

A good rough guide is to ask yourself if you would put out the same content if it was not for search engines.

They also ask that you avoid web scraping programs aimed at constantly checking your rankings, though this is difficult to avoid altogether in the work of an SEO consultant.

Design & content Guidelines


Google Webmaster tools Design & Content Guidelines

These are very important guidelines as they relate to the structure of the site and its effect on crawlability. For instance, if every page is not reachable from another, it could be tantamount to sending the Google spider down a dead-end road. This can easily be fixed by making sure the sidebar navigation somehow connects to every page on the site.

As far as content is concerned, Google asks that it be informative and that you avoid “churnalism”, the practice of repeating the same content found in many places elsewhere.

You should do keyword research and make sure you use the words that users will type into a search engine to find your content.

You should check for broken links via webmaster tools and other sources and try to fix them.

You should also ensure that your page titles and meta tags are descriptive of the content in the webpage.

Remember that each additional link on a page dilutes the value passed by the others, so try to keep the total number of links to a reasonable number.
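A quick way to check this is to count the links on a page, similar to what the =LinkCount function above reports. The sketch below counts `<a href>` tags using only the standard library; the HTML sample is invented for illustration.

```python
# Count the <a href> links on a page - a rough equivalent of =LinkCount.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Only count anchors that actually link somewhere (have an href)
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def link_count(html):
    counter = LinkCounter()
    counter.feed(html)
    return counter.count

sample = ('<p><a href="/bikes">Bikes</a> <a href="/parts">Parts</a> '
          '<a name="x">named anchor</a></p>')
print(link_count(sample))  # 2
```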

Related Blogs

Website Audit

The table below shows a summary of the main SEO factors to look at when auditing a website.

SEO Factor – Best Practice
  • Page Titles: Include the keyword, but make the title appealing to click from the SERP and keep it within 60 characters.
  • Meta description: Have a unique meta description for each page with up to 160 characters. See other best practices below.
  • Meta keywords: Select 5 to 8 keyphrases that reflect the theme or context of the page, using different phrase variants including plurals, misspellings and synonyms.
  • Content text: Aim for 200 to 400 words with a keyword density of 5-10%.
  • Graphic alt text: Include an alt tag for every image.
  • Sitemap: Google sitemaps can potentially help you include a higher proportion of your pages in the index and enable you to notify Google of changes.
  • Robots.txt file: Use robots.txt to control the indexing of pages by search engines.

Best practice with Meta descriptions


1. Create a unique meta description for every page where practical or have meta description automatically generated to include the product keyword name plus a summary of the value proposition of the site.

2. Create a powerful meta description that combines a call-to-action and encourages click-through on SERPs.

3. Try to keep within 15-20 words (160 characters), which is what is visible within the SERPs of Google.

4. Limit to 2-4 keyphrases per page.

5. Do not use too many keywords or use too many irrelevant filler words within this meta tag since this will reduce keyword density.

6. Avoid undue repetition – 2 to 3 times maximum, otherwise may be assessed as spamming.

7. Incorporate phrase variants and synonyms within copy.

8. Vary words on all pages within site.

9. Make it different from the <title> tag, since duplication may be a sign of keyword stuffing.
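Several of these checks (a description exists, stays within 160 characters, and differs from the title) can be automated across a URL list. The sketch below is a minimal illustration with invented example strings; the function and field names are my own, not from any particular tool.

```python
# Validate a meta description against checks 1, 3 and 9 above.
def check_meta_description(title, description):
    problems = []
    if not description:
        problems.append("missing description")
    if len(description) > 160:
        problems.append("longer than 160 characters")
    if description.strip().lower() == title.strip().lower():
        problems.append("identical to title")
    return problems

title = "Hybrid Bikes | Halfords"
description = ("Buy hybrid bikes from our wide range, with free delivery "
               "and expert fitting in store.")
print(check_meta_description(title, description))  # []
```

Run over the full crawl list, any page returning a non-empty problem list goes onto the amendments spreadsheet.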


Best practice for page content is:

  • Have 200-400 words with a keyword density of 5-10%.
  • Include a range of synonyms and Latent Semantic Indexing (LSI) variants as well as identical keyphrases, and vary the use of keywords in target keyphrases – i.e. they don’t have to appear as a consecutive phrase.
  • Good practice is to include the phrase towards the top of the document, then regularly throughout, and again towards the end, with possible hyperlinks to more information.
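The word-count and keyword-density guideline above can be measured with a few lines of Python. This is a simplified sketch for a single-word keyword; the sample text is invented for illustration.

```python
# Measure word count and single-keyword density for a block of copy.
import re

def keyword_density(text, keyword):
    # Tokenise into lowercase words
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    return len(words), round(100 * hits / len(words), 1)

text = ("Orthotics support your feet. Our orthotics are hand made. "
        "Orthotics fit any shoe.")
total, density = keyword_density(text, "orthotics")
print(total, density)  # 13 23.1
```

Against the 200-400 word / 5-10% density guideline, this invented snippet is both too short and too keyword-heavy, so it would be flagged for rewriting.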

The robots.txt file is used to exclude pages from being indexed and to reduce the bandwidth used by crawlers. Example pages to exclude are outdated pages, the shopping cart and broken links.
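A minimal robots.txt along these lines might look as follows; the directory paths are hypothetical examples, not taken from the audited site.

```
# Hypothetical robots.txt excluding cart and outdated pages from crawling
User-agent: *
Disallow: /cart/
Disallow: /outdated/
Sitemap: https://www.example.com/sitemap.xml
```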

Amendments Audit Trail & Monitoring

It is useful to have an audit trail and approval process of on-page amendments alongside monitoring to make sure the desired effect has taken place in ranking the site.

The best method I have found is to set up a spreadsheet with a different sheet for each page to be amended, as follows:

1)    Create a list of pages that need amending and agree which web pages are worth spending time on, i.e. they are not discontinued product pages.

2)    Create a separate Excel sheet for each website page that needs to be amended, showing the current content and the new content to be uploaded, as per below.

3)    Get sign off for amendments to be done and note date page amended.

4)    Add the amended page to a keywords watch list so that the effects can be tracked.
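The per-page record behind such a spreadsheet can be sketched as a simple data structure, which is also handy if you later script the reporting. The field names and example values below are invented for illustration.

```python
# One audit-trail record per page: current vs new content, sign-off,
# amendment date and the keywords to monitor afterwards.
from dataclasses import dataclass, field

@dataclass
class PageAmendment:
    url: str
    current_content: str
    new_content: str
    signed_off: bool = False
    amended_on: str = ""                 # e.g. "2013-01-15"
    watched_keywords: list = field(default_factory=list)

page = PageAmendment(
    url="/bikes/hybrid",
    current_content="Hybrid Bikes Reviews",
    new_content="Buy Hybrid Bikes",
)
page.signed_off = True                   # step 3: sign-off recorded
page.amended_on = "2013-01-15"
page.watched_keywords.append("hybrid bikes")  # step 4: track rankings
print(page.signed_off, page.amended_on)
```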