Skip to content

Command line SEO tool to crawl pages of a website and find out which pages have links to a target website.

License

Notifications You must be signed in to change notification settings

SquareByte-webdev/seo-links-to

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

seo-links-to

Description

Command line SEO tool to crawl pages of a website and find out which pages have links to a target website.

Installing

$ git clone https://github.com/SlimTim10/seo-links-to
$ cd seo-links-to
$ npm install

Usage

$ node main.js --url=<website-URL> --links-to=<target-URL> [--depth=<number>]

The given URL can be a website or an XML sitemap. The links-to argument is what is searched for in the links of the pages of the given website or sitemap. Specifically, it looks for links beginning with the given links-to argument. The page scanning depth is only used if a website URL is given (not a sitemap) and the default depth is 2.

For example, to scan for pages of https://crawler-test.com that include any links to http://robotto.org:

$ node main.js --url=https://crawler-test.com/ --links-to=http://robotto.org/ --depth=1
Scanning for pages (1 level deep)...
Found 405 pages on https://crawler-test.com/

Scanning pages for links...
........................................................................................................................................................................................................
........................................................................................................................................................................................................
.....

Found 2 pages that link to http://robotto.org/:

https://crawler-test.com/links/broken_links_external
https://crawler-test.com/links/max_external_links

To scan for pages of https://hackaday.com/news-sitemap.xml that include any links to https://github.com:

$ node main.js --url=https://hackaday.com/news-sitemap.xml --links-to=https://github.com
Found 24 pages on https://hackaday.com/news-sitemap.xml

Scanning pages for links...
........................

Found 7 pages that link to https://github.com:

https://hackaday.com/2021/07/17/control-an-irl-home-from-minecraft/
https://hackaday.com/2021/07/17/png-image-decoding-library-does-it-with-minimal-ram/
https://hackaday.com/2021/07/17/reinvented-retro-contest-winners-announced/
https://hackaday.com/2021/07/17/open-source-is-choice/
https://hackaday.com/2021/07/16/interpreters-in-scala/
https://hackaday.com/2021/07/16/just-what-have-we-become/
https://hackaday.com/2021/07/16/hackaday-podcast-127-whippletree-clamps-sniffing-your-stomach-radio-multimeter-hum-fix-and-c64-demo-no-c64/

Dependencies

  • Node.js version 12 or later

About

Command line SEO tool to crawl pages of a website and find out which pages have links to a target website.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 100.0%