WebInspector

A Ruby gem to inspect web pages. It scrapes a given URL and returns its title, description, meta tags, links, images, and more.

Installation

Add this line to your application's Gemfile:

gem 'webinspector'

And then execute:

$ bundle

Or install it yourself as:

$ gem install webinspector

Usage

Initialize a WebInspector instance

page = WebInspector.new('http://example.com')

With options

page = WebInspector.new('http://example.com', {
  timeout: 30,                         # Request timeout in seconds (default: 30)
  retries: 3,                          # Number of retries (default: 3)
  headers: {'User-Agent': 'Custom UA'} # Custom HTTP headers
})
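
The retries option suggests that failed requests are attempted again. Below is a minimal sketch of such a loop. It assumes exponential backoff between attempts and assumes that retries counts extra attempts after the first one. with_retries and base_delay are hypothetical names and are not part of WebInspector's API:

```ruby
# Hypothetical helper, not part of WebInspector: run the block, retrying on
# StandardError up to `retries` more times with exponential backoff.
def with_retries(retries: 3, base_delay: 0.5)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts > retries # out of retries: re-raise the last error

    sleep(base_delay * (2**(attempts - 1))) # 0.5s, 1s, 2s, ... with the default delay
    retry
  end
end
```

In real use, the block would wrap the HTTP request. The attempt number is yielded in case the caller wants to log it.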

Accessing response status and headers

page.response.status  # 200
page.response.headers # { "server"=>"apache", "content-type"=>"text/html; charset=utf-8", ... }
page.status_code      # 200
page.success?         # true if the page was loaded successfully
page.error_message    # returns the error message if any

Accessing page data

page.url           # URL of the page
page.scheme        # Scheme of the page (http, https)
page.host          # Hostname of the page (e.g., example.com, without the scheme)
page.port          # Port of the page
page.title         # title of the page from the head section
page.description   # description of the page
page.links         # array of all links found on the page (absolute URLs)
page.images        # array of all images found on the page (absolute URLs)
page.meta          # meta tags of the page
page.favicon       # favicon URL if available
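
The scheme, host, and port values map directly onto Ruby's standard URI library, and relative href/src values can be made absolute in the same way. This is a minimal stdlib sketch that assumes links are resolved against the page URL. url_parts and absolutize are hypothetical helpers, not WebInspector methods:

```ruby
require 'uri'

# Hypothetical helper: split a URL into the parts exposed as page.scheme/host/port.
def url_parts(url)
  uri = URI.parse(url)
  { scheme: uri.scheme, host: uri.host, port: uri.port } # port falls back to the scheme default
end

# Hypothetical helper: resolve raw href/src values against the page URL (Ruby 2.7+ for filter_map).
def absolutize(base_url, hrefs)
  hrefs.filter_map do |href|
    URI.join(base_url, href.strip).to_s
  rescue URI::Error
    nil # skip values that are not valid URIs
  end
end
```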

Working with meta tags

page.meta                 # all meta tags
page.meta['description']  # meta description
page.meta['keywords']     # meta keywords
page.meta['og:title']     # OpenGraph title
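
A hash keyed by either the name or the property attribute explains why both description and og:title work as lookups. The following regex-only sketch shows that idea under stated assumptions. It is not the gem's parser, and a real HTML parser such as Nokogiri handles quoting edge cases far better. meta_tags is a hypothetical helper:

```ruby
# Hypothetical regex sketch: collect <meta name=... content=...> and
# <meta property=... content=...> pairs into a Hash keyed by name/property.
def meta_tags(html)
  html.scan(/<meta\b[^>]*>/i).each_with_object({}) do |tag, meta|
    # Parse the tag's attributes into a Hash with lowercased attribute names.
    attrs = tag.scan(/([\w:-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    key = attrs["name"] || attrs["property"]
    meta[key] = attrs["content"] if key && attrs["content"]
  end
end
```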

Filtering links and images by domain

page.domain_links('example.com')  # returns only links pointing to example.com
page.domain_images('example.com') # returns only images hosted on example.com
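
Filtering by domain comes down to comparing each URL's host with the given domain. This sketch assumes that subdomains (such as cdn.example.com) count as matches. The gem may instead require an exact host match. filter_by_domain is a hypothetical helper:

```ruby
require 'uri'

# Hypothetical helper: keep URLs whose host is `domain` itself or one of its
# subdomains (an assumption; the gem may match differently).
def filter_by_domain(urls, domain)
  domain = domain.downcase
  urls.select do |url|
    host = URI.parse(url).host.to_s.downcase
    host == domain || host.end_with?(".#{domain}")
  rescue URI::InvalidURIError
    false # unparseable URLs never match
  end
end
```

Checking for the leading dot keeps look-alike hosts such as notexample.com from matching.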

Searching for words

page.find(["ruby", "rails"]) # returns [{"ruby"=>3}, {"rails"=>1}]
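
The result shape is one single-pair hash per searched word. This sketch reproduces that shape on plain text. Its matching rules (case-insensitive, whole words only) are assumptions, not documented gem behavior, and on a real page the input would be the page's visible text. count_words is a hypothetical helper:

```ruby
# Hypothetical helper: count whole-word, case-insensitive occurrences of each
# word and return one single-pair Hash per word, e.g. [{"ruby"=>3}, {"rails"=>1}].
def count_words(text, words)
  words.map do |word|
    { word => text.scan(/\b#{Regexp.escape(word)}\b/i).size }
  end
end
```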

JavaScript and Stylesheets

page.javascripts  # array of all JavaScript files (absolute URLs)
page.stylesheets  # array of all CSS stylesheets (absolute URLs)

Language Detection

page.language  # detected language code (e.g., "en", "es", "fr")

Structured Data

page.structured_data  # array of JSON-LD structured data objects
page.microdata        # array of microdata items
page.json_ld          # alias for structured_data
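
JSON-LD lives in script tags of type application/ld+json, so extracting it amounts to finding those blocks and parsing each one as JSON. This is a regex-plus-stdlib sketch and is not WebInspector's implementation. It assumes that a top-level JSON array contributes each of its items. json_ld_blocks is a hypothetical helper:

```ruby
require 'json'

# Hypothetical helper: extract every <script type="application/ld+json"> block
# and parse it; a top-level JSON array contributes each of its items.
def json_ld_blocks(html)
  pattern = %r{<script\b[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>}im
  html.scan(pattern).flat_map do |(body)|
    parsed = JSON.parse(body)
    parsed.is_a?(Array) ? parsed : [parsed]
  rescue JSON::ParserError
    [] # skip malformed blocks instead of failing the whole page
  end
end
```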

Security Information

page.security_info  # hash with security details: { secure: true, hsts: true, ... }
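
Both example flags can be derived from data the gem already exposes: the URL scheme and the response headers. This sketch assumes that secure means "served over https" and that hsts means "a Strict-Transport-Security header is present". The gem's own checks may go further. security_summary is a hypothetical helper:

```ruby
require 'uri'

# Hypothetical helper: derive basic security flags from the page URL and
# response headers (header names are compared case-insensitively).
def security_summary(url, headers)
  normalized = headers.transform_keys { |k| k.to_s.downcase }
  {
    secure: URI.parse(url).scheme == "https",
    hsts: normalized.key?("strict-transport-security")
  }
end
```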

Performance Metrics

page.load_time  # page load time in seconds
page.size       # page size in bytes
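
This sketch shows one way to measure both numbers. It assumes load_time is the elapsed wall-clock seconds around the fetch and size is the byte count of the response body. measure is a hypothetical helper, and its block stands in for the real HTTP request:

```ruby
# Hypothetical helper: time the block with a monotonic clock (unaffected by
# system clock changes) and report the size of the returned body in bytes.
def measure
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  body = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { load_time: elapsed, size: body.bytesize } # bytesize, not length: multibyte characters count fully
end
```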

Content Type

page.content_type  # content type header (e.g., "text/html; charset=utf-8")
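
The returned value is the raw header, so the media type and charset often need to be separated. This small stdlib-only sketch does that. parse_content_type is a hypothetical helper:

```ruby
# Hypothetical helper: split a Content-Type header value into
# [media_type, charset]; charset is nil when the header has none.
def parse_content_type(value)
  media_type, *params = value.to_s.split(";").map(&:strip)
  charset = params.find { |param| param.downcase.start_with?("charset=") }
  [media_type&.downcase, charset&.split("=", 2)&.last&.delete('"')&.downcase]
end
```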

Technology Detection

page.technologies  # hash of detected technologies: { jquery: true, react: true, ... }

HTML Tag Statistics

page.tag_count  # hash with counts of each HTML tag: { "div" => 45, "p" => 12, ... }
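
A tag histogram like this can be approximated by counting opening tags. The sketch below is a regex approximation and not the gem's method. It lowercases tag names, and it also counts tags that appear inside comments or script bodies. tag_counts is a hypothetical helper:

```ruby
# Hypothetical helper: count opening tags by lowercased name (Ruby 2.7+ for tally).
# Closing tags (</p>) and declarations (<!DOCTYPE>) do not match the pattern.
def tag_counts(html)
  html.scan(/<([a-zA-Z][a-zA-Z0-9-]*)/).flatten.map(&:downcase).tally
end
```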

RSS/Atom Feeds

page.feeds  # array of RSS/Atom feed URLs found on the page
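
Feeds are normally advertised with link rel="alternate" tags whose type is an RSS or Atom media type. This regex sketch finds them and makes their hrefs absolute. It assumes that is the only discovery mechanism, although the gem may check more. feed_urls and FEED_TYPES are hypothetical:

```ruby
require 'uri'

# Hypothetical list of feed media types to accept.
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Hypothetical helper: return absolute URLs of <link rel="alternate"> feeds.
def feed_urls(html, base_url)
  html.scan(/<link\b[^>]*>/i).filter_map do |tag|
    attrs = tag.scan(/([\w:-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    next unless attrs["rel"].to_s.downcase.split.include?("alternate")
    next unless FEED_TYPES.include?(attrs["type"].to_s.downcase)

    URI.join(base_url, attrs["href"]).to_s if attrs["href"]
  end
end
```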

Social Media Links

page.social_links  # hash of social media profiles: { facebook: "url", twitter: "url", ... }
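
One plausible approach is to classify the page's links by host. The sketch assumes a small host-to-network table, and the table is illustrative rather than the gem's list. It keeps the first link found for each network. SOCIAL_HOSTS and social_profiles are hypothetical:

```ruby
require 'uri'

# Hypothetical host-to-network table; extend as needed.
SOCIAL_HOSTS = {
  "facebook.com" => :facebook,
  "twitter.com" => :twitter,
  "x.com" => :twitter,
  "linkedin.com" => :linkedin,
  "instagram.com" => :instagram,
  "youtube.com" => :youtube,
  "github.com" => :github
}.freeze

# Hypothetical helper: map links to { network => first matching URL }.
def social_profiles(links)
  links.each_with_object({}) do |link, found|
    host = URI.parse(link).host.to_s.downcase.delete_prefix("www.")
    network = SOCIAL_HOSTS[host]
    found[network] ||= link if network # keep the first profile per network
  rescue URI::InvalidURIError
    next # ignore unparseable links
  end
end
```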

Robots.txt and Sitemap

page.robots_txt_url  # URL to robots.txt
page.sitemap_url     # array of sitemap URLs
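
robots.txt always lives at the root of the host, so its URL can be built from the page URL alone. Sitemaps are commonly declared in robots.txt with Sitemap: lines. This sketch assumes sitemap_url reads those lines, though the gem may also probe default locations. robots_txt_url and sitemap_urls here are hypothetical stdlib helpers:

```ruby
require 'uri'

# Hypothetical helper: robots.txt sits at the root of the page's scheme/host/port.
def robots_txt_url(page_url)
  URI.join(page_url, "/robots.txt").to_s
end

# Hypothetical helper: collect the values of "Sitemap:" lines (case-insensitive).
def sitemap_urls(robots_body)
  robots_body.each_line.filter_map do |line|
    key, value = line.split(":", 2)
    value.strip if key&.strip&.casecmp?("sitemap") && value
  end
end
```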

CMS Detection

page.cms_info  # hash with CMS details: { name: "WordPress", version: "6.0", themes: [...], plugins: [...] }

Accessibility Score

page.accessibility_score  # hash with score (0-100) and details: { score: 85, details: [...] }

Mobile-Friendly Check

page.mobile_friendly?  # true if the page has viewport meta tag and responsive CSS
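
The check described above has two parts, and this sketch covers only the first: a viewport meta tag that sets width=device-width. It does not cover the responsive-CSS part. It is a regex approximation, not the gem's code, and viewport_meta? is a hypothetical helper:

```ruby
# Hypothetical helper: true if some <meta name="viewport"> tag sets width=device-width.
def viewport_meta?(html)
  html.scan(/<meta\b[^>]*>/i).any? do |tag|
    tag.match?(/name\s*=\s*["']viewport["']/i) && tag.match?(/width\s*=\s*device-width/i)
  end
end
```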

Export all data to JSON

page.to_hash # returns a hash with all page data; serialize it with to_json (after require 'json')
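
Because to_hash returns a plain Hash, getting JSON output takes one extra step with Ruby's json library. This is a small sketch, and export_json is a hypothetical helper. In practice, data would be page.to_hash:

```ruby
require 'json'

# Hypothetical helper: write any Hash (e.g. page.to_hash) to a pretty-printed
# JSON file and return the path. Symbol keys are serialized as strings.
def export_json(data, path)
  File.write(path, JSON.pretty_generate(data))
  path
end
```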

Changelog

Version 1.2.0

New Features:

RSS/Atom feed detection with the feeds method
Social media profile extraction with the social_links method
CMS detection and information with the cms_info method (WordPress, Drupal, Joomla, Shopify, Wix, Squarespace)
Accessibility scoring with the accessibility_score method
Mobile-friendly detection with the mobile_friendly? method
Robots.txt and sitemap URL detection with the robots_txt_url and sitemap_url methods

Improvements:

Enhanced Request module with valid? and ssl? methods for better URL validation
Improved Meta module with author and publisher extraction
Better error handling across all modules
Performance improvements with internal caching

Contributors

Steven Shelby (@stevenshelby)
Sam Nissen (@samnissen)

License

The WebInspector gem is released under the MIT License.

Contributing

Fork it (https://github.com/davidesantangelo/webinspector/fork)
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request
