Wednesday, March 23rd, 2016

Scraping a Site with Wget

wget -E -k -p -r -nH \
 -P /path/to/output/directory \
 -X /skip-me,/skip-me-too  \

The above command will download all of the files from the site whose URL is given as the final argument and place them in the directory /path/to/output/directory/, except for the contents of the excluded directories /skip-me and /skip-me-too.

The site will be available for offline viewing, with the exception of the ignored folders. Assets in these folders will still be linked to, but the links will point to the live site. Wget will not pull down files from any other domains either.

Command Explained


  • -E (--adjust-extension)
    Appends .html to the names of saved HTML files that lack that extension.

  • -k (--convert-links)
    Converts links in the downloaded documents so they work for local viewing.

  • -p (--page-requisites)
    Download stylesheets, images, and any other files needed to view the page locally.

  • -r (--recursive)
    Enter subfolders and get their contents, and so on.

  • -nH (--no-host-directories)
    Disable host-prefixed filenames. Prevents the creation of root folders like hostname.tld.

  • -P (--directory-prefix)
    Save the downloaded files in this directory.

  • -X (--exclude-directories)
    Do not download files from within these directories. Links to these pages will be rewritten as absolute URLs, including the full domain name, and will not be available locally.
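
The options listed above can also be written in their long form, which may be easier to read in a script. The sketch below wraps the same invocation in a small shell function. The function name mirror_site, the DRY_RUN switch, and the https://example.com/ URL are assumptions made for illustration and do not come from the original command.

```shell
#!/bin/sh
# Sketch: the same wget invocation, spelled out with long-form options.
# mirror_site and DRY_RUN are hypothetical helpers, not part of wget.

mirror_site() {
  site_url=$1      # site to download (placeholder value used below)
  out_dir=$2       # directory the local copy is written to
  exclude_dirs=$3  # comma-separated directories to skip

  # With DRY_RUN=1 the command is printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner=echo
  else
    runner=
  fi

  $runner wget \
    --adjust-extension \
    --convert-links \
    --page-requisites \
    --recursive \
    --no-host-directories \
    --directory-prefix="$out_dir" \
    --exclude-directories="$exclude_dirs" \
    "$site_url"
}

# Preview the command without touching the network.
DRY_RUN=1 mirror_site "https://example.com/" "/path/to/output/directory" "/skip-me,/skip-me-too"
```

Calling mirror_site without DRY_RUN set runs the download for real. Previewing first is a cheap way to check the exclude list and output directory before starting a long recursive download.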


