From 25adbdabb47499fe641c7bd9595024ff82667058 Mon Sep 17 00:00:00 2001 From: Xavier Roche Date: Mon, 19 Mar 2012 12:51:31 +0000 Subject: httrack 3.30.1 --- HelpHtml/filters.html | 261 -------------------------------------------------- 1 file changed, 261 deletions(-) delete mode 100644 HelpHtml/filters.html (limited to 'HelpHtml/filters.html') diff --git a/HelpHtml/filters.html b/HelpHtml/filters.html deleted file mode 100644 index 6438dab..0000000 --- a/HelpHtml/filters.html +++ /dev/null @@ -1,261 +0,0 @@ - - - - - - - HTTrack Website Copier - Offline Browser - - - - - - - - - -
HTTrack Website Copier
Open Source offline browser

Filters: Advanced

- -
- See also: The FAQ
- -
- Once you have defined start links, the default mode is to mirror these links; i.e. if one of your start pages is www.someweb.com/test/index.html, all links starting with www.someweb.com/test/ will be accepted. Links directly in www.someweb.com/.. will not be accepted, however, because they are in a higher structure. This prevents HTTrack from mirroring the whole site. (All files in structure levels equal to or lower than the primary links will be retrieved.)
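The default rule described above can be sketched in a few lines of Python. This is only an illustration of the prefix test as the text states it, not HTTrack's actual code; the function names `default_scope_prefix` and `in_default_scope` are hypothetical, and links are assumed to be given without a scheme, as in the examples on this page.

```python
import posixpath
from urllib.parse import urlsplit


def _split(link: str):
    # The page writes links without a scheme; add one so urlsplit finds the host.
    return urlsplit(link if "://" in link else "http://" + link)


def default_scope_prefix(start_url: str) -> str:
    """Return host + directory of the start page, e.g. 'www.someweb.com/test/'."""
    parts = _split(start_url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return parts.netloc + directory


def in_default_scope(link: str, start_url: str) -> bool:
    """True if the link sits at or below the start page's directory."""
    parts = _split(link)
    return (parts.netloc + parts.path).startswith(default_scope_prefix(start_url))


start = "www.someweb.com/test/index.html"
print(in_default_scope("www.someweb.com/test/sub/page.html", start))
print(in_default_scope("www.someweb.com/other.html", start))
```

The first link lies below www.someweb.com/test/ and is in scope; the second lies in a higher structure and is not.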
-
-
- But you may want to download files that are not directly in the subfolders, or on the contrary refuse files of a particular type. That is the purpose of filters.
- -
- To accept a family of links (for example, all links with a specific name or type), you just have to add an authorization filter, like +*.gif. The pattern is a plus (this one: +), followed by a pattern composed of letters and wildcards (this one: *).

- To forbid a family of links, define an exclusion filter, like -*.gif. The pattern is a dash (this one: -), followed by the same kind of pattern as for the authorization filter.

- Example: +*.gif will accept all files ending with .gif
- Example: -*.gif will refuse all files ending with .gif
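As a rough illustration of the two examples above, a rule can be split into its sign and its pattern, and the pattern tested against a link. The sketch below uses Python's `fnmatch` module as a stand-in for the plain `*` wildcard only; `parse_filter` and `pattern_matches` are hypothetical names, and case-sensitive matching is an assumption made here, not something this page states.

```python
from fnmatch import fnmatchcase
from typing import Tuple


def parse_filter(rule: str) -> Tuple[bool, str]:
    """Split '+*.gif' into (True, '*.gif') and '-*.gif' into (False, '*.gif')."""
    if not rule or rule[0] not in "+-":
        raise ValueError("a filter must start with + or -")
    return rule[0] == "+", rule[1:]


def pattern_matches(rule: str, link: str) -> bool:
    """True if the pattern part of the rule matches the link (sign ignored)."""
    _, pattern = parse_filter(rule)
    # Assumption: case-sensitive match; only the plain * wildcard is modeled.
    return fnmatchcase(link, pattern)


print(parse_filter("+*.gif"))
print(pattern_matches("+*.gif", "www.someweb.com/img/logo.gif"))
print(pattern_matches("-*.gif", "www.someweb.com/index.html"))
```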
-
- -
- Let's talk a little more about patterns:
- Filters are analyzed by HTTrack from the first filter to the last one. The complete URL name is compared to the filters defined by the user or added automatically by HTTrack.

- A filter has a higher priority than the ones before it, so the order is important:
- -
- +*.gif -image*.gif    Will accept all gif files EXCEPT image1.gif, imageblue.gif, imagery.gif and so on
- -image*.gif +*.gif    Will accept all gif files, because the second pattern takes priority (it is defined AFTER the first one)
- -
- We saw that patterns are composed of letters and wildcards (*), as in */image*.gif


- Special wildcards can be used for specific characters (*[..]):

- *                     any characters (the most commonly used)
- *[file] or *[name]    any filename or name, i.e. no /, ? or ; characters
- *[path]               any path (and filename), i.e. no ? or ; characters
- *[a,z,e,r,t,y]        any letters among a,z,e,r,t,y
- *[a-z]                any letters
- *[0-9,a,z,e,r,t,y]    any characters among 0..9 and a,z,e,r,t,y
- *[]                   no characters may follow
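One way to read the wildcard table above is as a translation into regular expressions. The sketch below is an interpretation rather than HTTrack's implementation: `filter_to_regex` is a hypothetical name, every `*[...]` class is assumed to match zero or more characters (like `*`), and the whole link is matched with `fullmatch`, so any special handling HTTrack may apply to trailing parameters is not reproduced.

```python
import re

# Either a bracketed wildcard such as *[path] or *[a,z], or a plain *.
_TOKEN = re.compile(r"\*\[([^\]]*)\]|\*")


def _class_item(item: str) -> str:
    # "a-z" stays a range; anything else is taken literally.
    if len(item) == 3 and item[1] == "-":
        return re.escape(item[0]) + "-" + re.escape(item[2])
    return re.escape(item)


def filter_to_regex(pattern: str) -> "re.Pattern[str]":
    out, pos = [], 0
    for m in _TOKEN.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))   # literal text before the wildcard
        body = m.group(1)
        if body is None:                  # plain *
            out.append(".*")
        elif body == "":                  # *[] : nothing may follow
            out.append(r"\Z")
        elif body in ("file", "name"):    # no /, ? or ;
            out.append("[^/?;]*")
        elif body == "path":              # no ? or ;
            out.append("[^?;]*")
        else:                             # comma-separated characters and ranges
            items = "".join(_class_item(i) for i in body.split(","))
            out.append("[" + items + "]*" if items else "")
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out))


rx = filter_to_regex("www.*.com/*[path].zip")
print(bool(rx.fullmatch("www.someweb.com/files/a.zip")))
print(bool(rx.fullmatch("www.someweb.com/get?f=a.zip")))
```

The second link is rejected because `*[path]` does not cross a `?` character.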


- Here are some examples of filters (which can be generated automatically using the interface):

- www.thisweb.com*         This will refuse/accept this web site (all links located in it will be rejected/accepted)
- *.com/*                  This will refuse/accept all links that contain .com in them
- *cgi-bin*                This will refuse/accept all links that contain cgi-bin in them
- www.*.com/*[path].zip    This will refuse/accept all zip files in .com addresses
- *someweb*/*.tar*         This will refuse/accept all tar (or tar.gz etc.) files in hosts containing someweb
- */*somepage*             This will refuse/accept all links containing somepage (but not in the host address)
- *.html                   This will refuse/accept all html files.
-                          Warning! With this filter you will accept ALL html files, even those at other addresses (causing a global (!) web mirror). Use www.someweb.com/*.html to accept all html files from a single web site.
- *.html*[]                Identical to *.html, but the link must not have any additional characters at the end (links with parameters, like www.someweb.com/index.html?page=10, will be refused)
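The last three examples can be checked with a small sketch. It rests on an assumption inferred from the final row: a pattern without `*[]` still matches a link that is followed by parameters (a `?...` tail), while a pattern ending in `*[]` does not. The helper `matches` is hypothetical and handles only the plain `*` wildcard plus a trailing `*[]`.

```python
import re


def matches(pattern: str, link: str) -> bool:
    strict_end = pattern.endswith("*[]")
    if strict_end:
        pattern = pattern[:-3]
    # Plain * becomes ".*"; everything else is matched literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: without *[], a trailing "?parameters" part is tolerated.
    tail = "" if strict_end else r"(\?.*)?"
    return re.fullmatch(body + tail, link) is not None


print(matches("*.html", "www.other.com/a.html"))                        # host not restricted
print(matches("www.someweb.com/*.html", "www.other.com/a.html"))        # host restricted
print(matches("*.html", "www.someweb.com/index.html?page=10"))          # parameters tolerated
print(matches("*.html*[]", "www.someweb.com/index.html?page=10"))       # parameters refused
```

The first two calls illustrate the warning: *.html matches html files on any host, whereas www.someweb.com/*.html keeps the filter on one site.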
- - - - - - -- cgit v1.2.3