From 64cc4a88da8887ef1f7f4d90be0158d2cc76222d Mon Sep 17 00:00:00 2001
From: Xavier Roche
+
contrary refuse files of a particular type. That is the purpose of filters.
-
+Scan rules based on URL or extension (e.g. accept or refuse all .zip or .gif files)
+
+
+
+
| + | |
| +*.gif -image*.gif | Will accept all gif files EXCEPT image1.gif, imageblue.gif, imagery.gif and so on |
| + | |
| -image*.gif +*.gif |
Will accept all gif files, because the second pattern takes priority (it is defined AFTER the first one)
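The ordering rule in the two rows above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's actual matching code: the helper names `wildcard_match` and `is_accepted` are hypothetical, only the plain `*` wildcard is handled, and patterns are matched against bare file names.

```python
import re


def wildcard_match(pattern: str, name: str) -> bool:
    """Match `name` against a pattern where '*' stands for any characters.

    Assumption: only the plain '*' wildcard is supported in this sketch;
    bracket forms such as *[file] or *[<NN] are not handled.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def is_accepted(name: str, rules: str, default: bool = False) -> bool:
    """Apply '+pattern' / '-pattern' rules in order; the last match wins."""
    verdict = default
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if wildcard_match(pattern, name):
            # A later matching rule overrides any earlier verdict.
            verdict = (sign == "+")
    return verdict
```

With `+*.gif -image*.gif`, the file name `image1.gif` matches both rules and the later `-` rule decides; swapping the two rules lets `+*.gif` decide instead.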
@@ -155,6 +206,8 @@ See also: The FAQ |
| * | +* | any characters (the most commonly used) | |
| *[file] or *[name] | +*[file] or *[name] | any filename or name, i.e. no /, ? or ; characters | |
| *[path] | +*[path] | any path (and filename), i.e. no ? or ; characters | |
| *[a,z,e,r,t,y] | +*[a,z,e,r,t,y] | any letters among a,z,e,r,t,y | |
| *[a-z] | +*[a-z] | any letters | |
| *[0-9,a,z,e,r,t,y] | +*[0-9,a,z,e,r,t,y] | any characters among 0..9 and a,z,e,r,t,y | |
| *[] | -no characters must be present after | +*[\*] | +the * character |
| *[<NN] | -the file size must be smaller than NN KB
- (note: this may cause broken files during the download) |
+ *[\\] | +the \ character |
| *[>NN] | -the file size must be greater than NN KB
- (note: this may cause broken files during the download) |
+ *[\[\]] | +the [ or ] character |
| *[<NN>MM] | -the file size must be smaller than NN KB and greater than MM KB
- (note: this may cause broken files during the download) |
+ *[] | +no characters must be present after |
| www.thisweb.com* | +www.thisweb.com* | This will refuse/accept this web site (all links located in it will be refused/accepted) |
| *.com/* | +*.com/* | This will refuse/accept all links that contain .com in them |
| *cgi-bin* | +*cgi-bin* | This will refuse/accept all links that contain cgi-bin in them |
| www.*[path].com/*[path].zip | +www.*[path].com/*[path].zip | This will refuse/accept all zip files in .com addresses |
| *someweb*/*.tar* | +*someweb*/*.tar* | This will refuse/accept all tar (or tar.gz etc.) files in hosts containing someweb |
| */*somepage* | +*/*somepage* | This will refuse/accept all links containing somepage (but not in the address) |
| *.html | +*.html | This will refuse/accept all html files. Warning! With this filter you will accept ALL html files, even those at other addresses (causing a global (!) web mirror). Use www.someweb.com/*.html to accept all html files from a single web site. |
| *.html*[] | +*.html*[] | Identical to *.html, but the link must not have any additional characters at the end (links with parameters, like www.someweb.com/index.html?page=10, will be refused) |
+
+ Size patterns:
| *[<NN] | +the file size must be smaller than NN KB
+ (note: this may cause broken files during the download) |
+
| *[>NN] | +the file size must be greater than NN KB
+ (note: this may cause broken files during the download) |
+
| *[<NN>MM] | +the file size must be smaller than NN KB and greater than MM KB
+ (note: this may cause broken files during the download) |
+
+ Here are some examples of filters: (that can be generated automatically using the
+ interface)
| -*[<10] | +the file will be forbidden if its size is smaller than 10 KB | +
| -*[>50] | +the file will be forbidden if its size is greater than 50 KB | +
| -*[<10] -*[>50] | +the file will be forbidden if its size is smaller than 10 KB or greater than 50 KB | +
| +*[<80>1] | +the file will be accepted if its size is smaller than 80 KB and greater than 1 KB | +
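To make the size patterns concrete, here is a minimal Python sketch of how `*[<NN]`, `*[>NN]` and `*[<NN>MM]` could be evaluated against a known file size in KB. The function name `size_matches` and the regular expression are assumptions made for illustration; they are not taken from the program's source.

```python
import re

# Matches "*[<NN]", "*[>MM]" or "*[<NN>MM]" (hypothetical parser for this sketch).
SIZE_PATTERN = re.compile(r"\*\[(?:<(\d+))?(?:>(\d+))?\]")


def size_matches(pattern: str, size_kb: float) -> bool:
    """Return True if size_kb satisfies the size pattern (sizes in KB)."""
    m = SIZE_PATTERN.fullmatch(pattern)
    if m is None or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"not a size pattern: {pattern!r}")
    upper, lower = m.group(1), m.group(2)
    if upper is not None and not size_kb < int(upper):
        return False  # must be strictly smaller than NN
    if lower is not None and not size_kb > int(lower):
        return False  # must be strictly greater than MM
    return True


def is_forbidden(rules: str, size_kb: float) -> bool:
    """True if any '-' size rule in the space-separated list matches."""
    return any(
        size_matches(rule[1:], size_kb)
        for rule in rules.split()
        if rule.startswith("-")
    )
```

Under these assumptions, `is_forbidden("-*[<10] -*[>50]", 30)` is false because a 30 KB file is neither smaller than 10 KB nor greater than 50 KB, while a 5 KB or 60 KB file would be forbidden.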
+
+ Here are some examples of MIME type filters: (that can be generated automatically using the
+ interface)
| -mime:application/octet-stream | +This will refuse all links of type 'application/octet-stream' that were already scheduled for download + (i.e. the download will be aborted) | +
| -mime:application/* | +This will refuse all links of type beginning with 'application/' that were already scheduled for download + (i.e. the download will be aborted) | +
| -mime:application/* +mime:application/pdf | +This will refuse all links of type beginning with 'application/' that were already scheduled for download, except for 'application/pdf' ones + (i.e. all other 'application/' link download will be aborted) | +
| -mime:video/* | +This will refuse all video links that were already scheduled for download + (i.e. the download will be aborted) | +
| -mime:video/* -mime:audio/* | +This will refuse all audio and video links that were already scheduled for download + (i.e. the download will be aborted) | +
| -mime:*/* +mime:text/html +mime:image/* | +This will refuse all links that were already scheduled for download, except html pages, and images + (i.e. all other link download will be aborted). Note that this is a very inefficient way of filtering + files, as aborted downloads will generate useless requests to the server. You are strongly advised to + use additional URL scan rules | +
+
+ Here are some examples of good/bad scan rules interactions:
| Purpose | +Method | +Result | +
| Download all html and images on www.example.com | +-* +www.example.com/*.html +www.example.com/*.php +www.example.com/*.asp +www.example.com/*.gif +www.example.com/*.jpg +www.example.com/*.png -mime:*/* +mime:text/html +mime:image/* |
+ Good: efficient download | +
| -* +www.example.com/* -mime:*/* +mime:text/html +mime:image/* |
+ Bad: many aborted downloads, leading to poor performance and increased server load | +|
| Download only html on www.example.com, plus ZIP files | +-* +www.example.com/*.html +www.example.com/somedynamicscript.php +www.example.com/*.zip -mime:* +mime:text/html +mime:application/zip |
+ Good: ZIP files will be downloaded, even those generated by 'somedynamicscript.php' | +
| -* +www.example.com/*.html -mime:* +mime:text/html +mime:application/zip |
+ Bad: ZIP files will never be scheduled for download, and hence the zip mime scan rule will never be used | +|
| Download all html, and images smaller than 100KB on www.example.com | +-* +www.example.com/*.html +www.example.com/*.php +www.example.com/*.asp +www.example.com/*.gif*[<100] +www.example.com/*.jpg*[<100] +www.example.com/*.png*[<100] -mime:*/* +mime:text/html +mime:image/* |
+ Good: efficient download | +
| -* +www.example.com/**[<100] -mime:*/* +mime:text/html +mime:image/* |
+ Bad: many aborted downloads, leading to poor performance and increased server load | +
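The good/bad interactions above follow from a two-stage model: URL scan rules decide whether a link is scheduled at all, and MIME rules can only abort a download that was already scheduled. The Python sketch below simulates that model on a made-up site. Everything in it is an assumption for illustration: the helper names, the sample URLs and MIME types, the default of accepting when no rule matches, and the simplified `*`-only matching.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Plain '*' wildcard match (bracket forms are not handled in this sketch)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def verdict(rules, text, default=True):
    """Apply '+'/'-' rules in order; the last matching rule wins."""
    result = default
    for rule in rules:
        if matches(rule[1:], text):
            result = (rule[0] == "+")
    return result


def simulate(rule_string: str, site: dict):
    """Return (downloaded, aborted) URL lists for a {url: mime_type} mapping."""
    rules = rule_string.split()
    url_rules = [r for r in rules if not r[1:].startswith("mime:")]
    # Strip the "mime:" prefix but keep the sign, e.g. "-mime:*/*" -> "-*/*".
    mime_rules = [r[0] + r[6:] for r in rules if r[1:].startswith("mime:")]
    downloaded, aborted = [], []
    for url, mime in site.items():
        if not verdict(url_rules, url):
            continue  # never scheduled: no request is sent at all
        if verdict(mime_rules, mime):
            downloaded.append(url)
        else:
            aborted.append(url)  # request was sent, then thrown away
    return downloaded, aborted


# Hypothetical site content, used only for this illustration.
SITE = {
    "www.example.com/index.html": "text/html",
    "www.example.com/logo.png": "image/png",
    "www.example.com/setup.exe": "application/octet-stream",
    "www.example.com/movie.avi": "video/x-msvideo",
}
GOOD = ("-* +www.example.com/*.html +www.example.com/*.png "
        "-mime:*/* +mime:text/html +mime:image/*")
BAD = "-* +www.example.com/* -mime:*/* +mime:text/html +mime:image/*"
```

In this simulation both rule sets end up with the same two downloaded files, but `simulate(BAD, SITE)` also reports two aborted downloads, which is the wasted server traffic the table warns about.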