From 25adbdabb47499fe641c7bd9595024ff82667058 Mon Sep 17 00:00:00 2001
From: Xavier Roche
Date: Mon, 19 Mar 2012 12:51:31 +0000
Subject: httrack 3.30.1

---
 html/faq.html | 937 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 937 insertions(+)
 create mode 100644 html/faq.html

(limited to 'html/faq.html')

diff --git a/html/faq.html b/html/faq.html
new file mode 100644
index 0000000..c5146a9
--- /dev/null
+++ b/html/faq.html
@@ -0,0 +1,937 @@
+HTTrack Website Copier - Offline Browser
HTTrack Website Copier
+ + + + +
Open Source offline browser
+ + + + +
+ + + + +
+ + + + +
+ + +

F A Q

+ +
+ +


+

    +Tips: +
  • In case of trouble during a transfer, first check the hts-log.txt (and hts-err.txt) files to figure out what happened. These log files report all events that may help to diagnose a problem. You can also adjust the debug level of the log files in the options.
  • +The tutorial written by Fred Cohen is a very good document to read, to understand how to use the engine, +how the command line version works, and how the window version works, too! All options are described and explained in +clear language! +
  • +
+

+ + + +

+ +
+
+
+ +Very Frequently Asked Questions:

+ +Q: HTTrack does not capture all files I want to capture!
+A: This is a frequent question, generally related to the filters. +BUT first check if your problem is not related to the
robots.txt website rules. +
+
+Okay, let me explain how to precisely control the capture process.
+
+Let's take an example:
+
+Imagine you want to capture the following site:
+www.someweb.com/gallery/flowers/
+
+HTTrack, by default, will capture all links encountered in www.someweb.com/gallery/flowers/ or in lower directories, like www.someweb.com/gallery/flowers/roses/.
+It will not follow links to other websites, because doing so could end up capturing the entire Web!
+Nor will it follow links located in higher directories (for example, www.someweb.com/gallery/ itself), because this might capture too much data.
+
+This is the default behaviour of HTTrack, but you can of course tell HTTrack to capture other directories or websites as well.
+In our example, we might also want to capture all links in www.someweb.com/gallery/trees/, and in www.someweb.com/photos/
+
+This can easily be done by using filters: go to the Option panel, select the 'Scan rules' tab, and enter these lines (you can separate rules with a blank space instead of a carriage return):
++www.someweb.com/gallery/trees/*
++www.someweb.com/photos/*

+
+This means "accept all links beginning with www.someweb.com/gallery/trees/ and www.someweb.com/photos/": the + means "accept" and the final * means "any characters will match after the previous ones". Think of the *.doc or *.zip patterns you use to select all files of a certain type on your computer: it is almost the same here, except for the leading "+".
+
+Now, we might want to exclude all links in www.someweb.com/gallery/trees/hugetrees/, because with the previous filter, +we accepted too many files. Here again, you can add a filter rule to refuse these links. Modify the previous filters to:
++www.someweb.com/gallery/trees/*
++www.someweb.com/photos/*
+-www.someweb.com/gallery/trees/hugetrees/*

+
+You will have noticed the - at the beginning of the third rule: it means "refuse links matching the rule", and the rule is "any file beginning with www.someweb.com/gallery/trees/hugetrees/".
+ +Voila! With these three rules, you have precisely defined what you wanted to capture.
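One way to picture how these three rules combine is the short Python sketch below. It is only an illustration, not HTTrack's own code: the function name decide is hypothetical, fnmatch wildcards merely approximate HTTrack's pattern syntax, and the sketch assumes that when several rules match a link, the rule declared last decides.

```python
from fnmatch import fnmatchcase

def decide(url, rules, default=False):
    """Return True if url is accepted; later matching rules override earlier ones."""
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict

RULES = [
    "+www.someweb.com/gallery/trees/*",
    "+www.someweb.com/photos/*",
    "-www.someweb.com/gallery/trees/hugetrees/*",
]

# Accepted by the first rule.
print(decide("www.someweb.com/gallery/trees/oak.jpg", RULES))
# Accepted by the first rule, then refused by the third one.
print(decide("www.someweb.com/gallery/trees/hugetrees/sequoia.jpg", RULES))
# Matches no rule, so the (assumed) default applies.
print(decide("www.otherweb.com/index.html", RULES))
```

The default argument stands in for HTTrack's built-in scope (the start directory and below), which this sketch does not model.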
+
+A more complex example?
+
+Imagine that you want to accept all jpg files (files with the .jpg type) that have "blue" in the name and are located on www.someweb.com:
++www.someweb.com/*blue*.jpg
+
+More detailed information can be found here!
+
+
+ +
+General questions:
+

+ +Q: Is there any 'spyware' or 'adware' in this program? Can you prove that there isn't any?
+A: No ads (banners), and absolutely no 'spy' features inside the program.
+The best proof is the software status: all sources are released, and everybody can check them. Open source is the best protection against privacy problems - HTTrack is an open source project, free of charge and free of any spy 'features'.
+ +

Q: This software is 'free', but I bought it from an authorized reseller. What's going on?
+A: HTTrack is free (free as in 'freedom') as it is covered by the GNU General Public License (GPL). You can freely download it without paying any fees, copy it for your friends, and modify it, provided you respect the license. There are NO official/authorized resellers, because HTTrack is NOT a commercial product. You may be charged duplication fees or fees for other services (for example: software CD-ROMs, shareware collections, or maintenance), but you should have been informed that the software is free software under the GPL, and you MUST have received a copy of the GNU General Public License. Otherwise, this is dishonest and unfair.

Q: Are there any risks of viruses with this software?
+A: For the software itself: +All official releases (at httrack.com) are checked against all known viruses, and the packaging process is also checked. Archives are stored on Un*x servers, not really concerned by viruses.
+For files you are downloading on the WWW using HTTrack: You may encounter websites which were corrupted by viruses, and downloading data from these websites might be dangerous (as dangerous as with a regular browser). Always ensure that the websites you are crawling are safe. (Note: remember that using antivirus software is a good idea once you are connected to the Internet)
+ +

Q: The install is not working on NT without administrator rights!
+A: That's right. You can, however, install WinHTTrack on your own machine, and then copy your WinHTTrack folder from your Program Files folder to another machine, in a temporary directory (e.g. C:\temp\) + +

Q: Where can I find French/other languages documentation?
+A: The Windows interface is available in several languages, but the documentation is not yet!

Q: Does HTTrack work on NT/2000?
+A: Yes, it does.

Q: What's the difference between HTTrack and WinHTTrack?
+A: WinHTTrack is the Windows release of HTTrack (with a graphic shell) + +

Q: Is HTTrack Mac compatible?
+A: No, because of a lack of time. But sources are available + +

Q: Can HTTrack be compiled on all Un*x?
+A: It should. The Makefile may be modified in some cases, however + +

Q: I use HTTrack for professional purposes. What about restrictions/license fees?
+A: HTTrack is covered by the GNU General Public License (GPL). There are no restrictions on using HTTrack for professional purposes, unless you develop software which uses HTTrack components (parts of the source, or any other component). See the license.txt file for more information.

Q: Are there any license royalties for distributing a mirror made with HTTrack?
+A: No. + +

Q: Is a DLL/library version available?
+A: Yes. The default distribution includes a DLL (Windows) or a .so (Un*X), used by the program + +

Q: Is there an X11/KDE shell available for Linux and Un*x?
+A: Yes. See the download/contribution section at www.httrack.com! + +

+Troubleshooting:
+

+Q: Some sites are captured very well, others aren't. Why?
+A: There are several reasons (and solutions) for a mirror to fail. Reading the log files (and this FAQ!) is generally a VERY good idea to figure out what occurred.
+ +There are cases, however, that can not be (yet) handled: + +
    +
  • Flash sites - no full support
  • +
  • Intensive Java/Javascript sites - might be bogus/incomplete
  • +
  • Complex CGI with built-in redirect, and other tricks - very complicated to handle, and therefore might cause problems
  • +
  • Parsing problems in the HTML code: cases where the engine is fooled, for example by a false comment (<!--) for which no closing comment (-->) is detected. These cases are rare, but might occur. A bug report is then generally a good idea!
  • +
+ +Note: +For some sites, setting "Force old HTTP/1.0 requests" option can be useful, as this option uses more basic requests (no HEAD request for example). +This will cause a performance loss, but will increase the compatibility with some cgi-based sites. +
+ +
+ +Q: Only the first page is caught. What's wrong?
+A: First, check the hts-log.txt file (and/or hts-err.txt error log file) - this can give you precious information.
+The problem can be a website that redirects you to another site (for example, www.someweb.com to public.someweb.com) : +in this case, use filters to accept this site
+This can be, also, a problem in the HTTrack options (link depth too low, for example)
+ +

Q: With WinHTTrack, sometimes the minimize in system tray causes a crash!
+A: This bug sometimes appears in the shell on some systems. If you encounter this problem, avoid minimizing the window! + +

Q: Do https URLs work?
+A: Yes, HTTrack has supported https (secure socket layer protocol) sites since the 3.20 release.

Q: Do ipv6 URLs work?
+A: Yes, HTTrack has supported ipv6 sites since the 3.20 release, using A/AAAA entries, or direct v6 addresses (like http://[3ffe:b80:12:34:56::78]/).
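The square brackets around a direct v6 address are what keep its colons from being read as a port separator. A minimal sketch with Python's standard library (not with HTTrack itself) shows how such a URL is split:

```python
import ipaddress
from urllib.parse import urlsplit

# The bracketed form is the standard URL syntax for a literal IPv6 host.
parts = urlsplit("http://[3ffe:b80:12:34:56::78]/")
host = parts.hostname                 # brackets are stripped here
addr = ipaddress.ip_address(host)     # raises ValueError if not a valid address

print(host)
print(addr.version)
```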

Q: Files are created with strange names, like '-1.html'!
+A: Check the build options (you may have selected user-defined structure with wrong parameters!) + +

Q: When capturing real audio/video links (.ram), I only get a shortcut!
+A: Yes, but the associated .ra/.rm file should be captured along with it, unless the rtsp:// protocol is used (not supported by HTTrack yet) or additional filters are needed to accept it.

Q: Using user:password@address is not working!
+A: Again, first check the hts-log.txt and hts-err.txt error log files - this can give you precious information
+The site may have a different authentication scheme - form based authentication, for example. +In this case, use the URL capture features of HTTrack, it might work. +
Note: If your username and/or password contains a '@' character, you may have to replace all '@' occurrences with '%40' so that it can work, such as in user%40domain.com:foobar@www.foo.com/auth/. You may have to do the same for all "special" characters like spaces (%20), quotes (%22), and so on.
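If you would rather not escape these characters by hand, the replacement can be sketched with Python's standard urllib.parse.quote. The helper name auth_url is hypothetical; only the escaping itself is the point here.

```python
from urllib.parse import quote

def auth_url(user, password, address):
    """Build user:password@address with special characters percent-encoded."""
    # safe="" makes quote() escape every reserved character, including '@' and ':'.
    return f"{quote(user, safe='')}:{quote(password, safe='')}@{address}"

print(auth_url("user@domain.com", "foobar", "www.foo.com/auth/"))
```

The result can then be pasted as the URL to mirror, in the same form as the example in the note above.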
+

+Q: When I use HTTrack, nothing is mirrored (no files)! What's happening?
+A: First, be sure that the URL typed is correct. Then, check if you need to use a +proxy server (see proxy options in WinHTTrack or the -P proxy:port option in the +command line program). The site you want to mirror may only accept certain browsers. You +can change your "browser identity" with the Browser ID option in the OPTION box. +Finally, you can have a look at the hts-log.txt (and hts-err.txt) file to see what +happened.
+
+ +
Q: There are missing files! What's happening?
+A: You may want to capture files that exist in a different folder, or in another web site. +You may also want to capture files that are forbidden by default by the
robots.txt website rules. +In these cases, HTTrack does not capture these links automatically, you have to tell it to do so. +

+
  • Either use the filters.
    +Example: You are downloading http://www.someweb.com/foo/ and can not get .jpg images located +in http://www.someweb.com/bar/ (for example, http://www.someweb.com/bar/blue.jpg)
    +Then, add the filter rule +www.someweb.com/bar/*.jpg to accept all .jpg files from this location
    +You can, also, accept all files from the /bar folder with +www.someweb.com/bar/*, or only html files with +www.someweb.com/bar/*.html and so on..

    +
  • +If the problems are related to robots.txt rules, that do not let you access some folders (check in the logs if you are not sure), +you may want to disable the default robots.txt rules in the options. (but only disable this option with great care, +some restricted parts of the website might be huge or not downloadable) +
+
+
+ +Q: There are corrupted images/files! How to fix them?
+A: First check the log files to make sure that the images really exist remotely and are not fake html error pages renamed to .jpg ("Not found" errors, for example). Rescan the website with "Continue an interrupted download" to catch images that might be broken due to various errors (a transfer timeout, for example). Then, check whether the broken image/file name is present in the log (hts-log.txt); if it is, you will find there the reason why the file was not properly caught.
If this doesn't work, delete the corrupted files (Note: to detect corrupted images, you can browse the directories with a tool like ACDSee) and rescan the website as described before. HTTrack will have to fetch the deleted files again, and this time it should work, if they really exist remotely!
+
+
+ +
Q: FTP links are not caught! What's happening?
+A: FTP files might be seen as external links, especially if they are located in an outside domain. You have to accept either all external links (see the links options, -n option) or only specific files (see
filters section).
+Example: You are downloading http://www.someweb.com/foo/ and can not get ftp://ftp.someweb.com files
+Then, add the filter rule +ftp.someweb.com/* to accept all files from this (ftp) location
+
+
+ +Q: I got some weird messages telling that robots.txt do not allow several files to be captured. What's going on?
+A: These rules, stored in a file called robots.txt, are given by the website to specify which links or folders should not be caught by robots and spiders, for example /cgi-bin or large image files. HTTrack follows them by default, as is advised. Therefore, you may miss some files that would have been downloaded without these rules. Check your logs to see whether this is the case:
+Info: Note: due to www.foobar.com remote robots.txt rules, links begining with these path will be forbidden: /cgi-bin/,/images/ (see in the options to disable this) + +
+If you want to disable them, just change the corresponding option in the option list! (but only disable this option with great care, +some restricted parts of the website might be huge or not downloadable) +
+
+
+ +
Q: I have duplicate files! What's going on?
+A: This is generally the case for top indexes (index.html and index-2.html), isn't it? +
+This is a common issue, but that can not be easily avoided!
+For example, http://www.foobar.com/ and http://www.foobar.com/index.html might be the same page. But if links in the website refer both to http://www.foobar.com/ and to http://www.foobar.com/index.html, both pages will be caught. And because http://www.foobar.com/ must have a name, as you may want to browse the website locally (the / would give a directory listing, NOT the index itself!), HTTrack must find one. Therefore, two index files will be produced, one of them with a -2 suffix to show that the file had to be renamed.
+Wouldn't it be a good idea to consider http://www.foobar.com/ and http://www.foobar.com/index.html to be the same link, to avoid duplicate files? NO, because the top index (/) can refer to ANY filename: index.html is generally the default name, but index.htm can be chosen, or index.php3, mydog.jpg, or anything else you may imagine (some webmasters are really crazy).
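The renaming can be pictured with a small Python sketch. This is a simplified illustration and not the engine's actual naming code: the function local_name is hypothetical, and which of the two pages receives the -2 suffix depends on the order in which they are met.

```python
def local_name(url_path, taken):
    """Pick a local file name for url_path, avoiding names already in `taken`."""
    # A path ending in '/' has no file name, so a default one must be chosen.
    name = url_path.rsplit("/", 1)[-1] or "index.html"
    stem, dot, ext = name.rpartition(".")
    if not dot:                       # no extension at all
        stem, ext = ext, ""
    candidate, n = name, 1
    while candidate in taken:         # name clash: try -2, -3, ...
        n += 1
        candidate = f"{stem}-{n}{dot}{ext}"
    taken.add(candidate)
    return candidate

taken = set()
first = local_name("/index.html", taken)
second = local_name("/", taken)
print(first, second)
```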
+
+Note: In some rare cases, duplicate data files can be found when the website redirects to another file. This issue should be rare, and might be avoided using filters.
+
+
+ +
Q: I'm downloading too many files! What can I do?
+A: This is often the case when you use too large a filter, for example +*.html, which asks the +engine to catch all .html pages (even ones on other sites!). In this case, try to use more specific filters, like +www.someweb.com/specificfolder/*.html
+If you still have too many files, use filters to exclude some files. For example, if you have too many files from www.someweb.com/big/, use -www.someweb.com/big/* to avoid all files from this folder. Remember that the default behaviour of the engine, when mirroring http://www.someweb.com/big/index.html, is to catch everything in http://www.someweb.com/big/. Filters are your friends, use them!
+
+
+ +
Q: The engine turns crazy, getting thousands of files! What's going on?
+A: This can happen if a loop occurs in some bogus website. For example, a page that refers to itself, with a timestamp +in the query string (e.g. http://www.someweb.com/foo.asp?ts=2000/10/10,09:45:17:147). +These are really annoying, as it is VERY difficult to detect the loop (the timestamp might be a page number). +To limit the problem: set a recurse level (for example to 6), or avoid the bogus pages (use the filters) + +
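The effect of a recurse level on such a loop can be sketched in a few lines of Python. Everything here is hypothetical: links_of simulates a buggy site whose page always links to a "new" copy of itself, and the way depth is counted is an assumption that may differ from the engine's.

```python
from collections import deque

def links_of(url):
    # Hypothetical buggy page: each visit links to itself with a new counter.
    n = int(url.rsplit("=", 1)[1])
    return [f"http://www.someweb.com/foo.asp?ts={n + 1}"]

def crawl(start, max_depth):
    """Breadth-first crawl that stops following links at max_depth."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue                  # do not follow links beyond the limit
        for link in links_of(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

pages = crawl("http://www.someweb.com/foo.asp?ts=0", max_depth=6)
print(len(pages))
```

Without the depth check, this crawl would never terminate, because every generated URL is different from all the ones seen before.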
+
+ +
Q: Files are sometimes renamed (the type is changed)! Why?
+A: By default, HTTrack tries to know the type of remote files. This is useful when links like +http://www.someweb.com/foo.cgi?id=1 can be either HTML pages, images or anything else. +Locally, foo.cgi will not be recognized as an html page, or as an image, by your browser. HTTrack has to rename the file +as foo.html or foo.gif so that it can be viewed.
+
+
+ +
Q: Files are sometimes *incorrectly* renamed! Why?
+A: Sometimes, some data files are reported by the remote server as html files or images: in this case HTTrack is fooled and renames the file. This can generally be avoided by using the "use HTTP/1.0 requests" option. You might also avoid this by disabling the type checking in the option panel.
+
+ +
Q: How do I rename all ".dat" files into ".zip" files?
+A: Simply use the --assume dat=application/x-zip option + +
+
+ +
Q: I can not access several pages (access forbidden, or redirect to another location), but I can with my browser, what's going on?
+A: You may need cookies! Cookies are specific data (for example, your username or password) that are sent to your browser once +you have logged in certain sites so that you only have to log-in once. For example, after having entered your username in a website, you can +view pages and articles, and the next time you will go to this site, you will not have to re-enter your username/password.
+To "merge" your personal cookies into an HTTrack project, just copy the cookies.txt file from your Netscape folder (or the cookies located in the Temporary Internet Files folder for IE) into your project folder (or even the HTTrack folder).
+
+
+ +
Q: Some pages can't be seen, or are displayed with errors!
+A: Some pages may include javascript or java files that are not recognized. For +example, generated filenames. There may be transfer problems, too (broken pipe, etc.). But +most mirrors do work. We still are working to improve the mirror quality of HTTrack.
+
+
+ +
Q: Some Java applets do not work properly!
+A: Java applets may not work in some cases, for example if HTTrack failed to detect all included classes +or files called within the class file. Sometimes, Java applets need to be online, because remote files are +directly caught. Finally, the site structure can be incompatible with the class (always try to keep the original site structure +when you want to get Java classes)
+If there is no way to make some classes work properly, you can exclude them with the filters. +They will be available, but only online. +
+
+
+ +
Q: HTTrack is taking too much time for parsing, it is very slow. What's wrong?
+A: Former (before 3.04) releases of HTTrack had problems with parsing. It was really slow, and performance, especially with huge HTML files, was not really good. The engine is now optimized, and should parse all html files very quickly. For example, a 10MB HTML file should be scanned in less than 3 or 4 seconds.
+
+Therefore, higher values mean that the engine had to wait a bit for testing several links. + +
    +
  • Sometimes, links are malformed in pages. +"a href="/foo"" instead of "a href="/foo/"", for example, is a common mistake. It will force the engine to +make a supplemental request, and find the real /foo/ location. +
  • +

    +
  • Dynamic pages. Links with names terminated by .php3, .asp or other type which are different from the regular +.html or .htm will require a supplemental request, too. HTTrack has to "know" the type (called "MIME type") of a file +before forming the destination filename. Files like foo.gif are "known" to be images, ".html" are obviously HTML pages - but ".php3" +pages may be either dynamically generated html pages, images, data files...
    +
    +If you KNOW that ALL ".php3" and ".asp" pages are in fact HTML pages on a mirror, use the assume option:
    +--assume php3=text/html,asp=text/html +

    +This option can be used to change the type of a file, too : the MIME type "application/x-MYTYPE" will always have the "MYTYPE" type. +Therefore,
    +--assume dat=application/x-zip +
    +will force the engine to rename all dat files into zip files +
  • +
+ + +
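The effect of the --assume mapping described above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the engine's code: the function names and the tiny KNOWN_TYPES table are hypothetical, and only the "application/x-MYTYPE gives MYTYPE" rule is taken from the text.

```python
# Illustrative table only: a real engine knows many more MIME types.
KNOWN_TYPES = {"text/html": "html", "image/gif": "gif"}

def parse_assume(option):
    """Turn 'php3=text/html,asp=text/html' into {'php3': 'text/html', ...}."""
    return dict(item.split("=", 1) for item in option.split(","))

def local_extension(ext, assume):
    """Return the extension the local file would get for a remote extension."""
    mime = assume.get(ext)
    if mime is None:
        return ext                    # no rule for this extension: keep it
    if mime in KNOWN_TYPES:
        return KNOWN_TYPES[mime]
    prefix = "application/x-"
    if mime.startswith(prefix):
        return mime[len(prefix):]     # application/x-MYTYPE -> MYTYPE
    return ext

rules = parse_assume("php3=text/html,asp=text/html,dat=application/x-zip")
for ext in ("php3", "asp", "dat", "jpg"):
    print(ext, "->", local_extension(ext, rules))
```

Because the type is known in advance, no supplemental request would be needed for these extensions, which is where the speed-up comes from.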

+
+ +
Q: HTTrack stays idle for a long time without transferring. What's happening?
+A: Maybe you are trying to reach some very slow sites. Try a lower TimeOut value (see options, or the -Txx option in the command line program). Note that you will abandon the entire site (unless the option is unchecked) if a timeout happens. With the Shell version, you can also skip some slow files.
+
+ +
Q: I want to update a site, but it's taking too much time! What's happening?
+A: First, HTTrack always tries to minimize the download flow by asking the server about file changes. But, because HTTrack has to rescan all files from the beginning to rebuild the local site structure, it can take some time. Besides, some servers are not very smart and always report that their files are newer, forcing HTTrack to reload them even if no changes have been made!
+
+ +
Q: I wanted to update a site, but after the update the site disappeared!! What's going on?
+A: You may have done something wrong, but not always + +
    +
  • The site has moved : the current location only shows a notification. Therefore, all other files have been deleted to show the current state of the website!
  • +
  • The connection failed: the engine could not catch the first files, and therefore deleted everything. +To avoid that, using the option "do not purge old files" might be a good idea
  • +
  • You tried to add a site to the project BUT in fact deleted the former addresses.
    +Example: A project contains 'www.foo.com www.bar.com' and you want to add 'www.doe.com'. +Ensure that 'www.foo.com www.bar.com www.doe.com' is the new URL list, and NOT 'www.doe.com'! +
  • +
+ +

+ +
Q: I am behind a firewall. What can I do?
+A: You need to use a proxy, too. Ask your administrator to know the proxy server's +name/port. Then, use the proxy field in HTTrack or use the -P proxy:port option +in the command line program.
+

+ +

Q: HTTrack has crashed during a mirror, what's happening?
+A: We are trying to avoid bugs and problems so that the program can be as reliable as possible. But we can not be infallible. If you encounter a bug, please check that you have the latest release of HTTrack, and send us an email with a detailed description of your problem (OS type, addresses concerned, crash description, and everything you deem to be necessary). This may help the other users too.
+
+
+ +
Q: I want to update a mirrored project, but HTTrack is retransferring all pages. What's going on?
+A: First, HTTrack always rescans all local pages to reconstitute the website structure, and it can take some time. Then, it asks the server whether the files stored locally are up to date. On most sites, pages are not updated frequently, and the update process is fast. But some sites have dynamically generated pages that are considered "newer" than the local ones, even if they are identical! Unfortunately, there is no way to avoid this problem, which is strongly tied to the server's abilities.
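In plain HTTP terms, "asking the server" is usually done with a conditional request. The sketch below builds one with Python's standard library without sending it. That HTTrack uses exactly this header is an assumption here; the helper name conditional_request is hypothetical.

```python
from email.utils import formatdate
from urllib.request import Request

def conditional_request(url, local_mtime):
    """Build (but do not send) a GET asking for the file only if it changed."""
    stamp = formatdate(local_mtime, usegmt=True)   # HTTP date format, in GMT
    return Request(url, headers={"If-Modified-Since": stamp})

req = conditional_request("http://www.someweb.com/index.html", 0)
# urllib stores header names capitalized as "If-modified-since".
print(req.get_header("If-modified-since"))
```

A well-behaved server answers such a request with "304 Not Modified" and no body when nothing changed. A dynamically generated page typically ignores the header and sends the full page every time, which is the situation described above.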
+
+ +
Q: I want to continue a mirrored project, but HTTrack is rescanning all pages. What's going on?
+A: HTTrack has to (quickly) rescan all pages from the cache, without retransferring them, to rebuild the internal file structure. However, this process can take some time with huge sites with numerous links.
+
+ +
Q: HTTrack window sometimes "disappears" at the end of a mirrored project. What's going on?
+A: This is a known bug in the interface. It does NOT affect the quality of the mirror, however. We are still hunting it down, +but this is a smart bug.. + +
+
+ +
Questions concerning a mirror:
+ +
+
Q: I want to mirror a Web site, but there are some files outside +the domain, too. How to retrieve them?
+A: If you just want to retrieve files that can be reached through links, just activate the 'get file near links' option. But if you want to retrieve html pages too, you can use either wildcards or explicit addresses; e.g. add www.someweb.com/* to accept all files and pages from www.someweb.com.
+
+
Q: I have forgotten some URLs of files during a long +mirror.. Should I redo all?
+A: No, if you have kept the 'cache' files (in hts-cache), cached files will not be retransferred.
+
+
Q: I just want to retrieve all ZIP files or other files in a web +site/in a page. How do I do it?
+A: You can use different methods. You can use the 'get files near a link' option if files are in a foreign domain. You can also use a filter address: adding +*.zip in the URL list (or in the filter list) will accept all ZIP files, even if these files are outside the address.
+Example : httrack www.someweb.com/someaddress.html +*.zip will allow +you to retrieve all zip files that are linked on the site.

+
+
Q: There are ZIP files in a page, but I don't want to transfer +them. How do I do it?
+A: Just filter them: add -*.zip in the filter list.
+
+
Q: I don't want to download ZIP files bigger than 1MB and MPG files smaller than 100KB. Is it possible?
+A: You can use
filters for that ; using the syntax:
+-*.zip*[>1000] -*.mpg*[<100]
+
+Q: I don't want to load gif files.. but what may happen if I +watch the page?
+A: If you have filtered gif files (-*.gif), links to gif files will be +rebuilt so that your browser can find them on the server.
+
+
Q: I don't want to download thumbnail images.. is it possible?
+A: Filters can not be used with image pixel size ; but you can filter on file size (bytes). +Use advanced
filters for that ; such as:
+-*.gif*[<10] to exclude gif files smaller than 10KiB. +
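The size conditions used in this FAQ (-*.zip*[>1000], -*.mpg*[<100], -*.gif*[<10]) can be read as "pattern, comparison, size in KB". A minimal Python sketch of that reading follows. It is not HTTrack's parser: the regular expression, the function name is_excluded, and the choice of 1 KB = 1024 bytes are assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase

# Splits e.g. '*.zip*[>1000]' into pattern '*.zip', operator '>', limit 1000.
SIZE_RULE = re.compile(r"^(?P<pattern>.*)\*\[(?P<op>[<>])(?P<kb>\d+)\]$")

def is_excluded(name, size_bytes, rule):
    """Evaluate one exclusion rule such as '-*.zip*[>1000]' (limit in KB)."""
    m = SIZE_RULE.match(rule.lstrip("-"))
    if m is None:
        raise ValueError(f"not a size rule: {rule!r}")
    if not fnmatchcase(name, m["pattern"]):
        return False                  # the name part does not match
    size_kb = size_bytes / 1024       # assumption: 1 KB = 1024 bytes
    limit = int(m["kb"])
    return size_kb > limit if m["op"] == ">" else size_kb < limit

print(is_excluded("big.zip", 2 * 1024 * 1024, "-*.zip*[>1000]"))
print(is_excluded("small.zip", 500 * 1024, "-*.zip*[>1000]"))
print(is_excluded("thumb.gif", 4 * 1024, "-*.gif*[<10]"))
```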

+
+Q: I get all types of files on a web site, but I didn't select +them on filters!
+A: By default, HTTrack retrieves all types of files on authorized links. To avoid +that, define filters like
-* +<website>/*.html ++<website>/*.htm +<website>/ +*.<type wanted>
+Example: httrack www.someweb.com/index.html -* +www.someweb.com/*.htm* +www.someweb.com/*.gif +www.someweb.com/*.jpg
+
+
Q: When I use filters, I get too many files!
+A: You might use too large a filter, for example *.html will get ALL html +files identified. If you want to get all files on an address, use www.<address>/*.html.
+If you want to get ONLY files defined by your filters, use something like -* +www.foo.com/*, because ++www.foo.com/* will only accept selected links without forbidding other ones!
+There are lots of possibilities using filters.
+Example: httrack www.someweb.com +*.someweb.com/*.htm*
+
+
Q: When I use filters, I can't access another domain, but I +have filtered it!
+A: You may have made a mistake when declaring filters. For example, +www.someweb.com/* -*someweb* will not work, because -*someweb* has a higher priority (because it has been declared after +www.someweb.com).
+
+
Q: Must I add a '+' or '-' in the filter list when I want to use filters?
+A: YES. '+' is for accepting links and '-' for refusing them. If you forget it, HTTrack will assume that you want to accept a filter if there is a wild card in the syntax, e.g. +<filter> is identical to <filter> if <filter> contains a wild card (*); otherwise it will be considered a normal link to mirror.

+
+Q: I want to find file(s) in a web-site. How do I do it?
+A: You can use the filters: forbid all files (add a -* in the filter list) and accept only html files and the file(s) you want to retrieve. (BUT do not forget to add +<website>*.html in the filter list, or pages will not be scanned! Add the name of the files you want with a */ before it; i.e. if you want to retrieve file.zip, add */file.zip.)
+Example: httrack www.someweb.com -* +www.someweb.com/*.htm* +*/thefileiwant.zip
+
+
+ +
Q: I want to download ftp files/ftp site. How do I do it?
+A: First, HTTrack is not the best tool to download many ftp files. Its ftp engine is basic (even though reget, i.e. resuming a transfer, is possible), and if your purpose is to download a complete site, use a dedicated client.
+You can download ftp files just by typing the URL, such as ftp://ftp.somesite.com/pub/files/file010.zip, and list ftp directories like ftp://ftp.somesite.com/pub/files/.
+Note: For the filters, use something like +ftp.somesite.com/* +
+ +
Q: How can I retrieve .asp or .cgi sources instead of .html result?
+A: You can't! For security reasons, web servers do not allow that. + +

Q: How can I remove these annoying <!-- Mirrored from... --> from html files?
+A: Use the footer option (-%F, or see the WinHTTrack options) + +

Q: Do I have to select between ascii/binary transfer mode?
+A: No, http files are always transferred as binary files. Ftp files, too (even if ascii mode could be selected).

Q: Can HTTrack perform form-based authentication?
+A: Yes. See the URL capture abilities (--catchurl for command-line release, or in the WinHTTrack interface) + +

Q: Can I redirect downloads to tar/zip archive?
+A: Yes. See the shell system command option (-V option for command-line release) + +

Q: Can I use username/password authentication on a site?
+A: Yes. Use user:password@your_url (example: http://foo:bar@www.someweb.com/private/mybox.html) + +

Q: Can I use username/password authentication for a proxy?
+A: Yes. Use user:password@your_proxy_name as your proxy name (example: smith:foo@proxy.mycorp.com) + +

Q: Can HTTrack generate HP-UX or ISO9660 compatible files?
+A: Yes. See the build options (-N, or see the WinHTTrack options) + +

Q: Is there any SOCKS support?
+A: Not yet! + +

Q: What's this hts-cache directory? Can I remove it?
+A: NO if you want to update the site, because this directory is used by HTTrack for this purpose. +If you remove it, options and URLs will not be available for updating the site + +

Q: What is the meaning of the Links scanned: 12/34 (+5) line in WinHTTrack/WebHTTrack?
+A: 12 is the number of links scanned and stored, 34 the total number of links detected to be parsed, and 5 the number of files downloaded in background. +In this example, 17 links were downloaded out of a (temporary) total of 34 links. + +
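The arithmetic can be checked with a short Python sketch that parses the status line. The function name parse_progress and the returned field names are hypothetical; the meaning of the three numbers is the one given in the answer above.

```python
import re

def parse_progress(line):
    """Split 'Links scanned: 12/34 (+5)' into its three counters."""
    m = re.search(r"(\d+)/(\d+) \(\+(\d+)\)", line)
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    scanned, total, background = map(int, m.groups())
    return {
        "scanned": scanned,            # links scanned and stored
        "total": total,                # links detected so far (temporary total)
        "background": background,      # files downloaded in background
        "downloaded": scanned + background,
    }

info = parse_progress("Links scanned: 12/34 (+5)")
print(info["downloaded"], "of", info["total"])
```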

Q: Can I start a mirror from my bookmarks?
+A: Yes. Drag&Drop your bookmark.html file to the WinHTTrack window (or use file://filename for command-line release) and select +bookmark mirroring (mirror all links in pages, -Y) or bookmark testing (--testlinks) + +

Q: Can I convert a local website (file:// links) to a standard website?
+A: Yes. Just start from the top index (example: file://C:\foopages\index.html) and mirror the local website. +HTTrack will convert all file:// links to relative ones. + + +

Q: Can I copy a project to another folder - Will the mirror work?
+A: Yes. There are no absolute links; all links are relative. You can copy a project to another drive/computer/OS, and browse it without installing anything.

Q: Can I copy a project to another computer/system? Can I then update it ?
+A: Absolutely! You can keep your HTTrack favorite folder (C:\My Web Sites) in your local hard disk, copy it +for a friend, and possibly update it, and then bring it back!
You can copy individual folders (projects), too: exchange +your favorite websites with your friends, or send an old version of a site to someone who has a faster connection, and +ask him to update it!
+ + +
+Note: Export (Windows <-> Linux)
+The file and cache structure is compatible between Linux/Windows, but you may have to do some changes, like the path
+ + + + + +
+ Windows -> Linux/Unix +
+ Copy (in binary mode) the entire folder and then to update it, enter into it and do a
+ + httrack --update -O ./ + +

+ + Note: You can then safely replace the existing folder (under Windows) with this one, because + the Linux/Unix version did not change any options
+ Note: If you often switch between Windows/Linux with the same project, it might be a good idea to edit the hts-cache/doit.log file + and delete old "-O" entries, because each time you do a httrack --update -O ./ an entry is added, + causing the command line to be long +
+
+ Linux/Unix -> Windows +
+ Copy (in binary mode) the entire folder into your favorite Web mirror folder. Then, select this project, AND retype ALL URLs AND redefine all options as if you were creating a new project. This is necessary because the profile (winprofile.ini) is not created by the Linux/Unix version. But do not be afraid, WinHTTrack will use cached files to update the project!
+
+ +
+ +

Q: How can I grab email addresses in web pages?
+A: You can not. HTTrack has not been designed to be an email grabber, like many other (bad) products.
+
+
+Other problems:
+
+Q: My problem is not listed!
+A: Feel free to
contact us! +
+ +


+ + +

+
+
+ + + + + +
+ + + + + + -- cgit v1.2.3