Age | Commit message | Author
2013-06-02 | Cleanup in debian/rules | Xavier Roche
2013-06-02 | Cleanup in configure scripts | Xavier Roche
2013-06-02 | Added crawl-test.sh to extra dists | Xavier Roche
2013-06-02 | Added minimalistic unit tests for HTTrack (about time :p) | Xavier Roche
2013-06-01 | Cleaned up a bunch of warnings. | Xavier Roche
2013-06-01 | 3.47-16 | Xavier Roche
2013-06-01 | Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) related to the way non-ascii characters are being decoded | Xavier Roche
    Rationale:
    * inside URI
      * non-ascii characters are read with the page encoding, and transformed into UTF-8
      * url-escaped %xx are considered UTF-8 sequences to be decoded, unless they form invalid sequences (in such case we leave them as-is)
      * html entities (names, or decimal/hex) are decoded as UTF-8 characters
    * inside query string
      * non-ascii characters are read as binary, and escaped using %xx
      * url-escaped %xx are left as-is unless they are harmless (alphanumerics, for example)
      * html entities (names, or decimal/hex) are decoded as UTF-8 characters and encoded back to the page encoding (possibly using %xx)
    * inside hostnames
      * non-ascii characters are encoded using IDNA
    Example:
    * are equivalent in an iso-8859-1 page:
      http://foo/café.html
      http://foo/caf%c3%a9.html
      http://caf&#a9;.html
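The URI-path rules in this rationale can be illustrated with a short Python sketch. It is not HTTrack's C implementation: the names `unescape_path_utf8` and `normalize_path` are hypothetical, the order of the steps (charset decoding, then entities, then %xx) is an assumption, and unlike a real crawler it also decodes reserved characters such as `%2F`. A run of consecutive %xx escapes is decoded only when the whole run is valid UTF-8; otherwise the run is left as-is.

```python
import html
import re

# One or more consecutive %xx escapes, e.g. "%c3%a9".
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def unescape_path_utf8(path):
    """Decode runs of %xx escapes as UTF-8; keep invalid runs untouched."""
    def decode_run(match):
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not a valid UTF-8 sequence: leave the escapes as they were.
            return run
    return _ESCAPED_RUN.sub(decode_run, path)


def normalize_path(raw, page_charset):
    """Hypothetical helper: bytes of a link path as found in a page -> text."""
    text = raw.decode(page_charset)   # non-ascii read with the page encoding
    text = html.unescape(text)        # named or decimal/hex HTML entities
    return unescape_path_utf8(text)   # %xx treated as UTF-8 when valid
```

Under these assumptions, `b"/caf\xe9.html"`, `b"/caf%c3%a9.html"` and `b"/caf&#233;.html"` read from an iso-8859-1 page all normalize to the same path, while `/caf%e9.html` is left unchanged because `%e9` alone is not valid UTF-8.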
2013-06-01 | Do not magically detect UTF-8 pages as "utf-8" charset, because it changes the way links are decoded. | Xavier Roche
2013-06-01 | Cosmetic | Xavier Roche
2013-06-01 | Added hts_readUTF8() | Xavier Roche
2013-06-01 | Added hts_unescapeUrl() | Xavier Roche
2013-05-31 | Fixed warnings. | Xavier Roche
2013-05-31 | Updated vcproj | Xavier Roche
2013-05-31 | 3.47.15 | Xavier Roche
2013-05-31 | Fixed charset for top index titles. | Xavier Roche
2013-05-31 | Fixed hts_unescapeEntities() | Xavier Roche
2013-05-31 | Removed buggy code. | Xavier Roche
2013-05-31 | Fixed typo. | Xavier Roche
2013-05-31 | Misplaced continue | Xavier Roche
2013-05-31 | Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) | Xavier Roche
    Rationale:
    * hostname is ASCII, non-ascii characters shall be encoded with IDNA
    * URI filenames may embed non-ascii characters, which MUST be UTF-8 encoded
    * query string may embed non-ascii characters, which are encoded with the page charset into %xx codes
2013-05-30 | Added missing htsentities.h htsentities.sh entries. | Xavier Roche
2013-05-30 | Added hts_unescape_entities(), a rewrite of the HTML entities decoder. | Xavier Roche
    Fixed HTML entities decoding which was done before charset decoding.
2013-05-30 | Added hts_writeUTF8() | Xavier Roche
2013-05-30 | Flush 3.47-14 | Xavier Roche
2013-05-26 | debian-only uploads | Xavier Roche
2013-05-26 | Build-depends on libssl-dev | Xavier Roche
2013-05-26 | Updated changelog | Xavier Roche
2013-05-26 | 3.47.14 | Xavier Roche
2013-05-26 | webhttrack/htsserver: ensure the local hostname is resolvable, and fall back to localhost if necessary. | Xavier Roche
2013-05-25 | Fixed 260-characters limiter | Xavier Roche
    Fixes issue 9 (http://code.google.com/p/httrack/issues/detail?id=9)
    Rationale: according to MSDN, we must actually limit the directory path to 248 characters ("When using an API to create a directory, the specified path cannot be so long that you cannot append an 8.3 file name (that is, the directory name cannot exceed MAX_PATH minus 12).")
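The arithmetic behind the 248-character figure can be sketched as follows. This is only an illustration in Python, not the code from the commit: `directory_path_fits` is a hypothetical name, and treating 248 as an inclusive limit (without accounting for a terminating NUL) is an assumption based on the quoted wording.

```python
MAX_PATH = 260           # Windows path length limit named in the rationale
SHORT_NAME_LEN = 12      # an "8.3" file name: 8 + 1 (dot) + 3 characters
MAX_DIR_PATH = MAX_PATH - SHORT_NAME_LEN   # 248


def directory_path_fits(dir_path):
    """Return True if an 8.3 file name could still be appended (assumed inclusive)."""
    return len(dir_path) <= MAX_DIR_PATH
```

A directory path of exactly 248 characters passes this check, and one of 249 characters does not.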
2013-05-25 | Added hts_convertUCS4StringToUTF8() and hts_convertUTF8StringToUCS4() | Xavier Roche
2013-05-21 | DOS 8+3 fixes. | Xavier Roche
2013-05-21 | Save IDNA-enabled hostnames as UTF-8 on disk (ie. www.héhé.fr/foo). | Xavier Roche
2013-05-21 | Fixed hts_isCharsetUTF8() | Xavier Roche
    Added hts_copyStringUTF8()
2013-05-21 | Added hts_isStringIDNA() | Xavier Roche
2013-05-20 | Added IDNA testing tool | Xavier Roche
2013-05-20 | Rewritten UTF-8 primitives for hts_convertStringUTF8ToIDNA() | Xavier Roche
    Added hts_convertStringIDNAToUTF8()
2013-05-19 | removed gz_is_available flag (zlib is mandatory) | Xavier Roche
2013-05-19 | Fixed warnings | Xavier Roche
2013-05-19 | Fixed warnings | Xavier Roche
2013-05-19 | openssl is no longer dynamically probed at startup, but dynamically linked | Xavier Roche
2013-05-19 | Missing credits from 3.47.13 | Xavier Roche
2013-05-19 | 3.47.13 | Xavier Roche
2013-05-19 | Missing history record from 3.47.12 | Xavier Roche
2013-05-19 | Fixed htscharset.h | Xavier Roche
2013-05-19 | Fixed htscharset | Xavier Roche
2013-05-19 | Added missing punycode.c in build. | Xavier Roche
2013-05-19 | Missing files introduced in r244 | Xavier Roche
2013-05-19 | Added support for IDNA / RFC 3492 (Punycode) handling within URLs. | Xavier Roche
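For readers unfamiliar with IDNA, the hostname conversion this entry refers to can be tried with Python's built-in `idna` codec, which implements the older IDNA 2003 rules on top of RFC 3492 Punycode. This is only an illustration and says nothing about how HTTrack's own punycode.c works; the helper names below are hypothetical.

```python
def hostname_to_ascii(hostname):
    """Encode each non-ascii label as an 'xn--' Punycode label (IDNA 2003)."""
    return hostname.encode("idna").decode("ascii")


def hostname_to_unicode(hostname):
    """Decode 'xn--' labels back to their Unicode form."""
    return hostname.encode("ascii").decode("idna")


ascii_name = hostname_to_ascii("www.héhé.fr")
print(ascii_name)                        # www.xn--hh-bjab.fr
print(hostname_to_unicode(ascii_name))   # www.héhé.fr
```

Only the hostname goes through this conversion; per the rationale in the later commits, paths and query strings follow their own escaping rules.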
2013-05-18 | Fixed "Bogus charset for requests when filenames have non-ascii characters (RFC 3986)" (http://code.google.com/p/httrack/issues/detail?id=12) | Xavier Roche