Age | Commit message | Author | |
---|---|---|---|
2013-06-02 | Cleanup in debian/rules | Xavier Roche | |
2013-06-02 | Cleanup in configure scripts | Xavier Roche | |
2013-06-02 | Added crawl-test.sh to extra dists | Xavier Roche | |
2013-06-02 | Added minimalistic unit tests for HTTrack (about time :p) | Xavier Roche | |
2013-06-01 | Cleaned up a bunch of warnings. | Xavier Roche | |
2013-06-01 | 3.47-16 | Xavier Roche | |
2013-06-01 | Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) related to the way non-ascii characters are being decoded | Xavier Roche | |
Rationale. Inside URI: non-ascii characters are read with the page encoding and transformed into UTF-8; url-escaped %xx are considered UTF-8 sequences to be decoded, unless they form invalid sequences (in which case we leave them as-is); html entities (names, or decimal/hex) are decoded as UTF-8 characters. Inside query string: non-ascii characters are read as binary and escaped using %xx; url-escaped %xx are left as-is unless they are harmless to decode (alphanumerics, for example); html entities (names, or decimal/hex) are decoded as UTF-8 characters and encoded back to the page encoding (possibly using %xx). Inside hostnames: non-ascii characters are encoded using IDNA. Example, equivalent in an iso-8859-1 page: http://foo/café.html http://foo/caf%c3%a9.html http://caf&#a9;.html | |||
2013-06-01 | Do not magically detect UTF-8 pages as "utf-8" charset, because it changes the way links are decoded. | Xavier Roche | |
2013-06-01 | Cosmetic | Xavier Roche | |
2013-06-01 | Added hts_readUTF8() | Xavier Roche | |
2013-06-01 | Added hts_unescapeUrl() | Xavier Roche | |
2013-05-31 | Fixed warnings. | Xavier Roche | |
2013-05-31 | Updated vcproj | Xavier Roche | |
2013-05-31 | 3.47.15 | Xavier Roche | |
2013-05-31 | Fixed charset for top index titles. | Xavier Roche | |
2013-05-31 | Fixed hts_unescapeEntities() | Xavier Roche | |
2013-05-31 | Removed buggy code. | Xavier Roche | |
2013-05-31 | Fixed typo. | Xavier Roche | |
2013-05-31 | Misplaced continue | Xavier Roche | |
2013-05-31 | Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) | Xavier Roche | |
Rationale: * hostname is ASCII, non-ascii characters shall be encoded with IDNA * URI filenames may embed non-ascii characters, which MUST be UTF-8 encoded * query string may embed non-ascii characters, which are encoded with the page charset into %xx codes | |||
2013-05-30 | Added missing htsentities.h htsentities.sh entries. | Xavier Roche | |
2013-05-30 | Added hts_unescape_entities(), a rewrite of the HTML entities decoder. | Xavier Roche | |
Fixed HTML entities decoding which was done before charset decoding. | |||
2013-05-30 | Added hts_writeUTF8() | Xavier Roche | |
2013-05-30 | Flush 3.47-14 | Xavier Roche | |
2013-05-26 | debian-only uploads | Xavier Roche | |
2013-05-26 | Build-depends on libssl-dev | Xavier Roche | |
2013-05-26 | Updated changelog | Xavier Roche | |
2013-05-26 | 3.47.14 | Xavier Roche | |
2013-05-26 | webhttrack/htsserver: ensure the local hostname is resolvable, and fallback to localhost if necessary. | Xavier Roche | |
2013-05-25 | Fixed 260-characters limiter | Xavier Roche | |
Fixes issue 9 (http://code.google.com/p/httrack/issues/detail?id=9) Rationale: according to MSDN, we must limit the directory path to 248 characters, actually ("When using an API to create a directory, the specified path cannot be so long that you cannot append an 8.3 file name (that is, the directory name cannot exceed MAX_PATH minus 12).") | |||
2013-05-25 | Added hts_convertUCS4StringToUTF8() and hts_convertUTF8StringToUCS4() | Xavier Roche | |
2013-05-21 | DOS 8+3 fixes. | Xavier Roche | |
2013-05-21 | Save IDNA-enabled hostnames as UTF-8 on disk (ie. www.héhé.fr/foo). | Xavier Roche | |
2013-05-21 | Fixed hts_isCharsetUTF8() | Xavier Roche | |
Added hts_copyStringUTF8() | |||
2013-05-21 | Added hts_isStringIDNA() | Xavier Roche | |
2013-05-20 | Added IDNA testing tool | Xavier Roche | |
2013-05-20 | Rewritten UTF-8 primitives for hts_convertStringUTF8ToIDNA() | Xavier Roche | |
Added hts_convertStringIDNAToUTF8() | |||
2013-05-19 | removed gz_is_available flag (zlib is mandatory) | Xavier Roche | |
2013-05-19 | Fixed warnings | Xavier Roche | |
2013-05-19 | Fixed warnings | Xavier Roche | |
2013-05-19 | openssl is no longer dynamically probed at startup, but dynamically linked | Xavier Roche | |
2013-05-19 | Missing credits from 3.47.13 | Xavier Roche | |
2013-05-19 | 3.47.13 | Xavier Roche | |
2013-05-19 | Missing history record from 3.47.12 | Xavier Roche | |
2013-05-19 | Fixed htscharset.h | Xavier Roche | |
2013-05-19 | Fixed htscharset | Xavier Roche | |
2013-05-19 | Added missing punycode.c in build. | Xavier Roche | |
2013-05-19 | Missing files introduced in r244 | Xavier Roche | |
2013-05-19 | Added support for IDNA / RFC 3492 (Punycode) handling within URLs. | Xavier Roche | |
2013-05-18 | Fixed "Bogus charset for requests when filenames have non-ascii characters (RFC 3986)" (http://code.google.com/p/httrack/issues/detail?id=12) | Xavier Roche | |
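
The 2013-06-01 "Fixed issue 14" entry describes two rules that are easy to misread in prose: %xx escapes in a URI path are decoded only when they form valid UTF-8, and non-ascii hostnames are encoded with IDNA. The following is a minimal Python sketch of those two rules only. It is not HTTrack's C implementation: the function names `unescape_path_utf8` and `hostname_to_idna` are hypothetical, the path helper treats each run of consecutive escapes as a unit (an assumption, the log does not say how sequences are delimited), and Python's built-in `idna` codec follows the older IDNA 2003 rules.

```python
import re

# Hypothetical helpers for illustration; HTTrack's own functions
# (hts_unescapeUrl(), hts_convertStringUTF8ToIDNA()) are written in C
# and may behave differently in edge cases.

# One or more consecutive %xx escapes, matched on raw bytes.
_ESCAPED_RUN = re.compile(rb"(?:%[0-9A-Fa-f]{2})+")


def unescape_path_utf8(path: str) -> str:
    """Decode runs of %xx in a URI path when the run is valid UTF-8.

    Simplification: a run is decoded as a whole or kept as a whole, and
    reserved characters such as %2F are not treated specially.
    """
    raw = path.encode("utf-8")

    def decode_run(match):
        run = match.group(0)
        decoded = bytes.fromhex(run.replace(b"%", b"").decode("ascii"))
        try:
            decoded.decode("utf-8")
        except UnicodeDecodeError:
            return run  # invalid UTF-8 sequence: leave the escapes as-is
        return decoded

    return _ESCAPED_RUN.sub(decode_run, raw).decode("utf-8")


def hostname_to_idna(hostname: str) -> str:
    """Encode a hostname to its ASCII (Punycode) form with the stdlib codec."""
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(unescape_path_utf8("/caf%c3%a9.html"))  # valid UTF-8 escapes
    print(unescape_path_utf8("/caf%e9.html"))     # lone 0xE9 is not valid UTF-8
    print(hostname_to_idna("www.héhé.fr"))
```

With these assumptions, `/caf%c3%a9.html` decodes to `/café.html`, while `/caf%e9.html` is returned unchanged because a lone 0xE9 byte is not a valid UTF-8 sequence.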