path: root/src/htsparse.c
Age | Commit message | Author
2013-06-14Escaping fixes when the final disk filename contains a %Xavier Roche
2013-06-14Malformed escaping.Xavier Roche
2013-06-01Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) ↵Xavier Roche
related to the way non-ascii characters are being decoded
Rationale:
* inside URI
  * non-ascii characters are read with the page encoding, and transformed into UTF-8
  * url-escaped %xx are considered utf-8 sequences to be decoded, unless they form invalid sequences (in such case we leave them as-is)
  * html entities (names, or decimal/hex) are decoded as utf-8 characters
* inside query string
  * non-ascii characters are read as binary, and escaped using %xx
  * url-escaped %xx are left as-is unless they are harmless to decode (alphanum, for example)
  * html entities (names, or decimal/hex) are decoded as utf-8 characters and encoded back to the page encoding (possibly using %xx)
* inside hostnames
  * non-ascii characters are encoded using IDNA
Example:
* are equivalent in an iso-8859-1 page:
  http://foo/café.html
  http://foo/caf%c3%a9.html
  http://caf&#a9;.html
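The URI rule in this rationale (url-escaped %xx are decoded as UTF-8 sequences unless they form invalid sequences, in which case they are left as-is) can be illustrated with a minimal C sketch. This is not the code from htsparse.c: the function name sketch_unescape_utf8() and its helpers are hypothetical, and the sketch assumes that ASCII escapes such as %20 are left untouched, which the commit message does not state.

```c
#include <stddef.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode "%xx" at s into *out; return 1 on success, 0 otherwise. */
static int decode_escape(const char *s, unsigned char *out) {
  int hi, lo;
  if (s[0] != '%') return 0;
  if ((hi = hex_value(s[1])) < 0) return 0;
  if ((lo = hex_value(s[2])) < 0) return 0;
  *out = (unsigned char) (hi * 16 + lo);
  return 1;
}

/* Length of a multi-byte UTF-8 sequence from its lead byte; 0 for ASCII or
   for a byte that cannot start a sequence. */
static size_t utf8_length(unsigned char lead) {
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

/* Reject overlong forms, surrogates and code points above U+10FFFF. */
static int second_byte_ok(unsigned char lead, unsigned char second) {
  if (lead == 0xE0) return second >= 0xA0;
  if (lead == 0xED) return second <= 0x9F;
  if (lead == 0xF0) return second >= 0x90;
  if (lead == 0xF4) return second <= 0x8F;
  return 1;
}

/* Hypothetical helper: copy src to dst, decoding runs of %xx escapes that
   form a valid multi-byte UTF-8 sequence and leaving everything else as-is.
   Returns the output length, or (size_t) -1 if dst is too small. */
size_t sketch_unescape_utf8(const char *src, char *dst, size_t dstsize) {
  size_t j = 0;
  if (dstsize == 0) return (size_t) -1;
  while (*src != '\0') {
    unsigned char seq[4];
    size_t len = 0, k;
    int valid = 1;
    if (decode_escape(src, &seq[0]))
      len = utf8_length(seq[0]);
    /* Each continuation byte must itself be escaped and look like 10xxxxxx. */
    for (k = 1; k < len; k++) {
      if (!decode_escape(src + 3 * k, &seq[k]) || (seq[k] & 0xC0) != 0x80) {
        valid = 0;
        break;
      }
    }
    if (len != 0 && valid && second_byte_ok(seq[0], seq[1])) {
      if (j + len >= dstsize) return (size_t) -1;
      memcpy(dst + j, seq, len);   /* emit the decoded UTF-8 bytes */
      j += len;
      src += 3 * len;              /* skip the consumed %xx escapes */
    } else {
      if (j + 1 >= dstsize) return (size_t) -1;
      dst[j++] = *src++;           /* copy the character unchanged */
    }
  }
  dst[j] = '\0';
  return j;
}
```

With this sketch, "caf%c3%a9.html" becomes the UTF-8 bytes of "café.html", while "caf%e9.html" is returned unchanged because a lone 0xE9 byte is not a valid UTF-8 sequence.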
2013-05-31Fixed charset for top index titles.Xavier Roche
2013-05-31Fixed hts_unescapeEntities()Xavier Roche
2013-05-31Removed buggy code.Xavier Roche
2013-05-31Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14)Xavier Roche
Rationale:
* hostname is ASCII, non-ascii characters shall be encoded with IDNA
* URI filenames may embed non-ascii characters, which MUST be UTF-8 encoded
* query string may embed non-ascii characters, which are encoded with the page charset into %xx codes
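The query-string rule in this rationale (non-ascii characters become %xx codes in the page charset) can be sketched in C as below. The function name sketch_escape_non_ascii() is hypothetical and does not come from the httrack sources. The sketch assumes the input bytes are already in the page charset, so it only escapes each byte at or above 0x80, and it assumes lowercase hex digits.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, replacing every non-ASCII byte
   (>= 0x80) with a %xx escape and leaving ASCII bytes unchanged.
   Returns the output length, or (size_t) -1 if dst is too small. */
size_t sketch_escape_non_ascii(const char *src, char *dst, size_t dstsize) {
  static const char hex[] = "0123456789abcdef";
  size_t j = 0;
  if (dstsize == 0) return (size_t) -1;
  for (; *src != '\0'; src++) {
    const unsigned char c = (unsigned char) *src;
    if (c >= 0x80) {
      /* Three output characters, plus room for the final NUL. */
      if (j + 3 >= dstsize) return (size_t) -1;
      dst[j++] = '%';
      dst[j++] = hex[c >> 4];
      dst[j++] = hex[c & 0x0F];
    } else {
      if (j + 1 >= dstsize) return (size_t) -1;
      dst[j++] = (char) c;
    }
  }
  dst[j] = '\0';
  return j;
}
```

For an iso-8859-1 page, where "é" is the single byte 0xE9, the input "q=caf\xe9" yields "q=caf%e9".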
2013-05-30Added hts_unescape_entities(), a rewrite of the HTML entities decoder.Xavier Roche
Fixed HTML entities decoding which was done before charset decoding.
2013-05-19openssl is no longer dynamically probed at startup, but dynamically linkedXavier Roche
2013-05-19Added support for IDNA / RFC 3492 (Punycode) handling within URLs.Xavier Roche
2013-05-18Fixed "Bogus charset for requests when filenames have non-ascii characters ↵Xavier Roche
(RFC 3986)" (http://code.google.com/p/httrack/issues/detail?id=12)
2013-05-18Fixed "Bogus charset on disk when filenames have non-ascii characters" ↵Xavier Roche
(http://code.google.com/p/httrack/issues/detail?id=11)
2013-05-17Fixed "onMouseOver="src='image'" is not recognized" and MANY other ↵Xavier Roche
javascript issues (http://code.google.com/p/httrack/issues/detail?id=4)
2013-05-17Fixed "Image in CSS not parsed" ↵Xavier Roche
(http://code.google.com/p/httrack/issues/detail?id=2)
2013-05-15Indenting cleanup for htsparse.cXavier Roche
setup:
indent -l80 -lc80 -nhnl -nut -bad -bap -bbo -br -brf -bli2 -brs -bls -br -ss -sai -pmt -nsaw -nsaf -nprs -i2 -ce -npsl -npcs -cs -sob -cdw -nbc -lp
logs:
indent: htsparse.c:364: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:366: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:368: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:370: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:387: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:738: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:907: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:925: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:970: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:971: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1261: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:1277: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:1410: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:1459: Warning:old style assignment ambiguity in "=*". Assuming "= *"
indent: htsparse.c:1494: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1504: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1541: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1583: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1597: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:1625: Warning:old style assignment ambiguity in "=-". Assuming "= -"
indent: htsparse.c:2975: Warning:old style assignment ambiguity in "=-". Assuming "= -"
2013-05-15Removed unused hack HTS_CHECK_STRANGEDIRXavier Roche
2013-05-14Merge sources from windows-1252 to utf-8Xavier Roche
2013-05-13Introducing the hts_log_print() logging functionXavier Roche
* cleaned up logging
2013-05-03Fixed bug inside hts_mirror_wait_for_next_file() that may lead to race ↵Xavier Roche
conditions in downloaded files, causing the same file to be downloaded several times, possibly ending with "Unexpected 412/416 error" errors.
2013-04-21Build warning cleanup.Xavier Roche
* introduced SOClen type (aka. socklen_t)
2013-02-28Fixed bogus charset because the meta http-equiv tag is placed too far in the ↵Xavier Roche
html page
2012-05-12Fixed parsing issue with js files due to </script> tags (Vasiliy)Xavier Roche
2012-05-08Generate error pages when needed (Brent Palmer)Xavier Roche
2012-05-07Check for bogus links (Vasiliy)Xavier Roche
2012-05-07Charset fixesXavier Roche
2012-05-06Minor fixes regarding utf-8 filename handlingXavier Roche
2012-05-06UTF-8 filenames handling (based on HTML page charset)Xavier Roche
2012-05-01Added a "K5" feature to handle transparent proxies (Brent Palmer)Xavier Roche
2012-03-24Replaced exit(1) by abort()Xavier Roche
* fixed lintian shlib-calls-exit
2012-03-24GPL v3Xavier Roche
2012-03-19httrack 3.45.2Xavier Roche
2012-03-19httrack 3.45.1Xavier Roche
2012-03-19httrack 3.44.1Xavier Roche
2012-03-19httrack 3.43.12 (again)Xavier Roche
2012-03-19httrack 3.43.5Xavier Roche
2012-03-19httrack 3.43.12Xavier Roche
2012-03-19httrack 3.41.2Xavier Roche
2012-03-19httrack 3.40.4Xavier Roche
2012-03-19httrack 3.33.16Xavier Roche
2012-03-19httrack 3.30.1Xavier Roche
2012-03-19Imported httrack 3.20.2Xavier Roche