path: root/src/htsparse.c
Age  Commit message  Author
2014-05-04  Big cleanup in string primitives and abort functions  Xavier Roche
2014-05-02  Fixed issue #42 (long query strings with accents)  Xavier Roche
2014-05-02  Big cleanup in functions writing to a char buffer without proper size boundary.  Xavier Roche
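To illustrate the kind of bounded write the cleanup above refers to, here is a minimal sketch in C. `bounded_append` is a hypothetical helper written for this example, not a function from the HTTrack sources; it assumes the caller knows the destination capacity and wants a failure code rather than silent truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the NUL-terminated string held in dst,
   whose total capacity is cap bytes.
   Returns 0 on success, -1 if the result (plus NUL) would not fit or if dst
   is not NUL-terminated within cap. On failure dst is left unchanged. */
int bounded_append(char *dst, size_t cap, const char *src) {
  const char *end;
  size_t used, extra;

  assert(dst != NULL && src != NULL);

  /* Locate the current terminator without reading past the buffer. */
  end = memchr(dst, '\0', cap);
  if (end == NULL)
    return -1;
  used = (size_t) (end - dst);

  /* We need room for extra bytes plus the terminating NUL. */
  extra = strlen(src);
  if (extra >= cap - used)
    return -1;

  memcpy(dst + used, src, extra + 1);
  return 0;
}
```

A caller would typically pass `sizeof buffer` as `cap` and treat a `-1` return as an error to report, instead of letting an unbounded `strcat` overrun the buffer.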
2014-04-24  2013 is so last year.  Xavier Roche
2013-09-14  Removed dead code (_WIN32_WCE)  Xavier Roche
2013-09-13  Removed MMS (Microsoft Media Server) ripping code (mmsrip)  Xavier Roche
    * protocol was finally dropped in Windows Media Services 2008
    * mmsrip is not supported anymore
    * some licensing issues regarding the protocol (ha-ha)
2013-09-13  Fixed FSF GPL reference (postal address removed, added website)  Xavier Roche
    Fixed year notice
2013-08-17  Fixed issue 25 regarding un-encoding of characters such as # in the filename.  Xavier Roche
2013-08-16  Make "Unexpected 412/416 error" non-fatal until I find out why in hell this bug is still out there.  Xavier Roche
2013-08-12  Big cleanup in core heap hashtable code, rewritten using new fancy hashtables.  Xavier Roche
2013-07-18  Fixed buggy keep-alive handling, leading to wasted connections  Xavier Roche
2013-07-15  Fixed buggy referer while parsing: the referer of all links in the page is the current page being parsed, NOT the parent page. (alexei dot co at gmail dot com)  Xavier Roche
    * closes: issue #20
2013-07-09  Attempt to replace tmpnam()  Xavier Roche
2013-07-07  Fixed warning  Xavier Roche
2013-06-18  transfered => transferred  Xavier Roche
2013-06-14  Escaping fixes when the final disk filename contains a %  Xavier Roche
2013-06-14  Malformed escaping.  Xavier Roche
2013-06-01  Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) related to the way non-ascii characters are being decoded  Xavier Roche
    Rationale:
    * inside URI
      * non-ascii characters are read with the page encoding, and transformed into UTF-8
      * url-escaped %xx are considered utf-8 sequences to be decoded, unless they form invalid sequences (in such case we leave them as-is)
      * html entities (names, or decimal/hex) are decoded as utf-8 characters
    * inside query string
      * non-ascii characters are read as binary, and escaped using %xx
      * url-escaped %xx are left as-is unless decoding them is harmless (alphanum, for example)
      * html entities (names, or decimal/hex) are decoded as utf-8 characters and encoded back to the page encoding (possibly using %xx)
    * inside hostnames
      * non-ascii characters are encoded using IDNA
    Example:
    * are equivalent in an iso-8859-1 page:
      http://foo/café.html
      http://foo/caf%c3%a9.html
      http://caf&#a9;.html
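The URI rule in the rationale above (decode %xx runs that form UTF-8 sequences, leave invalid ones as-is) can be sketched in C as follows. This is an illustration under stated assumptions, not the code from htsparse.c: `decode_utf8_escapes` and its helpers are hypothetical names, ASCII escapes such as %20 are deliberately left untouched, and validity is only checked through the lead byte and the continuation-byte pattern (overlong 3- and 4-byte forms and surrogates are not rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Read one "%xx" escape at s into *byte; returns 1 on success, 0 otherwise. */
static int read_escape(const char *s, unsigned char *byte) {
  int hi, lo;
  if (s[0] != '%') return 0;
  hi = hex_value(s[1]);
  if (hi < 0) return 0;
  lo = hex_value(s[2]);
  if (lo < 0) return 0;
  *byte = (unsigned char) (hi * 16 + lo);
  return 1;
}

/* Expected length of a multi-byte UTF-8 sequence given its lead byte,
   or 0 for ASCII, continuation bytes and invalid lead bytes. */
static int utf8_length(unsigned char lead) {
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

/* Hypothetical sketch: copy src to dst (capacity cap), replacing runs of
   %xx escapes that form a multi-byte UTF-8 sequence by the raw bytes.
   Anything else is copied unchanged. Returns 0, or -1 if dst is too small. */
int decode_utf8_escapes(const char *src, char *dst, size_t cap) {
  size_t out = 0;
  assert(src != NULL && dst != NULL);
  while (*src != '\0') {
    unsigned char seq[4];
    int len = 0, i;
    if (read_escape(src, &seq[0]))
      len = utf8_length(seq[0]);
    /* Each continuation byte must itself be an escape of the form 10xxxxxx. */
    for (i = 1; i < len; i++) {
      if (!read_escape(src + 3 * i, &seq[i]) || (seq[i] & 0xC0) != 0x80) {
        len = 0;                /* invalid sequence: copy it as-is */
        break;
      }
    }
    if (len > 0) {
      if (out + (size_t) len >= cap) return -1;
      memcpy(dst + out, seq, (size_t) len);
      out += (size_t) len;
      src += 3 * len;
    } else {
      if (out + 1 >= cap) return -1;
      dst[out++] = *src++;
    }
  }
  if (out >= cap) return -1;
  dst[out] = '\0';
  return 0;
}
```

With this sketch, `caf%c3%a9.html` becomes the UTF-8 bytes of `café.html`, while `caf%e9.html` is left unchanged because `%e9` alone is not a complete UTF-8 sequence.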
2013-05-31  Fixed charset for top index titles.  Xavier Roche
2013-05-31  Fixed hts_unescapeEntities()  Xavier Roche
2013-05-31  Removed buggy code.  Xavier Roche
2013-05-31  Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14)  Xavier Roche
    Rationale:
    * hostname is ASCII, non-ascii characters shall be encoded with IDNA
    * URI filenames may embed non-ascii characters, which MUST be UTF-8 encoded
    * query string may embed non-ascii characters, which are encoded with the page charset into %xx codes
2013-05-30  Added hts_unescape_entities(), a rewrite of the HTML entities decoder.  Xavier Roche
    Fixed HTML entities decoding which was done before charset decoding.
2013-05-19  openssl is no longer dynamically probed at startup, but dynamically linked  Xavier Roche
2013-05-19  Added support for IDNA / RFC 3492 (Punycode) handling within URLs.  Xavier Roche
2013-05-18  Fixed "Bogus charset for requests when filenames have non-ascii characters (RFC 3986)" (http://code.google.com/p/httrack/issues/detail?id=12)  Xavier Roche
2013-05-18  Fixed "Bogus charset on disk when filenames have non-ascii characters" (http://code.google.com/p/httrack/issues/detail?id=11)  Xavier Roche
2013-05-17  Fixed "onMouseOver="src='image'" is not recognized" and MANY other javascript issues (http://code.google.com/p/httrack/issues/detail?id=4)  Xavier Roche
2013-05-17  Fixed "Image in CSS not parsed" (http://code.google.com/p/httrack/issues/detail?id=2)  Xavier Roche
2013-05-15  Indenting cleanup for htsparse.c  Xavier Roche
    setup:
      indent -l80 -lc80 -nhnl -nut -bad -bap -bbo -br -brf -bli2 -brs -bls -br -ss -sai -pmt -nsaw -nsaf -nprs -i2 -ce -npsl -npcs -cs -sob -cdw -nbc -lp
    logs:
      indent: htsparse.c:364: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:366: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:368: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:370: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:387: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:738: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:907: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:925: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:970: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:971: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1261: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:1277: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:1410: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:1459: Warning:old style assignment ambiguity in "=*". Assuming "= *"
      indent: htsparse.c:1494: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1504: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1541: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1583: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1597: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:1625: Warning:old style assignment ambiguity in "=-". Assuming "= -"
      indent: htsparse.c:2975: Warning:old style assignment ambiguity in "=-". Assuming "= -"
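For context on the warnings logged above: in pre-standard C, `=-` was an alternative spelling of the compound operator `-=`, so a tool may flag text such as `x=-1` as ambiguous. Standard C reads it as a plain assignment of a negative value, which matches what indent assumed. The sketch below shows the difference; the function names are hypothetical, and the flagged lines of htsparse.c are not reproduced here, so their exact content is not assumed.

```c
#include <assert.h>

/* Standard C tokenizes "x =- 1" as "x = -1": plain assignment of minus one. */
int assign_minus_one(int x) {
  x =- 1;
  return x;
}

/* The compound operator is spelled "-=" and subtracts from x. */
int subtract_one(int x) {
  x -= 1;
  return x;
}

/* Self-check documenting the two behaviours side by side. */
void check_assignment_forms(void) {
  assert(assign_minus_one(5) == -1);
  assert(subtract_one(5) == 4);
}
```

The `=*` warnings follow the same pattern: `a=*p` is an assignment from a dereferenced pointer in standard C, not the old spelling of `*=`. Writing `a = -1` and `a = *p` with spaces avoids the warning entirely.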
2013-05-15  Removed unused hack HTS_CHECK_STRANGEDIR  Xavier Roche
2013-05-14  Merge sources from windows-1252 to utf-8  Xavier Roche
2013-05-13  Introducing the hts_log_print() logging function  Xavier Roche
    * cleaned up logging
2013-05-03  Fixed bug inside hts_mirror_wait_for_next_file() that may lead to race conditions in downloaded files, causing the same file to be downloaded several times, possibly ending with "Unexpected 412/416 error" errors.  Xavier Roche
2013-04-21  Build warning cleanup.  Xavier Roche
    * introduced SOClen type (aka. socklen_t)
2013-02-28  Fixed bogus charset because the meta http-equiv tag is placed too far in the html page  Xavier Roche
2012-05-12  Fixed parsing issue with js files due to </script> tags (Vasiliy)  Xavier Roche
2012-05-08  Generate error pages when needed (Brent Palmer)  Xavier Roche
2012-05-07  Check for bogus links (Vasiliy)  Xavier Roche
2012-05-07  Charset fixes  Xavier Roche
2012-05-06  Minor fixes regarding utf-8 filename handling  Xavier Roche
2012-05-06  UTF-8 filenames handling (based on HTML page charset)  Xavier Roche
2012-05-01  Added a "K5" feature to handle transparent proxies (Brent Palmer)  Xavier Roche
2012-03-24  Replaced exit(1) by abort()  Xavier Roche
    * fixed lintian shlib-calls-exit
2012-03-24  GPL v3  Xavier Roche
2012-03-19  httrack 3.45.2  Xavier Roche
2012-03-19  httrack 3.45.1  Xavier Roche
2012-03-19  httrack 3.44.1  Xavier Roche
2012-03-19  httrack 3.43.12 (again)  Xavier Roche
2012-03-19  httrack 3.43.5  Xavier Roche