path: root/src/htsparse.c
Age  Commit message  Author
2023-01-14  Fixed sprintf  Xavier Roche
2017-04-01  Updated year to 2017  Xavier Roche
2016-09-26  Fixed weird condition which anyway does what we wanted (spotted by dcb314)  Xavier Roche
2016-04-26  Updated year 2016  Xavier Roche
2015-03-14  2014 is so last year!  Xavier Roche
2014-06-10  Added the following compiler flags:  Xavier Roche
* -Wcast-qual
* -Wmissing-parameter-type
* -Wold-style-definition
2014-06-06  Cleanup.  Xavier Roche
2014-06-06  Split typed arrays into htsarrays.h  Xavier Roche
Cleaned-up page generation
2014-06-03  Potential fix for htshash.c:330 assertion failure: "error invalidating hash entry"  Xavier Roche
2014-05-29  Better "too many links" reporting.  Xavier Roche
2014-05-29  Fixed regression over ./  Xavier Roche
2014-05-29  tr -d '\r'  Xavier Roche
2014-05-29  Big cleanup: introducing cleaner lien_adrfilsave and lien_adrfil structures holding address/uri or address/uri/filename rather than passing opaque char* of unknown size.  Xavier Roche
2014-05-29  Removed duplicate opt->lien_tot and opt->liens members in some functions.  Xavier Roche
2014-05-29  Allocation cleanup (why "+2", why ?)  Xavier Roche
2014-05-28  Replaced sprintf() by hts_template_format_str()  Xavier Roche
2014-05-28  Rewrite template formatting to be format-injection proof.  Xavier Roche
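The idea behind a format-injection-proof template can be sketched as follows: the template is never handed to printf as a format string, and only `%s` and `%%` are interpreted. This is a minimal illustration, not HTTrack's actual code; `template_format` and its signature are hypothetical and do not describe hts_template_format_str().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption, not HTTrack's API): expand a template
 * in which only "%s" and "%%" are special. Any other '%' sequence is copied
 * literally, so a hostile template such as "%n" is never interpreted.
 * Returns 1 on success, 0 if the output had to be truncated. */
static int template_format(char *dst, size_t size, const char *tmpl,
                           const char *const *args, size_t nargs) {
  size_t pos = 0, next = 0;
  const char *p;

  assert(dst != NULL && size > 0 && tmpl != NULL);
  for (p = tmpl; *p != '\0'; p++) {
    const char *chunk = NULL; /* argument to splice in, if any */
    char c = *p;

    if (c == '%' && p[1] == 's') {
      /* Missing or NULL arguments expand to the empty string. */
      chunk = (next < nargs && args[next] != NULL) ? args[next] : "";
      next++;
      p++;
    } else if (c == '%' && p[1] == '%') {
      p++; /* "%%" yields a single '%' */
    }
    if (chunk != NULL) {
      for (; *chunk != '\0'; chunk++) {
        if (pos + 1 >= size) { dst[pos] = '\0'; return 0; }
        dst[pos++] = *chunk;
      }
    } else {
      if (pos + 1 >= size) { dst[pos] = '\0'; return 0; }
      dst[pos++] = c;
    }
  }
  dst[pos] = '\0';
  return 1;
}
```

Arguments are copied byte for byte, so a `%n` inside an argument or an unsupported `%d` inside the template ends up in the output unchanged instead of being evaluated.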
2014-05-26  Big links heap handling cleanup, and removed very old and legacy macros  Xavier Roche
2014-05-23  "const correctness" cleanup  Xavier Roche
added the following default flags: -Wformat -Wformat-security -Wmultichar -Wwrite-strings
fixed several other warnings
2014-05-14  Fixed hashtable corruption because of dirty code directly modifying the host address in memory, leaving hashtable positions no longer valid.  Xavier Roche
This issue was especially triggered when a redirect was processed ("Warning moved treated for .." messages)
* closes: #43
2014-05-04  Replaced ugly cat by snprintf  Xavier Roche
2014-05-04  Big cleanup in string primitives and abort functions  Xavier Roche
2014-05-02  Fixed issue #42 (long query strings with accents)  Xavier Roche
2014-05-02  Big cleanup in functions writing to a char buffer without proper size boundary.  Xavier Roche
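As an illustration of writing to a char buffer with an explicit size boundary, here is a small sketch of a bounded append built on snprintf instead of strcat. The helper name `append_bounded` is hypothetical and is not taken from the HTTrack sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not HTTrack's API): append src to the
 * NUL-terminated string in dst without ever writing past dst[size - 1].
 * Returns 1 if src fit entirely, 0 if it was truncated. */
static int append_bounded(char *dst, size_t size, const char *src) {
  size_t used;
  int n;

  assert(dst != NULL && src != NULL && size > 0);
  used = strlen(dst);
  assert(used < size); /* dst must already be a valid string */

  /* snprintf always NUL-terminates within the given room and returns the
   * length it would have written had the room been unlimited. */
  n = snprintf(dst + used, size - used, "%s", src);
  return n >= 0 && (size_t) n < size - used;
}
```

The caller passes `sizeof(buffer)` along with the buffer, so truncation is reported through the return value rather than silently overflowing.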
2014-04-24  2013 is so last year.  Xavier Roche
2013-09-14  Removed dead code (_WIN32_WCE)  Xavier Roche
2013-09-13  Removed MMS (Microsoft Media Server) ripping code (mmsrip)  Xavier Roche
* protocol was finally dropped in Windows Media Services 2008
* mmsrip is not supported anymore
* some licensing issues regarding the protocol (ha-ha)
2013-09-13  Fixed FSF GPL reference (postal address removed, added website)  Xavier Roche
Fixed year notice
2013-08-17  Fixed issue 25 regarding un-encoding of characters such as # in the filename.  Xavier Roche
2013-08-16  Make "Unexpected 412/416 error" non-fatal until I find out why in hell this bug is still out there.  Xavier Roche
2013-08-12  Big cleanup in core heap hashtable code, rewritten using new fancy hashtables.  Xavier Roche
2013-07-18  Fixed buggy keep-alive handling, leading to wasted connections  Xavier Roche
2013-07-15  Fixed buggy referer while parsing: the referer of all links in the page is the current page being parsed, NOT the parent page.  Xavier Roche
(alexei dot co at gmail dot com)
* closes: issue #20
2013-07-09  Attempt to replace tmpnam()  Xavier Roche
2013-07-07  Fixed warning  Xavier Roche
2013-06-18  transfered => transferred  Xavier Roche
2013-06-14  Escaping fixes when the final disk filename contains a %  Xavier Roche
2013-06-14  Malformed escaping.  Xavier Roche
2013-06-01  Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14) related to the way non-ascii characters are being decoded  Xavier Roche
Rationale:
* inside URI
  * non-ascii characters are read with the page encoding, and transformed into UTF-8
  * url-escaped %xx are considered utf-8 sequences to be decoded, unless they form invalid sequences (in which case they are left as-is)
  * html entities (names, or decimal/hex) are decoded as utf-8 characters
* inside query string
  * non-ascii characters are read as binary, and escaped using %xx
  * url-escaped %xx are left unless not harmful (alphanum, for example)
  * html entities (names, or decimal/hex) are decoded as utf-8 characters and encoded back to the page encoding (possibly using %xx)
* inside hostnames
  * non-ascii characters are encoded using IDNA
Example:
* are equivalent in an iso-8859-1 page:
  http://foo/café.html
  http://foo/caf%c3%a9.html
  http://caf&#a9;.html
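The URI rule above (decode %xx only when the escapes form a valid UTF-8 sequence, otherwise leave them as-is) can be sketched roughly as below. This is an illustration under stated assumptions, not the HTTrack implementation: all function names are hypothetical, and as a simplification ASCII escapes such as %20 are left untouched so that only multi-byte sequences are decoded.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode the "%xx" escape at s into a byte, or return -1 if there is none. */
static int escaped_byte(const char *s) {
  int hi, lo;
  if (s[0] != '%') return -1;
  if ((hi = hex_value(s[1])) < 0) return -1;
  if ((lo = hex_value(s[2])) < 0) return -1;
  return hi * 16 + lo;
}

/* If s starts with escapes forming one valid non-ASCII UTF-8 sequence, store
 * the raw bytes in out and return their count (2 to 4); otherwise return 0. */
static size_t escaped_utf8_sequence(const char *s, unsigned char out[4]) {
  static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  int b = escaped_byte(s);
  unsigned long cp;
  size_t len, i;

  if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = (unsigned long) (b & 0x1F); }
  else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = (unsigned long) (b & 0x0F); }
  else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = (unsigned long) (b & 0x07); }
  else return 0; /* ASCII, stray continuation byte, or no escape at all */
  out[0] = (unsigned char) b;
  for (i = 1; i < len; i++) {
    b = escaped_byte(s + 3 * i);
    if (b < 0x80 || b > 0xBF) return 0; /* not a continuation byte */
    out[i] = (unsigned char) b;
    cp = (cp << 6) | (unsigned long) (b & 0x3F);
  }
  /* Reject overlong forms, UTF-16 surrogates and out-of-range code points. */
  if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return len;
}

/* Hypothetical entry point: copy src to dst, decoding valid escaped UTF-8
 * sequences. Returns 1 on success, 0 if dst was too small. */
static int unescape_utf8_path(const char *src, char *dst, size_t size) {
  size_t pos = 0;

  assert(src != NULL && dst != NULL && size > 0);
  while (*src != '\0') {
    unsigned char seq[4];
    size_t i, n = escaped_utf8_sequence(src, seq);

    if (n == 0) { /* ordinary character or undecodable escape: keep as-is */
      seq[0] = (unsigned char) *src;
      n = 1;
      src += 1;
    } else {
      src += 3 * n;
    }
    if (pos + n >= size) { dst[pos] = '\0'; return 0; }
    for (i = 0; i < n; i++) dst[pos++] = (char) seq[i];
  }
  dst[pos] = '\0';
  return 1;
}
```

With this sketch, `caf%c3%a9.html` becomes the UTF-8 bytes of `café.html`, while `caf%e9.html` is returned unchanged because a lone 0xE9 is not a valid UTF-8 sequence.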
2013-05-31  Fixed charset for top index titles.  Xavier Roche
2013-05-31  Fixed hts_unescapeEntities()  Xavier Roche
2013-05-31  Removed buggy code.  Xavier Roche
2013-05-31  Fixed issue 14 (http://code.google.com/p/httrack/issues/detail?id=14)  Xavier Roche
Rationale:
* hostname is ASCII, non-ascii characters shall be encoded with IDNA
* URI filenames may embed non-ascii characters, which MUST be UTF-8 encoded
* query string may embed non-ascii characters, which are encoded with the page charset into %xx codes
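The query string rule can be sketched as a byte-wise escape: every byte at or above 0x80 is emitted as %xx and ASCII is passed through. This sketch assumes the input is already expressed in the page charset (charset conversion is out of scope here), and `escape_non_ascii` is a hypothetical name, not a function from the HTTrack sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption, not HTTrack's API): copy src to dst,
 * replacing each non-ASCII byte with a lowercase "%xx" escape.
 * Returns 1 on success, 0 if dst was too small (dst stays NUL-terminated). */
static int escape_non_ascii(const char *src, char *dst, size_t size) {
  static const char hex[] = "0123456789abcdef";
  size_t pos = 0;

  assert(src != NULL && dst != NULL && size > 0);
  for (; *src != '\0'; src++) {
    unsigned char c = (unsigned char) *src;

    if (c < 0x80) {
      /* ASCII is copied unchanged; room is kept for the final NUL. */
      if (pos + 1 >= size) { dst[pos] = '\0'; return 0; }
      dst[pos++] = (char) c;
    } else {
      /* A non-ASCII byte needs three output characters. */
      if (pos + 3 >= size) { dst[pos] = '\0'; return 0; }
      dst[pos++] = '%';
      dst[pos++] = hex[c >> 4];
      dst[pos++] = hex[c & 0x0F];
    }
  }
  dst[pos] = '\0';
  return 1;
}
```

For an iso-8859-1 page, the byte 0xE9 (é) in `q=café` would come out as `q=caf%e9`.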
2013-05-30  Added hts_unescape_entities(), a rewrite of the HTML entities decoder.  Xavier Roche
Fixed HTML entities decoding which was done before charset decoding.
2013-05-19  openssl is no longer dynamically probed at startup, but dynamically linked  Xavier Roche
2013-05-19  Added support for IDNA / RFC 3492 (Punycode) handling within URLs.  Xavier Roche
2013-05-18  Fixed "Bogus charset for requests when filenames have non-ascii characters (RFC 3986)" (http://code.google.com/p/httrack/issues/detail?id=12)  Xavier Roche
2013-05-18  Fixed "Bogus charset on disk when filenames have non-ascii characters" (http://code.google.com/p/httrack/issues/detail?id=11)  Xavier Roche
2013-05-17  Fixed "onMouseOver="src='image'" is not recognized" and MANY other javascript issues (http://code.google.com/p/httrack/issues/detail?id=4)  Xavier Roche
2013-05-17  Fixed "Image in CSS not parsed" (http://code.google.com/p/httrack/issues/detail?id=2)  Xavier Roche