utf8proc

Commit Graph

Author	SHA1	Message	Date
Steven G. Johnson	e6fba4aa8c	update header file comments (closes #157)	6 years ago
Steven G. Johnson	5c632c5742	NEWS for 2.4, updated version numbers (which I forgot in 2.3, grrr)	6 years ago
Steven G. Johnson	abf81603ba	add utf8proc_unicode_version (#151)	6 years ago
Steven G. Johnson	3637d51855	doc clarification (closing #110)	6 years ago
Steven G. Johnson	6a659a5843	doc fixes, don't export stdint and limits.h values UINT16_MAX and SSIZE_MAX	6 years ago
Steven G. Johnson	60a2398184	copyright year updates	6 years ago
Steven G. Johnson	d4a58cfec5	update data and algorithms for Unicode 11 (#140)	6 years ago
Steven G. Johnson	8639450134	NEWS for upcoming 2.2 release, version bump	7 years ago
Steven G. Johnson	bdc8b9e4b2	Case folding fixes (#133) * Fixes allowing for “Full” folding and NFKC_CaseFold compliance. * Only include C (Common) and F (Full) foldings from CaseFolding.txt. Removed S (Simple) since F & S are specified to be exclusive. * Extend UTF8PROC_IGNORE to also ignore unassigned codepoints (such as \u2065) which are specified as being discarded by NFKC_CF. * Document the changes to UTF8PROC_IGNORE in header. * Add NFKC_CF helper function with documentation. * restore old IGNORE behavior, add UTF8PROC_STRIPNA, rename to utf8proc_NFKC_Casefold, add a test * success message * test that IGNORE does not strip NA * data update * NFKC_Casefold shouldn't strip NA	7 years ago
past-due	48949bd3eb	Static library support improvements (#123) * `#define UTF8PROC_STATIC` to disable DLLEXPORT `#define UTF8PROC_STATIC` to disable DLLEXPORT * [CMake] Automatically define UTF8PROC_STATIC if BUILD_SHARED_LIBS is off * [Makefile] Support additional UTF8PROC_DEFINES, which can be used to specify flags like `-DUTF8PROC_STATIC`	7 years ago
Steven G. Johnson	d688ac1226	version bump to 2.1.1 (#131)	7 years ago
Christopher Baker	2a2f97e193	Update documentation to reflect Unicode 9.0.0. (#107) This makes the inline documentation match the README.	8 years ago
Árpád Goretity	31a8788886	removed inclusion of non-portable header file (#94)	8 years ago
Steven G. Johnson	4ac3154acc	whoops	8 years ago
Steven G. Johnson	78f336addd	use ptrdiff_t rather than ssize_t, as ssize_t is non-standard (it is POSIX, not C)	8 years ago
Steven G. Johnson	59334e4499	use stdbool.h and inttypes.h in MSVC 2013 and later, and use more C99-compatible definitions of false and true earlier (fix #90)	8 years ago
Steven G. Johnson	b4621f43c3	new utf8proc_map_custom for hooking in user-defined custom mappings (#89) * new utf8proc_map_custom for hooking in user-defined custom mappings * whoops, add test program * NEWS, version bump for 2.1 * change test functions to static so that gcc doesn't complain about missing prototypes	8 years ago
Steven G. Johnson	f5567f306a	typo in docstrings	8 years ago
Michael Drake	70bbed8626	Tlsa/ucs4 normalize (#88) * Split codepoint sequence normalisation out into separate function. This creates utf8proc_normalize_utf32() which takes and returns a UTF-32 string, applying the following options: - UTF8PROC_NLF2LS - UTF8PROC_NLF2PS - UTF8PROC_NLF2LF - UTF8PROC_STRIPCC - UTF8PROC_COMPOSE - UTF8PROC_STABLE The utf8proc_reencode() function has been updated to call the new utf8proc_normalize_utf32(). * Update code documentation: utf8proc_reencode handles UTF8PROC_CHARBOUND.	8 years ago
Jakub Vít	caef918abd	Change definition of UINT16_MAX macro (#84) Change UINT16_MAX from `~(utf8proc_uint16_t)0` to fixed value `65535U` to prevent weird behaviour in complex expressions.	8 years ago
Tony Kelman	8e3174f334	NEWS and version numbers for 2.0.2 (#81) * Add NEWS.md items for #79 and #80 * Prepare version numbers for 2.0.2 * Also update API version to 2.0.2	8 years ago
Steven G. Johnson	f0bf106569	NEWS and version bump for 2.0.1 release, to come out shortly	8 years ago
Keno Fischer	c0a1ff81fc	Walk back ABI breaking changes (#76)	8 years ago
Benito van der Zander	eeebf70bcf	Smaller tables (#68) * convert sequences to utf-16 (saves 25kb) * store sequence length in properties instead using -1 termination (saves 10kb) * cache index for slightly faster data creation * store lower/upper/title mapping in sequence array (saves 25kb). Add utf8proc_totitle, as title_mapping cannot be used to get the title codepoint anymore. Rename xxx_mapping to xxx_seqindex, so programs assuming a value with the old meaning fail at compile time * change combination array data type to uint16 (saves 40kb) * merge 1st and 2nd comb index (saves 50kb) * kill empty prefix/suffix in combination array (saves 50kb) * there was no need to have a separate combination start array, it can be merged in a single array * some fixes * mark the table as const again * and regen	8 years ago
Keno Fischer	41c6b23aab	Unicode 9 updates (#70) * Updates for Unicode 9.0.0 TR29 Changes - New rules GB10/(12/13) are used to combine emoji-zwj sequences/ (force grapheme breaks every two RI codepoints). Unfortunately this breaks statelessness of grapheme-boundary determination. Deal with this by ignoring the problem in utf8proc_grapheme_break, and by hacking in a special case in decompose - ZWJ moved to its own boundclass, update what is now GB9 accordingly. - Add comments to indicate which rule a given case implements - The Number of bound classes Now exceeds 4 bits, expand to 8 and reorganize fields * Import Unicode 9 data * Update Grapheme break API to expose state override * Bump MAJOR version	8 years ago
Michaël Meyer	26436c9775	Reduce the size of the binary. Use integers instead of pointers in Unicode tables. Saves 226 kb / 716 kb in the compiled library.	9 years ago
Steven G. Johnson	6d4d7a9acf	update Unicode version in header-file comment	9 years ago
Steven G. Johnson	fd20b184dd	update copyright statements to list recent contributors and year	9 years ago
Steven G. Johnson	d75985cf09	bump API/ABI version to 1.3, add NEWS	10 years ago
Steven G. Johnson	a8fb4b1772	add toupper/tolower functions (for JuliaLang/julia#11471)	10 years ago
Scott Paul Jones	6249e6b8b1	Fix #34 handle 66 Unicode non-characters, also improve performance and surrogate handling	10 years ago
Tony Kelman	0a818c7003	Prefix other C99 typedefs with utf8proc_	10 years ago
Tony Kelman	ad27722923	Use a new typedef utf8proc_ssize_t to avoid define collisions with MSVC	10 years ago
Steven G. Johnson	a1c429a45b	rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent conflicts with other header files that define DLLEXPORT	10 years ago
Steven G. Johnson	41287a1116	more documentation English and formatting cleanups	10 years ago
Steven G. Johnson	2f8469c3cc	some documentation improvements	10 years ago
Steven G. Johnson	11d2ece545	indentation consistency	10 years ago
Steven G. Johnson	c851c67888	put the API version as #defines in the header file (as discussed in #30)	10 years ago
Steven G. Johnson	32c605cfa7	mainpage dox tweaks	10 years ago
Jonas Fonseca	03a4e8854a	Fix #26: use doxygen for generating API docs	10 years ago
Steven G. Johnson	dad0cbdcab	update NEWS for 1.2-dev	10 years ago
Steven G. Johnson	3822984606	remove requirement that get_property and decompose_char argument be in range 0x0 to 0x10ffff	10 years ago
Steven G. Johnson	a4c84d2063	fix #2: add charwidth function	10 years ago
Tony Kelman	a8b688c734	Minimal cmake build script move flags for MSVC rename lump.txt to lump.md, add data/*.txt to .gitignore	10 years ago
Steven G. Johnson	402883c78e	rename back to utf8proc now that we are taking over maintenance	10 years ago
Steven G. Johnson	0f9e9796e6	fix #15: redefine UTF8PROC_CATEGORY_CN to 0 consistent with what we actually return	10 years ago
Steven G. Johnson	397a1eabea	update graphemes for Unicode 7, add utf8proc_grapheme_break function	10 years ago
Tony Kelman	d61d551d5a	s/LIBRARY_EXPORTS/MOJIBAKE_EXPORTS/	10 years ago
Tony Kelman	6b67865984	add DLLEXPORT definition for __GNUC__ >= 4	10 years ago
Tony Kelman	a840e5dae1	add DLLEXPORT to all functions in mojibake.h	10 years ago

49 Commits (b48f5d074ffc8e84f2ee420a2ef8832d3ab32ebf)