utf8proc

Commit Graph

Author	SHA1	Message	Date
bfredl	3de4596fbe	properties: add "ambiguous_width" property for ambiguous East Asian Width (#270 ) Some characters have their width defined as "Ambiguous" in UAX#11. These are typically rendered as single-width by modern monospace fonts, and utf8proc correctly returns charwidth==1 for these. However some applications might need to support older CJK fonts where characters which were two-byte in legacy encodings were rendered as double-width. An example of this is the 'ambiwidth' option of vim and neovim which supports rendering in terminals using such wideness rules. Add an 'ambiguous_width' property to utf8proc_property_t for such characters.	2 months ago
dundargoc	5568eff49a	docs: add examples for common usecases (#267 )	4 months ago
Steven G. Johnson	9f2d8d3e85	Update README.md	4 months ago
dundargoc	eba7dfb44f	ci(macos): install julia dependency (#268 ) Otherwise the job fails with the error message "/bin/sh: julia: command not found"	4 months ago
dundargoc	dce38103be	build: include clangd files to .gitignore (#263 )	6 months ago
dundargoc	894e8107ac	build: remove unnecessary policy check (#262 ) Minimum version is 3.5 and policy CMP0048 was introduced in 3.0, meaning that it will always be set to `NEW`.	7 months ago
Claire Foster	1fe43f5a6d	Remove ruby compat hacks (#259 ) * Fix two minor bugs from the Ruby code First, `category` rather than `code` was used in constructing the `control_boundary` property as related to the characters U+200C and U+200D. This seemed incorrect and should be fixed. This could be an observable bugfix for any C code which inspects the `control_boundary` property. Second, when reading composition exclusions, Ruby's String hex method produces zero rather than nil if no number is found. For example $ ruby -e 'puts "# blah".hex' 0 This led to the character `'\0'` being included in the `exclusions` and `excl_versions` sets which is incorrect. However this seems asymptomatic because `'\0'` is never part of a composition. (In terms of the C code, the use of `comp_exclusion` is guarded by the `comb_index` property which is `UINT16_MAX` for `'\0'`.) * Cleanup: Remove sequence ordering hack This hack changed the ordering of sequences encoded in the sequences table and was added so we could easily prove equivalence to the Ruby data generator code. However, it's no longer needed and removing it shouldn't result in any functional change.	10 months ago
Michael Williamson	a78bee90c2	Use stdint.h instead of inttypes.h (#223 ) This improves support for targeting wasm32 with clang 12.	10 months ago
Claire Foster	0a8526c8d6	Port ruby data_generator.rb to Julia (#258 ) * Port ruby data_generator.rb to Julia This reduces the number of dependencies needed when regenerating the C code. The new code also separates C code generation from unicode data analysis somewhat more cleanly which should be better factored for generating a Julia version of the data files in the future. The output is identical to the original Ruby script, for now. Some bugs which were found in the process are noted as FIXMEs in the Julia source and can be fixed next. * Replace some explicit loops with a utility function * fixup! Port ruby data_generator.rb to Julia * Update Makefile * Update data/Makefile * Update data/Makefile * Update data/Makefile * Update data/Makefile * Update data/data_generator.jl --------- Co-authored-by: Steven G. Johnson <stevenj@mit.edu>	10 months ago
dundargoc	a9c6332ad1	upgrade minimum cmake version (#255 ) This will silence the following warning: CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): Compatibility with CMake < 3.5 will be removed from a future version of CMake. Update the VERSION argument <min> value or use a ...<max> suffix to tell CMake that the project does not need compatibility with older versions.	12 months ago
Steven G. Johnson	2bbb1ba932	updates for doxygen 1.9	1 year ago
Steven G. Johnson	34db3f7954	untar into new directory	1 year ago
Steven G. Johnson	e9d255772a	make distcheck should keep tarball, rm directory	1 year ago
Steven G. Johnson	8bf9d71a27	add make distcheck	1 year ago
Steven G. Johnson	bb84f2ee82	make dist target	1 year ago
Steven G. Johnson	05886a9fd1	version 2.9 bump (#254 )	1 year ago
Steven G. Johnson	46a442b121	Unicode 15.1 support (#253 ) * Unicode 15.1 support * always update state * fix GB9c logic * print indic_conjunct_break in printproperty * fix grapheme test * update utf8proc_decompose_char docs * more GB9c tests	1 year ago
Steven G. Johnson	1cb28a66ca	v2.8.0 bump (#248 ) * version 2.8.0 bump * NEWS link	2 years ago
Steven G. Johnson	3c4929495a	unicode 15 support (#247 )	2 years ago
Harmen Stoppels	1f1e42d3b8	Add c flag when invoking ar (#241 ) `llvm-ar` warns when the archive does not exist and `c` is not passed.	2 years ago
Randy	63f31c908e	Improve fuzzer code coverage (#239 ) * fuzz: test grapheme break functions * fuzz: cover character lumping	2 years ago
Randy	39dbf507d7	fuzz: limit input length (#238 ) Longer inputs can lead to timeouts on oss-fuzz	3 years ago
Steven G. Johnson	f0b370716b	don't use make in cmake instructions (closes #236 )	3 years ago
Steven G. Johnson	2484e2ed5e	update Doxygen config with doxygen -u	3 years ago
Steven G. Johnson	8ca6144c85	copyright year update	3 years ago
Steven G. Johnson	e3b9a890cb	prepare for 2.7.0 release	3 years ago
Steven G. Johnson	b093cf9dd4	update for unicode 14 (#233 )	3 years ago
Steven G. Johnson	0e59e0b035	rm travis	3 years ago
Steven G. Johnson	7c14edafcb	update gitignore	3 years ago
woclass	bab7aecdde	[ci] set github CI (#229 ) * [ci] set github CI: ubuntu, windows, macOS * [ci] add make.yml * [ci] Skip macOS check MANIFEST temporary	3 years ago
Markus F.X.J. Oberhumer	27e8a16049	cmake: fix installation directories and also install pkgconfig file (#224 )	3 years ago
Benito van der Zander	8a4cd4c903	reduce lenencode bits (#232 )	3 years ago
extrowerk	462093b392	GNUInstallDirs support (#159 )	4 years ago
Randy	93a88b4310	OSS-Fuzz integration updates (#219 ) * fix build * CIFuzz integration * update fuzzer * undo changes to build * ossfuzz.sh: fix copy path	4 years ago
Randy	c17ea5dfef	OSS-Fuzz initial integration (#216 ) * add fuzz target * update fuzzer * add fuzzer to build with basic entry point * add build script * cleanup * build fuzz target using cmake in oss-fuzz env * ossfuzz.sh add newline * update build	4 years ago
Mike Glorioso	610730f231	Fix Sign-Conversion warnings in library and test code (#214 ) * JuliaStrings#169 turn on sign-conversion warnings Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com> * JuliaStrings#169 fix sign-conversion warnings for utf8proc.c fix sign-conversion warnings for utf8proc_iterate uc requires at most 21 bits to identify a unicode codepoint, so there is no need for it to be unsigned multiple locations use, modify, or store uc with a signed value the only exception is line 137 where uc is compared with an unsigned value fix sign-conversion warnings for utf8proc_tolower, utf8proc_toupper, utf8proc_totitle all three methods have sign conversion warnings when calling seqindex_decode_index seqindex_decode_index uses the passed value as an index to an array utf8proc_sequences as utf8proc_sequences is hard-coded and smaller than 2^31 - 1 we can safely cast to unsigned fix sign-conversion warnings for utf8proc_decompose_char lines with this warning use the defined function utf8proc_decompose_lump in the function, a hardcoded unsigned value (1<<12) is complemented then cast as a signed value as the intent is to remove the 12th bit flag from options, a signed value, and explicit cast is safe fix sign-conversion warnings for utf8proc_map_custom result is declared as signed, but is only expected to contain values between 0 and 4 sizeof returns an unsigned value. result must be cast to unsigned Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com> * JuliaStrings#169 fix sign-conversion warnings for test/* fix sign-conversion warnings for test/tests.c encode change type for d to match return value of utf8proc_encode_char fix sign-conversion warnings for test/graphemetest.c checkline si, i, and j are unsigned size types, utf8proc_map and utf8proc_iterate accept and return signed size types utf8proc_map treats negative strlen values as 0. the strlen used by the test must be similarly limited utf8proc_iterate treats negative strlen values as 4 which will be less than the unsigned size fix unused-but-set-variable warning by checking the glen value fix sign-conversion warnings for test/case.c main the if block ensures that tested codepoint fits in wint_t, but needs to include u and l as well c, u, and l can be safely cast to wint_t fix sign-conversion warnings for test/iterate.c all values used for len are below 8, so an explicit cast is safe updated types for more portable test code fix sign-conversion warnings for test/printproperty.c main change type of c to signed to resolve all sign-conversion warnings. replace sscanf(... &c) with sscanf(... &x) followed by explicit sign conversion Signed-off-by: Mike Glorioso <mike.glorioso@gmail.com>	4 years ago
Steven G. Johnson	0520d6f724	download test data to build directory (fixes #212 )	4 years ago
Steven G. Johnson	f1f51b8242	ensure ruby is in UTF-8 mode (#209 ) * ensure ruby is in UTF-8 mode * Revert "ensure ruby is in UTF-8 mode" This reverts commit 587b7b6b7215f91b1ae52aefc82d359f2f378a61. * ensure Ruby reads files in UTF-8 encoding	4 years ago
Steven G. Johnson	3203baa737	fix manifest	4 years ago
Steven G. Johnson	28416640ed	2.6.1 version bump	4 years ago
Steven G. Johnson	8239639e3f	fix NULL args in grapheme_break_stateful	4 years ago
Steven G. Johnson	df2997a300	update doxygen config with doxygen -u	4 years ago
Steven G. Johnson	cea3cd158f	bump to version 2.6	4 years ago
Steven G. Johnson	0643a64479	Fix grapheme breaks on string-initial (#205 ) * Fix extended emoji + zwj combo * Patch initial repeated regional flags and extended+zwj emoji * Merge conditions for setting breaks bt region * updated fix * perform tests for both utf8proc_map and manual calls to utf8proc_grapheme_break_stateful * consolidate tests Co-authored-by: Thomas Marks <marksta@umich.edu>	4 years ago
Tim Gates	6f7d73071a	docs: fix simple typo, encounted -> encountered (#201 ) There is a small typo in utf8proc.h. Should read `encountered` rather than `encounted`.	4 years ago
Steven G. Johnson	5622a0a51b	add islower/isupper functions (#196 ) * add islower/isupper functions * added test * more tests + bugfix * Makefile fix * rm iscase test on make clean	4 years ago
xkszltl	08f9999a06	Switch to HTTPS for referencing `www.unicode.org`. (#193 ) Resolve https://github.com/JuliaStrings/utf8proc/issues/192	5 years ago
Stefan Floeren	b5211c88af	Unify include file handling (#190 ) The cmake file expects the parent folder to be named "utf8proc", otherwise the target_include_directories won't work, as it references an unknown path. This deviates from the install targets (both cmake and makefile) in putting the include file into a subfolder in contrast to the top level folder. This also prevents using the library with the recent cmake addition of FetchContent. This change unifies the include file handling by using the local path for cmake as well. This might break existing uses. As a workaround, we could add a dummy include file in the old location (new utf8proc subfolder). I'm not sure if that is necessary. Co-authored-by: Stefan Floeren <stefan-floeren@users.noreply.github.com>	5 years ago
Andreas-Schniertshauer	e51f416e0c	Fix memory leaks in tests case.c and misc.c (#189 ) * Add: tests to CMakeLists.txt * Disable compilation of charwidth, graphemetest and normtest because of missing getline * Refactoring: UTF8PROC_ENABLE_TESTING default Off, move tests that don't compile on windows to NOT MSVC section, add testing to appveyor.yml * Add: testing to travis * Changed: flag to WIN32 because MinGW has the same problem as MSVC * Commented out graphemetest and normtest because they fail. * Re-added: graphemetest and normtest added missing data to the path of the text files. * Fix: last commit was partly wrong normtest failed. * * Commented out graphemetest and normtest because they fail, because in CMakeLists is missing building of data. * Add: mingw_static, mingw_shared, msvc_shared, msvc_static to ignore list * Add: prefix utf8proc. to tests * Fix: memory leaks in tests case.c and misc.c forgot to call free after calling utf8proc_NFKC_Casefold Co-authored-by: Andreas-Schniertshauer <Andreas-Schniertshauer@users.noreply.github.com>	5 years ago
Steven G. Johnson	ffba678bf4	Revert "disable tests under mingw" (#187 ) This reverts commit `7e834d7702`.	5 years ago


308 Commits (master)