Steven G. Johnson 02f4e1890c charwidth=1 for soft hyphen and unassigned codepoints (#135)	6 years ago
* use width=1 for soft hyphen and for unassigned/PUA codepoints
* don't count unassigned codepoints when comparing with system wcwidth
* more tests
* indentation fixes
* NEWS for 135
* remove special-casing for arabic control characters affecting a span of numbers, which are sometimes zero-width and sometimes not
* regenerate
bench	use a different variable name for nested loop in bench.c (#80)	8 years ago
data	charwidth=1 for soft hyphen and unassigned codepoints (#135)	6 years ago
test	charwidth=1 for soft hyphen and unassigned codepoints (#135)	6 years ago
.gitignore	added test for #128	7 years ago
.travis.yml	Move -Wmissing-prototypes from Makefile to .travis.yml (#79)	8 years ago
CMakeLists.txt	Enhance CMakeLists.txt (#138)	7 years ago
Doxyfile	Fix #26: use doxygen for generating API docs	10 years ago
LICENSE.md	updated NEWS etc. for 1.2 release	10 years ago
MANIFEST	NEWS for upcoming 2.2 release, version bump	7 years ago
Makefile	NEWS for upcoming 2.2 release, version bump	7 years ago
NEWS.md	charwidth=1 for soft hyphen and unassigned codepoints (#135)	6 years ago
README.md	note Unicode 10 support	7 years ago
appveyor.yml	Fix MinGW build test	9 years ago
lump.md	Minimal cmake build script	10 years ago
utf8proc.c	Case folding fixes (#133)	7 years ago
utf8proc.h	NEWS for upcoming 2.2 release, version bump	7 years ago
utf8proc_data.c	charwidth=1 for soft hyphen and unassigned codepoints (#135)	6 years ago
utils.cmake	Minimal cmake build script	10 years ago

README.md

utf8proc

utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

(The original utf8proc package also includes Ruby and PostgreSQL plug-ins. We removed those from utf8proc in order to focus exclusively on the C library for the time being, but plan to add them back in or release them as separate packages.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the included LICENSE.md file for more detailed information.

Quick Start

To compile the C library, run make.

General Information

The C library is found in this directory after successful compilation and is named libutf8proc.a (for the static library) and libutf8proc.so (for the dynamic library).

The Unicode version supported is 10.0.0.

For Unicode normalizations, the following options are used:

Normalization Form C: STABLE, COMPOSE
Normalization Form D: STABLE, DECOMPOSE
Normalization Form KC: STABLE, COMPOSE, COMPAT
Normalization Form KD: STABLE, DECOMPOSE, COMPAT

C Library

The documentation for the C library is found in the utf8proc.h header file. utf8proc_map is the function you will most likely use for mapping UTF-8 strings, unless you want to allocate memory yourself.

To Do

See the GitHub issues list.

Contact

Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.