# libmojibake

[libmojibake](https://github.com/JuliaLang/libmojibake) is
a lightly updated fork of the [utf8proc
library](http://www.public-software-group.org/utf8proc) from Jan
Behrens and the rest of the [Public Software
Group](http://www.public-software-group.org/), who deserve *nearly all
of the credit* for this package: a small, clean C library that
provides Unicode normalization, case-folding, and other operations for
data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).

The reason for this fork is that `utf8proc` is used for basic Unicode
support in the [Julia language](http://julialang.org/) and the Julia
developers wanted Unicode 7 support and other features, but the Public
Software Group is currently occupied with other projects.  We hope
that our fork can be merged back into the mainline `utf8proc` package
before too long.

(The original `utf8proc` package also includes Ruby and PostgreSQL plug-ins.
We removed those from `libmojibake` in order to focus exclusively on the C
library for the time being.  We will strive to keep API changes to a minimum,
so `libmojibake` should still be usable with the old plug-in code.)

Like `utf8proc`, the `libmojibake` package is licensed under the
free/open-source [MIT "expat"
license](http://opensource.org/licenses/MIT) (plus certain Unicode
data governed by the similarly permissive [Unicode data
license](http://www.unicode.org/copyright.html#Exhibit1)); please see
the included `LICENSE.md` file for more detailed information.

## Quick Start ##

To compile the C library, run `make`.

## General Information ##

The C library is found in this directory after successful compilation
and is named `libmojibake.a` (for the static library) and
`libmojibake.so` (for the dynamic library).

The supported Unicode version is 5.0.0.
*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
version 5.0.0 was not yet available at the time of implementation.

For Unicode normalizations, the following options are used:

* Normalization Form C:  `STABLE`, `COMPOSE`
* Normalization Form D:  `STABLE`, `DECOMPOSE`
* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`

## C Library ##

The documentation for the C library is found in the `utf8proc.h` header file.
`utf8proc_map` is the function you will most likely use for mapping UTF-8
strings, unless you want to allocate memory yourself.

## To Do ##

* detect stable code points and process segments independently in order to save memory
* do a quick check before normalizing strings to optimize speed
* support stream processing

## Contact ##

Bug reports, feature requests, and other queries can be filed at
the [libmojibake issues page on GitHub](https://github.com/JuliaLang/libmojibake/issues).