
Unicode 13 support (#179)

* exclude Sk from zero-width chars (closes #167)

* update for Unicode 13
Steven G. Johnson 5 years ago
committed by GitHub
parent
commit
b48f5d074f
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
  1. README.md (2 changes)
  2. data/Makefile (4 changes)
  3. data/charwidths.jl (2 changes)
  4. utf8proc.c (2 changes)
  5. utf8proc_data.c (11072 changes)

2
README.md

@@ -60,7 +60,7 @@ The C library is found in this directory after successful compilation
 and is named `libutf8proc.a` (for the static library) and
 `libutf8proc.so` (for the dynamic library).
-The Unicode version supported is 12.1.0.
+The Unicode version supported is 13.0.0.
 For Unicode normalizations, the following options are used:

4
data/Makefile

@@ -22,10 +22,10 @@ CharWidths.txt: charwidths.jl EastAsianWidth.txt
 	$(JULIA) charwidths.jl > $@
 # Unicode data version (must also update utf8proc_unicode_version function)
-UNICODE_VERSION=12.1.0
+UNICODE_VERSION=13.0.0
 # Unicode emoji version (managed separately from UNICODE_VERSION)
-UNICODE_EMOJI_VERSION=12.0
+UNICODE_EMOJI_VERSION=13.0
 UnicodeData.txt:
 	$(CURL) $(CURLFLAGS) -o $@ -O http://www.unicode.org/Public/$(UNICODE_VERSION)/ucd/UnicodeData.txt

2
data/charwidths.jl

@@ -60,7 +60,7 @@ zerowidth = Set{Int}() # categories that may contain zero-width chars
 push!(zerowidth, UTF8PROC_CATEGORY_MN)
 push!(zerowidth, UTF8PROC_CATEGORY_MC)
 push!(zerowidth, UTF8PROC_CATEGORY_ME)
-push!(zerowidth, UTF8PROC_CATEGORY_SK)
+# push!(zerowidth, UTF8PROC_CATEGORY_SK) # see issue #167
 push!(zerowidth, UTF8PROC_CATEGORY_ZL)
 push!(zerowidth, UTF8PROC_CATEGORY_ZP)
 push!(zerowidth, UTF8PROC_CATEGORY_CC)
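
To illustrate the effect of this hunk, here is a minimal C sketch of the category test after the change. It is not code from utf8proc: the `category_t` enum and `may_be_zero_width` are hypothetical names, and only the categories visible in this hunk are listed (the full script may include others). The point is that Sk (modifier symbols) no longer counts as a candidate for zero width.

```c
#include <stdbool.h>

/* Illustrative category codes only; the real library defines its own
 * UTF8PROC_CATEGORY_* constants in utf8proc.h. */
typedef enum {
  CAT_LU, /* uppercase letter, included as a non-zero-width example */
  CAT_MN, CAT_MC, CAT_ME, /* marks */
  CAT_SK,                 /* modifier symbol */
  CAT_ZL, CAT_ZP,         /* line and paragraph separators */
  CAT_CC                  /* control characters */
} category_t;

/* Returns true if characters in this category may be zero-width,
 * mirroring the set built in the hunk above after Sk was commented out. */
bool may_be_zero_width(category_t cat) {
  switch (cat) {
    case CAT_MN:
    case CAT_MC:
    case CAT_ME:
    case CAT_ZL:
    case CAT_ZP:
    case CAT_CC:
      return true;
    default:
      /* CAT_SK falls through to here: excluded, see issue #167. */
      return false;
  }
}
```

With this sketch, `may_be_zero_width(CAT_SK)` is false while `may_be_zero_width(CAT_MN)` remains true.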

2
utf8proc.c

@@ -101,7 +101,7 @@ UTF8PROC_DLLEXPORT const char *utf8proc_version(void) {
 }
 UTF8PROC_DLLEXPORT const char *utf8proc_unicode_version(void) {
-  return "12.1.0";
+  return "13.0.0";
 }
 UTF8PROC_DLLEXPORT const char *utf8proc_errmsg(utf8proc_ssize_t errcode) {
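
Because `utf8proc_unicode_version` reports the version as a plain string, a caller that depends on Unicode 13 data might want to check it at runtime. The following is a minimal sketch of such a check. The helper `unicode_version_at_least` is a hypothetical name, not part of the utf8proc API, and it compares only the major and minor components.

```c
#include <stdio.h>

/* Hypothetical helper (not part of utf8proc): returns 1 if a dotted
 * version string such as "13.0.0" is at least major.minor, and 0
 * otherwise, including when the string is NULL or cannot be parsed. */
int unicode_version_at_least(const char *version, int major, int minor) {
  int have_major = 0, have_minor = 0;

  if (version == NULL) return 0;

  /* At least the major component must parse; a missing minor
   * component is treated as 0. */
  if (sscanf(version, "%d.%d", &have_major, &have_minor) < 1) return 0;

  if (have_major != major) return have_major > major;
  return have_minor >= minor;
}
```

In a program linked against utf8proc, the call would look like `unicode_version_at_least(utf8proc_unicode_version(), 13, 0)`, which should be true for a build that includes this commit and false for a build that still returns "12.1.0".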

11072
utf8proc_data.c

File diff suppressed because it is too large