java.lang.Object
  com.microsoft.tfs.core.util.CodePageMapping
public class CodePageMapping
Important: Read the Default Endian Note section before using this class.
CodePageMapping implements a mapping of code pages to Java Charsets. This mapping is needed because TFS stores file encoding information as code page numbers. To make use of the encoding information from Java, we need to translate a code page into an appropriate Java Charset to use.
Each code page maps to 0 or more canonical charset names. If a code page maps to more than one charset name, the names are tried in sequence until one is found that is a valid charset in the current Java virtual machine.
Each canonical charset name maps to 0 or 1 code page integers. If a charset name maps to a code page integer, that code page is considered the best approximation for that charset.
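The "try the names in sequence" rule for a code page with several charset names can be sketched with the standard java.nio.charset API alone. This is a minimal illustration, not the class's actual implementation: the class name FirstSupportedCharset and the helper firstSupported are hypothetical, and the candidate list is the code page 949 example from this page.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class FirstSupportedCharset {
    /**
     * Returns the first candidate name that this JVM supports as a charset,
     * or null if none of the candidates is supported.
     * (Hypothetical helper; not part of CodePageMapping.)
     */
    static Charset firstSupported(List<String> candidateNames) {
        for (String name : candidateNames) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal (or null) name: skip it and try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Candidates for code page 949, tried in order.
        List<String> names = Arrays.asList("x-windows-949", "x-IBM949", "x-IBM949C");
        // The result depends on which charsets the running JVM provides.
        System.out.println(firstSupported(names));
    }
}
```

Which candidate wins depends on the charsets installed in the running JVM, which is why a code page can list more than one name.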
The mappings are based on hardcoded data. The mappings can be added to or overridden at runtime by setting system properties:
codePageMapping.X, where X is the desired code page number to map. The value of the property should be a comma-separated list of charset names to try, in sequence, when mapping the code page.
charsetMapping.X, where X is the charset name. The value of the property should be the code page integer to use when mapping the charset.
For example:
-DcodePageMapping.949=x-windows-949,x-IBM949,x-IBM949C -DcharsetMapping.x-windows-949=949 -DcharsetMapping.x-IBM949=949 -DcharsetMapping.x-IBM949C=949
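The two property formats can be read with plain System.getProperty calls, as in the sketch below. It only illustrates the formats described above. The class MappingPropertySketch and both helpers are hypothetical, and details such as whitespace trimming, or when the real class reads the properties, are assumptions this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

public class MappingPropertySketch {
    /** Parses codePageMapping.X into an ordered list of charset names (empty if unset). */
    static List<String> charsetNamesFor(int codePage) {
        List<String> names = new ArrayList<>();
        String value = System.getProperty("codePageMapping." + codePage);
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String name = part.trim(); // trimming is an assumption of this sketch
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Parses charsetMapping.X into a code page, or 0 if unset or not an integer. */
    static int codePageFor(String charsetName) {
        String value = System.getProperty("charsetMapping." + charsetName);
        if (value == null) {
            return 0;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Equivalent to passing -DcodePageMapping.949=... and -DcharsetMapping.x-IBM949=949.
        System.setProperty("codePageMapping.949", "x-windows-949,x-IBM949,x-IBM949C");
        System.setProperty("charsetMapping.x-IBM949", "949");
        System.out.println(charsetNamesFor(949));
        System.out.println(codePageFor("x-IBM949"));
    }
}
```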
Default Endian Note
Java and Windows assume opposite byte orders when the endian-unspecified encoding names "UTF-16" and "UTF-32" are used for encoding and decoding text.
As a Java Charset name, "UTF-16" and "UTF-32" mean "read big-endian if no BOM, always write big-endian". The Unicode Standard specifies this behavior in Section 3.10 (Unicode Encoding Schemes), item D98 (D101 specifies the same behavior for UTF-32):
D98: "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian."
However, Windows doesn't follow the standard when these names are used. It assumes "read little-endian if no BOM, always write little-endian".
In this class, Windows code page 1201 (aka "Unicode (Big-Endian)", "unicodeFFFE") is mapped to the Java Charset name "UTF-16" which triggers big-endian behavior with readers/writers. Correspondingly, if Java tells us a reader/writer is in "UTF-16" encoding, we want to tell TFS that we're using Windows code page 1201. "UTF-32" works similarly.
Additionally, Windows code page 1200 (aka "Unicode", "utf-16"; little-endian assumed) must map from/to the explicit-endian Java Charset name "UTF-16LE". Make sure to specify the endian-explicit "UTF-16LE" Java Charset (or "UTF-32LE") if you mean little-endian.
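The Java side of this behavior is easy to observe with the standard library. The sketch below (the class name Utf16EndianDemo is illustrative) encodes one character with the endian-unspecified and the endian-explicit charsets, then decodes BOM-less bytes with "UTF-16".

```java
import java.nio.charset.StandardCharsets;

public class Utf16EndianDemo {
    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041

        // "UTF-16" writes a big-endian BOM followed by big-endian code units.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        // The endian-explicit charsets write no BOM.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41

        // Without a BOM, "UTF-16" decodes as big-endian.
        byte[] noBom = {0x00, 0x41};
        System.out.println(new String(noBom, StandardCharsets.UTF_16));    // A
    }
}
```

Bytes written by a little-endian Windows writer without a BOM would therefore be misread by a Java "UTF-16" reader, which is why "UTF-16LE" must be named explicitly.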
See Also:
Charset
Nested Class Summary | |
---|---|
static class | CodePageMapping.UnknownCodePageException: An exception thrown to indicate that a code page specified as an argument to a CodePageMapping method was unknown to that class. |
static class | CodePageMapping.UnknownEncodingException: An exception thrown to indicate that either a Charset or the name of an encoding specified as an argument to a CodePageMapping method was unknown to that class. |
Constructor Summary | |
---|---|
CodePageMapping() |
Method Summary | |
---|---|
static java.nio.charset.Charset | getCharset(int codePage): Translates the specified code page into a Charset. |
static java.nio.charset.Charset | getCharset(int codePage, boolean mustExist): Translates the specified code page into a Charset. |
static java.nio.charset.Charset[] | getCharsets(): Gets a list of charsets that are mappable to code pages. |
static int | getCodePage(java.nio.charset.Charset charset): Translates the specified Charset into a code page. |
static int | getCodePage(java.nio.charset.Charset charset, boolean mustExist): Translates the specified Charset into a code page. |
static int | getCodePage(java.lang.String encoding): Translates the specified encoding into a code page. |
static int | getCodePage(java.lang.String encoding, boolean mustExist): Translates the specified encoding into a code page. |
static int[] | getCodePages(): Gets a list of code pages that are mappable to charsets. |
static java.lang.String | getEncoding(int codePage): Translates the specified code page into an encoding. |
static java.lang.String | getEncoding(int codePage, boolean mustExist, boolean mustBeSupportedCharset): Attempts to translate the specified code page into an encoding. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public CodePageMapping()
Method Detail |
---|
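public static java.nio.charset.Charset[] getCharsets()
Gets a list of charsets that are mappable to code pages.

public static int[] getCodePages()
Gets a list of code pages that are mappable to charsets.

public static java.lang.String getEncoding(int codePage)
Translates the specified code page into an encoding. If the code page cannot be translated, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, this method returns a charset name that is supported by this Java virtual machine.
Parameters:
codePage - a code page to translate
Returns:
the encoding for the code page (never null)
Throws:
CodePageMapping.UnknownCodePageException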

public static java.lang.String getEncoding(int codePage, boolean mustExist, boolean mustBeSupportedCharset)
Attempts to translate the specified code page into an encoding.
If the code page does not map to an encoding, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, null is returned.
If the code page maps to an encoding that is not supported by this Java virtual machine, the mustBeSupportedCharset parameter specifies the policy. If mustBeSupportedCharset is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, the non-supported encoding is returned.
Parameters:
codePage - a code page to translate
mustExist - if true, the code page must map to a known encoding
mustBeSupportedCharset - if true, the code page must map to a supported charset in this Java virtual machine
Returns:
the encoding for the code page, which may be unsupported by this Java virtual machine if mustBeSupportedCharset is false and may be null if mustExist is false
Throws:
CodePageMapping.UnknownCodePageException

public static java.nio.charset.Charset getCharset(int codePage)
Translates the specified code page into a Charset. If the code page cannot be translated, a CodePageMapping.UnknownCodePageException is thrown.
Parameters:
codePage - a code page to translate
Returns:
the Charset for the code page (never null)
Throws:
CodePageMapping.UnknownCodePageException

public static java.nio.charset.Charset getCharset(int codePage, boolean mustExist)
Translates the specified code page into a Charset.
If the code page does not map to a Charset, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, null is returned.
Parameters:
codePage - a code page to translate
mustExist - if true, the code page must map to a Charset
Returns:
the Charset for the code page, which may be null if mustExist is false
Throws:
CodePageMapping.UnknownCodePageException
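
public static int getCodePage(java.lang.String encoding)
Translates the specified encoding into a code page. If the encoding cannot be translated, a CodePageMapping.UnknownEncodingException is thrown.
Parameters:
encoding - an encoding to translate (must not be null)
Throws:
CodePageMapping.UnknownEncodingException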

public static int getCodePage(java.lang.String encoding, boolean mustExist)
Translates the specified encoding into a code page.
If the encoding does not map to a code page, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownEncodingException is thrown. Otherwise, 0 is returned. The value 0 is not a valid code page value for TFS.
Parameters:
encoding - an encoding to translate (must not be null)
mustExist - if true, the encoding must map to a code page
Returns:
the code page for the encoding, which may be 0 if mustExist is false
Throws:
CodePageMapping.UnknownEncodingException
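
The mustExist policy shared by these lookups (throw when the flag is true, otherwise return a sentinel) can be shown with a self-contained stand-in. Everything here is hypothetical and is not the library's implementation: the class MustExistPolicySketch, the two-entry table (taken from the UTF-16 mappings in the Default Endian Note), and the use of IllegalStateException in place of CodePageMapping.UnknownEncodingException.

```java
import java.util.HashMap;
import java.util.Map;

public class MustExistPolicySketch {
    // Stand-in table; the two entries mirror the UTF-16 mappings described on this page.
    private static final Map<String, Integer> CODE_PAGES = new HashMap<>();
    static {
        CODE_PAGES.put("UTF-16LE", 1200);
        CODE_PAGES.put("UTF-16", 1201);
    }

    /**
     * Returns the code page for the encoding. If there is no mapping, throws
     * when mustExist is true and returns 0 (not a valid TFS code page) otherwise.
     */
    static int getCodePage(String encoding, boolean mustExist) {
        if (encoding == null) {
            throw new IllegalArgumentException("encoding must not be null");
        }
        Integer codePage = CODE_PAGES.get(encoding);
        if (codePage != null) {
            return codePage;
        }
        if (mustExist) {
            // Stand-in for CodePageMapping.UnknownEncodingException.
            throw new IllegalStateException("unknown encoding: " + encoding);
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(getCodePage("UTF-16LE", true));      // 1200
        System.out.println(getCodePage("no-such-name", false)); // 0
    }
}
```

Callers that pass false for mustExist need to check for 0 before sending the value to TFS.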

public static int getCodePage(java.nio.charset.Charset charset)
Translates the specified Charset into a code page. If the Charset cannot be translated, a CodePageMapping.UnknownEncodingException is thrown.
Parameters:
charset - a Charset to translate (must not be null)
Throws:
CodePageMapping.UnknownEncodingException

public static int getCodePage(java.nio.charset.Charset charset, boolean mustExist)
Translates the specified Charset into a code page.
If the Charset does not map to a code page, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownEncodingException is thrown. Otherwise, 0 is returned. The value 0 is not a valid code page value for TFS.
Parameters:
charset - a Charset to translate (must not be null)
mustExist - if true, the Charset must map to a code page
Returns:
the code page for the Charset, which may be 0 if mustExist is false
Throws:
CodePageMapping.UnknownEncodingException