com.microsoft.tfs.core.util
Class CodePageMapping

java.lang.Object
  extended by com.microsoft.tfs.core.util.CodePageMapping

public class CodePageMapping
extends java.lang.Object

Overview

Important: Read the Default Endian Note section before using this class.

CodePageMapping implements a mapping of code pages to Java Charsets. This mapping is needed because TFS stores file encoding information as code page numbers. To make use of the encoding information from Java, we need to translate a code page into an appropriate Java Charset to use.

Each code page maps to 0 or more canonical charset names. If a code page maps to more than one charset name, the names are tried in sequence until one is found that is a valid charset in the current Java virtual machine.
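The "try each name in sequence" rule described above can be illustrated with plain JDK calls. This is a minimal sketch, not the class's actual implementation: the class name FirstSupportedCharset, the method firstSupported, and the candidate name "x-no-such-charset" are all hypothetical and exist only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Arrays;
import java.util.List;

final class FirstSupportedCharset {
    // Returns the first candidate this JVM supports, or null if none is usable.
    static Charset firstSupported(List<String> candidateNames) {
        for (String name : candidateNames) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalCharsetNameException e) {
                // Syntactically illegal name: skip it and try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "x-no-such-charset" is a made-up name that no JVM is expected to provide.
        List<String> candidates = Arrays.asList("x-no-such-charset", "UTF-8");
        System.out.println(firstSupported(candidates)); // prints UTF-8
    }
}
```

Returning null when nothing matches mirrors the "0 or more" wording: a code page with no usable name simply has no charset in the current virtual machine.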

Each canonical charset name maps to 0 or 1 code page integers. If a charset name maps to a code page integer, that code page is considered the best approximation for that charset.

The mappings are based on hardcoded data. They can be extended or overridden at runtime by setting system properties, for example:
 -DcodePageMapping.949=x-windows-949,x-IBM949,x-IBM949C
 -DcharsetMapping.x-windows-949=949
 -DcharsetMapping.x-IBM949=949
 -DcharsetMapping.x-IBM949C=949
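
The property values above suggest a comma-separated list of charset names in priority order. How the class actually parses these values is not documented here, so the following is only a sketch under that assumption; CodePagePropertySketch and parseCharsetNames are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePagePropertySketch {
    // Splits a comma-separated property value into trimmed, non-empty names.
    static List<String> parseCharsetNames(String propertyValue) {
        List<String> names = new ArrayList<>();
        if (propertyValue == null) {
            return names; // property not set: no override
        }
        for (String part : propertyValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Simulates -DcodePageMapping.949=... for the purposes of this sketch only.
        System.setProperty("codePageMapping.949", "x-windows-949,x-IBM949,x-IBM949C");
        System.out.println(parseCharsetNames(System.getProperty("codePageMapping.949")));
        // prints [x-windows-949, x-IBM949, x-IBM949C]
    }
}
```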
 

Default Endian Note ("UTF-16", "UTF-32")

Java and Windows assume opposite byte orders when the endian-unspecified encoding names "UTF-16" and "UTF-32" are used for encoding and decoding text.

As Java Charset names, "UTF-16" and "UTF-32" mean "read big-endian if no BOM, always write big-endian". The Unicode Standard specifies this behavior in Section 3.10 (Unicode Encoding Schemes), item D98 (D101 specifies the same behavior for UTF-32):

D98: "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian."

However, Windows doesn't follow the standard when these names are used. It assumes "read little-endian if no BOM, always write little-endian".

In this class, Windows code page 1201 (aka "Unicode (Big-Endian)", "unicodeFFFE") is mapped to the Java Charset name "UTF-16" which triggers big-endian behavior with readers/writers. Correspondingly, if Java tells us a reader/writer is in "UTF-16" encoding, we want to tell TFS that we're using Windows code page 1201. "UTF-32" works similarly.

Additionally, Windows code page 1200 (aka "Unicode", "utf-16"; little-endian assumed) must map from/to the explicit-endian Java Charset name "UTF-16LE". Make sure to specify the endian-explicit "UTF-16LE" Java Charset (or "UTF-32LE") if you mean little-endian.
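
The Java side of this behavior can be checked with the standard charsets alone. The sketch below does not use CodePageMapping; Utf16EndianDemo and hex are hypothetical names for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf16EndianDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "UTF-16" writes a big-endian BOM followed by big-endian code units.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        // "UTF-16LE" writes little-endian code units and no BOM.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00
        // Without a BOM, "UTF-16" decodes as big-endian.
        byte[] noBom = {0x00, 0x41};
        System.out.println(new String(noBom, StandardCharsets.UTF_16));   // A
    }
}
```

Data written by a Windows tool as "Unicode" (code page 1200) without a BOM would therefore be misread by the Java "UTF-16" charset, which is why the explicit "UTF-16LE" name matters.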

See Also:
Charset

Nested Class Summary
static class CodePageMapping.UnknownCodePageException
          An exception thrown to indicate that a code page specified as an argument to a CodePageMapping method was unknown to that class.
static class CodePageMapping.UnknownEncodingException
          An exception thrown to indicate that either a Charset or the name of an encoding specified as an argument to a CodePageMapping method was unknown to that class.
 
Constructor Summary
CodePageMapping()
           
 
Method Summary
static java.nio.charset.Charset getCharset(int codePage)
          Translates the specified code page into a Charset.
static java.nio.charset.Charset getCharset(int codePage, boolean mustExist)
           Translates the specified code page into a Charset.
static java.nio.charset.Charset[] getCharsets()
          Gets a list of charsets that are mappable to code pages.
static int getCodePage(java.nio.charset.Charset charset)
          Translates the specified Charset into a code page.
static int getCodePage(java.nio.charset.Charset charset, boolean mustExist)
           Translates the specified Charset into a code page.
static int getCodePage(java.lang.String encoding)
          Translates the specified encoding into a code page.
static int getCodePage(java.lang.String encoding, boolean mustExist)
           Translates the specified encoding into a code page.
static int[] getCodePages()
          Gets a list of code pages that are mappable to charsets.
static java.lang.String getEncoding(int codePage)
          Translates the specified code page into an encoding.
static java.lang.String getEncoding(int codePage, boolean mustExist, boolean mustBeSupportedCharset)
           Attempts to translate the specified code page into an encoding.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CodePageMapping

public CodePageMapping()
Method Detail

getCharsets

public static java.nio.charset.Charset[] getCharsets()
Gets a list of charsets that are mappable to code pages.

Returns:
Known Charsets

getCodePages

public static int[] getCodePages()
Gets a list of code pages that are mappable to charsets.

Returns:
Known Code Pages

getEncoding

public static java.lang.String getEncoding(int codePage)
Translates the specified code page into an encoding. If the code page cannot be translated, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, this method returns a charset name that is supported by this Java virtual machine.

Parameters:
codePage - a code page to translate
Returns:
a valid encoding for the code page (never null)
Throws:
CodePageMapping.UnknownCodePageException

getEncoding

public static java.lang.String getEncoding(int codePage,
                                           boolean mustExist,
                                           boolean mustBeSupportedCharset)

Attempts to translate the specified code page into an encoding.

If the code page does not map to an encoding, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, null is returned.

If the code page maps to an encoding that is not supported by this Java virtual machine, the mustBeSupportedCharset parameter specifies the policy. If mustBeSupportedCharset is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, the unsupported encoding is returned.

Parameters:
codePage - a code page to translate
mustExist - if true, the code page must map to a known encoding
mustBeSupportedCharset - if true, the code page must map to a supported charset in this Java virtual machine
Returns:
an encoding for the code page, which may be unsupported if mustBeSupportedCharset is false and may be null if mustExist is false
Throws:
CodePageMapping.UnknownCodePageException
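
The two-flag policy described for this method can be modeled with a small stand-in. This is a sketch of the documented contract only, not the class's implementation: EncodingPolicySketch, lookup, the two-entry table, and code page 99999 are hypothetical, and IllegalArgumentException stands in for CodePageMapping.UnknownCodePageException.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

final class EncodingPolicySketch {
    // Hypothetical stand-in table; the real class uses its own hardcoded data.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(65001, "UTF-8"); // Windows code page 65001 is UTF-8
        TABLE.put(99999, "x-not-a-real-charset"); // made-up entry for the unsupported path
    }

    static String lookup(int codePage, boolean mustExist, boolean mustBeSupportedCharset) {
        String name = TABLE.get(codePage);
        if (name == null) {
            if (mustExist) {
                throw new IllegalArgumentException("unknown code page: " + codePage);
            }
            return null; // unmapped code page tolerated
        }
        if (mustBeSupportedCharset && !Charset.isSupported(name)) {
            throw new IllegalArgumentException("unsupported charset: " + name);
        }
        return name; // may be unsupported when mustBeSupportedCharset is false
    }

    public static void main(String[] args) {
        System.out.println(lookup(65001, true, true));   // prints UTF-8
        System.out.println(lookup(12345, false, true));  // prints null
        System.out.println(lookup(99999, true, false));  // prints x-not-a-real-charset
    }
}
```

Callers that pass false for either flag take on the corresponding check themselves: a null result, or a name that Charset.forName would reject.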

getCharset

public static java.nio.charset.Charset getCharset(int codePage)
Translates the specified code page into a Charset. If the code page cannot be translated, a CodePageMapping.UnknownCodePageException is thrown.

Parameters:
codePage - a code page to translate
Returns:
a Charset for the code page (never null)
Throws:
CodePageMapping.UnknownCodePageException

getCharset

public static java.nio.charset.Charset getCharset(int codePage,
                                                  boolean mustExist)

Translates the specified code page into a Charset.

If the code page does not map to a Charset, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownCodePageException is thrown. Otherwise, null is returned.

Parameters:
codePage - a code page to translate
mustExist - if true, the code page must map to a Charset
Returns:
a Charset for the code page, which may be null if mustExist is false
Throws:
CodePageMapping.UnknownCodePageException

getCodePage

public static int getCodePage(java.lang.String encoding)
Translates the specified encoding into a code page. If the encoding cannot be translated, a CodePageMapping.UnknownEncodingException is thrown.

Parameters:
encoding - an encoding to translate (must not be null)
Returns:
a code page appropriate for passing to TFS
Throws:
CodePageMapping.UnknownEncodingException

getCodePage

public static int getCodePage(java.lang.String encoding,
                              boolean mustExist)

Translates the specified encoding into a code page.

If the encoding does not map to a code page, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownEncodingException is thrown. Otherwise, 0 is returned. The value 0 is not a valid code page value for TFS.

Parameters:
encoding - an encoding to translate (must not be null)
mustExist - if true, the encoding must map to a code page
Returns:
a code page for the encoding, which may be 0 if mustExist is false
Throws:
CodePageMapping.UnknownEncodingException

getCodePage

public static int getCodePage(java.nio.charset.Charset charset)
Translates the specified Charset into a code page. If the Charset cannot be translated, a CodePageMapping.UnknownEncodingException is thrown.

Parameters:
charset - a Charset to translate (must not be null)
Returns:
a code page appropriate for passing to TFS
Throws:
CodePageMapping.UnknownEncodingException

getCodePage

public static int getCodePage(java.nio.charset.Charset charset,
                              boolean mustExist)

Translates the specified Charset into a code page.

If the Charset does not map to a code page, the mustExist parameter specifies the policy. If mustExist is true, a CodePageMapping.UnknownEncodingException is thrown. Otherwise, 0 is returned. The value 0 is not a valid code page value for TFS.

Parameters:
charset - a Charset to translate (must not be null)
mustExist - if true, the Charset must map to a code page
Returns:
a code page for the Charset, which may be 0 if mustExist is false
Throws:
CodePageMapping.UnknownEncodingException


© 2015 Microsoft. All rights reserved.