com.microsoft.tfs.core.util
Class FileEncodingDetector

java.lang.Object
  extended by com.microsoft.tfs.core.util.FileEncodingDetector

public final class FileEncodingDetector
extends java.lang.Object

Static methods that assist in detecting the text encoding of files on disk. Detection always returns an encoding represented by a known FileEncoding instance (which can be queried for its text name and code page number).

Since:
TEE-SDK-10.1
Thread-safety:
immutable

Method Summary
static FileEncoding detectEncoding(java.lang.String path, FileEncoding encodingHint)
          Detects the encoding used for a server or local path, guided by an encoding hint.
protected static boolean looksLikeANSI(byte[] bytes, int limit)
          Tests whether the given byte array looks like an ANSI text file with the default text encoding, i.e. can be decoded with the current ANSI character set.
protected static boolean looksLikeEBCDIC(byte[] bytes, int limit)
          Tests whether the given byte array looks like an EBCDIC text file (contains character values that would be present in an EBCDIC text file without the control characters that would not).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

detectEncoding

public static FileEncoding detectEncoding(java.lang.String path,
                                          FileEncoding encodingHint)
Detects the encoding used for a server or local path, guided by an encoding hint. For server paths, the contents are never read and only the hint is used.

The FileEncoding.AUTOMATICALLY_DETECT encoding hint is only valid for local (not server) paths that do not contain wildcard characters, exist on disk, and are files (not directories). If it is specified for other kinds of paths, an exception is thrown.

Encoding hints are evaluated in the following way:

Parameters:
path - the path to detect encoding for (must not be null)
encodingHint - the encoding hint (must not be null)
Returns:
the FileEncoding that matches the given file's encoding.
Throws:
TECoreException - if the specified encoding hint is not valid for the type of path given
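
The conditions under which FileEncoding.AUTOMATICALLY_DETECT is valid can be sketched as a standalone check. This is a minimal illustration, not the SDK's implementation: the class AutoDetectPrecondition and its methods are hypothetical, and it assumes that server paths begin with "$/" and that the wildcard characters are '*' and '?'.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical helper (not part of the SDK) mirroring the documented preconditions. */
final class AutoDetectPrecondition {
    private AutoDetectPrecondition() {}

    /** Assumption: TFS server paths start with "$/". */
    static boolean isServerPath(String path) {
        return path.startsWith("$/");
    }

    /** Assumption: '*' and '?' are the wildcard characters. */
    static boolean hasWildcard(String path) {
        return path.indexOf('*') >= 0 || path.indexOf('?') >= 0;
    }

    /**
     * Returns true only for a local path without wildcards that exists on disk
     * and is a regular file (not a directory).
     */
    static boolean canAutoDetect(String path) {
        if (path == null) {
            throw new NullPointerException("path must not be null");
        }
        if (isServerPath(path) || hasWildcard(path)) {
            return false;
        }
        try {
            // isRegularFile is false for missing paths and for directories.
            return Files.isRegularFile(Paths.get(path));
        } catch (InvalidPathException e) {
            return false;
        }
    }
}
```

A caller could run such a check before passing FileEncoding.AUTOMATICALLY_DETECT to detectEncoding, and fall back to an explicit encoding hint when it returns false, instead of handling the resulting TECoreException.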

looksLikeANSI

protected static boolean looksLikeANSI(byte[] bytes,
                                       int limit)
Tests whether the given byte array looks like an ANSI text file with the default text encoding, i.e. can be decoded with the current ANSI character set. In multi-byte character sets (Japanese, for example) the byte array might not convert completely, because it can end with a truncated multi-byte character. Such files are still accepted as ANSI if the unconverted remainder of the array is short enough.

Parameters:
bytes - the bytes to check for ANSI-ness (must not be null)
limit - the maximum number of bytes to read.
Returns:
true if the given bytes look like part of an ANSI text file, false if they do not (because they contain control characters or other patterns).
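
The tolerance for a truncated trailing character can be illustrated with the standard java.nio.charset decoder. The sketch below is an assumption about how such a check might work, not the SDK's code: the class AnsiHeuristicSketch, the method looksDecodable, the threshold MAX_TRAILING_BYTES, and the set of permitted control characters are all hypothetical. The charset is passed explicitly; Charset.defaultCharset() would be one possible stand-in for the "current ANSI character set".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

/** Hypothetical sketch of an "ANSI-ness" check; not the SDK's implementation. */
final class AnsiHeuristicSketch {
    /** Assumed tolerance for undecoded trailing bytes. */
    static final int MAX_TRAILING_BYTES = 3;

    private AnsiHeuristicSketch() {}

    static boolean looksDecodable(byte[] bytes, int limit, Charset charset) {
        int length = Math.max(0, Math.min(limit, bytes.length));
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes, 0, length);
        int capacity = (int) Math.ceil(length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        // endOfInput = false: an incomplete trailing sequence is left in the
        // input buffer instead of being reported as malformed.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            return false; // malformed or unmappable bytes in the middle
        }
        if (in.remaining() > MAX_TRAILING_BYTES) {
            return false; // unconverted remainder is too long
        }

        // Assumption: control characters other than common whitespace
        // indicate binary content.
        out.flip();
        while (out.hasRemaining()) {
            char c = out.get();
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f') {
                return false;
            }
        }
        return true;
    }
}
```

Passing false for endOfInput is the design choice that matters here: it lets the decoder distinguish a character cut off at the end of the buffer from genuinely invalid bytes earlier in the data.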

looksLikeEBCDIC

protected static boolean looksLikeEBCDIC(byte[] bytes,
                                         int limit)
Tests whether the given byte array looks like an EBCDIC text file (contains character values that would be present in an EBCDIC text file without the control characters that would not).

Parameters:
bytes - the bytes to check for EBCDIC-ness (must not be null)
limit - the maximum number of bytes to read.
Returns:
true if the given bytes look like part of an EBCDIC text file, false if they do not (because they contain control characters or other patterns).
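
A byte-range heuristic of this kind might look like the following sketch. It is not the SDK's algorithm: the class EbcdicHeuristicSketch, its methods, and the "at least half" threshold are hypothetical. The byte values assume the common EBCDIC layout (as in code page 037), where space is 0x40, letters and digits occupy fixed ranges above 0x80, and values below 0x40 are control codes.

```java
/** Hypothetical sketch of an EBCDIC text check; not the SDK's implementation. */
final class EbcdicHeuristicSketch {
    private EbcdicHeuristicSketch() {}

    /** Control codes expected in text: HT (0x05), CR (0x0D), NL (0x15), LF (0x25). */
    static boolean isTextControl(int b) {
        return b == 0x05 || b == 0x0D || b == 0x15 || b == 0x25;
    }

    /** Space, lowercase letters, uppercase letters, and digits in EBCDIC. */
    static boolean isCoreTextByte(int b) {
        return b == 0x40
                || (b >= 0x81 && b <= 0x89) || (b >= 0x91 && b <= 0x99) || (b >= 0xA2 && b <= 0xA9)
                || (b >= 0xC1 && b <= 0xC9) || (b >= 0xD1 && b <= 0xD9) || (b >= 0xE2 && b <= 0xE9)
                || (b >= 0xF0 && b <= 0xF9);
    }

    static boolean looksLikeEbcdicText(byte[] bytes, int limit) {
        int length = Math.max(0, Math.min(limit, bytes.length));
        int core = 0;
        for (int i = 0; i < length; i++) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned
            if (b < 0x40 && !isTextControl(b)) {
                return false; // a control code that text files would not contain
            }
            if (isCoreTextByte(b)) {
                core++;
            }
        }
        // Assumed threshold: at least half of the bytes are letters, digits, or spaces.
        return core * 2 >= length;
    }
}
```

The second condition keeps plain ASCII text from passing: ASCII letters fall in 0x41 to 0x7A, outside the EBCDIC letter ranges, and ASCII space (0x20) and line feed (0x0A) are rejected as EBCDIC control codes.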


© 2015 Microsoft. All rights reserved.