The line separator is the name
given to the character or characters used to separate lines of text, and varies
from platform to platform. On Windows, it is the CR character (carriage return)
followed by the LF character (linefeed). On UNIX, it is the LF character alone,
often referred to as the newline character. The following program passes this
character to println. What does it print? Is its behavior platform
dependent?
public class LinePrinter {
    public static void main(String[] args) {
        // Note: \u000A is Unicode representation of linefeed (LF)
        char c = 0x000A;
        System.out.println(c);
    }
}
Solution 16: Line Printer
The behavior of
this program is platform independent: It won't compile on any platform. If you
tried to compile it, you got an error message that looks something like
this:
LinePrinter.java:3: ';' expected
// Note: \u000A is Unicode representation of linefeed (LF)
                           ^
1 error
If you are like most people, this message did not help to
clarify matters.
The key to this puzzle is the comment on the third line of the
program. Like the best of comments, this one is true. Unfortunately, this one is
a bit too true. The compiler not only translates Unicode escapes into the
characters they represent before it parses a program into tokens (Puzzle 14), but it does
so before discarding comments and white space [JLS 3.2].
This program contains a single Unicode escape
(\u000A), located in its sole comment. As the comment tells you, this
escape represents the linefeed character, and the compiler duly translates it
before discarding the comment. Unfortunately, this linefeed character is the
first line terminator after the two slash
characters that begin the comment (//) and so terminates the comment
[JLS 3.4]. The words following the escape (is Unicode representation of
linefeed (LF)) are therefore not part of the comment; nor are they
syntactically valid.
To make this more concrete, here is what the program looks like
after the Unicode escape has been translated into the character it
represents:
public class LinePrinter {
    public static void main(String[] args) {
        // Note:
 is Unicode representation of linefeed (LF)
        char c = 0x000A;
        System.out.println(c);
    }
}
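The same translation order can be seen in a program that does compile. The following is a minimal sketch, not part of the original puzzle; the class name EscapeInComment and the method message are made up for illustration. Because the escape is translated into a linefeed before comments are discarded, the assignment that appears to sit inside the comment is actually compiled and executed.

```java
public class EscapeInComment {
    // Hypothetical helper: returns whichever value s ends up holding.
    static String message() {
        String s = "comment stayed intact";
        // The escape on this line ends the comment early: \u000A s = "escape ended the comment";
        return s;
    }

    public static void main(String[] args) {
        // Prints: escape ended the comment
        System.out.println(message());
    }
}
```

To a casual reader the assignment looks commented out, which is exactly why escapes in comments are hazardous.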
The easiest way to fix the program is to remove the Unicode
escape from the comment, but a better way is to initialize c with an
escape sequence instead of a hex integer literal, obviating the need for the
comment:
public class LinePrinter {
    public static void main(String[] args) {
        char c = '\n';
        System.out.println(c);
    }
}
Once this has been done, the program will compile and run, but
it's still a questionable program. It is platform dependent for exactly the
reason suggested in the puzzle: println appends the platform's line separator
to the LF character it is given. On certain platforms, such as UNIX, it will
print two complete line separators (LF LF); on others, such as Windows, it
won't, printing LF followed by CR LF instead.
Although the output may look the same to the naked eye, it could easily cause
problems if it were saved in a file or piped to another program for subsequent
processing.
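One way to see the difference without inspecting a file by hand is to capture what println writes and compare it with two copies of the platform separator. This is only a sketch; the class name SeparatorCheck and its helper methods are assumptions introduced here, and System.lineSeparator() requires Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class SeparatorCheck {
    // Captures exactly the characters that println('\n') writes.
    static String printedByPrintln() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        out.println('\n');
        out.flush();
        return buffer.toString();
    }

    // True only if the text is two complete platform line separators.
    static boolean isTwoSeparators(String printed) {
        String sep = System.lineSeparator();
        return printed.equals(sep + sep);
    }

    public static void main(String[] args) {
        String printed = printedByPrintln();
        // The result depends on the platform the program runs on.
        System.out.println("two complete separators? " + isTwoSeparators(printed));
    }
}
```

The check should report true where the separator is LF alone and false where it is CR followed by LF.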
If you want to print two blank lines, you should invoke
println twice. As of release 5.0, you can use printf instead
of println, with the format string "%n%n". Each occurrence of
the characters %n will cause printf to print the appropriate
platform-specific line separator.
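As a brief illustration of both suggestions, here is a sketch that prints two blank lines each way. The class name TwoBlankLines and the helper twoSeparators are hypothetical; String.format is used in the helper only so the result can be inspected, and it handles %n the same way printf does.

```java
public class TwoBlankLines {
    // Builds the text that printf("%n%n") would print.
    static String twoSeparators() {
        return String.format("%n%n");
    }

    public static void main(String[] args) {
        // Option 1: invoke println twice.
        System.out.println();
        System.out.println();

        // Option 2: let printf expand each %n to the platform separator.
        System.out.printf("%n%n");
    }
}
```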
Hopefully, the last three puzzles have convinced you that
Unicode escapes can be thoroughly confusing. The lesson is simple: Avoid Unicode escapes except where they are truly
necessary. They are rarely necessary.