Wednesday, July 24, 2013

Variable Names and Unicode

Even though you may be entering your Java programs in an environment that stores ASCII characters, all Java source code is in Unicode. Although the original source code that you create may be ASCII, it is converted to Unicode characters internally, before it is compiled. While you can write any Java language statement using ASCII, the fact that Java supports Unicode provides you with immense flexibility. It means that the identifiers that you use in your source program can use any national language character set that is defined within the Unicode character set, so your programs can use French, Greek, or Russian variable names, for example, or even names in several different languages, as long as you have the means to enter them in the first place. The same applies to character data that your program defines.
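Here is a minimal sketch of what such names can look like. The class UnicodeNames and its methods are made up for illustration. I have written the non-ASCII letters as Unicode escapes so that the file itself stays plain ASCII; typing the characters directly should work just as well, provided the compiler reads your file with the right encoding.

```java
public class UnicodeNames {
    // The escape \u03c0 stands for the Greek letter pi, so the local variable
    // below is literally named with that one Greek letter.
    static double circleArea(double radius) {
        double \u03c0 = Math.PI;
        return \u03c0 * radius * radius;
    }

    // The escapes spell the French word for "pupils"
    // (e-acute, l, e-grave, v, e, s).
    static int classSize() {
        int \u00e9l\u00e8ves = 30;
        return \u00e9l\u00e8ves;
    }

    public static void main(String[] args) {
        // Character data can use the same escapes: this is capital omega.
        char omega = '\u03a9';
        System.out.println(circleArea(2.0));
        System.out.println(classSize());   // prints 30
        System.out.println((int) omega);   // prints the code value 937
    }
}
```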

Tuesday, July 23, 2013

Naming Your Variables

The name that you choose for a variable, or indeed the name that you choose for anything in Java, is called an identifier. An identifier can be any length, but it must start with a letter, an underscore (_), or a dollar sign ($). The rest of an identifier can contain letters, digits, underscores, and dollar signs, but not the characters used as operators in Java (such as +, -, or *). You will generally be better off if you stick to letters, digits, and the underscore character.

Java is case-sensitive, so the names republican and Republican are not the same. You must not include blanks or tabs in the middle of a name, so Betty May is out but you could have BettyMay or even Betty_May. Note that you can’t have 6Pack as a name since you cannot start a name with a numeric digit. Of course, you could use sixPack as an alternative.

Subject to the restrictions I have mentioned, you can name a variable almost anything you like, except for two additional restraints - you can’t use keywords in Java as a name for something, and a name can’t be anything that could be interpreted as a constant value - as a literal, in other words. Keywords are words that are an essential part of the Java language. You saw some keywords in the previous chapter, and you will learn a few more in this chapter. If you’d like to know what they all are now, see the complete list in Appendix A. The restriction on constant values is there because, although it is obvious why a name can’t be 1234 or 37.5, constants can also be alphabetic, such as true and false, for example, which are literals of type boolean. Of course, the basic reason for these rules is that the compiler has to be able to distinguish between your variables and other things that can appear in a program. If you try to use a name for a variable that makes this impossible, then it’s not a legal name.
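If you want to try the naming rules out, the following sketch checks a few of the names mentioned above. The class IdentifierCheck and its two methods are my own illustration, not part of any library. It leans on the standard methods Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart() for the character rules, and on SourceVersion.isKeyword() from javax.lang.model, which reports keywords as well as the literals true, false, and null.

```java
import javax.lang.model.SourceVersion;

public class IdentifierCheck {
    // Returns true if name follows the character rules for an identifier:
    // a legal first character followed only by legal "part" characters.
    static boolean hasLegalCharacters(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // A usable name must also not be a keyword or a literal such as true.
    static boolean isLegalName(String name) {
        return hasLegalCharacters(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        String[] candidates = {
            "BettyMay", "Betty_May", "Betty May", "6Pack", "sixPack",
            "$cost", "a+b", "while", "true"
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " : " + isLegalName(candidate));
        }
    }
}
```

With these checks, BettyMay, Betty_May, sixPack, and $cost come out as legal, while Betty May, 6Pack, a+b, while, and true do not.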

Data and Variables

A variable is a named piece of memory that you use to store information in your Java program - a piece of data of some description. Each named piece of memory that you define in your program is able to store data only of one particular type. If you define a variable to store integers, for example, you can’t use it to store a value that is a decimal fraction, such as 0.75. If you’ve defined a variable that you use to refer to a Hat object, you can only use it to reference an object of type Hat. Since the type of data that each variable can store is fixed, the compiler can verify that each variable you define in your program is not used in a manner or a context that is inappropriate to its type. If a method in your program is supposed to process integers, the compiler will be able to detect when you inadvertently try to use the method with some other kind of data - for example, a string or a numerical value that is not integral.

Explicit data values that appear in your program are called literals. Each literal will also be of a particular type: 25, for example, is an integer literal of type int. I will go into the characteristics of the various types of literals that you can use as I discuss each variable type.
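To make the type checking concrete, here is a small sketch. The class TypedVariables and the method doubleIt() are names I have invented for the example. The lines that the compiler would reject are left as comments so that the rest compiles.

```java
public class TypedVariables {
    // Accepts only int arguments; the compiler rejects anything else.
    static int doubleIt(int value) {
        return 2 * value;
    }

    public static void main(String[] args) {
        int count = 25;          // 25 is an integer literal of type int
        double fraction = 0.75;  // 0.75 is a literal of type double

        // int wrong = 0.75;     // would not compile: a double cannot be stored in an int
        // doubleIt("25");       // would not compile: a String is not an int
        // doubleIt(0.75);       // would not compile: 0.75 is not integral

        System.out.println(doubleIt(count)); // prints 50
        System.out.println(fraction);        // prints 0.75
    }
}
```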

Before you can use a variable you must specify its name and type in a declaration statement. Before I describe how you write a declaration for a variable, let’s consider what flexibility you have in choosing a name.

Java and Unicode

Programming to support languages that use anything other than the Latin character set has always been a major problem. There are a variety of 8-bit character sets defined for many national languages, but if you want to combine the Latin character set and Cyrillic in the same context, for example, things can get difficult. If you want to handle Japanese as well, it becomes impossible with an 8-bit character set because with 8 bits you have only 256 different codes so there just aren’t enough character codes to go round. Unicode is a standard character set that was developed to allow the characters necessary for almost all languages to be encoded. It uses a 16-bit code to represent a character (so each character usually occupies 2 bytes), and with 16 bits up to 65,535 non-zero character codes can be distinguished. With so many character codes available, there are enough to allocate each major national character set its own set of codes, including character sets such as Kanji, which is used for Japanese and which requires thousands of character codes. It doesn’t end there though. Unicode supports three encoding forms that allow up to a million additional characters to be represented.

I say each Unicode character usually occupies 2 bytes because Java supports Unicode 4.0, which defines supplementary characters that are encoded as a pair of 16-bit code units called surrogates, 32 bits in all. You might think that the set of 64K characters that you can represent with 16 bits would be sufficient, but it isn’t. Far-eastern languages such as Japanese, Korean, and Chinese alone involve more than 70,000 ideographs, and surrogates are used to represent characters that are not contained within the basic multilingual set that is defined by 16-bit characters.
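You can see the difference between 16-bit characters and surrogate pairs from within a program. The sketch below is my own illustration (the class name SurrogateDemo and the helper asString() are invented); it compares the Greek capital omega, which fits in 16 bits, with the ideograph at code U+20000, which does not.

```java
public class SurrogateDemo {
    // U+20000 is a CJK ideograph outside the 16-bit basic multilingual set.
    static final int SUPPLEMENTARY = 0x20000;

    // Builds a String holding the single character with the given code.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String basic = asString(0x03A9);          // Greek capital omega
        String beyond = asString(SUPPLEMENTARY);

        System.out.println(basic.length());       // prints 1
        System.out.println(beyond.length());      // prints 2
        System.out.println(beyond.codePointCount(0, beyond.length()));   // prints 1
        System.out.println(Character.isHighSurrogate(beyond.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(beyond.charAt(1)));  // prints true
    }
}
```

Note that length() counts 16-bit units, not characters, which is why the second string reports a length of 2 even though it holds one character.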

Java Applications

Every Java application contains a class that defines a method called main(). The name of this class is the name that you use as the argument to the Java interpreter when you run the application. You can call the class whatever you want, but the method which is executed first in an application is always called main(). When you run your Java application, the method main() will typically cause methods belonging to other classes to be executed, but the simplest possible Java application program consists of one class containing just the method main(). As you will see below, the main() method has a particular fixed form, and if it is not of the required form, it will not be recognized by the Java interpreter as the method where execution starts.

You can see how this works by taking a look at just such a Java program. You need to enter the program code using your favorite plaintext editor, or if you have a Java development system with an editor, you can enter the code for the example using that. When you have entered the code, save the file with the same name as that used for the class and with the extension .java. For this example the file name will be OurFirstProgram.java. The program consists of a definition for a class I have called OurFirstProgram. The class definition contains only one method, the method main(). The first line of the definition for the method main() is always of the form:

public static void main(String[] args)
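Putting together the pieces described so far, the whole source file can be sketched like this. I have assembled it from the description here: a class named OurFirstProgram, containing only main(), with a single output statement.

```java
public class OurFirstProgram {
    // Execution of the application starts here.
    public static void main(String[] args) {
        System.out.println("Krakatoa, EAST of Java??");
    }
}
```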

The code for the method appears between the pair of curly braces. This version of the method has only one executable statement:

System.out.println("Krakatoa, EAST of Java??");

So what does this statement do? Let’s work through it from left to right:

[1] System is the name of a standard class that contains objects that encapsulate the standard I/O devices for your system - the keyboard for command-line input and command-line output to the display. It is contained in the package java.lang, so it is always accessible just by using the simple class name System.

[2] The object out represents the standard output stream - the command line on your display screen - and is a data member of the class System. The member, out, is a special kind of member of the System class. Like the method main() in our OurFirstProgram class, it is static. This means that out exists even though there are no objects of type System. Using the class name, System, separated from the member name out by a period - System.out - references the out member.

[3] The bit at the rightmost end of the statement, println("Krakatoa, EAST of Java??"), calls the println() method that belongs to the object out, and that outputs the text string that appears between the parentheses to your display. This demonstrates one way in which you can call a method - by using the object name followed by the method name, with a period separating them. The stuff between the parentheses following the name of a method is information that is passed to the method when it is executed. As we said, for println() it is the text we want to output to the command line.

You can compile this program using the JDK compiler with the command:

javac OurFirstProgram.java
or with the -classpath option specified:
javac -classpath . OurFirstProgram.java

If it didn’t compile, there’s something wrong somewhere. Here’s a checklist of possible sources of the problem:

[4] You forgot to include the path to the jdk1.5.0\bin directory in your PATH, or maybe you did not specify the path correctly. This will result in your operating system not being able to find the javac compiler that is in that directory.

[5] You made an error typing in the program code. Remember Java is case-sensitive, so OurfirstProgram is not the same as OurFirstProgram, and of course, there must be no spaces in the class name. If the compiler discovers an error, it will usually identify the line number in the code where the error was found. In general, watch out for confusing zero, 0, with a small letter o, or the digit one, 1, with the small letter l. All characters such as periods, commas, and semicolons in the code are essential and must be in the right place. Parentheses, (), curly braces, {}, and square brackets, [], always come in matching pairs and are not interchangeable.

[6] The source file name must match the class name exactly. The slightest difference will result in an error. It must have the extension .java.

Once you have compiled the program successfully, you can execute it with the command:

java -ea OurFirstProgram

The -ea option is not strictly necessary since this program does not use assertions, but if you get used to putting it in, you won’t forget it when it is necessary. If you need the -classpath option specified:

java -ea -classpath . OurFirstProgram
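If you are curious whether the -ea option took effect, a short program can tell you. This is only a sketch, and the class name AssertionStatus is my own invention. It relies on the fact that the expression in an assert statement is evaluated only when assertions are enabled.

```java
public class AssertionStatus {
    // The assignment inside the assert runs only when assertions are
    // enabled, for example by the -ea option, so enabled stays false otherwise.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("Assertions enabled: " + assertionsEnabled());
    }
}
```

Running it once with java -ea AssertionStatus and once without the option should show the two different results.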

Assuming the source file compiled correctly, and the jdk1.5.0\bin directory is defined in your path, the most common reason for the program failing to execute is a typographical error in the class name, OurFirstProgram. The second most common reason is writing the file name, OurFirstProgram.class, in the command, whereas it should be just the class name, OurFirstProgram.