Tuesday, July 23, 2013

Java and Unicode

Programming to support languages that use anything other than the Latin character set has always been a major problem. There are a variety of 8-bit character sets defined for many national languages, but if you want to combine the Latin character set and Cyrillic in the same context, for example, things can get difficult. If you want to handle Japanese as well, it becomes impossible with an 8-bit character set, because 8 bits give you only 256 different codes and there just aren’t enough character codes to go round. Unicode is a standard character set that was developed to allow the characters necessary for almost all languages to be encoded. It uses a 16-bit code to represent a character (so each character usually occupies 2 bytes), and with 16 bits up to 65,535 non-zero character codes can be distinguished. With so many character codes available, there are enough to allocate each major national character set its own set of codes, including character sets such as Kanji, which is used for Japanese and requires thousands of character codes. It doesn’t end there though. Unicode supports three encoding forms that allow up to a million additional characters to be represented.

I say each Unicode character usually occupies 2 bytes because Java supports Unicode 4.0, which allows a character to be represented by a pair of 16-bit codes called surrogates, 32 bits in all. You might think that the set of 64K characters that you can represent with 16 bits would be sufficient, but it isn’t. Far-eastern languages such as Japanese, Korean, and Chinese alone involve more than 70,000 ideographs, and surrogate pairs are used to represent characters that are not contained within the Basic Multilingual Plane, the set of characters that is defined by single 16-bit codes.
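To make the idea of surrogates concrete, here is a minimal sketch. The class name SurrogateDemo and the choice of the code point U+20000 (an ideograph outside the 16-bit range) are my own assumptions for illustration; the methods used are standard members of the Character and String classes.

```java
public class SurrogateDemo {
    // U+20000 is a CJK ideograph that lies outside the 16-bit range (illustrative choice).
    static final int CODE_POINT = 0x20000;

    // Builds a String holding the single character identified by codePoint.
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = asString(CODE_POINT);
        // One character, but it needs two 16-bit char values (a surrogate pair).
        System.out.println("char values used: " + s.length());
        System.out.println("characters:       " + s.codePointCount(0, s.length()));
        System.out.println("high surrogate?   " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate?    " + Character.isLowSurrogate(s.charAt(1)));
    }
}
```

The point to notice is that length() counts 16-bit codes, not characters, so code that handles text outside the basic 16-bit set should not assume the two are the same.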

Java Applications

Every Java application contains a class that defines a method called main(). The name of this class is the name that you use as the argument to the Java interpreter when you run the application. You can call the class whatever you want, but the method which is executed first in an application is always called main(). When you run your Java application, the method main() will typically cause methods belonging to other classes to be executed, but the simplest possible Java application program consists of one class containing just the method main(). As you will see below, the main() method has a particular fixed form, and if it is not of the required form, it will not be recognized by the Java interpreter as the method where execution starts.

You can see how this works by taking a look at just such a Java program. You need to enter the program code using your favorite plaintext editor, or if you have a Java development system with an editor, you can enter the code for the example using that. When you have entered the code, save the file with the same name as that used for the class and with the extension .java. For this example the file name will be OurFirstProgram.java. The program consists of a definition for a class I have called OurFirstProgram. The class definition contains only one method, the method main(). The first line of the definition for the method main() is always of the form:

public static void main(String[] args)

The code for the method appears between the pair of curly braces. This version of the method has only one executable statement:

System.out.println("Krakatoa, EAST of Java??");
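Putting these pieces together, the complete source file would look something like the following. This listing is reconstructed from the description above (the class name, the form of main(), and its single statement), so treat the exact layout as an assumption.

```java
// Saved in a file named OurFirstProgram.java
public class OurFirstProgram {
    // Execution of the application starts here.
    public static void main(String[] args) {
        System.out.println("Krakatoa, EAST of Java??");
    }
}
```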

So what does this statement do? Let’s work through it from left to right:

[1] System is the name of a standard class that contains objects that encapsulate the standard I/O devices for your system - the keyboard for command-line input and command-line output to the display. It is contained in the package java.lang, so it is always accessible just by using the simple class name System.

[2] The object out represents the standard output stream - the command line on your display screen - and is a data member of the class System. The member, out, is a special kind of member of the System class. Like the method main() in our OurFirstProgram class, it is static. This means that out exists even though there are no objects of type System. Using the class name, System, separated from the member name out by a period - System.out - references the out member.

[3] The bit at the rightmost end of the statement, println("Krakatoa, EAST of Java??"), calls the println() method that belongs to the object out, and that outputs the text string that appears between the parentheses to your display. This demonstrates one way in which you can call a method that belongs to an object - by using the object name followed by the method name, with a period separating them. The stuff between the parentheses following the name of a method is information that is passed to the method when it is executed. As we said, for println() it is the text we want to output to the command line.

You can compile this program using the JDK compiler with the command:

javac OurFirstProgram.java
or with the -classpath option specified:
javac -classpath . OurFirstProgram.java

If it didn’t compile, there’s something wrong somewhere. Here’s a checklist of possible sources of the problem:

[4] You forgot to include the path to the jdk1.5.0\bin directory in your PATH, or maybe you did not specify the path correctly. This will result in your operating system not being able to find the javac compiler that is in that directory.

[5] You made an error typing in the program code. Remember Java is case-sensitive, so OurfirstProgram is not the same as OurFirstProgram, and of course, there must be no spaces in the class name. If the compiler discovers an error, it will usually identify the line number in the code where the error was found. In general, watch out for confusing zero, 0, with a small letter o, or the digit one, 1, with the small letter l. All characters such as periods, commas, and semicolons in the code are essential and must be in the right place. Parentheses, (), curly braces, {}, and square brackets, [], always come in matching pairs and are not interchangeable.

[6] The source file name must match the class name exactly. The slightest difference will result in an error. It must have the extension .java.

Once you have compiled the program successfully, you can execute it with the command:

java -ea OurFirstProgram

The -ea option is not strictly necessary since this program does not use assertions, but if you get used to putting it in, you won’t forget it when it is necessary. If you need to specify the -classpath option as well, the command is:

java -ea -classpath . OurFirstProgram

Assuming the source file compiled correctly, and the jdk1.5.0\bin directory is defined in your path, the most common reason for the program failing to execute is a typographical error in the class name, OurFirstProgram. The second most common reason is writing the file name, OurFirstProgram.class, in the command, whereas it should be just the class name, OurFirstProgram.
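To see what the -ea option actually changes, here is a small sketch of a program that does use assertions. The class AssertionDemo and its half() method are hypothetical names invented for this illustration; only the assert statement itself is part of the language.

```java
public class AssertionDemo {
    // Reports whether the program was started with assertions enabled.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;   // this assignment is only executed when -ea is in effect
        return enabled;
    }

    // Hypothetical helper: halves a number that is expected to be even.
    static int half(int n) {
        assert n % 2 == 0 : "expected an even number but got " + n;
        return n / 2;
    }

    public static void main(String[] args) {
        System.out.println("Assertions enabled: " + assertionsEnabled());
        System.out.println("half(10) = " + half(10));
        // With -ea the next call throws an AssertionError; without it, the check is skipped.
        System.out.println("half(7) = " + half(7));
    }
}
```

Running this with java -ea AssertionDemo should stop with an AssertionError at the last call, whereas running it with plain java AssertionDemo skips the checks and carries on.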

Sunday, January 8, 2012

Java’s Class Library

A library in Java is a collection of classes - usually providing related facilities - that you can use in your programs. The Java class library provides you with a whole range of goodies, some of which are essential for your programs to work at all, and some of which make writing your Java programs easier. To say that the standard class library covers a lot of ground would be something of an understatement, so I won’t be going into it in detail here; however, you will be looking into how to apply many of the facilities it provides throughout the book.

Since the class library is a set of classes, it is stored in sets of files where each file contains a class definition. The classes are grouped together into related sets that are called packages, and each package is stored in a separate directory. A class in a package can access any of the other classes in the package. A class in another package may or may not be accessible.

The package name is based on the path to the directory in which the classes belonging to the package are stored. Classes in the package java.lang for example are stored in the directory path java\lang (or java/lang under Unix). This path is relative to a particular directory that is automatically known by the Java run-time environment that executes your code. You can also create your own packages that will contain classes of your own that you want to reuse in different contexts, and that are related in some way.

The JDK includes a growing number of standard packages - well over 100 the last time I counted. Some of the packages you will meet most frequently are:

[1] java.lang :-
These classes support the basic language features and the handling of arrays and strings. Classes in this package are always available directly in your programs by default because this package is always automatically loaded with your program.

[2] java.io :-
Classes for data input and output operations.

[3] java.util :-
This package contains utility classes of various kinds, including classes for managing data within collections or groups of data items.

[4] javax.swing :-
These classes provide easy-to-use and flexible components for building graphical user interfaces (GUIs). The components in this package are referred to as Swing components.

[5] java.awt :-
Classes in this package provide the original GUI components (JDK 1.1) as well as some basic support necessary for Swing components.

[6] java.awt.geom :-
These classes define two-dimensional geometric shapes.

[7] java.awt.event :-
The classes in this package are used in the implementation of windowed applications to handle events in your program. Events are things like moving the mouse, pressing the left mouse button, or clicking on a menu item.

As noted previously, you can use any of the classes from the java.lang package in your programs by default. To use classes from the other packages, you typically use import statements to identify the names of the classes that you need from each package. This allows you to reference the classes by the simple class name. Without an import statement you would need to specify the fully qualified name of each class from a package each time you refer to it. As we will see in a moment, the fully qualified name for a class includes the package name as well as the basic class name. Using fully qualified class names would make your program code rather cumbersome, and certainly less readable. It would also make them a lot more tedious to type in.
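The difference between the three cases can be sketched in a few lines. The class name ImportDemo is an assumption of mine, and I have picked ArrayList and LinkedList from java.util purely as convenient examples.

```java
import java.util.ArrayList;   // lets us write ArrayList instead of java.util.ArrayList

public class ImportDemo {
    // Returns the fully qualified names of the three classes used below.
    static String[] classNames() {
        // String is in java.lang, so no import statement is needed.
        String text = "Java";
        // ArrayList can be referenced by its simple name because of the import above.
        ArrayList<String> imported = new ArrayList<String>();
        // LinkedList is not imported, so the fully qualified name is required.
        java.util.LinkedList<String> qualified = new java.util.LinkedList<String>();
        return new String[] {
            text.getClass().getName(),
            imported.getClass().getName(),
            qualified.getClass().getName()
        };
    }

    public static void main(String[] args) {
        for (String name : classNames()) {
            System.out.println(name);
        }
    }
}
```

Each line of output is a fully qualified name, that is, the package name followed by a period and the simple class name.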

You can use an import statement to import the name of a single class from a package into your program, or all the class names. The two import statements at the beginning of the code for the applet you saw earlier in this post are examples of importing a single class name. The first was:

import javax.swing.JApplet;

This statement imports the JApplet class name that is defined in the javax.swing package. Formally, the name of the JApplet class is not really JApplet - it is the fully qualified name javax.swing.JApplet. You can use the unqualified name only when you import the class or the complete package containing it into your program. You can still reference a class from a package even if you don’t import it though - you just need to use the full class name, javax.swing.JApplet. You could try this out with the applet you saw earlier if you like. Just delete the two import statements from the file and use the full class names in the program. Then recompile it. It should work the same as before. Thus, the fully qualified name for a class is the name of the package in which it is defined, followed by a period, followed by the name given to the class in its definition. You could import the names of all the classes in the javax.swing package with the statement:

import javax.swing.*;

The asterisk specifies that all the class names are to be imported. Importing just the class names that your source code uses makes compilation more efficient, but when you are using a lot of classes from a package you may find it more convenient to import all the names. This saves typing reams of import statements for one thing. We will do this with examples of Java code in the book to keep the number of lines to a minimum. However, there are risks associated with importing all the names in a package. There may be classes with names that are identical to names you have given to your own classes, which would obviously create some confusion when you compile your code.
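Here is a sketch of the kind of confusion a name clash can cause. The class names WildcardImportDemo and Stack are my own choices for illustration; Stack is deliberately given the same simple name as the library class java.util.Stack.

```java
import java.util.*;   // imports every class name in java.util, including Stack

// Our own class, which happens to share its simple name with java.util.Stack.
class Stack {
    String describe() {
        return "our own Stack";
    }
}

public class WildcardImportDemo {
    // The simple name Stack refers to the class defined in this file,
    // not to java.util.Stack, even though java.util.* is imported.
    static String simpleNameResult() {
        Stack s = new Stack();
        return s.describe();
    }

    // The fully qualified name still reaches the library class.
    static String qualifiedNameResult() {
        java.util.Stack<String> s = new java.util.Stack<String>();
        s.push("library Stack");
        return s.peek();
    }

    public static void main(String[] args) {
        // List and ArrayList are both found through the java.util.* import.
        List<String> results = new ArrayList<String>();
        results.add(simpleNameResult());
        results.add(qualifiedNameResult());
        System.out.println(results);
    }
}
```

The code compiles without complaint, which is exactly the risk: a reader who expects Stack to mean the library class gets your class instead, and only the fully qualified name removes the doubt.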

As I indicated earlier, the standard classes do not appear as files or directories on your hard disk. They are packaged up in a single compressed file, rt.jar, that is stored in the jre/lib directory. This directory is created when you install the JDK on your computer. A .jar file is a Java archive - a compressed archive of Java classes. The standard classes that your executable program requires are loaded automatically from rt.jar, so you don’t have to be concerned with it directly at all.

Java Program Structure

Let’s summarize how a Java program is structured:

[1] A Java program always consists of one or more classes.

[2] You typically put the program code for each class in a separate file, and you must give each file the same name as that of the class that is defined within it.

[3] A Java source file name must have the extension .java. Thus your file containing the class Hat will be called Hat.java and your file containing the class BaseballPlayer must have the file name BaseballPlayer.java.

This program clearly majors on apparel, with four of the five classes representing clothing. Each source file contains a class definition, and all of the files that go to make up the program are stored in the same directory. The source files for your program contain all the code that you wrote, but this is not everything that is ultimately included in the program. There is also code from the Java standard class library, so let’s take a peek at what that can do.

Advantages of Using Objects

As I said at the outset, object-oriented programs are written using objects that are specific to the problem being solved. Your pinball machine simulator may well define and use objects of type Table, Ball, Flipper, and Bumper. This has tremendous advantages, not only in terms of easing the development process and making the program code easier to understand, but also in any future expansion of such a program. Java provides a whole range of standard classes to help you in the development of your program, and you can develop your own generic classes to provide a basis for developing programs that are of particular interest to you.

Because an object includes the methods that can operate on it as well as the data that defines it, programming using objects is much less prone to error. Your object-oriented Java programs should be more robust than the equivalent in a procedural programming language. Object-oriented programs take a little longer to design than programs that do not use objects since you must take care in the design of the classes that you will need, but the time required to write and test the code is sometimes substantially less than that for procedural programs. Object-oriented programs are also much easier to maintain and extend.
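As a rough sketch of what it means for an object to include both its data and the methods that operate on it, here is one way the Ball class for the pinball simulator might begin. The fields and methods shown are assumptions made for illustration, not a definitive design.

```java
public class Ball {
    // The data that defines the ball: its position and its velocity on the table.
    private double x;
    private double y;
    private double dx;
    private double dy;

    public Ball(double x, double y, double dx, double dy) {
        this.x = x;
        this.y = y;
        this.dx = dx;
        this.dy = dy;
    }

    // A method that operates on the ball's own data: advance by one time step.
    public void move() {
        x += dx;
        y += dy;
    }

    // Reverse the horizontal direction, as might happen when the ball hits a bumper.
    public void bounceHorizontally() {
        dx = -dx;
    }

    public double getX() {
        return x;
    }

    public double getY() {
        return y;
    }

    public static void main(String[] args) {
        Ball ball = new Ball(0.0, 0.0, 1.5, 2.0);
        ball.move();
        ball.bounceHorizontally();
        ball.move();
        System.out.println("Ball is at (" + ball.getX() + ", " + ball.getY() + ")");
    }
}
```

Because the position and velocity are private, the only way other code can change them is through move() and bounceHorizontally(), which is where much of the robustness comes from.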