The Bluffer's Guide to Java

This is the first of an occasional series dedicated to the software engineer who needs to start using a programming language quickly and is prepared to learn the finer points "on the job" - after all it is the ability to write well engineered software that counts. This Guide assumes an acquaintance with C and the basic ideas of class-structuring.

AutoCompile is a short example of Java which automatically recompiles any source files in a directory that have changed. You might like to look at this example to provide a bit of context for what follows. As an exercise, you might like to consider how the example would be extended to cope with multiple directories or new and deleted files.

Java at a Glance

Java was designed by James Gosling of Sun and is now marketed by JavaSoft, a subsidiary of Sun. It is targeted as the Internet programming language for everything from browser-side "applets" to major server-side applications.

C-style syntax
C-style primitive types, but no unsigned, and char is 16-bit Unicode not 8-bit ASCII.
Class structured with inheritance from a superclass and inheritance from one or more interfaces.
Methods and variables are either class or instance (class methods and variables are denoted by static.)
Exception handling provided by throw and try...catch.
Methods must declare exceptions that may be thrown out of them.
Principally targeted at the Java Virtual Machine, a bytecode interpreter, but with support for native implementations of classes and just-in-time compilation of the bytecodes.
Safe: strongly typed with rigorous checks at runtime including verification of bytecodes.

Basic elements of Java

Java is a simple class-structured language which uses a C-style syntax but does not carry with it the immense baggage of C++. A class is simply an encapsulation of data and procedure. Classes are the only structured type mechanism available in Java. So expect typical Java source to consist of a series of class declarations containing the methods for those classes - and methods look very similar to ANSI C procedures (i.e., with typed arguments).

Java source is written in Unicode, but this subtlety can usually be overlooked as Java compilers accept an ASCII encoding.

Java provides three types of comment: C-style, C++-style and its own document comment:

    /* C-style comment */
    // C++-style comment up to the line end
    /** Documentation comment. HTML can be embedded here along
     ** with a few other tags that the javadoc tool uses.
     **/

The javadoc tool generates a set of documentation pages for your source in HTML. It is well worth investigating.

As in C, lexical tokens in Java are case-sensitive and separated by white-space. Also as in C, the longest compound token is formed regardless of context (i.e., ++ is always the increment operator not a dyadic and monadic plus next to each other).

Primitive Types

Java provides a familiar set of primitive types:

boolean: Unlike C, Java has a distinct type for Boolean values with literals true and false. This will be familiar to Algol, Pascal, etc., programmers.
int: 32-bit signed integer. C-like literals, e.g., 1066 077 0xd0d0
short: 16-bit signed integer. No direct literals - cast an int, e.g., (short) 1024
byte: 8-bit signed integer. No direct literals - cast an int, e.g., (byte) 42
long: 64-bit signed integer. C-like literals (remember the distinguishing 'l' or 'L' suffix), e.g., 314159265355L 077L 0xdeadd0d0f1d0L
double: 64-bit IEEE floating point. C-like literals, e.g., 2.7182818 42.
float: 32-bit IEEE floating point. C-like literals with distinguishing 'f' or 'F' suffix, e.g., 2.718F 42F
char: 16-bit Unicode character (not 8-bit ASCII - well done Java!). Single-quoted literal, e.g., 'J'
String: This is really a built-in class, but it does little harm to think of it as primitive. Double-quoted literal, e.g., "Java", with the same escape codes as C, e.g., "End\n" includes a newline.

Note:

there are no unsigned types.
most of the usual arithmetic and logical operators are defined for Java:
```
 + - * / % ! == != < <= > >= & | && || << >> >>> 
```
/ and % do integer and floating-point division and remainder. Integer division truncates towards zero.
>>> does unsigned shift right.
+ between strings does concatenation.
the comparisons and logical operations deliver a boolean type.
mixed-precision arithmetic delivers the "longest and widest" type.
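floating-point arithmetic is done to the IEEE standard and never raises exceptions: overflow and division by zero deliver infinities or NaN. Integer division by zero, by contrast, raises an ArithmeticException (see later).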
widening conversions (e.g., int -> float) are done automatically, but unlike C and Fortran, narrowing conversions (e.g., int -> short) need an explicit cast.
== and != between strings compare references not lexical sequences. Use the equals method instead, as in var.equals ("blue").
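
The notes above are easy to check for yourself. Here is a minimal sketch; the class OperatorNotes and its method names are made up for illustration, and only the operators and the equals method come from the text.

```java
// Illustrative sketch: OperatorNotes and its method names are hypothetical.
class OperatorNotes
{
    // Integer division truncates towards zero, so -7 / 2 is -3, not -4.
    static int divide (int a, int b) { return a / b; }

    // The remainder takes the sign of the left operand, so -7 % 2 is -1.
    static int remainder (int a, int b) { return a % b; }

    // >>> shifts zero bits in from the left; >> copies the sign bit instead.
    static int unsignedShift (int v, int n) { return v >>> n; }
    static int signedShift (int v, int n) { return v >> n; }

    // A narrowing conversion needs an explicit cast and may lose information.
    static short narrow (int v) { return (short) v; }

    // equals compares the character sequences; == would compare references.
    static boolean sameText (String a, String b) { return a.equals (b); }

    public static void main (String[] args)
    {
        String built = new String ("blue");              // a distinct object holding the same text
        System.out.println (divide (-7, 2));             // prints -3
        System.out.println (unsignedShift (-1, 28));     // prints 15
        System.out.println (built == "blue");            // prints false
        System.out.println (sameText (built, "blue"));   // prints true
    }
}
```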

Java also provides a class encapsulation of each of the primitive types for manipulation as an abstract object. Class Integer, for example, is a class with just one int instance variable.

Classes and Methods

A class is an encapsulation of data and procedure. A simple class declaration is of the form:

    
    class <ClassName> { <Body> }

and a declaration of a class which extends another is of the form:

    
    class <ClassName> extends <ParentName> { <Body> }

where Body may contain:

class variable declarations

Class variables are shared by all instances of a class. Declarations are of the form:

    static <type> <name> [ = <initial-value>];

where type is the type of the value held by the variable and the optional initial-value is the initial value of the variable when the class is loaded. Several variables may be declared with the same type by separating their definitions with commas.
Class variables play a similar role to global variables in other languages.

class method declarations

Class methods can only reference their arguments and class variables. Declarations are of the form:

    static <type> <name> (<arguments>) { <body> }

where type is the type of the value returned by the method and may be void; arguments are zero, one or many (comma-separated) declarations of each argument as type followed by name.
Class methods play a similar role to global procedures in other languages.

Hint: If you write one class containing all your methods and make them static, then you are close to writing a traditional C source file full of procedures. But you would be missing the point of Java!

static initialisers

Static initialisers are invoked once when the class is loaded to allow class variables to be set up. They are of the form

    static { <body> }

instance variable declarations

Each object has its own set of instance variables that are allocated as part of constructing a new object. Declarations are of the form:

    <type> <name>;

and several variables of the same type can be declared by separating the names with commas.
Instance variables are similar to fields in structures in other languages.

instance method declarations

Instance methods can reference instance variables as well as their arguments and class variables. They are always invoked from an object which can be thought of as an implicit extra argument called this. Declarations are of the form:

    <type> <name> (<arguments>) { <body> }

in the same fashion as class methods. Instance methods have no analogy with procedural programming constructs.

constructor declarations

Constructors are similar to instance methods, but are only invoked when a new object of that class is created. Declarations are of the form:

    <ClassName> (<arguments>) { <body> }

Note that they are distinguished from instance methods by not specifying the return type and by having the same name as the class. One can be invoked as follows:

    new <ClassName> (<arguments>)

where arguments are the actual values to be bound to the constructor argument names. As it suggests, this delivers a new object of the class.

abstract method declarations

Abstract method declarations define a method that will be supplied by all subclasses of this class, but which is not defined by this class itself. An abstract method (which must be an instance method, as a class method cannot be abstract) is declared with a semicolon in place of a body:

    abstract <type> <name> (<arguments>);

A class containing abstract method declarations must itself be declared as abstract and may not be instantiated - only subclasses of it may be instantiated. The utility is that code written against the abstract class can invoke the abstract method on any object of that type and get the behaviour defined by the object's actual subclass, as determined at runtime.
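
To put the pieces together, here is a small sketch with one member of each kind described above. The names Shape, Circle, Square and ShapeDemo are hypothetical and chosen purely for illustration.

```java
// Illustrative sketch: Shape, Circle, Square and ShapeDemo are hypothetical names.
abstract class Shape
{
    static int created;                 // class variable, shared by all shapes
    static { created = 0; }             // static initialiser, run once when the class is loaded

    String label;                       // instance variable, one per object

    Shape (String label)                // constructor
    {
        this.label = label;
        created++;
    }

    static int howMany () { return created; }   // class method

    abstract double area ();            // abstract method: no body, subclasses supply one

    String describe ()                  // instance method; area() is resolved at runtime
    {
        return label + " with area " + area ();
    }
}

class Circle extends Shape
{
    double radius;
    Circle (double radius) { super ("circle"); this.radius = radius; }
    double area () { return Math.PI * radius * radius; }
}

class Square extends Shape
{
    double side;
    Square (double side) { super ("square"); this.side = side; }
    double area () { return side * side; }
}

class ShapeDemo
{
    public static void main (String[] args)
    {
        Shape s = new Square (2.0);
        System.out.println (s.describe ());     // prints square with area 4.0
        System.out.println (new Circle (1.0).describe ());
        System.out.println (Shape.howMany () + " shapes created");
    }
}
```

Note that describe is written once in Shape, yet picks up whichever area the object's own class defines.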

Java refers to class and instance variables as fields and fields and methods as class members. Member declarations can be prefixed with additional keywords that control their usage:

public: Public class members are visible anywhere that the class is available (see Packages for details) and public instance members are visible whenever an object of that class is available.
private: Private members are only visible within the class declaration.
protected: Protected members are visible within the declaration of the class or a subclass of it, and also to other classes in the same package.
final: Final members may not be overridden by member declarations in subclasses.
synchronized: Synchronized methods are invoked via a mutual-exclusion lock which ensures only one thread of execution is invoking the method at once.
native: Native methods are implemented in another language - refer to the Java Development Kit documentation for details of how to use this.

Member declarations may have only one of public, private or protected. If they have no such specification they are 'friendly' which means public within the local package and private within any other package.

Accessing Members

In general members of a class are accessed via the 'dot' notation. A class member access is of the form:

    <ClassName>.<memberName>

e.g., Math.sqrt, and an instance member access is of the form:

    <object-expression>.<memberName>

where object-expression is an expression which delivers an object of a class that contains memberName - most often this will be a variable name or the result of a method invocation.

Note that a member can be used before its declaration has been seen (Java does not require forward declarations) and remember that not all members of a class are always visible - see earlier.

Members of a class are accessible within its declaration without the class or object dot prefix. Similarly declarations of any ancestor classes are also visible without the prefix. Sometimes member declarations may be hidden by argument or local declarations of the same name, in which case the full prefix notation can be used for discrimination and the pseudo-variables this and super are available for selecting against the object or its superclass. This is most often useful in a constructor, e.g.,

    class Point
    {
        int x, y;

        Point (int x, int y) { this.x = x; this.y = y; }
    }

Method invocations have an extra twist: overloading. A class may contain many method declarations with the same name as long as their arguments differ in number or type. Java uses the types of the actual arguments in the invocation to determine which version of the method to invoke. This also applies to constructors. Overloading is a powerful concept for hiding tedious detail, and one with which C programmers will need to become familiar. Programmers in Algol 68, Fortran 90, Ada, C++ and so forth will find Java's overloading straightforward.
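
A minimal sketch of overloading follows. The class Describer and its describe methods are hypothetical; the point is only that one name serves several argument lists.

```java
// Illustrative sketch: Describer and its describe methods are hypothetical.
class Describer
{
    // Three methods share one name; the argument type selects the version.
    static String describe (int v)    { return "int " + v; }
    static String describe (double v) { return "double " + v; }
    static String describe (String v) { return "String " + v; }

    // The number of arguments can distinguish versions too.
    static String describe (int a, int b) { return "pair " + a + "," + b; }

    public static void main (String[] args)
    {
        System.out.println (describe (42));      // prints int 42
        System.out.println (describe (42.0));    // prints double 42.0
        System.out.println (describe ("42"));    // prints String 42
        System.out.println (describe (4, 2));    // prints pair 4,2
    }
}
```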

Interfaces

Java does not provide multiple inheritance: a class may only extend one other class. This saves a lot of confusion as to which members come from which superclasses, but it also limits the flexibility of the class structure. Java's solution to this problem is the interface. In essence, an interface is a set of methods that will be supported by any class that implements that interface - objects of disjoint class types may then be manipulated as if of the same interface type. An interface is similar to an abstract superclass.
An interface declaration is of the form:

    interface <InterfaceName> { <abstract-method-declarations> }

where abstract-method-declarations are method declarations with no bodies (but no abstract keyword either).
A class declaration using interfaces is of the form:

    class <ClassName> 
        [extends <ParentName>] 
        implements <InterfaceNames> 
    { <Body> }

where InterfaceNames are a comma-separated list of one or more interface names. Such a class must contain declarations for all methods and with the same argument types as declared in the interfaces named.
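
As a sketch of how this looks in practice, here are two unrelated classes manipulated through a shared interface. The names Priced, Book, Ticket and Till are hypothetical. Note that the implementing methods are declared public, because interface methods are implicitly public.

```java
// Illustrative sketch: Priced, Book, Ticket and Till are hypothetical names.
interface Priced
{
    int priceInPence ();    // no body, and no abstract keyword either
}

class Book implements Priced
{
    public int priceInPence () { return 1299; }
}

class Ticket implements Priced
{
    int zones;
    Ticket (int zones) { this.zones = zones; }
    public int priceInPence () { return 150 * zones; }
}

class Till
{
    // Accepts any mixture of classes, provided each implements Priced.
    static int total (Priced[] items)
    {
        int sum = 0;
        for (int n = 0; n < items.length; n++)
            sum += items [n].priceInPence ();
        return sum;
    }

    public static void main (String[] args)
    {
        Priced[] basket = { new Book (), new Ticket (3) };
        System.out.println (total (basket));    // prints 1749
    }
}
```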

Arrays

Java provides one-dimensional arrays, but do not be deceived by the apparent similarity with C. Java arrays know their size and attempts to access elements outside the bounds of an array will meet with failure. Neither can you be sloppy about whether you have a pointer to one thing or a pointer to many things as in C: arrays and scalars are separate types and there are no conversions between them. (In fact, Java arrays form a new class that is hidden by the implementation, but this fact can normally be ignored.) Java arrays are not, however, as powerful as those provided by Algol 68 or Fortran 90.

Array types are described by appending square brackets to the element type, e.g., int[ ] or int[ ][ ].
(Aside: In a declaration, you can also use C-style syntax where the brackets go after the identifier.)

Arrays are created by an extension of the normal constructor:

 new int [100]

(but, annoyingly, the size cannot be folded into the declaration as it can in C, so int [100] x; is illegal).

The number of elements in any array can be found with the instance variable length as in array.length.

Arrays can be indexed to access an element (either to read it or write to it, but you can't hold onto the reference):

 x [n] = x [n+1]

All arrays have an origin or lower bound of 0, and hence an upper bound of .length-1.

Arrays can be copied with the clone() method which delivers a new independent reference holding the same element values. e.g.,

 int [ ] y = x.clone();

y can then be modified in isolation to x.

An array of chars is not a String although there exist methods to convert between the two.

Multi-dimensional arrays are in fact constructed as arrays of arrays of...
A rectangular array can be constructed in one operation:

 double [ ][ ] square = new double [3][3];

or an arbitrary shape array can be constructed in several operations:

    double [ ][ ] triang = new double [2][];
    triang [0] = new double [1];
    triang [1] = new double [2];
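
The array behaviour described in this section can be sketched as follows. The class ArrayNotes and its method names are hypothetical; the exception caught is ArrayIndexOutOfBoundsException, which is what Java raises for an out-of-bounds index (exceptions are covered later).

```java
// Illustrative sketch: ArrayNotes and its method names are hypothetical.
class ArrayNotes
{
    // Builds a triangular array of arrays: row n holds n+1 elements.
    static double[][] triangle (int rows)
    {
        double[][] t = new double [rows][];
        for (int n = 0; n < rows; n++)
            t [n] = new double [n + 1];
        return t;
    }

    // Reports whether indexing a at index is rejected by the bounds check.
    static boolean outOfBounds (int[] a, int index)
    {
        try { int unused = a [index]; return false; }
        catch (ArrayIndexOutOfBoundsException e) { return true; }
    }

    // clone() delivers an independent copy, so x is left untouched.
    static int[] copyAndZeroFirst (int[] x)
    {
        int[] y = x.clone ();
        y [0] = 0;
        return y;
    }

    public static void main (String[] args)
    {
        int[] x = { 1, 2, 3 };
        int[] y = copyAndZeroFirst (x);
        System.out.println (x [0] + " " + y [0]);       // prints 1 0
        System.out.println (outOfBounds (x, x.length)); // prints true
        System.out.println (triangle (3) [2].length);   // prints 3
    }
}
```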

Statements

Java provides much the same control structures as C. Statements are terminated by semicolons and blocks are enclosed in braces. A block can usually be used in place of a single statement.

Declarations

A declaration is of the form:

    <type> <name> [ = <initial-value>];

where, in contrast to C, initial-value is not restricted to a compile-time value. A local variable must be declared textually before it is referenced, and Java insists that it is assigned to before being read; together these rules prevent cyclic declarations such as int x = y, y = x; . Java also allows declarations and statements to be mixed within a block (Algol 68 programmers will like this!).
e.g.,

    { int x = f (1); push (x); int y = f (x); push (y); return x+y; }

Assignment

An assignment is of the form:

    <variable> = <value>

where variable can be a class, instance, or local variable or an array element and value may be any expression of the same type as the variable. (Aside: Java provides automatic 'widening' conversions between the value and variable, but not 'narrowing' conversions in contrast to C and Fortran.)

If statement

An if statement is of the form

    if (<expression>) 
        <statement>
    else
        <statement>

where expression must be a boolean expression and where the else and following statement are optional. Java, like C, suffers from the dangling else problem and both resolve the problem by deciding that an else part belongs to the closest if statement.

Java, like C, has a separate form for a conditional expression:

    <boolean-expr> ? <then-expr> : <else-expr>

which delivers the value of then-expr if boolean-expr evaluates to true and delivers the value of else-expr otherwise.
Hint: it is wise to always enclose the conditional expression in brackets to avoid confusion with operator binding.

Switch statement

A switch statement is of the form

    switch (<expression>)
    {
        case <labelexpr>:
            <statements>
            break;
        default:
            <statements>
            break;
    }

The switch expression must be of type int, short, byte or char. The labelexpr must be a compile-time constant. The statements in a case or default part are optional as is the break statement itself. As in C, a case part that doesn't end in a break statement 'falls through' to the next case part - this can be a source of error for the unaware - and the best stylistic use of this facility is to allow one group of statements to be entered via several case labels.
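
Here is a sketch of that stylistic use of fall-through: several case labels leading into one group of statements. The class SwitchNotes and the method daysIn are hypothetical, and leap years are deliberately ignored.

```java
// Illustrative sketch: SwitchNotes and daysIn are hypothetical names.
class SwitchNotes
{
    // Returns the number of days in a month numbered 1 to 12.
    static int daysIn (int month)
    {
        int days;
        switch (month)
        {
            case 4: case 6: case 9: case 11:    // four labels share one group of statements
                days = 30;
                break;
            case 2:
                days = 28;                      // leap years ignored in this sketch
                break;
            default:
                days = 31;
                break;
        }
        return days;
    }

    public static void main (String[] args)
    {
        System.out.println (daysIn (9));    // prints 30
        System.out.println (daysIn (12));   // prints 31
    }
}
```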

Loop statements

A for statement is of the form

    for (<init> ; <expression> ; <update>)
        <statement>

init is a list of statements executed before the loop is properly underway. Java allows declarations here which have scope of just the loop. expression (which must be of type boolean) is a test made on each cycle of the loop (including the first) for whether to continue or not. update is a list of statements executed on each round trip of the loop and statement is the body of the loop. Any or all of init, expression and update are optional and statement can be an empty statement (i.e., just the semicolon).
e.g.,

    for (int n = 0; n < array.length; n++)
        System.out.println (array [n]);

A while statement and do statement are of the form

    while (<expression>) <statement>
    do <statement> while (<expression>);

both of which repeatedly execute statement while expression (which must be of boolean type) is true. The difference is that the while statement performs the test before each iteration (and hence statement might not be executed at all) and the do statement performs the test after each iteration (and hence statement is always executed at least once).

The break statement allows control to jump out of a loop and the continue statement allows control to jump back to the start of the loop. They are written:

    break;
    continue;

Java also allows a label to be specified after a break or continue in which case control transfers out to the statement with that label. We will leave this to the reader to discover uses for.
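
One common use, offered as a sketch rather than the only one, is escaping from nested loops. The class LabelNotes, its methods and the labels search and rows are hypothetical names.

```java
// Illustrative sketch: LabelNotes, its methods and the label names are hypothetical.
class LabelNotes
{
    // Returns the index of the first row containing a negative element, or -1.
    static int firstNegativeRow (int[][] grid)
    {
        int found = -1;
        search:
        for (int r = 0; r < grid.length; r++)
        {
            for (int c = 0; c < grid [r].length; c++)
            {
                if (grid [r][c] < 0)
                {
                    found = r;
                    break search;       // leaves both loops at once
                }
            }
        }
        return found;
    }

    // Counts the rows that contain no zero element.
    static int rowsWithoutZero (int[][] grid)
    {
        int count = 0;
        rows:
        for (int r = 0; r < grid.length; r++)
        {
            for (int c = 0; c < grid [r].length; c++)
                if (grid [r][c] == 0)
                    continue rows;      // abandon this row, move to the next
            count++;
        }
        return count;
    }

    public static void main (String[] args)
    {
        int[][] grid = { { 1, 2 }, { 3, -4 }, { -5, 6 } };
        System.out.println (firstNegativeRow (grid));   // prints 1
    }
}
```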

Exceptions

Java's support for exception handling is very pleasant. It works on a similar model to that introduced by Ada. The Java Virtual Machine raises some exceptions such as ArrayStoreException while others are raised within Java code via the throw statement.

An exception is simply an instance of the class Exception or one of its descendants. Note that an exception object must be created before an exception can be thrown. It is common practice to define your own exception class to extend Exception and this provides an easy opportunity to package up additional information about why the exception was raised.
e.g.,

    class ParseException extends Exception { String offendor; ... }

and when we come to raise a ParseException we create a new object of this class and can specify a String offendor which might be the misunderstood text. This can then be read by the exception handler and used to direct the error recovery.

The throw statement takes an instance of an Exception and then control transfers outwards and upwards until an enclosing try statement is encountered.

    throw <exception>;

You can think of it as an unusual return statement.

The try statement is of the form:

    try <statement>
    catch (<exception> <X>) <handler-statement>
    finally <final-statement>

statement (usually a block) is the statement to be tried. There may be several catch parts, each of which defines a parameter X which is of type exception and which holds the actual exception object during processing of the handler-statement. The finally is a statement that will always be executed regardless of whether any exception was raised. It is optional.

Java makes the nice distinction between checked and unchecked exceptions: methods that can have a checked exception thrown out of them must declare this fact (Java beginners will find failure to do this a common compiler error, but easy to fix). Broadly speaking unchecked exceptions are things like trying to dereference a null pointer, while user-defined exceptions that extend Exception are checked exceptions (unless they descend from RuntimeException). So a method that can throw checked exceptions declares this after the argument list with the keyword throws followed by a list of exceptions, e.g.,

    void DoSomethingRisky () throws MyException, OhMyGodException

Hint: to catch any exception simply declare a catch part with the superclass Exception as the exception. You should investigate some of the standard methods provided with Exception - a useful one when debugging is printStackTrace().
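
The whole mechanism can be sketched in a few lines. ParseException and its offendor field follow the example earlier in this section, but the constructor shown is an assumption about what the elided body might contain, and NumberParser with its methods is entirely hypothetical.

```java
// Illustrative sketch: the ParseException constructor and the NumberParser
// class are assumptions made for this example.
class ParseException extends Exception
{
    String offendor;    // the text that could not be understood

    ParseException (String offendor)
    {
        super ("cannot parse: " + offendor);
        this.offendor = offendor;
    }
}

class NumberParser
{
    static int attempts = 0;

    // A checked exception can escape from this method, so it must say so.
    static int parseDigits (String text) throws ParseException
    {
        if (text.length () == 0)
            throw new ParseException (text);
        int value = 0;
        for (int n = 0; n < text.length (); n++)
        {
            char c = text.charAt (n);
            if (c < '0' || c > '9')
                throw new ParseException (text);    // create the object, then throw it
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // The exception is handled here, so no throws clause is needed.
    static int parseOrDefault (String text, int fallback)
    {
        try
        {
            return parseDigits (text);
        }
        catch (ParseException e)
        {
            return fallback;
        }
        finally
        {
            attempts++;     // executed whether or not an exception was raised
        }
    }

    public static void main (String[] args)
    {
        System.out.println (parseOrDefault ("1066", -1));   // prints 1066
        System.out.println (parseOrDefault ("10x6", -1));   // prints -1
    }
}
```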

Packages

Java uses packages to manage separate compilation units. A package is simply a collection of public classes that can be imported into other packages, though note that the standard Unix & PC toolset currently only allows one public class per source file. The package name is given at the head of a Java source file simply as:

    package <my.package.name>;

Package names exist in a (world-wide) global namespace and it is conventional to include your Internet hostname as part of the full package name (in reverse order though), so I might name a package as uk.co.demon.occl_cam.fruits (the hyphen in the hostname becomes an underscore, as hyphens are not legal in Java identifiers).

Packages and classes within them are named using the 'dot' notation, e.g., the package java.lang contains the class java.lang.String. You can always reference classes this way, but it is obviously clumsy and Java lets you import the classes for use without any further formality. The import statements should be at the head of the source file after the package name if present. You can either import a specific class:

    import java.io.DataInputStream;

or you can import all classes in a package 'on-demand':

    import java.io.*;

(java.io is a standard package of I/O classes).

Packages are principally used for distributing libraries of classes or for partitioning large projects. You don't have to use them: if no package name is supplied, all class declarations go into a default package, which provides a simple global namespace local to your working directory.
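
As a small sketch of the two ways of naming a class, the following uses java.util.StringTokenizer once through an import and once by its full name. The class ImportNotes and its methods are hypothetical, and it lives in the default package since no package name is given.

```java
import java.util.StringTokenizer;   // import one specific class

// Illustrative sketch: ImportNotes and its method names are hypothetical.
class ImportNotes
{
    // The short name is available thanks to the import above.
    static int wordsViaImport (String text)
    {
        StringTokenizer words = new StringTokenizer (text);
        return words.countTokens ();
    }

    // The full name always works, import or no import.
    static int wordsViaFullName (String text)
    {
        java.util.StringTokenizer words = new java.util.StringTokenizer (text);
        return words.countTokens ();
    }

    public static void main (String[] args)
    {
        System.out.println (wordsViaImport ("one public class"));      // prints 3
        System.out.println (wordsViaFullName ("one public class"));    // prints 3
    }
}
```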

Using Classes in familiar ways

Using Classes as Structures

A simple structure can be expressed in Java as a class with no methods and no inheritors.
For example, the C version might be:

    typedef struct point { int x, y; } point;

The Java version would be:

    class point {int x, y; }

Using Classes as Unions

Unions have to be constructed `inside-out' in Java - start with a superclass for the union as a whole and extend that class with the constituent members of the union.
For example, the C version might be:

    typedef struct { int x, y; } point;
    typedef struct { float x, y; } vector;
    typedef union { point p; vector v; } coord;

The Java version would be:

    abstract class coord { }
    class point extends coord { int x, y; }
    class vector extends coord { float x, y; }

Java is strongly typed and will not let you write a point and read a vector. You must either query what type of coord you have or include methods within the classes to handle the specific cases.

Note however that this approach does not provide for types which are constituents of more than one distinct union.
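
Querying the type can be done with Java's instanceof operator followed by a cast. A minimal sketch, repeating the three classes above so that it stands alone; CoordNotes and describe are hypothetical names.

```java
// The three classes are repeated from the text; CoordNotes is hypothetical.
abstract class coord { }
class point extends coord { int x, y; }
class vector extends coord { float x, y; }

class CoordNotes
{
    // Asks which member of the 'union' we hold before reading its fields.
    static String describe (coord c)
    {
        if (c instanceof point)
        {
            point p = (point) c;        // safe: we have just checked
            return "point " + p.x + "," + p.y;
        }
        else if (c instanceof vector)
        {
            vector v = (vector) c;
            return "vector " + v.x + "," + v.y;
        }
        return "unknown coord";
    }

    public static void main (String[] args)
    {
        point p = new point ();
        p.x = 1; p.y = 2;
        System.out.println (describe (p));  // prints point 1,2
    }
}
```

The alternative mentioned above, putting a describe method into coord and its subclasses, avoids the chain of tests and is usually the tidier design.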

Using Classes as Procedures

Procedures need to be encapsulated in a class and it is often useful to define an interface (just as in C you might typedef your function for convenience of reference): a class can then implement that interface.
For example, the C version might be:

    typedef float (*Function) (float);
    float integrate (float lw, float up, int steps, Function f) { ... }
    float sin2x (float x) { return sin (2*x); }
        ....
        integrate (0.0, 0.4, 10, sin2x);

The Java version would be:

    interface Function { float f (float x); }
    class Integrator
    {
        public static float integrate (float lw, float up, int steps, Function f) { ... }
    }
    class sin2x implements Function
    {
        public float f (float x) { return (float) Math.sin (2*x); }
    }
        ....
        Integrator.integrate (0.0f, 0.4f, 10, new sin2x ());

Compare this to the Algol 68 version:

    PROC integrate = (REAL lw, up, INT steps, PROC (REAL) REAL f) REAL: ...
        ....
        integrate (0.0, 0.4, 10, (REAL x) REAL: sin (2*x))

But note how easily the Java version can be extended to provide a Function which has hidden state (e.g., the standard algorithm for computing normal variates).
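
As a sketch of hidden state, here is a Function that quietly counts how often it is called. The class CountingSin2x is hypothetical, and since the body of integrate is elided in the text, the midpoint rule used here is purely an assumption to make the example runnable.

```java
interface Function { float f (float x); }

class Integrator
{
    // Assumption: a simple midpoint rule; the original body is not given.
    public static float integrate (float lw, float up, int steps, Function f)
    {
        float width = (up - lw) / steps;
        float sum = 0.0f;
        for (int n = 0; n < steps; n++)
            sum += f.f (lw + (n + 0.5f) * width);   // sample the middle of each strip
        return sum * width;
    }
}

// Illustrative sketch: CountingSin2x is a hypothetical Function with hidden state.
class CountingSin2x implements Function
{
    int calls = 0;      // state carried from one invocation to the next

    public float f (float x)
    {
        calls++;
        return (float) Math.sin (2 * x);
    }
}

class IntegrateDemo
{
    public static void main (String[] args)
    {
        CountingSin2x g = new CountingSin2x ();
        float area = Integrator.integrate (0.0f, 0.4f, 10, g);
        System.out.println (area + " after " + g.calls + " calls");
    }
}
```

The integrator neither knows nor cares that the object it was handed is keeping a tally; in C the same effect needs a global variable or an extra context argument.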

What else does Java have to offer?

Stand-alone programs can be written in Java as with any other programming language, but the Java language is only part of the Java solution. There is a lot more than can be covered in this Bluffer's Guide, so here is a quick summary of some topics that you might like to investigate further:

API Packages: In addition to the standard java.lang, java.util and java.io packages, a wide range of portable APIs have been defined for Java for handling a variety of modern programming tasks. java.net provides support for sockets and so on. java.applet and java.awt provide support for GUI applications and web browser applets.
Native methods: Java methods cannot be naively linked to code from arbitrary other languages, but the toolkit includes javah which generates C headers for implementing Java methods in C (these are declared native in Java).
Dynamic binding: Java defers binding of class fields and methods until runtime. This allows new classes to be seamlessly bound into a running Java program (a spreadsheet could load a class to handle a new type of cell, the HotJava browser dynamically loads new classes to handle new types of embedded image). Far easier than DLLs or shared object libraries.
Simple dynamic polymorphism: Java will not let you convert pointers between arbitrary reference types. By manipulating the global superclass Object you can manipulate objects of any class in your program (although you can only really hold onto objects and pass them on). This gives you similar functionality to using void * in C, but with the security of full type-checking.
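
The last point can be sketched briefly: anything may be passed around as an Object, but getting back to the real class takes a cast, and Java checks that cast at runtime by raising a ClassCastException if it is wrong. The class ObjectNotes and its method names are hypothetical.

```java
// Illustrative sketch: ObjectNotes and its method names are hypothetical.
class ObjectNotes
{
    // Any object at all can be held and passed on as an Object.
    static Object passThrough (Object anything) { return anything; }

    // Reports whether casting o to String is rejected at runtime.
    static boolean castToStringFails (Object o)
    {
        try { String s = (String) o; return false; }
        catch (ClassCastException e) { return true; }
    }

    public static void main (String[] args)
    {
        Object held = passThrough ("some text");
        System.out.println (castToStringFails (held));          // prints false
        System.out.println (castToStringFails (new Object ())); // prints true
    }
}
```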

Java Resources

Your main stop for finding out more about Java has to be the JavaSoft web-site particularly the download page. Here you can find out what the latest developments are, find definitions of the Java APIs as they become available and find a wealth of documentation about the Java Language Specification, the Java Development Kit and the Java Virtual Machine.
Here are some other useful pointers:

IBM, OS/2 and Java

The Party line on Java

I have to say that Java is one of the most interesting languages I've seen for a long time. Not because of the facilities that it offers the programmer - in fact more because Java is feature-light! The designers of Java have chosen a good solid basis that is simple but powerful and then resisted the temptation to add bells and whistles to it. The phrase "tried and trusted" keeps springing to mind about most of Java's facilities. The result is a clean, dependable language, that forces you to design in reusability and extensibility from the start and imposes some discipline on you to make your code robust - and then checks that you have obeyed the rules where possible.

The development environment as provided by the JDK is still a little crude, but balance this against a good set of portable APIs to handle modern programming tasks (which no other major programming language that I am aware of can claim), integration with the new wave of Internet and intranet computing and it has to be a commercial winner.

How long this will last, who can tell - if the C++ crowd get their grubby paws on it, I may have something very different to say about Java 2.0!