Java StreamTokenizer Class
Last modified: April 16, 2025
The java.io.StreamTokenizer class parses an input stream into tokens.
It can identify numbers, quoted strings, and various comment styles. This class
is useful for simple lexical analysis tasks.
StreamTokenizer breaks input into tokens that are either words, numbers,
quoted strings, or single ordinary characters. It provides methods to configure
which characters count as whitespace, comment starters, or word characters. The
class is not thread-safe.
StreamTokenizer Class Overview
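StreamTokenizer takes a Reader as input. A constructor that accepts an
InputStream also exists, but it has been deprecated since JDK 1.1 in favor of
wrapping the stream in a Reader. The tokenizer categorizes tokens into types
such as TT_WORD, TT_NUMBER, TT_EOL, or TT_EOF. The class provides methods to
configure token recognition rules.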
// Signature summary only; method bodies are omitted.
public class StreamTokenizer {

    @Deprecated
    public StreamTokenizer(InputStream in);
    public StreamTokenizer(Reader r);

    // Syntax configuration
    public void resetSyntax();
    public void wordChars(int low, int hi);
    public void whitespaceChars(int low, int hi);
    public void ordinaryChars(int low, int hi);
    public void ordinaryChar(int ch);
    public void commentChar(int ch);
    public void quoteChar(int ch);
    public void parseNumbers();
    public void eolIsSignificant(boolean flag);
    public void slashStarComments(boolean flag);
    public void slashSlashComments(boolean flag);
    public void lowerCaseMode(boolean fl);

    // Reading tokens
    public int nextToken();
    public int lineno();

    // State of the most recently read token
    public int ttype;
    public String sval;
    public double nval;
}
The listing above shows key methods and fields of StreamTokenizer.
The nextToken method reads the next token and returns its type, which is also
stored in the ttype field. For word and quoted-string tokens the text is in
sval; for number tokens the value is in nval.
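To make the relationship between ttype, sval, and nval concrete, here is a minimal sketch that labels each token of a short input. The class name TokenTypeDemo and the helper describe are hypothetical names chosen for this illustration; the tokenizer uses its default syntax.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenTypeDemo {

    // Hypothetical helper: returns one human-readable label per token.
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> labels = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        labels.add("WORD(" + st.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        labels.add("NUMBER(" + st.nval + ")");
                        break;
                    case '"':
                    case '\'':
                        // For quoted strings, ttype is the quote character itself
                        labels.add("QUOTED(" + st.sval + ")");
                        break;
                    default:
                        // Any other character is returned as its own code
                        labels.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(describe("x = 42 \"hi\""));
    }
}
```

With this input the sketch prints [WORD(x), CHAR(=), NUMBER(42.0), QUOTED(hi)]; note that nval is a double, so 42 appears as 42.0.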
Creating a StreamTokenizer
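A StreamTokenizer is created from a Reader; the InputStream constructor is deprecated. The constructor installs a default syntax: letters are word characters, characters 0 through 32 (space) are whitespace, '/' starts a single-line comment, single and double quotes delimit strings, and numbers are parsed. Further configuration is needed only to change these defaults.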
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {

            // Create from Reader
            StreamTokenizer tokenizer1 = new StreamTokenizer(reader);

            // Basic configuration (these calls restate part of the default syntax)
            tokenizer1.wordChars('a', 'z');
            tokenizer1.wordChars('A', 'Z');
            tokenizer1.whitespaceChars(' ', ' ');
            tokenizer1.whitespaceChars('\n', '\n');
            tokenizer1.whitespaceChars('\r', '\r');
            tokenizer1.whitespaceChars('\t', '\t');

            System.out.println("StreamTokenizer created and configured");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This example shows how to create a StreamTokenizer from a Reader. The wordChars and whitespaceChars calls restate part of the default syntax and are shown only to illustrate the configuration methods. Always close the underlying reader when done; StreamTokenizer has no close method of its own.
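Because the constructor installs a default syntax, a tokenizer can be used without any configuration calls. The following sketch illustrates that; DefaultSyntaxDemo and tokenize are hypothetical names, and the input is chosen to show how the defaults split digits and letters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxDemo {

    // Hypothetical helper: tokenizes with the default syntax only.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        try (Reader reader = new StringReader(input)) {
            StreamTokenizer st = new StreamTokenizer(reader);
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add(String.valueOf(st.nval));
                } else {
                    // Sketch only: other tokens are recorded as a single character
                    tokens.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc123 123abc -4.5"));
    }
}
```

The sketch prints [abc123, 123.0, abc, -4.5]: digits that follow a letter stay inside the word, while a token that starts with a digit is read as a number first, and a leading minus sign is treated as part of the number.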
Basic Tokenizing Example
The simplest use of StreamTokenizer reads tokens until end of file. Each call to
nextToken returns the token type. The token value is stored in either sval (for
words and quoted strings) or nval (for numbers).
import java.io.IOException;
import java.io.StringReader;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        String input = "Hello 123 World 45.67";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        try {
            // Number parsing is already on by default; the call is shown for clarity
            tokenizer.parseNumbers();

            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tokenizer.ttype) {
                    case StreamTokenizer.TT_WORD:
                        System.out.println("Word: " + tokenizer.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        System.out.println("Number: " + tokenizer.nval);
                        break;
                    default:
                        // Ordinary characters are returned as their character code
                        System.out.println("Other: " + (char) tokenizer.ttype);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
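This example tokenizes a simple string containing words and numbers. The
parseNumbers call is shown for clarity; number parsing is already enabled in
the default syntax. Numbers are stored as double values, so 123 is printed as
123.0. The switch statement handles the different token types, and TT_EOF
indicates the end of the input stream.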
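The same read loop can drive a small word counter. The sketch below also uses lowerCaseMode from the class overview so that words differing only in case are counted together; WordCountDemo and countWords are hypothetical names introduced for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class WordCountDemo {

    // Hypothetical helper: counts case-insensitive word occurrences.
    static Map<String, Integer> countWords(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.lowerCaseMode(true); // sval of word tokens is lowercased

        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    counts.merge(st.sval, 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat saw the other Cat"));
    }
}
```

For this input the sketch prints {cat=2, other=1, saw=1, the=2}. Tokens that are not words, such as numbers or punctuation, are simply skipped.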
Handling Quoted Strings
StreamTokenizer recognizes quoted strings delimited by any character registered
with quoteChar. Single and double quotes are quote characters by default. The
entire quoted string becomes a single token.
import java.io.IOException;
import java.io.StringReader;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        String input = "Name 'John Doe' Age 25 City \"New York\"";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        try {
            // Configure single and double quotes as quote characters
            // (both are already quote characters by default)
            tokenizer.quoteChar('\'');
            tokenizer.quoteChar('"');

            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == '\'' || tokenizer.ttype == '"') {
                    System.out.println("Quoted string: " + tokenizer.sval);
                } else if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    System.out.println("Word: " + tokenizer.sval);
                } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    System.out.println("Number: " + tokenizer.nval);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This example demonstrates handling both single and double quoted strings. The
quoted content, without the delimiters, is available in sval. The quote
character itself is returned as the token type. This allows distinguishing
between different quote styles.
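Quoted tokens are convenient for values that contain spaces. As a rough sketch under the default syntax, the following code reads key='value' pairs into a map; KeyValueDemo and parsePairs are hypothetical names, and the '=' sign is simply ignored rather than validated.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueDemo {

    // Hypothetical helper: pairs each word with the quoted string that follows it.
    static Map<String, String> parsePairs(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps input order
        try {
            String key = null;
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD && key == null) {
                    key = st.sval;
                } else if ((st.ttype == '\'' || st.ttype == '"') && key != null) {
                    pairs.put(key, st.sval); // sval holds the text between the quotes
                    key = null;
                }
                // Anything else, including '=', is ignored in this sketch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parsePairs("name='John Doe' city=\"New York\""));
    }
}
```

The sketch prints {name=John Doe, city=New York}. A stricter parser would check that '=' actually appears between the key and the value.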
Customizing Token Recognition
StreamTokenizer provides extensive customization of what constitutes a token.
You can define ranges of characters as word characters, whitespace, or ordinary
characters. The resetSyntax method clears all previous settings, including the
default number parsing.
import java.io.IOException;
import java.io.StringReader;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        String input = "user@example.com 192.168.1.1 #comment";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        try {
            // Reset and customize syntax
            tokenizer.resetSyntax();
            tokenizer.wordChars('a', 'z');
            tokenizer.wordChars('A', 'Z');
            tokenizer.wordChars('0', '9');
            tokenizer.wordChars('@', '@');
            tokenizer.wordChars('.', '.');
            tokenizer.whitespaceChars(' ', ' ');
            tokenizer.commentChar('#');

            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    System.out.println("Token: " + tokenizer.sval);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This example shows custom syntax configuration. We treat '@' and '.' as word
characters to handle emails and IPs. The '#' is a comment character. After
resetSyntax, all characters are "ordinary" until configured, so the digits here
are word characters rather than parts of numbers.
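Customization does not always require a full reset. In the default syntax '/' starts a comment and '-' is part of a number, which gets in the way when both should be operators. The sketch below uses ordinaryChar to change just those two characters; OperatorDemo and tokens are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class OperatorDemo {

    // Hypothetical helper: tokenizes an arithmetic expression.
    // When operatorsOrdinary is true, '/' and '-' are returned as single characters.
    static List<String> tokens(String expr, boolean operatorsOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(expr));
        if (operatorsOrdinary) {
            st.ordinaryChar('/'); // no longer a comment character
            st.ordinaryChar('-'); // no longer a numeric sign
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("8/2-3*4", true));
        System.out.println(tokens("8/2-3*4", false));
    }
}
```

With the two ordinaryChar calls the sketch prints [8.0, /, 2.0, -, 3.0, *, 4.0]. With the defaults it prints only [8.0], because everything from '/' to the end of the line is skipped as a comment.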
Tracking Line Numbers
StreamTokenizer can track line numbers during parsing. The lineno method
returns the current line number. This is useful for error reporting in
source code processing.
import java.io.IOException;
import java.io.StringReader;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        String input = "First line\nSecond line\nThird line";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        try {
            tokenizer.eolIsSignificant(true);

            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_EOL) {
                    // lineno() has already advanced to the next line
                    // by the time TT_EOL is returned
                    System.out.println("End of line " + (tokenizer.lineno() - 1));
                } else if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    System.out.println("Word at line " + tokenizer.lineno()
                            + ": " + tokenizer.sval);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
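This example demonstrates line number tracking. eolIsSignificant(true) makes
end-of-line markers significant, so they are returned as TT_EOL tokens. The
lineno method helps identify token locations. Note that the line counter is
incremented before TT_EOL is returned, so at that point lineno() reports the
number of the following line. Line tracking is valuable for compiler or parser
implementations.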
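As an illustration of error reporting, the sketch below scans input that is expected to contain only numbers and records the line of every other token. LineErrorDemo and findNonNumbers are hypothetical names; line counting works here even though eolIsSignificant is left at its default.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineErrorDemo {

    // Hypothetical helper: reports every token that is not a number.
    static List<String> findNonNumbers(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> problems = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_NUMBER) {
                    String text = (st.ttype == StreamTokenizer.TT_WORD)
                            ? st.sval
                            : String.valueOf((char) st.ttype);
                    problems.add("line " + st.lineno() + ": unexpected '" + text + "'");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        findNonNumbers("10 20\n30 abc\n40 x9").forEach(System.out::println);
    }
}
```

For this input the sketch reports 'abc' on line 2 and 'x9' on line 3.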
Handling Comments
StreamTokenizer supports both C-style (/* */) and C++-style (//) comments. The
slashStarComments and slashSlashComments methods control this behavior; both
styles are disabled by default. Comment content is skipped during tokenizing.
import java.io.IOException;
import java.io.StringReader;
import java.io.StreamTokenizer;

public class Main {

    public static void main(String[] args) {

        String input = "code /* comment */ more // line comment\nend";
        StringReader reader = new StringReader(input);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);

        try {
            tokenizer.slashStarComments(true);
            tokenizer.slashSlashComments(true);

            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    System.out.println("Token: " + tokenizer.sval);
                } else if (tokenizer.ttype == StreamTokenizer.TT_EOL) {
                    // Reached only if eolIsSignificant(true) has been called
                    System.out.println("End of line");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This example shows comment handling in StreamTokenizer. Both comment styles are
enabled. The tokenizer automatically skips comment content. Only non-comment
tokens are returned by nextToken.
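The slash-style comments can be combined with a custom comment character set through commentChar. The sketch below, assuming a hypothetical CommentStripDemo class and words helper, collects only the words that remain after all three comment styles are skipped, including a block comment that spans two lines.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentStripDemo {

    // Hypothetical helper: returns the words left after comments are skipped.
    static List<String> words(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.slashSlashComments(true); // skip "// ..." to end of line
        st.slashStarComments(true);  // skip "/* ... */", possibly across lines
        st.commentChar('#');         // skip "# ..." to end of line

        List<String> result = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "alpha // first\nbeta /* block\nspanning */ gamma\n# hash\ndelta";
        System.out.println(words(source));
    }
}
```

The sketch prints [alpha, beta, gamma, delta].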
Source
Java StreamTokenizer Class Documentation
In this article, we've covered the essential methods and features of the Java StreamTokenizer class. Understanding these concepts is crucial for text parsing and lexical analysis in Java applications.