Creating Text Documents Using ODFDOM

In this tutorial, you will learn to use the ODFDOM Toolkit API to take an XML file that describes movies and turn it into an OpenDocument text file. Here is a partial screenshot of what the result will look like:

[Screenshot: word processing document showing the title, synopsis, and cast of Citizen Kane]

The program is written to be run from the command line; you invoke it with a command like this:

    java -cp .:lib/odfdom-java-0.12.0-jar-with-dependencies.jar \
        MakeMovieDocument inputfile outputfile

Variables

The program starts out by declaring variables that you need to accomplish the task. First, you need a variable to hold the name of the input file, a Document for the parsed XML, and an XPath object to allow you to easily access the information in the document.

    String inputFileName;
    Document inputDocument;
    XPath inputXPath;

You will need quite a few variables for the output OpenDocument file. In order to understand these, you need to know how an OpenDocument file is structured. It’s actually a .zip format file that contains a content.xml file that holds the main content (in this case, the text), and a styles.xml file that holds some of the presentation information. (There are other files in the .zip, but they don’t concern us in this tutorial.)
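To make the package layout concrete, here is a minimal sketch that uses only the JDK's java.util.zip classes. It does not read a real OpenDocument file: it builds a tiny stand-in archive in memory with placeholder content.xml and styles.xml entries, then lists what is inside. The class and method names are hypothetical and are not part of ODFDOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of ODFDOM.
final class OdfZipLayoutSketch {

    // Builds an in-memory .zip with placeholder content.xml and styles.xml entries.
    static byte[] buildStandInPackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"content.xml", "styles.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the names of the entries in the archive, in archive order.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entryNames(buildStandInPackage()));
    }
}
```

To inspect a real document instead, you could open the file with java.util.zip.ZipFile and list its entries the same way.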

The presentation information in the styles.xml file consists of the named styles such as “Heading 1,” “Default,” and the other style names that appear in the word processor’s drop-down menu.

The content.xml file for a word processing document has all of the content as a child of an <office:text> element. The content.xml file also contains some presentation information: the automatic styles. These are styles that are automatically created when you click the bold or italic icons in the word processor.
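The following sketch shows roughly where those two pieces sit. It parses a heavily trimmed stand-in for content.xml (a real file carries many more namespace declarations and attributes) with the JDK's namespace-aware DOM parser and locates the <office:text> element, which sits inside an <office:body> element. The class name and the sample string are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of ODFDOM.
final class ContentXmlSkeletonSketch {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Trimmed stand-in for content.xml: automatic styles first, then the body.
    static final String SKELETON =
        "<office:document-content xmlns:office='" + OFFICE_NS + "'"
        + " xmlns:text='" + TEXT_NS + "'>"
        + "<office:automatic-styles/>"
        + "<office:body><office:text>"
        + "<text:h text:outline-level='1'>A heading</text:h>"
        + "<text:p>A paragraph</text:p>"
        + "</office:text></office:body>"
        + "</office:document-content>";

    // Parses an XML string with namespace support switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first office:text element in the document.
    static Element officeText(Document doc) {
        return (Element) doc.getElementsByTagNameNS(OFFICE_NS, "text").item(0);
    }

    public static void main(String[] args) {
        Element text = officeText(parse(SKELETON));
        System.out.println(text.getParentNode().getNodeName());
        System.out.println(text.getChildNodes().getLength());
    }
}
```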

Here, then, are the variables needed to process the output file. Notice the naming conventions: OdfOfficeStyles is the class that represents the <office:styles> element.

    String outputFileName;
    OdfTextDocument outputDocument;
    OdfContentDom contentDom; // the document object model for content.xml
    OdfStylesDom stylesDom;   // the document object model for styles.xml

    // the office:automatic-styles element in content.xml
    // this is here for the sake of completeness; this program doesn't use it
    OdfOfficeAutomaticStyles contentAutoStyles;

    // the office:styles element in styles.xml
    OdfOfficeStyles stylesOfficeStyles;

    // the office:text element in the content.xml file
    OfficeTextElement officeText;

With the variables declared, here is the main() method, which creates an application object and runs it via the run() method:

    public static void main(String[] args) {
        MakeMovieDocument app = new MakeMovieDocument();
        app.run(args);
    }

    public void run(String[] args) {
        if (args.length == 2) {
            inputFileName = args[0];
            outputFileName = args[1];

            parseInputFile();
            setupOutputDocument();

            if (outputDocument != null && inputDocument != null) {
                cleanOutDocument();
                addOfficeStyles();
                processInputDocument();
                saveOutputDocument();
            }
        } else {
            System.err.println("Usage: MakeMovieDocument infile outfile");
        }
    }

Parsing the Input File

This has nothing to do with ODF; it’s the standard opening and parsing of an XML file, but it’s here for the sake of completeness:

    void parseInputFile() {
        DocumentBuilder builder = null;
        inputDocument = null;
        try {
            inputXPath = XPathFactory.newInstance().newXPath();
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            inputDocument = builder.parse(inputFileName);
        } catch (IOException e) {
            System.err.println("Unable to read input file.");
            System.err.println(e.getMessage());
        } catch (Exception e) {
            System.err.println("Unable to parse input file.");
            System.err.println(e.getMessage());
        }
    }

Creating the Output Document

The setupOutputDocument() method starts by calling newTextDocument() to create an ODF text document from a template that is built into the library. Once you have the document, the method gets the Document Object Model (a subclass of Document) for each of the content.xml and styles.xml files.

setupOutputDocument() then retrieves the automatic styles in content.xml and the named styles in styles.xml (or creates them if they don’t exist yet). It finishes by retrieving the <office:text> element from the content DOM. All of the headings and paragraphs that make up the document’s content will be children of this element.

    void setupOutputDocument() {
        try {
            outputDocument = OdfTextDocument.newTextDocument();
            contentDom = outputDocument.getContentDom();
            stylesDom = outputDocument.getStylesDom();
            contentAutoStyles = contentDom.getOrCreateAutomaticStyles();
            stylesOfficeStyles = outputDocument.getOrCreateDocumentStyles();
            officeText = outputDocument.getContentRoot();
        } catch (Exception e) {
            System.err.println("Unable to create output file.");
            System.err.println(e.getMessage());
            outputDocument = null;
        }
    }

Clearing Content from the Output Document

The templates included in the ODFDOM toolkit have content in them; a newly-created text document has a paragraph that contains no text. The cleanOutDocument() method gets rid of this paragraph, by repeatedly removing the first child of the officeText node until there are no more.

    void cleanOutDocument() {
        Node childNode;

        childNode = officeText.getFirstChild();
        while (childNode != null) {
            officeText.removeChild(childNode);
            childNode = officeText.getFirstChild();
        }
    }

How Styles Work

You create a style with an OfficeStylesElement object. This object has a name property and a displayName property. (For details, see the previous tutorial.)

Each style belongs to a family. The family tells what kind of element this style is applied to. Styles for paragraphs or headings belong to the Paragraph family; styles for inline text belong to the Text family. (The family names are Chart, DrawingPage, Graphic, List, Paragraph, Presentation, Ruby, Section, Table, TableCell, TableColumn, TableRow, and Text.)

Within the style object are the style properties. These properties come in property sets, and a style can have properties from more than one set. In the output document, the heading that reads “The Cast” uses properties from the ParagraphProperties set to specify its margins. It uses properties from the TextProperties set to specify that it is italic. (The property set names are ChartProperties, DrawingPageProperties, GraphicProperties, HeaderFooterProperties, ListLevelProperties, PageLayoutProperties, ParagraphProperties, RubyProperties, SectionProperties, TableCellProperties, TableColumnProperties, TableProperties, TableRowProperties, and TextProperties.)

Adding Styles

We add named styles to the styles.xml file in the addOfficeStyles() method. It starts off by retrieving the default paragraph style and setting it to 10 point.

    void addOfficeStyles() {
        OdfStyleDefaultStyle defaultStyle;
        OdfStyle style;
        StyleParagraphPropertiesElement pProperties;
        StyleTabStopsElement tabStops;
        StyleTabStopElement tabStop;

        // Set default font size to 10 point.
        // The property is set on defaultStyle; the style variable
        // is not assigned until the named styles are created.
        defaultStyle = stylesOfficeStyles.getDefaultStyle(
            OdfStyleFamily.Paragraph);
        defaultStyle.setProperty(StyleTextPropertiesElement.FontSize, "10pt");

International Issues

The setProperty() call at the end of the preceding code does what we want, but only for documents that have Western fonts. For documents that might contain Asian or complex fonts (such as Hindi or Arabic), you should also set FontSizeAsian and FontSizeComplex. Similarly, when setting FontWeight (for bold) or FontStyle (for italic), you will probably also want to set the FontWeightAsian, FontWeightComplex, FontStyleAsian, and FontStyleComplex properties.

Since setting font weight, style, and size are frequent operations, and since you really do want your documents to be international-friendly, that setProperty() call is replaced with this code:

    setFontSize(defaultStyle, "10pt");

This is a call to one of the following three utility routines to make your life easier:

    void setFontWeight(OdfStyleBase style, String value) {
        style.setProperty(StyleTextPropertiesElement.FontWeight, value);
        style.setProperty(StyleTextPropertiesElement.FontWeightAsian, value);
        style.setProperty(StyleTextPropertiesElement.FontWeightComplex, value);
    }

    void setFontStyle(OdfStyleBase style, String value) {
        style.setProperty(StyleTextPropertiesElement.FontStyle, value);
        style.setProperty(StyleTextPropertiesElement.FontStyleAsian, value);
        style.setProperty(StyleTextPropertiesElement.FontStyleComplex, value);
    }

    void setFontSize(OdfStyleBase style, String value) {
        style.setProperty(StyleTextPropertiesElement.FontSize, value);
        style.setProperty(StyleTextPropertiesElement.FontSizeAsian, value);
        style.setProperty(StyleTextPropertiesElement.FontSizeComplex, value);
    }

Named Styles

The addOfficeStyles() method adds several different styles to the styles.xml DOM. There are separate styles for a movie heading, cast heading, synopsis paragraph, and an entry (paragraph) in the cast member list. This last style also needs to specify a tab stop with dots as a leader to separate the actor’s real name from the name of the character she portrays. Finally, the method creates an inline style for the rating stars; they need to be slightly smaller than the movie title font size.

I won’t present all of the method here lest your eyes glaze over. Instead, here is the code for setting up the style for paragraphs in the synopsis; the other styles use similar code.

The sequence you should follow is:

  1. Create the style with its internal name
  2. Set the display name (since this style will be available from a selection menu in a word processor)
  3. Set any properties for the style

When you create the style using newStyle(), it is automatically added to the list of styles. Note that the code that follows does not set the top and bottom margins, so they default to zero.

    style = stylesOfficeStyles.newStyle("Synopsis_20_Para",
        OdfStyleFamily.Paragraph);
    style.setStyleDisplayNameAttribute("Synopsis Para");
    style.setProperty(OdfStyleParagraphProperties.Border,
        "0.035cm solid #000000");
    style.setProperty(OdfStyleParagraphProperties.Padding, "0.25cm");
    style.setProperty(OdfStyleParagraphProperties.MarginLeft, "1cm");
    style.setProperty(OdfStyleParagraphProperties.MarginRight, "1cm");
    style.setProperty(OdfStyleParagraphProperties.TextIndent, "0.25cm");
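Notice how the internal names in this tutorial pair with their display names: “Synopsis Para” becomes Synopsis_20_Para, where 20 is the hexadecimal character code of the space. The sketch below shows one way to derive an internal name from a display name. It is a simplified assumption, not an ODFDOM routine: the toInternalName() helper is hypothetical, and it escapes every character that is not an ASCII letter or digit, which may be stricter than what a word processor actually does.

```java
// Hypothetical helper for illustration; not part of ODFDOM.
final class StyleNameSketch {

    // Replaces each character that is not an ASCII letter or digit
    // with its hexadecimal code wrapped in underscores (space -> _20_).
    static String toInternalName(String displayName) {
        StringBuilder out = new StringBuilder();
        for (char c : displayName.toCharArray()) {
            boolean plain = (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9');
            if (plain) {
                out.append(c);
            } else {
                out.append('_').append(Integer.toHexString(c)).append('_');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toInternalName("Synopsis Para"));
        System.out.println(toInternalName("Cast Para"));
    }
}
```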

The one style that differs from the rest is the cast paragraph with its tab stop. The hierarchy of elements in the resulting XML is:

    <style:style>
        <style:paragraph-properties>
            <style:tab-stops>
                <style:tab-stop>

In this instance, you must explicitly create the OdfStyleParagraphProperties object. You didn’t have to do this when using style.setProperty(...), because that method automatically creates the <style:paragraph-properties> element for you.

    // Paragraph with tab stop at 7.5cm with a
    // leader of "." This is used for the
    // cast list.
    style = stylesOfficeStyles.newStyle("Cast_20_Para",
        OdfStyleFamily.Paragraph);
    style.setStyleDisplayNameAttribute("Cast Para");
    style.setStyleFamilyAttribute(OdfStyleFamily.Paragraph.toString());

    // build hierarchy from "inside out"
    tabStop = new OdfStyleTabStop(stylesDom);
    tabStop.setStylePositionAttribute("7.5cm");
    tabStop.setStyleLeaderStyleAttribute("dotted");
    tabStop.setStyleLeaderTextAttribute(".");
    tabStop.setStyleTypeAttribute("right");

    tabStops = new OdfStyleTabStops(stylesDom);
    tabStops.appendChild(tabStop);

    pProperties = new OdfStyleParagraphProperties(stylesDom);
    pProperties.appendChild(tabStops);

    style.appendChild(pProperties);

Setting up all your styles is the most tedious part of the process of creating an OpenDocument file; adding content is relatively less troublesome.

Processing the Input Document

Processing the input consists of grabbing all the <movie> elements, extracting the relevant sub-elements and adding the appropriate ODFDOM objects to the output document. The following method has a try/catch block to catch exceptions thrown by the subsidiary methods.

    void processInputDocument() {
        NodeList movieList;

        movieList = inputDocument.getElementsByTagName("movie");
        for (int i = 0; i < movieList.getLength(); i++) {
            try {
                processTitle(movieList.item(i));
                processSynopsis(movieList.item(i));
                processCast(movieList.item(i));
            } catch (XPathExpressionException e) {
                e.printStackTrace(System.err);
            }
        }
    }

Inserting a Heading

The processTitle() method adds the movie’s title and star rating; the stars are in an OdfTextSpan object, since their style requires a smaller font size. Instead of using getElementsByTagName(), this method uses XPath to extract the information.

The general sequence for adding content to the output document is to create the appropriate ODF object and use addStyledContent() to add the content (the second parameter to the method) with the style specified as the first parameter.

    void processTitle(Node movieNode) throws XPathExpressionException {
        String title;
        String rating;
        OdfTextHeading heading;
        OdfTextSpan stars;

        title = inputXPath.evaluate("heading/title", movieNode);
        rating = inputXPath.evaluate("heading/rating", movieNode);

        heading = new OdfTextHeading(contentDom);
        heading.addStyledContent("Movie_20_Heading", title + " ");

        stars = new OdfTextSpan(contentDom);
        heading.addStyledSpan("Star_20_Span", rating);

        officeText.appendChild(heading);
    }
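The XPath calls can be tried on their own, without ODFDOM. The sketch below runs the same kind of relative expressions against a made-up input document. The element names inside <movie> follow the expressions used in this tutorial; the <movies> root element, the sample values, and the class and method names are assumptions made for illustration. It also shows that the string form of evaluate() returns an empty string, not null, when nothing matches.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the tutorial program.
final class MovieXPathSketch {
    // Made-up input; the root element name and the values are assumptions.
    static final String SAMPLE =
        "<movies><movie>"
        + "<heading><title>Citizen Kane</title><rating>****</rating></heading>"
        + "<synopsis><para>First paragraph.</para></synopsis>"
        + "<credits><actors>"
        + "<actor><name>Orson Welles</name><role>Charles Foster Kane</role></actor>"
        + "<actor><name>Joseph Cotten</name></actor>"
        + "</actors></credits>"
        + "</movie></movies>";

    // Parses SAMPLE and returns its first <movie> element.
    static Node firstMovie() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)))
                .getElementsByTagName("movie").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression relative to context and returns its string value.
    static String text(String expression, Node context) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Node movie = firstMovie();
        System.out.println(text("heading/title", movie));
        System.out.println(text("heading/rating", movie));
        // The second actor has no <role>, so the result is the empty string.
        System.out.println(text("credits/actors/actor[2]/role", movie).isEmpty());
    }
}
```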

Adding the Synopses

Similar code adds the synopsis for each movie; the method needs a loop to handle all the <para> elements in the <synopsis> element. The paragraphs are retrieved with an XPath expression that returns a node set (XPathConstants.NODESET). Rather than create a separate variable for the <para>’s text content, processSynopsis() sets the style on the paragraph when it creates it, and extracts the text all at once with the expression paragraphs.item(i).getFirstChild().getNodeValue().

    void processSynopsis(Node movieNode) throws XPathExpressionException {
        NodeList paragraphs;
        OdfTextParagraph paragraph; // same class that processCast() uses

        paragraphs = (NodeList) inputXPath.evaluate("synopsis/para",
            movieNode, XPathConstants.NODESET);
        for (int i = 0; i < paragraphs.getLength(); i++) {
            paragraph = new OdfTextParagraph(contentDom, "Synopsis_20_Para");
            paragraph.addContent(
                paragraphs.item(i).getFirstChild().getNodeValue());
            officeText.appendChild(paragraph);
        }
    }
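One thing to keep in mind about getFirstChild().getNodeValue(): it reads only the first child node of each <para>. That is fine when a paragraph holds nothing but text. The sketch below, which uses only the JDK and a made-up <synopsis> fragment, compares it with getTextContent() for a paragraph that contains inline markup. Whether your input has such markup is an assumption; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the tutorial program.
final class SynopsisTextSketch {
    // Made-up fragment; the <em> element is an assumption about possible input.
    static final String SAMPLE =
        "<synopsis>"
        + "<para>Plain text.</para>"
        + "<para>Text with <em>markup</em> inside.</para>"
        + "</synopsis>";

    // Returns all <para> elements as a node set, as processSynopsis() does.
    static NodeList paragraphs() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/synopsis/para", doc, XPathConstants.NODESET);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // The tutorial's approach: value of the first child node only.
    static String firstChildText(Node para) {
        return para.getFirstChild().getNodeValue();
    }

    // Alternative: concatenated text of all descendants.
    static String allText(Node para) {
        return para.getTextContent();
    }

    public static void main(String[] args) {
        NodeList paras = paragraphs();
        for (int i = 0; i < paras.getLength(); i++) {
            System.out.println(firstChildText(paras.item(i)));
            System.out.println(allText(paras.item(i)));
        }
    }
}
```

Note also that getFirstChild() returns null for an empty <para/>, so real input may need a guard before calling getNodeValue().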

Processing the Cast

And this is more of the same. After adding a heading for the cast, an XPath expression retrieves all the <actor> nodes. Then a for loop processes each one, again using XPath to get the actor’s name and role. If the actor’s role is mentioned, then processCast() must add a tab to separate the name and role.

You can’t do this by putting a \t character into the output; ODF treats tabs and newlines as if they were just a blank (it “normalizes” them). The addContentWhitespace() method, used in the following code, will output a <text:tab> or <text:line-break> element when it encounters a tab or newline in the content.

    void processCast(Node movieNode) throws XPathExpressionException {
        NodeList actors;
        OdfTextParagraph actor;
        String name;
        String role;
        OdfTextHeading heading;

        heading = new OdfTextHeading(contentDom, "Cast_20_Heading");
        heading.appendChild(contentDom.createTextNode("The Cast"));
        officeText.appendChild(heading);

        actors = (NodeList) inputXPath.evaluate("credits/actors/actor",
            movieNode, XPathConstants.NODESET);
        for (int i = 0; i < actors.getLength(); i++) {
            actor = new OdfTextParagraph(contentDom, "Cast_20_Para");
            name = inputXPath.evaluate("name", actors.item(i));
            role = inputXPath.evaluate("role", actors.item(i));
            actor.addContent(name);
            if (role != null && !role.equals("")) {
                actor.addContentWhitespace("\t" + role);
            }
            officeText.appendChild(actor);
        }
    }
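To make the tab handling concrete, here is a rough sketch of the kind of transformation described above, written with the JDK's DOM classes only. It is not ODFDOM's implementation: the class and method names are hypothetical, and it handles only tabs and newlines, ignoring other cases such as runs of spaces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration; not ODFDOM's implementation.
final class WhitespaceSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Builds a text:p element, turning each tab into a text:tab element
    // and each newline into a text:line-break element.
    static Element paragraphFor(String content) {
        Document doc = newDocument();
        Element p = doc.createElementNS(TEXT_NS, "text:p");
        StringBuilder run = new StringBuilder();
        for (char c : content.toCharArray()) {
            if (c == '\t' || c == '\n') {
                flush(doc, p, run);
                String name = (c == '\t') ? "text:tab" : "text:line-break";
                p.appendChild(doc.createElementNS(TEXT_NS, name));
            } else {
                run.append(c);
            }
        }
        flush(doc, p, run);
        return p;
    }

    // Appends the pending characters as a text node, then clears the buffer.
    static void flush(Document doc, Element parent, StringBuilder run) {
        if (run.length() > 0) {
            parent.appendChild(doc.createTextNode(run.toString()));
            run.setLength(0);
        }
    }

    // Creates an empty namespace-aware DOM document to own the new nodes.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = paragraphFor("Orson Welles\tCharles Foster Kane");
        for (int i = 0; i < p.getChildNodes().getLength(); i++) {
            System.out.println(p.getChildNodes().item(i).getNodeName());
        }
    }
}
```

For the sample string, the paragraph ends up with three children: a text node for the name, a text:tab element, and a text node for the role.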

Saving the Output Document

This is only one line of actual code, surrounded by error handling.

    void saveOutputDocument() {
        try {
            outputDocument.save(outputFileName);
        } catch (Exception e) {
            System.err.println("Unable to save document.");
            System.err.println(e.getMessage());
        }
    }

The Entire Program

You may download the files for this program.
