HTML Builder: An MS Word to Simple HTML Conversion Tool
HTML Builder is a VBA macro for MS Word that converts a Word document into an HTML document based on the Word styles in the original, without the excessively verbose code that Word's "Save as Web Page" option produces. HTML Builder does not attempt to reproduce the document's formatting exactly, as "Save as Web Page" does. Instead, it marks headings, paragraphs, and so forth with simple HTML tags, allowing for the quick conversion of a Word document into simple HTML. It also marks up bold, italic, and underlined text, and replaces Unicode diacritics with entity references (though this aspect has some glitches). The less-than (<), greater-than (>), and ampersand (&) symbols are likewise replaced with entity references, so that any tags and entities typed in the source document come out as literal text rather than as markup. Lists are recognized and converted, but all lists, whatever their original type, are converted to unordered lists.
The HTML Builder macro is in a prototype, pre-beta stage. It is therefore available only to involved participants of THDL and should not be widely distributed.
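The character handling and inline markup described above can be illustrated outside Word. The following is a minimal Python sketch, not the macro's own VBA code: the function names are hypothetical, numeric character references stand in for whatever entity references the macro actually emits, and the choice of <b>, <i>, and <u> as the inline tags is an assumption.

```python
import html


def escape_text(text: str) -> str:
    """Escape markup characters, then replace non-ASCII characters."""
    # html.escape handles & before < and >, so entities are not double-escaped.
    escaped = html.escape(text, quote=False)
    # Assumption: numeric character references (e.g. &#257;) stand in for
    # the entity references the macro produces for diacritics.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def mark_run(text: str, bold: bool = False, italic: bool = False,
             underline: bool = False) -> str:
    """Wrap one run of text in simple inline tags (assumed: b, i, u)."""
    out = escape_text(text)
    if underline:
        out = f"<u>{out}</u>"
    if italic:
        out = f"<i>{out}</i>"
    if bold:
        out = f"<b>{out}</b>"
    return out


print(escape_text("<b> & bodhicitt\u0101"))  # &lt;b&gt; &amp; bodhicitt&#257;
print(mark_run("emphasis", italic=True))     # <i>emphasis</i>
```

Escaping first and wrapping afterwards keeps the tags added by the conversion intact while anything typed in the source text is neutralized.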
The conversion is based on the styles used within the document. HTML Builder first reads the list of styles in the document and presents it to the user in tabular form: on the left is the name of the style, and on the right is the suggested HTML tag to replace it. By double-clicking on a style name, one can change the tag associated with it or modify the tag by adding attributes. When the final conversion is run, every paragraph with that style is marked up with the designated tag, including any attributes, and the whole document is enclosed in a simple HTML header and footer. The Word document title, as last saved, is used for the <title> of the HTML page.
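The style-based conversion can be sketched in the same way. This Python example is an illustration under stated assumptions rather than the macro's implementation: the table contents, the class="indent" attribute, the fallback to <p> for unlisted styles, and the exact header and footer markup are all hypothetical.

```python
import html

# Hypothetical style-to-tag table; a value may carry attributes after the tag name.
STYLE_TAGS = {
    "Heading 1": "h1",
    "Normal": "p",
    "indented paragraph,ip": 'p class="indent"',
}


def wrap(text: str, tag_spec: str) -> str:
    """Wrap text in the tag; the closing tag uses only the tag name."""
    name = tag_spec.split()[0]
    return f"<{tag_spec}>{html.escape(text, quote=False)}</{name}>"


def build_page(title, paragraphs, style_tags=STYLE_TAGS, default="p"):
    """Convert (style_name, text) pairs and enclose them in a simple page."""
    # Assumption: styles missing from the table fall back to a plain paragraph.
    body = [wrap(text, style_tags.get(style, default)) for style, text in paragraphs]
    head = ("<html>\n<head>\n<title>"
            + html.escape(title, quote=False)
            + "</title>\n</head>\n<body>")
    foot = "</body>\n</html>"
    return "\n".join([head, *body, foot])


page = build_page("Sample Page", [
    ("Heading 1", "Introduction"),
    ("indented paragraph,ip", "An indented paragraph."),
])
print(page)
```

Keeping the attributes in the table value means a single entry per style is enough to produce both the opening tag with attributes and the bare closing tag.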
Instructions for Using HTML Builder
As long as one is using Word 97 or later (or Word 2001 on the Mac), it is easy to use HTML Builder to create simple web pages. The step-by-step instructions are:
- Download HTML Builder from the tools site and unzip it:
Download Windows Version | Download Mac Version
- Open the resulting Word document (HTMLBuilder.doc)
- Copy and paste the desired text or whole document into HTMLBuilder.doc
- Use Save As to save it as a Word document, giving it the title intended for the new HTML document
- Press Ctrl + Alt + H to run the macro; a form with the list of styles will appear (see example below)
- Change the tags associated with styles to suit one's needs by:
  - Double-clicking on the style
  - Changing the associated "Style's tag"
  - Pressing the "Set tag" button
- Press the Convert button
- Save the document as text only with an '.html' extension
Example Images
Once one has copied the document into the HTMLBuilder document and saved it with the desired title, one is ready to begin the conversion process. This is done by pressing Ctrl + Alt + H (no Shift). Immediately, one will see a window like this:
Double-click on a style, say "indented paragraph,ip" in the example, and the style and its tag appear in the two entry lines below. If the tag is changed, a new button that says "Set Tag" appears to its right, as in the example below:
When one presses the "Set Tag" button, the tag for that style is recorded and the button disappears, as in the image below:
One may repeat this process until all the styles are associated with the desired tags. Once finished assigning tags, press the "Convert" button and the whole document will be converted using the specified tags. The document should then be saved as a text document with the extension '.htm' or '.html'. It is recommended that further editing of the HTML be done in a simple text editor or a web-page editor, rather than in Word.
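For comparison, the final save step amounts to writing the converted markup as plain text under an '.html' name. A minimal Python sketch, assuming the converted text is already held in a string; the function name and the UTF-8 encoding are assumptions, not part of HTML Builder.

```python
from pathlib import Path


def save_as_html(converted: str, word_path: str) -> Path:
    """Write converted markup as plain text next to the source, with '.html'."""
    # Swap the Word extension for '.html' and keep the rest of the name.
    target = Path(word_path).with_suffix(".html")
    # Assumption: UTF-8; output that uses only entity references is plain ASCII anyway.
    target.write_text(converted, encoding="utf-8")
    return target
```

Calling save_as_html(page_text, "MyPage.doc") would write MyPage.html in the same folder, where page_text is a hypothetical string holding the converted markup.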
HTMLBuilder (development stage not for public distribution)
Copyright 2002 Tibetan and Himalayan Digital Library
Programmed by Nathaniel Garson
Protected by the terms of the THDL Open Community License, Version 1.0.
