nyweb

v2.00

nyweb is yet another incarnation of Knuth's WEB, this time language-independent, with a notation based on XML, and heavily inspired by nuweb.

Essentially, it is a preprocessor that separates comment from content (the program itself) and outputs each into a different file (or, more precisely, into multiple files as specified). Other major features are the definition and replacement of parametrizable macros, and conditional "translation". But, contrary to previous implementations of WEB, nyweb does not pretend that the comments are usable as true documentation; therefore no prettyprinting is supported.

Rather, the comments are seen as valuable only as the programmer's tool in the process of creation. Documentation can still be created together with the program, simply as another type of content, written in some documentation format (e.g. TeX and derived formats, HTML, or formats derived from XML) and output to separate file(s).

Instead of "autodocumentation"/prettyprinting, nyweb brings a different but major idea of WEB to the front: it supports the natural way in which programs are created. First, there is a rough description in human language (its equivalent here being the free-form, unpublished comments), and a skeleton of the program can be sketched at the top level, replacing the key elements simply by their description, to be written (here usually as macros) and included later. Then these lower-level elements are written, possibly in the same manner: outlining their function and again replacing the details by inclusion marks, the bodies of which are to be written later. After the first version of the program has been written in this top-down manner, it is not uncommon that new functionality and features are added, often influencing several parts of the original program, and not rarely requiring a different structure than the one the original program was written with. In (ny)web, the elements of such new functionality are written as a separate "chapter" consisting of a set of macros, which are then included where needed; this groups the functionality together, yet allows its elements to be positioned in separate parts of the program. This partially replaces and augments the natural structures provided by many programming languages, but does so without sacrificing efficiency (as unnecessary "artificial" function calls would).

nyweb also aims to build on another aspect of modern programming. Recent years have seen a growth in "intelligent" programmer's editors. They can now highlight syntax, fold functions or other structures, create lists of functions and other structures, autocomplete names of already written or used functions and variables, etc. These editors have doubtlessly increased the efficiency of programmers, especially those working on big projects. However, they rely on recognition of the programming language syntax (basically they perform the first steps of compilation "online"); therefore they are language-dependent, can be confused by new features of a language, and have trouble recognising "mixed" languages/projects. In short, they are rigid and fragile.

Therefore, nyweb is seen only as an intermediate step towards, or the basis of, a new programmer's editor, which would give the programmer a high degree of flexibility in defining structure while maintaining the efficiency of the resulting compiled code. It would also provide the markup needed for fast and efficient movement within the sources and instant lookup of key program elements, fully under the control of the programmer, without overwhelming him with the unnecessary details of similar markup (e.g. function lists) created automatically. Also, "interwoven documentation" can easily be accomplished in such an editor, in a manner very similar to what hypertext editors/viewers provide. This is why XML was used as the underlying markup language: it allows an arbitrary number of tags and marks to be added, even in a forward-compatible manner (simply ignoring tags unknown to a given version). However, it is clear that such an editor would be a major undertaking, far beyond the capabilities of the one-man show nyweb currently is.

Usage of nyweb

nyweb is a command-line tool, run as
nyweb inputfile.w [outputfile.txt]
  
Comments (i.e. all text outside emit tags, see below) are written to the output file. If no output file is given as the second parameter, the comments go to standard output (console). Often it is wise to use nul as the output file, to discard the comments, which are unnecessary for most uses. The output files for the content (the real program) are given explicitly in the source .w file; see the emit tag.

Errors and warnings are printed to stderr as they occur during processing. Upon the first error, further input processing stops and the program exits with exit code 1. On success, the exit code is 0.

Basic technicalities of nyweb

nyweb currently reproduces much of nuweb's functionality, except prettyprinting/TeX support. However, before describing the tags themselves, a few words about the xml-related stuff:

Basic tags of nyweb

emit tag

This tag encloses the "real" program (or documentation or any other content), to be saved into a file. Its file attribute names the file to write to:

<emit file="myprogram.c">
  for (i = 0; i &lt; 99; i++) {
    printf("Blahblah %d\n", i);
  }
</emit>

  
(Note the use of &lt; instead of < in the example above.) There may be any number of emit tags in a single input file, even emitting to the same file: the outputs are concatenated in the order in which they appear in the input file. However, emit tags cannot be nested.

One more feature is taken over from nuweb: during processing, each emitted file is saved to a temporary file; after the whole input file has been processed, the temporary file is compared against the older version of the same file. Only if the content has changed (or the file did not exist previously) does the new file replace the old one; otherwise the temporary file is silently discarded. This makes it possible to use tools such as make, which rely on the timestamp of a source file to determine whether it has changed.
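The compare-then-replace step described above can be sketched in Python as follows. This is only an illustration of the idea, not nyweb's actual code; the function name emit_if_changed and the choice to place the temporary file next to the target are assumptions made for this sketch.

```python
import filecmp
import os
import tempfile


def emit_if_changed(target_path: str, content: str) -> bool:
    """Write content to target_path only if it differs from the existing file.

    Returns True if the target was created or replaced, False if it was
    left untouched (so its timestamp stays the same for tools like make).
    Hypothetical helper, not part of nyweb.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write the new content into a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
        tmp.write(content)

    # Byte-for-byte comparison against the older version, if there is one.
    if os.path.exists(target_path) and filecmp.cmp(tmp_path, target_path, shallow=False):
        os.remove(tmp_path)  # unchanged: silently discard the temporary file
        return False

    os.replace(tmp_path, target_path)  # new or changed: replace the old file
    return True
```

Calling emit_if_changed twice with the same content leaves the file's modification time unchanged on the second call, which is what keeps make from rebuilding needlessly.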

dependencies attribute of emit tag - using this attribute, a file containing the dependencies of the emitted file on its .w sources is generated. This is intended as input to programs like make. The value of this attribute determines the dependency file name (traditionally, it is placed into the .dep subdirectory and has the same name as the emitted file plus a .d suffix). If there are multiple emit tags targeting the same emitted file, the first one with a non-empty dependencies attribute determines the dependency file name.

macro tag

This tag encloses any content that will not appear in any output file directly, but can be included in either a comment or an emit, any number of times; see the use tag below.

A macro has one required attribute, name, which identifies the macro:

<macro name="print blahblah">
  printf("Blahblah\n");
</macro>

  
Macros can be defined only outside any other functional tag (i.e. macro, emit or use). However, use tags can be nested inside macros. Macros can be defined in any order; they do not need to be defined before the use tag that invokes them.

Multiple macros with the same name can be defined; they are then concatenated in the order in which they appear in the source file. This ordering can be overridden explicitly by adding an order attribute to the macro, with an integer as its value. These macros are then concatenated in ascending order (macros with the same "order" again in their order of appearance), and the macros without an order attribute are added to the end of the chain. For example:
<macro name="fruits">  Apple </macro>
<macro name="fruits">  Banana </macro>
<macro name="fruits">  Orange </macro>
  
when invoked as <use name="fruits"/>, this would appear as " Apple Banana Orange ", but if we define the same as
<macro name="fruits" order="20">  Apple </macro>
<macro name="fruits">  Banana </macro>
<macro name="fruits" order="10">  Orange </macro>
  
then <use name="fruits"/> would appear as " Orange Apple Banana ".
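The ordering rule can be modelled in a few lines of Python. This is a sketch of the rule as described above, not of nyweb's internals; the function name and the representation of each definition as an (order, body) pair, with None standing for a missing order attribute, are assumptions of this example.

```python
def concatenate_macro(definitions):
    """Return macro bodies in the order in which they would be concatenated.

    definitions: list of (order, body) pairs in source-file order, where
    order is an int, or None if the macro has no order attribute.
    Hypothetical model of the rule, not nyweb code.
    """
    ordered = [(order, body) for order, body in definitions if order is not None]
    unordered = [body for order, body in definitions if order is None]
    # sorted() is stable, so equal "order" values keep their source order.
    ordered = sorted(ordered, key=lambda pair: pair[0])
    return [body for _, body in ordered] + unordered


# The second "fruits" example from the text:
fruits = [(20, "Apple"), (None, "Banana"), (10, "Orange")]
print(concatenate_macro(fruits))
```

The sketch relies on the stability of Python's sort to keep definitions with equal order values in their order of appearance.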

use tag

This tag is replaced by the content of the macro whose name matches its name attribute. It can occur in the comment area as well as inside emit tags.

Since v1.06, <use macro="xxx"/> is an alternative form of <use name="xxx"/> (this is on par with the alternative form of parameter invocation, <use param="xxx"/>).

htmlize attribute of use tag - this attribute can be used with a top-level use tag (i.e. one which is not embedded within a macro). It can have any non-empty value (e.g. htmlize="1"); the value itself is ignored. With this attribute, &lt; is written instead of < and &amp; instead of &, which may be useful if the text is to be output into an HTML (or other XML) file.
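The effect of htmlize on the expanded text can be illustrated with a short Python sketch. It only mirrors the two substitutions named above; the function name htmlize is borrowed from the attribute for convenience and does not describe how nyweb implements it.

```python
def htmlize(text: str) -> str:
    """Escape '&' and '<' the way the htmlize attribute is described to.

    Illustrative only; not nyweb's implementation.
    """
    # '&' has to be handled first, otherwise the '&' introduced by '&lt;'
    # would itself be escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;")


print(htmlize("if (a < b && c) { x = &y; }"))
```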

Other tags

Parametrized macros: param tag

Parametrized macros are accomplished using the param tag. This tag has dual usage:

  1. inside the use element, it contains definitions of the parameter body
  2. within a macro definition, it is replaced by the respective definition when the macro is invoked from a use
    1. in this usage, it must be an empty tag, i.e. <param name="xxx"/>
    2. since v1.06, <use param="xxx"/> is an alternative (alias) for <param name="xxx"/> (in fact, this is the recommended form now, as it lines up nicely with <use macro="xxx"/>).
In both cases, param has a required name attribute, as an identifier.

Both usage and processing of the param elements are very similar to the usage and processing of macros, except

It is not necessary to have all parameters used in a macro defined when the macro is invoked from a use. For the undefined params, a warning is issued, but processing continues.

Example:

<macro name="filled pie"> <param name="filling"/> pie</macro>
<emit file="menu.txt">
  <use name="filled pie"> <param name="filling">Cherry</param> </use>,
  <use name="filled pie"> <param name="filling">Apple</param> </use>,
  <use name="filled pie"> <param name="filling">Chocolate</param> </use>.
</emit> 
  
will create the following menu: Cherry pie, Apple pie, Chocolate pie.
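As a rough model of parameter substitution, the following Python sketch replaces each empty param tag in a macro body with the value supplied by the use. It handles only the literal form <param name="..."/> via a regular expression, and it assumes that an undefined param expands to nothing after the warning; the text above does not say what nyweb substitutes in that case, so treat that as a guess.

```python
import re

# Matches only the exact empty form <param name="..."/> (a simplification).
PARAM_REF = re.compile(r'<param name="([^"]*)"/>')


def expand_macro(body, params, warnings):
    """Substitute param references in body; collect warnings for undefined ones.

    Hypothetical model, not nyweb code. Undefined params expand to ""
    here, which is an assumption of this sketch.
    """
    def substitute(match):
        name = match.group(1)
        if name not in params:
            warnings.append("undefined param: " + name)
            return ""
        return params[name]

    return PARAM_REF.sub(substitute, body)


warnings = []
pie = ' <param name="filling"/> pie'
print(expand_macro(pie, {"filling": "Cherry"}, warnings))
```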

Tables (lists): table and item tags, and table attribute in use tag

Tables (available since v2.0; among the "planned" features they were mentioned under their older name, lists) are rows of items, each item containing a source text. They can also be seen as sets of params which can be used in the expansion of an appropriately written macro, one row at a time, through a use of that macro together with table.

Defining tables

Tables are defined one row at a time, through a table tag. Each row contains definitions of items through item tags; both tags have a name attribute. Rows are mutually independent, and each can have a different set of items defined. The order in which items are defined within a row is irrelevant.

Rows within the table can be ordered through the order attribute of the table tag; this determines the order in which they are used.

As tables are primary sources of text, they are always defined at the top level, the same as ordinary macros. The text items behave as macros, i.e. they can have embedded use tags.

Using tables

Tables are used through a use of an appropriate macro. The use has an additional table attribute, which determines the table to be used and indicates that this use shall be repeated for all rows of the table. Upon each iteration of the use, the items of the given table row behave exactly as params with the same names as the items. This simply recycles the param mechanism: the used macro has these params embedded exactly as with "ordinary" params.

Params can still be defined within a use even if it has the table attribute, and these params may have the same names as items in the table. When <param/> is encountered in the expanded macro, the current table row is scanned first for an item of the given name; if none is found, the set of params defined in the use is scanned for a param of the same name. This provides a convenient mechanism for default values of items not defined within a table row.
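The lookup order (table row first, then the params of the use as defaults) maps naturally onto Python's collections.ChainMap, as in the sketch below. Everything here is illustrative: the function name is made up, rows are assumed to arrive already sorted into their order of use, and Python format placeholders such as {var} stand in for nyweb's <param name="var"/> references.

```python
from collections import ChainMap


def expand_over_table(template, rows, use_params):
    """Expand template once per table row.

    template:   str with {name} placeholders (stand-ins for param references)
    rows:       list of dicts (item name -> text), already in order of use
    use_params: dict of params defined in the use tag, acting as defaults
    Hypothetical model of the lookup rule, not nyweb code.
    """
    results = []
    for row in rows:
        # ChainMap searches row first and falls back to use_params.
        scope = ChainMap(row, use_params)
        results.append(template.format_map(scope))
    return results


rows = [
    {"var": "b", "init": "-9"},             # no type: default applies
    {"type": "int", "var": "a", "init": "5"},
    {"type": "char", "var": "c"},           # no init: default applies
]
defaults = {"type": "uint32_t", "init": "0"}
for line in expand_over_table("{type} {var} = {init};", rows, defaults):
    print(line)
```

In this sketch a name found in neither the row nor the defaults raises KeyError, whereas nyweb is described as issuing a warning and continuing for undefined params.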

As a planned feature, "derived tables" could be created and subsequently used by means of "filters" and "sorters", but that is subject to further development.

A simple example

Framework (boring, but it needs to be done only once):

<emit file="source.h">
// global variables declarations
  <use name="external variables declaration template" table="global variables">
    <param name="type"><use name="default type"/></param>
  </use>

// function prototypes
  [...]
</emit>

<emit file="source.c">
// includes, defines, etc.
  [...]

// global variables
  <use name="global variables definition template" table="global variables">
    <param name="type"><use name="default type"/></param>
    <param name="init"> 0 </param>
  </use>


// functions etc.
  [...]
</emit>



<macro name="external variables declaration template">  extern <param name="type"/> <param name="var"/>;
</macro>

<macro name="global variables definition template">  <param name="type"/> <param name="var"/> = <param name="init"/>;
</macro>


<macro name="default type"> uint32_t </macro>

  

Then, whenever needed, we can "create" global variables at various places throughout the .w source simply by...

<table name="global variables"> <item name="type"> int </item>   <item name="var">  a </item> <item name="init"> 5 </item> </table>
[...]
We want this variable to be defined first, for some allocation reasons; and we want it to be of the default type:
<table name="global variables" order="1">    <item name="var">  b </item> <item name="init"> -9 </item> </table>
[...]
We want this variable to be initialized with the default initializer:
<table name="global variables"> <item name="type"> char </item>   <item name="var">  c </item>  </table>

  

... and we get... (source.h):


// global variables declarations
    extern  uint32_t    b ;
  extern  int    a ;
  extern  char    c ;


// function prototypes
  [...]

  
(source.c):


// includes, defines, etc.
  [...]

// global variables
     uint32_t    b  =  -9 ;
   int    a  =  5 ;
   char    c  =  0 ;



// functions etc.
  [...]

  

Conditional processing: define, if/else and comment tags

This is similar to conditional compilation in many programming languages:
<define name="tropical"/>

<if defined="tropical">
  <macro name="fruit">Banana</macro>
  <macro name="fruit">Tangerine</macro>
  <macro name="fruit">Orange</macro>
<else/>
  <macro name="fruit">Cherry</macro>
  <macro name="fruit">Apple</macro>
</if>
  
As macros are invoked in a different order than they are placed in the input file, placing define and if tags inside macros may make things difficult to interpret, and is strongly discouraged (although not disabled, for the benefit of those who desperately want to shoot themselves in the foot). A warning is issued if a define is encountered for a symbol which has already been tested by an if.
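The define-after-test warning can be modelled with a small symbol table that remembers which names an if has already tested. The class below is a sketch under that assumption; its name, its methods and the warning text are invented for illustration and are not taken from nyweb.

```python
class SymbolTable:
    """Tracks defined symbols and symbols already tested by an if.

    Hypothetical model of the check, not nyweb code.
    """

    def __init__(self):
        self.defined = set()
        self.tested = set()
        self.warnings = []

    def define(self, name):
        # A define arriving after an if has tested the same symbol is
        # suspicious: the earlier if took its else branch (or was skipped).
        if name in self.tested:
            self.warnings.append("'" + name + "' defined after being tested by an if")
        self.defined.add(name)

    def is_defined(self, name):
        self.tested.add(name)
        return name in self.defined


symbols = SymbolTable()
symbols.define("tropical")
print(symbols.is_defined("tropical"), symbols.warnings)
```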

comment is just another tag, whose content is simply ignored:

<comment>
  This text is completely ignored.
  <macro name="fruit">This macro is completely ignored, too.</macro>
</comment>
  

comment and if/else tags can be freely nested within each other to any depth.

Input files nesting: include tag

As might be expected, the include tag literally includes the file specified by its file attribute during processing.
<include file="common subroutines.w"/>
  

Miscellaneous: nyweb tag

The nyweb tag is used for various ad-hoc functions, determined by the attribute.

Unused macros list

Macros that have not been used yet can be listed using the nyweb tag with the list="unused macros" attribute. It is wise to place this tag at the end of the source .w file:
<nyweb list="unused macros"/>
  

List of (header) tags

All header tags (i.e. those enclosed within <hN> and </hN>) in the source .w file can be listed in a parseable form, together with the file name and line number, into an output file, using the nyweb tag with the list="headers" attribute. This allows the file to be fed into some editors/IDEs as "error" output, to facilitate jumping to a given position in the .w sources. The location of this tag in the source .w file is arbitrary.
<nyweb list="headers" [file="filename"] />
  

Timing

The duration of processing of a part of the source file can be printed using a pair of nyweb tags with time="begin" and time="end" attributes. In the latter tag, a file attribute determines the name of the output file to which the timing is written. Two times are written, one for each processing pass. It is wise to place this pair of tags at the beginning and end of the source .w file:
<nyweb time="begin"/>
<nyweb time="end" file="timing.txt"/>
  
Note that even in this case the sum of the two times will be slightly less than the total duration of the nyweb run: the time needed to check the temporary output file(s) against their older versions (see emit tag) is not included.

Usage tips

nyweb and C

The necessity to use escape sequences for & and < soon becomes annoying when writing in C. These two characters are used quite often in operators: < in less-than, less-or-equal and left-shift; & in bitwise AND, logical AND and the address-of operator. As an additional annoyance, if the escaped versions are used in the .w source, the code cannot simply be copied and pasted into a .c source, say in a different project.

While there is no good solution to this problem (until a full-featured nyweb-friendly editor becomes available, which is nowhere in sight for now), it can be partially side-stepped by replacing the operators in question with predefined macros. These macros can then be stored in a "compatibility" header, which can easily be made available in its plain C version too. Such a header may then contain macro definitions such as (in .w notation):

    #define ANL &amp;&amp;
    #define AND &amp;
    #define OR  |
    #define ORL ||
    #define SHL &lt;&lt;
    #define SHR >>
    #define XOR ^

    #define LT  &lt;
    #define GT  >
    #define LE  &lt;=
    #define GE  >=
    #define EQ  ==
    #define NE  !=

    #define PTR &amp;


  
(While definitions for OR, SHR etc. are not necessary, they may be added and used for "symmetry" reasons.)

Planned features

Changelog


- v 1.01 - added a diagnostic dump of current escaped character upon errors in parsing escaped character

- v 1.02 - added input file "caching"

- v 1.03 - added <nyweb list="headers"> tag

- v 1.04 - slightly improved error checking for macro and use tags and more informative output printouts for the most common types of error
         - bug fixed (r1.w) - undefined tag in emit crashed nyweb


- v 1.05 - memory "leak" fixed: input file "cache" destroyed at end of program

- v 1.05 - params behaviour fixed, now param expansion (use) inside param is possible

- v 1.06 - added alternative tags:
         - "use macro" as an alias for "use name"
         - empty "use param" as an alias for empty "param" (i.e. param "invocation")

- v 1.07 (20101230)
         - added define_after_used check and corresponding warning; removed warnings for defines/if-s inside macros

- v 1.08 (20110331)
         - added htmlize attribute to use tag
         - added dependencies attribute to emit tag

- v 2.00 (20120510)
         - implemented "table" and related - see chapter "Tables and their use"
         - fixed line number printed with warnings/errors (was one off)