get_html_rinfo

#!/usr/bin/perl
#
# Program to extract the information from the HTML version of the IDL
# manuals (v5.6 and on) and IDL itself, to support IDLWAVE.
#
# This version supports IDL v6.1
#
# (c) 1999, 2000 Carsten Dominik <dominik@astro.uva.nl>
# (c) 2001-2004 J.D. Smith <jdsmith@as.arizona.edu>
#
# Requires the HTML documentation files distributed with IDL 6.0 or
# later, decompiled from HTMLHelp idl.chm with Microsoft's HTML Help
# Workshop, and massaged with fix_wp_anchors.pl
#
# Talks to the local version of IDL in order to get additional information.
#
# Call this program from the command line like this:
#
#      get_html_rinfo -path path/to/htmlfiles/ -idl /path/to/idl/executable
#
# This will scan the HTML, extract routine information, talk to IDL
# and write the following file, needed by IDLWAVE:
#
# idlw-rinfo.el:       Routine information for completion etc.

# The full manpage of this program is available with "perldoc get_html_rinfo".

# Commentary:
# ===========
#
# IDL currently contains more than 1400 functions, procedures and
# object methods with more than 6500 keywords.  In order to support
# writing IDL programs, the IDLWAVE mode uses a list of routines and
# keywords.  RSI does not provide such a list in machine-readable
# form.  Therefore this program embarks on the task of extracting the
# necessary information from available sources - the IDL manuals which
# are supplied by RSI in HTMLHelp (and thence HTML) format.
# get_html_rinfo works by looking for the "Syntax" sections and
# extracting information from there.
#
# The program extracts information from manuals written by humans (the
# documentation department at RSI).  Naturally, such documents are not
# always complete, consistent or free of typos.  As a result of this,
# the list of routines and keywords extracted from the manuals will
# not be perfect.  In order to account for incorrect or inconsistent
# "Syntax" entries in the manual, get_html_rinfo contains a number of
# special matchers which detect specific entries and correct them
# manually.  See the definition of %specials in the BEGIN block.  When
# a new version of IDL is released, the actions of these special
# matchers needs to be checked, because the involved syntax entries
# may have changed.
#
# Please contact the maintainer if you find any inconsistencies between
# the routine information supplied by IDLWAVE, and the IDL documentation.
#
# Full source-guided online help, via the decompiled HTML help files,
# is available for use in many different browsers and help
# systems. See the IDLWAVE documentation for details.
#
# Acknowledgement:
# ================
#
# Without Perl, the task of reverse-engineering thousands of pages of
# documentation would have been impossible.  With Perl, it only takes
# a small (ok, medium-sized, bordering on large) program like this.
# Thanks to Larry Wall and the Perl community.
#
# Thanks to Mark Goosman & Stephanie Staley from RSI for granting me
# permission to extract and distribute routine information from the
# IDL manual, and to Doug Dirks for suggesting the use of decompiled
# HTML help instead of the (infinitely more complicated) PDF help used
# previously, and for patiently fixing the documentation typos this
# routine finds.
#
# Maintainer Information:
# =======================
# When a new version of IDL is published, the following things might need
# work in order to make the result of this program as good as possible.
# To find the corresponding places in the file, search for "UPDATE".
#
# 1. UPDATE: Manual sections
#    -----------------------
#    Each Routine description has many sections, only a few of which
#    are of interest for routine scanning.  The list can be corrected
#    when necessary.  Right now, the Syntax and Keywords sections are
#    being parsed.  This relies explicitly on the HTML "heading"
#    syntax upon which the HTML files are split into sections.  This
#    syntax should be verified to continue to be operational.
#
# 2. UPDATE: Special matchers
#    ------------------------
#    The %specials hash contains all the special matchers which fix
#    unusual or incorrect routine descriptions.  They change the entry
#    so that the parser does the right thing, or they add entries to
#    special arrays.  The first thing to do with a new version is to
#    run get_html_rinfo with the -debug flag.  Part of the output will
#    be a list of all special matchers, with additional info how often
#    the matcher matches, and if it did its actions correctly.
#    Matchers which no longer match should be checked.  Matching, but
#    not acting may be due to the fact that RSI fixed that particular
#    entry.  It is also possible that new special matchers have to be
#    written for new entries - you need to check what's new with the
#    IDL version and if it is processed correctly.  Read the
#    documentation just before the definition of the %specials hash to
#    find out how such a matcher must behave.
#
# 3. UPDATE: Special sections matchers & parsers
#    -------------------------------------------
#    Certain keywords are linked to a separate, special section, apart
#    from the routines which use those keywords.  Examples include
#    Graphics Keywords, Multi-Threading Keywords, Device Keywords, and
#    system variables.  The %special_sections hash names these
#    sections as keys, and two routines as values: one to detect
#    whether a given html file is this section (matcher), and the
#    other parse its keywords (parser).  These special sections are
#    scanned for keywords, and linked to from other routines for the
#    with the relevant keywords.
#
# 4. UPDATE: Statement regexp
#    ------------------------
#    This is also a special matcher which makes sure that no routine
#    info is produced by the syntax entries for IDL language
#    constructs.  As RSI adds new statements (as COMPILE_OPT in
#    version 5.3 or SWITCH/BREAK/CONTINUE in version 5.4), this regexp
#    must be extended to match the new statements as well.
#
# 5. UPDATE: Special help topics
#    ---------------------------
#    Some words which can show up in IDL source code have special help
#    topics (not routine names) associated with them.  The
#    %special_topics hash links downcase versions of these words to
#    the appropriate topics.  Even if word and topic are the same, it
#    must be mentioned here in order to trigger help on this word.
#
# 6. IMPONDERABLES: Who knows?
#    -------------------------
#    RSI has shown a propensity for making large changes to the
#    overall documentation format which will impact how this program
#    runs.  Sadly, this may require you to understand more of what it
#    does than you'd otherwise be willing to.  An example: with
#    IDLv6.0, all object method GetProperty, SetProperty, and Init
#    keywords were consolidated into a "Class Properties" page.
#
# For more detailed info, run `get_html_rinfo' with the `-debug' flag
# and check the resulting files get_html_rinfo.cpl,
# get_html_rinfo.rej, and get_html_rinfo.log.  The .cpl "complaint"
# file can be sent to the RSI documentation group for fixing any
# problems detected.
#
# The most important data structure is the hash %e, which collects all
# of the various routine data and cross-linking information.  It's
# structure is.
#
# $e{class}{type}{routine}  # The class, type (fun | pro) and routine name
#                  {"file"} # The file where this routine's syntax is found
#                  {"kwds"} # (Hash) Keyword Name => Link
#                  {"Get"}  # (Array) List of "get" keywords
#                  {"Set"}  # (Array) List of "set" keywords
#                  {"call"} # The calling sequence
#                  {"add_kwds"} # (Array) of refs to:
#                             \-> [needs, # needs or has keywords
#                                  name,  # routine to link to
#                                  type,  # type of routine to link to
#                                  class, # class of routine to link to
#                                  \@nokeys,# ref to key list to exclude
#                                  special, # Use routine's special kwds
#                                  get_set] # Use routine's get or set kwds
#                  {"has_special_section_kwds"}   # Has special kwds to link
#                  {"needs_special_section_kwds"} # Needs special kwds linked
#                  {"extra"} # Hash of extra routines with kwds to link to
#                          \-> {link_file}
#                                {"kwds"} # (Hash) Kwd => Link
#                                {"special"} # Special sec. linked to (if any)
#
# The data which actually make it into the rinfo file include class,
# type, routine,"kwds" with links, and the "extra" file and their kwds
# with links.  Everything else is for internal linking.
#
#============================================================================
# This file is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This file is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this file; see the file COPYING.  If not, write to the
# Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
# Boston, MA 02110-1301, USA.
#============================================================================

require 5.004;
use Data::Dumper;
# Parse command line options and make file names
use Getopt::Long;
GetOptions("-debug"     => \$debug,
	   "-path=s"    => \$path,
	   "-idl=s"     => \$idl,
	   "-xname=s"   => \$ignore_name_re,
	   "-xclass=s"  => \$ignore_class_re)
  or usage();

$emacs=1;

my (%special_topics,%e,%special_sections,%sysvars,%txt_kwds,%properties,
    %executive_commands,@syntax_kwds,@rejects,@complaints,@add_keywords,
    @enter);

if (@ARGV) {
  # Something on the command line
  print STDERR "Unrecognized command line args: @ARGV\n";
  exit(1);
}

$idl = $idl || "idl";

# Establish default output file names
$rinfofile  = "idlw-rinfo.el";
$topicsfile = "idlw-help-topics.el";

# Check the path
$path ||= ".";
die "Invalid path $path\n" unless !$path || -d $path;

opendir(DIR,$path) or die "can't opendir $path: $!";
@files=grep {-r "$path/$_" && /\.html$/} readdir(DIR);
closedir(DIR);

# Open the REJECT and LOG files file for debugging information
if ($debug) {
  open REJECT,  ">get_html_rinfo.rej" or
    die "Cannot write to get_html_rinfo.rej\n";
  open COMPLAIN,">get_html_rinfo.cpl" or
    die "Cannot write to get_html_rinfo.cpl\n";
  open LOG,     ">get_html_rinfo.log" or
    die "Cannot write to get_html_rinfo.log\n";
}

# Open the lisp file for output
open RINFO,">$rinfofile" or die "Cannot open $rinfofile for writing: $!";

# Scan all of the files
FILE:
foreach $file (@files) {
#  next unless $file=~/^CDF.*\.html/;
  open(FILE,"$path/$file");
  local $/=undef;			#Slurp mode
  $file_contents=<FILE>;

  # Title
  if ($file_contents=~m|<title>\s*([^<]+?)\s*</title>|) {
    $title=$1;
   } else {
    $title="";
  }

  # Check for special topic sections, which have no keywords (like
  # if...then...else)
  foreach (keys %special_topics) {
    if (!defined $special_topics{$_}{file} &&
	$title=~m|^\Q$_\E$|i) {
      $special_topics{$_}{file}=$file;
      next FILE;
    }
  }

  # Save files with upcase titles, for tracking classes
  $files{uc $title}=$file;

  # Check if file contains one of the various special keyword
  # sections (Graphics, etc.), and parse those keywords.
  foreach (keys %special_sections ) {
    if (&{$special_sections{$_}{matcher}}) { #Is it a special section?
      #Parse the section for keywords
      %{$special_sections{$_}{kwds}}=&{$special_sections{$_}{parser}};
      $special_sections{$_}{file}=$file;
      next FILE;
    }
  }

  # Look for system variable definitions
  if ($title=~/System Variables$/) {
    parse_heading($file_contents,2,sub{
		    my ($link,$sysvar,$txt)=@_;
		    $sysvar=~s/<.*?>//g;
		    $sysvar=~s/\s*System Variables?$//;
		    $sysvar=~tr/!//d;
		    $sysvar=uc $sysvar;
		    foreach $sysvar (split(/,\s*/,$sysvar)) {
		      %{$sysvars{$sysvar}{tags}}=
			parse_heading($txt,3,sub{split(/,\s*/,$_[1]);})
			  if $txt=~/structure/i;
		      $sysvars{$sysvar}{main}=$link?"$file#wp$link":$file;
		    }
		  });
    next FILE;
  }

  # Look for "Properties" pages for extra class props (new as of IDL v6.0)
  if ($title=~/^\s*([A-Za-z_0-9]+)\s+Properties\s*$/) {
    my $prop_class=$1;
    parse_heading($file_contents,3,sub{
		    my ($link,$prop,$txt)=@_;
		    return if $prop=~/^\s*$/;
		    foreach $prop (split(/,\s*/,$prop)) {
		      $prop=~s/ *$//;
		      $properties{$prop_class}{kwds}{$prop}{link}=$link;
		      foreach (qw(Get Set Init)) {
			if ($txt=~ m{$_:(?:<.*?>\s*)*\s*Yes}i) {
			  $properties{$prop_class}{kwds}{$prop}{$_}=1;
			} else {$properties{$prop_class}{kwds}{$prop}{$_}=0};
		      }
		    }
		  });

    # Maybe no properties were found: we must still delete PROPERTY kwds
    $properties{$prop_class}{stub}++ unless defined $properties{$prop_class};
    $properties{$prop_class}{file}=$file;
    next FILE;
  }

  # Look for a version string
  if(!$idlversion &&
     $file_contents=~m|<a[^>]*>[^<]*?New Features in IDL ([0-9.]+)\s*</a>|) {
    $idlversion=$1;
    next FILE;
  }

  # Normal entries: Split into the component parts
  @parts=split(m|^\s*<a\s+name="wp[0-9]+">\s+</a>\s*<h3\s+class="p?Heading2">\s+([^\n\r]+)\s+|mi,$file_contents);
  shift @parts;
  %parts=@parts;
  
  # Extract the Syntax and Keyword parts
  $syntax=$kwds="";
  foreach $part (keys %parts) {
    # Require a proper heading
    if (!$syntax && $part=~/^Syntax/) {
      $syntax=$parts{$part};
    } elsif ($part=~/^((?:[A-Z][A-Za-z0-9_]+\s+){0,2}Keywords|
                       Keywords:\s*(?:[A-Z][a-z0-9_]+\s*){1,2})\s*$/x) {
      $kwds.=$parts{$part}
	unless $1=~/Thread Pool Keywords/; # Nothing useful in TPool
    }
  }
  next FILE unless $syntax; # Normal entries require a syntax section

  diag("$file($title):>>>>>>>>>>>>>>>>>>\n");

  # Clear a few variables which are used by deeper routines to return stuff
  @rejects = @complaints = @add_keywords  = @enter = ();
  clean_up_syntax();

  # Apply the special syntax matchers
  $old_syntax=$syntax;
  &try_specials();

  # See if there are reasons to reject or complain about this entry
  if (@rejects) {
    # Reject
    $n_rejections += scalar(@rejects);
    reject($syntax,$file,@rejects);
    next FILE;
  }

  # Try to parse out the name, class, calling sequence, and syntax keywords
  unless (parse_syntax($syntax)) {
    # Make a note that this section could not be parsed and move on
    # to the next file.
    reject($syntax,$file,("Could not be parsed"));
    next FILE;
  }

  # Parse the text keywords, and compare to syntax keywords.
  @syntax_kwds=make_unique(@syntax_kwds);
  %txt_kwds=parse_keywords($kwds);
  #diag("GOT KWDS SECTION:\n  >>>$kwds\n<<<\n\n");

  # Complain strenuously
  if (@complaints) {
    # Keep, but complain
    $n_complaints += scalar(@complaints);
    complain($old_syntax,$file,@complaints);
  }

  diag("\n$file($title):<<<<<<<<<<<<<<<<<<\n\n\n");

  # Normalize the class/routine cases
  if ($class) {
    $class = case_name("class",$class);
    $name = case_name("method",$name);
  } else {
    $name  = case_name("routine",$name);
  }

  my $fname=make_full_name($class,$name);

  # If the special matchers have not set @enter, do it here
  $enter[0] = [$name,$type,$class,$call] unless @enter;

  # Write a message for debugging output
  diag(sprintf("%-20s %-15s %-20s with %3d keywords\n",
	       $type,$class,$name,scalar(keys %txt_kwds)));

  # Store the stuff.
  foreach $a (@enter) {
    my $name  = $$a[0];
    my $class = $$a[2];
    my $type  = $$a[1];
    if ($ignore_name_re && $name =~ /$ignore_name_re/o) {
      diag("Ignoring name $name because of -xname option\n");
      $ignore_name_cnt++;
      next;
    }
    if ($ignore_class_re && $class =~ /$ignore_class_re/o) {
      diag("Ignoring name $name because of -xclass option\n");
      $ignore_class_cnt++;
      next;
    }
    $e{$class}{$type}{$name}{file}=$file;
    %{$e{$class}{$type}{$name}{kwds}}=%txt_kwds if %txt_kwds;
    push @{$e{$class}{$type}{$name}{Get}},@getkwds if @getkwds;
    push @{$e{$class}{$type}{$name}{Set}},@setkwds if @setkwds;
    $e{$class}{$type}{$name}{call} = $$a[3];

    # Add keywords from other routines
    push @{$e{$class}{$type}{$name}{add_kwds}},@add_keywords if
      @add_keywords;

    # Special section links
    my @special_section_has=grep {$has_special_section_keywords{$_} eq "has"}
      (keys %has_special_section_keywords);
    push @{$e{$class}{$type}{$name}{has_special_secs_kwds}},
      @special_section_has if @special_section_has;

    my @special_section_needs=
      grep {$has_special_section_keywords{$_} eq "needs"}
	(keys %has_special_section_keywords);
    push @{$e{$class}{$type}{$name}{needs_special_secs_kwds}},
      @special_section_needs if @special_section_needs;

  }
  close(FILE);
}

# We have all the information now in one huge hash.  A few things
# still need to be fixed...

# 1. The IDL manual for object methods GetProperty, SetProperty & Init
#    include no keywords, which are documented on a special
#    "Properties" page.  These were collected during the scan - here
#    we put them into the right space, and remove the vestigial
#    "PROPERTY" keyword in the three methods
diag("****ALL PROPERTIES: \n".Dumper(%properties)."\n");

foreach $class (keys %properties) {
  my $case_class=case_name("class",$class);
  my $link_file=$properties{$class}{file};
  next unless defined $e{$case_class};
  foreach $type (qw(Get Set Init)) {
    my $t_name=$type eq "Init"?"fun":"pro";
    my $method_name=
      case_name("method",$type eq "Init"?$type:($type."Property"));
    next unless defined $e{$case_class}{$t_name}{$method_name};
    delete $e{$case_class}{$t_name}{$method_name}{kwds}{PROPERTY};
    diag("DELETING SPURIOUS PROPERTY KWD: $case_class,$t_name,$method_name\n");
    next if defined $properties{$class}{stub};
    $e{$class}{$typ}{$method_name}{extra}{$link_file}{special}=
      "$class Properties";
    foreach $kwd (keys %{$properties{$class}{kwds}}) {
      next unless $properties{$class}{kwds}{$kwd}{$type};
      $e{$case_class}{$t_name}{$method_name}{extra}{$link_file}{kwds}{$kwd}=
	$properties{$class}{kwds}{$kwd}{link};
    }
  }
}

foreach $class (keys %e) {
  next unless $class;
  $iname = case_name("method","Init");
  next unless defined $e{$class}{fun}{$iname};
  $ifile = $e{$class}{fun}{$iname}{file};
  foreach $getset ("Get","Set") {
    $pname = case_name("method",$getset."Property");
    if (defined $e{$class}{fun}{$iname} &&
	@{$e{$class}{fun}{$iname}{$getset}}) {
      foreach (@{$e{$class}{fun}{$iname}{$getset}}) {
	$e{$class}{pro}{$pname}{extra}{$ifile}{kwds}{$_}=
	  $e{$class}{fun}{$iname}{kwds}{$_};

      }
    }
  }
}

# 2. The special syntax matchers may have indicated that some
#    routine's keywords are linked to special sections (like Graphics
#    Keywords), or that keywords from special sections weren't
#    mentioned in the syntax, and need to be added outright (like
#    MultiThreading).  Do this now.
foreach $class (keys %e) {
  foreach $type (keys %{$e{$class}}) {
    foreach $name (keys %{$e{$class}{$type}}) {
      # Needs all the special section keywords added outright
      if (defined($e{$class}{$type}{$name}{needs_special_secs_kwds})) {
	foreach my $s (@{$e{$class}{$type}{$name}{needs_special_secs_kwds}}) {
	  unless (defined($special_sections{$s})) {
	    diag("No such special section: $s.");
	    next;
	  }
	  my $sfile=$special_sections{$s}{file};
	  %{$e{$class}{$type}{$name}{extra}{$sfile}{kwds}}=
	    %{$special_sections{$s}{kwds}};
	  $e{$class}{$type}{$name}{extra}{$sfile}{special}=$s;
	}
      }
      # Already mentions keywords, just needs links
      if (defined($e{$class}{$type}{$name}{has_special_secs_kwds})) {
	foreach my $s (@{$e{$class}{$type}{$name}{has_special_secs_kwds}}) {
	  unless (defined($special_sections{$s})) {
	    diag("No such special section: $sec.");
	    next;
	  }
	  my $sfile=$special_sections{$s}{file};
	  foreach (keys %{$special_sections{$s}{kwds}}) {
	    # Skip if keyword not defined in routine, or already linked
	    next if(!defined($e{$class}{$type}{$name}{kwds}{$_}) or
		    $e{$class}{$type}{$name}{kwds}{$_});
	    delete $e{$class}{$type}{$name}{kwds}{$_}; #Move to extra kwds
	    $e{$class}{$type}{$name}{extra}{$sfile}{kwds}{$_}=
	      $special_sections{$s}{kwds}{$_};
	    $e{$class}{$type}{$name}{extra}{$sfile}{special}=$s;
	  }
	}
      }
    }
  }
}

# 3. Some special matchers may have put in a request to add or link
#    keywords of one more routines (or maybe just special section
#    keywords from a routine) to another.  Since now we know all
#    keywords of all routines and their origins (special sections,
#    etc.), we can do this.

foreach $class (keys %e) {
  foreach $type ("fun","pro") {
    foreach $name (keys %{$e{$class}{$type}}) {
      $n_routines_total++;

      next unless defined($e{$class}{$type}{$name}{add_kwds});

      foreach my $add (@{$e{$class}{$type}{$name}{add_kwds}}) {
	my (%nokeys,%only_keys,$file_to_add,$keys_to_add,$special_to_add);
	my ($has,$aname,$atype,$aclass,$only_keys,$nokeys,
	    $special_sec,$getset)=@$add;
	# Does it already have the keywords, just requiring a link
	$has=$has eq "has";

	# Check if we didn't know the routine type at the time of
	# addition; default to "pro" if it exists.
	unless ($atype) {
	  if (defined($e{$aclass}{"pro"}{$aname})) { $atype="pro" }
	  elsif (defined($e{$aclass}{"fun"}{$aname})) { $atype="fun"};
	}

	# Is it a actual existing entry being asked for?
	next unless defined($e{$aclass}{$atype}{$aname});
	foreach (@$nokeys) {$nokeys{$_}++;}
	foreach (@$only_keys) {$only_keys{$_}++;}
	
	my $afile=$e{$aclass}{$atype}{$aname}{file};

	# Look for "2 degree of separation" special section keywords
	# only (e.g. "graphics keywords accepted by PLOT")
	if($special_sec and defined($e{$aclass}{$atype}{$aname}{extra})) {
	  foreach $file (keys %{$e{$aclass}{$atype}{$aname}{extra}}) {
	    if($e{$aclass}{$atype}{$aname}{extra}{$file}{special} =~
	       /$special_sec/) {
	      $keys_to_add=$e{$aclass}{$atype}{$aname}{extra}{$file}{kwds};
	      $file_to_add=$file;
	      $special_to_add=
		$e{$aclass}{$atype}{$aname}{extra}{$file}{special};
	      last;
	    }
	  }
	} else {
	  # Otherwise, add from the set of regular keywords
	  if ($getset) {
	    my @getsetkeys;
	    push @getsetkeys, @{$e{$aclass}{$atype}{$aname}{Get}}
	      if ($getset=~/get/i);
	    push @getsetkeys, @{$e{$aclass}{$atype}{$aname}{Set}}
	      if ($getset=~/set/i);
	    $keys_to_add={map {$_ => $e{$aclass}{$atype}{$aname}{kwds}{$_}}
			  @getsetkeys};
	  } else {
	    $keys_to_add=$e{$aclass}{$atype}{$aname}{kwds};
	  }
	}
	$file_to_add=$file_to_add || $afile;

	# Actually add the keys (moving unlinked ones to extra if necessary)
	foreach (keys %$keys_to_add) {
	  next if (@$only_keys && !$only_keys{$_}) || $nokeys{$_};
	  if($has) {
	    #demand that it exists in kwds already
	    # if so, move and link it in the extra kwds section
	    next if !defined($e{$class}{$type}{$name}{kwds}{$_}) or
	      $e{$class}{$type}{$name}{kwds}{$_}; # Leave an existing link
	    delete $e{$class}{$type}{$name}{kwds}{$_}; # Move to extra keywords
	  }
	  $e{$class}{$type}{$name}{extra}{$file_to_add}{kwds}{$_}=
	    $$keys_to_add{$_};
	  $e{$class}{$type}{$name}{extra}{$file_to_add}{special}=
	    $special_to_add if $special_to_add;
	}
      }
    } continue {
      $n_keywords_total += scalar(keys %{$e{$class}{$type}{$name}{kwds}})
	if defined $e{$class}{$type}{$name}{kwds};
      if(defined($e{$class}{$type}{$name}{extra})) {
	foreach (keys %{$e{$class}{$type}{$name}{extra}}) {
	  $n_keywords_total +=
	    scalar(keys %{$e{$class}{$type}{$name}{extra}{$_}{kwds}});
	}
      }
    }
  }
}

#print Dumper(%e),"\n";
#print Dumper(%special_sections),"\n";
diag("SYSVARS:\n".Dumper(%sysvars)."\n");

# Print debug information about how often each special matcher matched.
# Will only be visible in debugging mode.
diag("SPECIAL MATCHERS-------------------------------matched success failed\n");
foreach $key (sort ignoring_case keys %specials) {
  diag(sprintf("%-50s %3d    %3d    %3d\n",
	       $key,$special_matcnt{$key},$special_actcnt{$key},
	       $special_matcnt{$key}-$special_actcnt{$key}));
}
diag(sprintf("\nProblematic entries: %d rejected, %d complains.\n",
	     $n_rejections,$n_complaints));


# Write the lisp file
# Also write some statistics to STDERR
write_rinfo_header();
print RINFO "(defconst idlwave-system-routines\n";
print RINFO "  '(\n";
printf STDERR "\n Nr Class                            Npro  Nfun Ntot  Nkwd\n";
printf STDERR   "----------------------------------------------------------\n";
$classcnt = -1;
foreach $class (sort ignoring_case keys %e) {
  $npro = scalar(keys %{$e{$class}{"pro"}});
  $nfun = scalar(keys %{$e{$class}{"fun"}});
  $nkwd = 0;
  foreach $type ("pro","fun") {
    foreach $name (keys %{$e{$class}{$type}}) {
      $nkwd += scalar keys %{$e{$class}{$type}{$name}{kwds}}
	if defined($e{$class}{$type}{$name}{kwds});
      if(defined($e{$class}{$type}{$name}{extra})) {
	foreach $file (keys %{$e{$class}{$type}{$name}{extra}}) {
	  $nkwd+= scalar keys %{$e{$class}{$type}{$name}{extra}{$file}{kwds}};
	}
      }
    }
  }
  $nprotot += $npro;
  $nfuntot += $nfun;
  $nclass++;
  printf STDERR "%3d %-32s %4d %4d %5d %5d\n",
    ++$classcnt,$class,$npro,$nfun,$npro+$nfun,$nkwd;
  foreach $type ("pro","fun") {
    foreach $name (sort ignoring_case keys %{$e{$class}{$type}}) {
      print RINFO "    " . make_lisp_reader_string($class,$type,$name) . "\n";
    }
  }
}

print STDERR "-" x 58,"\n";
printf STDERR "Total                                %4d %4d %5d %5d\n",
  $nprotot,$nfuntot,$nprotot+$nfuntot,$n_keywords_total;
printf STDERR "Routines ignored due to -xname:  %4d\n",$ignore_name_cnt
  if $ignore_name_re;
printf STDERR "Routines ignored due to -xclass: %4d\n",$ignore_class_cnt
  if $ignore_class_re;
print RINFO <<EOF;
   )
  "$n_routines_total builtin routines with $n_keywords_total keywords for IDL version $idlversion.")
EOF

# Gather sysvar and class info, and write the info
talk_to_idl();
write_sysvar_info();

# Attach classes to the files documenting them.
foreach $class (keys %e) {
  if (defined $files{uc $class} && !defined($classes{$class}{file})) {
    $classes{$class}{file}=$files{uc $class};
  }
}
write_classtag_info();


write_executive_commands();

print RINFO <<EOF;
;; Special words with associated help topic files
(defconst idlwave-help-special-topic-words
  '(
EOF

foreach (sort ignoring_case keys %special_topics) {
  foreach $word (sort ignoring_case @{$special_topics{$_}{words}}) {
    print RINFO "    (\"$word\" . \"$special_topics{$_}{file}\")\n";
  }
}
print RINFO <<EOF;
)
  "Association list of help files for special context words.")

EOF

write_rinfo_footer();

close RINFO;

printf STDERR "Wrote file $rinfofile  (%4d kBytes)\n", (-s $rinfofile)/1024.;


# ==========================================================================
# ==========================================================================
#      SUB-ROUTINES

###--- LOGS AND MESSAGES ---###

sub usage {
  # Print usage information
  print STDERR <<EOF;
usage: get_html_rinfo [-debug] [-path DIR] [-idl /path/to/idl]
                      [-xname REGEXP] [-xclass REGEXP]
EOF
  exit(1);
}

sub diag {
  # Write diagnosis to STDERR and to the LOG file if we are debugging.
  my $msg = @_[0];
  if ($debug) {
    #print STDERR $msg;
    print $msg;
    print LOG $msg;
  }
}

sub reject {
  # Write a message to the reject file.
  my($string,$file,@reasons) = @_;
  if ($debug) {
    print REJECT (("-" x 80) . "\n") x 2;
    print REJECT "File:    $file\n";
    print REJECT "Syntax:  $string\n";
    foreach $reason (@reasons) {
      print REJECT "Reason:  $reason\n";
    }
  }
}

sub complain {
  # Write a message to the reject file.
  my($string,$file,@reasons) = @_;
  if ($debug) {
    print COMPLAIN (("-" x 80) . "\n") x 2;
    print COMPLAIN "File:    $file\n";
    print COMPLAIN "Syntax:  $string\n";
    foreach $reason (@reasons) {
      print COMPLAIN "Reason:  $reason\n";
    }
  }
}

###--- SYNTAX ---###

sub clean_up_syntax {
  # Clean up Syntax
  $syntax=~s|<a\s+name="(?:wp)?[0-9]+">||; #Entro Syntax
  $syntax=~s|<p\s+class="p?Note">.*?^</p>\s*||msg; # Remove notes
  $syntax=~s|<h5\s+class="p?Heading4">.*?^</h5>\s*|or|msg; # And headings
  $syntax=~s/and then,.*//s;
  $syntax=~s/<br>/\n/g;
  $syntax=~s/<.*?>//gs;   # html tags
  $syntax=~s/&nbsp;/ /g; # special chars
  $syntax=~s/&#151;/ - /g;
  $syntax=~s/&#160;/ /g;
  $syntax=~s/&#39;/'/g;
  $syntax=~s/^\s+//mg;   # Blanks lines
  $syntax=~s/\s+$//mg;
  $syntax=~s/^\s*$//mg;
  $syntax=~s/{(?!(?:,|X\s*\|\s*Y))[^}]*}//g; # Internal notes in {}
  $syntax=~s/&([gl])t;/$1=="g"?">":"<"/eg; # > and <
  $syntax=~s|^\s*or\s*||si;
#  diag("Now using cleaned syntax:>>>>>>>>\n$syntax\n<<<<<<<<<<\n");
}

sub parse_syntax {
  my $txt = $_[0];

  # Initialize a few variables.
  # Note that these are global vars which are interpreted by the caller.
  @args = @syntax_kwds = @getkwds = @setkwds = ();
  $name = $type = $class   = $call = "";

  # and any initial explanation string, as in for OPEN
  $txt =~ s/^.*?\w:(?!:)\s*//;

  # Determine type (function or procedure)
  if ($txt =~ /^\s*[a-z][a-z0-9_]+\s*=/si) {
    $type = "fun";
  } else {
    $type = "pro";
  }

  # Determine Class (ignore the name - we grab it later)
  if ($txt =~ /->\s*\[([a-z][a-z0-9_]+)::?\]([a-z][a-z0-9_]+)/i) {
    $class = $1;
    diag("parse_syntax: Got CLASS: $class\n");
  } else {diag("parse_syntax: No CLASS found\n");}

  # Clean out the remarks about BLABLA keywords
  $txt =~ s/^.*?\bkeywords\b.*?:.*?this.*?accepts.*?keywords.*$//gmi;
  $txt =~ s/^.*?\bkeywords\b.*?://gmi; #Some have keywords after them.
  
  # Clean out the "only in..." stuff
  $txt =~ s/\(only\s*in[^\)]*\)//gi;

  # Remove various other bits of detritus.
  $txt =~ s/\bor [a-z0-9 ]+,/or/gi;
  $txt =~ s/\bFor [a-z0-9 ]+:/or/gi;

  # Collapse each chunk of whitespace into a single SPACE
  $txt =~ s/[ \t\n\r]+/ /gm;

  # Get all keywords.  Keywords are things with `/' before it or with `='
  # after it.
  diag("AFTER SYNTAX CLEANUP:\n\n$txt\n\n");
  while ($txt =~m!(\{[/XYZ |]*\}|/?\[XYZ?\])?  # Leading XYZ multiplexer
	          (\/)?               # Possibly a boolean
	          \b                 
                  (_?[A-Z][A-Z0-9_]*) # The actual keyword
	          (\ *\{\s*(Get|Set|Get\s*,\s*Set)\s*\})? # Method stuff
                  \s*
                  (=)?!gx
	) {
    ($xyz,$slash,$identifier,$getset,$equal) = ($1,$2,$3,$5,$6);

    next unless $slash || $equal || ($xyz && $xyz=~m|/|);
    # Everything before the first keyword is part of the calling sequence
    $call = $` unless $call;

    # Some keywords have a {X|Y|Z} in front which must be expanded
    if ($xyz) {
      $xyz=~tr|XYZ||cd;
      @ids=map {$_ . $identifier} split(/(?=[XYZ])/,$xyz);
    } else {
      @ids = ($identifier);
    }

    # Save the recognized keywords away
    push @syntax_kwds,@ids;

    # When the keyword has a Get or Set flag, add it to these lists,
    # so it can be appended to the Get|SetProperty methods' keyword
    # lists later on.
    push @getkwds,@ids if $getset =~ /get/i;
    push @setkwds,@ids if $getset =~ /set/i;
  }

  # If there was no keyword, the whole text must be calling sequence
  $call = ($call || $txt);

  # Final whitespace and parens are killed.
  $call =~ s/[ \n\t\[\]\{\}\|,]*$//; #]
  # We may have killed needed closing parenthesis - fix this.
  $call = close_open_parens($call);

  # Find the name of the routine in the calling sequence.
  if ($override_name) {
    $name=$override_name;
    $override_name="";
  } else {
    if ($class) {
      if ($call =~ /->\s*\[[^\]]+\]([a-z][a-z0-9$_]*)/i) {
	$name = $1;
      }
    } else {
      if ($type eq "fun") {
	if ($call =~ /=\s*([a-z][a-z0-9\$_]*)/i) {
	  $name = $1;
	}
      } elsif ($type eq "pro") {
	if ($call =~ /^\s*([a-z][a-z0-9\$_]*)/si) {
	  $name = $1;
	}
      }
    }
  }

  # Return the name to indicate success
  return $name;
}

# Parse text into "Heading" Sections, optionally doing something with
# the parsed text in a callback.  If the callback is passed, it's
# return value is tested, and, if a true list, that link is saved for
# each name on the list for return.  If the return value is the scalar
# -1, no more processing is done.
sub parse_heading {
  my ($txt,$heading,$callback)=@_;
  my %ret,@ret;
  $heading=3 unless $heading;
  my $hstr='<h'.($heading+1).'\s*class="p?Heading'.$heading.'">';

  my @sections=split(m{<a\s*name="wp([^"]+)">\s*</a>  # Embedded Link
		       $hstr\s*
                       (?:<NOBR>)?
                       \s*((?:(?:<NOBR>)?[^\n\r]+(?:</NOBR>)?)+)[\n\r<] 
                      }xs,$txt);
  shift @sections;
  diag("SECTIONS ($hstr): ".join("\n++++++++++\n",@sections)."\n")
    if ($file=~/sysvars/);
  while (my ($link,$name,$text)=splice(@sections,0,3)) {
    if($callback) {
      my @ret=&$callback($link,$name,$text);
      if(@ret) {
	last if $ret[0]==-1;
	next if $ret[0]==1;
	map {$ret{$_}=$link} @ret;
      }
    } else { $ret{$name}=$link; }
  }

  return %ret
}


# Parse the keywords section, gathering links, and comparing to the
# syntax-derived list @syntax_kwds
sub parse_keywords {
  my ($txt,$heading)=@_;
  my %kwds=
    parse_heading($txt,$heading,
		  sub {
		    my ($link,$kwd,$text) = @_;
		    my @ret;
		    if ($kwd=~/WIDGET_CONTROL\s+Keywords/i) {
		      push @complaints,
			"WIDGET_CONTROL keywords listed as regular keyword.";
		      return -1; # Abort further processing
		    }
		    if ($kwd=~/Keywords?/) {
		      # Some "extra" keywords are annoyingly listed in
		      # amidst the regular keywords
		      my @all=($text=~m{[iI]n\s+addition.*?the\s+
					((?:[A-Z0-9_]+,\s+)+)
                                       and\s+([A-Z0-9_]+)\s+keywords?\s+to\s+
                                       the\s+([:A-Z0-9_]+)\s+(pro|fun)}xs);
		      diag("EXTRA KEYWORDS, Got: \n$text\n\n");
		      return 1 unless @all;

		      my $type=pop @all;
		      my ($class,$routine);
		      $routine=pop @all;
		      unshift @all,split(/,\s+/,shift @all);
		      push @complaints,
			"Extra keywords to link listed as normal keyword: " .
			  "$routine: ". join(",",@all);
		      ($class,$routine)=($routine=~/(?:([a-z][a-z0-9_]+)::?)?
						    ([a-z][a-z0-9_]+)/ix);
		      push @add_keywords,["has",$routine,$type,$class];
		      return 1;
		    }

		    # Take care of space surrounding underscores here too
		    $kwd=~s/([A-Z])(?: +_|_ +)([A-Z])/$1_$2/g;
		    return 1 unless
		      $kwd =~
			m{( # The entire keyword text
			  (?:(\[XYZ?\](_?))?               # opt. [XYZ]
			   (\!?[A-Z][A-Z0-9-_]*)         # keyword
			   (?![a-z])                     # No lowercase lets
			   (?:\s*,\s*)?                  # optional comma
			  )+)                            # >=1 of the above
			  (?:\s*\((Get|Set|Get\s*,\s*Set)\)\s*)? # (Get,Set)?
                      }gxs;
		    my ($kwd,$xyz,$underscore,$key_base,$getset)=
		      ($1,$2,$3,$4,$5);
		    if ($xyz) {
		      $key_base="_".$key_base if $underscore;
		      $xyz=~tr/XYZ//cd;
		      @ret=map {$_ . $key_base} split(/(?=[XYZ])/,$xyz);
		    } else {
		      @ret=split(/,\s*/,$kwd);
		    }
		    push @getkwds,@ret if $getset =~ /get/i;
		    push @setkwds,@ret if $getset =~ /set/i;
		    return @ret;
		  });
#  print "GOT KWDS: ",Dumper(%kwds),"\n";
  my (@missing_from_syntax,@missing_from_text,%syntax_kwds);
  
  foreach $kwd (@syntax_kwds) {
    $syntax_kwds{$kwd}++;
    unless (defined $kwds{$kwd}) {
      push @missing_from_text,$kwd;
    }
  }

  push @complaints,"Syntax keywords missing from text: " .
    join(",",@missing_from_text) if @missing_from_text;
  
  foreach $kwd (keys %kwds) {
    push @missing_from_syntax,$kwd unless $syntax_kwds{$kwd};
  }
  push @complaints,"Text keywords missing from syntax: " .
    join(",",@missing_from_syntax) if @missing_from_syntax;

  diag("  SYNTAX KWDS FOUND: \n   ".join("\n   ",sort @syntax_kwds)."\n");
  diag("  TEXT KWDS FOUND: \n   ".  join("\n   ",sort keys %kwds)."\n");

  # No link found in text, but add the missing keywords from Syntax
  # anyway.
  map {$kwds{$_}=""} @missing_from_text;
  return %kwds;
}

sub fix_keywords {
  # Fix the case of keywords, sort and uniquify the list.
  my @a;
  @a = map { case_name("keyword",$_) } @_;
  @a = make_unique(@a);
  @a = sort ignoring_case @a;
  @a;
}

sub case_name {
  # Return $name in a unique case.
  # The first time this function gets called, the default casification
  # is determined and stored.  Later calls with same or different case only
  # return this initial value.
  my ($type,$name) = @_;
  my $lcname = lc($name);
  if ($debug) {
    $origcase{$type}{$lcname} = $name unless defined $origcase{$type}{$lcname};
    if (defined $origcase{$type}{$lcname} && 
	$name ne $origcase{$type}{$lcname}) {
      diag("CASECHANGE $type: old $origcase{$type}{$lcname}, new $name\n");
    }
  }

  if (defined $casedname{$type}{$lcname}) {
    return $casedname{$type}{$lcname};
  } else {
    my $casedname;
    if ($type eq "class" || $type eq "method") {
      $casedname = $name;
    } else {
      $casedname = uc($name)
    }
    return $casedname{$type}{$lcname} = $casedname;
  }
}

sub make_full_name {
  my ($class,$name) = @_;
  ($class ? "$class" . "::" : "") . $name
}

sub balanced_parens {
  # Check if all parenthesis are balanced
  my $string = $_[0];
  $string =~ tr/()[]{}//cd;
  1 while $string =~ s/\(\)|\[\]|\{\}//g;
  $string eq "";
}

sub close_open_parens {
  # Add all necessary closing parenthesis to the end if a string
  my $rtn = $_[0];
  my $string = $_[0];
  $string =~ tr/()[]{}//cd;
  1 while $string =~ s/\(\)|\[\]|\{\}//g;
  my $close = reverse($string);
  $close =~ tr/([{/)]}/;
  $rtn . $close;
}

sub make_lisp_reader_string {
  # Make a string which will be correctly interpreted by the Emacs Lisp reader
  # as a lisp list containing all needed information about a routine.
  my ($class,$type,$name) = @_;
  my $call=$e{$class}{$type}{$name}{call};
  my ($entry,$kwds);
  
#  diag("STARTING $name CALL WITH: $call\n");
  # In the calling sequence we want `%s' instead of name and class.
  # The calling sequence will later be used as format string to make
  # a calling sequence with the correct version of class and name.
  $call =~ s/^result/Result/;
  if ($class) {
    $call =~ s/(\[)($class)(:+)(\])($name)\b/$1%s$3$4%s/gi;
  } else {
    $call =~ s/\b$name\b/%s/gi;
  }

#  diag("TESTING CALL WITH: $call\n");
  # Now we make the string which can be parsed by the Lisp reader
  # It looks like this:
  # ("NAME" TYPE "CLASS" (system) "CALLING SEQUENCE"
  #  ("FILE1.html" "KWD1" LINK1 "KWD2" LINK2 ...))
  $entry = sprintf('(%s %3s %s', "\"$name\"", $type,
		   ($class eq "" ? "nil" : "\"$class\""));
  $entry .= " (system) \"$call\"";
  $kwds = " (\"$e{$class}{$type}{$name}{file}\"";
      
  if (defined $e{$class}{$type}{$name}{kwds}) {
    $kwds.=" ";
    $kwds.=join(" ",map {"(\"$_\"" .
			   ($e{$class}{$type}{$name}{kwds}{$_}?
			    " . $e{$class}{$type}{$name}{kwds}{$_}":"") . ")"}
		(sort ignoring_case keys %{$e{$class}{$type}{$name}{kwds}}));
  }
  $kwds.= ')';
  if (defined $e{$class}{$type}{$name}{extra}) {
    foreach $file (sort keys %{$e{$class}{$type}{$name}{extra}}) {
      $kwds .= " (\"$file\" " .
	join(" ",map {"(\"$_\"" .
			($e{$class}{$type}{$name}{extra}{$file}{kwds}{$_}?
			 " . $e{$class}{$type}{$name}{extra}{$file}{kwds}{$_}":
			"") .")"}
	     (sort ignoring_case
	      keys %{$e{$class}{$type}{$name}{extra}{$file}{kwds}})) . ')';
    }
  }
  $kwds=$kwds || " nil";

  $entry .= $kwds . ')';

  # Return the entry
  $entry;
}

###--- TALKING TO IDL & WRITING RINFO FILE ---###

sub write_rinfo_header {
  my ($part_of) = $emacs ?
    $lisp_part_of_gnu_emacs : $lisp_not_part_of_gnu_emacs;
  $date = localtime;
  chomp $date;
  ($progname = $0) =~ s|.*/||;
  chomp $progname;
  $n_files_scanned = @files;
  print RINFO <<EOF;
;;; idlw-rinfo.el --- Routine Information for IDLWAVE
;; Copyright (c) 1999 Carsten Dominik
;; Copyright (c) 1999,2000,2001,2002,2003,2004,2005 Free Software Foundation

;; Author: J.D. Smith <jdsmith\@as.arizona.edu>
;; Version: VERSIONTAG
;; Keywords: languages
$part_of
;;; Commentary:

;; The constants which contain information about IDLs builtin
;; functions, procedures, system variables and class tags.  This
;; information is extracted automatically from the IDL documentation
;; and by talking to IDL.
;;
;; Created by $progname on $date
;; IDL version: $idlversion
;; Number of files scanned:  $n_files_scanned
;; Number of routines found: $n_routines_total
;; Number of keywords found: $n_keywords_total
;;
;; New versions of IDLWAVE, documentation, and more information
;; available from:
;;                 http://idlwave.org
;; 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;; Code:
EOF
}

sub write_rinfo_footer {
  print RINFO <<EOF;
(provide 'idlw-rinfo)
(provide 'idlwave-rinfo)

;;; idlw-rinfo.el ends here
EOF
}				#'

sub talk_to_idl {
  # Run IDL and get some important information
  my ($code,$class);
  open IDLWAVE_EL,"idlwave.el" or die "Can't open idlwave.el";
  while ($_=<IDLWAVE_EL>) {
    $code .= $_ if /^;; START OF IDLWAVE SUPPORT ROUTINES/..
      /^;; END OF IDLWAVE SUPPORT ROUTINES/;
  }
  close IDLWAVE_EL;
  $code =~ s/([^\\])\\\"/$1\"/g;
  open TMP, ">tmp.pro" or die "Can't write to tmp.pro\n";
  print TMP $code;
  close TMP;

  use IPC::Open2;
  # Run IDL in order to get the names of system variables
  print STDERR "\nTalking to $idl...\n";
  $pid = open2(\*RDR, \*WTR, $idl . ' 2>/dev/null');
  print WTR ".run tmp.pro\n";
  print WTR "idlwave_get_sysvars\n";
  my $fmt = "idlwave_get_class_tags, '%s'\n";
  foreach $class (sort ignoring_case keys %e) {
    next unless $class;
    printf WTR $fmt,$class;
  }
  print WTR "exit\n";

  # Collect the output
  while ($_=<RDR>) {
    diag("Read: $_\n");
    if (/^IDLWAVE-SYSVAR: (.*[A-Z0-9\$])/) {
      push @sv_entries,$1;
    } elsif (/^IDLWAVE-CLASS-TAGS: (.*[A-Z0-9\$])/) {
      @F=split " ",$1;
      $class = shift @F;
      $class =~ s/^!//;
      $nclass++;
      $ntags += scalar(@F);
      @{$classes{$class}{tags}} = @F;
    }
  }
  printf STDERR
    "%6d System Variables\n%6d Class Structures\n",
      scalar(@sv_entries),scalar(keys %classes);
}

sub write_sysvar_info {
  # Print the definition of the sysvar alist to the lisp file.
  my $main;
  print RINFO "\n(setq idlwave-system-variables-alist\n  '(";
  foreach (@sv_entries) {
    @F=split;
    $main = shift @F;
    $main =~ s/^!//;
    print RINFO "\n    (\"$main\"";
    $main=uc($main);
    if (@F) {
      print RINFO " (tags";
      foreach (@F) {
	if($sysvars{$main}{tags}{$_}) {
	  print RINFO " (\"$_\" . $sysvars{$main}{tags}{$_})";
	} else {
	  print RINFO " (\"$_\")";
	}
      }
      print RINFO ")";
    }

    my $link=$sysvars{$main}{main};
    if ($link) {
      print RINFO " (link \"$link\")";
    }

    print RINFO ")";
  }
  print RINFO "))\n\n";
}

sub write_executive_commands {
  print RINFO "\n(setq idlwave-executive-commands-alist '(\n";
  foreach (keys %executive_commands) {
    print RINFO "    (\"$_\" . \"$executive_commands{$_}\")\n";
  }
  print RINFO "))\n\n";
}

sub write_classtag_info {
  # print the class tags to the lisp file.
  my ($class,$class1,$nclass,$ntags,$ntags1);
  my @classes;
  my @F;
  print RINFO "\n(setq idlwave-system-class-info\n  '(";
    @classes = sort {$#{$classes{$b}{tags}} <=> $#{$classes{$a}{tags}}}
    keys %classes;
 CLASS:
  for ($i=0;$i<$#classes;$i++) {
    $c1 = $classes[$i];
    @c1tags = @{$classes{$c1}{tags}};
  SUPERCLASS_CANDIDATE:
    for ($j=$i+1;$j<=$#classes;$j++) {
      $c2 = $classes[$j];
      @c2tags = @{$classes{$c2}{tags}};
      next SUPERCLASS_CANDIDATE unless @c2tags;
      if (is_subset(\@c2tags,\@c1tags)) {
	@c1tags=@{$classes{$c1}{tags}}=
	  remove_tags($classes{$c1}{tags},@c2tags);
	push @{$inherits{$c1}}, $c2;
	$n_inherits++;
	next SUPERCLASS_CANDIDATE;
      } else {
	next SUPERCLASS_CANDIDATE;
      }
    }
  }

  foreach $class (@classes) {
    print RINFO "\n    (\"$class\"";
    if (@{$classes{$class}{tags}}) {
      print RINFO " (tags";
      foreach (@{$classes{$class}{tags}}) {
	print RINFO " \"$_\"";$ntags1++;
      }
      print RINFO ")";
    }
    if (@{$inherits{$class}}) {
      print RINFO " (inherits";
      foreach $class1 (@{$inherits{$class}}) {
	print RINFO " \"$class1\"";
      }
      print RINFO ")";
    }
    if ($classes{$class}{file}) {
      print RINFO " (link \"$classes{$class}{file}\")";
      print RINFO ")";
    }
  }
  print RINFO "))\n\n";
  print STDERR "$ntags tags, $nclass classes, $n_inherits inheritance cases deduced -> $ntags1 native tags\n" if $debug;
}

###--- UTILITY ---###

sub max {$_[0] > $_[1] ? $_[0] : $_[1]}

sub is_subset {
  # Accepts two array references as arguments.
  # Checks if the first array is a subset of the second.
  my ($ref2,$ref1) = @_;
  my @tags1 = @$ref1;
  my @tags2 = @$ref2;
  my %taghash;
  foreach (@tags1) {
    $taghash{$_}=1;
  }
  foreach (@tags2) {
    return 0  unless defined($taghash{$_});
  }
  return 1;
}

sub remove_tags {
  # Remove entries in the array in the first ref
  # the entries to remove are the remaining arguments.
  my $ref1 = shift @_;
  my @tags = @$ref1;
  my @rmtags = @_;
  my %taghash;
  my $cnt = 0;
  foreach (@tags) {
    $taghash{$_}=++$cnt;
  }
  foreach (@rmtags) {
    delete $taghash{$_};
  }
  sort {$taghash{$a} <=> $taghash{$b}} keys %taghash;
}

sub make_unique {
  # make an array unique.
  my @new;
  my %seen;
  foreach (@_) {
    push @new,$_ unless $seen{$_}++;
  }
  @new;
}

# for case-insensitive sorting..
sub ignoring_case {lc($a) cmp lc($b)}


###--- FINDING AND FIXING SPECIAL CASES OF THE SYNTAX ---###

sub try_specials {
  # Try if any of the special matchers in %specials matches the current
  # entry.
  my($key,$rtn);
  foreach (keys %special_sections) {
    $has_special_section_keywords{$_} = 0;
  }
 SPECIAL:
  foreach $key (sort keys %specials) {
    $act = 0;
    $rtn = &{$specials{$key}};
    if ($rtn) {
      $special_matcnt{$key}++;
      $special_actcnt{$key}++ if $act;
      diag("Special $key matched.  Count is $special_matcnt{$key}, $special_actcnt{$key}\n");
      next SPECIAL if $rtn == 1;
      return 1;
    }
  }
  return 0;
}

# UPDATE: Special matchers
# ------------------------
#
# The BEGIN block sets the array which contains special matchers.  The
# hash keys are just strings identifying the matcher.  The values are
# anonymous subroutines which work on the global variable $syntax
# which contains the full syntax entry of the file currently being
# scanned.  To fix a problem, the matcher may change the $syntax
# variable.  The return value of a matcher must be 0, 1, or 2; 0: The
# matcher did not match the entry.  1: The matcher matched, and
# further matchers should be tried.  2: The matcher did match and no
# further matchers should be tried.

# Matching should mean that the matcher actually found the entry it
# was supposed to find.  When the entry is successfully changed, the
# matcher should also set the variable $act.  As typos get fixed by
# RSI, some matchers will still match but not act - a sign that they
# can be removed from the %specials list.

# Matchers of buggy entries should also push onto the @complaints
# array an explanation why the matched entry is buggy.  The result can
# later be mailed to RSI to encourage them to fix the problem.

# Some entries describe actually several routines, for example
# PRINT/PRINTF.  The matcher must place entries directly into the
# @enter array for each routine.

# Finally, some routines do not list all the keywords, but just say
# "accepts all keywords accepted by PLOT" or so.  See the SURFACE
# matcher for an example how to tell the system to add the
# corresponding keywords to the routine description as well, using the
# @add_keywords array, with format:
#
#   [ has,name,type,class,only_keys,allbut_keys,specials_only,get_set ]
#
# has: The string "needs" or "has" depending on whether the routine
#    being scanned "needs" keywords added, or already "has" them, but
#    just requires linking to the proper section.
#
# name: The name of the add routine
#
# type: The type of the add routine
#
# class: The class (if any) of the add routine
#
# only_keys: A reference to a list of keywords to add
#
# allbut_keys: A reference to a list of keywords *not* to add.
#
# specials_only: A string mentioning one of the special keyword
#     section names (see %special_sections below), which will limit
#     added keywords to those with the named special section origin
#     (e.g. "Graphics"), which are actually separated by 2 degrees
#     from the original file (e.g. Surface->Plot->Graphics Keywords).
#
# get_set: Either omit, or use one of "get","set", or "get,set" to
#     specify we need to link that type of keyword from the named
#     routine.

# When this program is being run with the -d flag, it will list which
# matcher was used how often at the end.
BEGIN {

  # The following function prefixes the names of special matchers with
  # numbers, to make sure they are called in the same sequence as 
  # specified here.
  $cnt = 0;
  sub sname {sprintf("%03d %s",++$cnt,$_[0])}

  # The hash with the matchers
  %specials =
    (
     # ------------------------------------------------------------------
     # Some typos which occur globally in many entries ------------------
     # ------------------------------------------------------------------

     sname("UNBALANCED PARENTHESES") =>
     sub {
       if (!balanced_parens($syntax)) {
	 # A parenthesis is missing in the Syntax entry
	 push @complaints, "Unbalanced parentheses";
	 1;
       } else {
	 0;
       }
     },

     sname("CLUTTERED CLASSNAME") =>
     sub {
       if ($syntax=~/(\[[A-Z][A-Z0-9_]+)\$[^]]+(::\])/i) {
	 # Extra stuff in the class name
	 push @complaints, "Extra material in class name";
	 $syntax="$`$1$2$'";
	 $act=1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # The explanation of what the "Syntax" entry is must be rejected ---
     # ------------------------------------------------------------------

     sname("SYNTAX") =>
     sub {
       if ($syntax =~ /The\s*"?syntax"?\s*section\s*shows/si) {
	 # Ignore when Syntax is used to explain itself
	 push @rejects, "Rejecting SYNTAX section explanation.";
	 $syntax = "";
	 $act = 2;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # C Functions ------------------------------------------------------
     # ------------------------------------------------------------------

     sname("C FUNCTION") =>
     sub {
       if ($syntax =~ /\A\s*Syntax\s*
	                  (int         |
			   char        |
			   void        |
			   IDL_MEMINT  |
			   IDL_FUN_RET |
			   IDL_VPTR    |
			   Client
			  )\b
	   /xsi) {
	 push @rejects, "Seems to be a C routine";
	 $syntax = "";
	 $act = 2;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # IDL statements ---------------------------------------------------
     # ------------------------------------------------------------------

     # UPDATE: Statement regexp
     sname("IDL STATEMENT") =>
     sub {
       if ($syntax =~ /\A\s*
	              (BEGIN            \s+ statements     |
		       CASE             \s+ expression     |
		       SWITCH           \s+ expression     |
		       BREAK            \s+                |
		       CONTINUE         \s+                |
		       COMMON           \s+ Block_name     |
		       COMPILE_OPT      \s+                |
		       FORWARD_FUNCTION \s+                |
		       FOR              \s+ variable       |
		       FUNCTION         \s+ Function_Name  |
		       GOTO \s* ,       \s+ label          |
		       IF               \s+ expression     |
		       PRO              \s+ Procedure_Name |
		       REPEAT           \s+ statement      |
		       WHILE            \s+ expression)
	   /ix) {
	 # Ignore
	 $syntax = "";
	 push @rejects, "IDL control stucture.";
	 $act = 2;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Executive statements ---------------------------------------------
     # ------------------------------------------------------------------

     sname("IDL EXECUTIVE COMMAND") =>
     sub {
       if ($syntax =~ /\A\.([A-Z][A-Z_]+)/) {
	 # Command starts with a dot.  Save and ignore.
	 $syntax = "";
	 $executive_commands{$1}=$file;
	 push @rejects, "IDL executive command.";
	 $act = 2;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Entries which must produce several rinfo list entries ------------
     # ------------------------------------------------------------------

     sname("CALL_METHOD") =>
     sub {
       if ($syntax =~ /CALL_METHOD.*\bor\b.*CALL_METHOD/s) {
	 # Can be called as function or method - make 2 entries.
	 $enter[0] = ["CALL_METHOD","pro","",
		      "CALL_METHOD, Name, ObjRef, [, P1, ..., Pn]"];
	 $enter[1] = ["CALL_METHOD","fun","",
		      "Result = CALL_METHOD, Name, ObjRef, [, P1, ..., Pn]"];
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("OPEN") =>
     sub {
       if ($syntax =~ /OPENR.*OPENW.*OPENU/s) {
	 # Make this 3 separate entries
	 $enter[0] = ["OPENR","pro","","OPENR, Unit, File"];
	 $enter[1] = ["OPENW","pro","","OPENW, Unit, File"];
	 $enter[2] = ["OPENU","pro","","OPENU, Unit, File"];
	 $override_name="OPEN";
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("READ") =>
     sub {
       if ($syntax =~ /^\s*READ, \[Prompt/si) {
	 # Make this 2 separate entries
	 $syntax =~ s/^[ \t]*READ,.*?\n//m;
	 $syntax =~ s/^[ \t]*READF,.*?\n//m;
	 $override_name="READ";
	 $enter[0] = ["READ", "pro","","READ, [Prompt,] Var1, ..., Varn"];
	 $enter[1] = ["READF","pro","","READF, [Prompt,] Unit, Var1, ..., Varn"];
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("PRINT") =>
     sub {
       if ($syntax =~ /^\s*PRINT \[, Expr/si) {
	 # Make this two separate entries
	 $syntax =~ s/^[ \t]*PRINT .*?\n//m;
	 $syntax =~ s/^[ \t]*PRINTF .*?\n//m;
	 $override_name="PRINT";
	 $enter[0] = ["PRINT","pro","","PRINT [, Expr1, ..., Exprn]"];
	 $enter[1] = ["PRINTF","pro","","PRINTF [, Unit, Expr1, ..., Exprn]"];
	 $act = 1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Cleaning up some individual calling sequences --------------------
     # ------------------------------------------------------------------

     sname("ATAN") =>
     sub {
       if ($syntax =~ /ATAN.*ATAN/s) {
	 # Write a simpler calling sequence
	 $syntax = "Result = ATAN([Y,] X [, /PHASE] )";
	 $act = 1;
       } else {
	 0;
       }
     },
     sname("INTERPOL") =>
     sub {
       if ($syntax =~ /For regular grids.*INTERPOL/i) {
	 # Simplify calling sequence
	 $act = ($syntax =~ s/^[ \t]*for (ir)?regular grids:\s*//gmi);
	 1;
       } else {
	 0;
       }
     },
     sname("READ/WRITE_JPEG") =>
     sub {
       if ($syntax =~ /^\s*(READ|WRITE)_JPEG\b/si) {
	 # Remove the UNIT thing from the syntax
	 $act = ($syntax =~ s/\[,\s*Filename\s*\|\s*,\s*UNIT=lun\s*\](.*)/[, Filename]$1 [, UNIT=lun]/si);
	 $syntax =~ s/\{.*?\}//sg;
	 1;
       } else {
	 0;
       }
     },
     sname("IDLgrFont") =>
     sub {
       if ($syntax =~ /\s*IDLgrFont:[^:]\b/si) {
	 # Only one colon
	 $act = ($syntax =~ s/(?<=grFont):/::/si);
	 push @complaints, "Typo: Only one colon (IDLgrFont)";
	 1;
       } else {
	 0;
       }
     },

     sname("IDLcomIDispatch::Init") =>
     sub {
       if ($syntax =~ /obj_new\('IDLcomIDispatch[^\']*'\)/si) {
	 $enter[0] = ["Init","fun","IDLcomIDispatchCLSID","Result = Obj -> [IDLcomIDispatchCLSID::]Init()"];
	 $enter[1] = ["Init","fun","IDLcomIDispatchPROGID","Result = Obj -> [IDLcomIDispatchPROGID::]Init()"];
	 $act=1;
       } else {
	 0;
       }
     },

     sname("IDLjavaObject::Init") =>
     sub {
       if ($syntax =~ /obj_new\('IDLjavaObject/si) {
	 $syntax=~s/\$JAVACLASSNAME//;
	 $syntax=~s/^where.*?underscores.*?$//sm;
	 $act=1;
       } else {
	 0;
       }
     },
     
     sname("DEVICE GET_PAGE_SIZE") =>
     sub {
       if ($syntax =~ /^\s*DEVICE.*GET_PAGESIZE/si) {
	 $act = ($syntax=~s/GET_PAGESIZE/GET_PAGE_SIZE/);
	 push @complaints, "Keyword misspelled: GET_PAGESIZE";
	 1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # INIT and CLEANUP -------------------------------------------------
     # ------------------------------------------------------------------

     sname("INIT & OBJ_NEW") =>
     sub {
       if ($syntax =~ /.*Init.*\bor\b.*OBJ_NEW\b/s) {
	 # Mention only "Init", not OBJ_NEW in calling sequence
	 $act = ($syntax =~ s/\bor\b.*OBJ_NEW.*subclass.*?\)//s);
	 push @complaints,"Most syntax entries for INIT methods list OBJ_NEW before INIT";
	 1;
       } else {
	 0;
       }
     },

     sname("OBJ_NEW & INIT") =>
     sub {
       if ($syntax =~ /\s*OBJ_NEW\(\s*[\`\'][^\`\']+[\'\`]\s*(.*?)\s*\bor\s*(Result\b.*Init\()[^\)]*\)(.*)/si) {
	 # Mention only "Init", not OBJ_NEW in calling sequence
	 $syntax = "$2$1$3"; 
	 $syntax =~ s/\([^\)]*in\s+a\s+subclass[^\)]*Init[^\)]*\).*$//si;
	 $syntax =~ s/\n[ \t\n]*\n/\n/;
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("INIT MISSING") =>
     sub {
       if ($syntax =~ /OBJ_NEW\(\s*[\`\']([a-zA-Z0-9_]+)[\`\'](.*?)(\(Only\s+in\s+a\s+sub|\))/si) { #`)
	 my $class = $1;
	 my $args  = $2;
	 $args =~ s/^(\s*\[\s*),/$1/;
	 $syntax = "Result = Obj -> [${class}::]Init($args)";
	 push @complaints,"INIT calling sequence missing";
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("OBJ_DESTROY & CLEANUP") =>
     sub {
       if ($syntax =~ /^\s*OBJ_DESTROY,\s*Obj\s+or\s*(.*)/si) {
	 # Mention only "Cleanup" in calling sequence, not OBJ_DESTROY
	 $syntax = $1;
	 $syntax =~ s/\(.*?in\s+[^)]*subclass[^)]*(Cleanu|Init)[^)]*\)//i;
	 $act = 1;
       } else {
	 0;
       }
     },

     sname("CLEANUP & OBJ_DESTROY") =>
     sub {
       if ($syntax =~ /.*Cleanup.*\bor\b.*OBJ_DESTROY\b/si) {
	 # Mention only "Cleanup" in calling sequence, not OBJ_DESTROY
	 push @complaints,"Most syntax entries for CLEANUP methods list OBJ_DESTROY before CLEANUP";
	 $act = ($syntax =~ s/\bor\b.*OBJ_DESTROY.*subclass.*?\)//s);
	 1;
       } else {
	 0;
       }
     },

     
     # ------------------------------------------------------------------
     # Attach or link keywords from other routines ----------------------
     # ------------------------------------------------------------------
     sname("ACCEPTS ALL XXX keywords") =>
     sub {
       my ($routine,$class);
       while ($syntax=~/Accepts all ([A-Z_0-9:]+) keywords/gi) {
	 $act=1;
	 $routine=$1;
	 ($class,$routine)=
	   ($routine=~/(?:([a-z][a-z0-9_]+)::?)?([a-z][a-z0-9_]+)/i);
	 push @complaints,"Keyword list not complete ($2 keywords omitted).";
	 push @add_keywords,["needs",$routine,undef,$class];
       }
       return $act;
     },

     sname("SURFACE,CONTOUR,PLOT_3DBOX (ADD PLOT KEYWORDS)") =>
     sub {
       if ($syntax =~ /^\s*(SURFACE|CONTOUR|PLOT_3DBOX)/si) {
	 if ($syntax =~
	     /Graphics Keywords: Accepts.*PLOT.*except([^\.]*)/s) {
	   # Most PLOT keywords need to be added later.
	   $act = 1;
	   push @complaints, "Keyword list not complete (PLOT kwds omitted)";
	   my $nokeys = $1;
	   $nokeys =~ s/\{XYZ\}([A-Z0-9_]+)/X$1, Y$1, Z$1/g;
	   my @nokeys = (split(/[^A-Z0-9_]+/,$nokeys));
	   shift @nokeys unless $nokeys[0]; # Useless material
	   push @add_keywords,["needs","PLOT","pro","",\@nokeys,"Graphics"];
	 }
	 1;
       } else {
	 0;
       }
     },

     sname("POLAR_CONTOUR (LINK CONTOUR KEYWORDS)") =>
     sub {
       if ($syntax =~ /^\s*POLAR_CONTOUR/si) {
	 if ($kwds =~ /accepts all of the keywords.*CONTOUR routine except([^\.]*)\./s) {
	   # Most CONTOUR keywords need to be linked later.
	   $act = 1;
	   push @complaints, "Keyword list not complete (CONTOUR kwds omitted)";
	   my $nokeys = $1;
	   $nokeys =~ s/\{XYZ\}([A-Z0-9_]+)/X$1, Y$1, Z$1/g;
	   $nokeys =~ s/and//;
	   my @nokeys = (split(/[^A-Z0-9_]+/,$nokeys));
	   shift @nokeys unless $nokeys[0]; # Useless material
	   push @add_keywords,["has","CONTOUR","pro",undef,undef,\@nokeys];
	 }
	 1;
       } else {
	 0;
       }
     },

     sname("IDLitComponent::Get|SetPropertyAttribute") =>
     sub {
       if ($syntax =~ /\s*IDLitComponent::\]?(Get|Set)PropertyAttribute/si) {
	 # Must add all the get|set keywords from RegisterProperty
	 push @add_keywords,["needs","RegisterProperty","pro","IDLitComponent",
			     undef,undef,undef,$1];
	 $act=1;
       } else {
	 0;
       }
     },

			     sname("TVSCL (ADD TV)") =>
     sub {
       if ($syntax =~ /^\s*TVSCL/si) {
	 # Most TV keywords need to be added later.
	 $act = 1;
	 push @complaints, "Keyword list not complete (TV kwds omitted)";
	 push @add_keywords,["needs","TV","pro"];
	 1;
       } else {
	 0;
       }
     },

     sname("LINK SURFACE KEYWORDS") =>
     sub {
       if ($kwds =~ m|\s+</a>SURFACE Keywords|) {
	 # All SURFACE keywords listed in syntax need to be linked later.
	 $act = 1;
	 push @complaints,
	   "Keyword list not complete (SURFACE kwds omitted)";
	 push @add_keywords,["has","SURFACE","pro"];
	 1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Attach or link keywords from a special section -------------------
     # ------------------------------------------------------------------

     sname("GRAPHICS KEYWORDS") =>
     sub {
       if ($syntax =~ /Graphics +Keywords *:/i) {
	 $act = 1;
	 $has_special_section_keywords{Graphics} = "has";
	 1;
       } else {
	 0;
       }
     },

     sname("THREAD POOL KEYWORDS") =>
     sub {
       if ($kwds =~ m|>\s*Thread Pool Keywords\s*$|mi) {
	 $act = 1;
	 $has_special_section_keywords{MultiThreading} = "needs";
	 1;
       } else {
	 0;
       }
     },
     
     sname("DEVICE KEYWORDS") =>
     sub {
       if ($syntax =~ /^\s*DEVICE/mi) {
	 $act = 1;
	 $has_special_section_keywords{Device} = "has";
	 1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Typos ------------------------------------------------------------
     # ------------------------------------------------------------------

     sname("SLASH MISSING") =>
     sub {
       if ($syntax =~ /Keywords|\[ *, *(\/|[A-Z][A-Z0-9_]* *=)/g) {
	 my $left = $` . $&;
	 my $right = $';
	 my $n=0;
	 while ($right =~ s/((\[|\|)[ \n]*,[ \n]*)([A-Z][A-Z0-9_]*)( *(\||\]))/$1\/$3$4/s) {
	   $n++;
	   push @complaints, "Slash before keyword $3 missing";
	   print "Fixing keyword $n $3\n" if $debug;
	 }
	 if ($n) {
	   $syntax = $left . $right;
	   $act = 1;
	   1;
	 } else {
	   0;
	 }
       }
     },

     sname("SPACE(S) AROUND UNDERSCORE") =>
     sub {
       if ($syntax =~ /[A-Z]( +_|_ +)[A-Z]/) {
	 $act = ($syntax =~ s/([A-Z]) *_ *([A-Z])/$1_$2/g);
	 push @complaints,"Space surrounding underscore";
	 1;
       } else {
	 0;
       }
     },


     sname("LOWERCASE KEYWORD") =>
     sub {
       if ($syntax =~ /CDF_VARCREATE/) {
	 $act= ($syntax =~ s/VariableType/VARIABLETYPE/);
	 $kwds=~ s/VariableType/VARIABLETYPE/; # Ugghh
	 push @complaints,"Lowercase keyword found";
	 1;
       } else {
	 0;
       }
     },
     
     # ------------------------------------------------------------------
     # Keyword omissions ------------------------------------------------
     # ------------------------------------------------------------------

     sname("SLASH SEPARATED FROM KEYWORD") =>
     sub {
       if ($syntax =~ /\/\s+[A-Z][A-Z0-9_]+/) {
	 $act = ($syntax =~ s|/\s+([A-Z][A-Z0-9_]+)|/$1|);
	 push @complaints,"space between slash and KEYWORD $1";
	 1;
       } else {
	 0;
       }
     },

     # ------------------------------------------------------------------
     # Miscellaneous ----------------------------------------------------
     # ------------------------------------------------------------------

     sname("CLASS MISSING")=>
     sub {
       if ($syntax =~ /->\s*[a-zA-Z0-9_]+/) {
	 return 1 unless $title;
	 ($class)=split(/::/,$title);
	 $act = ($syntax =~ s/(->\s*)/$1\[${class}::\]/);
	 push @complaints,"Class missing from syntax entry";
	 1;
       } else {
	 0;
       }
     },

     sname("CLASS DELIMITER MISPLACED") =>
     sub {
       if ($syntax =~ /[a-z][a-z0-9_]+\]::/i) {
	 $act=($syntax=~s/\]::/::\]/);
	 push @complaints,"Class written as [Class]::Method";
	 1;
       } else {
	 0;
       }
     },

     sname("CLASS BRACKET NOT CLOSED") =>
     sub {
       if ($syntax =~ /\[[a-z][a-z0-9_]+::[a-z]/i) {
	 $act=($syntax=~s/::/::\]/);
	 push @complaints,"Class written as [Class::Method";
	 1;
       } else {
	 0;
       }
     },

     sname("EXTRA SPACE AFTER CLASS") =>
     sub {
       if ($syntax =~ s/(\[[a-z][a-z0-9_]+::\])\s+([a-z])/$1$2/i) {
	 $act=1;
	 push @complaints,"Class written as [Class::] Method (with space)";
	 1;
       } else {
	 0;
       }
     },

     sname("Obj-> & CLASS BRACKETS") =>
     sub {
       if ($syntax =~ /=\s*[a-zA-Z0-9_]+::Init/) {
	 $act =
	   ($syntax =~ s/(=\s*)([a-zA-Z0-9_]+::)Init/$1 Obj ->\[$2\]Init/);
	 push @complaints,"'Obj->' and class brackets missing";
	 1;
       } else {
	 0;
       }
     },


     sname("KILL DATAMINER INTRO") =>
     sub {
       if ($syntax =~ /section shows the proper syntax for calling the method/)
	 {
	   $act=1;
	   $syntax="";
	   1;
	 } else {
	   0;
	 }
     }
    );

  # UPDATE: Special keyword sections
  # --------------------------------
  # Some keywords are present only in special, separate sections,
  # either for brevity, or because many routine refer to them (e.g.,
  # Graphics Keywords).  This hash helps locate these sections, and
  # parse the keywords from it, so that routines relevant keywords can
  # be linked to it correctly.  The format is a hash with keywords
  # identifying the section (e.g. as used with
  # %has_special_section_keywords in the syntax special matchers, and
  # matching the file <title> of the given section), and the value is
  # a reference to a hash with two subroutines.  The first,"matcher",
  # identifies whether a given file corresponds to one of the special
  # sections (it can use $title or $file_contents to help it), and the
  # second, "parser", parses the keywords in the special section into
  # %txt_kwds.

  # The hash with the matchers
  %special_sections =
    (
     Graphics=>
     { matcher=> sub {
	 $title=~"Graphics Keywords";
       },
       parser=> sub {
	 parse_keywords($file_contents,2)
       }
     },
     MultiThreading=>
     { matcher=>sub {
	 $title=~"Thread Pool Keywords";
       },
       parser=>sub {
	 parse_keywords($file_contents);
       }
     },
     Device=>
     { matcher=>sub {
	 $title=~"Keywords Accepted by the IDL Devices";
       },
       parser=>sub {
	 parse_keywords($file_contents);
       }
     });


  # UPDATE: Special help topics
  # ---------------------------

  # Some words which can show up in IDL source code have special
  # help topics associated with them.  This hash links downcase
  # versions of these special sections (which must match the title
  # of the file to be searched), to the appropriate words.  Even if
  # word and topic are the same, it must be mentioned here in order
  # to trigger help on this word to visit that special section.
  %special_topics = 
    (
     ("begin...end"=>      {words=>["begin"]}),
     (case=>               {words=>["case","endcase","of"]}),
     (common=>             {words=>["common"]}),
     (compile_opt=>        {words=>["compile_opt"]}),
     (continue=>           {words=>["continue"]}),
     ("repeat...until"=>   {words=>["endrep","repeat","until"]}),
     ("while...do"=>       {words=>["while","do","endwhile"]}),
     ("if...then...else"=> {words=>["if","then","endif","else","endelse"]}),
     (pro=>                {words=>["pro","end"]}),
     (for=>                {words=>["for","endfor"]}),
     (switch=>             {words=>["switch","endswitch"]}),
     (return=>             {words=>["return"]})
    );

  # The standard Emacs file header notice.
  $part_of_gnu_emacs = <<EOF;

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to the
Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
EOF
  
  # Make the Lisp versions, with the comment chars.
  ($lisp_part_of_gnu_emacs = $part_of_gnu_emacs) =~ s/^.*?\S/;; $&/mg;
}


__END__

=head1 NAME

get_html_rinfo - Extract information from IDL HTML documentation for
use with IDLWAVE

=head1 SYNOPSIS

B<get_html_rinfo> [-path path/to/html/files]
                  [-debug]
                  [-idl path/to/idl]
                  [-xname regexp]
                  [-xclass regexp]

=head1 DESCRIPTION

I<get_html_rinfo> processes the HTML documentation provided with IDL
(the Interactive Data Language, a proprietary language owned by
Research Systems Inc.).  It also talks to the IDL program itself for
additional info.  It produces the files needed by the Emacs IDLWAVE
mode to support completion, short help (routine info display) and
extensive online HTML help.  It produces the file:

=over 4

=item F<idlw-rinfo.el>

Contains the routine information needed for completion and compact
routine info display, along with URL links into the HTML help system
for routines, keywords, system variables, and special statement.  This
file must be installed on the Emacs load path and is part of the
standard IDLWAVE distribution.

=back

=head1 OPTIONS

=over 5

=item B<-path> B<path/to/html/files>

Specify the path to the HTML documentation files, as converted from
IDL's HTMLHelp docs using MS HTMLHelp Workshop.  Default is the
current directory.

=item B<-idl> B</name/or/path/of/idl/executable>

Specify the name of the IDL executable to talk to for additional
information.  The default is F<idl>.

=item B<-debug>

Debugging.  Writes diagnostics to STDERR and creates these
files:

F<get_html_rinfo.log>: A copy of the messages printed to STDERR.

F<get_html_rinfo.rej>: All syntax entries which could not be parsed
correctly, together with an explanation.

F<get_html_rinfo.cpl>: Syntax entries which contain typos or other
inconsistencies.  I<get_html_rinfo> actually fixes most of these problems,
but you should feel free to complain about them to RSI.

=item B<-xname> B<regexp>

Exclude routine names matched by the regular expression REGEXP.
For example setting this to "^CDF_" will not store information
about routines starting with the "CDF_" prefix.

=item B<-xclass> B<regexp>

Exclude object classes matched by the regular expression REGEXP.

Options may be abbreviated to uniqueness
=head1 FILES

F<get_html_rinfo.log>, F<get_html_rinfo.cpl>, F<get_html_rinfo.rej>

=head1 AUTHOR

J.D. Smith <jdsmith@as.arizona.edu>

=cut