[futurebasic] Re: Parsing

From: Joe Kovac <locality@...>
Date: Sat, 29 Nov 1997 10:13:34 -0400
>I need my program to parse some fairly sizable text based files.  The file
>specification is quite loose, however, so I can't know exactly where each
>element of the file will be.  What I'm looking for is a fast way to locate,
>for example, the first character in the handle that is NOT a white space
>(CHR's 9 through 13) or the first character that IS the start of a
>subsection: ({[< and such.  I'm using some FN MUNGER code that was posted
>here a while back for most of my parsing, but I haven't found a way to do
>this with that yet.
>While I'm at it, if anyone knows a way to search for a string that is not
>within a subsection (a pair of the above characters: {}, (), etc.) I'd
>appreciate hearing that, as well.  My current solution is to search for the
>string, and then search for the start and end of the subsections leading up
>to the string and making sure that the last subsection ends before the
>string begins.  It works, but it seems to me like there should be a faster
>way.
>Thanks in advance.
>-- Brian Victor

This is just off the top of my head, but the best way to do most parsing is
one character at a time. If white space isn't significant, you can skip it
in the routine that fetches each character:

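LOCAL FN ReadChar                        ' next raw byte, or -1 at the end
  LONG IF (gOffset& < gSize&)            ' "<", not "<=": offsets run 0 to gSize&-1
    gChar& = PEEK([gTextHandle&] + gOffset&)
    INC(gOffset&)
  XELSE
    gChar& = -1
  END IF
END FN

LOCAL FN GetChar                         ' next non-white byte, or -1 at the end
  FN ReadChar
  WHILE (gChar& >= 9 AND gChar& <= 13) OR gChar& = 32   ' CHR$ 9-13 and space
    FN ReadChar
  WEND
END FN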

Those two short FNs read through the text block until they reach the end,
skipping white space. When they get to the end, gChar& will be -1. To make
it work, gTextHandle& has to hold a handle to the text, gSize& must be the
length of the text, and gOffset& must be initialized to zero. Each time you
call FN GetChar, gChar& will contain the next non-white character.
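For anyone who wants to try the logic outside FutureBASIC, here is a rough Python translation of the two routines. It's a sketch, not FB code: a `bytes` object stands in for the handle, and the names `Reader`, `read_char` and `get_char` are made up for illustration. It treats CHR$ 9 through 13 plus space as white space, as in the original question. Calling `get_char` once from offset zero also answers the first part of the question: the first non-white character is then at offset `pos - 1`.

```python
# Hypothetical Python stand-in for ReadChar / GetChar.
# text plays the role of the handle's contents, pos of gOffset&, char of gChar&.

WHITESPACE = {9, 10, 11, 12, 13, 32}  # tab, LF, VT, FF, CR, space


class Reader:
    def __init__(self, text: bytes):
        self.text = text  # the whole file
        self.pos = 0      # next offset to read
        self.char = -1    # current character code, -1 once past the end

    def read_char(self) -> int:
        """Like ReadChar: return the next raw byte, or -1 at the end."""
        if self.pos < len(self.text):
            self.char = self.text[self.pos]
            self.pos += 1
        else:
            self.char = -1
        return self.char

    def get_char(self) -> int:
        """Like GetChar: return the next non-white byte, or -1 at the end."""
        self.read_char()
        while self.char in WHITESPACE:
            self.read_char()
        return self.char
```

For example, `Reader(b"  \t\r ab c").get_char()` returns `ord("a")`, and `pos - 1` is then 5.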

To actually parse something with them, I'd do something like this:

LOCAL FN IsSubStart (Char&)
  IsSubsection% = (Char& = ASC("(") or Char& = ASC ("[") or Char& = ASC ("{"))
END FN = IsSubsection%

LOCAL FN IsSubEnd (Char&)
  IsSubsection% = (Char& = ASC(")") or Char& = ASC ("]") or Char& = ASC ("}"))
END FN = IsSubsection%

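LOCAL FN ParseString
  X$ = ""                                      ' FB strings hold at most 255 chars
  WHILE (gChar& <> ASC("/") AND gChar& <> -1)  ' stop at "/" or at the end of text
    X$ = X$ + CHR$(gChar&)
    FN GetChar
  WEND
  FN GetChar                                   ' skip past the "/"
  ' X$ now holds the string; store or process it here
END FN

LOCAL FN ParseSub
  FN GetChar                                   ' skip the opening ( [ or {
  WHILE (gChar& <> -1 AND FN IsSubEnd(gChar&) = _false)
    FN ParseString
  WEND
  FN GetChar                                   ' skip the closing ) ] or }
END FN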

LOCAL FN ParseBlock
  FN GetChar                                   ' load the first character
  WHILE (gChar& <> -1)
    LONG IF FN IsSubStart(gChar&)
      FN ParseSub
    XELSE
      FN ParseString                           ' not a subsection, so a string
    END IF
  WEND
END FN

Parsing starts with a call to ParseBlock, which loads the first character
and walks through the symbols. Because white space is ignored, at any point
the file as you described it will hold either a string or a subsection, so
those are the only two cases it checks. If it's not a subsection, it must be
a string.
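Here is the whole scheme as a hypothetical Python sketch, under the same assumptions as the FB code: white space is insignificant, every string ends with "/", and subsections hold only strings. The class and method names are illustrative, and unlike the FB version it collects what it parses into a list (a plain string per string, a nested list per subsection) instead of throwing it away.

```python
# Hypothetical Python sketch of ParseBlock / ParseSub / ParseString.
# Assumes: white space is insignificant, strings end with "/",
# and subsections contain only strings.

WHITESPACE = {9, 10, 11, 12, 13, 32}
SUB_START = {ord("("), ord("["), ord("{")}
SUB_END = {ord(")"), ord("]"), ord("}")}
SLASH = ord("/")


class Parser:
    def __init__(self, text: bytes):
        self.text = text
        self.pos = 0
        self.char = -1
        self.items = []  # parsed output: strings and lists of strings

    def get_char(self) -> None:
        # Advance to the next non-white byte; -1 marks the end of the text.
        while True:
            if self.pos < len(self.text):
                self.char = self.text[self.pos]
                self.pos += 1
            else:
                self.char = -1
            if self.char not in WHITESPACE:
                return

    def parse_string(self) -> str:
        out = []
        while self.char != SLASH and self.char != -1:
            out.append(chr(self.char))
            self.get_char()
        self.get_char()  # skip the "/"
        return "".join(out)

    def parse_sub(self) -> list:
        self.get_char()  # skip the opening bracket
        strings = []
        while self.char != -1 and self.char not in SUB_END:
            strings.append(self.parse_string())
        self.get_char()  # skip the closing bracket
        return strings

    def parse_block(self) -> list:
        self.get_char()  # load the first character
        while self.char != -1:
            if self.char in SUB_START:
                self.items.append(self.parse_sub())
            else:
                self.items.append(self.parse_string())
        return self.items
```

`Parser(b"alpha/ (b/ c/) gamma/").parse_block()` gives `["alpha", ["b", "c"], "gamma"]`. Note that `Parser(b"a b/")` yields `"ab"`: since every read goes through the white-space-skipping reader, spaces inside a string are lost, just as in the FB version.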

If it's a subsection, it calls ParseSub. ParseSub skips past the opening
subsection symbol and calls ParseString until it finds a symbol that marks
the end of the subsection; it then skips that symbol and returns to FN
ParseBlock. (I'm supposing that only strings are allowed in subsections...)
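The same one-character-at-a-time idea also covers the second question (finding a string that is not inside any subsection) in a single pass: keep a nesting depth, raise it on an opening symbol, lower it on a closing one, and only accept a match while the depth is zero. A Python sketch follows; `find_outside` is a hypothetical name, it treats ( [ { < as interchangeable openers, and it does not check that the pairs actually match.

```python
# Hypothetical one-pass search for a target that is not inside any subsection.
# Assumes ( [ { < open and ) ] } > close, without checking that pairs match.

OPENERS = b"([{<"
CLOSERS = b")]}>"


def find_outside(text: bytes, target: bytes) -> int:
    """Offset of the first occurrence of target at nesting depth 0, or -1."""
    depth = 0
    for i, c in enumerate(text):
        # Test for a match before adjusting depth, so a target that itself
        # starts with an opening symbol can still be found at depth 0.
        if depth == 0 and text.startswith(target, i):
            return i
        if c in OPENERS:
            depth += 1
        elif c in CLOSERS and depth > 0:  # ignore a stray closer at depth 0
            depth -= 1
    return -1
```

Because the depth is tracked during the scan, there's no need to go back and search for subsection boundaries after each hit: `find_outside(b"(name) {name} name", b"name")` skips both bracketed copies and returns 14.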

When parsing strings, it keeps reading until it hits a "/". I don't know
what marks the end of a string in your file; if it's a CR, this will need
some more work, since GetChar skips CRs (and any spaces inside a string)
before ParseString ever sees them.
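If strings can end with a CR (or with either "/" or a CR), one option is to read raw characters inside the string, the way ReadChar does, instead of going through GetChar; that also keeps spaces inside the string. Here is a hypothetical Python sketch. It assumes leading white space before a string is still insignificant, that trailing spaces before the terminator should be trimmed, and that the text can be decoded as Latin-1. The function name and return convention are made up for illustration.

```python
# Hypothetical ParseString variant: skip leading white space, then read raw
# bytes until "/" or CR (13), so spaces inside the string are kept.

WHITESPACE = {9, 10, 11, 12, 13, 32}
TERMINATORS = {ord("/"), 13}


def parse_string(text: bytes, pos: int):
    """Parse one string starting at offset pos.

    Returns (string, offset just past the terminator)."""
    n = len(text)
    while pos < n and text[pos] in WHITESPACE:      # like GetChar
        pos += 1
    start = pos
    while pos < n and text[pos] not in TERMINATORS:  # like ReadChar
        pos += 1
    value = text[start:pos].decode("latin-1").rstrip()  # trim trailing spaces
    return value, min(pos + 1, n)
```

Calling it repeatedly with the returned offset walks through CR-separated lines: on `b"line one\rline two\r"` the first call returns `("line one", 9)` and the second `("line two", 18)`.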

But anyway, that's the easiest and most flexible way of doing parsing when
the format isn't rigidly set.

Hope that was useful,

Joe K.