[futurebasic] Re: Parsing

From: Joe Kovac <locality@...>
Date: Sat, 29 Nov 1997 10:13:34 -0400
>I need my program to parse some fairly sizable text based files.  The file
>specification is quite loose, however, so I can't know exactly where each
>element of the file will be.  What I'm looking for is a fast way to locate,
>for example, the first character in the handle that is NOT a white space
>(CHR's 9 through 13) or the first character that IS the start of a
>subsection: ({[< and such.  I'm using some FN MUNGER code that was posted
>here a while back for most of my parsing, but I haven't found a way to do
>this with that yet.
>
>While I'm at it, if anyone knows a way to search for a string that is not
>within a subsection (a pair of the above characters: {}, (), etc.) I'd
>appreciate hearing that, as well.  My current solution is to search for the
>string, and then search for the start and end of the subsections leading up
>to the string and making sure that the last subsection ends before the
>string begins.  It works, but it seems to me like there should be a faster
>method.
>
>Thanks in advance.
>
>-- Brian Victor
>
>
>--
>To unsubscribe, send ANY message to <futurebasic-unsubscribe@...>

This is just off the top of my head, but the best way to do most parsing is
one character at a time. If white space has no significance, you can skip
it in the routine that gets the character:

LOCAL FN ReadChar
  ' gOffset& is zero-based, so the last valid offset is gSize& - 1
  LONG IF (gOffset& < gSize&)
    gChar& = PEEK ([gTextHandle&]+gOffset&)
    INC (gOffset&)
  XELSE
    ' End of text
    gChar& = -1
  END IF
END FN

LOCAL FN GetChar
  FN ReadChar
  ' Skip white space: CHR 9 through 13, plus the space character (32)
  WHILE ((gChar& >= 9 AND gChar& <= 13) OR gChar& = 32)
    FN ReadChar
  WEND
END FN

Those two short FNs will read through the text block until they reach the
end, skipping white space. When they get to the end, gChar& will be -1. To
make them work, gTextHandle& has to hold a handle to the text, gSize& must
be the length of the text, and gOffset& must be initialized to zero. Each
time you call GetChar, gChar& will contain the next non-white character.
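
A minimal, untested sketch of that setup follows. StartParse and textH& are
hypothetical names, and it assumes the handle is exactly as long as the text
and that the Toolbox call is available as FN GETHANDLESIZE. The DIM and
END GLOBALS lines belong at the top of the program, before any of the FNs:

DIM gTextHandle&, gSize&, gOffset&, gChar&
END GLOBALS

' Hypothetical helper: call once before the first FN GetChar
LOCAL FN StartParse (textH&)
  gTextHandle& = textH&
  ' Assumption: the handle holds the text and nothing else
  gSize& = FN GETHANDLESIZE (textH&)
  gOffset& = 0
END FN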

To actually parse something with them, I'd do something like this:

LOCAL FN IsSubStart (Char&)
  IsSubsection% = (Char& = ASC ("(") OR Char& = ASC ("[") OR Char& = ASC ("{"))
END FN = IsSubsection%

LOCAL FN IsSubEnd (Char&)
  IsSubsection% = (Char& = ASC (")") OR Char& = ASC ("]") OR Char& = ASC ("}"))
END FN = IsSubsection%

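LOCAL FN ParseString
  X$ = ""
  ' Stop at the "/" terminator or at the end of the text (-1)
  WHILE (gChar& <> -1 AND gChar& <> ASC("/"))
    X$ = X$ + CHR$ (gChar&)
    ' ReadChar rather than GetChar, so spaces inside the string are kept
    FN ReadChar
  WEND
  ' X$ now holds the string (255 characters at most); use it here
  FN GetChar
END FN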

LOCAL FN ParseSub
  FN GetChar
  WHILE (gChar& <> -1 AND FN IsSubEnd (gChar&) = _false)
    FN ParseString
  WEND
  FN GetChar
END FN

LOCAL FN ParseBlock
  FN GetChar
  WHILE (gChar& <> -1)
    LONG IF FN IsSubStart (gChar&)
      FN ParseSub
    XELSE
      FN ParseString
    END IF
  WEND
END FN

Parsing starts with a call to ParseBlock, which loads the first character
and goes through the symbols. At this point, because white space is
ignored, the file as you described it will have either a string or a
subsection, so these are the two cases it looks for. If it's not a
subsection, it must be a string.

If it's a subsection, it calls ParseSub. ParseSub skips past the opening
symbol and calls ParseString until it finds a symbol that denotes the end
of the subsection, at which point it returns to FN ParseBlock. (I'm
supposing that only strings are allowed in subsections...)

When parsing strings it keeps reading the string until it hits a "/". I
don't know what marks the end of a string in your file; if it's a CR, this
will need some more work, because GetChar skips CRs.
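
If strings do turn out to end at a CR, one possible approach is to read the
body of the string with ReadChar, which skips nothing, and stop at CHR 13.
This is an untested sketch; ParseLine is a hypothetical name, and X$ can
hold at most 255 characters, so longer lines would need a different buffer:

' Hypothetical variant of ParseString for strings that end at a CR
LOCAL FN ParseLine
  X$ = ""
  ' Stop at the CR or at the end of the text (-1)
  WHILE (gChar& <> -1 AND gChar& <> 13)
    X$ = X$ + CHR$ (gChar&)
    FN ReadChar
  WEND
  ' X$ now holds the line; use it here
  ' GetChar then skips the CR and any white space after it
  FN GetChar
END FN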

But anyway, that's the easiest and most flexible way of doing parsing when
the format isn't rigidly set.
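
For the second part of the question (finding something that is not inside
a subsection), the same character-at-a-time loop can carry a nesting
counter. The sketch below is untested. FindOutside, depth& and found& are
hypothetical names, and it looks for a single character rather than a whole
string to keep it short. It assumes subsections may nest, does not check
that each closing symbol matches its opener, and restarts from offset zero,
so it should not be called in the middle of a ParseBlock run:

' Returns the offset of the first target& outside every subsection,
' or -1 if there is none
LOCAL FN FindOutside (target&)
  depth& = 0
  found& = -1
  gOffset& = 0
  FN ReadChar
  WHILE (gChar& <> -1 AND found& = -1)
    LONG IF FN IsSubStart (gChar&)
      depth& = depth& + 1
    XELSE
      LONG IF FN IsSubEnd (gChar&)
        ' Ignore a stray closing symbol at the top level
        LONG IF depth& > 0
          depth& = depth& - 1
        END IF
      XELSE
        LONG IF (depth& = 0 AND gChar& = target&)
          ' ReadChar has already moved past this character
          found& = gOffset& - 1
        END IF
      END IF
    END IF
    FN ReadChar
  WEND
END FN = found&

Because the counter is updated as each character goes by, the text is
scanned only once, instead of searching for the string first and then
checking the subsections that lead up to it.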


Hope that was useful,

Joe K.
locality@...