Processing BibTeX entries? 

Hi all,

My BibTeX databases are growing rapidly, so I thought it would
be nice to write myself a small database manager, and a good
way to teach myself Perl properly too.

Before I go reinventing the wheel, does anyone know of packages
for manipulating BibTeX entries, e.g. splitting them into
associative arrays?

Failing that, can anyone suggest some nice regexps for handling
the variations of BibTeX 'var = value' entries where value:

1) can span multiple lines
2) can be quoted with either "" or {}
3) can contain extra {} pairs to force capitalisation or
   insert accents (e.g. G{\"o}del)

All suggestions gratefully received.

Cam.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Department of Computer Science,                 Telephone: +61 3 9287 9119
The University of Melbourne,                    Facsimile: +61 3 9348 1184
Parkville, 3052, Victoria, Australia.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



Mon, 24 May 1999 03:00:00 GMT  

[emailed and posted]
: My BibTeX databases are growing rapidly, so I thought it would
: be nice to write myself a small database manager, and a good
: way to teach myself Perl properly too.
:
: Before I go reinventing the wheel, does anyone know of packages
: for manipulating BibTeX entries, e.g. splitting them into
: associative arrays?

Been there, done that.  I tried the let's-parse-it-ourselves-in-
Perl-how-hard-can-it-be approach, and it's *hard*.  Highly
unrecommended.  I never did get it working properly; luckily I had the
good sense to ditch the code when I figured out a better way.
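To give a sense of where the difficulty starts: a value delimited by {} can itself contain nested {} pairs and newlines, so a simple pattern match on the delimiters is not enough. What follows is only a minimal sketch of a depth-counting scanner for one braced value; read_braced is a hypothetical helper, not part of any BibTeX package, and it ignores quoted values, macros, and # concatenation entirely.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): given a string and the
# offset of an opening '{', return the text between that brace and its
# matching '}', plus the offset just past the closing brace.
sub read_braced {
    my ($text, $pos) = @_;
    die "expected '{' at offset $pos\n" unless substr($text, $pos, 1) eq '{';
    my $depth = 0;
    for (my $i = $pos; $i < length($text); $i++) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '{') {
            $depth++;
        }
        elsif ($ch eq '}') {
            $depth--;
            # Depth back at zero means this '}' closes the value.
            return (substr($text, $pos + 1, $i - $pos - 1), $i + 1)
                if $depth == 0;
        }
    }
    die "unbalanced braces starting at offset $pos\n";
}

# A value that spans two lines and contains an accent group.
my $field = 'author = {Kurt G{\"o}del and' . "\n" . '  Another Author},';
my ($value, $next) = read_braced($field, index($field, '{'));
print "$value\n";
```

Even this small piece needs a loop and a counter, and it still covers only one of the three delimiter styles.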

What better way, you ask?  Two possibilities: there is a moderately nice
tool out there called BibTool (you can find it on CTAN) that parses,
prettyprints, and does various common database-ish (sort, validate,
select, etc.) operations on BibTeX files.  You want the parse and
prettyprint side of things; open a pipe to bibtool like this:

  bibtool -- pass.comments=off -- print.line.length=999999

and it spews back your BibTeX with blank lines between every entry, and
each field on just one line.  This makes the parse-it-ourselves-in-
Perl approach *much* more tractable.  (Downright trivial, in fact.)
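Once the input is normalized that way, the Perl side can be sketched roughly as below. This is an illustration under assumptions, not tested against real bibtool output: it assumes each entry starts with an "@type{key," line, carries one "name = value" field per line, and is separated from the next entry by a blank line. The sample string stands in for what the pipe would deliver, and parse_entries is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in for the normalized text read from the pipe (layout is an assumption).
my $sample = <<'END';
@Article{godel31,
  author = {Kurt G{\"o}del},
  title = "On Formally Undecidable Propositions",
  year = 1931
}

@Book{knuth84,
  author = {Donald E. Knuth},
  title = {The {\TeX}book},
  year = 1984
}
END

# Parse blank-line-separated, one-field-per-line entries into hashes.
sub parse_entries {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: each read returns one entry
    my @entries;
    while (my $chunk = <$fh>) {
        my @lines = grep { /\S/ } split /\n/, $chunk;
        my $head = shift @lines;
        next unless defined $head
            && $head =~ /^\s*\@(\w+)\s*\{\s*([^,\s]+)\s*,/;
        my %entry = (type => lc $1, key => $2, fields => {});
        for my $line (@lines) {
            next unless $line =~ /^\s*([^\s=]+)\s*=\s*(.*?)\s*,?\s*$/;
            my ($name, $value) = (lc $1, $2);
            # Naive: strips one outer {} or "" pair and assumes it really is
            # a matching pair; inner braces are left untouched.
            $value =~ s/^\{(.*)\}$/$1/ or $value =~ s/^"(.*)"$/$1/;
            $entry{fields}{$name} = $value;
        }
        push @entries, \%entry;
    }
    return @entries;
}

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my @entries = parse_entries($fh);
close $fh;

for my $e (@entries) {
    print "$e->{key}: $e->{fields}{author} ($e->{fields}{year})\n";
}
```

In real use the in-memory filehandle would be replaced by the pipe from bibtool; values built from macros or # concatenation would still need separate handling.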

However, I'm kind of interested in parsing, so I went ahead and wrote my
own BibTeX parser using PCCTS (an integrated set of tools that do the
same job as lex and yacc, but in a much easier fashion).  Currently, it
does much the same as bibtool in the mode I just described, but tossing
quotes, commas, and other junk -- i.e. it stuffs nice flexibly formatted
data into a very simplified straitjacket that just happens to be a
breeze to parse in Perl.

On the downside, my parser doesn't (yet) do anything with macros apart
from recognize them.  Plans are afoot for fixing this, but it'll be a
while yet.  

Here's an example.  For input, take this:


         title = { This is the Title of my { S P I F F Y } Paper },
         author = thor # and # {Another Author} # and # { Smith, John},
         journal = "The Journal of Things",
         sillyfield2 = "is this silly? ",
         empty = "foo",
         sillier_field! = "even" # {   sillier },
         year = 1234

}

then if I run bibparse (that's my program) on it, I get this:




title = This is the Title of my { S P I F F Y } Paper
author = macro(thor) macro(and) Another Authormacro(and) Smith, John
journal = The Journal of Things
sillyfield2 = is this silly?
empty = foo
sillier_field! = evensillier
year = 1234

Note how whitespace in and around quotes is handled exactly the way
BibTeX does it (not a trivial feat to reproduce).  Note also the
moderately annoying way that macros are marked in the output; I suppose
a more ambitious Perl side parser could do something with these, but I
plan to fix it in the C end.  (Call me weird.)
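As a rough idea of what a Perl-side treatment of those macro markers might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the macro table values are made up, parse_fields is a hypothetical name, and treating the single space after a macro(...) marker as a separator to be dropped is a guess about the output format.

```perl
use strict;
use warnings;

# Hypothetical macro table; real values would come from the .bib files.
my %macro = (thor => 'A. N. Thor', and => ' and ');

# The simplified "name = value" lines shown above.
my @lines = split /\n/, <<'END';
title = This is the Title of my { S P I F F Y } Paper
author = macro(thor) macro(and) Another Authormacro(and) Smith, John
journal = The Journal of Things
year = 1234
END

# Split each line at the first " = " and expand macro(NAME) markers.
sub parse_fields {
    my ($macros, @input) = @_;
    my %field;
    for my $line (@input) {
        next unless $line =~ /^(\S+) = (.*)$/;
        my ($name, $value) = ($1, $2);
        # Known macros are replaced (dropping the one space after the marker,
        # which is an assumption); unknown ones are left exactly as found.
        $value =~ s/macro\((\w+)\)( ?)/exists $macros->{$1} ? $macros->{$1} : "macro($1)$2"/ge;
        $field{$name} = $value;
    }
    return %field;
}

my %field = parse_fields(\%macro, @lines);
print "$_ = $field{$_}\n" for sort keys %field;
```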

I currently have a somewhat OO front end, BibTex.pm, which contains
packages BibTex, BibTex::File and BibTex::Entry.  You can do all the
new'ing and ->'ing yourself, or you can just call BibTex::bibloop, which
takes a list of files, a loop body (CODE ref), an output filehandle, and
a list of other args to pass to your loop body.  It loops over all
entries in all input files, and calls your loop body with everything it
could want to know about the entry -- the filename and entry number
(line numbers are a bit elusive what with the layers of C and Perl
... ugh), a BibTex::Entry object which can be queried
(e.g. $entry->fieldval ('author')), and any other args you happen to want
to pass.
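The shape of such a callback-driven loop can be sketched as follows. This is not the actual BibTex.pm code: bibloop_sketch, Sketch::Entry, the argument order, and the canned %parsed data (which replaces real file parsing so the example runs on its own) are all assumptions. Only the fieldval method name is taken from the description above.

```perl
use strict;
use warnings;

package Sketch::Entry;    # hypothetical stand-in for BibTex::Entry
sub new      { my ($class, %fields) = @_; return bless { %fields }, $class }
sub fieldval { my ($self, $name) = @_; return $self->{$name} }

package main;

# Canned data standing in for parsed files.
my %parsed = (
    'a.bib' => [ { author => 'Smith, John', year => 1996 } ],
    'b.bib' => [ { author => 'Another Author', year => 1991 },
                 { author => 'Smith, John',    year => 1993 } ],
);

# Hypothetical driver: call $body once per entry in each file, passing the
# filename, the entry number within that file, an entry object, the output
# filehandle, and any extra arguments.
sub bibloop_sketch {
    my ($files, $body, $out, @args) = @_;
    for my $file (@$files) {
        my $num = 0;
        for my $fields (@{ $parsed{$file} || [] }) {
            my $entry = Sketch::Entry->new(%$fields);
            $body->($file, ++$num, $entry, $out, @args);
        }
    }
}

my @seen;
bibloop_sketch(
    [ 'a.bib', 'b.bib' ],
    sub {
        my ($file, $num, $entry, $out, $wanted) = @_;
        return unless $entry->fieldval('author') eq $wanted;
        push @seen, "$file#$num";
        print {$out} "$file entry $num: ", $entry->fieldval('year'), "\n";
    },
    \*STDOUT,
    'Smith, John',
);
```

The attraction of this style is that the caller writes only the per-entry logic, and the file handling and entry counting stay in one place.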

And I have tons of example programs.  (This was used for a grant
proposal involving 45 investigators and the complete CV of each one from
1991-96, comprising several thousand entries.  Detecting duplicates in
6+ MB of typed-in bib files was an interesting challenge!)
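For hand-typed files, one plausible way to flag likely duplicates (not necessarily the approach used for that proposal) is to reduce each title to a crude comparison key and group entries that share a key and year. The sketch below assumes entries are already available as hashes with key, title, and year; title_key and find_duplicates are hypothetical names.

```perl
use strict;
use warnings;

# Reduce a title to a comparison key: lowercase it and drop everything that
# is not a letter or digit (braces, TeX markup characters, spacing).
sub title_key {
    my ($title) = @_;
    my $key = lc $title;
    $key =~ s/[^a-z0-9]+//g;
    return $key;
}

# Group citation keys by normalized title plus year; return only the groups
# with more than one member, each as an array reference.
sub find_duplicates {
    my (@entries) = @_;
    my %groups;
    for my $e (@entries) {
        push @{ $groups{ title_key($e->{title}) . '/' . $e->{year} } }, $e->{key};
    }
    return grep { @$_ > 1 } values %groups;
}

my @dups = find_duplicates(
    { key => 'a1', title => 'Parsing {BibTeX} in Perl', year => 1996 },
    { key => 'b7', title => 'parsing BibTeX  in perl.', year => 1996 },
    { key => 'c3', title => 'The Journal of Things',    year => 1995 },
);
print "possible duplicates: @$_\n" for @dups;
```

A key this crude will miss retitled or misspelled entries and may merge distinct papers with identical titles, so the groups are candidates for review rather than confirmed duplicates.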

If you really, really want my code, I can throw something together and
make it available for a *short* time.  I'm planning to come back to all
this and do it *right* early next year -- i.e. XS glue between the C
parser and Perl, a truly OO interface, and (gasp!) documentation.
Eventually I plan to duplicate the functionality of BibTeX itself, but
with a *faaar* easier style language.  (Gimme a P!  and an E!  and
... oh, you can see where this is leading...)

And if you don't want my code, track down BibTool -- much easier than
trying to hack your own parser in Perl.  (*shudder*)

Good luck!

        Greg

--

Brain Imaging Centre (WB201)                 voice: (514) 398-4965 (or 1996)
Montreal Neurological Institute                fax: (514) 398-8948
Montreal, Quebec, Canada  H3A 2B4



Mon, 24 May 1999 03:00:00 GMT  


Quote:

>Before I go reinventing the wheel, does anyone know of packages
>for manipulating BibTeX entries, e.g. splitting them into
>associative arrays?

You might try the following URL:
<http://www.ecst.csuchico.edu/~jacobsd/bib/bp/index.html>

It's a package for parsing and writing bibliography database files (not
just BibTeX).

ciao
  lutz

---------------------------------------------------------------------
Lutz Albers                                     |       What's good ?
Luederitzstr. 14, 81929-Muenchen, Germany       |      Life's good -

<http://www.muc.de/~lutz>  fax:+49-89-93940365  |          (Lou Reed)

Do not take life too seriously, you will never get out of it alive.



Tue, 25 May 1999 03:00:00 GMT  
 