PS->ASCII conversion: a PS solution 

The following is a shell script that uses Sun's psh to convert PostScript
to ASCII.  It does a reasonable job on TeX output because it finds word
breaks from the position of each string rather than from its content.
It therefore recombines words that TeX's kerning splits into separate strings.
It also maps some TeX special characters (ligatures, accents) to reasonable
ASCII equivalents.  Its greatest weakness is that it does not understand
fonts: every character code is translated the same way whatever font is current.
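
The positional word-break test can be modelled outside PostScript. What follows is a minimal sketch in Python, not part of the script itself: the function name `join_fragments` and the tuple layout are hypothetical, and it assumes horizontal text, so the y coordinate does not change while a string is shown. The sentinel start values and the "one sixth of the average character width" threshold are taken from the script.

```python
def join_fragments(fragments):
    """Join shown strings into text using positions only.

    fragments: iterable of (text, begin_x, begin_y, end_x) tuples, one per
    show call (hypothetical layout chosen for this sketch).
    """
    out = []
    # Same sentinel values as psa-lastx / psa-lasty in the script.
    last_x, last_y = 30000.0, -30000.0
    for text, begin_x, begin_y, end_x in fragments:
        # A change in y is interpreted as a new line.
        if begin_y != last_y:
            out.append("\n")
        # Gap between the end of the previous string and the start of this one.
        gap = begin_x - last_x
        # Average character width of this string; max() guards empty strings.
        avg_width = (end_x - begin_x) / max(len(text), 1)
        # A gap wider than one sixth of a character is interpreted as a space.
        if gap > avg_width / 6:
            out.append(" ")
        out.append(text)
        last_x, last_y = end_x, begin_y
    return "".join(out)
```

For example, a word that kerning has split into "Hel" and "lo" with only a tiny gap between the pieces is rejoined, while a gap of about one character width yields a space.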

For those who do not have psh, Ghostscript may be a viable alternative.

Please let me know of any useful improvements to this.
It is provided without warranty, and absolutely freely.

============================CUT HERE====================================
#!/bin/csh -f
# Converts PostScript to ASCII on a Sun by sending the PS into
# psh.  The ASCII is spat back out on stdout.
# Written by: Len Hamey, Macquarie University, NSW 2109 AUSTRALIA,

# This version uses psh; one could also use pageview and/or
# ghostscript.
cat - $* <<'EOF' | psh
%!PS-Adobe-2.0
%%EndComments
/psa-lastx 30000 def
/psa-lasty -30000 def
/psa-beginx 0 def
/psa-gap 0 def
/psa-newline (\n) def
/psa-space ( ) def
/psa-line 512 string def
/psa-linep 0 def
/psa-recurrent 0 def
/psa-tex-specials
[
(?) % 00 = Gamma
(?) % 01 = Delta
(?) % 02 = Theta
(?) % 03 = Lambda
(?) % 04 = Xi
(?) % 05 = Pi
(?) % 06 = Sigma
(?) % 07 = Upsilon
(?) % 10 = Phi
(?) % 11 = Psi
(?) % 12 = Omega
(ff) % 13
(fi) % 14
(fl) % 15
(ffi) % 16
(ffl) % 17
(i) % 20 = i without dot
(j) % 21 = j without dot
(`) % 22
(') % 23
(?) % 24 = caron
(?) % 25 = breve
(-) % 26 = macron
(?) % 27 = ring
(,) % 30 = cedilla
(B) % 31 = German sharp s (looks like beta)
(ae) % 32
(oe) % 33
(o/) % 34 = o with slash through it
(AE) % 35
(OE) % 36
(O/) % 37
] def
/psa-ps-specials
[
(?) % 200
(?) % 201
(?) % 202
(?) % 203
(?) % 204
(?) % 205
(?) % 206
(?) % 207
(?) % 210
(?) % 211
(?) % 212
(?) % 213
(?) % 214
(?) % 215
(?) % 216
(?) % 217
(?) % 220
(?) % 221
(?) % 222
(?) % 223
(?) % 224
(?) % 225
(?) % 226
(?) % 227
(?) % 230
(?) % 231
(?) % 232
(?) % 233
(?) % 234
(?) % 235
(?) % 236
(?) % 237
(?) % 240
(!) % 241 = upside-down !
(c) % 242 = cent
(Pd) % 243 = Pound sterling
(/) % 244 = fraction
(Yen) % 245 = Y overlaid with =
(f) % 246 = florin
(Section) % 247
(?) % 250 = currency
(') % 251 = single (vertical) quote
(``) % 252 = double quote (left)
(<<) % 253
(<) % 254
(>) % 255
(fi) % 256
(fl) % 257
(?) % 260
(--) % 261 = endash
(+) % 262 = dagger
(++) % 263 = double dagger
(.) % 264 = centred period
(?) % 265
(d) % 266 = (symbol font) partial diff [Paragraph in text font]
(*) % 267 = Bullet
(,) % 270
(,,) % 271
('') % 272
(>>) % 273
(...) % 274 = ellipsis
(%o) % 275 = perthousand
(?) % 276
(?) % 277 = question upside down
(?) % 300
(`) % 301 = grave
(') % 302 = acute
(^) % 303 = circumflex
(~) % 304 = tilde
(-) % 305 = macron
(?) % 306 = breve
(.) % 307 = dotaccent
(..) % 310 = dieresis
(?) % 311
(o) % 312 = ring
(,) % 313 = cedilla
(?) % 314
(") % 315 = hungarumlaut
(,) % 316 = ogonek
(?) % 317 = caron
(---) % 320 = emdash
(?) % 321
((R)) % 322 (symbol font)
((c)) % 323 (symbol font)
(TM) % 324 (symbol font)
(?) % 325
(?) % 326
(?) % 327
(?) % 330
(?) % 331
(?) % 332
(?) % 333
(?) % 334
(?) % 335
(?) % 336
(?) % 337
(?) % 340
(AE) % 341
((R)) % 342 (symbol font)
((c)) % 343 (symbol font) [ordfeminine in text font]
(TM) % 344
(?) % 345
(?) % 346
(?) % 347
(L) % 350
(O/) % 351
(OE) % 352
(o) % 353
(?) % 354
(?) % 355
(?) % 356
(?) % 357
(?) % 360
(ae) % 361
(?) % 362
(?) % 363
(?) % 364
(i) % 365 = dotlessi
(?) % 366
(?) % 367
(l) % 370
(o/) % 371
(oe) % 372
(B) % 373
(?) % 374
(?) % 375
(?) % 376
(?) % 377
] def
% psa-cdef: Concatenate new1, current and new2 definitions of an operator.
% <key> <new1> <new2> cdef
/psa-cdef { %def
  /psa-currentdef 3 index load def
  /psa-cdef-end exch def
  [ exch aload pop
    /psa-currentdef load dup type
    /operatortype ne {aload pop} if
    /psa-cdef-end load aload pop
  ] cvx
  def

} def

/psa-printstring
{ % string
  psa-line psa-linep 2 index putinterval % string
  length psa-linep add /psa-linep exch store % -
} def

/psa-printchar
{ % character
  psa-line psa-linep 2 index put % character
  pop /psa-linep psa-linep 1 add store % -
} def

/psa-begin
{
  % show-args string nargs
  exch dup currentpoint % args nargs string string currx curry
  5 -1 roll % args string string currx curry nargs
  3 add % args string string currx curry nargs+3
  3 roll % string currx curry args string
} bind def

/psa-end
{
  % string beginx beginy
  % Check for change in Y co-ordinate: interpreted as new-line.
  psa-lasty ne
  { psa-newline psa-printstring } if
  % Check for significant gap between characters: interpreted as space.
  % Compute gap relative to previous piece of text
  dup psa-lastx sub
  % Compute width of current text string as a measure of typical
  % character widths.
  % stack: string beginx deltax
  currentpoint pop 2 index sub % stack: string beginx deltax currx-beginx
  % Treat an empty string as length 1 so that div cannot fail.
  3 index length dup 0 eq { pop 1 } if
  div 6 div % stack: string beginx deltax (currx-beginx)/length/6
  gt
  { psa-space psa-printstring } if % stack: string beginx
  pop
  % Process all characters, translating TeX ligatures.
  % (For non-TeX input, use the PS print operator instead of the forall loop)
  { dup 31 le
    { psa-tex-specials exch get psa-printstring }
    {
      dup 128 ge
      { psa-ps-specials exch 128 sub get psa-printstring }
      { psa-printchar } ifelse
    } ifelse
  } forall
  % Output the buffered text
  psa-line 0 psa-linep getinterval print
  /psa-linep 0 store
  currentpoint /psa-lasty exch store /psa-lastx exch store

} bind def

/show { 1 psa-begin } { psa-end } psa-cdef
/kshow { 2 psa-begin } { psa-end } psa-cdef
/widthshow { 4 psa-begin } { psa-end } psa-cdef
/ashow { 3 psa-begin } { psa-end } psa-cdef
/awidthshow { 6 psa-begin } { psa-end } psa-cdef
'EOF'
exit
============================CUT HERE======================================
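
The per-character translation done by the forall loop in psa-end can be illustrated the same way. This is a hedged sketch in Python rather than PostScript; `translate`, `TEX_SPECIALS` and `PS_SPECIALS` are hypothetical names. The first table copies all 32 entries of psa-tex-specials, while the second is deliberately abbreviated to a few entries of psa-ps-specials, with every unlisted code falling back to "?".

```python
# Codes 0-31: TeX text-font specials, copied from psa-tex-specials.
TEX_SPECIALS = [
    "?", "?", "?", "?", "?", "?", "?", "?",          # 000-007: Greek capitals
    "?", "?", "?", "ff", "fi", "fl", "ffi", "ffl",   # 010-017: Greek, ligatures
    "i", "j", "`", "'", "?", "?", "-", "?",          # 020-027: dotless i/j, accents
    ",", "B", "ae", "oe", "o/", "AE", "OE", "O/",    # 030-037
]

# Codes 128-255: abbreviated excerpt of psa-ps-specials (assumption: the
# entries not listed here are treated as "?" in this sketch).
PS_SPECIALS = {
    0o256: "fi", 0o257: "fl", 0o261: "--",
    0o267: "*", 0o274: "...", 0o320: "---",
}


def translate(data: bytes) -> str:
    """Map each byte of a shown string to its ASCII approximation."""
    out = []
    for code in data:
        if code <= 31:
            out.append(TEX_SPECIALS[code])           # TeX special or ligature
        elif code >= 128:
            out.append(PS_SPECIALS.get(code, "?"))   # high half of the encoding
        else:
            out.append(chr(code))                    # plain ASCII passes through
    return "".join(out)
```

With these tables, a TeX string containing the ffi ligature (octal 016) between "e" and "cient" comes out as the plain word "efficient".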


Fri, 02 Dec 1994 11:05:08 GMT  
 