INVALID ARRAY SUBSCRIPT leads to Pop core dump 

I've been mucking about with arrays and happened upon an INVALID ARRAY
SUBSCRIPT situation that caused a core dump. With some experimentation
I was able to reproduce the problem (though unfortunately I lost most of
the transcripts). The following reliably gives me a core dump:

: newarray([-5 0 10 15])->b;
: b(4)=>
;;; MISHAP - INVALID ARRAY SUBSCRIPT
  ;;; INVOLVING:  4
Segmentation fault

Obviously, it's a silly thing to do.

For me this isn't an urgent problem. And I haven't tried to dig behind
the scenes yet. Just mentioning it.

Pentium III running Mandrake 9.0 Linux workstation.
Poplog: linux.poplog.V15.53 without motif.

I assume this is not something that happens on other combinations of
hardware and OS.

Happy 2003.

Luc



Sun, 19 Jun 2005 15:50:26 GMT  

Quote:
lucb AT telus.net writes:
> Date: Wed, 1 Jan 2003 07:50:26 +0000 (UTC)

> ... the problem (and unfortunately lost most
> transcripts). But the following  reliably gives me a core dump:

> : newarray([-5 0 10 15])->b;
> : b(4)=>
> ;;; MISHAP - INVALID ARRAY SUBSCRIPT
>   ;;; INVOLVING:  4
> Segmentation fault
>.....
> Pentium III running Mandrake 9.0 Linux workstation.
> Poplog: linux.poplog.V15.53 without motif.

> I assume this is not something that happens on other combinations of
> hardware and OS.

I confirm that this also happens with RedHat 7.3 and RedHat 8.0 on a PC
with either a Pentium 4 or an AMD Athlon, and also with PC Poplog running
on Windows.

On RedHat 8.0:

Sussex Poplog (Version 15.53 Mon Aug 21 17:36:46 BST 2000)
Copyright (c) 1982-1999 University of Sussex. All rights reserved.

Setpop
: 1 -> popsyscall;  ;;; ensure full error messages
:
: vars b = newarray([-1 5 -1 5]);
: b(6) =>

;;; MISHAP - INVALID ARRAY SUBSCRIPT
;;; INVOLVING:  6
;;; DOING    :  sys_exception_final sys_exception_handler
;;;     Segmentation fault (core dumped)

I checked using Poplog version 15 running on a PC with Windows 2000
and also got an access violation error. It opened a new window in which
the error printing went on forever, until I killed the process.

However, the problem does not occur on either Sparc+Solaris or on an
Alpha running Digital Unix. In those cases you get a mishap message and
Poplog continues running, as expected.

On linux + PC the segmentation fault arises with b(6,6) and
b(6, 4) and also b("cat"), so it is not a stack problem arising out of a
missing argument, but has something to do with what happens when the
index is discovered to be out of bounds or of the wrong type.

Because it is common to both windows poplog and linux PC poplog I assume
the problem is in the assembler file that defines the array checking
code.

The error handler which prints the above message is invoked by this
procedure

    define Array$-Sub_error(item);

defined in $popsrc/errors.p

That procedure is invoked from the low-level machine code routine
array_sub_error, defined in this assembler file (which seems to be the
only file involved here that is specific to PC+Linux or PC+Windows):

    $popsrc/aarith.s

That file defines the machine code procedure _array_sub, which computes
the offset into the array vector. It gets the array indexes off the
stack one at a time, testing them to ensure that they are integers and
in range.
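
A minimal sketch, in Python rather than Poplog assembler, of the kind of
check described above: take the indexes one at a time, make sure each is
an integer within its bounds, and accumulate an offset into the array
vector. The names array_offset and InvalidArraySubscript are hypothetical,
and the storage order (first subscript varying fastest) is an assumption
for illustration only. The real _array_sub pops indexes off the user
stack and does not count them as this sketch does.

```python
class InvalidArraySubscript(Exception):
    """Stand-in for the Pop-11 mishap INVALID ARRAY SUBSCRIPT."""


def array_offset(bounds, subscripts):
    """Return a 0-based offset into the flat array vector.

    bounds is a flat list [lo1, hi1, lo2, hi2, ...] as given to newarray.
    ASSUMPTION: the first subscript varies fastest; the real layout is
    decided by Poplog and is not stated in this thread.
    """
    # Simplification: the real code does not count arguments like this.
    if len(bounds) % 2 != 0 or len(subscripts) != len(bounds) // 2:
        raise InvalidArraySubscript(subscripts)
    offset = 0
    stride = 1
    for dim, index in enumerate(subscripts):
        lo, hi = bounds[2 * dim], bounds[2 * dim + 1]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(index, int) or isinstance(index, bool):
            raise InvalidArraySubscript(index)
        if index < lo or index > hi:
            raise InvalidArraySubscript(index)
        offset += (index - lo) * stride
        stride *= hi - lo + 1
    return offset
```

With bounds [-1, 5, -1, 5], the subscripts (6, 6), (6, 4) and ("cat",)
all raise InvalidArraySubscript in this model. That is the point at which
the real code branches to array_sub_error.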

It looks as if the error test works, then _array_sub calls the routine
array_sub_error which succeeds in calling Sub_error, which invokes
pop11's generic error handler, which starts printing out the error
message, and fails half way through printing.

The corresponding $popsrc/aarith.s for windows poplog was apparently
generated from the linux version then edited by hand, according to a
comment inserted by Robert Duncan.

I suspect something is wrong with the machine instructions in both files
and I wonder if someone familiar with the PC architecture can tell what
is wrong either from the linux version or the windows version. They are
accessible here if you don't have local versions:

    http://www.cs.bham.ac.uk/research/poplog/src/master/S.pcwnt/src/aarith.s
        PC + windows
    http://www.cs.bham.ac.uk/research/poplog/src/master/S.pcunix/src/aari...
        PC + linux

For comparison here are two versions that work OK:
    http://www.cs.bham.ac.uk/research/poplog/src/master/S.sun4r5/src/aari...
        Sparc + solaris

    http://www.cs.bham.ac.uk/research/poplog/src/master/S.axposf/src/aari...
        Alpha + unix

(If anyone has a version of poplog running under solaris on a PC I
expect it will have the same problem).

I decided to see whether the problem was caused by something happening
after the low level array subscript check had finished.

Printing out the calling stack (the DOING list) is done by
    sys_pr_message(count, message, idstring, severity);
defined in $popsrc/errors.p

It is invoked by sys_raise_exception via the user-definable
pop_exception_handler, which defaults to sys_exception_handler

So I tried redefining pop_exception_handler to simply print out a
message, or do nothing. But poplog still crashed.

Likewise if I define pop_pr_exception to simply print out a message
or do nothing:

define pop_pr_exception(count, message, idstring, severity);
    message =>
enddefine;

Sometimes I find that it prints the message and I can then run pop11 a
bit more before it crashes. Sometimes pop11 goes into a loop and has to
be killed using kill -9.

It looks to me as if something gets corrupted before the error handler
starts printing the message, i.e. before pop_pr_exception is
invoked, but the corruption allows the process to continue for a while
before it manifests itself.

How it manifests itself seems to vary.

That is not uncommon with heap corruption, which can go undetected until
either a garbage collection happens or something else goes wrong.

In this case I don't know if the problem is heap corruption, or
corruption of the pop11 control stack, or something else.

I suspect the bug is in $popsrc/aarith.s for PC+linux and also in
the version for PC+windows.

Sorting this out will require help from someone who is familiar with the
intel machine instruction set.

NOTE: the code in the .s files is not pure assembler for the system.
The $popsrc/*.s files include directives for the Poplog assembler, which
generates the actual assembler files.

See
    http://www.cs.bham.ac.uk/research/poplog/sysdoc/

Aaron
====
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK

PAPERS: http://www.cs.bham.ac.uk/research/cogaff/ (And free book on Philosophy of AI)
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html



Sun, 19 Jun 2005 21:39:03 GMT  

Quote:

> vars b = newarray([-1 5 -1 5]);
> b(6) =>

> ;;; MISHAP - INVALID ARRAY SUBSCRIPT
> ;;; INVOLVING:  6
> ;;; DOING    :  sys_exception_final sys_exception_handler
> ;;;     Segmentation fault (core dumped)

My RedHat 6.2 on a slow 486 only writes the first 2 lines of the mishap.

Quote:
> It looks as if the error test works, then _array_sub calls the routine
> array_sub_error which succeeds in calling Sub_error, which invokes
> pop11's generic error handler, which starts printing out the error
> message, and fails half way through printing.

 I think:
  when there are no subscript errors, _array_sub exits via the
        jmp     *_PD_EXECUTE(%eax)
ELSE it gets to:
        subl    $4, %USP                ;;; reveal the last index again
        call    XC_LAB(weakref Sys$-Array$-Sub_error)
        call    XC_LAB(setpop)          ;;; in case the error returns
and crashes ?!

Quote:
> I suspect something is wrong with the machine instructions in both files
> and I wonder if someone familiar with the PC architecture can tell what
> is wrong either from the linux version or the windows version. They are
> accessible here if you don't have local versions:

 Well, it's been years since I needed to punish myself with asm, but:
  src/*.s is not 'standard' Intel syntax, so heuristics are needed.
But I haven't found out the meaning of 'slodl' in:
        slodl
        testl   %eax, %eax

Q1. Did I compile the linux V15.53 that I downloaded ?
If so I could make a spare version and patch and test it.
It's near impossible to debug without making patches and traces.

Quote:
> Printing out the calling stack (the DOING list) is done by
>     sys_pr_message(count, message, idstring, severity);
> defined in $popsrc/errors.p

!! OK, 'help sys_pr_message' explains this. I'll look later.

Quote:
> ........
> In this case I don't know if the problem is heap corruption, or
> corruption of the pop11 control stack, or something else.

A simple rule is that each 'call' must be matched by its (closing bracket)
'ret'. When I look at many of the *.s in my ..../src/ they are all 'well
structured', in that the 'exit' is via a 'ret'. {Using mc under linux
is very useful to find and inspect the sources.}

Analysing the structure of   DEF_C_LAB (_array_sub)
we see that it can either exit via
        jmp     *_PD_EXECUTE(%eax)
if the arguments are OK
or fall through
array_sub_error:
        subl    $4, %USP                ;;; reveal the last index again
        call    XC_LAB(weakref Sys$-Array$-Sub_error)
        call    XC_LAB(setpop)          ;;; in case the error returns
if the arguments are not OK.

{ Here's a hi-level version:
L1.8:   ;;; Start of loop: analysis
.....
IF %USP = 2
THEN goto array_sub_error
.....
IF %eax Relation %edx
THEN goto array_sub_error
.....
IF %eax Relation2 %eax
THEN goto L1.9   {i.e. skip imull }
   imull        %eax, %edx
L1.9:   .....
IF %eax ~Relation2 %eax
THEN goto L1.8
L1.10:  <-- skip to here if "zero already for a 0-dimensional array"
.....
        jmp     *_PD_EXECUTE(%eax)   _-> exit if args OK

array_sub_error:
        call    XC_LAB(weakref Sys$-Array$-Sub_error)
        call    XC_LAB(setpop)          ;;; in case the error returns
! Crash if return here !? }

When I examine the various calls to XC_LAB in src/*.s  (using mc),
I see that some are commented "never returns".
{Sometimes XC_LAB is reached by a jump. }

If the code is not 'well structured', then to return to the
instruction after the call it must still 'pop the stack appropriately',
even if it uses some address other than the 'pushed return address'
to continue.

That's where I would focus.
Ie. is XC_LAB(<arg>) 'smarty pants code' which can continue/exit
without a ret instruction ?   If not it must crash.
I don't see where the XC_LAB(<arg>) code is !?
What is:     src/syscomp/do_asm.p    about ?

-- Chris Glur.



Wed, 22 Jun 2005 02:47:45 GMT  
: > vars b = newarray([-1 5 -1 5]);
: > b(6) =>
: >
: > ;;; MISHAP - INVALID ARRAY SUBSCRIPT
: > ;;; INVOLVING:  6
: > ;;; DOING    :  sys_exception_final sys_exception_handler
: > ;;;     Segmentation fault (core dumped)

: My RedHat 6.2 on a slow 486 only writes the first 2 lines of the mishap.

: > It looks as if the error test works, then _array_sub calls the routine
: > array_sub_error which succeeds in calling Sub_error, which invokes
: > pop11's generic error handler, which starts printing out the error
: > message, and fails half way through printing.

:  I think:
:   when there are no subscript errors, _array_sub exits via the
:       jmp     *_PD_EXECUTE(%eax)
: ELSE it gets to:
:       subl    $4, %USP                ;;; reveal the last index again
:       call    XC_LAB(weakref Sys$-Array$-Sub_error)
        ^^^^
jmp works OK

:       call    XC_LAB(setpop)          ;;; in case the error returns
: and crashes ?!
<snip>

I have checked that in most cases the assembler code jumps to Poplog
routines (instead of using call). So I changed (using gdb) the offending
call to a jump (fortunately jump and call are of equal length and only
the opcode differs, so I just had to change a single byte). After the
change Poplog printed the rest of the error message and continued with
no problem. So my conclusion is that the machine stack got corrupted:
call puts a return address on the machine stack and apparently the error
handling routine has to examine the machine stack ...
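
The one-byte change described above can be pictured with a small Python
sketch. It assumes the instruction was assembled as a near call with a
32-bit relative displacement (opcode 0xE8 on x86), whose near-jump
counterpart is opcode 0xE9 with the same 4-byte displacement. The byte
string and offset used in the usage note are made up for illustration and
are not taken from the Poplog binary.

```python
CALL_REL32 = 0xE8  # x86 near call, 32-bit relative displacement
JMP_REL32 = 0xE9   # x86 near jump, 32-bit relative displacement


def patch_call_to_jmp(code, offset):
    """Return a copy of code with the call at offset turned into a jmp.

    Both instructions are 5 bytes long and the displacement is relative
    to the following instruction in both cases, so only the opcode byte
    has to change.
    """
    if code[offset] != CALL_REL32:
        raise ValueError("no call rel32 opcode at offset %d" % offset)
    patched = bytearray(code)
    patched[offset] = JMP_REL32
    return bytes(patched)
```

For example, patch_call_to_jmp(bytes([0x90, 0xE8, 0x10, 0, 0, 0]), 1)
leaves the length and the displacement bytes alone and changes only the
second byte to 0xE9.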

--
                              Waldek Hebisch



Sun, 26 Jun 2005 00:19:42 GMT  

Many thanks to Chris Glur and Waldek Hebisch for pinpointing the problem
in $popsrc/aarith.s

Recapitulation: Luc Beaudoin pointed out that on PC linux Poplog array
out of bounds errors caused poplog to crash. (It turns out that I had
reported the same problem in March 1991!)

This should produce a mishap (INVALID ARRAY SUBSCRIPT) and then continue:
    newarray([1 10])(11) =>

It worked on Solaris and Alpha Unix Poplog, but crashed Poplog on
PC+Linux and also on PC+Windows.

In an earlier message I traced the problem to
    $popsrc/aarith.s

but did not know what exactly to look for.

Chris located the suspect code and Waldek's answer told me exactly what
to do to fix that file:

Quote:
> Date: 7 Jan 2003 16:19:42 GMT
> Organization: Politechnika Wroclawska

Chris Glur wrote
> : My RedHat 6.2 on a slow 486 only writes the first 2 lines of the mishap.

That's what happens if you have the pop11 variable popsyscall set false
(the default). In my test I gave it the value 1, which causes system
procedures and anonymous procedures to be printed out as well.

Quote:
> :   when there are no subscript errors, _array_sub exits via the
> :     jmp *_PD_EXECUTE(%eax)
> : ELSE it gets to:
> :     subl    $4, %USP        ;;; reveal the last index again
> :     call    XC_LAB(weakref Sys$-Array$-Sub_error)
>       ^^^^

> jmp works OK
> <snip>

> I have checked that in most cases the assembler code jumps to Poplog
> routines (instead of using call). So I changed (using gdb) the offending
> call to a jump (fortunately jump and call are of equal length and only
> the opcode differs, so I just had to change a single byte). After the
> change Poplog printed the rest of the error message and continued with
> no problem. So my conclusion is that the machine stack got corrupted:
> call puts a return address on the machine stack and apparently the error
> handling routine has to examine the machine stack ...

> --
>                               Waldek Hebisch


I have checked this and he is correct.
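
As a rough picture of the difference, here is a toy Python model of the
machine stack on the error path of _array_sub. It is only a sketch: the
function name and the stack entries are hypothetical labels, and the
thread does not establish exactly how the error handler uses the machine
stack, only that the extra return address left by call appears to upset
it.

```python
def run_error_path(transfer):
    """Model the machine stack when control reaches Sub_error.

    transfer is "call" or "jmp". The entries are hypothetical labels
    standing in for real return addresses.
    """
    # Return address left by whoever invoked _array_sub.
    stack = ["return-to-caller-of-_array_sub"]
    if transfer == "call":
        # call pushes the address of the next instruction, then jumps.
        stack.append("return-into-_array_sub")
    elif transfer != "jmp":
        raise ValueError(transfer)
    # jmp transfers control and leaves the stack untouched.
    return stack
```

In this model run_error_path("jmp") leaves one entry on the stack, as on
the normal exit via jmp *_PD_EXECUTE(%eax), while run_error_path("call")
leaves two.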

To rebuild your PC+Linux poplog system, do the following (for which you
may need root privileges if your system was installed as root):

    cd $popsrc
    edit aarith.s
        Change
            call    XC_LAB(weakref Sys$-Array$-Sub_error)
        to
            jmp XC_LAB(weakref Sys$-Array$-Sub_error)

Or copy the fixed version from

    http://www.cs.bham.ac.uk/research/poplog/src/master/S.pcunix/src/aari...
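
If you would rather script the edit than do it by hand, the following
Python sketch shows one way to make the same one-word change to the text
of aarith.s. The function name fix_source is hypothetical, and the sketch
assumes the line looks as quoted above; check the result by eye before
recompiling.

```python
TARGET = "XC_LAB(weakref Sys$-Array$-Sub_error)"


def fix_source(text):
    """Return text with 'call' changed to 'jmp' on the Sub_error line only.

    Other calls, such as the call to setpop on the next line, are left
    untouched.
    """
    fixed = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("call") and TARGET in line:
            # Replace only the first occurrence, i.e. the mnemonic itself.
            line = line.replace("call", "jmp", 1)
        fixed.append(line)
    return "".join(fixed)
```

The function works on a string, so reading aarith.s in and writing the
result back (after keeping a copy of the original) is left to the caller.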

Then in the same directory do the following (as instructed in
    http://www.cs.bham.ac.uk/research/poplog/sysdoc/rebuilding ):

    pgcomp aarith.s

That will produce two new files
    aarith.o
    aarith.w

Then rebuild the object archive in
    $usepop/pop/obj/src.olb
    $usepop/pop/obj/src.wlb

as follows

   pglibr -r ./ *.w

You can then delete the .o file and the .w file.

Then relink poplog (See HELP NEWPOP)

    $popsrc/newpop -link -x=-xm -norsv

(At this stage your pop11 command may not work if you have redefined it
to use a local saved image. But you can always run basepop11).

Then rebuild any local saved images, and you are done.

E.g. if you have the Birmingham setup this is how to rebuild the local
saved images:

    cd $poplocal/local/com

    # run a script to build the saved images in $usepop/templocalbin
    ./mkall.local

This may take some time and will print out a lot of stuff, including
some warning messages while compiling common lisp. You may prefer to
redirect the output to a log file.

Then, to install the new saved images do
    cd $usepop

    # get rid of local saved images
    rm -rf poplocalbin

    # install the new ones
    mv templocalbin poplocalbin

If you have changed the locations of your saved images ($poplocalbin)
you'll know enough to vary the above.

In due course I shall rebuild the linux pc poplog systems with the
fix installed.

Thanks again for the help. I could not have sorted this out myself.

Now, all we need is for an expert user of Windows Poplog to edit the
aarith.s file, recompile it, fix the problem in Windows Poplog, and
give me back rebuilt versions of Windows Poplog (versions 15.5 and
15.53 if possible) to replace the versions here:
    http://www.cs.bham.ac.uk/research/poplog/winpop/pop15-53.zip
    http://www.cs.bham.ac.uk/research/poplog/new/pcwin15.5.zip

Thanks.
Aaron



Sun, 26 Jun 2005 02:37:54 GMT  
 