PyObject *data - access to raw data? 

To learn Python I decided to try creating a Python interface to my middleware
library, called isdio.  Before I get much past the login/logout stuff
(though it is working) I'm curious:

The routines for sending and receiving data assume the programmer knows what
it is they want to send and what it is they are receiving.  Is there a way for
me to access PyObject data in a raw format, along with its length, so that if
the programmer is passing a string of 20 characters I get a void * to the 20
characters AND can find out the length is 20 so I don't write too many bytes
in the network packet?

Is there a way I can coerce an arbitrary number of bytes into a PyObject
without knowing what kind of PyObject it is that's receiving it?  Say the
programmer knows they're getting a string, how can I return a string object if
all I know about in the receiving function is
a) the bytes I read
b) the number of bytes I read
c) a PyObject (I suppose) from PyArg_ParseTuple().

If I need to replace the value (c) contains, can I do this from within the C
function?

Theoretically, there will be a Python module with Python code wrapping the C
functions in isdio, which may make it easier.

For the fun of it, I've attached my first module, isdio.c

[ isdio.c < 1K ]
#include "isdio.h"
#include "python1.5/Python.h"

static PyObject *login(PyObject *self, PyObject *args)
{
        char *hostname, *service;
        int priority;
        int socket;

        if (!PyArg_ParseTuple(args, "zzi", &hostname, &service, &priority)) {
                return NULL;
        }

        socket = isdLogin(hostname, service, priority);

        return Py_BuildValue("i", socket);

}

static PyObject *logout(PyObject *self, PyObject *args)
{
        int socket;

        if (!PyArg_ParseTuple(args, "i", &socket)) {
                return NULL;
        }

        isdLogout(socket);

        Py_INCREF(Py_None);
        return Py_None;

}

static PyMethodDef isdio_methods[] = {
        { "login", login, METH_VARARGS, "int socket = isdLogin(string hostname, string servicename, int priority)" },
        { "logout", logout, METH_VARARGS, "isdLogout(int socket); returns None" },
        { NULL, NULL, 0, NULL }

};

void initisdio(void)
{
        Py_InitModule("isdio", isdio_methods);
}



Fri, 24 Jan 2003 03:00:00 GMT  

| To learn python I decided to try creating a Python interface to my middleware
| library, called isdio.  Before I get much passed the login/logout stuff
| (though it is working) I'm curious:
|
| The routines for sending and receiving data assume the programmer knows what
| it is they want to send and what it is they are receiving.  Is there a way for
| me to access PyObject data in a raw format, along with it's length, so that if
| the programmer is passing a string of 20 characters I get a void * to the 20
| characters AND can find out the length is 20 so I don't write too many bytes
| in the network packet?

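Yes, for a string you can do this, for example with '#' in the
PyArg_ParseTuple() format ("s#"), which gives you both a pointer to the
bytes and their length.  Since the data might contain zero-valued
characters, relying on the explicit length rather than on a terminating
NUL is a good idea anyway.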

For anything else you want to send, the basic idea is that the object
should be asked to represent itself as a string.  This marshalling business
is potentially tricky, and there is already lots of good work done
so you don't have to go there yourself.  I would look at "struct"
first, which works with plain data but is limited to machine data
types, and then at pickle, which handles complex objects and includes its
own unpacking instructions in the data.  Whether you use an existing
marshalling function or not, your approach will end up looking at
least superficially similar.
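
To make the struct/pickle contrast concrete, here is a minimal sketch. It assumes a current Python 3 interpreter (the thread itself predates it), and the field values and the "!lh" layout are made up for illustration, not taken from isdio.

```python
import pickle
import struct

# struct: a fixed binary layout chosen explicitly by the format string.
# "!" selects network byte order with standard sizes and no padding.
packed = struct.pack("!lh", 20, 7)        # one 4-byte long, one 2-byte short
length, command = struct.unpack("!lh", packed)
packed_size = struct.calcsize("!lh")      # size in bytes implied by the format

# pickle: the byte stream describes its own contents, so the receiver
# needs no format string, but the receiver must also be Python.
message = {"command": 7, "payload": [1, 2, 3]}
pickled = pickle.dumps(message)
restored = pickle.loads(pickled)
```

struct is the one to reach for when the peer is a C program with a known layout. pickle is only suitable between Python programs, and unpickling data from an untrusted source is unsafe.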

| Is there a way I can coerce an arbitrary number of bytes in to PyObject
| without knowing what kind of PyObject it is that's receiving it?  Say the
| programmer knows they're getting a string, how can I return a string object if
| all I know about in the receiving function is
| a) the bytes I read
| b) the number of bytes I read
| c) a PyObject (I suppose) from PyArg_ParseTuple().
|
| If I need to replace the value (c) contains, can I do this from within the C
| function?

You don't want to do that, if I understand you right.  Objects should
be returned literally as the function return value, tuple of objects if
required.  A function may also modify the contents of a list or dictionary,
but that would be pretty unusual for a library routine.  Basically you
can't know what the programmer expects to get back.
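
As a rough sketch of "return it as the function's value", the wrapper below hands back a new object instead of overwriting an argument. It assumes Python 3; recv_message is a hypothetical name, and an in-memory stream stands in for the real isdio connection.

```python
import io

def recv_message(stream, nbytes):
    """Read up to nbytes from a file-like stream.

    Returns a tuple (data, count): a new bytes object and its length.
    Nothing the caller passed in is modified.
    """
    data = stream.read(nbytes)
    return data, len(data)

# The embedded zero byte survives because bytes objects carry their length.
stream = io.BytesIO(b"hello\x00world")
data, count = recv_message(stream, 20)
```

The caller then decides what the bytes mean, for example by handing them to struct.unpack().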

| Theoretically, there will be a python module with python code wrapping the C
| functions in isdio which may make it easier..

Good, that's sure where I would tackle this stuff.




Fri, 24 Jan 2003 03:00:00 GMT  

Quote:

> <snip>Anything else you want to send, the basic idea is that you should
> call on it to represent itself as text.  This marshalling business
> is potentially tricky, and there is already lots of good work done
> so you don't have to go there yourself.  I would say look at "struct"
> first, which works with plain data but is limited to machine data
> types, and pickle, which handles complex objects and includes its
> own unpacking instructions in the data.  Whether you use an existing
> marshalling function or not, your approach will end up looking at
> least superficially similar.

Am I to assume then Python programmers would call a network write routine with an
object representing itself as text, so I could safely use PyArg_ParseTuple looking
for strings?  In the C API, a programmer can call the function with a (void *) and
a len, and the send() function dutifully sends it across.  I'm unsure from reading
the response how exactly this would work in Python.

Quote:
> <snip>
> You don't want to do that, if I understand you right.  Objects should
> be returned literally as the function return value, tuple of objects if
> required.  A function may also modify the contents of a list or dictionary,
> but that would be pretty unusual for a library routine.  Basically you
> can't know what the programmer expects to get back.

If a Python module wants to send a C-like structure to another program written in
a different language (binary representation aside - assume they've already
considered it) how then would I access the data of, say, the struct?  Hmm.  I'll
take a look at the struct module to see if that answers my question.


Fri, 24 Jan 2003 03:00:00 GMT  

...
| Am I to assume then Python programmers would call a network write routine with an
| object representing itself as text, so I could safely use PyArg_ParseTuple looking
| for strings?  In the C API, a programmer can call the function with a (void *) and
| a len, and the send() function dutifully sends it across.  I'm unsure from reading
| the response how exactly this would work in Python.

Forget I said that about relying on the object's ability to represent
itself, that probably would not be of much use in your situation.

At the lowest level, we do call socket.send() with a string, and it really
has to be a string.  In theory, ParseTuple() could check its parameters
for a __str__ method, but in Python that kind of implicit conversion is
actually pretty rare.  Instead, the programmer is usually obliged to take
care of this at a much higher level, like send(str(obj)), because the
library routines (even the Python ones) rarely think to support non-string
(or whatever basic type) objects.
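
A small sketch of that division of labour, assuming Python 3 (where the wire type is bytes rather than str). The send() below is a hypothetical stand-in that appends to a bytearray instead of writing to a socket.

```python
def send(wire, data):
    """Append data to the wire buffer; accept only byte strings."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("send() expects bytes, got %s" % type(data).__name__)
    wire.extend(data)
    return len(data)

wire = bytearray()

# The caller converts explicitly; the low-level routine never guesses.
sent = send(wire, str(12345).encode("ascii"))
```

Passing the integer 12345 directly raises TypeError, which is roughly what PyArg_ParseTuple() does when an argument does not match the format string.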

| If a Python module wants to send a C-like structure to another program written in a
| a different language (binary representation aside - assume they've already
| considered it) how then would I access the data of, say, the struct?  Hmm.  I'll
| take a look at the struct module to see if that answers my question.

Good, by the time you read this you will already have some of the answers.




Sat, 25 Jan 2003 03:00:00 GMT  

Quote:


> <snip>
> | If a Python module wants to send a C-like structure to another program written in a
> | a different language (binary representation aside - assume they've already
> | considered it) how then would I access the data of, say, the struct?  Hmm.  I'll
> | take a look at the struct module to see if that answers my question.

> Good, by the time you read this you will already have some of the answers.

It would seem the pack and unpack methods are great for creating a structure all at
once, but not so great at accessing members of a structure for purposes of
manipulation.  I've decided to create a class that uses a dictionary to maintain the
values for the members.  It uses a method called asBytes() to return the string produced
by struct.pack().  I should probably call it asString() since I'll document the
arguments to send and recv as string arguments.

Of course, right now it's all buried inside my IsdHeader class, but I'm thinking I should
create a more general class called CStructure that defines all the methods for such
things, then subclass it to create IsdHeader.

[ IsdHeader.py < 1K ]
import isdio
import struct

class IsdHeader:
        def __init__(self):
                self.data = {
                        'len':0,
                        'sequence':0,
                        'reply':0,
                        'error':0,
                        'command':0,
                        'version':0,
                        'workerid':0,
                        'more':0 }

        def setVal(self, key, value):
                self.data[key] = value
                return self.getVal(key)

        def getVal(self, key):
                return self.data[key]

        def len(self):
                return self.getVal('len')

        def len(self, anInteger):
                return self.setVal('len', anInteger)

        def sequence(self):
                return self.getVal('sequence')

        def sequence(self, anInteger):
                return self.setVal('sequence', anInteger)

        def more(self):
                return self.getVal('more')

        def more(self, anInteger):
                return self.setVal('more', anInteger)

        def asBytes(self):
                return struct.pack('llhhhhlh21h',
                        self.data['len'],
                        self.data['sequence'],
                        self.data['reply'],
                        self.data['error'],
                        self.data['command'],
                        self.data['version'],
                        self.data['workerid'],
                        self.data['more'],
                        0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0,
                        0)



Sat, 25 Jan 2003 03:00:00 GMT  

...
| It would seem the pack and unpack methods are great for creating a structure all at
| once, but not so great at accessing members of a structure for purposes of
| manipulation.  I've decided to create a class that uses a dictionary to maintain the
| values for the members.  It uses a method called asBytes() to return the string produced
| by struct.pack().  I should probably call it asString() since I'll document the
| arguments to send and recv as string arguments.

That looks good.  You could also call it __str__, though there is some
difference of opinion on this.  __str__ supports the str() function.
For some people, this should be a ``friendly'' text representation,
i.e. I guess readable.  For me, it's the object qua string, and if that's
readable, fine.  (This is particularly in contrast to __repr__, which
I think everyone agrees should be text.)  Anyway, I guess the most
likely advantage of calling it __str__ is that it would be invoked
automatically in %s formatting.

| Of course, right now it's all buried inside my IsdHeader class, but I'm thinking I should
| create a more general class called CStructure that defines all the methods for such
| things, then subclass it to create IsdHeader.

Well, it sounds like you're having fun.  From a practical point of view
it sounds like we're approaching overkill, but that's a judgement call.

Speaking of overkill, I think you are probably finding that your read/write
accessor overloading isn't working.  Unlike C++, Python doesn't support
different functions with the same name but different signature.  You can
simulate it with a kind of varargs feature (try a parameter declared with
a leading "*"), or with default arguments, but it's not commonly done and
for me it isn't idiomatic Python.  Might as well say getstatus()/setstatus().
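
To illustrate the point about overloading, here is a short sketch assuming Python 3. Overloaded and Header are hypothetical classes, not the poster's IsdHeader.

```python
class Overloaded:
    # The second def simply rebinds the name; the first one is gone.
    def len(self):
        return "getter"

    def len(self, value):
        return "setter"


_UNSET = object()  # sentinel so that None remains a storable value


class Header:
    def __init__(self):
        self.data = {"len": 0}

    def len(self, value=_UNSET):
        """Combined accessor: store value if one is given, then return it."""
        if value is not _UNSET:
            self.data["len"] = value
        return self.data["len"]


h = Header()
before = h.len()      # read
after = h.len(20)     # write, then read back
```

Calling Overloaded().len() with no argument raises TypeError, because only the two-parameter version survives. The default-argument trick works, but a plain get/set pair or direct attribute access is usually easier to read.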

Now, for a really intrusive accessor feature that is occasionally used
in Python code (I think usually to the sorrow of everyone concerned, but
others might disagree), consider this:

    class T:
        def __init__(self):
            self.__dict__ = {'x': 0, 'y': 0}
        def __getattr__(self, name):
            print 'getattr', repr(name)
            if name == 'xy':
                return self.x * self.y
            else:
                raise AttributeError, name
        def __setattr__(self, name, value):
            print 'setattr', repr(name), repr(value)
            if self.__dict__.has_key(name):
                self.__dict__[name] = value
            else:
                raise AttributeError, name

    t = T()
    print t.x
    print t.xy
    t.x = 5
    t.y = 2
    print t.x
    print t.xy
    print dir(t)

I think the most useful thing about this is that since you know it's
there in case you need it, you can relax and just code direct access
to class instance attributes and avoid the probably pointless hassle
of a pair of accessor functions dedicated to each attribute.  Then if
changes to the class mean you need to calculate a value on access, you
can use these functions to do it -- in this extremely unlikely case,
and 99 out of 100 classes you write will never need any such thing.
Note that __getattr__ is called for only those attributes that don't
already appear in the instance's __dict__.
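
For readers on a current interpreter, the same idea can be sketched in Python 3 as follows. This is an adaptation under assumptions, not the code above: assigning self.__dict__ directly would now go through __setattr__ and be rejected, so the initial attributes are written with object.__setattr__ instead, and the tracing prints are left out.

```python
class T:
    def __init__(self):
        # Bypass our own __setattr__ while creating the allowed attributes.
        object.__setattr__(self, "x", 0)
        object.__setattr__(self, "y", 0)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "xy":
            return self.x * self.y
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Allow assignment only to attributes that already exist.
        if name in self.__dict__:
            object.__setattr__(self, name, value)
        else:
            raise AttributeError(name)


t = T()
t.x = 5
t.y = 2
product = t.xy   # computed on access through __getattr__
```

Assigning to a name that was never created, such as t.z = 1, raises AttributeError.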




Sat, 25 Jan 2003 03:00:00 GMT  

Quote:


> ...
> | It would seem the pack and unpack methods are great for creating a structure all at
> | once, but not so great at accessing members of a structure for purposes of
> | manipulation.  I've decided to create a class that uses a dictionary to maintain the
> | values for the members.  It uses a method called asBytes() to return the string produced
> | by struct.pack().  I should probably call it asString() since I'll document the
> | arguments to send and recv as string arguments.

> That looks good.  You could also call it __str__, though there is some
> difference of opinion on this.  __str__ supports the str() function.
> For some people, this should be a ``friendly'' text representation,
> i.e. I guess readable.  For me, it's the object qua string, and if that's
> readable, fine.  (This is particularly in contrast to __repr__, which
> I think everyone agrees should be text.)  Anyway, I guess the most
> likely advantage of calling it __str__ is that it would be invoked
> automatically in %s formatting.

The only problem I see here is if programmers went through the trouble of creating
a C-like structure that contained binary data, they probably intended there to be
binary data that would screw-up __str__, no?  I have problems calling the functions
returning the characters "asString" or any such derivative name, because the
intention is to return an array of bytes, living next to each other in memory.
When I think of a string I usually think of printable characters in sequential
memory addresses terminated by a \0.
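
Later Python versions draw exactly this line between text and raw bytes. As a sketch, assuming Python 3 and a hypothetical two-field Header class, the binary form can live in __bytes__ while __str__ stays readable:

```python
import struct

class Header:
    def __init__(self, length, sequence):
        self.length = length
        self.sequence = sequence

    def __bytes__(self):
        # Raw wire form: two 4-byte integers in network byte order.
        return struct.pack("!ll", self.length, self.sequence)

    def __str__(self):
        # Readable form for printing and %s formatting.
        return "Header(length=%d, sequence=%d)" % (self.length, self.sequence)


h = Header(20, 1)
wire = bytes(h)   # calls __bytes__
text = str(h)     # calls __str__
```

In the Python of this thread, str was itself a byte string that could hold embedded zero bytes, so returning packed data from a method named asBytes() was already safe; only the name is a matter of taste.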

Quote:
> | Of course, right now it's all buried inside my IsdHeader class, but I'm thinking I should
> | create a more general class called CStructure that defines all the methods for such
> | things, then subclass it to create IsdHeader.

> Well, it sounds like you're having fun.  From a practical point of view
> it sounds like we're approaching overkill, but that's a judgement call.

Overkill, maybe.  The lofty goal here is to create a class that any Python
programmer could subclass with something like

class LittleStructure(CStructure):
    def __init__(self):
        CStructure.__init__(self, {
            'firstValue':'l',
            'nextValue':'l',
            'someShortValue':'h'
        })

and would then be able to modify members of the structure with:
    myLittleStructure.setMember('firstValue', someIntegerValue)
    myLittleStructure.setMember('nextValue', anotherIntegerValue)
ultimately calling something like
    myLittleStructure.asString()
to return the string of bytes representing (in this case) two longs and a short.
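
One way such a base class might look is sketched below, assuming Python 3. It is only an illustration of the idea: the fields are passed as an ordered list of (name, format code) pairs rather than a dictionary so that the wire order is explicit, the "=" prefix (native byte order, standard sizes, no padding) is an arbitrary choice, and getMember() and fromBytes() are additions not mentioned in the post.

```python
import struct

class CStructure:
    def __init__(self, fields, byte_order="="):
        # fields: sequence of (name, struct format code) pairs in wire order.
        self._names = [name for name, _ in fields]
        self._format = byte_order + "".join(code for _, code in fields)
        self._values = dict.fromkeys(self._names, 0)

    def setMember(self, name, value):
        if name not in self._values:
            raise KeyError(name)
        self._values[name] = value

    def getMember(self, name):
        return self._values[name]

    def asBytes(self):
        # Pack the current values in declaration order.
        return struct.pack(self._format, *[self._values[n] for n in self._names])

    def fromBytes(self, data):
        # Inverse of asBytes(): overwrite the members from packed data.
        for name, value in zip(self._names, struct.unpack(self._format, data)):
            self._values[name] = value


class LittleStructure(CStructure):
    def __init__(self):
        CStructure.__init__(self, [
            ("firstValue", "l"),
            ("nextValue", "l"),
            ("someShortValue", "h"),
        ])


s = LittleStructure()
s.setMember("firstValue", 1)
s.setMember("nextValue", 2)
s.setMember("someShortValue", 3)
raw = s.asBytes()   # two 4-byte longs and one 2-byte short
```

A receiver holding the same subclass can rebuild the members with fromBytes(raw).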

Quote:
> Speaking of overkill, I think you are probably finding that your read/write
> accessor overloading isn't working.  Unlike C++, Python doesn't support
> different functions with the same name but different signature.  You can
> simulate it with a kind of varargs feature (try a parameter declared with
> a leading "*"), or with default arguments, but it's not commonly done and
> for me it isn't idiomatic Python.  Might as well say getstatus()/setstatus().

You're correct.  Actually I hadn't discovered it in the structure stuff (yet) but I
did in the middleware routines that define multiple versions of the 'send(..)'
function.  I was wondering what was wrong.  Hmm.  Interesting polymorphic
implementation details...


Quote:
> Now, for a really intrusive accessor feature that is occasionally used
> in Python code (I think usually to the sorrow of everyone concerned, but
> others might disagree), consider this:

>     class T:
>         def __init__(self):
>             self.__dict__ = {'x': 0, 'y': 0}
>         def __getattr__(self, name):
>             print 'getattr', repr(name)
>             if name == 'xy':
>                 return self.x * self.y
>             else:
>                 raise AttributeError, name
>         def __setattr__(self, name, value):
>             print 'setattr', repr(name), repr(value)
>             if self.__dict__.has_key(name):
>                 self.__dict__[name] = value
>             else:
>                 raise AttributeError, name

Isn't this kind of what I want to use in my CStructure class?


Quote:
> I think the most useful thing about this is that since you know it's
> there in case you need it, you can relax and just code direct access
> to class instance attributes and avoid the probably pointless hassle
> of a pair of accessor functions dedicated to each attribute.  Then if
> changes to the class mean you need to calculate a value on access, you
> can use these functions to do it -- in this extremely unlikely case,
> and 99 out of 100 classes you write will never need any such thing.
> Note that __getattr__ is called for only those attributes that don't
> already appear in the instance's __dict__.



Sat, 25 Jan 2003 03:00:00 GMT  
 