Problem with a tuple - newbie ignorance 

Greetings all:
So I am writing this program to parse text files. I am using a tuple to
store the field names and values (i.e. [['id =', 't =',
...]['','',...]]. The program reads through all the files in the
directory. For each file it writes all the matches in the
appropriate field.  I am having several problems, but the biggest right
now is:
When the program goes to parse the next file it doesn't clear out the
contents from the last file. Here is how I thought it should work...
<Header stuff>
ifields = [[field names list], [list of just '','','']]
<FOR loop with all the file names from the directory>
open file
results = ifields
<fill the results tuple>
<print the results tuple>
<go back to the top of the loop>

For some reason I thought that each time results gets set to ifields it
should reset the contents of results. I have tried all sorts of other
workarounds, such as using "del results" just before I go back and
repeat the loop.
What am I missing here?
Thanks for your help...
Steve



Sat, 05 Jul 2003 01:36:49 GMT  
How about "Your question is completely incomprehensible".

TIP: Show actual code, not Pseudo-Code. python snippets are executable, your
thoughts aren't.

Warren Postma



Sat, 05 Jul 2003 03:14:57 GMT  
Hi Steven,

I'm afraid that is pretty unclear to me - you'd be better off posting a small
snippet of code. I can't tell from your explanation how you think your tuple
should be cleared.

A couple of things to be aware of:

Your tuple is in fact a list: anything in [1,2,3,4] is a list; a tuple is
(1,2,3,4).

From your description it sounds to me like you'd be better off with a
dictionary. If you're matching field names with values this would be ideal. A
dictionary is a key/value type.

Richard




Sat, 05 Jul 2003 03:26:49 GMT  
Sorry all about my incomprehensible message. Honestly, I tried reading
the woodrat and the alligator but I can't get this to work.
I didn't want to send code because I thought everyone would get POed at
me sending code.
Here is my code - remember think newbie and don't slam me too hard, it's
one of those days. Thanks again for any help
Steve

import os
import sys
import string
import re

#numfields is the number of fields to potentially parse
numfields = 41

"""  THIS LIST IS INCOMPLETE - get complete list from the spreadsheet
   ifields[0] is the name of the field
    ifields[1] is the content of that field
   so if we add a new item to the fields we have to add '' to the other
2 lists
   """
ifields = [['ID =', 'T = ', 'AU =', 'DIST =', 'DNUM =', 'ABS =',
'ARCH.FILTER = </B>I', 'ARCH.FILTER = </B>C', 'ARCH.FILTER = </B>N',
'ARCH.FILTER = </B>S', 'ARCH.FILTER = </B>E', 'CLASSIF',
'ICPSR.CLASSIF1', 'NACJD.CLASS', 'NACDA.CLASS' ,'SAMHDA.CLASS',
'IAED.CLASS', 'EXTENT.COLLECT', 'CLASSNO', 'SERIES.NAME', 'SERIES.INFO',
'RESTRICTIONS =', 'DATA.TYPE', 'TIME.PERIOD', 'DATE.OF.COLLECT',
'FUNDING.AGENCY', 'GRANT.NUMBER', 'DATA.SOURCE', 'EXTENT.PROCESS',
'DATA.FORMAT', 'COLLECT.NOTE', 'SAMPLING =', 'UNIVERSE =',
'RELATED.PUBS', 'CITATION =', 'KEYWORDS =', 'DIR =', 'CHAPTER =',
'SECTION =', 'SUBSECTION =','SUBSUB'],
['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']]

#read the files and set up file for writing
#read in the list of files in this directory
dir = os.listdir('D:\\statlab\\ssda\\data')

"""open the file"""
for f in dir:
    try:
        fileproc = open('D:\\statlab\\ssda\\data\\'+f, 'r')
    except IOError:
        print 'Can\'t open file for reading.'
        sys.exit(0)

    #create a new ifields to store the data. More important for looping
    #through the directory

    result = ifields

    #read the file into a list
    text = fileproc.readlines()

    #loop through and find an occurrence of a tag
    #if you find a tag write it to the field
    #if you don't find a tag write it to the previous found field
    for i in text:
        jindex = 0
        found = 0
        for j in result[0]:
            if (i.rfind(j)==0) or (i.rfind(j)==1) or (i.rfind(j)==3):
               if i.rfind("=")+2 == ' ':
                   where = i.rfind("=")+3
               else:
                   where = i.rfind("=")+2
               result[1][jindex] = i[where:-1]
               oldindex = jindex
               found = 1
    # need to write a test for ; at the end
               if result[1][jindex][-1] == ";":
                   result[1][jindex] = i[where:-2]
               break
            elif (i.rfind(j)==6) or (i.rfind(j)==7):
               if i.rfind("=")+6 == ' ':
                   where = i.rfind("=")+7
               else:
                   where = i.rfind("=")+6
               result[1][jindex] = i[where:-1]
               found = 1
               oldindex = jindex
               if result[1][jindex][-1] == ";":
                   result[1][jindex] = i[where:-2]
               break
            jindex += 1
            if i.rfind("BV =") != -1:
               break
        if (found != 1) and (i.rfind("BV =") != -1):
            result[1][oldindex] += i[0:-1]
            found = 0

    crap = 0
    while crap < numfields:
        if result[1][crap]:
            print "field # ", result[0][crap], " ", result[1][crap]
        crap += 1

___________________________________________________________________________________________________

###Sample data file

ID = 1588;
T = Census Of Population And Housing, 1980 [United States]: Summary Tape
File 1A;
AU = United States Department of Commerce. Bureau of the Census.;
DIST = ICPSR;
DNUM = 07941;
ABS = Summary Tape File 1 consists of four sets of computer-readable
data files containing detailed tabulations of the nation's population
and housing characteristics produced from the 1980 Census. This series
is comprised of Summary Tape File 1A (STF1A), Summary Tape File 1B
(STF1B), Summary Tape File 1C (STF1C), and Summary Tape File 1D (STF1D).
STF1A, STF1B, and STF1D have 52 separate files, one for each state,
Puerto Rico, and the District of Columbia. STF1C consists of one
nation-wide datafile containing information about all
states. All files in the STF1 series are identical, containing 321
substantive data variables organized in the form of 59 ''tables,'' as
well as standard geographic identification variables. All of the data
items contained in all the STF 1 files were tabulated from the
''complete count'' or ''100%'' questions included on the 1980 Census
questionnaire. All four groups of files within the STF1 series have
identical record formats and technical characteristics and differ only
in the types of geographical areas for which the summarized data items
are presented. STF1A provides summaries for state or state equivalent,
county or county equivalent, minor civil division/census county division
(MCD/CCD), place or place segment within MCD/CCD or remainder of
MCD/CCD, census tract or block numbering area (BNA) or untracted segment
within place, place segment or remainder or MCD/CCD, block group (BG) or
BG segment or enumeration district (ED). An additional STF 1A file for
Outlying Areas is also available from ICPSR. This file contains data
specifically for the United States possessions: American Samoa, Guam,
Northern Mariana Islands, Trust Territory of the Pacific Islands, and
the Virgin Islands. The information contained in this file is similar to
but not identical with the data for the 50 states and is
documented in a separate codebook. All STF 1 files are being released on
a state-by-state ''flow'' basis, with the less populous states generally
being prepared and released before the most populous states. Each
''record'' in these files comprises 3,276 characters with two record
segments (physical records) of 1,638 characters each, the number of data
records in each file varies by state.
<P><B>CITATION = </B>U.S. Dept. of Commerce, Bureau of the Census.
CENSUS OF POPULATION AND HOUSING, 1980 [UNITED STATES]: SUMMARY TAPE
FILE 1A [Computer file]. Washington, DC: U.S. Dept. of Commerce, Bureau
of the Census [producer], 1982. Ann Arbor, MI: Inter-university
Consortium for Political and Social Research [distributor], 1983.;
DIR;
CHAPTER = Census Enumerations;
SECTION = Contemporary;
SUBSECTION = United States;
BV;
BV.TAPE;

FILE.NUMBER = 0;
NOVELL.LOC = h:\ssda\7941\da7941ct.dat;
NRECS = 8772;
LRECL = 1638;
DS.COMMENTS = ASCII data file: Connecticut;

;



Sat, 05 Jul 2003 04:23:14 GMT  
Quote:

> Sorry all about my incomprehensible message. Honestly, I tried reading
> the woodrat and the alligator but I can't get this to work.
> I didn't want to send code because I thought everyone would get POed at
> me sending code.
> Here is my code - remember think newbie and don't slam me too hard, it's
> one of those days. Thanks again for any help
> Steve

> import os
> import sys
> import string
> import re

> #numfields is the number of fields to potentially parse
> numfields = 41

> """  THIS LIST IS INCOMPLETE - get complete list from the spreadsheet
>    ifields[0] is the name of the field
>     ifields[1] is the content of that field
>    so if we add a new item to the fields we have to add '' to the other
> 2 lists
>    """
> ifields = [['ID =', 'T = ', 'AU =', 'DIST =', 'DNUM =', 'ABS =',
> 'ARCH.FILTER = </B>I', 'ARCH.FILTER = </B>C', 'ARCH.FILTER = </B>N',
> 'ARCH.FILTER = </B>S', 'ARCH.FILTER = </B>E', 'CLASSIF',
> 'ICPSR.CLASSIF1', 'NACJD.CLASS', 'NACDA.CLASS' ,'SAMHDA.CLASS',
> 'IAED.CLASS', 'EXTENT.COLLECT', 'CLASSNO', 'SERIES.NAME', 'SERIES.INFO',
> 'RESTRICTIONS =', 'DATA.TYPE', 'TIME.PERIOD', 'DATE.OF.COLLECT',
> 'FUNDING.AGENCY', 'GRANT.NUMBER', 'DATA.SOURCE', 'EXTENT.PROCESS',
> 'DATA.FORMAT', 'COLLECT.NOTE', 'SAMPLING =', 'UNIVERSE =',
> 'RELATED.PUBS', 'CITATION =', 'KEYWORDS =', 'DIR =', 'CHAPTER =',
> 'SECTION =', 'SUBSECTION =','SUBSUB'],
> ['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']]

> #read the files and set up file for writing
> #read in the list of files in this directory
> dir = os.listdir('D:\\statlab\\ssda\\data')

> """open the file"""
> for f in dir:
>     try:
>         fileproc = open('D:\\statlab\\ssda\\data\\'+f, 'r')
>     except IOError:
>         print 'Can\'t open file for reading.'
>         sys.exit(0)

>     #create a new ifields to store the data. More important for looping
>     #through the directory

>     result = ifields[:]

                      ^^^
I think you want a copy of ifields. A list is mutable.
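
One caveat, offered as a sketch rather than a verdict: ifields is a list holding two inner lists, and a slice copy duplicates only the outer list. The inner list of values is still shared, so writes to result[1][...] would still land in ifields. The two-entry lists below are made-up stand-ins for the real 41-entry ones, and copy.deepcopy is one possible remedy, not necessarily the one the poster had in mind.

```python
import copy

# Stand-in for the real structure: [field names, field values].
ifields = [['ID =', 'T ='], ['', '']]

shallow = ifields[:]            # new outer list, same inner lists
shallow[1][0] = '1588'          # writes through to ifields[1]
leaked = ifields[1][0]          # '1588': the template was modified

ifields[1][0] = ''              # reset the template for the second test

deep = copy.deepcopy(ifields)   # the inner lists are copied as well
deep[1][0] = '1588'
kept = ifields[1][0]            # '': the template is untouched

print(repr(leaked), repr(kept))
```

An equivalent fix without the copy module would be to build the values list from scratch on each pass, for example [''] * len(ifields[0]).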


Quote:

>     #read the file into a list
>     text = fileproc.readlines()

>     #loop through and find an occurrence of a tag
>     #if you find a tag write it to the field
>     #if you don't find a tag write it to the previous found field
>     for i in text:
>         jindex = 0
>         found = 0
>         for j in result[0]:
>             if (i.rfind(j)==0) or (i.rfind(j)==1) or (i.rfind(j)==3):
>                if i.rfind("=")+2 == ' ':
>                    where = i.rfind("=")+3
>                else:
>                    where = i.rfind("=")+2
>                result[1][jindex] = i[where:-1]
>                oldindex = jindex
>                found = 1
>     # need to write a test for ; at the end
>                if result[1][jindex][-1] == ";":
>                    result[1][jindex] = i[where:-2]
>                break
>             elif (i.rfind(j)==6) or (i.rfind(j)==7):
>                if i.rfind("=")+6 == ' ':
>                    where = i.rfind("=")+7
>                else:
>                    where = i.rfind("=")+6
>                result[1][jindex] = i[where:-1]
>                found = 1
>                oldindex = jindex
>                if result[1][jindex][-1] == ";":
>                    result[1][jindex] = i[where:-2]
>                break
>             jindex += 1
>             if i.rfind("BV =") != -1:
>                break
>         if (found != 1) and (i.rfind("BV =") != -1):
>             result[1][oldindex] += i[0:-1]
>             found = 0

>     crap = 0
>     while crap < numfields:
>         if result[1][crap]:
>             print "field # ", result[0][crap], " ", result[1][crap]
>         crap += 1

> ___________________________________________________________________________________________________

Roland Schlenker


Sat, 05 Jul 2003 05:41:53 GMT  


You are getting bitten by the usual object/reference confusion.

When you assign "... = ifields" you're just creating a new name for the
same objects referenced by ifields. So, when you modify list elements
under the new name, you're also modifying what ifields references.
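
A small sketch of that point, which probably also explains why the "del results" workaround made no difference: del removes a name, not the object the name refers to. The names template, results and fresh are invented for the illustration.

```python
template = ['', '']

results = template                   # binds a second name to the same list
same_object = results is template    # True: one object, two names

results[0] = 'ICPSR'                 # the change is visible through both names
del results                          # removes the name only; the list lives on

after_del = template[0]              # still 'ICPSR'

# Building a new list on each pass gives a genuinely empty result.
fresh = [''] * len(template)

print(same_object, repr(after_del), fresh)
```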

--
Chris Ryland * Em Software, Inc. * www.emsoftware.com




Sat, 05 Jul 2003 05:44:21 GMT  
On Mon, 15 Jan 2001 15:23:14 -0500, Steven Citron-Pousty

Quote:

>Here is my code - remember think newbie and don't slam me too hard, it's
>one of those days. Thanks again for any help.

Other posts have already pointed out the difference between list
copying and list referencing, so I won't belabor that, but here's a
couple of idioms you can use to make your code more Pythonic if you
wish:

1) Multiple imports....

Quote:
>import os
>import sys
>import string
>import re

...can be written in one line, such as...

    import os, sys, string, re

This is a matter of style and strictly up to you. Either way works
fine.

2) Like others have said, the "ifields" data structure looks as if it
would be better off as a dictionary, thus giving you nice key/value
lookup capability. Plus you wouldn't have to hardcode that "numfields"
value! Let me know if you want help on this and I'll help you offline.

3) If you find yourself doing the double-backslash a lot (thanks,
Microsoft!), rather than typing...

Quote:
> dir = os.listdir('D:\\statlab\\ssda\\data')

...you can write this as...

  dir = os.listdir(r'D:\statlab\ssda\data')

The 'r' before the string means it's a "raw" string, and all escape
sequences are ignored. Handy for long Windows file paths.
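
A quick check of the raw-string idea, as a sketch. One wrinkle worth knowing: a raw string cannot end in a single backslash, so a trailing separator (as used in the open() call in the original program) has to be added separately. The variable names are illustrative only.

```python
escaped = 'D:\\statlab\\ssda\\data'
raw = r'D:\statlab\ssda\data'
same = (escaped == raw)          # both literals spell the same path

# r'D:\statlab\ssda\data\' would be a syntax error, so the trailing
# backslash is appended as an ordinary escaped string.
with_sep = raw + '\\'

print(same, with_sep)
```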

4) If you ever find yourself writing a line like...

Quote:
> if (i.rfind(j)==0) or (i.rfind(j)==1) or (i.rfind(j)==3):

...where you test to see if a value is one of many possible discrete
values, remember that tuples are your friends, and try:

  if i.rfind(j) in (0, 1, 3):

...instead. The "in" operator lets you test for membership of an item
in a sequence (i.e. is a value in a certain list or tuple), or even
look for a character in a string. Plus, your code will run faster
since "rfind" will be called *one* time instead of three.
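
A short sketch of the membership test, using one line from the sample data file. The variable names are made up for the example.

```python
line = 'ID = 1588;'

pos = line.rfind('ID =')                      # rfind runs once; result is 0
near_start = pos in (0, 1, 3)                 # membership test against a tuple
not_found = line.rfind('AU =') in (0, 1, 3)   # rfind gives -1, so this is False
has_semicolon = ';' in line                   # "in" also works on strings

print(pos, near_start, not_found, has_semicolon)
```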

5) If you ever need to do the typical "for i = 1 to 10" - style
enumerating through a number, such as you have here...

Quote:
>    crap = 0
>    while crap < numfields :
>        if result[1][crap]:
>            print "field # ", result[0][crap], " ", result[1][crap]
>        crap += 1

...you should replace it with a "for" statement using a call to
"range" (this generates a list of numbers for you). Example:

     for crap in range(numfields):
         if result[1][crap]:
             print "field # ", result[0][crap], " ", result[1][crap]
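
As a sketch, here is the index-based loop next to an alternative that pairs names with values using zip, which sidesteps the hardcoded numfields altogether. The three-entry lists are invented stand-ins, and the results are collected in lists rather than printed so they are easy to compare.

```python
names = ['ID =', 'T = ', 'AU =']              # hypothetical stand-in data
values = ['1588', '', 'Bureau of the Census']

# Index-based loop, in the style suggested above.
by_index = []
for n in range(len(names)):
    if values[n]:
        by_index.append((names[n], values[n]))

# Same result without indexes: zip walks both lists in step.
by_pair = [(name, value) for name, value in zip(names, values) if value]

print(by_index == by_pair)
```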

6) Python has built in "sprintf"-style string interpolation using the
percent-sign character on a string, turning...

Quote:
> print "field # ", result[0][crap], " ", result[1][crap]

...into...

  print "field # %s %s" % (result[0][crap], result[1][crap])

...but once again, this is a matter of style. I generally use the
percent style interpolation whenever I can, because otherwise the
string handling looks too much like Java, and I get enough of *that*
language at work. :-)
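
A tiny sketch of the interpolation. Note that the output is not character-for-character identical to the comma-separated print, which adds its own separator spaces between items. The sample values are taken from the data file above; the variable names are illustrative.

```python
name, value = 'DIST =', 'ICPSR'

# %s converts each item with str() and drops it into the template.
line = "field # %s %s" % (name, value)

print(line)
```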

This doesn't cover it all (I'm sure there's more), but I hope this
helps.
==============================
Alan Daniels
daniels at alandaniels dot com



Sat, 05 Jul 2003 13:00:48 GMT  

Quote:

> 3) If you find yourself doing the double-backslash a lot (thanks,
> Microsoft!), rather than typing...
> > dir = os.listdir('D:\\statlab\\ssda\\data')
> ...you can write this as...

>   dir = os.listdir(r'D:\statlab\ssda\data')

How about:

    dir = os.listdir('D:/statlab/ssda/data')

I'm pretty sure that works too but I'm running Linux now and
can't test it.

  Neil



Sat, 05 Jul 2003 06:43:29 GMT  

Quote:

> > 3) If you find yourself doing the double-backslash a lot (thanks,
> > Microsoft!), rather than typing...
> > > dir = os.listdir('D:\\statlab\\ssda\\data')
> > ...you can write this as...

> >   dir = os.listdir(r'D:\statlab\ssda\data')

> How about:

>     dir = os.listdir('D:/statlab/ssda/data')

> I'm pretty sure that works too but I'm running Linux now and
> can't test it.

>   Neil

It does.

regards
 Steve



Sat, 05 Jul 2003 13:56:34 GMT  

Quote:
> Greetings all:
> So I am writing this program to parse text files. I am using a tuple to
> store the field names and values (i.e. [['id =', 't =',
> ...]['','',...]]. The program reads through all the files in the
> directory. For each file it writes all the matches in the
> appropriate field.  I am having several problems, but the biggest right
> now is:
> When the program goes to parse the next file it doesn't clear out the
> contents from the last file. Here is how I thought it should work...
> <Header stuff>
> ifields = [[field names list], [list of just '','','']]
> <FOR loop with all the file names from the directory>
> open file
> results = ifields
> <fill the results tuple>
> <print the results tuple>
> <go back to the top of the loop>

> For some reason I thought that each time results gets set to ifields it
> should reset the contents of results. I have tried all sorts of other
> workarounds, such as using "del results" just before I go back and
> repeat the loop.
> What am I missing here?

These are not tuples, but lists. (you can't change tuples).

When you do 'results = ifields', both variables refer to the same list.

So when you add elements to results, ifields is also changed, since it is
the same list.

Similar to:

>>> a = [1]
>>> b = a
>>> b.append(2)
>>> a
[1, 2]

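What you need is a *copy* of the original list. Because ifields is a list
of lists, a slice copy (ifields[:]) only copies the outer list and the
inner lists stay shared, so do
import copy
results = copy.deepcopy(ifields)
instead.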

Also, if you have a structure
[[name1, name2, name3, name4], [value, value, value, value]],
that just screams for a dictionary
{ name1: value, name2: value, name3: value, name4: value }
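
A minimal sketch of the dictionary idea, assuming a shortened list of field names. FIELD_NAMES and new_record are hypothetical names, not anything from the original program. Because a new dict is built for every file, nothing carries over from the previous one.

```python
FIELD_NAMES = ['ID =', 'T = ', 'AU =']   # shortened stand-in for the full list

def new_record():
    """Return a fresh dict mapping every field name to an empty string."""
    return dict.fromkeys(FIELD_NAMES, '')

first = new_record()
first['ID ='] = '1588'

second = new_record()                    # starts empty; nothing carried over

print(repr(first['ID =']), repr(second['ID =']))
```

The number of fields is then just len(FIELD_NAMES), so there is no separate count to keep in sync.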

--
Remco Gerlich



Sat, 05 Jul 2003 15:32:57 GMT  
Alan Daniels <danielsatalandanielsdotcom> wrote in comp.lang.python:

Quote:
> 3) If you find yourself doing the double-backslash a lot (thanks,
> Microsoft!), rather than typing...
> > dir = os.listdir('D:\\statlab\\ssda\\data')
> ...you can write this as...

>   dir = os.listdir(r'D:\statlab\ssda\data')

Also, a / works just as well:

dir = os.listdir('D:/statlab/ssda/data')

It works, but it's kind of secret, it seems :)

--
Remco Gerlich



Sat, 05 Jul 2003 20:33:59 GMT  

Quote:

>Alan Daniels <danielsatalandanielsdotcom> wrote in comp.lang.python:

>> 3) If you find yourself doing the double-backslash a lot (thanks,
>> Microsoft!), rather than typing...
>> > dir = os.listdir('D:\\statlab\\ssda\\data')
>> ...you can write this as...

>>   dir = os.listdir(r'D:\statlab\ssda\data')

>Also, a / works just as well:

>dir = os.listdir('D:/statlab/ssda/data')

>It works, but it's kind of secret, it seems :)

Apparently.  Forward slashes have been accepted by MS
"operating systems" since the very beginning, but I don't know
how many times I've seen people (mostly in C programs) shooting
themselves in the foot by trying to use backslashes instead.

--
Grant Edwards                   grante             Yow!  I want to so HAPPY,
                                  at               the VEINS in my neck STAND
                               visi.com            OUT!!



Sun, 06 Jul 2003 02:23:15 GMT  
Normal (forward) slashes won't however work in a command prompt window
in Win2000.
Does the python interpreter make the change "magically" (like it does
the end of line marker)?
Quote:
>K

> Apparently.  Forward slashes have been accepted by MS
> "operating systems" since the very beginning, but I don't know
> how many times I've seen people (mostly in C programs) shooting
> themselves in the foot by trying to use backslashes instead.

> --
> Grant Edwards                   grante             Yow!  I want to so HAPPY,
>                                   at               the VEINS in my neck STAND
>                                visi.com            OUT!!



Sun, 06 Jul 2003 07:30:29 GMT  

Quote:

> Normal (forward) slashes won't however work in a command prompt
> window in Win2000.  Does the python interpreter make the change
> "magically" (like it does the end of line marker)?

No.  I believe the Win32 API accepts either / or \ as a path
separator.  It's the command shell that's braindead.  They can't
use / because it's used to signal a flag.  I guess Win2000 still
shows its CP/M heritage.
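
For anyone who wants backslashes anyway, say to hand a path to the command shell, here is a sketch using ntpath, which is the module behind os.path on Windows and can be imported on any platform. The file name is borrowed from the sample data purely as an example.

```python
import ntpath   # Windows flavour of os.path, importable on any platform

forward = 'D:/statlab/ssda/data'

# normpath rewrites forward slashes as backslashes.
normalized = ntpath.normpath(forward)

# join adds a separator for us; normpath then tidies the mixed slashes.
full = ntpath.normpath(ntpath.join(forward, 'da7941ct.dat'))

print(normalized)
print(full)
```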

  Neil



Sun, 06 Jul 2003 01:15:19 GMT  


Quote:
> Normal (forward) slashes won't however work in a command prompt window
> in Win2000.
> Does the python interpreter make the change "magically" (like it does
> the end of line marker)?

Not really -- it's actually the (Microsoft) runtime libraries
that _underlie_ the interpreter (and every other C or C++ program
that is compiled/linked with Microsoft Visual C++) that accept
forward slashes or backward ones indifferently in file-paths
(just as it's the same Microsoft runtime libraries that perform
any line end translation that may be needed).

But I've never seen any C or C++ compiler for Windows (or, earlier,
DOS) that would fail to provide this little convenience (I _think_,
but I'm not sure I recall correctly, that DOS would accept forward
slashes even at the system-call [interrupt] level, while the Win32
APIs are not so forgiving).

The command-line processors have long been the worst-crippled
parts of MS operating systems (which IS saying something, given
the crippledness level of other parts thereof:-).

Alex



Sun, 06 Jul 2003 19:31:57 GMT  
 