Kenny McCorma #1 / 36
|
 awk array elements
Quote:
>> > Is there an easy way to print the total number of elements in an awk
>> > array?
>> Other than counting the entries, in awk/nawk/gawk there is no other
>> way of sizing the array. Are you having some problem counting?
>Well, gawk's asort function returns the number of elements, so it
>is possible the OP might be able to get a count for free as a side
>effect of something else he is doing.
>One question that probably belongs in comp.lang.awk is why
>does the awk interpreter not keep a count of the number of
>elements in each (associative) array that could be made available
>via a function call?
Like TAWK does? Is that what you are getting at?
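For reference, the "counting the entries" approach mentioned in the quoted text can be sketched as a small user-defined function. This is only a minimal sketch, not something posted in the thread: the name acount is borrowed from later posts, where it stands for a hypothetical built-in, and count_demo and the sample array contents are made up for illustration.

```sh
# Hypothetical demo wrapper; acount() is a user-defined helper, not an awk built-in.
count_demo() {
    awk '
    # Count the elements of arr by visiting every index once.
    # The extra parameters k and n act as local variables.
    function acount(arr,    k, n) {
        n = 0
        for (k in arr) n++
        return n
    }
    BEGIN {
        a["x"] = 1; a["y"] = 2; a[7] = 3   # made-up sample data
        print acount(a)
    }'
}
count_demo
```

The cost is one full traversal per call, which is the inefficiency the rest of the thread argues about.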
|
Tue, 26 Apr 2005 08:01:34 GMT |
|
 |
Harlan Grov #2 / 36
|
 awk array elements
Quote:
...
>>One question that probably belongs in comp.lang.awk is why
>>does the awk interpreter not keep a count of the number of
>>elements in each (associative) array that could be made available
>>via a function call?
>Like TAWK does? Is that what you are getting at?
It's purely for convenience/laziness. There are two ways to assign entries in an array: using the split function and assigning specific entries. The former returns the number of entries created. The latter doesn't, but script code could be added to count the array entries after each assignment, e.g., using the index "" for the array count,

    a[""] += !(x in a)
    a[x] = y
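The counting idiom above can be tried out roughly as follows. As an assumption made for illustration, the count is kept in a separate variable n instead of a[""], so that it does not appear when the array is traversed with for (k in a); the names count_on_assign, seen and stores and the sample input are likewise made up. The sketch also contrasts the idiom with a naive counter that is bumped on every assignment.

```sh
# Hypothetical demo: count distinct indices at assignment time.
count_on_assign() {
    printf '%s\n' apple pear apple fig pear | awk '
    {
        n += !($1 in seen)   # add 1 only the first time this index appears
        seen[$1] = NR        # the assignment itself may repeat freely
        stores++             # naive counter: counts every assignment
    }
    END { print n, stores }'
}
count_on_assign   # prints: 3 5
```

The two numbers differ as soon as an index repeats, so a plain counter only equals the element count when every index is stored exactly once.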
|
Tue, 26 Apr 2005 08:27:15 GMT |
|
 |
Kenny McCorma #3 / 36
|
 awk array elements
Quote:
>...
>>>One question that probably belongs in comp.lang.awk is why
>>>does the awk interpreter not keep a count of the number of
>>>elements in each (associative) array that could be made available
>>>via a function call?
>>Like TAWK does? Is that what you are getting at?
>It's purely for convenience/laziness. There are two ways to assign entries
>in an array: using the split function and assigning specific entries. The
>former returns the number of entries created. The latter doesn't, but script
>code could be added to count the array entries after each assignment, e.g.,
>using the index "" for the array count,
>a[""] += !(x in a)
>a[x] = y
Isn't everything in AWK "purely for convenience/laziness"? I.e., it is all just a shorthand for the equivalent C program, which is, in turn just shorthand for the equivalent machine code? And, so your point is?
|
Tue, 26 Apr 2005 09:12:46 GMT |
|
 |
Doug McClur #4 / 36
|
 awk array elements
Which, in turn, is just a convenience over programming a Turing Machine.
Quote:
>Isn't everything in AWK "purely for convenience/laziness"?
>I.e., it is all just a shorthand for the equivalent C program, which is, in
>turn just shorthand for the equivalent machine code?
>And, so your point is?
|
Tue, 26 Apr 2005 09:30:36 GMT |
|
 |
Dan Haygoo #5 / 36
|
 awk array elements
Quote:
> ...
> >>One question that probably belongs in comp.lang.awk is why
> >>does the awk interpreter not keep a count of the number of
> >>elements in each (associative) array that could be made available
> >>via a function call?
> >Like TAWK does? Is that what you are getting at?
> It's purely for convenience/laziness. There are two ways to assign entries
> in an array: using the split function and assigning specific entries. The
> former returns the number of entries created. The latter doesn't, but script
> code could be added to count the array entries after each assignment, e.g.,
> using the index "" for the array count,
> a[""] += !(x in a)
> a[x] = y
I was thinking, "Hmm, AWK really could do that much more efficiently internally, it's only an INC in some sort of traversal."

But then I thought, "Why would you want to know how many elements there are in an array, anyway?"

If you want to traverse elements based on count, then that count must correspond to something. For instance, consider

    delete a; a[1] = x; a[3] = y
    for (i = 1; i <= acount(a); i++) doSomething(i)

You can't use i to reference into a, so what good is it? This only works if a is uniformly, incrementally, filled. split() and asort() do this. And in user space, like Harlan's example, something like this would typically be used:

    count=0; delete a : a[++count] = data

(or 'a[++a[0]] = data' if you want to package the count in with a.) But then, you already have 'count'.

Suppose, though, you have incremental data coming in:

    1 e; 2 t; 3 a; 4 o; 5 n; 6 i; 7 s; 8 r; 9 h; 10 l

    BEGIN {RS=";"}
    {a[$1]=$2}
    END {for (i=1; i<= acount(a); i++) print a[i] }

This could be of use...but why not just use:

    END {for (i=1; i<= $1; i++) print a[i] }

There is the case of wanting to pre-count data for random output (or at least, output in indeterminate order). Here I want a page header showing total number of pages.

    END {
        pages = int(acount(a) / 50) + (acount(a) % 50 > 0)
        c = 0; p = 0
        for (i in a) {
            if (!(c++ % 50)) print "Page " ++p " of " pages
            print i
        }
    }

But I think this is rare enough that I wouldn't mess with the language. (But even in this case, I think I would be counting things as I put them in the array, like Harlan suggests, so I could use the count for other worthwhile things during processing.)

Did I miss any other reasons one would need a count, but not yet have it already at hand?

- Dan
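The page-header case can be sketched without any built-in count by tallying distinct indices as they are stored, as the post itself suggests. This is a minimal sketch under stated assumptions: the names paged, seen and per, the page size of 2 and the five sample records are made up for illustration, and the page increment is written as a separate statement instead of the inline ++p of the original.

```sh
# Hypothetical demo: "Page p of N" headers with the total known up front.
paged() {
    printf '%s\n' a b c d e | awk -v per=2 '
    { n += !($1 in seen); seen[$1] = 1 }         # count distinct indices on insert
    END {
        pages = int(n / per) + (n % per > 0)     # round up to whole pages
        c = 0; p = 0
        for (k in seen) {                        # traversal order is unspecified
            if (c++ % per == 0) { p++; print "Page " p " of " pages }
            print k
        }
    }'
}
paged
```

With five entries and two per page, three headers are printed; the order of the entries under them depends on the awk implementation.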
|
Tue, 26 Apr 2005 12:23:59 GMT |
|
 |
Harlan Gro #6 / 36
|
 awk array elements
...
Quote:
>And, so your point is?
Anything easily done in the awk script itself with awk code (such as keeping track of the number of entries in an array) doesn't need to be built into the language. If you want a language with overstuffed syntax, use Perl. At least it's available on any platform anyone would want to run it on, unlike TAWK.
|
Tue, 26 Apr 2005 15:00:56 GMT |
|
 |
Kenny McCorma #7 / 36
|
 awk array elements
Quote:
>...
>>And, so your point is?
>Anything easily done in the awk script itself with awk code (such as keeping
>track of the number of entries in an array) doesn't need to be built into the
>language.
You obviously missed my point. If what you said was true, then nothing should be built-in (since anything could be done in user-space, if needed).
Quote:
>If you want a language with overstuffed syntax, use Perl. At least it's
>available on any platform anyone would want to run it on, unlike TAWK.
Your envy is showing.
|
Tue, 26 Apr 2005 19:43:59 GMT |
|
 |
Kenny McCorma #8 / 36
|
 awk array elements
...
Quote:
>I was thinking, "Hmm, AWK really could do that much more efficiently
>internally, it's only an INC in some sort of traversal."
A basic principle of real software is that it is better for the implementors to implement commonly needed features than for users to do so. You can google for the reasons (look for "Kenny McCormack" in "comp.lang.awk") if it isn't obvious to you why this is so.

Or, to put it another way, I actually do understand why the GAWK implementor(s) want to keep the feature set small, but can't understand why us users should be expected to feel the same way. Obviously, a good example of "group-think".
Quote:
>But then I thought, "Why would you want to know how many elements there are
>in an array, anyway?"
Why would I ever need to know the arc tangent of y/x? In all my years of writing programs in dozens of languages, I've never needed this, but yet that function is there in most of those languages.

A common use of the "length(A)" construct in TAWK is like this:

    /somecondition/ { A[$1] = $0 }
    END {
        if (length(A)) # Did I get anything at all?
            ...
    }

Obviously, syntactic sugar, but, as I've said repeatedly, all of AWK is syntactic sugar - since real men use Turing machines (to quote another poster).

Another version of the above is:

    END {
        print "There are",length(A),"element" (length(A)==1 ? "" : "s"),
            "in the array"
    }

Quote:
>If you want to traverse elements based on count, then that count must
>correspond to something. For instance, consider
> delete a; a[1] = x; a[3] = y
> for (i = 1; i <= acount(a); i++) doSomething(i)
>You can't use i to reference into a, so what good is it? This only works if
>a is uniformly, incrementally, filled. split() and asort() do this.
Nobody is doubting that you can get around the limitations of a primitive implementation. It is just more fun and satisfying to work in a full-featured one.
Quote:
>Did I miss any other reasons one would need a count, but not yet have it
>already at hand?
See above. By the way, have you ever used atan2()?
|
Tue, 26 Apr 2005 19:58:36 GMT |
|
 |
Dan Merc #9 / 36
|
 awk array elements
Quote:
> ...
>>I was thinking, "Hmm, AWK really could do that much more efficiently
>>internally, it's only an INC in some sort of traversal."
> A basic principle of real software is that it is better for the
> implementors to implement commonly needed features than for users to do so.
> You can google for the reasons (look for "Kenny McCormack" in
> "comp.lang.awk") if it isn't obvious to you why this is so.
> Or, to put it another way, I actually do understand why the GAWK
> implementor(s) want to keep the feature set small, but can't understand why
> us users should be expected to feel the same way. Obviously, a good
> example of "group-think".
>>But then I thought, "Why would you want to know how many elements there are
>>in an array, anyway?"
> Why would I ever need to know the arc tangent of y/x? In all my years of
> writing programs in dozens of languages, I've never needed this, but yet
> that function is there in most of those languages.
> A common use of the "length(A)" construct in TAWK is like this:
/somecondition/ { A[$1] = $0 }

Is there some reason you can't

    /somecondition/ { A[$1] = $0;ct++ }
    END {
        if (ct) # Did I get anything at all?
            ...
    }

Or you could:

    END { for (any in A) { any=1;break }
        if (any) ...

You still haven't made a case for the feature.

-- Dan Mercer
If responding by email, include the phrase 'from usenet' in the subject line to avoid spam filtering.
Quote:
> END {
> if (length(A)) # Did I get anything at all?
> ...
> }
> Obviously, syntactic sugar, but, as I've said repeatedly, all of AWK is
> syntactic sugar - since real men use Turing machines (to quote another
> poster).
> Another version of the above is:
> END {
> print "There are",length(A),"element" (length(A)==1 ? "" : "s"),
> "in the array"
> }
>>If you want to traverse elements based on count, then that count must
>>correspond to something. For instance, consider
>> delete a; a[1] = x; a[3] = y
>> for (i = 1; i <= acount(a); i++) doSomething(i)
>>You can't use i to reference into a, so what good is it? This only works if
>>a is uniformly, incrementally, filled. split() and asort() do this.
> Nobody is doubting that you can get around the limitations of a primitive
> implementation. It is just more fun and satisfying to work in a
> full-featured one.
>>Did I miss any other reasons one would need a count, but not yet have it
>>already at hand?
> See above. By the way, have you ever used atan2()?
Opinions expressed herein are my own and may not represent those of my employer.
|
Tue, 26 Apr 2005 21:57:19 GMT |
|
 |
Kenny McCorma #10 / 36
|
 awk array elements
Quote:
>> ...
>>>I was thinking, "Hmm, AWK really could do that much more efficiently
>>>internally, it's only an INC in some sort of traversal."
>> A basic principle of real software is that it is better for the
>> implementors to implement commonly needed features than for users to do so.
>> You can google for the reasons (look for "Kenny McCormack" in
>> "comp.lang.awk") if it isn't obvious to you why this is so.
>> Or, to put it another way, I actually do understand why the GAWK
>> implementor(s) want to keep the feature set small, but can't understand why
>> us users should be expected to feel the same way. Obviously, a good
>> example of "group-think".
>>>But then I thought, "Why would you want to know how many elements there are
>>>in an array, anyway?"
>> Why would I ever need to know the arc tangent of y/x? In all my years of
>> writing programs in dozens of languages, I've never needed this, but yet
>> that function is there in most of those languages.
>> A common use of the "length(A)" construct in TAWK is like this:
>/somecondition/ { A[$1] = $0 }
>Is there some reason you can't
> /somecondition/ { A[$1] = $0;ct++ }
This is ugly, but the most common way to do it.
Quote:
> END {
> if (ct) # Did I get anything at all?
> ...
> }
>Or you could:
> END { for (any in A) { any=1;break }
This is putrid.
Quote:
> if (any) ...
>You still haven't made a case for the feature.
Are you as dense as you seem? Of course there are workarounds. My point is that it is better if it is a builtin. You don't have to agree with that, but it would be nice if you could understand it.
|
Tue, 26 Apr 2005 22:41:54 GMT |
|
 |
Dan Merc #11 / 36
|
 awk array elements
Quote:
>>> ...
>>>>I was thinking, "Hmm, AWK really could do that much more efficiently
>>>>internally, it's only an INC in some sort of traversal."
>>> A basic principle of real software is that it is better for the
>>> implementors to implement commonly needed features than for users to do so.
>>> You can google for the reasons (look for "Kenny McCormack" in
>>> "comp.lang.awk") if it isn't obvious to you why this is so.
>>> Or, to put it another way, I actually do understand why the GAWK
>>> implementor(s) want to keep the feature set small, but can't understand why
>>> us users should be expected to feel the same way. Obviously, a good
>>> example of "group-think".
>>>>But then I thought, "Why would you want to know how many elements there are
>>>>in an array, anyway?"
>>> Why would I ever need to know the arc tangent of y/x? In all my years of
>>> writing programs in dozens of languages, I've never needed this, but yet
>>> that function is there in most of those languages.
>>> A common use of the "length(A)" construct in TAWK is like this:
>>/somecondition/ { A[$1] = $0 }
>>Is there some reason you can't
>> /somecondition/ { A[$1] = $0;ct++ }
> This is ugly, but the most common way to do it.
>> END {
>> if (ct) # Did I get anything at all?
>> ...
>> }
>>Or you could:
>> END { for (any in A) { any=1;break }
> This is putrid.
One would have to ask why, since it is the least expensive method of testing whether an array holds any data.
Quote:
>> if (any) ...
>>You still haven't made a case for the feature.
> Are you as dense as you seem?
> Of course there are workarounds. My point is that it is better if it is a
> builtin. You don't have to agree with that, but it would be nice if you
> could understand it.
Well, maybe if you would stop thinking of them as arrays and thinking of them as hashes you would understand why a count hasn't been implemented. Even when you act as though they are arrays, i.e.

    X[1] = y

You are really hashing two strings - "1" and the string value of whatever is in y. And you have to be careful about accidentally creating entries:

    $ awk 'BEGIN {
    > if (X[22]) print 1
    > else print 2
    > for (i in X) print i
    > exit
    > }'
    2
    22

In short, if you think about awk arrays as hashes you will probably be safe. If you think too much of them as arrays you will probably come to grief.

Seriously, if this bothers you then you probably shouldn't program in awk - pick a more procedural language like perl. I've often thought AWK programming required a twisted mind to really do right.

-- Dan Mercer
If responding by email, include the phrase 'from usenet' in the subject line to avoid spam filtering. Opinions expressed herein are my own and may not represent those of my employer.
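The accidental-creation pitfall shown in the session above can be avoided with the in operator, which tests membership without creating an element. A minimal sketch, assuming a made-up wrapper named probe and a user-defined acount helper that counts by traversal (neither is part of awk itself):

```sh
# Hypothetical demo: membership test versus plain reference.
probe() {
    awk '
    function acount(arr,    k, n) { n = 0; for (k in arr) n++; return n }
    BEGIN {
        if (22 in X) print "present"   # membership test: creates nothing
        print acount(X)                # still 0
        if (X[22]) print "true"        # plain reference: creates X[22]
        print acount(X)                # now 1
    }'
}
probe
```

Only the second form changes the array, which is why a count taken after a careless X[22] test can be off by one.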
|
Tue, 26 Apr 2005 23:09:06 GMT |
|
 |
Kenny McCorma #12 / 36
|
 awk array elements
...
Quote:
>Well, maybe if you would stop thinking of them as arrays and thinking
>of them as hashes you would understand why a count hasn't been implemented.
It has been implemented. I use it all the time. I really don't see why you keep pretending it doesn't exist.
|
Tue, 26 Apr 2005 23:34:04 GMT |
|
 |
Dan Haygoo #13 / 36
|
 awk array elements
Quote:
> ...
> >I was thinking, "Hmm, AWK really could do that much more efficiently
> >internally, it's only an INC in some sort of traversal."
> A basic principle of real software is that it is better for the
> implementors to implement commonly needed features than for users to do so.
> You can google for the reasons (look for "Kenny McCormack" in
> "comp.lang.awk") if it isn't obvious to you why this is so.
> Or, to put it another way, I actually do understand why the GAWK
> implementor(s) want to keep the feature set small, but can't understand why
> us users should be expected to feel the same way. Obviously, a good
> example of "group-think".
> >But then I thought, "Why would you want to know how many elements there are
> >in an array, anyway?"
> Why would I ever need to know the arc tangent of y/x? In all my years of
> writing programs in dozens of languages, I've never needed this, but yet
> that function is there in most of those languages.
> A common use of the "length(A)" construct in TAWK is like this:
> /somecondition/ { A[$1] = $0 }
> END {
> if (length(A)) # Did I get anything at all?
> ...
> }
> Obviously, syntactic sugar, but, as I've said repeatedly, all of AWK is
> syntactic sugar - since real men use Turing machines (to quote another
> poster).
> Another version of the above is:
> END {
> print "There are",length(A),"element" (length(A)==1 ? "" : "s"),
> "in the array"
> }
> >If you want to traverse elements based on count, then that count must
> >correspond to something. For instance, consider
> > delete a; a[1] = x; a[3] = y
> > for (i = 1; i <= acount(a); i++) doSomething(i)
> >You can't use i to reference into a, so what good is it? This only works if
> >a is uniformly, incrementally, filled. split() and asort() do this.
> Nobody is doubting that you can get around the limitations of a primitive
> implementation. It is just more fun and satisfying to work in a
> full-featured one.
> >Did I miss any other reasons one would need a count, but not yet have it
> >already at hand?
> See above. By the way, have you ever used atan2()?
Hi Kenny-
Quote:
> A basic principle of real software is that it is better for the
> implementors to implement commonly needed features than
> for users to do so.
"Commonly" is the operative word. I would maintain that this is not a common need.
Quote:
> Obviously, syntactic sugar...
No, it wouldn't be. "Syntactic sugar," as I understand the term, is modification to the grammar, not the built-in function set, of a language. For instance, the printf() function is only that. But modifying the grammar of the language to allow a printf keyword that works similarly to the print statement is syntactic sugar. Syntactic sugar is required to go places where the ordinary language definition can't...like the variable number of parameters to Pascal's write() and writeln(). Implementing length() or count() or acount() hardly requires modification of the grammar of the language.
Quote:
> By the way, have you ever used atan2()?
Actually, yes. But that is beside the point you are trying to make. Many languages don't implement atan2(); they leave it in user space. This function, and those other rare ones, typically come to languages in libraries; since AWK is (was) interpreted, and real math is (was) slow in user-space, I'm guessing that, since they knew many people would be using, say, sin() and cos(), that they just brought along the whole library, somewhat mechanically.
Quote:
> >Did I miss any other reasons one would need a count, but not yet have it
> >already at hand?
> if (length(A)) # Did I get anything at all?
This is a good example of something I missed. Strictly speaking, though, you aren't interested in the number of elements at all...you just want to know if there are any elements in the array. So what you really want is the isempty() function, not length(). But--wait--oh no--featuritis is creeping in--and the language is now two functions bigger to learn!

- Dan
|
Wed, 27 Apr 2005 00:29:06 GMT |
|
 |
Dan Haygoo #14 / 36
|
 awk array elements
Quote:
> ...
> >Well, maybe if you would stop thinking of them as arrays and thinking
> >of them as hashes you would understand why a count hasn't been implemented.
> It has been implemented. I use it all the time.
> I really don't see why you keep pretending it doesn't exist.
Kenny - What do you mean? You have, indeed, pointed out that your vendor has, indeed, implemented it in their AWK-like language, but do gawk, awk, mawk, POSIX-compliant awk, OTU specs, whatever, have this feature already?
Quote:
> I really don't see why you keep pretending it doesn't exist.
For those of us that didn't pay extra for our AWK, I don't think it does. (I guess that's what paying extra gets you.)

It sounds like the only reason to use the intrinsic length is to test whether or not an array is empty. Do you use it for other uses, as well?

- Dan
|
Wed, 27 Apr 2005 00:37:49 GMT |
|
 |
Jim Mellande #15 / 36
|
 awk array elements
Quote:
<snip>
> > >Did I miss any other reasons one would need a count, but not yet have it
> > >already at hand?
> > if (length(A)) # Did I get anything at all?
> This is a good example of something I missed. Strictly speaking, though,
> you aren't interested in the number of elements at all...you just want to
> know if there are any elements in the array. So what you really want is the
> isempty() function, not length(). But--wait--oh no--featuritis is creeping
> in--and the language is now two functions bigger to learn!
> - Dan

    function isempty(a, i) {
        for (i in a) return 0;
        return 1;
    }

Works in gawk, even if the array was never defined. YMMV on other awks (OK, the semicolons are syntactic sugar, but they make C programmers like me more comfortable)

-- Jim Mellander
Incident Response Manager
Computer Protection Program
Lawrence Berkeley National Laboratory
(510) 486-7204

Your fortune for today is: Questions are never indiscreet, answers sometimes are. -- Oscar Wilde
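A quick way to exercise the isempty() function from this post is sketched below. The wrapper name empty_check and the sample key are made up for illustration; the function body is the one posted, minus the optional semicolons. Note that in this sketch the array A is assigned later in the same program, so it does not test the never-defined case the post mentions.

```sh
# Hypothetical demo: isempty() before an insert, after it, and after a delete.
empty_check() {
    awk '
    # Return 1 if array a has no elements, 0 otherwise; i is a local variable.
    function isempty(a,    i) {
        for (i in a) return 0
        return 1
    }
    BEGIN {
        print isempty(A)     # nothing stored yet
        A["k"] = "v"
        print isempty(A)     # one element present
        delete A["k"]
        print isempty(A)     # empty again
    }'
}
empty_check
```

Because the loop returns on the first index it sees, the check does not depend on how many elements the array holds.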
|
Wed, 27 Apr 2005 02:41:34 GMT |
|
|