Author Message

Quote:

> The code you use uses an O(n^2) string copying, so it's not surprising
> that it's slow.
> Even without doing anything else, you could switch it to
> an O(n) algorithm (where n is the number of lines) by doing:

Thanks. I applied your algorithm for a new function which is now very
quick also for big files. But I have now another O(n^2) algorithm:

(with-open-file (file name :direction :input)
(rows nil (nconc rows (list row))))
((eql row 'file-end) rows))))

It becomes sufficiently quick in this way:

(with-open-file (file name :direction :input)
(v (make-array 11820)) (k 0 (+ 1 k)))
((eql row 'file-end) v) (setf (aref v k) row))))

But how can I know that my file has 11819 rows? I could divide
file-length by 80, say. Is there a way to access the result of
(system "wc ...") from Lisp? Also the setf seems ugly.

Quote:
> There's also zero justification for this being a macro.

And here? For symmetry also this should become a function:

(defmacro write-file (name &rest a)
`(with-open-file (file ,name :direction :output)

J. Eschgfaeller

Fri, 16 Nov 2001 03:00:00 GMT

< Thanks. I applied your algorithm for a new function which is now very
< quick also for big files. But I have now another O(n^2) algorithm:
<
<   (with-open-file (file name :direction :input)
<   (rows nil (nconc rows (list row))))
<   ((eql row 'file-end) rows))))

It might be faster and easier to read/write using loop; probably
depends on what you think about using loop.

(with-open-file (input file :direction :input)
(loop for line = (read-line input nil nil)
while line
collect line)))

< It becomes sufficiently quick in this way:
<
<   (with-open-file (file name :direction :input)
<   (v (make-array 11820)) (k 0 (+ 1 k)))
<   ((eql row 'file-end) v) (setf (aref v k) row))))
<
< But how can I know that my file has 11819 rows? I could divide
< file-length by 80, say. Is there a way to access the result of
< (system "wc ...") from Lisp? Also the setf seems ugly.

There is no reliable or recommended way to statically determine such a

(defun rfvr (file)
(let ((vector (make-array 1024
:initial-element nil
:fill-pointer 0
(with-open-file (input file :direction :input)
((null line)
vector)
(vector-push-extend line vector)))))

Using loop,

(defun rfvr (file)
(let ((vector (make-array 1024
:initial-element nil
:fill-pointer 0
(with-open-file (input file :direction :input)
(loop for line = (read-line input nil nil)
while line
do (vector-push-extend line vector)
finally (return vector)))))
.

< > There's also zero justification for this being a macro.
<
< And here? For symmetry also this should become a function:
<
<   (defmacro write-file (name &rest a)
<   `(with-open-file (file ,name :direction :output)

I would use `write-sequence' instead of format.

I would make a macro call `with-file-lines' that would bind some
variable (similiar to with-open-file) to the current line.

(make-array size
:initial-element #\Null
:element-type 'character
:fill-pointer 0

(defmacro with-file-lines ((stream variable-name &optional return-val)
&body forms)
(let ((ch (gensym "CHARACTER"))
(fn-name (gensym "FN")))
`(let ((,variable-name (make-fill-string 1024 t)))
(flet ((,fn-name (,variable-name)

((null ,ch)
(unless (zerop (fill-pointer ,variable-name))
;; Should refuse to
;; operate upon reaching this condition
;; for unix compliance? Tough choice.
; (vector-push-extend #\Newline ,variable-name)
; (error "File `~s' does not end!" (file-namestring ,stream))
)
(,fn-name ,variable-name)
,return-val)
(vector-push-extend ,ch ,variable-name)
;; This `Newline' may not work for your OS,
;; and since I try not to use keywords and optional args
;; I leave you this comment instead.
(cond ((char= ,ch #\Newline)
(,fn-name ,variable-name)
(setf (fill-pointer ,variable-name) 0))))))))

(let ((line-count 0))
(with-open-file (in ".emacs" :direction :input)
(with-file-lines (in line line-count)
(write-string line)
(incf line-count))))

Fri, 16 Nov 2001 03:00:00 GMT

Quote:

> (defun rfvr (file)
>  (let ((vector (make-array 1024
>                  :initial-element nil
>                  :fill-pointer 0
>   (with-open-file (input file :direction :input)
>     (loop for line = (read-line input nil nil)
>        while line
>          do (vector-push-extend line vector)
>       finally (return vector)))))

Some small comments here, mostly for those looking on, and probably
not really for Steve, who may know this but just not thought to have
mentioned it:

One is that it may be useful to open the vector with an initial size of

(or (ignore-errors (truncate (file-length input) n)) 1024)

instead of 1024. where n is the expected average line length.
The ignore-errors is in case it can't compute a file length.
[You have to factor the LET into the WITH-OPEN-FILE in order to
use the open stream INPUT as an arg to FILE-LENGTH.]

The second is that
(vector-push-extend line vector)
will grow the vector by an implemenation-dependent amount if you overrun
the vector allocated size.  [This will involve a copy of the backing-store
of the array in most cases.]  If the amount is a fixed small amount, as
it is in some systems, you'll end up with what may feel like
an O(n^2) problem again.  It's possibly better to control this value yourself
and make sure the new vector size is fairly chunky so the copy won't be
done over and over again.  Suggesting a good value may depend somewhat
on knowledge of the data.

Fri, 16 Nov 2001 03:00:00 GMT

Quote:

>> (defun rfvr (file)
>>  (let ((vector (make-array 1024
>>                  :initial-element nil
>>                  :fill-pointer 0
>>   (with-open-file (input file :direction :input)
>>     (loop for line = (read-line input nil nil)
>>        while line
>>          do (vector-push-extend line vector)
>>       finally (return vector)))))

>Some small comments here, mostly for those looking on, and probably
>not really for Steve, who may know this but just not thought to have
>mentioned it:

>One is that it may be useful to open the vector with an initial size of

>  (or (ignore-errors (truncate (file-length input) n)) 1024)

>instead of 1024. where n is the expected average line length.
>The ignore-errors is in case it can't compute a file length.
>[You have to factor the LET into the WITH-OPEN-FILE in order to
>use the open stream INPUT as an arg to FILE-LENGTH.]

>The second is that
> (vector-push-extend line vector)
>will grow the vector by an implemenation-dependent amount if you overrun
>the vector allocated size.  [This will involve a copy of the backing-store
>of the array in most cases.]  If the amount is a fixed small amount, as
>it is in some systems, you'll end up with what may feel like
>an O(n^2) problem again.  It's possibly better to control this value yourself
>and make sure the new vector size is fairly chunky so the copy won't be
>done over and over again.  Suggesting a good value may depend somewhat
>on knowledge of the data.

You might as well circumvent all these unexpected efficiency problems
of vector-push-extend and adjusting arrays by simply collecting lines
into a LIST and converting to a vector afterwards (if you really have to)
by sticking to the original collecting version:

(defun rfvr (file)
(with-open-file (input file :direction :input)
(coerce (loop for line = (read-line input nil nil)
while line
collect line)
'simple-vector)))

Bernhard

PS: reminds me of the trade-off between hash-tables and trees,
you incur some bounded overhead but get better worst-case protection.
--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for  http://www.ai.univie.ac.at/~bernhard/

Fri, 16 Nov 2001 03:00:00 GMT

Quote:

> Thanks. I applied your algorithm for a new function which is now very
> quick also for big files. But I have now another O(n^2) algorithm:

>   (with-open-file (file name :direction :input)
>   (rows nil (nconc rows (list row))))
>   ((eql row 'file-end) rows))))

A simple way to make this O(n) is:

(with-open-file (file name :direction :input)
(rows nil (cons row rows)))
((eql row 'file-end) (nreverse rows)))))

Use "rows" as a stack and accumulate the rows in reverse order.  This is
a O(n) operation.  Finally, make one pass to reverse the stack into
proper order.  Another O(n) operation.

--

Sat, 17 Nov 2001 03:00:00 GMT

Quote:

> Thanks. I applied your algorithm for a new function which is now very
> quick also for big files. But I have now another O(n^2) algorithm:

>   (with-open-file (file name :direction :input)
>   (rows nil (nconc rows (list row))))
>   ((eql row 'file-end) rows))))

A simple way to make this O(n) is:

(with-open-file (file name :direction :input)
(rows nil (cons row rows)))
((eql row 'file-end) (nreverse rows)))))

Use "rows" as a stack and accumulate the rows in reverse order.  This is
a O(n) operation.  Finally, make one pass to reverse the stack into
proper order.  Another O(n) operation.

Also, it would appear slightly safer to make the first call to READ-LINE
the same as subsequent calls.  That way you wouldn't throw an error on
an empty file.

--

Sat, 17 Nov 2001 03:00:00 GMT

Quote:
> A simple way to make this O(n) is:

>   (with-open-file (file name :direction :input)
>          (rows nil (cons row rows)))
>         ((eql row 'file-end) (nreverse rows)))))

> Use "rows" as a stack and accumulate the rows in reverse order.  This is
> a O(n) operation.  Finally, make one pass to reverse the stack into
> proper order.  Another O(n) operation.

Incidentally, the first call to read-line should be the same as
the repeat call.  It's possible for a file to have no lines.

This bug was in the original, and I forgot to mention it.  I bet you
noticed it too but forgot as well.  But better late than never...

The relevant line should be: