Search for an End of Line? 

I have text in a file that looks like this (where ¶ is the Alt + 0182
character that Word normally displays at the end of a line):

THE TAX CONSULTANTS, INC. ¶
300 LACKAWANNA AVE
WEST PATERSON, NJ 07424¶

Note that there is no ¶ character after the second line, but
there "should" be. Is it possible to search through a file of
hundreds of addresses formatted this way, and locate and
correct any lines that wrap without the ¶ character?

--
Charlie Hoffpauir
http://freepages.genealogy.rootsweb.com/~charlieh/
If you really want to reply via email, my valid
address is available on my web site.



Thu, 19 May 2005 07:43:42 GMT  
On Sat, 30 Nov 2002 17:43:42 -0600, Charlie Hoffpauir

Additional note: the text comes from a file that was OCR'd from a
PDF file, and I don't understand "why" I'm getting lines without a
¶ character, but they are there.

--
Charlie Hoffpauir



Thu, 19 May 2005 07:55:59 GMT  
Hi Charlie,

While there isn't a pilcrow (i.e., ¶) character, I'm guessing that something
else causes the line to break, such as a soft return (ALT + 011).
The only other way (that I can think of) to control the wrapping is to
adjust the line width, which I don't think is causing the break (if it were,
the first line in your example would also wrap). If it does turn out to be a
soft return, you can use Find and Replace to replace all occurrences
(Find what: ^l, Replace with: ^p).
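To confirm that soft returns are really there before replacing anything, the same Find and Replace can be run from a macro. This is a minimal sketch, assuming the whole document should be processed; the macro name ReplaceManualLineBreaks is hypothetical. It counts manual line breaks (Chr(11), which Find matches as ^l) and only replaces them with paragraph marks (^p) if at least one is found.

```vba
Sub ReplaceManualLineBreaks()
    ' Hypothetical macro name. Counts manual line breaks (Chr(11), "^l"
    ' in Find) and, if any exist, replaces them with paragraph marks.
    Dim nBreaks As Long

    ' Split returns one more element than there are separators,
    ' so UBound equals the number of Chr(11) characters.
    nBreaks = UBound(Split(ActiveDocument.Content.Text, Chr(11)))
    If nBreaks = 0 Then
        MsgBox "No manual line breaks found."
        Exit Sub
    End If

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Execute FindText:="^l", ReplaceWith:="^p", _
                 Format:=False, MatchWildcards:=False, Replace:=wdReplaceAll
    End With

    MsgBox nBreaks & " manual line break(s) replaced."
End Sub
```

If the first message box appears, the breaks come from something other than soft returns.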

FYI and FWIW, the pilcrow character in your text is actually ALT + 013, a
paragraph mark that causes the line to break, whereas ALT + 0182 inserts the
¶ symbol as an ordinary character and does not cause the line to break.
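One way to see this distinction in a file is to count both kinds of pilcrow. The sketch below is hypothetical (the macro name is made up): ChrW(182) is the literal ¶ character typed with Alt + 0182, while real paragraph marks are Chr(13), so the document's paragraph count stands in for them.

```vba
Sub CountPilcrows()
    ' Hypothetical macro name. Compares real paragraph marks (Chr(13))
    ' with literal pilcrow characters (ChrW(182)); they look alike on
    ' screen, but only a paragraph mark actually ends a line.
    Dim nLiteral As Long

    ' UBound of the Split result equals the number of ChrW(182) characters.
    nLiteral = UBound(Split(ActiveDocument.Content.Text, ChrW(182)))

    MsgBox "Paragraph marks: " & ActiveDocument.Paragraphs.Count & vbCr & _
           "Literal pilcrow characters: " & nLiteral
End Sub
```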

HTH





Fri, 20 May 2005 23:12:56 GMT  
Dave,

Thanks for the suggestions, but that apparently isn't it. I did a
search on ^| and found none.

I'm guessing that in the OCR of the PDF file, the margin for these
particular lines is somehow being reset to what is now the last
character on the line. BTW, all these lines seem to end in a space.
If I reset the formatting, the two lines rewrap into one, so the
address becomes:

300 LACKAWANNA AVE  WEST PATERSON, NJ 07424¶

which isn't too bad, but doesn't look right.

I can look at the individual lines in Word, and the margin settings are
certainly different for the lines that contain the pilcrow (generally
right margin = 0) and for those that show the problem (generally right
margin about 6").

If you're interested in what the file looks like after OCR, take a
look at the image I posted here, with tabs and pilcrows displayed.

http://web.wt.net/~charlieh/pictures/sample.gif

This is the first of 222 similar pages in the file.

So what I seem to need to do is look for lines that have the large
right margin, check them for a pilcrow at the end of the line, and,
if they don't have one, add one. Does that seem right?
I'll look at more of the file to see if all the problem lines have the
same right margin. Assuming they do, how do I search for lines that
have that margin setting?
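A hedged sketch of one way to find those lines, assuming the problem paragraphs can be told apart by their right indent (Word's Paragraph.RightIndent, measured in points from the right margin) and that good lines have a right indent of 0. The macro name and the 0.1-inch threshold are hypothetical; check the actual value on a known problem line under Format > Paragraph and adjust the constant.

```vba
Sub MarkLargeRightIndents()
    ' Hypothetical macro name and threshold. Highlights every paragraph
    ' whose right indent exceeds MIN_INDENT_IN inches and reports the count.
    Const MIN_INDENT_IN As Single = 0.1   ' assumption: adjust to your file
    Dim p As Paragraph
    Dim nFound As Long

    For Each p In ActiveDocument.Paragraphs
        ' RightIndent is in points, so convert the threshold before comparing.
        If p.RightIndent > InchesToPoints(MIN_INDENT_IN) Then
            p.Range.HighlightColorIndex = wdYellow
            nFound = nFound + 1
        End If
    Next p

    MsgBox nFound & " paragraph(s) with a nonzero right indent highlighted."
End Sub
```

Highlighting rather than editing lets you confirm the rule catches only the wrapped addresses before changing any text.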

On Mon, 2 Dec 2002 10:12:56 -0500, "Dave Lett"


--
Charlie Hoffpauir


Sat, 21 May 2005 00:56:53 GMT  
Hi Charlie,

Here's a macro that inserts a paragraph mark (i.e., pilcrow) at the end of
each line in the document that doesn't already have one. If you want every
line to end with a paragraph mark, this might be the easiest way to go about
it (run it without resetting the right indent).

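Sub SplitWrappedLines()
    ' Inserts a paragraph mark at the end of every display line that does
    ' not already end with one. (The macro name is arbitrary.)
    Dim iPara As Long
    Dim iLine As Long

    With ActiveDocument
        ' Work from the last paragraph to the first so the paragraphs
        ' created by each split do not shift the ones still to be visited.
        For iPara = .Paragraphs.Count To 1 Step -1
            With .Paragraphs(iPara).Range
                .Select
                ' A paragraph shown on N lines needs N - 1 new marks.
                For iLine = 1 To .ComputeStatistics(wdStatisticLines) - 1
                    With Selection
                        .Collapse Direction:=wdCollapseStart
                        ' "\Line" is Word's predefined bookmark for the line
                        ' holding the selection; vbCr is a paragraph mark.
                        .Bookmarks("\Line").Range.InsertAfter vbCr
                        .MoveDown Unit:=wdLine, Count:=1, Extend:=wdMove
                    End With
                Next iLine
            End With
        Next iPara
    End With
End Sub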

HTH





Sat, 21 May 2005 01:51:41 GMT  
On Mon, 2 Dec 2002 12:51:41 -0500, "Dave Lett"


Dave,

I'm watching it run now, and it does seem to do exactly what I need,
if rather slowly. Hopefully I can save some time by running your macro
after I've done some processing to reduce the total number of lines it
has to read. (For example, I'm stripping off all the headers, so I can
insert your macro into my code after the headers have been removed.)
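If most paragraphs are already fine, a possible speed-up is to split only the paragraphs with the large right indent described earlier in the thread and to turn off screen redraw while the macro runs. The sketch below is hypothetical (the macro name and the 0.1-inch threshold are assumptions). It first collects the target paragraphs in a single For Each pass, avoiding repeated Paragraphs(i) lookups, which are likely a large part of the slowdown in a long document, and then applies the same line-splitting steps as Dave's macro to each one, last to first. It also assumes Word still lays out lines correctly with ScreenUpdating turned off; if the results differ, remove those two lines.

```vba
Sub SplitWrappedLinesFaster()
    ' Hypothetical variant: only splits paragraphs whose right indent
    ' exceeds MIN_INDENT_IN inches.
    Const MIN_INDENT_IN As Single = 0.1   ' assumption: adjust to your file
    Dim p As Paragraph
    Dim targets As New Collection
    Dim rng As Range
    Dim i As Long, iLine As Long, nLines As Long

    ' Pass 1: one sweep collects the ranges of the problem paragraphs.
    For Each p In ActiveDocument.Paragraphs
        If p.RightIndent > InchesToPoints(MIN_INDENT_IN) Then targets.Add p.Range
    Next p

    Application.ScreenUpdating = False
    ' Pass 2: split each collected paragraph, starting with the last one.
    For i = targets.Count To 1 Step -1
        Set rng = targets(i)
        nLines = rng.ComputeStatistics(wdStatisticLines)
        rng.Select
        For iLine = 1 To nLines - 1
            With Selection
                .Collapse Direction:=wdCollapseStart
                .Bookmarks("\Line").Range.InsertAfter vbCr
                .MoveDown Unit:=wdLine, Count:=1, Extend:=wdMove
            End With
        Next iLine
    Next i
    Application.ScreenUpdating = True
End Sub
```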

Many thanks for the help.
--
Charlie Hoffpauir



Sat, 21 May 2005 04:19:15 GMT  
 

 Relevant Pages 

1. End-of-Line vs. End-of-Paragraph

2. vba line input not recognizing end of line

3. Search thru Word Doc line by line

4. Loop search, do something, stop at end

5. Searching untill the end of the word file

6. End search?

7. Search an Access DB backend using a VB6 front end

8. searching between start and end dates

9. Writing a front-end to use web search engines

10. Incremental search in DBGrid & End printing

11. Internet Search Front End

 

 