Search for an End of Line?
Charlie Hoffpauir #1 / 6
I have text in a file that looks like this (where the ¶ character is Alt + 0182, which normally marks the end of a line and displays as ¶):

THE TAX CONSULTANTS, INC. ¶
300 LACKAWANNA AVE
WEST PATERSON, NJ 07424¶

Note that there is no ¶ character after the second line, but there "should" be. Is it possible to search through a file of hundreds of addresses formatted this way, and locate and correct any lines that wrap without the ¶ character?

--
Charlie Hoffpauir
http://www.*-*-*.com/ ~charlieh/
If you really want to reply via email, my valid address is available on my web site.
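For the "locate" half of the question, here is a minimal Word VBA sketch. It was not posted in the thread and is only an illustration. It flags every paragraph that Word currently lays out on more than one line, which is exactly the case where a line wraps without a ¶. The macro name and the yellow highlighting are arbitrary choices. Line counts depend on the current layout, so run it with the document's final margins and indents in place.

```vba
Sub HighlightWrappedParagraphs()
    ' Hypothetical helper: highlights (in yellow) every paragraph that is
    ' displayed on more than one line, i.e. lines that wrap with no pilcrow.
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
        ' wdStatisticLines counts the lines as they are currently laid out.
        If para.Range.ComputeStatistics(wdStatisticLines) > 1 Then
            para.Range.HighlightColorIndex = wdYellow
        End If
    Next para
End Sub
```

The macro only marks the paragraphs and does not change the text, so you can inspect the results before deciding how to fix them.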
Thu, 19 May 2005 07:43:42 GMT

Charlie Hoffpauir #2 / 6
On Sat, 30 Nov 2002 17:43:42 -0600, Charlie Hoffpauir
Additional note: The text comes from a file that was OCR'd from a PDF file. I don't understand "why" I'm getting lines without a ¶ character, but they are there.
Thu, 19 May 2005 07:55:59 GMT

Dave Lett #3 / 6
Hi Charlie,

While there isn't a pilcrow (i.e., ¶) character, I'm guessing that something else causes the line to break, such as a soft return (ALT + 011). The only other way (that I can think of) to control the wrapping is to adjust the line width. I don't think that is causing the break here, because in your example the first line would then also wrap. If it turns out to be a soft return, you can use the Find and Replace command to replace all occurrences (FindWhat:="^l", ReplaceWith:="^p").

FYI and FWIW, the pilcrow character in your text is actually ALT + 013, which causes the line to break. ALT + 0182 inserts the ¶ symbol like a letter and does not cause the line to break.

HTH
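The Find and Replace step described above can also be run from VBA. This is a minimal sketch that assumes the breaks really are manual line breaks (^l, character 11). The macro name is hypothetical.

```vba
Sub ReplaceManualLineBreaks()
    ' Replace every manual line break (^l) in the main text with a
    ' paragraph mark (^p), like "Replace All" in the Find and Replace dialog.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

If the document contains no manual line breaks, as Charlie later finds, the macro simply makes no changes.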
Fri, 20 May 2005 23:12:56 GMT

Charlie Hoffpauir #4 / 6
On Mon, 2 Dec 2002 10:12:56 -0500, "Dave Lett"

Dave,

Thanks for the suggestions, but apparently that isn't it. I did a search on ^| and found none.

I'm guessing that during the OCR of the PDF file, the margin for these particular lines is somehow being reset to what is now the last character on the line. BTW, all these lines seem to end in a space. If I reset the formatting, the two lines merge into one, so the address becomes:

300 LACKAWANNA AVE WEST PATERSON, NJ 07424¶

That is readable, but it looks bad.

I can look at the individual lines in Word, and the margin settings are clearly different for the lines that contain the pilcrow (generally right margin = 0) and the lines that show the problem (generally right margin about 6").

If you're interested in what the file looks like after OCR, take a look at the image I posted here, with tabs and pilcrows displayed:
http://web.wt.net/~charlieh/pictures/sample.gif
This is the first of 222 similar pages in the file.

So what I seem to need to do is look for lines that have the large right margin and check them for a pilcrow at the end of the line. If a line doesn't have one, I add one. Does that seem right? I'll look at more of the file to see whether all the problem lines have the same right margin. If they do, how do I search for lines that have that margin setting?
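Because the setting varies from line to line, the "right margin" in the question above is presumably the paragraph's right indent. Word exposes that value as Paragraph.RightIndent, measured in points. One way to answer the question is to loop over the paragraphs and test that property. This is a minimal sketch and was not posted in the thread. The 1-inch threshold and the macro name are hypothetical, so adjust them to the values you actually see in the file.

```vba
Sub HighlightLargeRightIndent()
    ' Hypothetical threshold: adjust to match the indents in your file.
    Const MIN_INDENT_INCHES As Single = 1
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
        ' RightIndent is in points; InchesToPoints converts the threshold.
        If para.RightIndent > InchesToPoints(MIN_INDENT_INCHES) Then
            para.Range.HighlightColorIndex = wdYellow
        End If
    Next para
End Sub
```

Looping over paragraphs avoids depending on Find's formatting options, and the highlight lets you confirm that the flagged lines are the problem ones before changing anything.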
Sat, 21 May 2005 00:56:53 GMT

Dave Lett #5 / 6
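Hi Charlie,

Here's a macro that inserts a paragraph mark (i.e., pilcrow) at the end of each line in a document that doesn't already have one. If you want every line to end in a paragraph mark, this might be the easiest approach (run it without resetting the right indent).

Sub AddParagraphMarkAtLineEnds()
    ' (Macro name is arbitrary.)
    Dim iPara As Long
    Dim iLine As Long
    With ActiveDocument
        ' Work from the last paragraph to the first so that splitting a
        ' paragraph does not shift the index of paragraphs not yet processed.
        For iPara = .Paragraphs.Count To 1 Step -1
            With .Paragraphs(iPara).Range
                .Select
                ' A paragraph laid out on n lines needs n - 1 new paragraph marks.
                For iLine = 1 To .ComputeStatistics(wdStatisticLines) - 1
                    With Selection
                        ' Turn the selection into an insertion point at the line start.
                        .Collapse Direction:=wdCollapseStart
                        ' Add a break after the current displayed line.
                        .Bookmarks("\Line").Range.InsertAfter vbCrLf
                        ' Move down to the line that now starts the remaining text.
                        .MoveDown Unit:=wdLine, Count:=1, Extend:=wdMove
                    End With
                Next iLine
            End With
        Next iPara
    End With
End Sub

HTH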
Sat, 21 May 2005 01:51:41 GMT

Charlie Hoffpauir #6 / 6
On Mon, 2 Dec 2002 12:51:41 -0500, "Dave Lett"
Dave,

I'm watching it run now, and it does seem to do exactly what I need, if rather slowly. I hope to save some time by running your macro after I've done some processing to reduce the total number of lines it has to read. (For example, I'm stripping off all the headers, so I can insert your macro into my code after the headers have been removed.)

Many thanks for the help.
Sat, 21 May 2005 04:19:15 GMT