Displaying binary data as ascii "1"'s and "0"'s 
 Displaying binary data as ascii "1"'s and "0"'s

I'm writing a simple program to display binary data as ascii text to visually
recognize certain patterns in the data. What I need is an efficient algorithm to
convert my binary data to an ascii string.

Here's the algorithm I've written to do this: (pseudo code)

void BinToChar()
{
   char DataBuffer [really big]; //assume it's already filled with data
   unsigned long uSize; // actual bytes in DataBuffer

   std::string strDataBuffer; // string version of DataBuffer

   std::string strTemp;
   unsigned long uTemp;

   strDataBuffer.clear ();

   // uSize / 4 skips any 1-3 leftover bytes at the end; also, on
   // little-endian x86 each group of 4 bytes comes out in reverse order
   for (unsigned long i = 0; i < uSize / 4; i++)
   {
      uTemp = ((unsigned long *)DataBuffer) [i];
      strTemp.clear ();

      for (unsigned long j = 0x80000000; j > 0; j = j >> 1)
      {
         if (uTemp & j) // this "if" probably could be better.
         {
            strTemp += "1";
         }
         else
         {
            strTemp += "0";
         } // end if
      } // end for

      strDataBuffer += strTemp;
   } // end for

}

The original code worked with one byte at a time. It was way too slow: it took
about 20 minutes to half an hour to convert 67KB to text. I rewrote it
to what I have here, casting to unsigned long and using 4 bytes at a time, and now
it takes about 5 minutes. This is a little better but still too slow.

I haven't spent a lot of time thinking about this yet but does anyone have a
more efficient way of doing this?? I would like to be able to do at least 150k
in less than 2 minutes if possible.

I'm running:
Windows NT 4.0
32 MB Ram (I know that's bad but tech support won't upgrade me :)
200 MHz pentium

I'm using MSVC 5.0 but all I'm looking for is efficient C/C++ code.

I'll be out of the office till Monday but I will be working on this at home so
any replies asap would be greatly appreciated. Also, e-mail is welcome (and
probably preferred since I'm not sure if I'll be able to check back here till
Monday) email at my home address at:


Thanks in advance,
Dave.

--
David J. Rager


http://www.*-*-*.com/ ~djrst14

Every turkey dies...
   Not every turkey truly lives.



Sun, 13 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
On Wed, 25 Nov 1998 17:03:53 -0500, David J. Rager

Quote:
>I'm writing a simple program to display binary data as ascii text to visually
>recognize certain patterns in the data. What I need is an efficient algorithm to
>convert my binary data to an ascii string.

Rather than casting to unsigned long, it seems clearer to step through
the array byte by byte.  So write a function that takes an unsigned char
and returns the 8 digit string corresponding to it.  E.g.,

#include <bitset>  // bitset
#include <sstream> // ostringstream
#include <string>

std::string byte(unsigned char x)
{
     std::ostringstream binary;
     binary << std::bitset<8>(x); // 8 digits, most significant bit first
     return binary.str();
}

This is slow.  Since there are only 256 possible bit patterns in a
byte, assuming 8 bits per byte, we could make a table.  Each element
in the table would be the 8+1=9 char string corresponding to some
unsigned number 'x'.  This approach also avoids the overhead of
memory allocation.

0 or 0x00 is "00000000"
...
10 or 0x0A is "00001010"
...

Here is a program that does just that.  Look at main(...) to figure out
what's going on.

#include <cstddef> // size_t
#include <algorithm> // fill, copy

namespace myspace
{
template <class T, size_t N>
class TinyVec
{
     private:
          T d_vec[N];

     public:
          typedef T           value_type    ;
          typedef T&          reference     ;
          typedef size_t      size_type     ;
          typedef T       *         iterator;
          typedef T const *   const_iterator;

          TinyVec() { std::fill(begin(),end(),T()); }

          reference  operator[](size_type i)       { return d_vec[i]; }
          value_type operator[](size_type i) const { return d_vec[i]; }

                iterator begin()       { return d_vec  ; }
                iterator end  ()       { return d_vec+N; }

          const_iterator begin() const { return d_vec  ; }
          const_iterator end  () const { return d_vec+N; }

};
} // namespace myspace

/////
/////
/////

class Byte
{
     public:
          // These typedefs used to come from std::unary_function, which
          // was deprecated in C++11 and removed in C++17.
          typedef unsigned char argument_type;
          typedef const char (&result_type)[9];

          static const Byte& instance() { return s_instance; }

          result_type operator()(argument_type) const;
          //const char (&operator()(unsigned char) const)[9];

     private:
          Byte();
          ~Byte() { }

          static myspace::TinyVec<char,8> calculate(unsigned char x);
          static Byte s_instance;

          char table[256][9];

          Byte(const Byte&); // not implemented
          Byte& operator=(const Byte&); // not implemented

};

Byte Byte::s_instance;

myspace::TinyVec<char,8> Byte::calculate(unsigned char x)
{
     myspace::TinyVec<char,8> out;
     for (unsigned i=8; i>0; i--) { out[i-1]=(x&1)+'0'; x>>=1; }
     return out;

}

Byte::Byte()
{
     for (unsigned i=0; i<256; i++)
     {
          myspace::TinyVec<char,8> binary=calculate(i);
          std::copy(binary.begin(),binary.end(),table[i]);
          table[i][8]=0;
     }

}

inline Byte::result_type Byte::operator()(argument_type x) const
{
     return table[x];

}

/////

#include <iostream>

int main()
{
     const Byte& byte=Byte::instance();
     std::cout << byte(0  ) << '\n';
     std::cout << byte(10 ) << '\n';
     std::cout << byte(16 ) << '\n';
     std::cout << byte(255) << '\n';
}

--------------------

Now just read the input file byte by byte using fgetc or streambuf::sbumpc.
Then apply Byte::instance()'s operator() to this byte.
The return will be a string of 8 chars plus a null 9th char.
Just output this string using fwrite or streambuf::sputn.

You can even use ostream::write.

Note that
   cout << byte(0);
is inefficient as byte(0) returns a char[9] "00000000" which is
known to have 8 printable chars in it.  Then the op<< calls
   operator<<(ostream&, const char *);
which counts the number of chars in the array all over again.  That's
why we would rather use fwrite or sputn or even write.  E.g.,
   std::cout.write(byte(0),8);
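A minimal sketch of that read/convert/write loop, assuming a table of 8-character entries (no NUL terminator) built at startup in place of the Byte class; `make_table` and `dump_bits` are hypothetical names. Input is read with `std::streambuf::sbumpc`, which, unlike `sgetc`, advances past the character it returns, and each byte goes out as exactly 8 characters through `sputn`, so nothing has to count up to a NUL.

```cpp
#include <array>
#include <iostream>
#include <sstream>
#include <string>

// 256 entries of exactly 8 '0'/'1' characters, most significant bit first.
typedef std::array<std::array<char, 8>, 256> BitTable;

BitTable make_table()
{
    BitTable t;
    for (int v = 0; v < 256; ++v)
        for (int bit = 0; bit < 8; ++bit)
            t[v][bit] = (v & (0x80 >> bit)) ? '1' : '0';
    return t;
}

// Copy every byte of 'in' to 'out' as 8 ASCII digits.
void dump_bits(std::istream& in, std::ostream& out)
{
    static const BitTable table = make_table();   // built on the first call
    std::streambuf* src = in.rdbuf();
    std::streambuf* dst = out.rdbuf();
    typedef std::char_traits<char> traits;
    for (traits::int_type c = src->sbumpc(); c != traits::eof();
         c = src->sbumpc())
        dst->sputn(table[c].data(), 8);   // sbumpc yields 0..255 for real bytes
}
```

For real files, open both streams with `std::ios::binary` so Windows text-mode translation does not alter the bytes.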

A better solution would be to change the table in class Byte from
   char table[256][9];
to
   myspace::TinyVec<char,8> table[256];

Then change the ctor of class Byte to reflect this change:

Byte::Byte()
{
     for (unsigned i=0; i<256; i++) table[i]=calculate(i);

}

And now overload the following operator

template <size_t N>
std::ostream& operator<<(std::ostream&, const myspace::TinyVec<char,N>&);
   // uses the ostream's streambuf's sputn function
   // or the ostream's write function

Quote:
>I'll be out of the office till Monday but I will be working on this at home so
>any replies asap would be greatly appreciated. Also, e-mail is welcome (and
>probably prefered since I'm not sure if I'll be able to check back here till
>Monday) email at my home address at:

Check www.dejanews.com, where all posts are archived.
So if your news server keeps messages for only 3 days, you can look there!

--
----------------------------------

----------------------------------



Sun, 13 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s

Quote:
>I'm writing a simple program to display binary data as ascii text to visually
>recognize certain patterns in the data. What I need is an efficient algorithm to
>convert my binary data to an ascii string.

Use a look up table.

ie:

const char *lookuptable[256] = {
"00000000",
"00000001",

/* etc */

"11111111"
};

and then do the following in your loop:

printf("%s", lookuptable[data]);

or, if you don't want to type in 256 strings, do it four bits at a time and make your lookup table only go to 16 (four-character strings, "0000" to "1111").

i.e.:

hinibble = data >> 4;

lonibble = data & 0x0F;

printf("%s%s", lookuptable[hinibble], lookuptable[lonibble]);
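A sketch of the four-bits-at-a-time variant, assuming the 16-entry table holds four-character strings; `nibbletable` and `append_byte` are hypothetical names. Each byte costs two lookups instead of one, in exchange for typing only 16 strings.

```cpp
#include <string>

// One 4-character string per nibble value, most significant bit first.
static const char* const nibbletable[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

// Append the 8-digit binary form of 'data' to 'out'.
void append_byte(std::string& out, unsigned char data)
{
    unsigned hinibble = data >> 4;    // bits 7..4
    unsigned lonibble = data & 0x0F;  // bits 3..0
    out.append(nibbletable[hinibble], 4);
    out.append(nibbletable[lonibble], 4);
}
```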

Hope that helps.

Roland

--
+----------------------------------------------+

| System Administrator - Comp Sci Course Union |
| http://gulf.uvic.ca/~rrabien                 |



Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
I wrote this code.  It does a 1MB file in 10 seconds:

#include <stdio.h>
        // fopen()
        // fclose()
        // fgetc()
        // FILE

#include <stdlib.h>
        // itoa() (non-standard; supplied by MSVC and some other compilers)

#include <string.h>
        // strlen(), strcpy(), strcat()

void Usage()
{
        printf("Gary Davies (c) 1998\n");
        printf("This program will produce a text ASCII version of any (binary) file.\n");
        printf("The file bin2text.txt will be produced.\n");
        printf("Usage:\n");
        printf("       bin2text [FILENAME]\n");
}

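/* Writes each byte as a zero-padded 3-digit decimal value (000-255),
   16 per line, each line prefixed by its 6-digit decimal offset. */
int main(int argc, char *argv[])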
{
        FILE *in;
        FILE *out;
        unsigned char Ch;
        char Str[ 200 ];
        char Str2[ 200 ];
        int Index;
        int Line_No;

        if( argc != 2 )
        {
                Usage();
                return 1;
        };

        if ((in = fopen( argv[1], "rb")) == NULL)
        {
                fprintf(stderr, "Cannot open input file.\n");
                return 1;
        };

        if ((out = fopen( "bin2text.txt", "wt")) == NULL)
        {
                fprintf(stderr, "Cannot open output file.\n");
                return 1;
        };

        Index = 0;
        Line_No = 0;
        while( !feof( in ) )
        {
                if( ( Index % 16 ) == 0 )
                {
                        itoa( Line_No, Str, 10 );
                        while( strlen( Str ) < 6 )
                        {
                                strcpy( Str2, "0" );
                                strcat( Str2, Str );
                                strcpy( Str, Str2 );
                        };
                        if( Index == 0 )
                                strcpy( Str2, "[");
                        else
                                strcpy( Str2, "\n[");
                        strcat( Str2, Str );
                        strcpy( Str, Str2 );
                        strcat( Str, "] " );
                        fputs( Str, out );
                        Line_No = Line_No + 16;
                };
                Index++;

                Ch = fgetc( in );
                if( !feof( in ) )
                {
                        itoa( Ch, Str, 10 );
                        while( strlen( Str ) < 3 )
                        {
                                strcpy( Str2, "0" );
                                strcat( Str2, Str );
                                strcpy( Str, Str2 );
                        };
                        strcat( Str, " ");
                        fputs( Str, out );
                };
        };
        fclose( in );
        fclose( out );
        return 0;

}



Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s

Quote:
>I'm writing a simple program to display binary data as ascii text to visually
>recognize certain patterns in the data. What I need is an efficient algorithm to
>convert my binary data to an ascii string.

>Here's the algorithm I've written to do this: (pseudo code)

[code snipped]

>The original code worked with one byte at a time. It was way too slow. it took
>about 20 minutes to half an hour to convert 67k of bytes to text. I rewrote it
>to what I have here casting to unsigned long and using 4 bytes at a time and now
>it takes about 5 minutes. This is a little better but still too slow.

Have you profiled your code? That's the only sure way to find out
what's slowing it down.

However, it looks to me like the major overhead in your code is string
concatenation. Try working with a large buffer of characters (say, 1K
in size), and convert the input buffer in blocks. This will cut down
the number of concatenation operations by a factor of a thousand.
Pseudo code:

  while ( input buffer not empty ) {
    convert up to 1K of characters, placing output in output buffer
    "add" (concatenate) output buffer to output string
  }
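The block-at-a-time pseudo code above might look like this in C++, as a sketch: `convert_blocks` is a hypothetical name, and the 1K chunk size is simply the figure from the pseudo code. Each chunk is expanded into a fixed local buffer, and the result string is appended to once per chunk rather than once per bit.

```cpp
#include <cstddef>
#include <string>

std::string convert_blocks(const unsigned char* data, std::size_t size)
{
    const std::size_t CHUNK = 1024;   // input bytes converted per pass
    char outbuf[CHUNK * 8];           // 8 output characters per input byte
    std::string result;

    for (std::size_t pos = 0; pos < size; pos += CHUNK) {
        std::size_t n = (size - pos < CHUNK) ? size - pos : CHUNK;
        for (std::size_t i = 0; i < n; ++i)
            for (int bit = 0; bit < 8; ++bit)
                outbuf[i * 8 + bit] = (data[pos + i] & (0x80 >> bit)) ? '1' : '0';
        result.append(outbuf, n * 8);  // one concatenation per chunk
    }
    return result;
}
```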

Even faster would be to precalculate the necessary size of the output
string (8 characters per input byte, plus 1 for a terminating NUL) and
do the whole thing in a oner.
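A sketch of the "size it once, fill it in one pass" approach, assuming the result lives in a `std::string` (`convert_presized` is a hypothetical name). The string is created at its final length, pre-filled with '0', so the loop only writes the 1 bits and never reallocates; `std::string` keeps its own terminator, so the extra byte is only needed for a raw `char` buffer.

```cpp
#include <cstddef>
#include <string>

std::string convert_presized(const unsigned char* data, std::size_t size)
{
    std::string result(size * 8, '0');  // final length, every digit '0'
    for (std::size_t i = 0; i < size; ++i)
        for (int bit = 0; bit < 8; ++bit)
            if (data[i] & (0x80 >> bit))
                result[i * 8 + bit] = '1';  // only the 1 bits need writing
    return result;
}
```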

Quote:
>I haven't spent a lot of time thinking about this yet but does anyone have a
>more efficient way of doing this?? I would like to be able to do at least 150k
>in less than 2 minutes if possible.

The following code may give you some hints. Rather than working on a
fixed input buffer, it reads input from stdin and writes to stdout,
using a 256-entry lookup table to convert the input a byte at a time.
It is able to convert its own executable (approx 15K) in under a
second on my meagre P120.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy */

#define BUFFSIZE 1024

int
main( void ) {
  int i, j;
  char table[256][9];

  unsigned char inbuff[BUFFSIZE];
  char outbuff[BUFFSIZE*8];

  /* generate lookup table */
  for ( i = 0; i < 256; ++i ) {
    for ( j = 0; j < 8; ++j ) {
      table[i][7-j]
        = ((unsigned char)i & ((unsigned char)1)<<j) ? '1' : '0';
    }
    table[i][j]=0;
  }

  /* read from stdin in BUFFSIZE-sized chunks, writing
     converted data to stdout */
  do {
    size_t s = fread( inbuff, 1, BUFFSIZE, stdin );
    for ( i = 0; i < s; ++i ) {
      memcpy( &outbuff[i*8], table[inbuff[i]], 8 );
    }
    fwrite( outbuff, 8, s, stdout );
  } while ( !feof( stdin ) );

  /* terminate the string */
  printf( "\0" );

  return 0;

}

-- Mat.


Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s

[snip]

Quote:
>The following code may give you some hints.

[snip]

Sigh. Here's a fixed version :)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy */

#define BUFFSIZE 1024

int
main( void ) {
  int i, j;
  char table[256][8];

  unsigned char inbuff[BUFFSIZE];
  char outbuff[BUFFSIZE*8];

  /* generate lookup table */
  for ( i = 0; i < 256; ++i ) {
    for ( j = 0; j < 8; ++j ) {
      table[i][7-j]
        = ((unsigned char)i & ((unsigned char)1)<<j) ? '1' : '0';
    }
  }

  /* read from stdin in BUFFSIZE-sized chunks, writing
     converted data to stdout */
  do {
    size_t s = fread( inbuff, 1, BUFFSIZE, stdin );
    for ( i = 0; i < s; ++i ) {
      memcpy( &outbuff[i*8], table[inbuff[i]], 8 );
    }
    fwrite( outbuff, 8, s, stdout );
  } while ( !feof( stdin ) );

  /* terminate the string */
  fputc( '\0', stdout );

  return 0;

}

-- Mat.


Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s

Quote:
>Use a look up table.

Great solution.

[snip]

Quote:
>or, if you don't want to type in 256 strings, do it four bits at a
>time and make your look up table only go to 16.

No.  We could have the computer generate the 256 strings at the
start of the program.  Use any algorithm -- even a slow one --
to do this, as it will be done only once.  And if you want to be
even faster then you can write a program that will generate the
table, and then use this generated table in your source code.
Ie, write a program that writes part of a program.  But this
is a pain in the ____.
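A sketch of the "program that writes part of a program" approach: run it once, redirect its output to a header file, and #include the result. The array name `lookuptable` follows Roland's post; `write_table` is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Emit C source for a 256-entry table of 8-digit binary strings.
void write_table(std::ostream& os)
{
    os << "static const char lookuptable[256][9] = {\n";
    for (int v = 0; v < 256; ++v) {
        os << "    \"";
        for (int bit = 7; bit >= 0; --bit)
            os << (((v >> bit) & 1) ? '1' : '0');
        os << (v < 255 ? "\",\n" : "\"\n");   // no comma after the last entry
    }
    os << "};\n";
}
```

Calling `write_table(std::cout)` from a small main() and redirecting stdout to a file gives source that can be pasted into, or included by, the real program.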

Quote:
>printf("%s%s", lockuptable[hinibble], lookuptable[lonibble]);

In any case, fputn should be marginally faster, as it won't count the
chars in the string using strlen() and won't parse the "%s" formatting
code.  Actually, looking at my man page, there is no fputn; the
functions I meant are fread and fwrite:

size_t fread (void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

       The function fread reads nmemb elements of data, each size
       bytes long, from the stream pointed to by stream,  storing
       them at the location given by ptr.

       The  function  fwrite  writes nmemb elements of data, each
       size bytes long, to  the  stream  pointed  to  by  stream,
       obtaining them from the location given by ptr.

--
----------------------------------

----------------------------------



Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
Five minutes ... whew!  The code below takes a split second (on my p5-90) to
fill the 150KB DataBuffer with random chars, and write the appropriate 8*150KB
0's and 1's to OutputBuffer.  (Of course it took 6 minutes to printf
OutputBuffer, but saving it to a file would be quick.)  If you try this you may
need to increase the stack size from the default. ... Vince

#include <stdio.h>
#include <stdlib.h>
#define DATA_BUF_SIZE (150*1024)
int main (void) {
        unsigned char DataBuffer[DATA_BUF_SIZE];
        unsigned int i;
        // fill DataBuffer with random chars
        for (i=0; i<DATA_BUF_SIZE; i++) DataBuffer[i] = rand()%256;
        char OutputBuffer[8*DATA_BUF_SIZE+1];
        OutputBuffer[8*DATA_BUF_SIZE] = 0;
        // Bit-field layout is implementation-defined: this assumes (as MSVC
        // does) that b0 is the least significant bit. unsigned char members
        // keep BF one byte wide on common compilers, so reading the last
        // byte of DataBuffer does not touch memory past its end.
        typedef struct BF {
                unsigned char b0 : 1;
                unsigned char b1 : 1;
                unsigned char b2 : 1;
                unsigned char b3 : 1;
                unsigned char b4 : 1;
                unsigned char b5 : 1;
                unsigned char b6 : 1;
                unsigned char b7 : 1;
        } bitfield;
        BF *ch;
        for (i=0; i<DATA_BUF_SIZE; i++) {
                ch = (BF*) &DataBuffer[i];          // treat char as bitfield
                OutputBuffer[8*i]   = '0' + ch->b7; // 0 or 1
                OutputBuffer[8*i+1] = '0' + ch->b6;
                OutputBuffer[8*i+2] = '0' + ch->b5;
                OutputBuffer[8*i+3] = '0' + ch->b4;
                OutputBuffer[8*i+4] = '0' + ch->b3;
                OutputBuffer[8*i+5] = '0' + ch->b2;
                OutputBuffer[8*i+6] = '0' + ch->b1;
                OutputBuffer[8*i+7] = '0' + ch->b0;
        }
        // printf("%s", OutputBuffer); // better write it to a file or something
        return 0;
}
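The bit-field version depends on implementation-defined layout (which end of the byte `b0` occupies, and how large `BF` is). As a sketch, the same unrolled loop can use shifts instead, which gives the same digits on any compiler; `expand_shifts` is a hypothetical name, and `out` must have room for 8 characters per input byte.

```cpp
#include <cstddef>

// Write 8 ASCII digits per input byte into 'out', most significant bit first.
void expand_shifts(const unsigned char* in, std::size_t size, char* out)
{
    for (std::size_t i = 0; i < size; ++i, out += 8) {
        unsigned b = in[i];
        out[0] = static_cast<char>('0' + ((b >> 7) & 1));
        out[1] = static_cast<char>('0' + ((b >> 6) & 1));
        out[2] = static_cast<char>('0' + ((b >> 5) & 1));
        out[3] = static_cast<char>('0' + ((b >> 4) & 1));
        out[4] = static_cast<char>('0' + ((b >> 3) & 1));
        out[5] = static_cast<char>('0' + ((b >> 2) & 1));
        out[6] = static_cast<char>('0' + ((b >> 1) & 1));
        out[7] = static_cast<char>('0' + (b & 1));
    }
}
```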

___
   Vincent Fatica
   Syracuse University Mathematics

   http://barnyard.syr.edu/~vefatica/


Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
Here's another one which doesn't use all that memory for the OutputBuffer.
Instead it writes the roughly 1.2MB of output to a file, as it goes, 8 chars at
a time. It takes less than 2 seconds (P5-90). ... Vince

#include <stdio.h>
#include <stdlib.h>
#define DATA_BUF_SIZE (150*1024)
int main (void) {
        unsigned char DataBuffer[DATA_BUF_SIZE];
        unsigned int i;
        for (i=0; i<DATA_BUF_SIZE; i++) DataBuffer[i] = rand()%256;
        char OutputBuffer[8];
        // Bit-field layout is implementation-defined: this assumes (as MSVC
        // does) that b0 is the least significant bit of the byte.
        typedef struct BF {
                unsigned char b0 : 1;
                unsigned char b1 : 1;
                unsigned char b2 : 1;
                unsigned char b3 : 1;
                unsigned char b4 : 1;
                unsigned char b5 : 1;
                unsigned char b6 : 1;
                unsigned char b7 : 1;
        } bitfield;
        BF *ch;
        FILE *ofile = fopen("v:\\bits.txt", "wb");
        if (ofile == NULL) return 1;
        for (i=0; i<DATA_BUF_SIZE; i++) {
                ch = (BF*) &DataBuffer[i];      //treat char as bitfield
                OutputBuffer[0] = '0' + ch->b7; // 0 or 1
                OutputBuffer[1] = '0' + ch->b6;
                OutputBuffer[2] = '0' + ch->b5;
                OutputBuffer[3] = '0' + ch->b4;
                OutputBuffer[4] = '0' + ch->b3;
                OutputBuffer[5] = '0' + ch->b2;
                OutputBuffer[6] = '0' + ch->b1;
                OutputBuffer[7] = '0' + ch->b0;
                fwrite(OutputBuffer, 1, 8, ofile);
        }
        fclose(ofile);
        return 0;
}

___
   Vincent Fatica
   Syracuse University Mathematics

   http://barnyard.syr.edu/~vefatica/


Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
first ditch the c++. it doesn't help for something this simple. you could use
inline assembly. I haven't done any asm in a while but here is about what you will need.
it will be instant compared to the c++. even if you don't use it, do a c
routine. initialize the array to all '0' and on a per byte basis add the lsb..
(note that the asm will not compile as is...)

void BTC(char * buf, void * array, int size){
  mov edi,[buf] //destination
  mov esi,[array]       //source
  mov ebx,[size]
  cmp ebx,0     //check for null array
  jz done
loop1:
  lodsl         //can use w or b instead - load from source into eax
  mov ecx,32    //32 bits per dword (lsb comes out first)
loop2:
  shr eax,1     //eax=eax/2, carry=lsb
  mov byte [edi],'0'    //preload dest
  adc byte [edi],0      //add carry if any
  inc edi               //next byte
  loop loop2    //until ecx==0
  dec ebx       //and ebx==0
  jnz loop1
done:

}
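For anyone who would rather not use inline assembly, here is a sketch of the same trick in portable C++ (`btc_portable` is a hypothetical name, and `words` counts 32-bit words, which is how `size` appears to be used above): each output character is '0' plus the bit shifted out at the bottom. Like the assembly, it writes the least significant bit of each word first; because each word is loaded as a `uint32_t`, byte order also depends on the machine (on little-endian x86 the first input byte supplies the first 8 digits, lowest bit first).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// 'buf' must hold 32 * words chars.
void btc_portable(char* buf, const void* array, std::size_t words)
{
    const unsigned char* src = static_cast<const unsigned char*>(array);
    for (std::size_t w = 0; w < words; ++w) {
        std::uint32_t v;
        std::memcpy(&v, src + 4 * w, 4);   // avoids alignment/aliasing issues
        for (int bit = 0; bit < 32; ++bit) {
            *buf++ = static_cast<char>('0' + (v & 1)); // '0' plus the lsb
            v >>= 1;                                   // like shr eax,1
        }
    }
}
```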

On Wed, 25 Nov 1998 17:03:53 -0500, "David J. Rager"
[snip]



Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
Sure, assembly will be fast, but it's not for everyone. Here's another example,
almost in C.  It is essentially what he did. With 150KB of data, it executes in
0.34 seconds (P5-90, measured at the CMD prompt, and including initialization
with random chars) ... 0.48 seconds if the file writes are included. If his code
really took 5 minutes with 67KB of data, I must agree: ditch the ++.  ... Vince

#include <stdio.h>
#include <stdlib.h>
#define DATA_BUF_SIZE (150*1024)
int main (void) {
        unsigned char DataBuffer[DATA_BUF_SIZE];
        unsigned int i;
        for (i=0; i<DATA_BUF_SIZE; i++)
                DataBuffer[i] = rand()%256;
        char OutputBuffer[32];
        /* FILE *ofile = fopen("v:\\bits.txt", "wb"); */
        for (i=0; i<DATA_BUF_SIZE/4; i++) {
                /* assumes a 32-bit unsigned long (true for MSVC); on
                   little-endian x86 each group of 4 bytes comes out in
                   reverse byte order */
                for (int j=0; j<32; j++)
                        OutputBuffer[31-j] =
                                '0'+((((unsigned long *)DataBuffer)[i] & (1UL<<j)) ? 1 : 0);
                /* fwrite(OutputBuffer, 32, 1, ofile); */
        }
        /* fclose(ofile); */
        return 0;
}

>first ditch the c++. it doesn't help for something this simple. you could use
>inline. I haven't don't any asm in a while but here is about what you will need.
>it will be instant compared to the c++. even if you don't use it, do a c
>routine. initialize the array to all '0' and on a per byte basis add the lsb..
>(note that the asm will not compile as is...)

[code snipped]

___
   Vincent Fatica
   Syracuse University Mathematics

   http://barnyard.syr.edu/~vefatica/



Mon, 14 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
Wed, 25 Nov 1998 17:03:53 -0500: "David J. Rager"

Quote:

>I'm writing a simple program to display binary data as ascii text to visually
>recognize certain patterns in the data. What I need is an efficient algorithm to
>convert my binary data to an ascii string.

...

Quote:
>I haven't spent a lot of time thinking about this yet but does anyone have a
>more efficient way of doing this?? I would like to be able to do at least 150k
>in less than 2 minutes if possible.

>I'm running:
>Windows NT 4.0
>32 MB Ram (I know that's bad but tech support won't upgrade me :)
>200 MHz pentium

>I'm using MSVC 5.0 but all I'm looking for is efficient C/C++ code.

Using a memory mapped file or a memory buffer with pointers, as
opposed to doing a call for every byte, will be faster.

Also, it takes a long time to go through the formatter, so build up a
formatted string in memory as you go, then display the string with
puts().

With the compiler set to optimize, a 200MHz machine should be able to
go through 150KB in less than a second.

Now to the real issue -- looking for a pattern in data.  Why not just
display the data as a bitmap.  You could do some filtering on the data
also.  To make a bitmap, you just need to put a bitmap header in front
of the data, and add some padding zeros if needed.  The format of a
bitmap file is well documented in the MSVC online help.
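A minimal sketch of the bitmap idea, assuming a 1-bit-per-pixel Windows BMP so that every data bit becomes one pixel (0 black, 1 white) and the most significant bit of each byte is the leftmost pixel. The layout is a 14-byte file header, a 40-byte BITMAPINFOHEADER, a two-entry color table, then the rows, each padded to a multiple of 4 bytes and stored bottom-up. `make_bmp` and its `bytes_per_row` parameter are hypothetical; write the returned bytes to a .bmp file in binary mode to view it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;

static void put16(Bytes& v, std::uint16_t x)   // little-endian
{
    v.push_back(static_cast<unsigned char>(x & 0xFF));
    v.push_back(static_cast<unsigned char>(x >> 8));
}

static void put32(Bytes& v, std::uint32_t x)   // little-endian
{
    for (int i = 0; i < 4; ++i)
        v.push_back(static_cast<unsigned char>((x >> (8 * i)) & 0xFF));
}

// Wrap 'size' bytes in a 1-bpp BMP with 'bytes_per_row' (> 0) input bytes
// per scan line; a short final row is padded with zero (black) pixels.
Bytes make_bmp(const unsigned char* data, std::size_t size,
               std::size_t bytes_per_row)
{
    const std::size_t rows = (size + bytes_per_row - 1) / bytes_per_row;
    const std::size_t stride = (bytes_per_row + 3) / 4 * 4; // pad to 4 bytes
    const std::uint32_t pixels = static_cast<std::uint32_t>(stride * rows);
    const std::uint32_t offset = 14 + 40 + 8;  // file hdr + info hdr + palette
    Bytes bmp;

    // BITMAPFILEHEADER (14 bytes)
    bmp.push_back('B'); bmp.push_back('M');
    put32(bmp, offset + pixels);               // total file size
    put16(bmp, 0); put16(bmp, 0);              // reserved
    put32(bmp, offset);                        // start of pixel data
    // BITMAPINFOHEADER (40 bytes)
    put32(bmp, 40);
    put32(bmp, static_cast<std::uint32_t>(bytes_per_row * 8)); // width
    put32(bmp, static_cast<std::uint32_t>(rows)); // positive height: bottom-up
    put16(bmp, 1);                             // planes
    put16(bmp, 1);                             // bits per pixel
    put32(bmp, 0);                             // BI_RGB (uncompressed)
    put32(bmp, pixels);                        // image data size
    put32(bmp, 2835); put32(bmp, 2835);        // about 72 dpi
    put32(bmp, 2); put32(bmp, 0);              // colors used / important
    // Palette, stored blue-green-red-reserved: bit 0 = black, bit 1 = white
    put32(bmp, 0x00000000); put32(bmp, 0x00FFFFFF);

    // Rows are stored bottom-up, so emit the last data row first.
    for (std::size_t r = rows; r-- > 0; )
        for (std::size_t c = 0; c < stride; ++c) {
            std::size_t i = r * bytes_per_row + c;
            bmp.push_back(static_cast<unsigned char>(
                c < bytes_per_row && i < size ? data[i] : 0));
        }
    return bmp;
}
```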

Good luck,
G. Levand

//------------------------------------------
#include <windows.h>
#include <assert.h>     // for assert().
#include <stdio.h>      // for puts().
#include <string.h>     // for memcpy().
#pragma hdrstop

const int wordSize = 8;
const int tableLen = 256;

void MakeTable(char abTable[][wordSize], UINT uBytes)
{
   assert(uBytes == wordSize * tableLen);

   for(int byte = 0; byte < tableLen; byte++)
   {
      for(int bit = 0; bit < wordSize; bit++)
         abTable[byte][bit]
            = (byte & (1 << (wordSize - 1 - bit))) ? '1' : '0';
   }

}//-----------------------------------------

int main()
{
   char abTable[tableLen][wordSize];

   MakeTable(abTable, sizeof(abTable));

   // setup map here (elided): hFile comes from CreateFile, and
   // MapViewOfFile takes the handle returned by CreateFileMapping
   // (hMap, a placeholder name), not hFile itself.
   const char* pcMap = (const char*)::MapViewOfFile(hMap, ...);
   const DWORD dwFileSize = ::GetFileSize(hFile, 0);

   char*const pcWorkingBuf = new char[dwFileSize * wordSize + 1];
   char* pcPut = pcWorkingBuf;

   for(DWORD dw = 0; dw < dwFileSize; dw++, pcPut += wordSize)
   {
      // pcMap[dw] (not *pcMap) so the input advances; the cast keeps
      // bytes >= 0x80 from becoming a negative index
      memcpy(pcPut, abTable[(unsigned char)pcMap[dw]], wordSize);
      // could put some formatting logic here like LF
      // every 10 bytes.
   }

   *pcPut = 0;
   puts(pcWorkingBuf);
   delete[] pcWorkingBuf;
   return 0;
}//-----------------------------------------



Wed, 16 May 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s
For one thing, that string concatenate operation is the killer. Using
the + operation on a CString in the fashion you are doing is a
guarantee that you are saying "I want to go SLOWWW..." Essentially,
your efficiency decreases as something like the square of the amount
of data; adding 1 character to a 4096-byte string requires (a)
allocating a 4097-byte string (b) copying the existing 4096-byte
string to the new string (c) freeing up the 4096-byte string. Note
that this tends to fragment storage badly, so your memory footprint
also grows. Overall, the algorithm you have is just about the worst
implementation that you could create (the problem with C++ is that it
hides these inefficiencies in ways that are not obvious, which is why
the notion of "clean" abstract interfaces in C++, or any other OOP, is
a complete myth; performance matters, and none of the specs describing
how these operations work include a performance specification).

I suspect that the byte-at-a-time vs. long-at-a-time may have to do
more with I/O efficiency than conversion efficiency. Here's an even
faster method: build a table of 256 strings, e.g., "00000000",
"00000001", ... "11111111". Now for each byte, just print the string
indexed by its value. Can't do much faster than this.

I am always suspicious of assembly code solutions to problems whose
performance is completely dominated by I/O time (once you change the
implementation so you avoid the memory allocator time problem you
have).

Another fast solution, if you want an in-memory copy of the string
instead of writing it out, is to ask for the file size in bytes, then
allocate a char * of 8 times that size, e.g.,

        CFile * file = ... whatever;
        DWORD n =  (DWORD)file->GetLength();
        n *= 8;
        TCHAR * p = new TCHAR[n + 1];   // +1 for the terminating NUL

now you can just add bytes to the end (NOT by using strcat! This gives
you n-squared performance!) of the string, or use the table lookup
hack I suggested to add 8 characters at a time. Just keep a
current-end-of-string position and strcpy to that pointer. You get a
big string (and remember this will hurt you in terms of paging
performance!) but you get it as quickly as it can be generated.
                                joe

[snip]

Joseph M. Newcomer

http://www3.pgh.net/~newcomer


Fri, 01 Jun 2001 03:00:00 GMT  
 Displaying binary data as ascii "1"'s and "0"'s

Quote:

> For one thing, that string concatenate operation is the killer. Using
> the + operation on a CString in the fashion you are doing is a
> guarantee that you are saying "I want to go SLOWWW..." Essentially,
> your efficiency decreases as something like the square of the amount
> of data; adding 1 character to a 4096-byte string requires (a)
> allocating a 4097-byte string (b) copying the existing 4096-byte
> string to the new string (c) freeing up the 4096-byte string. Note
> that this tends to fragment storage badly, so your memory footprint
> also grows.

imho what needs to be done here is to pre-allocate the buffer to the
full size needed (8 * len + 1 at minimum) via CString::GetBuffer() or
its STL equivalent, std::string::reserve().  The concatenation itself
is not the problem, it's the mechanism used to grow the buffer.
fwiw.  -steve
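A sketch of the same pre-allocation with `std::string`: after one `reserve(8 * len)` the capacity already covers every digit, so no `append` in the loop has to reallocate. `BinTable` and `to_binary_reserved` are hypothetical names, and the 256-entry table follows the lookup-table replies earlier in the thread.

```cpp
#include <cstddef>
#include <string>

namespace {
// 256 entries of 8 digits each, most significant bit first.
struct BinTable {
    char digits[256][8];
    BinTable() {
        for (int v = 0; v < 256; ++v)
            for (int bit = 0; bit < 8; ++bit)
                digits[v][bit] = (v & (0x80 >> bit)) ? '1' : '0';
    }
};
}

std::string to_binary_reserved(const unsigned char* data, std::size_t len)
{
    static const BinTable table;          // built once, on the first call
    std::string out;
    out.reserve(8 * len);                 // one allocation up front
    for (std::size_t i = 0; i < len; ++i)
        out.append(table.digits[data[i]], 8); // fits in reserved capacity
    return out;
}
```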

