Sep 20th, 2009

may pay money for vectorized MD5

I am currently using the optimized scalar 32-bit MD4 and MD5 implementations from here: http://freerainbowtables.com/phpBB3/...p?p=8454#p8454. I am getting ~8.1 million hashes/second for MD4 and ~6.6 million for MD5.

I am looking for an SSE2-accelerated implementation for Core 2, primarily of MD5, that is easy to implement and at the very least twice as fast as my current speed. It will probably compute multiple hashes in parallel. Here is the format I am thinking of:

Each plaintext would be fed in as an array, as follows:

  // 9 characters plus the terminating NUL; an array cannot be assigned after its declaration
  char candidate[10] = "plaintext";

The array would then be padded and the length appended, or whatever is necessary to prepare it for the MD5 compress function. The candidate plaintexts should all be 32 bytes after padding (correct me if I am wrong). Each 32-byte plaintext would then be stored in a 2D array with the following structure:

  unsigned char vect2enc[4][32];

This buffer would hold four 32-byte padded plaintexts, for example. vect2enc would then be fed into an SSE2-accelerated MD5 compress function, and the resulting 32-byte MD5 hashes would be stored back in their respective elements of vect2enc, where I could refer to them one by one by array element.

The goal is to generate plaintexts one by one, pad them one by one, store them one by one in the 2D vect2enc array, hash the entire array in parallel with SIMD, and then test each element one at a time with memcmp(). I need the padding code (which can be optimized for length, as I will know the length of every plaintext before I hash it) and the compress function (which needn't be limited to 4 hashes at a time; I have seen people get higher benchmarks computing 8 hashes simultaneously).
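A minimal sketch of the padding step described above, assuming every candidate is at most 55 bytes long so that it fits in a single block. Note that MD5 processes 64-byte blocks and produces 16-byte digests, so each buffer would need 64 bytes per candidate rather than 32. The function name pad_md5_block is hypothetical and not taken from the linked implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pad one short message into a single 64-byte MD5 block.
// Assumes len <= 55, so the 0x80 marker and the 8-byte length both fit.
void pad_md5_block(const char* msg, std::size_t len, std::uint8_t block[64]) {
    std::memset(block, 0, 64);     // zero fill covers all the padding zeros
    std::memcpy(block, msg, len);  // message bytes at the start of the block
    block[len] = 0x80;             // a single 1 bit directly after the message
    const std::uint64_t bits = static_cast<std::uint64_t>(len) * 8;
    for (int i = 0; i < 8; ++i)    // message length in bits, little-endian, bytes 56..63
        block[56 + i] = static_cast<std::uint8_t>(bits >> (8 * i));
}
```

Because the length is known in advance, the position of the 0x80 byte and the contents of bytes 56..63 are the same for every candidate of that length, so only the message bytes need rewriting between candidates.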

Anybody think they are up to it? Price estimate? Benchmark estimate? I am not committing to anything yet, just want to check around to see how much this would cost. And if I am misunderstanding this completely and for some reason it is impossible to do this in the method I have described, please educate me.
Posted by dzhugashvili
Sep 21st, 2009

Re: may pay money for vectorized MD5

anybody?
Posted by dzhugashvili
Sep 21st, 2009

Re: may pay money for vectorized MD5

Post padding isn't the optimal solution. Pre-padding is!

Do a 32-bit write each time, with no loop!
  #include <cstdint>

  // 4 x 32 bytes = 128 bytes = 32 32-bit words; aligned so the word writes are safe
  alignas(16) unsigned char vect2enc[4][32];
  std::uint32_t *pAry = (std::uint32_t *) vect2enc;

  *(pAry+0) = 0;
  *(pAry+1) = 0;
  :
  *(pAry+31) = 0;

Of course this would be much faster in assembly code.
  mov eax,0
  mov [ebx+0],eax
  mov [ebx+4],eax
  mov [ebx+8],eax
  :
  mov [ebx+124],eax

As for SIMD: I've used it in data encoding before, but I think you're under a misconception.
An SSE2 register is 128 bits wide, so you can grab 16 characters at a time, but only from one character string, as they're sequential.
You would have to do four grabs, with a 16-byte offset, and then spend time swizzling the data.

Do a Google search on AoS (Array of Structures) vs. SoA (Structure of Arrays).
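To make the AoS vs. SoA distinction concrete, here is a small sketch of the swizzle in plain C++, assuming four messages per batch and 64-byte padded blocks (the MD5 block size). The names kLanes and aos_to_soa are illustrative only. In the AoS layout each message is contiguous; in the SoA layout word w of all four messages sits side by side, which is the shape one 128-bit register load wants:

```cpp
#include <cstdint>

constexpr int kLanes = 4;  // assumption: four messages, one per 32-bit lane of an SSE2 register

// AoS input:  blocks[lane][byte]  (each message contiguous)
// SoA output: words[word][lane]   (same word of every message contiguous)
void aos_to_soa(const std::uint8_t blocks[kLanes][64], std::uint32_t words[16][kLanes]) {
    for (int w = 0; w < 16; ++w) {
        for (int lane = 0; lane < kLanes; ++lane) {
            const std::uint8_t* p = &blocks[lane][4 * w];
            // MD5 reads message words as little-endian 32-bit integers.
            words[w][lane] = std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
                             std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
        }
    }
}
```

Writing the padded candidates straight into the SoA layout as they are generated would avoid this separate pass altogether.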

You can try to do this yourself. Take the original function and write it in simple C code, but vectorize it. That is, orient the data for your 16-character handling, then orient the code to process data in that fashion. You'll need working C code to benchmark the assembly code against, so keep both the original and the vectorized C code around.
Last edited by wildgoose; Sep 21st, 2009 at 10:57 pm.
Posted by wildgoose
Oct 4th, 2009

Re: may pay money for vectorized MD5

I think I found a solution that works for me. Thank you!
Posted by dzhugashvili
