may pay money for vectorized MD5
I am currently using the optimized scalar 32-bit MD4 and MD5 implementations from here: http://freerainbowtables.com/phpBB3/...p?p=8454#p8454. I am getting ~8.1 million hashes/second for MD4 and ~6.6 million for MD5.
I am looking for an SSE2-accelerated implementation for Core 2, primarily MD5, which is easy to implement and at the very least twice as fast as my current speed. This will probably compute multiple hashes in parallel. Here is the format I am thinking of:
Each plaintext would be fed in as an array, as follows:
C++ Syntax (Toggle Plain Text)
    char candidate[10] = "plaintext";  /* 9 characters plus the terminating '\0' */
The array would then be padded and the length appended, or whatever is necessary to prepare it for the MD5 compress function. The candidate plaintexts should all be 32 bytes after padding (correct me if I am wrong). Each 32-byte plaintext would then be fed into a 2D array of the following structure:
C++ Syntax (Toggle Plain Text)
unsigned char vect2enc[4][32];
This buffer would hold four 32-byte padded plaintexts, for example. vect2enc would then be fed into an SSE2-accelerated MD5 compress function, and the resultant 32-byte MD5 hashes would be stored back in their respective elements in vect2enc, where I could refer to them one by one by array element.
The goal is to generate plaintexts one by one, pad them one by one, store them one by one in the 2D vect2enc array, encrypt the entire array simultaneously in SIMD parallel, and then test each element one at a time with memcmp(). I need the padding code (which can be optimized for length, as I will know the length of every plaintext before I encrypt it) and the compress function (which needn't encrypt only 4 hashes simultaneously; I have seen people get higher benchmarks encrypting 8 hashes simultaneously).
Anybody think they are up to it? Price estimate? Benchmark estimate? I am not committing to anything yet, just want to check around to see how much this would cost. And if I am misunderstanding this completely and for some reason it is impossible to do this in the method I have described, please educate me.
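On the 32-byte question: MD5 works on 64-byte (512-bit) blocks, and the digest is 16 bytes (it only looks like 32 when printed as hex), so each padded candidate occupies one 64-byte block. Below is a minimal sketch of single-block padding for a plaintext of known length, assuming the plaintext is at most 55 bytes so everything fits in one block. `pad_md5_block` is a hypothetical name, not something from the linked implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pads a message of at most 55 bytes into a single
// 64-byte MD5 block. Returns false if the message needs a second block.
bool pad_md5_block(const unsigned char* msg, std::size_t len,
                   unsigned char block[64]) {
    assert(msg != nullptr && block != nullptr);
    if (len > 55) return false;            // 1 marker byte + 8 length bytes must fit
    std::memset(block, 0, 64);             // zero fill
    std::memcpy(block, msg, len);          // the plaintext itself
    block[len] = 0x80;                     // mandatory single 1 bit after the message
    const std::uint64_t bits = static_cast<std::uint64_t>(len) * 8;
    for (int i = 0; i < 8; ++i)            // 64-bit bit length, little-endian
        block[56 + i] = static_cast<unsigned char>(bits >> (8 * i));
    return true;
}
```

For the 9-byte "plaintext" example, byte 9 becomes 0x80 and byte 56 becomes 72 (9 * 8 bits); everything else after the message is zero.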
Post padding isn't the optimal solution. Pre-padding is!
Do a 32-bit write for each word, with no loop!
C++ Syntax (Toggle Plain Text)
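    #include <stdint.h>

    unsigned char vect2enc[4][32];
    uint32_t *pAry = (uint32_t *) vect2enc;  /* assumes vect2enc is 4-byte aligned */
    *(pAry+0) = 0;                           /* one 32-bit store per word, fully unrolled */
    *(pAry+1) = 0;
    :
    *(pAry+31) = 0;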
Of course this would be much faster in assembly code.
Asm Syntax (Toggle Plain Text)
    mov eax,0
    mov [ebx+0],eax
    mov [ebx+4],eax
    mov [ebx+8],eax
    :
    mov [ebx+124],eax
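One way to read the pre-padding advice, sketched under that assumption: because the plaintext length is known in advance, the 0x80 marker, the zero fill and the length field can be written once per length, and each new candidate only overwrites the plaintext bytes. `PrePaddedBlock`, `make_template` and `set_candidate` are hypothetical names, and the block is taken to be MD5's 64 bytes rather than the 32 used earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container: one MD5 block whose padding is already in place.
struct PrePaddedBlock {
    unsigned char bytes[64];
    std::size_t len;   // plaintext length this template was built for
};

// Build once per plaintext length (assumes len <= 55, i.e. a single block).
PrePaddedBlock make_template(std::size_t len) {
    assert(len <= 55);
    PrePaddedBlock t{};                    // value-initialization zeroes every byte
    t.len = len;
    t.bytes[len] = 0x80;                   // padding marker
    const std::uint64_t bits = static_cast<std::uint64_t>(len) * 8;
    for (int i = 0; i < 8; ++i)            // little-endian bit length
        t.bytes[56 + i] = static_cast<unsigned char>(bits >> (8 * i));
    return t;
}

// Per candidate: only the plaintext bytes change, the padding stays put.
inline void set_candidate(PrePaddedBlock& t, const unsigned char* candidate) {
    std::memcpy(t.bytes, candidate, t.len);
}
```

With this layout the per-candidate cost is a short copy of `len` bytes; no zeroing or length encoding happens inside the hot loop.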
As to your SIMD: I've used SIMD in data encoding before, but I think you're under a misconception.
SIMD is 128-bit, so you can grab 16 characters at a time, but only from one character string, since they're sequential.
You would have to do four grabs, with a 16-byte offset, and then spend time swizzling the data.
Do a Google search on AoS (Array of Structures) vs SoA (Structure of Arrays).
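To make the AoS vs SoA point concrete, here is a rough sketch of the swizzle in portable C++, with no intrinsics. It assumes four already-padded 64-byte blocks and rearranges them so that word w of all four messages sits in 16 contiguous bytes, which is the layout a single 128-bit load wants. `interleave_blocks`, `kLanes` and `kWords` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;    // hashes processed side by side
constexpr int kWords = 16;   // 32-bit words per 64-byte MD5 block

// AoS input:  blocks[lane][byte]  (each message contiguous)
// SoA output: soa[word][lane]     (word w of every message contiguous)
void interleave_blocks(const unsigned char blocks[kLanes][64],
                       std::uint32_t soa[kWords][kLanes]) {
    assert(blocks != nullptr && soa != nullptr);
    for (int w = 0; w < kWords; ++w) {
        for (int lane = 0; lane < kLanes; ++lane) {
            const unsigned char* p = &blocks[lane][4 * w];
            // MD5 reads message words as little-endian, so build them explicitly.
            soa[w][lane] = static_cast<std::uint32_t>(p[0])
                         | static_cast<std::uint32_t>(p[1]) << 8
                         | static_cast<std::uint32_t>(p[2]) << 16
                         | static_cast<std::uint32_t>(p[3]) << 24;
        }
    }
}
```

This is the swizzling cost mentioned above: it is paid once per batch of four candidates, after which every step of the compress function can work on whole rows of `soa`.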
You can try to do this yourself. Take the original function and write it in simple C code, but vectorized. That is, orient the data for your 16-character handling! Then orient the code to process data in that fashion! You'll need working C code to benchmark the assembly code against, so keep the original and vectorized C code around.
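As an illustration of "simple C code, but vectorized", the sketch below writes one round-1 MD5 step twice: a scalar reference, and a version that runs the same arithmetic over four independent hashes held in SoA form. It is plain C++ rather than SSE2 intrinsics, and the function names are hypothetical; the step formula and the F function follow the MD5 specification (RFC 1321).

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;   // independent hashes per step

// Rotate left; s must be in 1..31.
inline std::uint32_t rotl32(std::uint32_t x, int s) {
    return (x << s) | (x >> (32 - s));
}

// Scalar reference: a = b + rotl(a + F(b,c,d) + m + t, s),
// with F(b,c,d) = (b & c) | (~b & d).
inline std::uint32_t md5_ff(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                            std::uint32_t d, std::uint32_t m, int s,
                            std::uint32_t t) {
    const std::uint32_t f = (b & c) | (~b & d);
    return b + rotl32(a + f + m + t, s);
}

// Same step applied lane by lane to four hashes at once.
inline void md5_ff_x4(std::uint32_t a[kLanes], const std::uint32_t b[kLanes],
                      const std::uint32_t c[kLanes], const std::uint32_t d[kLanes],
                      const std::uint32_t m[kLanes], int s, std::uint32_t t) {
    assert(s > 0 && s < 32);
    for (int i = 0; i < kLanes; ++i) {
        const std::uint32_t f = (b[i] & c[i]) | (~b[i] & d[i]);
        a[i] = b[i] + rotl32(a[i] + f + m[i] + t, s);
    }
}
```

Each line of the loop body maps onto a handful of 128-bit SSE2 operations (for example `_mm_and_si128`, `_mm_andnot_si128`, `_mm_or_si128`, `_mm_add_epi32`, and a pair of shifts plus an OR for the rotate), and the scalar version stays around as the benchmark and correctness reference.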
Last edited by wildgoose; Sep 21st, 2009 at 10:57 pm.