Hello. I need help with an SSE2-accelerated MD4 implementation. I found one that I think may work at http://www.freerainbowtables.com/phpBB3/viewtopic.php?f=6&t=904&start=30
It is the post by Corni:

“So, I took a crashcourse in how to implement MD4 and in what the hell is SSE2 and how do you use it, and implemented the reference implementation in SSE…”

However, I need help putting it to use; it is a little over my head. All I want to do is compute four MD4 hashes simultaneously using SSE2.

I have this, for example:

char candidate0[]="password0";
char candidate1[]="password1";
char candidate2[]="password2";
char candidate3[]="password3";

I believe I need to pad them and append the length, as the MD4 algorithm requires, before feeding them into the compressSse() function, which will do the rest of the work of turning them into four MD4 hashes.
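The padding MD4 requires (RFC 1320) is: append one 0x80 byte, append zero bytes until the length is 56 modulo 64, then append the original length in bits as a 64-bit little-endian integer. A short candidate such as "password0" therefore fits in a single 64-byte block. The sketch below uses the hypothetical name `md4_pad_single_block` and only handles messages of up to 55 bytes, which covers typical brute-force candidates; longer input would need a second block.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Pads a message of at most 55 bytes into one 64-byte MD4 block, returned as
// sixteen 32-bit little-endian words. Returns false if the message is too long
// for a single block.
bool md4_pad_single_block(const std::string& msg, uint32_t w[16]) {
    if (msg.size() > 55) return false;
    unsigned char block[64] = {0};                // zero fill covers the padding bytes
    std::memcpy(block, msg.data(), msg.size());
    block[msg.size()] = 0x80;                     // the mandatory '1' bit
    const uint64_t bits = static_cast<uint64_t>(msg.size()) * 8;
    for (int i = 0; i < 8; ++i)                   // bit length, little-endian, bytes 56..63
        block[56 + i] = static_cast<unsigned char>(bits >> (8 * i));
    for (int i = 0; i < 16; ++i)                  // assemble little-endian 32-bit words
        w[i] = static_cast<uint32_t>(block[4 * i]) |
               static_cast<uint32_t>(block[4 * i + 1]) << 8 |
               static_cast<uint32_t>(block[4 * i + 2]) << 16 |
               static_cast<uint32_t>(block[4 * i + 3]) << 24;
    return true;
}
```

For "password0" (9 bytes) this yields w[0] = 0x73736170 ("pass"), w[2] = 0x00008030 ("0" followed by 0x80) and w[14] = 72, the length in bits.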

Does anybody know of a better SSE2-accelerated MD4 routine, one that is easier to use or faster? I also plan to use this for NTLM and MSCACHE, so I would like to keep the option of applying Unicode conversion or not. I think the implementation above may include the Unicode conversion; I would like to keep that step separate, so I can use the same code for both NTLM and plain MD4.
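Keeping the conversion separate is straightforward, because NTLM is defined as MD4 over the password encoded as UTF-16LE; the MD4 core itself never needs to know about Unicode. A minimal sketch of such a separate step, using the hypothetical name `to_utf16le` and assuming the input is ASCII or Latin-1 (each byte maps to the same code point, so it becomes that byte followed by 0x00). Real UTF-8 input would need a proper converter.

```cpp
#include <string>

// Widen an ASCII/Latin-1 string to UTF-16LE bytes: each input byte becomes
// (byte, 0x00). Assumption: no multi-byte (UTF-8) characters in the input.
std::string to_utf16le(const std::string& s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (unsigned char ch : s) {
        out.push_back(static_cast<char>(ch));  // low byte = code point
        out.push_back('\0');                   // high byte = 0 for U+0000..U+00FF
    }
    return out;
}
```

Plain MD4 would pad the candidate as-is, while NTLM would pad `to_utf16le(candidate)` instead. A password of up to 27 characters still fits in one 64-byte block, since 27 × 2 = 54 bytes leaves room for the 0x80 byte and the length.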

Here is pseudocode for how I plan to use it in a password brute-forcer:

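int cached_hashes=0;            //how many candidates are buffered; when it reaches 4, hash them
std::string candidate;          //the current candidate password from the generator
std::string plaintexts[4];      //the 4 buffered candidates, kept so a match can be printed
uint32_t ready_blocks[4][16];   //each candidate padded into one 64-byte MD4 block
uint32_t digests[4][4];         //MD4 output of each lane
uint32_t target[4];             //the hash we are trying to crack

	//for each candidate produced by the generator:
	//possibly Unicode conversion here, depending on hash type
	plaintexts[cached_hashes]=candidate;
	pad(candidate, ready_blocks[cached_hashes]);//hypothetical helper: 0x80, zeros, bit length
	if(++cached_hashes==4)//if 4 candidates are ready and stored...
	{
		//MD4 them all simultaneously; compressSse()'s real signature may differ,
		//here it is assumed to write one digest per input block into digests[]
		compressSse(ready_blocks, digests);
		for(int i=0;i<4;i++)//then check the results one by one
			if(std::memcmp(digests[i], target, 16)==0)
				std::cout<<"plaintext is "<<plaintexts[i]<<"!\n";
		cached_hashes=0;//start the next batch
	}
	//when the generator runs out, a partial batch (1-3 candidates) still has to be hashed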

I generate one candidate at a time, wait until I have collected four, and hash all four simultaneously. Then I check the results one by one.
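The batching described here can be packaged so the brute-forcer only pushes candidates and never manages the counter itself. A minimal sketch, assuming a hypothetical `Batch4` class whose callback would call the 4-way MD4 routine. It also shows one way to handle the last, partially filled batch, which is easy to forget: the empty lanes get a copy of the first candidate, and the callback is told how many lanes are real.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Buffers candidates and hands them to a 4-way hash routine in groups of four.
// `count` tells the callback how many lanes hold real candidates (1..4).
class Batch4 {
public:
    using HashFn = std::function<void(const std::array<std::string, 4>&, std::size_t count)>;
    explicit Batch4(HashFn fn) : fn_(std::move(fn)) {}

    void push(const std::string& candidate) {
        buf_[n_++] = candidate;
        if (n_ == 4) run();          // full batch: hash all four at once
    }

    // Call once the generator is exhausted so a partial batch is not lost.
    void flush() {
        if (n_ > 0) run();
    }

private:
    void run() {
        for (std::size_t i = n_; i < 4; ++i) buf_[i] = buf_[0];  // dummy data in unused lanes
        fn_(buf_, n_);
        n_ = 0;                      // start the next batch
    }

    HashFn fn_;
    std::array<std::string, 4> buf_;
    std::size_t n_ = 0;
};
```

Inside the callback, only lanes 0 to count-1 should be compared against the target hash.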

Once I have an SSE2-accelerated MD4 implementation up and running, I think I should be able to modify it into an SSE2-accelerated MD5 implementation.
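For the port, MD5 (RFC 1321) uses the same padding and the same four-lane layout, but it has four rounds instead of three, its own constant T[i] for each of its 64 steps, a fourth function I, and a step that adds b after the rotation. A minimal sketch of only those per-step pieces in SSE2; the function names are hypothetical, and the message-word order, shift table and constant table are left out.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Rotate each 32-bit lane left by s bits.
static inline __m128i rotl(__m128i x, int s) {
    return _mm_or_si128(_mm_sll_epi32(x, _mm_cvtsi32_si128(s)),
                        _mm_srl_epi32(x, _mm_cvtsi32_si128(32 - s)));
}

// MD5's auxiliary functions, applied lane-wise.
static inline __m128i md5_F(__m128i b, __m128i c, __m128i d) {  // (b & c) | (~b & d)
    return _mm_or_si128(_mm_and_si128(b, c), _mm_andnot_si128(b, d));
}
static inline __m128i md5_G(__m128i b, __m128i c, __m128i d) {  // (b & d) | (c & ~d)
    return _mm_or_si128(_mm_and_si128(b, d), _mm_andnot_si128(d, c));
}
static inline __m128i md5_H(__m128i b, __m128i c, __m128i d) {  // b ^ c ^ d
    return _mm_xor_si128(b, _mm_xor_si128(c, d));
}
static inline __m128i md5_I(__m128i b, __m128i c, __m128i d) {  // c ^ (b | ~d)
    const __m128i not_d = _mm_xor_si128(d, _mm_set1_epi32(-1));  // SSE2 has no NOT
    return _mm_xor_si128(c, _mm_or_si128(b, not_d));
}

// One MD5 step on four lanes: new a = b + rotl(a + f + x + t, s).
// Unlike MD4, b is added after the rotation and t differs for every step.
static inline __m128i md5_step(__m128i a, __m128i b, __m128i f, __m128i x, uint32_t t, int s) {
    const __m128i sum = _mm_add_epi32(_mm_add_epi32(a, f),
                                      _mm_add_epi32(x, _mm_set1_epi32(static_cast<int>(t))));
    return _mm_add_epi32(b, rotl(sum, s));
}
```

The register-role shifting loop from the MD4 case carries over unchanged; only the step body and the tables differ.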

Hey. I thought I would bring this thread up again, since there seems to be more traffic during the week. Does anybody know how I would go about this?