The background to this question is that I have discovered a virus on one of my computers that adds a block of encrypted ActiveX code to HTML and PHP pages. Unfortunately all the virus cleanup tools delete the infected files... being a web developer, this means a lot of my 'working files' would be deleted. Luckily none of them are current projects, but I'd really hate to lose my records of past developments.

Since I'm learning Java as part of my degree, I was wondering: could Java be used to search my hard drive for PHP/HTML files and remove the block of ActiveX code from them? Or could it take a text file listing all the file names (one of the antivirus programs gave me a list of all the infected files as a CSV) and remove the block from those files?

Sure it could.
If you can find something that uniquely identifies the start and end of the block you can use that as search criteria for the block and filter it all out.
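As a rough sketch of that idea, assuming the whole file has already been read into a String: find the start marker, find the end marker after it, and keep everything outside. The class name, method name and the two marker strings below are made up for illustration; the real markers would have to come from looking at an infected file.

```java
final class BlockStripper {
    /**
     * Returns text with the first region from startMarker through endMarker
     * (both inclusive) removed. If either marker is missing, the text is
     * returned unchanged.
     */
    static String removeBlock(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        if (start < 0) {
            return text; // no start marker: nothing to remove
        }
        int end = text.indexOf(endMarker, start + startMarker.length());
        if (end < 0) {
            return text; // start without end: leave the file alone
        }
        return text.substring(0, start) + text.substring(end + endMarker.length());
    }

    public static void main(String[] args) {
        // Placeholder markers, not the real virus signature.
        String page = "<p>hello</p><!--BAD-->junk<!--/BAD--><p>bye</p>";
        System.out.println(removeBlock(page, "<!--BAD-->", "<!--/BAD-->"));
        // prints: <p>hello</p><p>bye</p>
    }
}
```

Returning the text unchanged when a marker is missing is a deliberate choice here, so a clean file is never modified.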

Thanks for your response.

Well I think my best idea there is:
I always use valid html with only one html tag. The virus is tagged on the end of the page between two html tags, so if I can count the first html tag with a temp variable I can start and end deleting on the second ones.

Can you give me any example files where I could find the 'search through the hard drive' or 'go down a list of files opening them', and also the 'deleting a block of code'? I'm fairly new to java, so I've done opening one file and reading the contents with the command line, but never anything with multiple files... or deleting from a file.
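For the 'search through the hard drive' and 'go down a list of files' parts, something along these lines might be a starting point. It is only a sketch: the class and method names are invented, and it assumes the antivirus list has been reduced to one file path per line (the real CSV layout isn't known here, so it may need extra parsing).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileFinder {
    /** True if the file name ends in .html, .htm or .php (case-insensitive). */
    static boolean isWebFile(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm") || name.endsWith(".php");
    }

    /** Walks the directory tree under root and collects the matching files. */
    static List<Path> findWebFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(FileFinder::isWebFile)
                        .collect(Collectors.toList());
        }
    }

    /** Reads a text file containing one path per line (assumed format). */
    static List<Path> readFileList(Path listFile) throws IOException {
        try (Stream<String> lines = Files.lines(listFile)) {
            return lines.map(String::trim)
                        .filter(s -> !s.isEmpty())
                        .map(s -> Paths.get(s))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Start directory comes from the command line, defaulting to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findWebFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Walking an entire drive can hit folders the program isn't allowed to read, in which case Files.walk throws an exception, so it is probably easier to point it at the folder where the projects live.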

valid html with only one tag? Valid html requires ALL tags to be closed so if you don't use closing tags your html is invalid by definition ;)

Does the virus code always start with a fixed and unique sequence? If so, you can just load the files line by line and write everything to a memory buffer up to the first line with that sequence.
Then keep reading (but don't write) until the first line that's not part of the virus code.

When you're done, close the file and reopen it for writing, which clears the file content (opening a FileWriter or FileOutputStream without the append flag truncates the file). Write the entire buffer into the file and close it.
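The steps above could look roughly like this in Java. This is a sketch under assumptions: the class name and the two signature strings are hypothetical, and it assumes the last line of the virus block can be recognised by an end signature. If the end signature never turns up, everything after the start line is dropped, so test on copies first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class LineFilter {
    /**
     * Keeps every line except those from a line containing startSig
     * through the next line containing endSig (both lines included).
     */
    static List<String> dropBlock(List<String> lines, String startSig, String endSig) {
        List<String> kept = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            if (!skipping && line.contains(startSig)) {
                skipping = true; // virus block starts on this line
            }
            if (!skipping) {
                kept.add(line); // normal content goes into the buffer
            } else if (line.contains(endSig)) {
                skipping = false; // block ends here; resume on the next line
            }
        }
        return kept;
    }

    /** Reads the file, filters it in memory, then overwrites the file. */
    static void cleanFile(Path file, String startSig, String endSig) throws IOException {
        // ISO-8859-1 maps every byte to a character, so odd bytes don't abort the read.
        List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
        List<String> kept = dropBlock(lines, startSig, endSig);
        // Files.write truncates an existing file by default before writing.
        Files.write(file, kept, StandardCharsets.ISO_8859_1);
    }
}
```

One side effect to be aware of: Files.write puts the platform's line separator after every line, so line endings may change even in the untouched parts of the file.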

Of course if the virus doesn't (always) put itself on new lines of the files but inserts itself into existing lines you need to do a bit more.

yes but it's one <html> tag and the other is a </html> tag, so if I was searching for either string it would only appear once wouldn't it?

The virus always inserts a new block of code within <html> tags below the page content. I have a few example files in a passworded zip if you want to examine them in notepad? I can also change the file extensions to prevent them being executed if that helps?

edit: the body tag that is inserted also has a constant onload attribute, so yes I guess there is a sequence of things there.
Alternatively it could just delete from after the first </html> tag downwards, and that would always remove all the junk.

ps: the reason I have them in a zip is because I asked merijn and he said he'd see if he could help, and he wanted two samples of each file type in a zip.

Yes, delete anything below the first </html> tag; that should remove all illegal content. No sane person would allow more than one html block per page anyway, as it's not allowed under the HTML standards and any validating parser will barf over it.
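A minimal sketch of that approach, assuming the file content is already in a String and that nothing legitimate (for example trailing PHP) ever follows the first closing tag. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlTruncator {
    // Matches </html>, </HTML>, or "</html >" with optional whitespace before '>'.
    private static final Pattern CLOSING_HTML =
            Pattern.compile("</html\\s*>", Pattern.CASE_INSENSITIVE);

    /** Returns content cut off right after the first closing html tag. */
    static String truncateAfterFirstClosingHtml(String content) {
        Matcher m = CLOSING_HTML.matcher(content);
        // No closing tag found: return the content untouched.
        return m.find() ? content.substring(0, m.end()) : content;
    }

    public static void main(String[] args) {
        String infected = "<html><body>page</body></html>\n"
                + "<html><body onload=\"x()\">junk</body></html>";
        System.out.println(truncateAfterFirstClosingHtml(infected));
        // prints: <html><body>page</body></html>
    }
}
```

A regular expression is used instead of a plain indexOf so that differently cased tags such as </HTML> are also caught.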

Do you know where I could find some examples that look through a file and delete parts of it? I don't really know where to start...

experiment and you'll learn. I basically told you what to do in an earlier reply, all you have to do is translate that into code.

Happily Merijn has sent me a program that solved my problem. However, I would still like to try to make this program myself. Do you have any suggestions as to what commands I need? At present my knowledge of Java commands is limited to drawing circles and seeing if they collide... well, maybe a bit more than that.

Just a thought, ANTIVIRUS????? Any antivirus should have a "remove virus from file" option. All that I have seen do, anyway... have you tried Symantec AntiVirus version 7/8/9?

antivirus deletes the entire file... as a web developer this nearly wiped out all my files... So no, antivirus is not the solution ;)

Again, Symantec can remove it from the file - I've witnessed the effect myself! Maybe it depends on which antivirus you use? I know it's not your degree, but a C/C++ program would be VERY easy to make using the standard libraries.

OIC. I got Symantec free from uni anyway so I can't complain :) I still recommend C++, as there is a good file manipulation tutorial on the DaniWeb C/C++ tutorials forum - and there's the fact that I have done next to no Java programming...

Anything you can do with C++ you can do with Java (at least where file handling is concerned ;) ) .

Norton is probably the WORST antivirus product on the market, far from being the best.
It has just about the worst detection ratio I've ever seen. During one test I performed, a two-week-old McAfee found, in just a few minutes, 5 viruses that a freshly updated Norton had missed completely...

Well, it's done me fine so far. I'm a bit hesitant to upgrade as I have no idea how much it will cost, especially as the stores in my area are notorious for rip-off prices for software! And I got mine free! I'm interested in the comparison of these two antiviruses, as surely it comes down to the definitions and functionality in the end?

Just about all AV products cost pretty much the same which is between 60 and 80 Euro a year.
I've used quite a few over the years. McAfee is good but, like Norton, a bit too much of a worldwide standard and therefore a target for clever evasion tactics on the part of virus authors.

I'm now using Kaspersky which is good. Some false positives but I'd rather have those than false negatives like Norton is prone to giving.
