Hi

I have been having this issue for a while and cannot figure out how to start doing this with Python. Actually, I'm not a programmer, but I would like to learn, not just for what the topic title says but, for sure, for its other advantages too.
I need a script that moves the entire text (100% of the text) from one .doc file to another. But it's not as easy as it sounds. There is not just one target .doc file; there can be many of them. All the target .doc files are always in the same folder (same path), but they don't all have the same name. There is only one .doc file FROM which I want to move the entire text. It is always in the same folder (same path) and always has the same file name.
The names of the targets are only similar, not the same, as I said before. Here is the point of the whole script:
Target .doc files have the names:
HD1.doc
HD2.doc
HD3.doc
HD4.doc
and so on

What I would like is to move the entire text (really all of it, it must be 100%) into the .doc file with the highest ( ! ) number. The target .doc files will always start with ''HD'' and will always look like the examples above.
It is possible that there is only one target file, just HD1.doc. Then ''1'' is the maximum number and the text is moved into that file.
Sometimes the target file is empty, but usually it won't be. If it isn't, the text should be added at the end of the existing text, starting on the first new line (no empty lines in between).
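The "highest number wins" rule can be sketched in Python as below. It assumes every target name is exactly HD<number>.doc, and pick_target is a hypothetical helper name. The key point is to compare the numbers as integers, because as plain strings "HD10.doc" sorts before "HD9.doc".

```python
import re

# Matches names such as "HD1.doc" or "HD12.doc" and captures the number.
HD_PATTERN = re.compile(r"^HD(\d+)\.doc$", re.IGNORECASE)

def pick_target(names):
    """Return the HD<number>.doc name with the largest number, or None."""
    numbered = []
    for name in names:
        match = HD_PATTERN.match(name)
        if match:
            # Compare as integers: as plain strings "HD10.doc" < "HD9.doc".
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]

print(pick_target(["HD1.doc", "HD2.doc", "HD10.doc", "HD9.doc", "notes.txt"]))  # HD10.doc
```

With a single HD1.doc in the list, the function returns HD1.doc, matching the one-file case described above.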
So, for example, suppose the target file with the maximum number in its name contains the following text:

a
b
c

The file from which I want to move the text contains:

d

This means I need the target file to contain this:

a
b
c
d
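The joining rule in this example (append on the next line, never leave an empty line, and accept an empty target) can be sketched on plain strings. This is only the text logic: a real .doc file is binary, so it would apply after the text has been pulled out of Word. merge_text is a hypothetical name.

```python
def merge_text(target, source):
    """Append source after target, separated by exactly one line break."""
    # Drop line breaks at the seam so no empty line appears in between.
    target = target.rstrip("\r\n")
    source = source.lstrip("\r\n")
    if not target:
        # An empty target simply receives the source text.
        return source
    return target + "\n" + source

print(merge_text("a\nb\nc\n", "d"))  # prints a, b, c, d on four lines
```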

Even though I'm not a Python programmer (or a programmer in any other language), I will do my best to learn so that I'm able to do this.

If someone could suggest anything, I would really appreciate it.

Thank you, best wishes.

Okay, so here is how I picture it.
Files 1-5:
hd1
hd2
hd3
hd4
hd5
And you want to be able to save everything from the file example.doc into them.
And the number of files can vary. Here's a simple script that asks how many files there are:

# Read the whole source file once
f = open("example.doc", "r")
data = f.read()
f.close()
# Ask the user how many files to write (raw_input is Python 2)
to_write = int(raw_input("How many files to write? "))
# Loop over hd1.doc ... hdN.doc
for x in range(1, to_write + 1):
    name = "hd" + str(x) + ".doc"
    # Mode "a" appends to an existing file and creates it if it is missing
    f = open(name, "a")
    f.write(data)
    f.close()
# That was the end
# That was the end

Hopefully this code works, I have not tried it out. Take a look and see if you can tell what exactly it's doing. Good luck

Thank you for your reply, hondros. However, I don't see why the user (me) should be asked how many files there are. The script shouldn't ask anything. It should just find the HD?.doc file where ? is the maximum number. If there is only one HD file (HD1.doc), then the maximum number is 1 and the text goes into HD1.doc. All the HD files are, and will be, in the same folder.
Also, why did you choose exactly 5 HD files? Just as an example? Because I, as the user, won't know how many HD files there are. The script needs to find this out.

If you just want to find the file with the highest value, that should be simple if the files all have the same format. Something like this:

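import os

high = 0
change_file = None
for i in os.listdir("."):  # file names in the current folder
    number = i[2:-4]  # "HD12.doc" -> "12"
    if i.startswith("HD") and i.endswith(".doc") and number.isdigit():
        if int(number) > high:  # compare as numbers, not single characters
            high = int(number)
            change_file = i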

Then you can just append the content to the change_file:

source_file = "example.doc"  # the file whose text is copied
f = open(source_file, "r")
to_file = f.read()
f.close()
f = open(change_file, "a")  # use append instead of 'w' to append the content
f.write(to_file)
f.close()

Hmm, just thinking: Python would not quite understand .doc files, so opening them in text read mode might miss something. If I were doing this, I would use read-binary and write-binary mode.

That's the easiest way to read and write files whose contents are unknown, because .doc files are definitely not just simple "Hello, this is my doc file" text, so Python could get confused.

#How you open as read-binary
open('file.doc','rb')

#How you open as write-binary
open('file.doc','wb')

Hope that helps :)
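Binary mode keeps the bytes intact, but it does not make .doc files mergeable. Word 97-2003 .doc files are OLE2 compound files, which always start with the same 8-byte signature, so appending one file's bytes to another generally will not give Word a merged document. A minimal check of that signature, with looks_like_ole2 as a hypothetical helper name:

```python
# Every OLE2 compound file (the container format of Word 97-2003 .doc files)
# starts with these 8 bytes.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def looks_like_ole2(header):
    """Return True if the given leading bytes carry the OLE2 signature."""
    return header[:len(OLE2_SIGNATURE)] == OLE2_SIGNATURE

# Usage: read the first bytes in binary mode ("rb") so nothing is decoded.
# with open("HD1.doc", "rb") as f:
#     print(looks_like_ole2(f.read(8)))
```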

Oh, you can't do that easily with normal Python!
You have to use interop.
Learn .NET and IronPython (or any .NET language) and use Microsoft.Office.Interop.Word.
Since .NET is Microsoft's own platform, Word can be accessed most easily from .NET.
That's what I would do.

Here's something that could help (it's a full code snippet):
http://www.ironpython.info/index.php/Replace_Text_within_a_Word_document

Namibnat I tried this code (as you suggested):

high = 0
for i in files:
    if i[2] > high:
        high = i[2]
        change_file = i
try:
    f = open(file, "r")
    to_file = f.read()
    f.close()
    f = open(change_file, "a") #Use append instead of 'w' to append the content
    f.write(to_file)

But it doesn't work. Nowhere does it mention that the script should use the HD files. I also presume I need to put the path of the folder containing all the HD files somewhere in the code. The same goes for the location and name of the file FROM which I want to move the text.

Paul Thompson: I need to end up with a .doc file (all the text together in the HD file with the highest number), so this probably won't work. Thank you anyway for the reply.

jcao219: combining the code from that link with Namibnat's idea, I think the final code should look something like the one below. But even here no paths are given, so how will the script know which files to check, and which two files to use (one of them is always the same, because I always move the text FROM it)?

import sys
import clr
import System
from System import DateTime
    
clr.AddReference("Microsoft.Office.Interop.Word")
import Microsoft.Office.Interop.Word as Word

def doc_replace_text(source_filename, tokens, values, destination_filename):

    missing = System.Type.Missing
    replaceAll = Word.WdReplace.wdReplaceAll

    word_application = Word.ApplicationClass()
    word_application.visible = False

    document = word_application.Documents.Open(source_filename)

    for i in files:
        if i[2] > high:
            high = i[2]
            change_file = i

    for i in range(len(tokens)):
        for r in document.StoryRanges:
            #print "i = %d, tokens[i] = %s, values[i] = %s" % (i, tokens[i], values[i])
            r.Find.Text = tokens[i]
            r.Find.Replacement.Text = values[i]
            r.Find.Wrap = Word.WdFindWrap.wdFindContinue
            r.Find.Execute(missing, missing, missing, missing, missing, missing, missing, missing, missing, missing, replaceAll, missing, missing, missing, missing)

    document.SaveAs(destination_filename)
    document.Close()
    document = None

    word_application.Quit()
    word_application = None

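You have to call it like this: doc_replace_text(source_filename, ["replace this"], ["with this"], destination_filename). The tokens and values arguments are lists, because the function loops over them by index.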

Yes, I know, but I cannot define ''with this'' (the target file into which I want to move the whole text) because I won't know exactly what this file is called. I have just three pieces of information about it:
- the target files always start with ''HD'' and are always .doc files
- they are always named HDX, where X is a number, and the text has to be moved to the HD file with the maximum number; if there is only one HD file, it is HD1.doc and therefore ''1'' is the maximum number
- all of the HD files are always in the same folder

But that's not enough: the script needs to find this file itself, so I cannot hard-code its name. Are you sure doc_replace_text will move the text? ''Replace text'' sounds to me like one piece of text is replaced with another. That is not ''moving'' text: replacing means the text that was there first is removed and new text is put in its place. I never want to remove anything, just move the text into the file.

I'm sorry. Can't help you much since I've never done Word interop before.
You have to go here for info:
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word(office.11).aspx

Ok thank you anyway for your time. Hopefully someone else could know.

I'm sure you can figure it out if you study it closely.
Read the msdn reference for word interop.
If you don't know the file names, you can make a file path walker.
http://lookherefirst.wordpress.com/2008/01/19/a-simple-non-recursive-directory-walker-in-python/

(or use os.walk)
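Since all HD files sit in one known folder, a single-folder listing is enough; os.walk would also descend into subfolders. A rough sketch that returns the full path of the highest-numbered HD file, assuming the names look like HD<number>.doc (highest_hd_path is a hypothetical name, and the folder path is whatever you pass in):

```python
import glob
import os
import re

def highest_hd_path(folder):
    """Return the full path of HD<n>.doc with the largest n in folder, or None."""
    best_number, best_path = -1, None
    # glob lists only this folder; os.walk would also visit subfolders.
    for path in glob.glob(os.path.join(folder, "HD*.doc")):
        match = re.match(r"HD(\d+)\.doc$", os.path.basename(path), re.IGNORECASE)
        if match and int(match.group(1)) > best_number:
            best_number, best_path = int(match.group(1)), path
    return best_path
```

On Windows the glob pattern is case-insensitive; on other systems only names starting with upper-case "HD" are listed.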

Then you can program the word interop.
Just have to look into it, none of us know exactly how to do it, but you can still do it. You have the necessary tools.
Think creatively.

Thank you jcao. I'm searching for additional information everywhere; that's why I reply so fast. I'm not a programmer, but I'm still trying to do this. I heard that with Python (the only programming language I'm interested in) it's not possible to do this with .doc files, but hopefully that information is wrong. I could contact some Word gurus, but I don't know the target name; the only thing I know is that it has to contain the highest number, so the only way to do this is to program it. If I knew the name of the target file, I might have a chance of finding the function inside MS Word.
Regarding the link you gave: Namibnat already gave me a little code that should find the correct target file, but it didn't work. The code at your link, even though I don't understand it, doesn't seem to find the correct file either; nowhere is anything like a ''max'' function mentioned. I see something that must be greater than 0 (a while loop), but that's not the maximum. The variable ''fullpath'' probably just stores a path and prints it; it doesn't move the text.

I think with the regular python (CPython) it might be impossible.
But IronPython, since it allows access to Microsoft libraries, can interact with Word.
You have to learn IronPython* before you can make a successful program that does Word interop.

*Or any other .NET language.

I have Python version 2.6.1. In case I haven't mentioned it yet, my OS is Windows XP Pro. Regarding IronPython: is it possible to get some kind of (ANY) plugin or add-on for this?

Why does nobody else want to look at my post? It has been a while now and I really need the script; could anyone offer me a suggestion? I have tried the code from the previous page of this topic, but it doesn't work. I also found this:

http://snippets.dzone.com/posts/show/2037

But it doesn't move the text. It also doesn't search for the correct file name (my rule for the correct one is in my first post).

Could someone suggest anything? Thank you.

Hi there,

This can be solved, but you will need extra modules/packages.
You will need to go to Mark Hammond's site for the win32COM modules: http://python.net/crew/skippy/win32/Downloads.html
or you can go to SourceForge: http://sourceforge.net/project/showfiles.php?group_id=78018

On pure Python you cannot chop and change .doc files.
What you will have to do is access the Windows Object models via the COM interface.
The process is very similar to Visual Basic scripting.

You will need to refer to the Word Application Object model: http://msdn.microsoft.com/en-us/library/kw65a0we%28VS.80%29.aspx

I will start you off on the code, but you will have to do the rest because I don't have time. Or someone else can contribute.

#!/usr/bin/env python

from time import sleep
import win32com.client
from win32com.client import Dispatch

wordApp = win32com.client.Dispatch('Word.Application')
wordApp.Visible=True
wordApp.Documents.Open('C:\\test.doc')
sleep(5)

This code works; I have tried it. It opens a Windows Word document. I just named mine 'test.doc'; yours might be 'HD1' or something.
You need to import the win32com packages as listed above.

From then on it's just a matter of trying things out with the Word Object model reference I listed above.

My feeling is it will be something like (no guarantees for this code as I just took a brief glance at the Object model);

HD1 = wordApp.Documents.Open('C:\\test.doc')   #HD1 word document as object.
HD1.Content.Select.Copy()     #Selects entire document and copies it.

Then you would have to open up the master document "HD" or something, and paste it into HD.

Very similar to VB.

With the win32COM packages, it's important that you use the Dispatch method, that's what enables Python and COM to interface together.
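Carried through to the end, the Dispatch approach might look roughly like this. It is an untested sketch under several assumptions: Windows, Microsoft Word and pywin32 are installed, both paths are absolute, and the target path has already been found by scanning the folder for the highest HD number. It uses Word object model members (Documents.Open, Range.Collapse, Range.FormattedText, Document.Save, Document.Close, Application.Quit) instead of the clipboard. It copies rather than deletes from the source, and Word's handling of the final paragraph mark may leave an extra empty paragraph, so try it on copies first. append_word_document is a hypothetical name.

```python
def append_word_document(source_path, target_path):
    """Append the formatted content of source_path to the end of target_path.

    Sketch only: needs Windows, Microsoft Word and pywin32 (win32com).
    Pass absolute paths, e.g. from os.path.abspath().
    """
    import win32com.client  # imported here so this file loads without pywin32

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False  # keep Word in the background, no window shown
    try:
        source = word.Documents.Open(source_path, ReadOnly=True)
        target = word.Documents.Open(target_path)
        end = target.Content   # Range covering the whole target document
        end.Collapse(0)        # 0 = wdCollapseEnd: empty range at the end
        end.FormattedText = source.Content.FormattedText  # copy text + formatting
        target.Save()
        source.Close(False)    # False: discard changes, the source is untouched
        target.Close()
    finally:
        word.Quit(0)           # 0 = wdDoNotSaveChanges; shuts down hidden Word
```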

Hope this helps.

Thank you for the reply.
However, the script shouldn't open anything visibly; Visible should be set to False so the user doesn't see it, otherwise it's not automated. What did you mean by ''test.doc''? I assume you meant the target file. The problem is that I will never know the target file name, because the script should work it out from the HDX file names. The correct target is the file where X is the maximum number, as I described above. So I cannot use a directory path to the target file, only to the source file (that one always has the same name and the same path).
If I do this:
wordApp.Visible=False
I won't see it, correct? So the script will open it, but again, I cannot give the correct path to the target file because I don't know its name. Also, why must the target file be opened but not the source?
Are you sure you haven't mixed up the variable here:

HD1 = wordApp.Documents.Open('C:\\test.doc') #HD1 word document as object.
HD1.Content.Select.Copy() #Selects entire document and copies it.

I think it should be the source file, because I'm copying the text FROM it, but I doubt this would work if I just use the name of the source file...
