Updating a file: setting the offset

Question

Mushy-pea 36 What, you can change this tag?

17 Years Ago

Hello again. I am writing a bulletin board script at the moment (Perl BB, sounds good? :p). While writing some code to update a text file, I ran into the following situation. I need to open the file, append data to the end, then set the write offset to a particular point in the file and write new data over old data. The required offset will already have been found somewhere else in the program during a read-only open of the same file.

The appropriate way to open the file here therefore seems to be:

open(my $fh, '>>', 'post_log.db') or die "Cannot open post_log.db: $!";

I thought I could then use "seek" to change the offset and "print" to write the new data in. But will the offset jump back to EOF when I use "print" because of the ">>" mode? If so, will I need to use "syswrite" to solve the problem? Any answers appreciated.

Steven.
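A minimal sketch of what happens here, assuming POSIX-style append semantics (Linux, BSD, macOS): Perl opens a ">>" handle with O_APPEND, so "seek" moves the file position but every write is still placed at end-of-file. Switching from "print" to "syswrite" would not change that, because the flag belongs to the open file rather than to the write call. Opening with "+<" (read/write, no truncation) instead lets you append by seeking to the end and then overwrite at a known offset. The scratch file and its contents are hypothetical test data.

```perl
#!/usr/bin/perl
# Sketch: '>>' ignores seek() for writes; '+<' honours it.
# The scratch file and its ten-byte contents are hypothetical test data.
use strict;
use warnings;
use Fcntl qw(:seek);            # SEEK_SET, SEEK_CUR, SEEK_END
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "0123456789";
close $tmp or die "close: $!";

# Append mode: the file is opened with O_APPEND, so although seek() moves
# the position, the write below still lands at end-of-file.
open(my $app, '>>', $path) or die "open >> $path: $!";
seek($app, 2, SEEK_SET);
print {$app} "AB";
close $app or die "close: $!";

# Read/write mode: nothing is truncated, and writes go where seek() points.
open(my $rw, '+<', $path) or die "open +< $path: $!";
seek($rw, 0, SEEK_END) or die "seek: $!";   # jump to EOF to append
print {$rw} "CD";
seek($rw, 2, SEEK_SET) or die "seek: $!";   # jump back to offset 2
print {$rw} "xy";                           # overwrite two bytes in place
close $rw or die "close: $!";

open(my $in, '<', $path) or die "open < $path: $!";
my $result = do { local $/; <$in> };
close $in;
print "$result\n";                          # 01xy456789ABCD
```

The "AB" written through the ">>" handle ends up after "9" despite the seek to offset 2, while the "+<" handle both appends "CD" and overwrites "23" with "xy".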


KevinADC 192 Practically a Posting Shark

17 Years Ago

It sounds to me like you are trying to do something the hard way. Why would you need to write some new data over old data instead of using a delimited flat file or a real database?

KevinADC 192 Practically a Posting Shark

17 Years Ago

... so if this script is implemented well enough it will outperform all existing forum software solutions.
Steven.

I admire your ambition. Maybe if you post some sample data and what you want to do with that data, it will help clarify your original question. If not, you may want to look into syswrite(), sysseek(), etc.
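If you do go the sysseek()/syswrite() route, a minimal sketch follows, assuming unbuffered I/O is used for the whole update (perlfunc warns that mixing syswrite with print, seek and friends on the same handle can cause confusion, because of buffering). The important part is the O_RDWR flag passed to sysopen() without O_APPEND; with O_APPEND set, syswrite() would append just like print does. The file name, offset and record text are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: unbuffered read/write access with sysopen/sysseek/syswrite.
# The file name, offset and record text are hypothetical.
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek);   # O_RDWR, O_CREAT, SEEK_SET, SEEK_END
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/post_log.db";

# O_RDWR without O_APPEND: each write goes to the current file position.
sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "sysopen $path: $!";
defined syswrite($fh, "record-one|record-two|") or die "syswrite: $!";

# Appending means seeking to end-of-file first.
defined sysseek($fh, 0, SEEK_END) or die "sysseek: $!";
defined syswrite($fh, "record-three|") or die "syswrite: $!";

# Overwrite in place at an offset found earlier (e.g. by a read-only pass).
my $offset = 7;                 # position of "one" in "record-one"
defined sysseek($fh, $offset, SEEK_SET) or die "sysseek: $!";
defined syswrite($fh, "ONE") or die "syswrite: $!";
close $fh or die "close: $!";

open(my $in, '<', $path) or die "open $path: $!";
my $result = do { local $/; <$in> };
close $in;
print "$result\n";              # record-ONE|record-two|record-three|
```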


Mushy-pea 36 What, you can change this tag? · Answer 1 · 2006-10-20T06:24:11+00:00

It sounds to me like you are trying to do something the hard way.

That is pretty much true, but I have my reasons. Part of the challenge in this project is to write a single script that provides the features of existing "script + database app" solutions. Inter-process communication is slow, so if this script is implemented well enough it will outperform all existing forum software solutions.

Steven.

MattEvans 473 Veteran Poster Team Colleague Featured Poster · Answer 2 · 2006-10-20T09:42:38+00:00

I need to open the file, append data to the end, then set the write offset to a particular point in the file and write new data over old data. The required offset will already have been found somewhere else in the program during a read-only open of the same file.

To elaborate on what Kevin said, you're gonna find that working this way (^) gets more and more unworkable as your project gets bigger. I'm also working on a BB script in Perl at the moment, and I've found that the best way to deal with data is to use small interlinked XML files that can gradually be transformed closer and closer to an HTML page as and when data changes. This ensures important data can't get damaged or overwritten accidentally, ensures data is only ever updated when it needs to be, and lets me do the final rendering process using different template files to get a different visual output depending on who is logged in. Plus, it doesn't use databases, it can be something other than a BB (a blog, a profile site, even a complex dynamic site), all the data and template files are easily human-readable, and it's relatively fast, at least compared to XSL transforms and the like.

Well, that wasn't a blatant plug or anything :)

Some advice, though:

- Fuse your read-only and writing operations: read data in and write data out up until the point where you want to write new data, write the new data, then write the old data till the end of the file.

- Alternatively, use a different file wherever a value or value group is likely to change. Instead of reading "the value" from an arbitrary place in one long stream of data from file X, always read and write it from/to file Y. You can always reference file Y by name IN file X, so you won't have to make as many decisions in Perl. You'll maintain considerably more data integrity than by rewriting over part of a stream; your process may be faster and simpler (no need to calculate offsets and lengths); your data files will be shorter (no need to use fixed-length strings); elements of data can be reused by other data files; and your data files can take on more of a role in the processing (meaning your userbase will have more control of the process)... There are too many virtues to list, really!
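The first suggestion (stream the old data into a new file, write the new data at the right point, then copy the rest) might look something like the sketch below. It assumes byte offsets like the ones in the original question; splice_file() and its arguments are hypothetical names. The final rename() swap is an added assumption rather than part of the advice: on POSIX filesystems it replaces the old file in one step, so other processes reading the file see either the old version or the complete new one.

```perl
#!/usr/bin/perl
# Sketch: rewrite a file through a temporary copy, splicing in new data.
# splice_file() and the sample data are hypothetical.
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Replace $old_len bytes at $offset with $new_data ($old_len = 0 inserts).
sub splice_file {
    my ($path, $offset, $old_len, $new_data) = @_;
    open(my $in, '<:raw', $path) or die "open $path: $!";
    # Temp file in the same directory, so rename() stays on one filesystem.
    # Note: tempfile() creates it with mode 0600.
    my ($out, $tmp_path) = tempfile(DIR => dirname($path), UNLINK => 0);
    binmode $out;

    my $head = '';
    defined read($in, $head, $offset) or die "read: $!";   # data before the edit
    print {$out} $head, $new_data;
    seek($in, $offset + $old_len, SEEK_SET) or die "seek: $!";  # skip old bytes
    my $buf;
    while (read($in, $buf, 65536)) {                       # copy the remainder
        print {$out} $buf;
    }
    close $in;
    close $out or die "close $tmp_path: $!";
    rename($tmp_path, $path) or die "rename: $!";
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/post_log.db";
open(my $fh, '>', $path) or die "open $path: $!";
print {$fh} "<links></links>\n<post_id>000001_00</post_id>\n";
close $fh or die "close: $!";

splice_file($path, length('<links>'), 0, '0042');   # insert inside <links>

open($fh, '<', $path) or die "open $path: $!";
my $result = do { local $/; <$fh> };
close $fh;
print $result;
```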

EDIT: Let me know if you want to see the data files I've been using. I've pretty much written an XML instruction language in XML for this project.

Mushy-pea 36 What, you can change this tag? · Answer 3 · 2006-10-20T18:34:46+00:00

Maybe if you post some sample data and what you want to do with that data it will help to clarify your original question.

Here is the file I'm currently using for testing purposes:

<header>
<type>Perl BB version 1.00 post database file</type>
<id>6F43 89AF 15DD 18AB</id>
<title>Post log database file for rochfest.co.uk public forum</title>
</header>
 
<database records=5>
<record>
<record length>0383</record length>
<links></links>
<post_id>000001_00</post_id>
<subject>Welcome to the general off-topic area!</subject>
<author>Steven Tinsley</author>
<date+time>Sunday 8th October, 12:30 AM</date+time>
<content>Hello and welcome to the general off-topic area.  Feel free to post about any subject (within the rules) in here.</content>
</record>
<record>
<record length>0339</record length>
<links></links>
<post_id>000002_00</post_id>
<subject>This is the second post.</subject>
<author>Steven Tinsley (again)</author>
~<data+time>Sunday 8th October, 5:23 PM</data+time>
<content>This is the second post in the database, not to be confused with the first!</content>
</record>
<record>
<record length>0289</record length>
<links></links>
<post_id>000003_00</post_id>
<subject>Now the post count is up to three.</subject>
<author>Mushy-pea</author>
<date+time>Recently.</date+time>#
<content>This record might help me to debug the script.</content>
</record>
<record>
<record length>0270</record length>
<links></links>
<post_id>000004_00</post_id>
<subject>Wow! Four posts!</subject>
<author>Mushy-peeeeeeeea!</author>
<date+time>VERY SOON!</date+time>
<content>Some girls are bigger than others....</content>
</record>
<record>
<record length>0297</record length>
<links></links>
<post_id>000005_00</post_id>
<subject>Five posts and still going strong....</subject>
<author>S.T.</author>
<date+time>Wednesday 11th October, 9:40 PM</date+time>
<content>Can the system handle five records?</content>
</record>
<record><record length>##158/record length><post_id></post_id><subject>null</subject><author></author><date+time>date</date+time><content>l</content></record></database>

As you can see, it roughly adheres to the principles of XML (as I understand them). The program is able to find and retrieve fields from the first five records using <post_id> as the primary key (and then prepare an HTML page based on these fields). The last record shows a botched attempt to add data. The thing is, being a BB system, there will need to be a way to link records together if they are in the same reply thread. The idea of the links field is that (in the first post in a thread) it will contain a list of the offsets of all other records in the reply thread, meaning it will never be necessary to perform more than one search operation per "view thread" request. It is this field I will need to update, as I mentioned. However, I think I will take Matt's advice to use a number of small linked XML files. Then I can truncate a file when necessary, write in the new data, then write the old data back after it, and save a lot of potential hassle. Thanks for the advice, guys.

Steven.
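For the plan described above (truncate the file at the edit point, write the new data, then write the old data back after it), a minimal in-place sketch might look like this. It assumes the part of the file after the edit point fits comfortably in memory; rewrite_in_place(), links_span() and the sample <links> content are hypothetical. Whenever the new data has a different length, every byte after the edit point moves, so any stored offsets (and a <record length> field like the one in the sample file) would need updating too.

```perl
#!/usr/bin/perl
# Sketch: in-place update by truncating at the edit point and writing back
# the new data followed by the saved tail. Names and data are hypothetical.
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempdir);

# Replace $old_len bytes at $offset with $new_data.
sub rewrite_in_place {
    my ($path, $offset, $old_len, $new_data) = @_;
    open(my $fh, '+<:raw', $path) or die "open $path: $!";
    seek($fh, $offset + $old_len, SEEK_SET) or die "seek: $!";
    my $tail = do { local $/; <$fh> } // '';   # old data after the edit
    seek($fh, $offset, SEEK_SET) or die "seek: $!";
    truncate($fh, $offset) or die "truncate: $!";
    print {$fh} $new_data, $tail;
    close $fh or die "close: $!";
}

# Return the offset and length of the text between <links> and </links>.
sub links_span {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    my $start = index($data, '<links>');
    my $end   = index($data, '</links>');
    die "no <links> field" if $start < 0 || $end < $start;
    $start += length '<links>';
    return ($start, $end - $start);
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/post_log.db";
open(my $out, '>', $path) or die "open $path: $!";
print {$out} "<links></links>\n<post_id>000001_00</post_id>\n";
close $out or die "close: $!";

rewrite_in_place($path, links_span($path), '0383');        # fill the empty field
rewrite_in_place($path, links_span($path), '0383 0722');   # then replace it

open(my $in, '<', $path) or die "open $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```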

KevinADC 192 Practically a Posting Shark · Answer 4 · 2006-10-21T02:42:49+00:00

If you want real scalability and flexibility, the XML files sound like a good (but a bit complicated) way to go; otherwise, what you are trying to do is easily accomplished using a delimited flat file system.
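As a rough illustration of a delimited flat file, here is a sketch assuming one post per line with pipe-separated fields; the field list, append_post() and find_post() are hypothetical. Adding a new post is exactly what ">>" mode is for, and a lookup by post_id is a single pass over the file.

```perl
#!/usr/bin/perl
# Sketch: one post per line, fields separated by '|'.
# The field layout and sub names are hypothetical.
use strict;
use warnings;
use File::Temp qw(tempdir);

my @FIELDS = qw(post_id links subject author date content);

sub append_post {
    my ($path, %rec) = @_;
    my @values = map { $rec{$_} // '' } @FIELDS;
    # This simple format has no escaping, so refuse values containing '|' or newlines.
    for my $v (@values) {
        die "value contains a delimiter: $v" if $v =~ /[|\n]/;
    }
    open(my $fh, '>>', $path) or die "open $path: $!";
    print {$fh} join('|', @values), "\n";
    close $fh or die "close: $!";
}

sub find_post {
    my ($path, $id) = @_;
    open(my $fh, '<', $path) or die "open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my %rec;
        @rec{@FIELDS} = split /\|/, $line, -1;   # -1 keeps empty trailing fields
        return \%rec if $rec{post_id} eq $id;
    }
    return;                                      # not found
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/post_log.txt";
append_post($path, post_id => '000001_00',
            subject => 'Welcome to the general off-topic area!',
            author  => 'Steven Tinsley');
append_post($path, post_id => '000002_00', links => '000001_00',
            subject => 'This is the second post.',
            author  => 'Steven Tinsley (again)');

my $post = find_post($path, '000002_00');
print "$post->{subject} by $post->{author}\n";
```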

MattEvans 473 Veteran Poster Team Colleague Featured Poster · Answer 5 · 2006-10-21T14:24:41+00:00

The best way to parse XML is to ensure it is always strict :) Have you looked at methods like XSLT for rendering HTML pages? I reckon you COULD even build an entire BB solution in XSL, although it'll get hella hacky in certain places.

With strict XML in Perl, though, it's very easy to write data in certain places, assuming you are outputting a new file rather than overwriting part of an old one. You can conditionally skip or preserve whole blocks of data, perform operations like counting topics/replies automatically, and import/embed unknown numbers of related files just by reading a whole glob of them. It may be worthwhile to act on/consider each tag rather than just looking for specific ones, though; it'll make your core system more useful in future.

Consider the fact that your final rendering process may have to read and buffer a substantial amount of your record file before it can complete; I paginate/split multiple records to 5 per file even if my final intention is to output 20 records at a time.
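A small sketch of that kind of pagination, assuming each record is already an XML fragment held in memory; write_pages() and the sample records are hypothetical, and the topics_page_N.xml naming simply mirrors the temporary file example in this post rather than any real module.

```perl
#!/usr/bin/perl
# Sketch: split a list of records into files of at most $per_page records.
# write_pages() and the sample records are hypothetical.
use strict;
use warnings;
use File::Temp qw(tempdir);

sub write_pages {
    my ($dir, $prefix, $per_page, @records) = @_;
    my @files;
    my $page = 0;
    # splice() removes up to $per_page records from the front each time.
    while (my @chunk = splice(@records, 0, $per_page)) {
        my $file = "$dir/$prefix$page.xml";
        open(my $fh, '>', $file) or die "open $file: $!";
        print {$fh} '<page number="', $page + 1, "\">\n",
                    map("$_\n", @chunk), "</page>\n";
        close $fh or die "close $file: $!";
        push @files, $file;
        $page++;
    }
    return @files;
}

my $dir     = tempdir(CLEANUP => 1);
my @records = map { qq{<topic id="$_"/>} } 1 .. 12;
my @files   = write_pages($dir, 'topics_page_', 5, @records);
print scalar(@files), " files\n";     # 3 files (5 + 5 + 2 records)
```

The rendering step can then load only the page files it needs instead of buffering the whole record set.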

I use permanent files like this:

<board>
<btitle>General Games</btitle>
<bdesc>General Discussion Regarding Computer Games</bdesc>
</board>

But that becomes this temporary file:

<board>
<breadcrumbs>
<breadcrumb href="../../index.xrm" name="forum" linked="yes">FusionGroupUK</breadcrumb>
<breadcrumb href="../index.xrm" name="group" linked="yes">FusionGames</breadcrumb>
<breadcrumb name="board" linked="no">General Games</breadcrumb>
</breadcrumbs>
<btitle>General Games</btitle>
<bdesc>General Discussion Regarding Computer Games</bdesc>
<bhottopic>
<topic href="a_test/index.xrm">
<ttitle>Testing Scopes</ttitle>
<tauthor>Guest</tauthor>
<tlastposter>Matt</tlastposter>
<tfirstdate>1161306255</tfirstdate>
<tlastdate>1161306000</tlastdate>
<tsummary>hello hello</tsummary>
<tpicture>a picture</tpicture>
</topic>
</bhottopic>
<btopics count="3" perpage="2" pages="1">
<page number="1" href="topics_page_0.xml"></page>
<page number="2" href="topics_page_1.xml"></page>
</btopics>
<btopiccount>3</btopiccount>
</board>

And then it's easy to make an HTML file from there.

I used to include embed instructions inside my permanent XML files and write the data handlers in Perl... Now I process them together with templates/handlers that look like this:

NOTE: There are 3 files following. The important one is the first. It can be used to build different board temporary files based on a query string, and it outputs topics into paged files rather than all on the same page. My temporary-file-to-HTML processors are very similar to these.

"Topic files" are found automatically in every folder that is a child of the working folder and that contains a temporary file called "topic.xml"... The same principle works for boards in groups and groups in my forum. The handler files (the last 2) are written with a deliberately small command set. They act like a conditional data trap and extract values from multiple files in order to output them into a single file.

NOTE2: There's no need for your processing system to be as complex as this! I had the same effect working fine with the same data handlers written in Perl; I've been expanding into total external customization, though, and that includes data-process customization. It does suffer a small loss in speed (constructing the handler/processing tree takes a finite time before any process can begin), but after that, there's no noticeable time difference however many source files are read.

<fuse:private variable="QUERY_boardpath">games/general</fuse:private>
<fuse virtualroot="{$QUERY_boardpath}" outputfile="board.xml" virtual="yes">
<fuse:private variable="forum_name"><fuse:leech href="../../forum.xml" nodepattern="forum/ftitle" virtual="yes"/></fuse:private>
<fuse:private variable="group_name"><fuse:leech href="../group.xml" nodepattern="group/gtitle" virtual="yes"/></fuse:private>
<fuse:embeds href="board.xsr" source="embed_board" virtual="yes">
<fuse:insert href="embed_board.xin" virtual="no">
<fuse:downstream variable="output_template">
<fuse:fast>
<board>
<breadcrumbs>
<breadcrumb href="../../index.xrm" name="forum" linked="yes">{$forum_name}</breadcrumb>
<breadcrumb href="../index.xrm" name="group" linked="yes">{$group_name}</breadcrumb>
<breadcrumb name="board" linked="no">{$board_title}</breadcrumb>
</breadcrumbs>
<btitle>{$board_title}</btitle>
<bdesc>{$board_desc}</bdesc>
{$all_topics}
</board>
</fuse:fast>
</fuse:downstream>
</fuse:insert>
</fuse:embeds>
<fuse:private variable="all_topics">
<fuse:embeds href="*/topic.xml" source="embed_topics" virtual="yes">
<fuse:insert href="embed_topics.xin" virtual="no">
<fuse:downstream variable="page_prefix">topics_page_</fuse:downstream>
<fuse:downstream variable="items_per_page">2</fuse:downstream>
<fuse:downstream variable="summary_length">15</fuse:downstream>
<fuse:downstream variable="hot_topic_template">
<bhottopic>
<topic href="{$topic_href}">
{$content}
</topic>
</bhottopic>
</fuse:downstream>
<fuse:downstream variable="topic_template">
<topic href="{$topic_href}">
{$content}
</topic>
</fuse:downstream>
<fuse:downstream variable="page_template">
<page number="{@eval($current_page + 1)}" href="{$page_href}"/>
</fuse:downstream>
<fuse:downstream variable="topics_template">
<btopics count="{$topic_count}" perpage="{$items_per_page}" pages="{$page_count}">
{$pages}
</btopics>
<btopiccount>{$topic_count}</btopiccount>
</fuse:downstream>
</fuse:insert>
</fuse:embeds>
</fuse:private>
</fuse>

<fuse:mask>
<fuse:psuedo:handler name="embed_board" tag="tag" data="data" params="params">
 <fuse:psuedo:case mode="postTag">
  <fuse:psuedo:select from="tag">
   <fuse:psuedo:local mode="btitle" variable="board_title" from="data"/>
   <fuse:psuedo:local mode="bdesc" variable="board_desc" from="data"/>
   <fuse:psuedo:data mode="board">{$output_template}</fuse:psuedo:data>
  </fuse:psuedo:select>
 </fuse:psuedo:case>
</fuse:psuedo:handler>
</fuse:mask>

<fuse:mask>
 <fuse:psuedo:handler name="embed_topics">
 
  <fuse:psuedo:output mode="preEmbeds" mask="yes" callmodes="create_arrays">
   <fuse:psuedo:output mode="create_arrays" callmodes="scopes,dates">
    <fuse:psuedo:scopeset mode="scopes" name="scopes" method="build"/>
    <fuse:psuedo:array mode="last_dates" name="dates" method="build"/>
   </fuse:psuedo:output>
  </fuse:psuedo:output>
  <fuse:psuedo:output mode="preEmbed" params="params" callmodes="build_scope,init_scope">
   <fuse:psuedo:scopeset mode="build_scope" source="scopes" method="push"/>
   <fuse:psuedo:scopeset mode="init_scope" source="scopes" method="father" index="#" callmodes="path,href">
    <fuse:psuedo:local mode="path" variable="topic_path">
     <fuse:psuedo:param from="params" key="path"/>
    </fuse:psuedo:local>
    <fuse:psuedo:local mode="href" variable="topic_href">
     <fuse:psuedo:data>{$topic_path}index.xrm</fuse:psuedo:data>
    </fuse:psuedo:local>
   </fuse:psuedo:scopeset>
  </fuse:psuedo:output>
  <fuse:psuedo:output mode="postTag" tag="tag" data="data">
   <fuse:psuedo:scopeset source="scopes" method="father" index="#">
    <fuse:psuedo:select from="tag">
     <fuse:psuedo:case mode="tlastdate" callmodes="extract,preserve">
      <fuse:psuedo:array mode="extract" source="dates" method="push" from="data"/>
      <fuse:psuedo:output mode="preserve"/>
     </fuse:psuedo:case>
     <fuse:psuedo:data mode="tmessage"><tsummary>{@string.summarize($data,$summary_length)}</tsummary></fuse:psuedo:data>
     <fuse:psuedo:local mode="topic" variable="content" from="data"/>
    </fuse:psuedo:select>
   </fuse:psuedo:scopeset>
  </fuse:psuedo:output>
  <fuse:psuedo:output mode="postEmbeds" callmodes="order,localize_count,sort,pop_hot,paginate,output" data="data">
 
   <fuse:psuedo:local mode="order" variable="sort_order">
    <fuse:psuedo:array source="dates" method="order" direction="1"/>
   </fuse:psuedo:local>
   <fuse:psuedo:local mode="localize_count" variable="topic_count">
    <fuse:psuedo:scopeset source="scopes" method="len"/>
   </fuse:psuedo:local>
 
   <fuse:psuedo:scopeset mode="sort" source="scopes" method="sort">
    <fuse:psuedo:data>{$sort_order}</fuse:psuedo:data>
   </fuse:psuedo:scopeset>
   <fuse:psuedo:scopeset mode="pop_hot" source="scopes" method="pop">
    <fuse:psuedo:data>{$hot_topic_template}</fuse:psuedo:data>
   </fuse:psuedo:scopeset>
   <fuse:psuedo:output mode="paginate" callmodes="init,group,finish_up,write_pages" mask="yes">
    <fuse:psuedo:case mode="init" callmodes="stack,pages,page_links,stack_counter" mask="yes">
     <fuse:psuedo:array mode="stack" name="embed_stack" method="build"/>
     <fuse:psuedo:array mode="pages" name="embed_pages" method="build"/>
     <fuse:psuedo:array mode="pages_links" name="page_links" method="build"/>
     <fuse:psuedo:array mode="stack_counter" name="len_embed_stack" source="embed_stack" method="len"/>
    </fuse:psuedo:case>
    <fuse:psuedo:scopeset mode="group" source="scopes" method="for-each" variable="scope_index" callmodes="stack,calculate">
     <fuse:psuedo:array mode="stack" source="embed_stack" method="push">
      <fuse:psuedo:data>{$topic_template}</fuse:psuedo:data>
     </fuse:psuedo:array>
     <fuse:psuedo:select mode="calculate" source="len_embed_stack">
      <fuse:psuedo:case mode="{$items_per_page}" callmodes="group,purge">
       <fuse:psuedo:array mode="group" source="embed_pages" method="push">
        <fuse:psuedo:array source="embed_stack" method="dump"/>
       </fuse:psuedo:array>
       <fuse:psuedo:array mode="purge" method="purge" source="embed_stack"/>
      </fuse:psuedo:case>
     </fuse:psuedo:select>
    </fuse:psuedo:scopeset>
    <fuse:psuedo:array mode="finish_up" source="embed_pages" method="push">
     <fuse:psuedo:array source="embed_stack" method="dump"/>
    </fuse:psuedo:array>
    <fuse:psuedo:array mode="write_pages" source="embed_pages" method="for-each" variable="current_page" callmodes="init,output,link">
     <fuse:psuedo:local mode="init" variable="page_href"><fuse:psuedo:data>{$page_prefix}{$current_page}.xml</fuse:psuedo:data></fuse:psuedo:local>
     <fuse:psuedo:output mode="output" mask="yes" virtual="yes" href="../{$page_href}">
      <fuse:psuedo:array source="embed_pages" method="shift"/>
     </fuse:psuedo:output>
     <fuse:psuedo:array mode="link" source="page_links" method="push">
      <fuse:psuedo:data>{$page_template}</fuse:psuedo:data>
     </fuse:psuedo:array>
    </fuse:psuedo:array>
   </fuse:psuedo:output>
   <fuse:psuedo:output mode="output" callmodes="localize,output">
    <fuse:psuedo:case mode="localize" callmodes="page_count,pages">
     <fuse:psuedo:local mode="page_count" variable="page_count">
      <fuse:psuedo:array source="page_links" method="len"/>
     </fuse:psuedo:local>
     <fuse:psuedo:local mode="pages" variable="pages">
      <fuse:psuedo:array source="page_links" method="dump"/>
     </fuse:psuedo:local>
    </fuse:psuedo:case>
    <fuse:psuedo:data mode="output">{$topics_template}</fuse:psuedo:data>
   </fuse:psuedo:output>
  </fuse:psuedo:output>
 </fuse:psuedo:handler>
</fuse:mask>