Help find variable length number in text file

Question

esdftw 0 Newbie Poster

14 Years Ago

Hi all,

I did a search on this site to find threads where people were talking about finding a variable length number in a text file and came up with no results. So, I'm asking :)

I have a text file with hundreds of lines of data in it. One piece of data I need is a variable length number in the middle of a string delimited by "|" without the quotes, and the other is found in "["count"] = xx" again without the outermost quotes and where "xx" is up to a four digit number that I need.

After I have found a way to identify these values (item ID and the number of those items) I need to output the results to a separate file.

c++


jonsca 1,059 Quantitative Phrenologist

14 Years Ago

Welcome. So what have you tried so far?

It might be easier if you posted 1-2 lines of your text file since it's sometimes tough to assimilate the information in the abstract.

Frederick2 189 Posting Whiz

14 Years Ago

That's confusing looking data. Does every line have the same structure (even though the data is different) with the same number of delimiters? I'm seeing pipe characters which you mentioned were being used as delimiters, but I'm also seeing commas.

dusktreader 137 Posting Whiz in Training

14 Years Ago

For this task, you should really use regular expressions. Standard C++ (before C++11) has no regular expression parser built in, so you will need to install a library. Boost has a good one that I know of, but there are others.

Now, you might be saying, "screw that, regular expressions seem too hard!" However, believe me that when you are facing complicated data, you want to use a flexible, pattern-matching technique. Regular expressions are the best and easiest way to look for specific text patterns in a heap of crazy data.
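For readers on a newer compiler: since C++11 the standard library ships a regular expression facility in <regex>, so no external library is strictly required. What follows is a minimal sketch, assuming the lines look like the sample posted in this thread. The function names find_item_id and find_count are made up for illustration, and the patterns are a guess at the data format rather than a tested specification.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds the number that follows "Hitem:" in a line.
// Returns false if the line has no such marker.
bool find_item_id(const std::string& line, int& id)
{
    static const std::regex pattern(R"re(Hitem:(\d+):)re");
    std::smatch match;
    if (!std::regex_search(line, match, pattern))
        return false;
    id = std::stoi(match[1].str());   // throws std::out_of_range on absurdly long numbers
    return true;
}

// Hypothetical helper: finds the number in a fragment like ["count"] = 1000
bool find_count(const std::string& line, int& count)
{
    static const std::regex pattern(R"re(\["count"\]\s*=\s*(\d+))re");
    std::smatch match;
    if (!std::regex_search(line, match, pattern))
        return false;
    count = std::stoi(match[1].str());
    return true;
}
```

Each function reports success through its return value, so a line that contains neither marker can simply be skipped.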

If you were open to other languages, you might check out Python for this task. Python handles text files better and more simply (in my opinion), and it has built-in regular expression functionality.

Another alternative is to use a Finite State Machine to match the expression yourself. You would build the FSM in code, and use it to parse input. This will not be easy, but it will also work well.
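To give an idea of what the hand-built state machine might look like, here is a rough sketch with two states: one that matches the marker text character by character, and one that collects the digits after it. The name scan_item_id and the marker "Hitem:" are assumptions taken from the sample data in this thread, and the restart logic is only correct because the marker's first character does not recur inside the marker.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: scans text for "Hitem:" and collects the digits
// that immediately follow it. Returns false if no digits were found.
bool scan_item_id(const std::string& text, std::string& digits)
{
    static const std::string marker = "Hitem:";
    enum State { Matching, Collecting };
    State state = Matching;
    std::size_t matched = 0;          // marker characters matched so far
    digits.clear();

    for (char ch : text) {
        if (state == Matching) {
            if (ch == marker[matched]) {
                if (++matched == marker.size())
                    state = Collecting;
            } else {
                // Restart; the current character may itself begin the marker.
                matched = (ch == marker[0]) ? 1 : 0;
            }
        } else {                      // Collecting
            if (ch >= '0' && ch <= '9')
                digits += ch;
            else
                break;                // first non-digit ends the number
        }
    }
    return !digits.empty();
}
```

This only reports the first marker in the text; a fuller version would return to the Matching state and keep going.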

In any case, I would suggest that you don't re-invent the wheel. Find yourself a library that can parse the patterns and use it.

Frederick2 189 Posting Whiz

14 Years Ago

This is the "Data.dat" file I made from your data and saved as a text file in the same dir as the executable...

["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,
["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",

Here is "Main.cpp", which is the main source file (not counting the String class in "Strings.h" and "Strings.cpp" to follow). Note I used CodeBlocks to create this. It is free software and very good. I actually prefer it to my Visual Studio 2008 Pro, which just puts you through a lot more hassle to start a project. Since I start a lot of small projects for testing purposes, that is a big issue for me. Anyway, if you are using Visual Studio you'll likely get piles of warnings about "unsafe" C runtime functions such as fopen. These can be turned off by adding _CRT_SECURE_NO_WARNINGS to the pre-processor definitions.
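For a single source file, an alternative to the project setting is to define the macro at the very top of the file, before any standard header is included. A small sketch follows; the helper file_exists is hypothetical and is only there to give the example something to compile.

```cpp
// Must come before the first #include so the CRT headers see it.
// Only Microsoft's C runtime looks at this macro; other compilers ignore it.
#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>

// Hypothetical helper: true if the file can be opened for reading.
bool file_exists(const char* path)
{
    std::FILE* fp = std::fopen(path, "r");
    if (!fp)
        return false;
    std::fclose(fp);
    return true;
}
```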

#include <tchar.h>
#include <stdio.h>
#include "Strings.h"

int main()
{
 char szBuffer[200];
 unsigned int iLn,iLn1;
 FILE* fp=NULL;
 String strLn;
 String* pLn1;
 String* pLn;

 fp=fopen("Data.dat","r");
 if(fp)
 {
    while(fgets(szBuffer,sizeof(szBuffer),fp))  //test fgets, not feof, so a failed read isn't processed
    {
       strLn=szBuffer;
       iLn=strLn.ParseCount('|');
       printf("%u\t%s\n",iLn,szBuffer);
       pLn=new String[iLn];
       strLn.Parse(pLn,'|');
       for(unsigned int i=0; i<iLn; i++)
       {
           if(i==2)
           {
              iLn1=pLn[i].ParseCount(':');
              pLn1=new String[iLn1];
              pLn[i].Parse(pLn1,':');
              if(iLn1>1)  //pLn1[1] only exists if the field contained a ':'
                 printf("\n\tThe Number You Are Looking For Is %s\n\n",pLn1[1].lpStr());
              delete [] pLn1;
           }
           printf("%u\t%s\n",i,pLn[i].lpStr());
       }
       delete [] pLn;
       printf("\n\n");
    }
    fclose(fp);
 }

 return 0;
}

/*
Output
==========================================================================================================

6       ["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,

0       ["link"] = "
1       cff1eff00

        The Number You Are Looking For Is 7931

2       Hitem:7931:0:0:0:0:0:0:0:80
3       h[Mithril Coif]
4       h
5       r",},[45] = {["count"] = 1000,



6       ["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",
0       ["link"] = "
1       cff0070dd

        The Number You Are Looking For Is 41165

2       Hitem:41165:0:0:0:0:0:0:0:80
3       h[Saronite Razorheads]
4       h
5       r",
*/

Here are the referenced "Strings.h" and "Strings.cpp" files. When you start a new project in whatever development environment you are using you need to include these two files in the project or you will get compiler/linker errors...

//Strings.h
#if !defined(STRINGS_H)
#define STRINGS_H
#include <tchar.h>                        //for TCHAR, so the header does not rely on the includer
#define EXPANSION_FACTOR      2
#define MINIMUM_ALLOCATION   16

class String
{
 public:
 ~String();                               //String Destructor
 String();                                //Uninitialized Constructor
 String(const TCHAR);                     //Constructor Initializes String With TCHAR
 String(const TCHAR*);                    //Constructor Initializes String With TCHAR*
 String(const String&);                   //Constructor Initializes String With Another String (Copy Constructor)
 String(const int);                       //Constructor Initializes Buffer To Specific Size
 String& operator=(const TCHAR);          //Assigns TCHAR To String
 String& operator=(const TCHAR*);         //Assigns TCHAR* To String
 String& operator=(const String&);        //Assigns one String to another (this one)
 bool operator==(const String);           //For comparing Strings
 String& operator+(const TCHAR);          //For adding TCHAR to String
 String& operator+(const TCHAR*);         //For adding null terminated TCHAR array to String
 String& operator+(const String&);        //For adding one String to Another
 String Left(unsigned int);               //Returns String of iNum Left Most chars of this
 String Right(unsigned int);              //Returns String of iNum Right Most chars of this
 String Mid(unsigned int, unsigned int);  //Returns String consisting of number of chars from some offset
 String Remove(const TCHAR*, bool);       //Returns A String With A Specified TCHAR* Removed
 int InStr(const TCHAR);                  //Returns one based offset of a specific TCHAR in a String
 int InStr(const TCHAR*, bool);           //Returns one based offset of a particular TCHAR pStr in a String
 int InStr(const String&, bool);          //Returns one based offset of where a particular String is in another String
 String LTrim();                          //Returns String with leading spaces/tabs removed
 String RTrim();                          //Returns String with spaces/tabs removed from end
 String Trim();                           //Returns String with both leading and trailing whitespace removed
 unsigned int ParseCount(const TCHAR);    //Returns count of Strings delimited by a TCHAR passed as a parameter
 void Parse(String*, TCHAR);              //Returns array of Strings in first parameter as delimited by 2nd TCHAR delimiter
 String CStr(const int);                  //Converts integer to String
 String CStr(const unsigned int);         //Converts unsigned int to String
 String CStr(const short int);            //Converts 16 bit int to String
 String CStr(const double);               //Converts double to String
 int iVal();                              //Returns int value of a String
 int LenStr(void);                        //Returns length of string
 TCHAR* lpStr();                          //Returns address of pStrBuffer member variable
 void Print(bool);                        //Outputs String to Console with or without CrLf

 private:
 TCHAR* pStrBuffer;
 int    iAllowableCharacterCount;
};
#endif

Strings.cpp

//Strings.cpp
//#define   _UNICODE
#include  <tchar.h>
#include  <stdlib.h>
#include  <stdio.h>
#include  <string.h>
#include  "Strings.h"


String::~String()   //String Destructor
{
 delete [] pStrBuffer;
}


String::String()    //Uninitialized Constructor
{
 pStrBuffer=new TCHAR[MINIMUM_ALLOCATION];
 pStrBuffer[0]=_T('\0');
 this->iAllowableCharacterCount=MINIMUM_ALLOCATION-1;
}


String::String(const TCHAR ch)  //Constructor: Initializes with TCHAR
{
 pStrBuffer=new TCHAR[MINIMUM_ALLOCATION];
 pStrBuffer[0]=ch;
 pStrBuffer[1]=_T('\0');
 iAllowableCharacterCount=MINIMUM_ALLOCATION-1;
}


String::String(const TCHAR* pStr)  //Constructor: Initializes with TCHAR*
{
 int iLen,iNewSize;

 iLen=_tcslen(pStr);
 iNewSize=(iLen/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
 pStrBuffer=new TCHAR[iNewSize];
 this->iAllowableCharacterCount=iNewSize-1;
 _tcscpy(pStrBuffer,pStr);
}


String::String(const String& s)  //Constructor Initializes With Another String, i.e., Copy Constructor
{
 int iLen,iNewSize;

 iLen=_tcslen(s.pStrBuffer);
 iNewSize=(iLen/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
 this->pStrBuffer=new TCHAR[iNewSize];
 this->iAllowableCharacterCount=iNewSize-1;
 _tcscpy(this->pStrBuffer,s.pStrBuffer);
}


String::String(const int iSize)  //Constructor Creates String With Custom Sized
{                                //Buffer (rounded up to paragraph boundary)
 int iNewSize;

 iNewSize=(iSize/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
 pStrBuffer=new TCHAR[iNewSize];
 pStrBuffer[0]=_T('\0');          //start empty so _tcslen() on a fresh String is safe
 this->iAllowableCharacterCount=iNewSize-1;
}


String& String::operator=(const TCHAR ch)  //Overloaded operator = for assigning a TCHARacter to a String
{
 this->pStrBuffer[0]=ch;
 this->pStrBuffer[1]=_T('\0');

 return *this;
}


String& String::operator=(const TCHAR* pStr)   //Overloaded operator = for assigning an Asciiz string to a String
{
 int iLen,iNewSize;

 iLen=_tcslen(pStr);
 if(iLen<this->iAllowableCharacterCount)
    _tcscpy(pStrBuffer,pStr);
 else
 {
    delete [] pStrBuffer;
    iNewSize=(iLen/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    pStrBuffer=new TCHAR[iNewSize];
    this->iAllowableCharacterCount=iNewSize-1;
    _tcscpy(pStrBuffer,pStr);
 }

 return *this;
}


String& String::operator=(const String& strRight)    //Overloaded operator = for assigning
{                                                    //another String to a String
 int iRightLen,iNewSize;

 if(this==&strRight)
    return *this;
 iRightLen=_tcslen(strRight.pStrBuffer);
 if(iRightLen < this->iAllowableCharacterCount)
    _tcscpy(pStrBuffer,strRight.pStrBuffer);
 else
 {
    //Assignment replaces the old contents; it must not append to them
    iNewSize=(iRightLen/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    delete [] this->pStrBuffer;
    this->pStrBuffer=new TCHAR[iNewSize];
    this->iAllowableCharacterCount=iNewSize-1;
    _tcscpy(pStrBuffer,strRight.pStrBuffer);
 }

 return *this;
}


bool String::operator==(const String strCompare)
{
 if(_tcscmp(this->pStrBuffer,strCompare.pStrBuffer)==0)  //strcmp
    return true;
 else
    return false;
}


String& String::operator+(const TCHAR ch)      //Overloaded operator + (Puts TCHAR in String)
{
 int iLen,iNewSize;
 TCHAR* pNew;

 iLen=_tcslen(this->pStrBuffer);
 if(iLen<this->iAllowableCharacterCount)
 {
    this->pStrBuffer[iLen]=ch;
    this->pStrBuffer[iLen+1]=_T('\0');
 }
 else
 {
    iNewSize=((this->iAllowableCharacterCount*EXPANSION_FACTOR)/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    pNew=new TCHAR[iNewSize];
    this->iAllowableCharacterCount=iNewSize-1;
    _tcscpy(pNew,this->pStrBuffer);
    delete [] this->pStrBuffer;
    this->pStrBuffer=pNew;
    this->pStrBuffer[iLen]=ch;
    this->pStrBuffer[iLen+1]=_T('\0');
 }

 return *this;
}


String& String::operator+(const TCHAR* pChar) //Overloaded operator + (Adds TCHAR literals
{                                             //or pointers to Asciiz Strings)
 int iLen,iNewSize;
 TCHAR* pNew;

 iLen=_tcslen(this->pStrBuffer)+_tcslen(pChar);
 if(iLen<this->iAllowableCharacterCount)
    _tcscat(this->pStrBuffer,pChar);
 else
 {
    iNewSize=(iLen*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    pNew=new TCHAR[iNewSize];
    this->iAllowableCharacterCount = iNewSize-1;
    _tcscpy(pNew,this->pStrBuffer);
    delete [] pStrBuffer;
    _tcscat(pNew,pChar);             //append to the new buffer, not the one just deleted
    this->pStrBuffer=pNew;
 }

 return *this;
}


String& String::operator+(const String& strRight)  //Overloaded operator + Adds Another String
{                                                  //to the left operand
 int iLen,iNewSize;
 TCHAR* pNew;

 iLen=_tcslen(this->pStrBuffer) + _tcslen(strRight.pStrBuffer);
 if(iLen < this->iAllowableCharacterCount)
 {
    if(this->pStrBuffer)
       _tcscat(this->pStrBuffer,strRight.pStrBuffer);
    else
       _tcscpy(this->pStrBuffer,strRight.pStrBuffer);
 }
 else
 {
    if(this->pStrBuffer)
    {
       iNewSize=(iLen*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
       pNew=new TCHAR[iNewSize];
       this->iAllowableCharacterCount=iNewSize-1;
       _tcscpy(pNew,this->pStrBuffer);
       delete [] pStrBuffer;
       _tcscat(pNew,strRight.pStrBuffer);
       this->pStrBuffer=pNew;
    }
    else
    {
       iNewSize=(iLen*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
       pNew=new TCHAR[iNewSize];
       this->iAllowableCharacterCount=iNewSize-1;
       _tcscpy(pNew,strRight.pStrBuffer);
       this->pStrBuffer=pNew;
    }
 }

 return *this;
}


String String::Left(unsigned int iNum)
{
 unsigned int iLen,i,iNewSize;
 String sr;

 iLen=_tcslen(this->pStrBuffer);
 if(iNum<iLen)
 {
    iNewSize=(iNum*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    sr.iAllowableCharacterCount=iNewSize-1;
    delete [] sr.pStrBuffer;         //release the default buffer before replacing it
    sr.pStrBuffer=new TCHAR[iNewSize];
    for(i=0;i<iNum;i++)
        sr.pStrBuffer[i]=this->pStrBuffer[i];
    sr.pStrBuffer[iNum]=_T('\0');
    return sr;
 }
 else
 {
    sr=*this;
    return sr;
 }
}


String String::Remove(const TCHAR* pToRemove, bool blnCaseSensitive)
{
 int i=0,j=0,iParamLen,iReturn=0;

 if(*pToRemove==0)
    return *this;                    //nothing to remove
 iParamLen=_tcslen(pToRemove);
 while(pStrBuffer[i])
 {
  if(blnCaseSensitive)
     iReturn=_tcsncmp(pStrBuffer+i,pToRemove,iParamLen);  //strncmp
  else
     iReturn=_tcsnicmp(pStrBuffer+i,pToRemove,iParamLen); //_strnicmp
  if(iReturn==0)
     i=i+iParamLen;                  //made a match: skip over it
  else
  {
     pStrBuffer[j]=pStrBuffer[i];    //keep this character
     j++, i++;
  }
 }
 pStrBuffer[j]=_T('\0');
 String sr=pStrBuffer;

 return sr;
}


String String::Right(unsigned int iNum)  //Returns Right$(strMain,iNum)
{
 unsigned int iLen,i,j,iNewSize;
 String sr;

 iLen=_tcslen(this->pStrBuffer);
 if(iNum<iLen)
 {
    iNewSize=(iNum*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
    sr.iAllowableCharacterCount=iNewSize-1;
    delete [] sr.pStrBuffer;         //release the default buffer before replacing it
    sr.pStrBuffer=new TCHAR[iNewSize];
    j=0;
    for(i=iLen-iNum;i<iLen;i++)
    {
        sr.pStrBuffer[j]=this->pStrBuffer[i];
        j++;
    }
    sr.pStrBuffer[iNum]=_T('\0');
    return sr;
 }
 else
 {
    sr=*this;
    return sr;
 }
}


String String::Mid(unsigned int iStart, unsigned int iCount)
{
 unsigned int iLen,i,j,iNewSize;
 String sr;

 iLen=_tcslen(this->pStrBuffer);
 if(iStart && iStart<=iLen)
 {
    if(iCount && iStart+iCount-1<=iLen)
    {
       iNewSize=(iCount*EXPANSION_FACTOR/MINIMUM_ALLOCATION+1)*MINIMUM_ALLOCATION;
       sr.iAllowableCharacterCount=iNewSize-1;
       delete [] sr.pStrBuffer;      //release the default buffer before replacing it
       sr.pStrBuffer=new TCHAR[iNewSize];
       j=0;
       for(i=iStart-1;i<iStart+iCount-1;i++)
       {
           sr.pStrBuffer[j]=this->pStrBuffer[i];
           j++;
       }
       sr.pStrBuffer[iCount]=_T('\0');
       return sr;
    }
    else
    {
       sr=*this;
       return sr;
    }
 }
 else
 {
    sr=*this;
    return sr;
 }
}


int String::InStr(const TCHAR ch)
{
 int iLen,i;

 iLen=_tcslen(this->pStrBuffer);
 for(i=0;i<iLen;i++)
 {
     if(this->pStrBuffer[i]==ch)
        return (i+1);
 }

 return 0;
}


int String::InStr(const TCHAR* pStr, bool blnCaseSensitive)
{
 int i,iParamLen,iRange;

 if(*pStr==0)
    return 0;
 iParamLen=_tcslen(pStr);
 iRange=_tcslen(pStrBuffer)-iParamLen;
 if(iRange>=0)
 {
    for(i=0;i<=iRange;i++)
    {
        if(blnCaseSensitive)
        {
           if(_tcsncmp(pStrBuffer+i,pStr,iParamLen)==0)   //strncmp
              return i+1;
        }
        else
        {
           if(_tcsnicmp(pStrBuffer+i,pStr,iParamLen)==0)  //_strnicmp
              return i+1;
        }
    }
 }

 return 0;
}


int String::InStr(const String& s, bool blnCaseSensitive)
{
 int i,iParamLen,iRange,iLen;

 iLen=_tcslen(s.pStrBuffer);
 if(iLen==0)
    return 0;
 iParamLen=iLen;
 iRange=_tcslen(pStrBuffer)-iParamLen;
 if(iRange>=0)
 {
    for(i=0;i<=iRange;i++)
    {
        if(blnCaseSensitive)
        {
           if(_tcsncmp(pStrBuffer+i,s.pStrBuffer,iParamLen)==0)  //strncmp
              return i+1;
        }
        else
        {
           if(_tcsnicmp(pStrBuffer+i,s.pStrBuffer,iParamLen)==0) //_strnicmp
              return i+1;
        }
    }
 }

 return 0;
}


String String::LTrim()
{
 unsigned int i,iCt=0,iLenStr;

 iLenStr=this->LenStr();
 for(i=0;i<iLenStr;i++)
 {
     if(pStrBuffer[i]==32||pStrBuffer[i]==9)
        iCt++;
     else
        break;
 }
 if(iCt)
 {
    for(i=iCt;i<=iLenStr;i++)
        pStrBuffer[i-iCt]=pStrBuffer[i];
 }

 return *this;
}


String String::RTrim()
{
 int iLenStr=this->LenStr();

 //Walk back from the end; also safe for an empty or all-blank String
 while(iLenStr>0 && (this->pStrBuffer[iLenStr-1]==32 || this->pStrBuffer[iLenStr-1]==9))
    iLenStr--;
 this->pStrBuffer[iLenStr]=0;

 return *this;
}


String String::Trim()
{
 this->LTrim();
 this->RTrim();

 return *this;
}


unsigned int String::ParseCount(const TCHAR c)  //returns one more than # of
{                                              //delimiters so it accurately
 unsigned int iCtr=0;                          //reflects # of strings delimited
 TCHAR* p;                                      //by delimiter.

 p=this->pStrBuffer;
 while(*p)
 {
  if(*p==c)
     iCtr++;
  p++;
 }

 return ++iCtr;
}


void String::Parse(String* pStr, TCHAR delimiter)
{
 unsigned int i=0;
 TCHAR* pBuffer=0;
 TCHAR* c;
 TCHAR* p;

 pBuffer=new TCHAR[this->LenStr()+1];
 if(pBuffer)
 {
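    p=pBuffer;
    *p=0;                  //start with an empty field
    c=this->pStrBuffer;
    while(*c)
    {
     if(*c==delimiter)
     {
        pStr[i]=pBuffer;
        p=pBuffer;
        *p=0;              //reset so an empty field is not a repeat of the previous one
        i++;
     }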
     else
     {
        *p=*c;
        p++;
        *p=0;
     }
     c++;
    }
    pStr[i]=pBuffer;
    delete [] pBuffer;
 }
}


int String::iVal()
{
 return _ttoi(this->pStrBuffer);  //atoi
}


String String::CStr(const int iNum)
{
 String sr;
 _stprintf(sr.pStrBuffer,_T("%d"),iNum);
 return sr;
}


String String::CStr(const unsigned int iNum)
{
 String sr;
 _stprintf(sr.pStrBuffer,_T("%u"),iNum);
 return sr;
}


String String::CStr(const short int iNum)
{
 String sr;
 _stprintf(sr.pStrBuffer,_T("%d"),iNum);
 return sr;
}


String String::CStr(const double dblNum)
{
 String sr(32);
 _stprintf(sr.pStrBuffer,_T("%f"),dblNum);
 return sr;
}


int String::LenStr(void)
{
 return _tcslen(this->pStrBuffer);
}


TCHAR* String::lpStr()
{
 return pStrBuffer;
}

void String::Print(bool blnCrLf)
{
 _tprintf(_T("%s"),pStrBuffer);
 if(blnCrLf)
    _tprintf(_T("\n"));
}

Let me know if you have any difficulties. With code optimization turned on for stripping the executable ('Os', 'O1' and minimum size) I get about 26K with CodeBlocks. The PowerBASIC Console Compiler exe came in around 21K. I have an older version of my string class not optimized for minimizing memory allocations, but when I used that it only knocked about a K off the exe size.

jonsca commented: Great Effort! +2


esdftw 0 Newbie Poster · Answer 1 · 2010-02-03T23:40:33+00:00

an extract from the text file:
--------------------

["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",
},
[45] = {
["count"] = 1000,
["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",
},
----------------------------------
The value after "Hitem" between those colons is what I need first. Also, some of the items have a "count" value associated with them, and some do not. The "count" value also needs to be located and kept with its item where appropriate.

esdftw 0 Newbie Poster · Answer 2 · 2010-02-06T04:04:00+00:00

That's confusing looking data. Does every line have the same structure (even though the data is different) with the same number of delimiters? I'm seeing pipe characters which you mentioned were being used as delimiters, but I'm also seeing commas.

mmhm, it's a whole jumble of crap in there which is making this much harder for me.

Frederick2 189 Posting Whiz · Answer 3 · 2010-02-06T05:19:57+00:00

I've never done it here but you can apparently attach files. If you can whittle your file down to just a few K, and identify the data you would like parsed out, I'll see what I can do.

Lerner 582 Nearly a Posting Maven · Answer 4 · 2010-02-06T09:49:33+00:00

Based on the representative file lines, the following strategy might work: sequential calls to find() and substr(), plus a stringstream to convert an STL string into a numerical data type. Below is a narrative description of a possible protocol and rough pseudocode of what it might look like:

Use a loop to read each line in the file into an STL string, one line at a time.
Sequentially evaluate the input string for either the flag substring Hitem: or ["count"] = using find(). Break the original string into two substrings, with the second substring starting at the index returned by the call to find() plus the length of the flag you found. Do another call to find() on that second substring, looking for a colon or a comma depending on which flag you found. Extract the characters between the start of the substring and the position of the delimiter into a string that represents the numerical value. Send that numerical string to a stringstream and read it into an int.

while(getline(inputFile, inputString))
   //look for Hitem:
   index = inputString.find("Hitem:", 0);
   if(index != string::npos)
       temp = inputString.substr(index + 6);
       index = temp.find(":");
       numericalSubstring = temp.substr(0, index);
       stringstream ss(numericalSubstring);
       ss >> desiredNumber;
   //look for ["count"] = using similar protocol for Hitem:

The pseudocode is pretty rough as it stands, but it looks plausible based on code sample posted and description of desired data to extract.
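For anyone who wants to compile it, the Hitem: half of the pseudocode might be fleshed out roughly as follows. This is a sketch under the same assumptions about the data; the function name item_id_from_line is made up, and the variable names follow the pseudocode.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts the number that follows "Hitem:" using
// find(), substr() and a stringstream. Returns false if the line has no
// item or the text after the flag is not a number.
bool item_id_from_line(const std::string& inputString, int& desiredNumber)
{
    const std::string flag = "Hitem:";
    std::string::size_type index = inputString.find(flag);
    if (index == std::string::npos)
        return false;

    // Everything after the flag, then cut at the next colon.
    const std::string temp = inputString.substr(index + flag.size());
    index = temp.find(':');
    const std::string numericalSubstring = temp.substr(0, index);

    std::stringstream ss(numericalSubstring);
    ss >> desiredNumber;
    return !ss.fail();
}
```

Using flag.size() instead of the literal 6 keeps the offset correct if the flag text is ever changed.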

esdftw 0 Newbie Poster · Answer 5 · 2010-02-06T10:37:59+00:00

ya regex was on my mind, BOOST or Perl or Python are fine and none of them seem overwhelming....i've just never used regex or BOOST or Perl or Python...and I'm a c++ nub anyways.

Frederick2 189 Posting Whiz · Answer 6 · 2010-02-07T02:02:33+00:00

Well, I'm not sure where the line breaks are in the above blurb you posted, and that would likely be critical information. That's why I asked you to attach a real file with just a few lines.

But since everybody is talking external libraries or other languages (in terms of other languages, if one were to attempt to come up with a worse language than C++ for this sort of work, one would be really up against it to find one. The only thing that comes to my mind would be assembler), I used PowerBASIC ( www.powerbasic.com ), which is my main programming language for desktop Windows. So, assuming this data is in a text file named Data.dat, here is the PowerBASIC program that parses it based on the pipe character as a delimiter, i.e., "|"

["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,
["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",

1st code followed by output...

#Compile Exe
#Dim All

Function PBMain() As Long
  Local fp As Integer
  Local strLn As String
  Local iLn As Long

  fp=Freefile
  Open "Data.dat" For Input As #fp
  Do While Not Eof(fp)
     Line Input #fp, strLn
     iLn=ParseCount(strLn,"|")
     Print iLn,strLn
  Loop
  Close #fp
  Waitkey$

  PBMain=0
End Function

'Output
'=================================================================================================================
' 6            ["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,
' 6            ["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",

What the above program indicates is that there are six (6) pipe "|" delimited fields in each line. Next step we'll actually parse the lines based on that delimiter...
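The same field count can be had in standard C++ by counting delimiters and adding one. A minimal sketch, with field_count as a made-up name standing in for ParseCount:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: number of fields in `line` when split on `delimiter`,
// i.e. one more than the number of delimiters it contains.
std::size_t field_count(const std::string& line, char delimiter)
{
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), delimiter)) + 1;
}
```

Each sample line contains five pipe characters, which is where the six fields come from.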

#Compile Exe
#Dim All

Function PBMain() As Long
  Local strArr() As String
  Local strLn As String
  Local fp As Integer
  Register i As Long
  Local iLn As Long

  fp=Freefile
  Open "Data.dat" For Input As #fp
  Do While Not Eof(fp)
     Line Input #fp, strLn
     iLn=ParseCount(strLn,"|")
     Print iLn,strLn  : Print
     Redim strArr(iLn-1)
     Parse strLn, strArr(), "|"
     For i=0 To UBound(strArr,1)
       Print i, strArr(i)
     Next i
     Print : Print
     Erase strArr()
  Loop
  Close #fp
  Waitkey$

  PBMain=0
End Function

' 6            ["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,
'
' 0            ["link"] = "
' 1            cff1eff00
' 2            Hitem:7931:0:0:0:0:0:0:0:80
' 3            h[Mithril Coif]
' 4            h
' 5            r",},[45] = {["count"] = 1000,
'
'
' 6            ["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",
'
' 0            ["link"] = "
' 1            cff0070dd
' 2            Hitem:41165:0:0:0:0:0:0:0:80
' 3            h[Saronite Razorheads]
' 4            h
' 5            r",

Examining the above data one sees that the third line (2nd zero based) has the Hitem with the sought after colon and number. That line appears to have multiple colons ( : ) as delimiters so one could just invoke Parse again on that line to extract the number...
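The two-stage parse described here (split on the pipe, then split field 2 on the colon) could be sketched in standard C++ along these lines. The helpers split and item_id_field are hypothetical stand-ins for Parse, and the field positions are taken from the sample output above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on `delimiter` into fields.
// Note that std::getline drops a trailing empty field.
std::vector<std::string> split(const std::string& text, char delimiter)
{
    std::vector<std::string> fields;
    std::istringstream stream(text);
    std::string field;
    while (std::getline(stream, field, delimiter))
        fields.push_back(field);
    return fields;
}

// Field 2 of the '|' split is "Hitem:<id>:..."; field 1 of its ':' split
// is the id. Returns an empty string if the line does not have that shape.
std::string item_id_field(const std::string& line)
{
    const std::vector<std::string> pipes = split(line, '|');
    if (pipes.size() < 3)
        return "";
    const std::vector<std::string> colons = split(pipes[2], ':');
    if (colons.size() < 2 || colons[0] != "Hitem")
        return "";
    return colons[1];
}
```

The size checks matter because lines such as "}," have no pipe-delimited fields at all.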

#Compile Exe
#Dim All

Function PBMain() As Long
  Local strArr(),strArr1() As String
  Local iLn,iLn1 As Long
  Local strLn As String
  Local fp As Integer
  Register i As Long

  fp=Freefile
  Open "Data.dat" For Input As #fp
  Do While Not Eof(fp)
     Line Input #fp, strLn
     iLn=ParseCount(strLn,"|")
     Print iLn,strLn  : Print
     Redim strArr(iLn-1)
     Parse strLn, strArr(), "|"
     For i=0 To UBound(strArr,1)
       If i=2 Then
          iLn1=ParseCount(strArr(i),":")
          Redim strArr1(iLn1-1)
          Parse strArr(i), strArr1(), ":"
          Print
          Print , "The Number You Are Looking For Is " strArr1(1)
          Print
          Erase strArr1()
       End If
       Print i, strArr(i)
     Next i
     Print : Print
     Erase strArr()
  Loop
  Close #fp
  Waitkey$

  PBMain=0
End Function

' 6            ["link"] = "|cff1eff00|Hitem:7931:0:0:0:0:0:0:0:80|h[Mithril Coif]|h|r",},[45] = {["count"] = 1000,
'
' 0            ["link"] = "
' 1            cff1eff00
'
'              The Number You Are Looking For Is 7931
'
' 2            Hitem:7931:0:0:0:0:0:0:0:80
' 3            h[Mithril Coif]
' 4            h
' 5            r",},[45] = {["count"] = 1000,
'
'
' 6            ["link"] = "|cff0070dd|Hitem:41165:0:0:0:0:0:0:0:80|h[Saronite Razorheads]|h|r",
'
' 0            ["link"] = "
' 1            cff0070dd
'
'              The Number You Are Looking For Is 41165
'
' 2            Hitem:41165:0:0:0:0:0:0:0:80
' 3            h[Saronite Razorheads]
' 4            h
' 5            r",

The count wouldn't be hard either. Anyway, as I've said, I'm not sure that is really the structure of your real data.
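As a rough illustration of how the count might be pulled out in C++, the sketch below looks for the key text and converts whatever digits follow it. The name count_from_line and the use of -1 for "no count" are choices made for this example, not something from the data format.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: looks for ["count"] = <number> in a line and
// returns the number, or -1 if the line has no count.
long count_from_line(const std::string& line)
{
    const std::string key = "[\"count\"] = ";
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos)
        return -1;

    const char* start = line.c_str() + pos + key.size();
    char* end = 0;
    const long value = std::strtol(start, &end, 10);
    return end == start ? -1 : value;   // end == start means no digits followed the key
}
```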

Years ago when I decided C++ was going to be an important language for me to learn & use, I wrote my own C++ implementations for most of the important things PowerBASIC does natively; particularly regarding Strings. So in my String class I have Parse() and ParseCount() member functions that operate exactly as PowerBASIC's shown above. I'll post C++ implementations of the above including my String class if you want.

dusktreader 137 Posting Whiz in Training · Answer 7 · 2010-02-07T02:42:00+00:00

But since everybody is talking external libraries or other languages (in terms of other languages, if one were to attempt to come up with a worse language than C++ for this sort of work, one would be really up against it to find one. The only thing that comes to my mind would be assembler), I used PowerBASIC ( www.powerbasic.com ), which is my main programming language for desktop Windows.

[rant]
Wow....just, wow. You're really bringing a BASIC derivative into the discussion? While people are talking about power houses like python and perl you suggest a BASIC compiler from 1989? "Here's a nickel kid, go buy yourself a real language."

By the way, there are worse languages for parsing strings than C++. Like, C for starters. But then there are the functional languages: Erlang, Haskell, ML, etc... It may be easier for you, an experienced PowerBASIC programmer (stifling chuckles), to parse strings in this language, but, please, spare others the pain of having to look at code that utilizes constructs like Redim strArr(iLn-1) [/rant]

esdftw 0 Newbie Poster · Answer 8 · 2010-02-07T12:32:52+00:00

I'll post C++ implementations of the above including my String class if you want.

PLEASE ! :)

esdftw 0 Newbie Poster · Answer 9 · 2010-02-08T05:00:08+00:00

Thank you all for the info and quick responses. I'll print this thread out and see what I can come up with. Will post back probably by mid-week with an update :)
