I think you'll just have to search through it one bufferSize at a time... you'll certainly have to read it to see where the data is...
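A minimal sketch of that buffer-at-a-time search, assuming a plain byte pattern and a file opened in binary mode. The name find_all and the 1 MiB default chunk size are illustrative choices, not something from this thread. The last pattern.size() - 1 bytes of each chunk are carried into the next one so that a match straddling two reads is not missed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Returns the byte offset of every occurrence of `pattern` in the file,
// reading `chunk_size` bytes at a time (hypothetical helper).
std::vector<std::streamoff> find_all(const std::string& path,
                                     const std::string& pattern,
                                     std::size_t chunk_size = 1024 * 1024)
{
    assert(!pattern.empty() && chunk_size > 0);
    std::vector<std::streamoff> hits;
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return hits;

    const std::size_t overlap = pattern.size() - 1;
    std::vector<char> buf(overlap + chunk_size);
    std::size_t carried = 0;   // bytes kept from the previous chunk
    std::streamoff base = 0;   // file offset of buf[0]

    for (;;) {
        in.read(&buf[carried], static_cast<std::streamsize>(chunk_size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;   // nothing new to look at
        const std::size_t len = carried + got;

        const std::vector<char>::iterator first = buf.begin();
        const std::vector<char>::iterator last = buf.begin() + len;
        std::vector<char>::iterator it =
            std::search(first, last, pattern.begin(), pattern.end());
        while (it != last) {
            hits.push_back(base + (it - first));
            it = std::search(it + 1, last, pattern.begin(), pattern.end());
        }

        // Move the tail to the front so a match split across reads is seen.
        const std::size_t keep = std::min(overlap, len);
        if (len > keep)
            std::copy(buf.begin() + (len - keep), buf.begin() + len, buf.begin());
        base += static_cast<std::streamoff>(len - keep);
        carried = keep;
    }
    return hits;
}
```

Because the carried tail is one byte shorter than the pattern, a complete match can never sit entirely inside it, so no offset is reported twice.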
Infarction
Is there a way using file handling functions to search within a text file to find the position of certain text WITHOUT having to read each character/line and evaluating it?
Is it possible to find a file in a file cabinet without opening the cabinet?
No, the only way to find something is to look at the contents.
WaltP
Is there a way using file handling functions to search within a text file to find the position of certain text WITHOUT having to read each character/line and evaluating it?
the short answer is no, but you may not have to do this yourself; let the standard library do the work. For example, let's say I have a file of 'messages', where each message has distinct start and end characters. I would like to go through the file and locate the position of each start of a message. I would then index this (let's say with an STL map of each message # to its starting position in the file). Later on, I could ask for a particular message # by jumping to that position in the file (let's say with fseek) and read just that portion of the file.
here is an example of how to do this:
//23456<890<2345678<0123<567890
//45<789<1234567<90123456<890
#include <iostream>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
using namespace std;
int main()
{
vector<streampos> index ; // filepos where we find a '<'
// create index
{
ifstream file(__FILE__) ;
enum { BUFFER_SIZE = 1024*1024*256 } ; // a larger buffer can improve
vector<char> large_buffer(BUFFER_SIZE) ; // performance for very large files
file.rdbuf()->pubsetbuf( &large_buffer.front(), large_buffer.size() );
file >> noskipws ;
istream_iterator<char> begin(file), end ;
begin = find( begin, end, '<' ) ;
while( begin != end )
{
index.push_back( file.tellg() + streamoff(-1) ) ;
begin = find( ++begin, end, '<' ) ;
}
}
// verify that index contains the right offsets
copy( index.begin(), index.end(), ostream_iterator<streampos>(cout," ") ) ;
cout << '\n' ;
ifstream file(__FILE__) ;
for( vector<streampos>::size_type i = 0U ; i<index.size() ; ++i )
{ file.seekg( index[i] ) ; char c ; file.get(c) ; cout << c << ' ' ; }
cout << '\n' ;
}
here is the output:
g++ -std=c++98 -Wall ./create_index.cpp ; ./a.out
7 11 19 24 36 40 48 57 71 91 110 128 148 204 252 411 590 647 777 903 931 932 985 1018 1098 1099 1103 1104 1122 1123
< < < < < < < < < < < < < < < < < < < < < < < < < < < < < <
note: this is on a freebsd system; on windows each line ends with two characters (not one), so the offsets would differ.
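The message index described in the question could be sketched along these lines. This assumes single-character start and end markers and a file opened in binary mode, so that offsets are plain byte positions. The names MessageIndex, build_index and read_message are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <map>
#include <string>

// message number -> byte offset of its start marker (hypothetical type)
typedef std::map<std::size_t, std::streamoff> MessageIndex;

// One sequential pass over the file, recording where each message starts.
MessageIndex build_index(const std::string& path, char start_mark)
{
    MessageIndex index;
    std::ifstream in(path.c_str(), std::ios::binary);
    std::streamoff pos = 0;
    std::size_t number = 0;
    char c;
    while (in.get(c)) {
        if (c == start_mark) index[number++] = pos;
        ++pos;
    }
    return index;
}

// Seeks straight to message `number` and reads it up to its end marker.
// Returns an empty string if the number is not in the index.
std::string read_message(const std::string& path, const MessageIndex& index,
                         std::size_t number, char end_mark)
{
    std::string message;
    const MessageIndex::const_iterator it = index.find(number);
    if (it == index.end()) return message;

    std::ifstream in(path.c_str(), std::ios::binary);
    in.seekg(it->second, std::ios::beg);
    std::getline(in, message, end_mark);      // getline drops the end marker
    if (!in.eof()) message += end_mark;       // put it back if it was found
    return message;
}
```

The file still has to be read once to build the index; after that, each lookup costs one seek plus the length of the message.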
vijayan121
vijayan121 , I don't think ifstream works for files over 2GB, which mine is. (7GB)
If this is on an MS-Windows machine, then you may have to use the win32 api file i/o functions directly on a file that large. See CreateFile() to open the file, and ReadFile() to read its contents. Both functions can work with huge files.
Ancient Dragon
vijayan121 , I don't think ifstream works for files over 2GB, which mine is. (7GB)
true, unless you are using a standard library implementation like the one from dinkumware, and even then only on a 64-bit architecture.
here is something you could try.
a. map chunks of the file (say 256 MB each) into memory.
how you would do this depends on the platform:
unix: use mmap (compile with -D_FILE_OFFSET_BITS=64 to make sure that off_t is a 64-bit value).
linux: same as unix, but i think kernels prior to something like 2.6.10 are buggy with large
files which are memory mapped.
windows: the CreateFile/CreateFileMapping/MapViewOfFile triplet
b. wrap an stlsoft::basic_string_view around the chunk that is mapped.
eg. stlsoft::basic_string_view<char> str( static_cast<const char*>(address), nchars ) ;
download stlsoft from http://www.synesis.com.au/software/stlsoft .
for basic_string_view<> documentation, see:
http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1basic__string__view.html
stlsoft library is header-only; you need only #include the requisite files to access the functionality.
c. stlsoft::basic_string_view<> does not have the find family of member functions as in std::string;
but it does provide polymorphic iterators. so, functions like find in the header <algorithm>
could be used.
eg. std::find( str.begin(), str.end(), '*' ) ;
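Steps a and c can be sketched for the unix case roughly as follows. This is a POSIX-only illustration that runs std::find over raw const char* pointers into the mapping instead of an stlsoft::basic_string_view. The name find_char_mmap and the window parameter are hypothetical, and window is assumed to be a multiple of the page size because mmap requires page-aligned file offsets. On a 32-bit system it would need -D_FILE_OFFSET_BITS=64 as noted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Returns the file offset of every `target` byte, mapping the file into
// memory one `window`-sized chunk at a time (hypothetical helper).
std::vector<off_t> find_char_mmap(const std::string& path, char target,
                                  std::size_t window)
{
    std::vector<off_t> hits;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(window > 0 && window % page == 0);   // offsets must stay page-aligned

    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return hits;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return hits; }

    for (off_t offset = 0; offset < st.st_size;
         offset += static_cast<off_t>(window)) {
        // The last chunk may be shorter than the window.
        const std::size_t len = static_cast<std::size_t>(
            std::min<off_t>(static_cast<off_t>(window), st.st_size - offset));
        void* addr = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, offset);
        if (addr == MAP_FAILED) break;

        const char* first = static_cast<const char*>(addr);
        const char* last = first + len;
        for (const char* p = std::find(first, last, target); p != last;
             p = std::find(p + 1, last, target))
            hits.push_back(offset + (p - first));

        munmap(addr, len);                      // release before the next chunk
    }
    close(fd);
    return hits;
}
```

Searching for a single character needs no overlap between chunks; a multi-character pattern would have to account for matches that straddle a chunk boundary.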
vijayan121