Hello friends,

I am working on a section of an application that needs to parse CSV logs generated by a PostgreSQL server.

  • The logs are stored in C:\Program Files\PostgreSQL\9.0\data\pg_log

  • The server version is 9.0.4

  • The application is developed in C#

    • The basic utility after parsing the log is to show its contents in a DataGridView.
    • There are other filter options, such as viewing log contents for a particular time range within a day.

However, the main problem is that the log format is not readable.

It was first tested with A Fast CSV Reader.

A sample log data line:

 2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT id,pdate,itemname,qty from stock_apals order by pdate,id()",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""

As you can see, the columns in the log are comma separated, but not all individual values
are quote enclosed.

For instance, the 1st, 4th, 6th ... columns.

Is there a utility or a regex that can find the malformed columns and place quotes around them?

This matters especially for performance, because these logs are very long and
new ones are made almost every hour.

I just want to update the columns and use the FastCSVReader to parse it.

Thanks for any advice and help

All 8 Replies

Why do you care if they are quote enclosed? You only do that for strings, not for other data types.

What goes wrong is when it reaches the column where the SQL statement is placed. It also has commas separating the table columns. The log line is a mixed bunch of quote-enclosed and non-quote-enclosed columns. Is there a regex or utility to convert the non-quoted columns to quoted columns?

On the CSVReader page it highlights: "This reader supports fields spanning multiple lines. The only restriction is that they must be quoted, otherwise it would not be possible to distinguish between malformed data and multi-line values."

You could load that with Excel (COM classes) as Excel will parse that the way you expect.
Of course, it's fairly slow (depending on how you do it) and has a learning curve.
...and the COM object requires Excel to be installed.

Depending on how often you have to do this, you could manually load it into Excel and save it as a tab-delimited file.
If this is long term, you might see what other archive output options are available for the log. You might be able to pick a different delimiter (like tab or a custom one like a caret ^ ).

I cannot use Office automation. Secondly, this is an automated procedure.

But this suggestion does hold good, because when I opened the file it correctly identified the columns.

What are your options for changing the delimiter, then?

Still not seeing your problem. The SQL statements are quote enclosed and the reader should handle them properly. Maybe if you explained exactly what it isn't doing that you think it should, it would help.
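One way to check this without the third-party reader is a minimal sketch using the TextFieldParser class that ships with .NET in the Microsoft.VisualBasic assembly. This is a swapped-in parser, not the Fast CSV Reader discussed in this thread. The names PgCsvLog and ReadRecords are made up for illustration, and the project is assumed to reference Microsoft.VisualBasic.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // assumes a reference to Microsoft.VisualBasic

// Hypothetical helper, not part of any library mentioned in the thread.
static class PgCsvLog
{
    // Reads every record from the reader and returns its fields.
    // Quoted fields may contain commas; unquoted fields are accepted as well.
    public static List<string[]> ReadRecords(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // treat "..." as one field
            parser.TrimWhiteSpace = false;           // keep values exactly as logged

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing a StreamReader opened on one of the files in pg_log should give one string[] per log record, which can then be copied into a DataTable for the DataGridView. I have not measured how this performs on very long logs.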

The problem is that the parser splits the column below into several columns, although its value is within quotes.

"statement: SELECT id,pdate,itemname,qty from stock_apals order by pdate,id()"

Secondly, the parser requires that all columns be quote enclosed, although I have read that CSV columns can be without quotes if they are not a string/text type.

The solution is either that we create a parser in C# that runs a loop and picks up columns within quotes as they are, ignoring all commas inside them (do you know of any such code?)

OR that we place quotes around all columns
(which doubles the work, because we have to read the log twice, and these are long logs that get updated rather frequently).
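The first option (a loop that keeps commas inside quotes) can be sketched roughly as below. This is only an illustration under stated assumptions: the names CsvLine and Split are made up, each record is assumed to fit on a single line (a statement containing line breaks would need extra handling), and a doubled quote ("") inside a quoted field is read as one literal quote.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
static class CsvLine
{
    // Splits one CSV record into fields in a single pass.
    // Commas inside double quotes stay part of the field;
    // unquoted fields are accepted as they are.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);       // ordinary character, commas included
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');     // "" is an escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;        // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing comma
        return fields;
    }
}
```

For example, CsvLine.Split("1,\"a,b\",LOG") gives three fields: 1, a,b and LOG. Since the log is read only once, this avoids the second pass that adding quotes to every column would need.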

OK, I have found a solution from https://bitbucket.org/pabdulin/gorgon/src

namespace So9817628
{
    using System.Data;
    using System.Text;
    using Gorgon.Parsing.Csv;

    class Program
    {
        static void Main(string[] args)
        {
            // prepare
            CsvParserSettings s = new CsvParserSettings();
            s.CodePage = Encoding.Default;
            s.ContainsHeader = false;
            s.SplitString = ",";
            s.EscapeString = "\"\"";
            s.ContainsQuotes = true;
            s.ContainsMultilineValues = true;
            // uncomment below if you don't want escape quotes ("") to be replaced with single quote
            //s.ReplaceEscapeString = false;

            CsvParser parser = new CsvParser(s);

            DataTable dt = parser.ParseToDataTableSequential("multiline_quotes.txt");
            dt.WriteXml("parsed.xml");
        }
    }
}