# Tutorial - Content extraction using Apache Tika

From the official website:

> The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

In this tutorial we will try and implement the four most important features of Apache Tika (as of version 1.14).

## Table of contents

1. Is this tutorial for me?
1. Requirements
1. How can I detect a file's type?
1. …

+0 forum 0

Hi all, I need a Java parser that can parse any Java file in any package, not only existing files ... Can I do this? How?

+0 forum 10

I have strings representing an angle, in the format ddd°mm'ss''. I want to get at the three constituents: degrees, minutes and seconds. I first chopped off the trailing seconds marks like this: `angleString = "ddd°mm'ss''";` `string str = angleString.Substring(0, angleString.Length - 2);` Then I tried to do this, to get at the degrees, minutes and seconds: `string[] items = str.Split(new char[] { '°', '''});` But it gave an error on the ' character, saying: "empty char literal". I solved it by using this: `string[] items = str.Split(new char[] { '°', char.Parse("'") });` I find this rather clumsy. It would be great if anyone over …
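
For comparison, here is a minimal sketch of the same split done with a regular expression, written in Python rather than C#. It assumes the input always has the exact shape ddd°mm'ss'' with integer parts; the names `ANGLE_RE` and `parse_angle` are made up for illustration. (On the C# side, a single-quote char literal is normally written with an escape, `'\''`.)

```python
import re

# Matches strings such as 123°45'06'' (degrees, minutes, seconds).
# Assumption: all three parts are plain integers.
ANGLE_RE = re.compile(r"^\s*(\d{1,3})°(\d{1,2})'(\d{1,2})''\s*$")


def parse_angle(text):
    """Return (degrees, minutes, seconds) as ints, or raise ValueError."""
    match = ANGLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a ddd°mm'ss'' angle: {text!r}")
    degrees, minutes, seconds = (int(part) for part in match.groups())
    return degrees, minutes, seconds


print(parse_angle("123°45'06''"))
```

Matching the whole string at once avoids both the manual trimming of the trailing marks and the separate split step.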

+0 forum 3

Hi Friends, I want to do a project for parsing resumes in C#, i.e. when we upload resumes (more than 100), it should extract name, email ID, phone number and skills. **Please don't tell me that software is already available. I tried those programs, but they do not work properly.** So I want to do it by myself. Please help me.

+0 forum 1

I am using PHP Simple HTML DOM Parser to fetch URLs, but I got an error while fetching links. Have a look at this script: $result = str_get_html($result); foreach($result->find('a') as $element) $result = str_get_html($result); $result = str_replace('http://', '', $result); foreach($result->find('a') as $elementa) echo $element->href; echo $elementa->href; Here I want to fetch all links twice: the first time, `$element->href` should fetch links starting with `http://`, and `$elementa->href` should fetch links without `http://`. But this shows only a blank page. Any idea?

+0 forum 9

Hello, I recently took a course on assembler/compiler construction. In it we covered parsing algorithms such as LL(n), LR(n), LALR(n), and SLR(n). I understand how these parsing algorithms can be used to determine **if** an input string follows a context free grammar (CFG). At some point I also understood how to tokenize the string using the grammar, however my assignment code is not commented (in hindsight that was a mistake) and I do not understand it anymore. Now I am in need of a parser to convert lines of code into useful tokens and cannot for the life of me …

+0 forum 3

All, What is the most efficient way of writing several lines of data to a *.txt file? Currently, I store the required text in memory then write it all to a user-specified text file at the end, similar to the following; Dim text As String = "" text = text + "1st line of text" & vbNewLine text = text + "2nd line of text" & vbNewLine text = text + "3rd line of text" & vbNewLine Dim path As String = "C:\Temp\fileName.txt" My.Computer.FileSystem.WriteAllText(path, text, True, System.Text.Encoding.Default) This works well enough for up to about 1000 lines of text. However, …
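
One common alternative to growing a single string is to stream the lines straight to the file. The sketch below shows that idea in Python rather than VB.NET; the function name `write_lines`, the line count and the temp-directory path are assumptions made for the example, and it overwrites the file instead of appending. In .NET, the analogous tools would be a `StringBuilder` or a `StreamWriter`.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write an iterable of strings to path, one per line, without
    building the whole text in memory first."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(f"{line}\n" for line in lines)


# Hypothetical data: 5000 lines produced lazily by a generator.
lines = (f"line {n} of text" for n in range(1, 5001))
path = os.path.join(tempfile.gettempdir(), "fileName.txt")
write_lines(path, lines)
```

Because the file object buffers its output, this stays fast without the repeated copying that string concatenation in a loop can cause.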

+0 forum 5

I am trying to use Xerces in Ubuntu 13.10. It is installed and I can see the files in the usr folder, but I have had no luck including it in Eclipse CDT. I've found this thread "[Click Here](http://www.daniweb.com/hardware-and-software/linux-and-unix/threads/409769/ubuntu-11.10-xerces-c)", but it is dead and the answer is not clear to me. Could anyone help me?

+0 forum 0

What's a better way to do this? I need to sift through this file and store the following values Cpl File name Content kind Package type Encryption status Container File size Duration Timed text/png Number of audio channels 2d/3d Fps 1) I'm not sure if I should just store these as lists? 2) Should I use dictionaries ? Here's one sample file. [Click Here](http://tny.cz/aeca9b6c) And here's how I was attempting to sort through the file, with not as much luck as I would like. import glob import os import sys print(os.getcwd()) dir=raw_input(["Please enter directory location of dcp_inspect output"]) print (dir) …

+0 forum 0

I am trying to parse a text file with 1200 songs. I want to increase the barcount every time I come across a '|' or a ']' character. I want to have a new line every time the barCount is equal to 4. However, if I come across '\n' I want to reset the bar count to 0. Unfortunately, what I have so far is only counting '|' and inserting a new line. Can anyone help me fix my code? This is some of what I have tried to do: while((start = chartDataString.find(" |", start)) != string::npos){ barCount++; start+=2; charCount++; …
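
The counting rules described above can be sketched as a single pass over the characters. This is written in Python rather than C++, and it assumes that the counter also restarts after a line break is inserted (the post does not say so explicitly); `wrap_bars` and `bars_per_line` are names invented for the example.

```python
def wrap_bars(chart, bars_per_line=4):
    """Insert a newline after every bars_per_line bar marks ('|' or ']').

    An existing newline resets the count, as described in the post.
    Assumption: the count also restarts after an inserted newline.
    """
    out = []
    bar_count = 0
    for ch in chart:
        out.append(ch)
        if ch == "\n":
            bar_count = 0
        elif ch in "|]":
            bar_count += 1
            if bar_count == bars_per_line:
                out.append("\n")
                bar_count = 0
    return "".join(out)


print(wrap_bars("A |B |C |D |E |F ]"))
```

Walking the string character by character handles both bar symbols and the reset in one loop, which searching for a single pattern such as " |" cannot do.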

+0 forum 1

When I click on the link for the posts sitemap, http://www.telechargercours.com/post-sitemap.xml, I receive an error: XML Parsing Error: not well-formed. Location: http://www.telechargercours.com/post-sitemap.xml Line Number 14427, Column 34: <image:caption>L'Ordre SELECT Élémentaire</image:caption> Any idea about this error?

+0 forum 2

Hi, I tried parsing a multi-record genbank file (from this site: http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk) using the code below. The code returned an error: readline() on unopened filehandle at parser.pl line 62. The code: #!/usr/local/bin/perl -w use strict; my $record; print "Please type in the name of a file\n"; my $file = <STDIN>; chomp $file; while( $record = get_next_record($file) ) { my ($annotation, $seq) = get_dna ($record); open my $fh, '>', 'oufile.txt', or die "cant't open outfile:$!"; print "Sequence:\n\n", $seq, "\n"; close $fh or die "cant't open outfile:$!"; } sub get_dna { my ($file) = @_; my @annotation = (); my $seq = …

+0 forum 0

Hello everyone, I'm an amateur programmer, and I am working on a program to help me practice my mental math. The essence of this program is that the user inputs an expression which will be the form of each question. Their goal is to solve those expressions as fast as they can. For example, let's say I input a quadratic formula of (ax^2 + bx + c). My program will then read each variable in the equation, and allow the user to specify minimum and maximum values for each variable, as well as the increments. From this, it will randomly …

+0 forum 2

I have a large data set (12,000 rows X 14 columns); the first 4 rows as below: x1 y1 0.02 NAN NAN NAN NAN NAN NAN 0.004 NAN NAN NAN NAN x2 y2 NAN 0.003 NAN 10 NAN 0.03 NAN 0.004 NAN NAN NAN NAN x3 y3 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN x4 y4 NAN 0.004 NAN NAN NAN NAN 10 NAN NAN 30 NAN 0.004 I need to remove any row with "NAN" in columns 3-14 and then output the rest of the dataset. I wrote the following code: #!usr/bin/perl use warnings; …
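
A minimal sketch of the row filter, in Python rather than Perl. Every sample row above contains at least one NAN, so the sketch assumes the intent is to drop rows where columns 3-14 are all NAN; passing `drop_if=any` gives the literal "any NAN" reading instead. The function name and the whitespace-separated layout are assumptions.

```python
def filter_rows(lines, drop_if=all):
    """Yield rows, skipping those whose columns 3-14 trigger drop_if.

    drop_if=all drops rows where every one of those columns is NAN;
    drop_if=any drops rows where at least one of them is NAN.
    Assumption: fields are separated by whitespace.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        nan_flags = [value.upper() == "NAN" for value in fields[2:14]]
        if drop_if(nan_flags):
            continue
        yield line.rstrip("\n")


sample = [
    "x1 y1 0.02 NAN NAN NAN NAN NAN NAN 0.004 NAN NAN NAN NAN",
    "x2 y2 NAN 0.003 NAN 10 NAN 0.03 NAN 0.004 NAN NAN NAN NAN",
    "x3 y3 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN",
    "x4 y4 NAN 0.004 NAN NAN NAN NAN 10 NAN NAN 30 NAN 0.004",
]
for row in filter_rows(sample):
    print(row)
```

Since the function is a generator, it can be fed an open file handle and will process the 12,000 rows one at a time.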

+0 forum 4

Can someone help me with solving boolean expressions with the help of forward chaining. A good tutorial will also help me. Example: A.(A + B) = A A.(A + B) => A.A + A.B [Applying distributive law] A.A + A.B => A + A.B [Applying idempotency law] A + A.B => A.(1 + B) A.(1 + B) => A.(1) => A I have made huge efforts but still am unable to do this. The procedure would require parsing the boolean expression and then recursive rule checking. I was thinking about creating a binary tree of the expression and then doing …

+0 forum 0

I've been looking for a way to parse a simple XML-like language for use as a type of data storage. I've been through stuff like JSON, XML, etc but I don't want to use them because they are quite slow. I just need a simple way to parse this: [stuff] [key1]data[/key1] [key2]data[/key2] [/stuff] And make/map it into a dictionary, like this: {"stuff":{"key1":"data", "key2":"data"}} I've made myself a generator which will process a dictionary according to the syntax rules: def generate_di(self, item) assert type(item) is dict for key in item: if type(item[key]) is dict: self.puts(self.strtag % key) # means we are …
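
The reverse direction (text to dictionary) can be sketched with a small stack-based parser. This is only an illustration under a few assumptions: tag names contain no brackets or slashes, a tag holds either child tags or plain text (not both), and stray brackets are silently ignored. `TOKEN_RE` and `parse_tags` are invented names.

```python
import re

# One token is an opening tag, a closing tag, or a run of plain text.
TOKEN_RE = re.compile(r"\[(/?)([^\[\]/]+)\]|([^\[\]]+)")


def parse_tags(text):
    """Parse [tag]...[/tag] markup into nested dictionaries."""
    # Each stack frame is [tag name, child dict, list of text pieces].
    stack = [["", {}, []]]
    for match in TOKEN_RE.finditer(text):
        closing, name, data = match.groups()
        if data is not None:
            stack[-1][2].append(data)
        elif not closing:
            stack.append([name, {}, []])
        else:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected closing tag [/{name}]")
            _, children, pieces = stack.pop()
            # A tag with child tags becomes a dict, otherwise a string.
            value = children if children else "".join(pieces).strip()
            stack[-1][1][name] = value
    if len(stack) != 1:
        raise ValueError(f"unclosed tag [{stack[-1][0]}]")
    return stack[0][1]


print(parse_tags("[stuff] [key1]data[/key1] [key2]data[/key2] [/stuff]"))
```

Repeated tag names under the same parent overwrite each other here, which matches the dictionary shape shown in the post but would lose data for list-like input.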

+0 forum 7

I have a bit of a baffling problem! I'm writing a tag matcher in XML and whenever I run the below procedure, I get a garbage value after the name of every tag **except for the first one**. When I add the array text to the parameters list, the garbage value goes away (I had it there during initial debugging) even though it isn't actually referenced at any point in the procedure. Another thing, is that while reading a tag, it doesn't seem to want to recognize spaces. For example, <img src="img.png"/> reads as imgsrc="imgpng" instead of img, but only …

+0 forum 1

Hi all, I'm having a problem displaying JSON data as ul- and li-based content using foreach(). The JSON data from the database is [{"id":1},{"id":2,"children":[{"id":3},{"id":4},{"id":5,"children":[{"id":6},{"id":7},{"id":8}]},{"id":9},{"id":10}]},{"id":11},{"id":12}] $loop = new RecursiveIteratorIterator( new RecursiveArrayIterator(json_decode($get_menu, TRUE)), RecursiveIteratorIterator::SELF_FIRST); foreach($loop as $mydata) { echo $mydata foreach($mydata->values as $values) { echo $values->value . "\n"; } }
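
Nested menus like this are often easier to render with a plain recursive function than with a flattening iterator. Here is a sketch of that approach in Python rather than PHP; `render_menu` is an invented name, and the only keys assumed are the `id` and `children` keys visible in the JSON above.

```python
import html
import json


def render_menu(items):
    """Render a list of {"id": ..., "children": [...]} dicts as nested <ul>."""
    parts = ["<ul>"]
    for item in items:
        parts.append(f"<li>{html.escape(str(item['id']))}")
        children = item.get("children")
        if children:
            # Recurse so each level of children gets its own <ul>.
            parts.append(render_menu(children))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)


menu_json = '[{"id":1},{"id":2,"children":[{"id":3},{"id":4}]},{"id":5}]'
print(render_menu(json.loads(menu_json)))
```

The recursion mirrors the shape of the data: each call handles one level, so the opening and closing tags always stay balanced.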

+0 forum 1

(edit) this is solved, was a unicode issue. Hi I'm hoping someone has used the library [pugixml](http://pugixml.org/) I'm just trying to use a simple example provided but I'm not getting the result I expect. int _tmain(int argc, _TCHAR* argv[]) { pugi::xml_document doc; pugi::xml_parse_result result = doc.load_file("tree.xml"); //pugi::char_t * c = "Fail\0"; // used with as_string() method std::cout << "Load result: " << result.description() // output = "Load result: No Error " << ", mesh name: " << doc.child("mesh").attribute("name").value() // output = ", mesh name: " << std::endl; return 0; } I was expecting "Load result: No Error, mesh name: mesh_root" …

+0 forum 0

I would really like for someone to take a little time and look over my code. I'm parsing some news content and I can insert the initial parse into my database which contains the news URL and the title. I'd like to expand it farther, to pass along each article link and parse the content of the article and include it in my database. The initial parsing works perfectly like this: <?php include_once ('connect_to_mysql.php'); include_once ('simple_html_dom.php'); $html = file_get_html('http://basket-planet.com/ru/'); $main = $html->find('div[class=mainBlock]', 0); $items = array(); foreach ($main->find('a') as $m){ $items[] = '("'.mysql_real_escape_string($m->plaintext).'", "'.mysql_real_escape_string($m->href).'")'; } $reverse = array_reverse($items); mysql_query ("INSERT …

+0 forum 1

Hi, I've just started out with Python and I've been stuck on this problem for a few hours now trying to parse a file into a certain format.. I am trying to create a list in a list out of a list. I currently have this list; ['MPNRRRCKLSTAISTVATLAIASPCAYFLVYEPTASAKPAAKHYEFKQAASIADLPGEVLDAISQGLSQFGINL', 'MQLVDRVRGAVTGMSRRLVVGAVGAALVSGLVGAVGGTATAGAFSRPGLPVEYLQVPSPSMGRSELPGWLQA', 'etc'] What I am trying to make is a list of each item, with the first 40 individual characters in the strings as items, while translating them into an int for example; M = 1.43, P=1.53, I want to be able to have each item as an individual list before putting …
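
A nested list comprehension is one way to sketch the list-in-a-list step. Only the two values for M and P come from the post (and they are floats rather than ints); the rest of the table, the `DEFAULT` fallback and the function name are placeholders made up for the example.

```python
# Only M and P are given in the post; a real table would cover every letter.
SCORES = {"M": 1.43, "P": 1.53}
DEFAULT = 0.0  # hypothetical fallback for letters missing from SCORES


def score_sequences(sequences, length=40):
    """Map the first `length` characters of each string to numbers,
    producing one inner list per input string."""
    return [
        [SCORES.get(residue, DEFAULT) for residue in sequence[:length]]
        for sequence in sequences
    ]


print(score_sequences(["MPNRRRCK", "MQLVDRVR"], length=4))
```

Slicing with `sequence[:length]` is safe for strings shorter than 40 characters; they simply produce a shorter inner list.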

+0 forum 2

Hi, I'm a beginner in Perl scripting. Can anybody help me write a code snippet to parse an XML file? I have an XML file in which the tags x, y and z repeat with different values between the XML start and end tags. I need to collect these different values of x, y and z in an Excel sheet and write them row by row, then read the data back from the Excel sheet row by row and decide whether to print it based on the values read. Kindly help me.

+0 forum 1

Please can someone help me and send me in the right direction? I am making a program that reads a text file and then outputs the data into a datagrid. The text file looks like this: [HRData] 84 73 0 -124 0 50 84 73 0 -124 0 50 84 84 0 -124 0 50 87 109 0 -124 0 50 82 120 0 -124 0 50 82 132 0 -124 0 50 83 143 0 -124 0 50 83 154 0 -124 0 50 85 171 77 -124 0 50 86 179 84 -125 121 3893 87 190 …

+0 forum 1

I am going to create a graphical interface for the intermediate code generated by gcc. The output from gcc looks like ;; Function main (main, funcdef_no=1, decl_uid=2162, cgraph_uid=1) main () { int i; int c[10]; int b; int a; int D.2177; <bb 2>: a = 1; b = 20; if (a < b) goto <bb 3>; else goto <bb 7>; <bb 3>: i = 0; goto <bb 5>; <bb 4>: c[i] = 1; i = i + 1; ... <L9>: return D.2177; } and I want to parse it using **python** and to have graphical interface as graphviz's **.dot language …
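
As a starting point, the explicit jumps in such a dump can be pulled out with two regular expressions and written as Graphviz edges. This is a rough sketch, not a full parser: it only sees `goto <bb N>` statements, so fall-through edges between consecutive blocks and `<L9>`-style labels are ignored, and the names `cfg_edges` and `to_dot` are invented.

```python
import re

LABEL_RE = re.compile(r"<bb (\d+)>\s*:")   # start of a basic block
GOTO_RE = re.compile(r"goto <bb (\d+)>")   # explicit jump to a block


def cfg_edges(dump):
    """Return (source block, target block) pairs for explicit gotos."""
    edges = []
    # With one capturing group, split() returns
    # [text before first label, id, body, id, body, ...].
    pieces = LABEL_RE.split(dump)
    for block_id, body in zip(pieces[1::2], pieces[2::2]):
        for target in GOTO_RE.findall(body):
            edges.append((block_id, target))
    return edges


def to_dot(edges):
    """Format the edges as a Graphviz digraph."""
    lines = ["digraph cfg {"]
    lines += [f"    bb{src} -> bb{dst};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


dump = (
    "main () { <bb 2>: a = 1; b = 20; if (a < b) goto <bb 3>; "
    "else goto <bb 7>; <bb 3>: i = 0; goto <bb 5>; <bb 4>: c[i] = 1; }"
)
print(to_dot(cfg_edges(dump)))
```

The resulting text can be saved to a .dot file and rendered with the Graphviz `dot` tool.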

+0 forum 0

I am learning the re module. For practice I have taken an export of my phone address book, which is a comma-separated text file containing "First Name","Mobile Phone","Home Phone","Company","E-mail Address","Company Main Phone","Business Fax","Birthday". For now I am mostly interested in the first name, mobile phone number and email address. In my phone book I have first names with the characters a-z in lowercase as well as capitals, with -, and some also containing numbers or special characters, so almost everything is included. So I built a search pattern like this: `namePattern = re.compile('[a-zA-Z _@-]+')` This works fine but it also …
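
For the field-splitting part, the standard `csv` module may be simpler than a hand-written pattern, because it already understands quoted fields that contain commas. The sketch below assumes the first line of the export is a header row with exactly the column names listed above; `read_contacts` and the sample data are made up.

```python
import csv
import io


def read_contacts(handle):
    """Yield (first name, mobile phone, e-mail) tuples from a CSV export.

    Assumption: the first row is a header using the column names
    quoted in the post.
    """
    for row in csv.DictReader(handle):
        yield row["First Name"], row["Mobile Phone"], row["E-mail Address"]


# Hypothetical two-line export; note the comma inside the quoted company.
export = io.StringIO(
    '"First Name","Mobile Phone","Home Phone","Company","E-mail Address",'
    '"Company Main Phone","Business Fax","Birthday"\n'
    '"Ann-Marie 2","+15550100","","Acme, Inc.","ann@example.com","","",""\n'
)
for contact in read_contacts(export):
    print(contact)
```

Regular expressions can then be applied to individual fields (for example, to validate a phone number) rather than to whole lines.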

+0 forum 6

Hi, I have quite a large CSV file (Around 20,000 rows with about 20 columns) that I am trying to manipulate. Initially I am looking for a way to get out the first 10 or so records after they have been sorted in ascending order on one of the numeric fields. I've thought of loading the file into memory and then sorting it from there using one of PHPs sort functions but as expected, the memory is simply exhausted. My current thoughts on this would be to read line by line attempting to place each record into an array (Which …
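
If only the first ten records are needed, the file never has to be fully sorted or held in memory: one pass that keeps the ten smallest rows seen so far is enough. The sketch below shows the idea in Python rather than PHP (where `fgetcsv` can read one row at a time); it assumes a header row and a purely numeric sort column, and `smallest_rows` is an invented name.

```python
import csv
import heapq
import io


def smallest_rows(handle, column, count=10):
    """Return (header, rows) with the `count` rows that have the smallest
    numeric value in the given column index.

    Assumptions: the first row is a header and every value in the
    chosen column parses as a number.
    """
    reader = csv.reader(handle)
    header = next(reader)
    # nsmallest consumes the reader lazily and keeps at most `count` rows.
    rows = heapq.nsmallest(count, reader, key=lambda row: float(row[column]))
    return header, rows


sample = io.StringIO("name,score\na,5\nb,1.5\nc,3\nd,0.2\n")
header, rows = smallest_rows(sample, column=1, count=2)
print(header, rows)
```

Memory use depends on `count` rather than on the number of rows in the file, so the same approach scales well beyond 20,000 rows.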

+0 forum 12

Thanks in advance Shanti

+0 forum 6

hi all, I am working on C++ coding of fault simulation algorithm of a digital circuit . The first step involves parsing of netlist files. The sample netlist looks like - # Only 2,3 input gates considered ... INPUT(1) INPUT(2) INPUT(3) INPUT(6) INPUT(7) OUTPUT(22) OUTPUT(23) # comment here 10 = NAND(1, 3) 11 = NAND(3, 6) 16 = NAND(2, 11) 19 = NAND(11, 7) 22 = NAND(10, 16) 23 = NAND(16, 19) INPUT are the primary inputs, OUTPUT are the primary outputs, gates are the intermediate nodes. I hope some of you can imagine how this circuit will look like …
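
The parsing step for a netlist of this shape can be sketched with two regular expressions. The example is in Python rather than C++, and it assumes that each INPUT, OUTPUT or gate statement sits on its own line in the real file and that `#` starts a comment; `parse_netlist` and the return layout are illustrative choices only.

```python
import re

IO_RE = re.compile(r"^(INPUT|OUTPUT)\((\w+)\)$")
GATE_RE = re.compile(r"^(\w+)\s*=\s*(\w+)\(([^)]*)\)$")


def parse_netlist(lines):
    """Return (inputs, outputs, gates) for a netlist.

    gates maps an output node to (gate type, list of input nodes).
    Assumption: one statement per line, '#' begins a comment.
    """
    inputs, outputs, gates = [], [], {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        io_match = IO_RE.match(line)
        if io_match:
            kind, node = io_match.groups()
            (inputs if kind == "INPUT" else outputs).append(node)
            continue
        gate_match = GATE_RE.match(line)
        if gate_match is None:
            raise ValueError(f"unrecognised netlist line: {raw!r}")
        output, gate_type, args = gate_match.groups()
        gates[output] = (gate_type, [arg.strip() for arg in args.split(",")])
    return inputs, outputs, gates


netlist = [
    "# Only 2,3 input gates considered ...",
    "INPUT(1)", "INPUT(3)", "OUTPUT(22) # comment here",
    "10 = NAND(1, 3)",
]
print(parse_netlist(netlist))
```

Keeping the gates in a dictionary keyed by output node makes it straightforward to walk from any primary output back towards the inputs.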

+0 forum 2

Hi all, Requirement: - I have a perl script that I need to run for several hours, for a performance test. - I would like to create a second script in ANSI C to call the perl script. The C script should be able to parse any of the text output from the perl script at run time, one line at a time. - This C code will be used by the performance tool, Loadrunner. The processed output will be used as real time performance statistics of a database. - I do not want to have to rewrite/port the entire …

+0 forum 2

This is the error it is throwing: javax.servlet.ServletException: /dtl.xhtml @13,75 value="#{registerBean.reteriveData()}" Error Parsing: #{registerBean.reteriveData()} javax.faces.webapp.FacesServlet.service(FacesServlet.java:606) org.netbeans.modules.web.monitor.server.MonitorFilter.doFilter(MonitorFilter.java:393) dtl.xhtml <h:dataTable value="#{registerBean.reteriveData()}" var="rb"> <h:column> <f:facet name="header"> <h:outputText value="Model" /> </f:facet> <h:outputText value="#{rb.fname}" /> </h:column> </h:dataTable> registerBean.java public List reteriveData() { Connection con = null; List list = null; try { DBConnection db = new DBConnection(); con = db.getConnection(); PreparedStatement ps = con.prepareStatement("select * from regtbl"); ResultSet rs=ps.executeQuery(); while(rs.next()) { registerBean rb=new registerBean(); rb.setFname(rs.getString(3)); list.add(rb); } } catch (SQLException ex) { Logger.getLogger(registerBean.class.getName()).log(Level.SEVERE, null, ex); } return list; } Please help me solve it. I am using Tomcat/MySQL.

+0 forum 1
