I have a question about my problem with using MySQL and Perl programming.
I think that is a good question but to avoid confusion please start a new thread for it, instead of posting it as a reply to this thread's question.
So basically what you are saying onaclov2000 is that the only way to delete a line is to write the whole thing in a different file omitting the line you do not want that begins with the specified string?
Pardon my jumping in. I'm new to Perl but I have read there is such a thing as doing an in place edit on a file, which I think is what you are asking about. If you Google perl "in place edit" $^I you will see a lot has been written about it.
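For comparison, here is a minimal sketch of the same in-place idea in Python 3 rather than Perl, using the standard fileinput module. The function name and the sample prefix are made up for illustration. The file is still rewritten in full behind the scenes; the module just hides the temporary copy from you.

```python
import fileinput


def delete_lines_starting_with(path, prefix):
    """Rewrite the file in place, dropping every line that begins with prefix."""
    # With inplace=True, anything printed inside the loop is written back to the file.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            if not line.startswith(prefix):
                print(line, end="")
```

Calling delete_lines_starting_with("notes.txt", "DRAFT") would remove every line of the hypothetical file notes.txt that starts with DRAFT and leave the rest untouched.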
Now that you have given us a second example of a string to be parsed we need to infer a rule that will handle both examples. Say we assign your first example to a variable called $str1 and the second to $str2. Your program needs to parse both of the following strings into keys and values to put in a hash:
#RULE 1:
# Every substring consisting of capital letters followed by
# a hyphen (-) represents a key and whatever follows before the
# next substring consisting of capital letters followed by
# a hyphen is the value associated with that key.
$str1="OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC AU - Reid HM AU - Kinsella BT AU- XYZ AD - UCD School of Biomolecular and Biomedical Sciences";
#
#RULE 2:
# A substring consisting of capital letters followed by a hyphen -
# represents a key only if it occurs at the beginning of a line.
# It may (or must?) be preceded by a space which we will
# remove when we put the key in the hash.
# Whatever follows, including substrings of capital letters
# followed by a hyphen, up until the end of the line
# (indicated by a newline character?) is the value associated
# with that key.
$str2=" AB - Thromboxane plays an essential role in hemostasis, regulating platelet aggregation and vessel tone. TP - beta- isoforms that are transcriptionally regulated by …
If you need to store the results in a hash you could do it like this:
#!/usr/bin/perl -w
#CountLetters.pl
use strict;
print "Enter string of letters: ";
chomp (my $input = <STDIN>);
my ($char, $count, $verb, $s, %hash);
print "\n";
while ($input =~ m/(.)/) {
    #When all characters have been counted and removed,
    # $input will not match /(.)/ and the loop will end
    $char = $1;
    #Count and remove all instances of this character -- lower and uppercase
    #so we won't count the same character again.
    #\Q...\E quotes regex metacharacters such as '.' or '*' in $char
    $count = $input =~ s/\Q$char\E//gi;
    #Did we find more than one? Singular or plural?
    $verb = $count == 1 ? "is" : "are";
    $s = $count > 1 ? "s" : "";
    print "There $verb $count '$char'$s.\n";
    $hash{$char} = $count; #Store $char as hash key and $count as hash value
}
print "\nThe letter counts have been stored in a hash.\n";
while (($char, $count) = each %hash ){
    print "Hash key '$char' has hash value $hash{$char}\n";
}
Unless you have to use an array you could also do the following.
#!/usr/bin/perl -w
#CountLetters.pl
use strict;
print "Enter string of letters: ";
chomp (my $input = <STDIN>);
my ($char, $count, $verb, $s);
print "\n";
#Modifying $input resets the /g search position, so each pass
# matches the first character that is still left in $input
while ($input =~ m/(.)/g) {
    $char = $1;
    #Count and remove all instances of this character -- lower and uppercase
    #\Q...\E quotes regex metacharacters such as '.' or '*' in $char
    $count = $input =~ s/\Q$char\E//gi;
    #Did we find more than one? Singular or plural?
    $verb = $count == 1 ? "is" : "are";
    $s = $count > 1 ? "s" : "";
    print "There $verb $count '$char'$s.\n";
}
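As a point of comparison only, the same case-insensitive count can be sketched in Python 3 with collections.Counter. This is not a translation of the Perl above: it lowercases every character before counting, so the keys are always lowercase, and the function name and the sample word are invented for the example.

```python
from collections import Counter


def count_letters(text):
    """Count characters case-insensitively, in order of first appearance."""
    return Counter(text.lower())


counts = count_letters("Banana")
for char, count in counts.items():
    verb = "is" if count == 1 else "are"
    plural = "s" if count > 1 else ""
    print(f"There {verb} {count} '{char}'{plural}.")
```

The Counter is itself a dictionary, so there is no separate step for storing the results in a hash.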
You're welcome. I took a second look at my output today and see that it still isn't quite right. Both ways have their pros and cons, but yours actually worked, so that's what counts. :)
Sometimes it's simpler to read the entire document into one string variable and then apply a series of global substitute commands. The nice thing about this way is you can add a substitute command to your program, run it, visually inspect the output, then add another substitute command, test again until the output looks right.
I find this more intuitive sometimes because it's similar to what I would have to do if I didn't have time to write a program. I would have to load the document into a good text editor that allows regular expressions for search and replace, and keep running search and replace commands until the document has been tidied up. I tested the following with your ID.txt input file:
#!/usr/bin/perl -w
#ParseFile.pl
use strict;
my ($f1, $f2) = @ARGV;
open (INFILE, $f1) || die "Can't open $f1: $!";
open (OUTFILE, ">$f2") || die "Can't open $f2: $!";
undef $/; #When $/ doesn't contain a record-end character Perl reads entire file
my $string = <INFILE>; #Read entire file into a string variable
$/ = "\n";
my $stringout = $string;
$stringout =~ s/^\\\\//gm; #Remove double backslashes at start of any line
$stringout =~ s/^(JASSS:|ID:|Date:|Title:|Address:|Author:)(.*)\n/$1$2/gm; #Remove extra newlines
$stringout =~ s/\n^Author:/Author:/gm; #Remove extra newline before Author
$stringout =~ s/^ID:/\\\n$&/gm; #Put a single backslash on the line before ID:
#Remove the extra blank lines and single backslash at the start of the document (not global)
$stringout =~ s/^\s*\\//m;
print OUTFILE $stringout;
close INFILE;
close OUTFILE;
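The read-everything-then-substitute approach looks much the same in other languages. Below is a rough Python 3 sketch of the idea, not a port of the script above: the three clean-up rules and the sample text are hypothetical and only show how rules can be added one at a time and checked by eye.

```python
import re


def tidy(text):
    """Apply a series of whole-document substitutions, one clean-up rule per line."""
    # Rule 1: remove a double backslash at the start of any line
    text = re.sub(r"^\\\\", "", text, flags=re.MULTILINE)
    # Rule 2: join a field label to the value on the following line
    text = re.sub(r"^(ID:|Title:)\n", r"\1 ", text, flags=re.MULTILINE)
    # Rule 3: collapse runs of blank lines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


# Hypothetical input; a real script would read the whole file with open(...).read()
sample = "\\\\ID:\n42\n\n\n\nTitle:\nExample\n"
print(tidy(sample))
```

Because each rule is a single line, it is easy to comment one out, rerun, and see which rule caused an unexpected change.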
This may work for you. From looking at the files as they appear in your post (without code tags) it's hard to know if there are supposed to be spaces, carriage returns, or line-feed characters separating the records, or whether they are fixed or variable length.
I attached the input files used to test the following script, plus the resulting output file.
#!/usr/bin/perl -w
#RegExSlurp.pl for jacquelinek
use strict;
my ($v1, $v2, $v3) = @ARGV;
#$v1 is the namelist file
#$v2 is the database filename
#$v3 is the desired output filename
open (FILEHANDLE, $v1) || die "Can't open $v1: $!";
open (DATABASE, $v2) || die "Can't open $v2: $!";
open (RESULTS, ">$v3") || die "Can't open $v3: $!";
my @namelist = <FILEHANDLE>; #Read entire namelist file into an array;
my $save = $/; #To restore after undef
undef $/; #Enter "file-slurp mode"
my $db = <DATABASE>; #Read entire DATABASE into $db string
$/ = $save; #Restore default record separator
print "\n";
foreach my $i (@namelist) {
    chomp($i);
    #\Q...\E makes sure any regex metacharacters in the name are matched literally
    if ($db =~ /^(\Q$i\E) detail information\s*([a-z\r\n\s]+)/m){
        #print "Name is: $1\n details are: $2\n";
        print RESULTS "$1_XXX\n$2";
    }
    else {
        print RESULTS "$i not found in database\n";
    }
}
close (FILEHANDLE);
close (DATABASE);
close (RESULTS);
exit;
Database file (input):
A12345 detail information
nvonafwenfovosdncsjdnfoewhuwerhwieufhiudhfisdfnsd
sdofnowerugfeuhgfurhgiuwerhfjdshfiasdhifheruwufhi
irgfiweurgf
A246 detail information
isdofnowerugfeuhgfurhgiuwerhfjdshfiadhifheruwufhi
wgerjgneiguihuhdnvkjdnvkjbdegiauberiubgieubgridfb
ooogrngoawerngiauengugbuivrug
B153875 detail information
wgerjgneiguihuvkwwjddnvkegtiaugberijubgieubgridfb
eragnowergnoweungfiousdhiuhsdnjkfnsk
C34893 detail information
fnweuraiwerbgivjbdbvurgfuwherugtheurhguhweriguhdg
sdgnasoughiueghaiwuh
Namelist file (input):
A12345
B153875
C34893
Output file:
A12345_XXX
nvonafwenfovosdncsjdnfoewhuwerhwieufhiudhfisdfnsd
sdofnowerugfeuhgfurhgiuwerhfjdshfiasdhifheruwufhi
irgfiweurgf
B153875_XXX
wgerjgneiguihuvkwwjddnvkegtiaugberijubgieubgridfb
eragnowergnoweungfiousdhiuhsdnjkfnsk
C34893_XXX
fnweuraiwerbgivjbdbvurgfuwherugtheurhguhweriguhdg
sdgnasoughiueghaiwuh
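A different way to organize the same lookup, sketched here in Python 3, is to split the database text into a dictionary once and then look each name up directly. This assumes every record starts with a line of the form "NAME detail information", as in the sample data; the function name and the shortened sample records are invented for illustration.

```python
import re


def load_details(db_text):
    """Map each record name to its detail lines."""
    # re.split with one capturing group returns
    # [text before first header, name1, body1, name2, body2, ...]
    parts = re.split(r"^(\S+) detail information[ \t]*\n", db_text, flags=re.MULTILINE)
    return {name: body for name, body in zip(parts[1::2], parts[2::2])}


# Shortened, made-up records in the same layout as the database file
db_text = (
    "A1 detail information\naaa\nbbb\n"
    "B2 detail information\nccc\n"
)
details = load_details(db_text)
for name in ["A1", "C3"]:
    if name in details:
        print(f"{name}_XXX\n{details[name]}", end="")
    else:
        print(f"{name} not found in database")
```

With a long name list this avoids scanning the whole database string once per name, at the cost of holding the parsed records in memory.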
One more suggestion: if you still want to use the RegEx pattern that specifies each punctuation character you want to remove, try preceding the " and the ' characters with a backslash in the RegEx pattern, like this: var myRegExPattern = /[\-,?~!@#$%&*+\-\'=\"]/g
On my platform it doesn't make any difference, it matches and replaces the quotes either way, but might as well try it. (Notice that I had to escape the hyphen - character because otherwise it has a special meaning in RegEx character classes.)
Good luck. I'm learning about javascript today by looking at the tutorial at http://www.w3schools.com/js/default.asp. Their "Try It Yourself" button is handy for testing something quickly.
That's strange. I tested the following and the same RegEx did remove my single and double quotes. I modified the script so you can type in any text you want in the prompt box.
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!-- Hide from browsers that do not support JavaScript
var myRegExPattern = /[\-,?~!@#$%&*+\-'="]/g
var query = "Let's test to see if 'single quotes' and \"double quotes\" are removed.";
query = prompt("Please enter your query",query);
var cleanquery = query.replace(myRegExPattern, "");
alert("Cleaned query is: " + cleanquery);
document.writeln("Original query was:<br>")
document.writeln(query)
document.writeln("<p></p>")
document.writeln("Cleaned query is:<br>")
document.writeln(cleanquery)
// --> Finish hiding
</SCRIPT>
I'm testing on a Windows Vista system with regional settings for North America. If you are using a different platform with different regional settings maybe that would give you different results.
Would it be simpler to make a RegEx pattern for the characters that you do want in your query? If you want only alphanumeric characters and spaces in your query you could make a Regex to find any character not in the specified character set. For example:
<HTML>
<HEAD>
<TITLE>Testing RegEx in Javascript</TITLE>
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!-- Hide from browsers that do not support JavaScript
var myRegExPattern = /[^\w\s]/g
var query = "Let's test to see if 'single quotes' and \"double quotes\" are removed.";
query = prompt("Please enter your query",query);
var cleanquery = query.replace(myRegExPattern, "");
alert("Cleaned query is: " + cleanquery);
document.writeln("Original query was:<br>")
document.writeln(query)
document.writeln("<p></p>")
document.writeln("Cleaned query is:<br>")
document.writeln(cleanquery)
// --> Finish hiding
</SCRIPT>
</HEAD>
<BODY>
<p> HTLM body …
You can create a regular expression like the one in the variable called myRegExPattern to match any character listed between the square brackets. The square brackets define a "character class" aka "character set". Then replace each character that is matched.
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!-- Hide from browsers that do not support JavaScript
var myRegExPattern = /[\-,?~!@#$%&*+\-'="]/g
var query = "Then it's one, two, three strikes you're out at the old, ball-game!..."
query = query.replace(myRegExPattern, "");
document.writeln(query)
// --> Finish hiding
</SCRIPT>
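The character-class idea is not specific to JavaScript. As a hedged illustration, the Python 3 sketch below uses an equivalent deny-list pattern and, next to it, the allow-list variant (keep only word characters and whitespace) mentioned elsewhere in this thread. The function names are made up for the example.

```python
import re

# Characters to strip; the hyphen is escaped so it is not read as a range
PUNCTUATION = re.compile(r"[\-,?~!@#$%&*+'=\"]")


def strip_listed(text):
    """Deny-list: remove only the characters named in the class."""
    return PUNCTUATION.sub("", text)


def keep_word_chars(text):
    """Allow-list: remove anything that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", text)


print(strip_listed("Let's go, \"now\"!"))
print(keep_word_chars("ball-game!..."))
```

Note that the deny-list leaves the periods alone because "." is not in the class, while the allow-list removes them.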
This should do it.
#!/usr/bin/perl -w
#ParseStringCreateHash.pl
use strict;
$_ = "OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC AU - Reid HM AU - Kinsella BT AU- XYZ AD - UCD School of Biomolecular and Biomedical Sciences";
print "\n";
#Parse the above string into key-value pairs.
s/([A-Z]+\s*)-/###$1###/g;
#Put the resulting substrings into an array
my @array = split /\s*###/;
#Remove leading and trailing spaces from each element
foreach my $i (@array) {
    $i =~ s/^\s+//;
    $i =~ s/\s+$//;
}
#The first element is empty. Remove it with shift.
shift @array;
#Here's what the array contains now
print "Here's what the array contains now:\n";
for (0..$#array) {
    print "Element $_: $array[$_]\n";
}
#Create a hash from the above array
my %hash = ();
for (0..$#array) {
    if ($_ % 2 == 0) { #$_ is even
        #Array element represents a hash key
        unless (exists $hash{$array[$_]}) {
            $hash{$array[$_]} = "";
        }
    }
    else { #$_ is odd
        #Array element[$_] represents part of value associated with previous element-key
        $hash{$array[$_ - 1]} .= "$array[$_] ";
    }
}
print "\nHash keys and values separated by '-' :\n";
for (keys %hash) {
    print "$_ - $hash{$_}\n";
}
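For readers who want to compare, here is a small Python 3 sketch of the same idea that keeps repeated keys such as AU as a list of separate values instead of one concatenated string. It applies only the first rule discussed in this thread (capital letters followed by a hyphen mark a key anywhere in the string), so it would also treat a "TP -" in the middle of an abstract as a key. The function name and the shortened sample record are made up.

```python
import re
from collections import defaultdict


def parse_tagged(text):
    """Split 'KEY - value KEY - value ...' into a dict of key -> list of values."""
    # re.split with a capturing group yields [before, key1, value1, key2, value2, ...]
    parts = re.split(r"([A-Z]+)\s*-", text)
    result = defaultdict(list)
    for key, value in zip(parts[1::2], parts[2::2]):
        result[key].append(value.strip())
    return dict(result)


record = "OWN - NLM STAT- Publisher AU - Gannon AM AU - Turner EC AU- XYZ AD - UCD School"
for key, values in parse_tagged(record).items():
    print(key, "-", "; ".join(values))
```

Keeping a list per key makes it easy to ask for, say, the second author without having to split a combined string again.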
The following program parses the string and stores the substrings in an array or list. So far I haven't figured out how to put the contents of this array into a hash that would be useful.
#!/usr/bin/perl -w
use strict;
$_ = "OWN - NLM "
. "STAT- Publisher "
. "DA - 20091005 "
. "AU - Gannon AM "
. "AU - Turner EC "
. "AU - Reid HM "
. "AU - Kinsella BT "
. "AU- XYZ "
. "AD - UCD School of Biomolecular and Biomedical Sciences";
print;
print "\n";
#Parse the above string into key-value pairs.
s/([A-Z]+\s*)-/###$1###/g;
#Put the resulting substrings into a list
my @list = split /\s*###/;
#Remove leading and trailing spaces from each element
foreach my $i (@list) {
    $i =~ s/^\s+//;
    $i =~ s/\s+$//;
}
#The first element is empty. Remove it with shift.
shift @list;
#Here's what the list contains now
for (0..$#list) {
    print "Element $_: $list[$_]\n";
}
Please try the following as well. It works on my computer now.
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
import os, sys
#DictManipulation.py written 26/09/09
#See flow_dia_for_new_dict.ods, this directory
#
# A text file that is an English to Greek dictionary and has 2 columns.
# As there are many entries it takes up 25 or so pages. The object is to
# produce a file that has 2 x 2 columns and so producing half the number
# of pages. The 2nd column generated should be the next 72 entries following
# the first column so the alphabetical order is maintained.
#
# A sample of the original file is shown below
#
#August Αύγουστος (ο)
#aunt θεία η
#autumn φθινόπωρο (το)
#bad, ugly (adj.) άσχημος, -η, -ο
#bad, wicked, evil (adj.) κακός, -ή, -ό
#bank τράπεζα (η)
#basket καλάθι (το)
def rpad (orig_string, length):
    """Adds spaces to the end of a string until it has the desired length"""
    #Convert to unicode utf-8 because if your default encoding is Ascii the encode step fails
    ustring = unicode(orig_string,"utf-8","strict")
    #In a plain string Greek symbols have length = 1 instead of 2
    plain_string = ustring.encode("cp737", "replace")
    spaces_needed = length - len(plain_string)
    padded_string = orig_string + " " * spaces_needed
    return padded_string
lines_per_page = 73
current_page = 0
line_pointer_a = 1
line_pointer_b = line_pointer_a + lines_per_page
write_file = open ("newDict.txt", "w")
read_file = open ("origDict.txt", "r")
lines_in_file = read_file.readlines ()
read_file.close()
lengh_file = len (lines_in_file)
new_line = '\n'
while line_pointer_a != (lines_per_page …
Now that we can recreate the problem I'm confident that we're close to finding a way around it. I just need to learn a little more about how to format a unicode string. Maybe something to do with encoding or decoding it to set the length we want when formatting.
I'll try to look at it some more this afternoon, unless someone else finds the answer before then.
No solution yet. Now it's driving me crazy.:)
What I think is happening is that Python reads your input file and writes your output file as raw bytes, so the non-ASCII characters pass through unchanged. (As far as I can tell, the # -*- coding: utf-8 -*- statement only declares the encoding of the script itself.) However, when I examine the contents of lineA using Idle, it shows me that it has content like the following:
>>> lineA
'2 \xce\xb4\xcf\x8d\xce\xbf (also \xce\xb4\xcf\x85\xce\xbf)'
>>>
>>> len(lineA)
50
>>> lineB
'but \xce\xbc\xce\xb1'
>>> len(lineB)
34
>>> #...So when we adjust the widths of lineA and lineB, it is the above unicode sequences we are manipulating
>>> combined_AB = "%-56s%-60s" % (lineA,lineB)
>>> combined_AB
'2 \xce\xb4\xcf\x8d\xce\xbf (also \xce\xb4\xcf\x85\xce\xbf) but \xce\xbc\xce\xb1 '
>>> len(combined_AB)
116
>>> #You see the problem? Python does the length adjusting BEFORE it converts the unicode sequence into the record that it writes to the output file.
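The mismatch between bytes and characters can be shown in a few lines. The sketch below is written for Python 3, where strings are Unicode by default, whereas the session above is Python 2; in Python 2 the analogous step would be to decode the byte string to unicode before padding. The sample strings are reconstructed from the byte sequences shown above, and the function name is hypothetical. It also assumes every character occupies one column, which holds for this Greek text but not for combining marks or wide East Asian characters.

```python
line_a = "2 δύο (also δυο)"
line_b = "but μα"

# Encoded as UTF-8, each Greek letter takes two bytes,
# so the byte count overstates the visible width.
print(len(line_a.encode("utf-8")), "bytes")
print(len(line_a), "characters")


def two_columns(left, right, width=30):
    """Pad the left cell by character count so the right cell always starts at the same column."""
    return left.ljust(width) + right


print(two_columns(line_a, line_b))
print(two_columns("plain ascii", "second column"))
```

Padding after the text has been decoded to characters is what keeps the second column aligned in the output file.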
Thanks Chico2009. I'm interested because here in Canada we have two official languages so it's good to learn how this encoding thing works. I just tried testing something in Python's Idle and it complained right away about "unsupported characters".
I have to go to lunch now but I'll look at your attachment later this afternoon if possible.
If I understand correctly, the above shows 14 examples of output records -- that is, 14 examples of the contents of newString after the statement newString = '%-30s%-30s%s' % ( lineA , lineB, new_line) runs.
What I'd like to do is run that command myself to reproduce your output and try to modify the command to avoid the irregular-looking output.
Could you show us one or two examples of the initial contents of the lineA, lineB, and new_line variables? I'm a Python beginner as well so I don't know all the answers, but I think if we knew some sample values for the input variables we would have something to test.
I can't see where the columns are in your example because this website reduces multiple spaces to single spaces. If you wrap your example in code tags (click the "Help with Code Tags" link for instructions) we can see how your example data really is appearing to you.
For example, when I type:
'This is lineA This is lineB' without the code tags you can't see the 20 spaces that I typed between 'lineA' and 'This'.
With the code tags, the same example looks like this:
'This is lineA This is lineB'
Blank lines won't make any difference to how the program runs. Since I'm new to Perl I probably don't have a good sense of style yet. But I was trying to separate my code into three sections: introductory lines that appear in all programs, followed by setting up and printing the test string, followed by modifying and printing the resulting string.
Sorry, I read your example wrong. This should do it. Too bad the substitute command uses slashes to delimit the two strings. It confuses me, but it should work this time.
#!/usr/bin/perl -w
#ReplaceSlashWithBackslash.pl
use strict;
my $myPath = 'c:/perl/test'; # Example string containing forward slashes
print "\nOriginal string is: $myPath\n\n";
# The following command replaces forward slashes with backslashes
# You need to escape the backslash and slash with backslashes
$myPath =~ s/\//\\/g; #Other delimiters are easier to read, e.g. s{/}{\\}g
print "Modified string is: $myPath\n";
#!/usr/bin/perl -w
#ReplaceBackslashWithSlash.pl
use strict;
my $myPath = 'c:\perl\test'; # Example string containing backslashes
print "\nOriginal string is: $myPath\n\n";
# The following command substitutes backslashes with forward slashes
# You need to escape the backslash and slash with backslash
$myPath =~ s/\\/\//g;
print "Modified string is: $myPath\n";
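For what it is worth, the same two conversions need no regular expression at all in a language with a plain string-replace method. A tiny Python 3 sketch follows, with made-up function names; only the backslash inside the string literal has to be doubled.

```python
def to_backslashes(path):
    """Replace every forward slash with a backslash."""
    return path.replace("/", "\\")


def to_forward_slashes(path):
    """Replace every backslash with a forward slash."""
    return path.replace("\\", "/")


print(to_backslashes("c:/perl/test"))
print(to_forward_slashes("c:\\perl\\test"))
```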
Instead of trying to make up new incremented variable names on the fly, you can create an empty list and append each intFCITC value to it. The elements of a list are sequenced in the order in which they were appended and can be referred to by index. The first element appended to the list has index of zero.
noline=0
intFCITC_list = [] #Empty list where we will put all values of intFCITC
for line in open("fcitc.txt"):
    if noline==1:
        fcitc=line[:9].strip()
        intFCITC = int(float(fcitc)) #OP wants this to be integer.
        intFCITC_list.append(intFCITC)
        limConst=line[29:79].strip()
        noline+=1
    elif noline==2:
        contDesc=line[90:137].strip() #Added subscripts. Don't want entire line.
        print fcitc
        print limConst
        print contDesc
        noline=0
    if "FCITC" in line:
        noline+=1
print "\nList of intFCITC values:" #All the intFCITC values should be in intFCITC_list
print intFCITC_list
#Refer to any element by its index
print "intFCITC_list[0] = ", intFCITC_list[0] #the first one is FMPP to FPL
print "intFCITC_list[1] = ", intFCITC_list[1] #the second one is FMPP to PEF
print "intFCITC_list[2] = ", intFCITC_list[2] #FMPP to TECO?
if coursenum in courselistA: #courselistA is the set of course numbers
    #Convert courselistA into a list and use list comprehension
    print [x for x in list(courselistA) if x > coursenum]
jice's solution looks good to me, unless you want only a slice of the second data line and want to convert the FCITC string into an integer. Here it is with a couple of minor changes to do that.
noline=0
for line in open("fcitc.txt"):
    if noline==1:
        fcitc=line[:9].strip()
        intFCITC = int(float(fcitc)) #OP wants this to be integer.
        limConst=line[29:79].strip()
        noline+=1
    elif noline==2:
        contDesc=line[90:137].strip() #Slice from line 2. Don't want entire line.
        print intFCITC
        print limConst
        print contDesc
        noline=0
    if "FCITC" in line:
        noline+=1
The above needs another element at the end of the list so it will handle a perfect score of 100%. I modified the program to test all possible values. It's still not elegant.
def CalcGrade(grade):
    """Converts a numeric grade between 0 and 100 into a letter between 'A' and 'F' """
    index = int(grade)/10
    #Added another "A" to grades list to handle perfect grade of 100
    grades = ["F", "F", "F", "F", "F", "F", "D", "C", "B", "A", "A"]
    final = grades[int(index)]
    return final
#grade = raw_input("Enter the student's grade: ")
# Commented out the raw input and used loop to test all values
for x in range(101): #Test for values between 0 and 100
    print x, " Converts to ", CalcGrade(x)
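One possibly tidier alternative, offered only as a sketch, is to list the grade boundaries once and let the standard bisect module find the right slot. It is written for Python 3 (the code above is Python 2), the names are invented, and the boundaries are inferred from the list above: 60 to 69 is D, 70 to 79 is C, 80 to 89 is B, and 90 and up is A.

```python
from bisect import bisect_right

BOUNDARIES = [60, 70, 80, 90]          # lowest score that earns D, C, B, A
LETTERS = ["F", "D", "C", "B", "A"]    # one more letter than there are boundaries


def calc_grade(score):
    """Convert a numeric grade between 0 and 100 into a letter between 'A' and 'F'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many boundaries the score has reached
    return LETTERS[bisect_right(BOUNDARIES, score)]


for score in (0, 59, 60, 89, 90, 100):
    print(score, "converts to", calc_grade(score))
```

A perfect score of 100 needs no special case here, and changing a cutoff means editing one number rather than counting list positions.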
I think you can use pretty much the same list that you used for the simpler problem. All you need to come up with is a formula to calculate the index for each grade.
def main():
    grade = raw_input("Enter the student's grade: ")
    index = int(grade)/10
    grades = ["F", "F", "F", "F", "F", "F", "D", "C", "B", "A"]
    final = grades[int(index)]
    print final
main()