How to read a file and plot a scatterplot in Python?

Question

tcl76 0 Light Poster

14 Years Ago

Hi,

I'm new to Python. I want to read from a text file (attached; sample content below) and plot a scatterplot, with Lane on the X-axis and EyVt and EyHt on the Y-axis.
I have some sample code, but I need help getting Python to read the Lane, EyVt and EyHt columns. Please help. Thanks.

import numpy as np
import pylab as pl

data=np.loadtxt('sampledata.txt')

pl.plot(data[:,0],data[:,1],'ro')
pl.xlabel('x')
pl.ylabel('y')
pl.xlim(0.0,10.)

pl.show()

eg of text file content:
Platform: PC
Tempt : 25
TAP0 :0
TAP1 :1

+++++++++++++++++++++++++++++++++++++++++++++
Port Chnl Lane EyVt EyHt
+++++++++++++++++++++++++++++++++++++++++++++
0 1 1 75 55
0 1 2 10 35
0 1 3 25 35
0 1 4 35 25
0 1 5 10 20
+++++++++++++++++++++++++++++++++++++++++++++
Time: 20s



Enders_Game 13 Newbie Poster

14 Years Ago

Unless the data files always have the same number of lines, I would use regular expressions.
Here's a way that just works (note that it's probably not the best way to write it :P, I'm not very used to re's yet).
It reads through the whole file line by line and checks each line against the expression. When a line matches, x.group(1) is your Lane, x.group(2) is EyVt, and x.group(3) is EyHt. It'll do this as many times as it has to, whether you have 5 lines of data or 500.

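import re

# Adjust the path to wherever sampledata.txt lives on your machine.
with open("C:/Users/Enders/Desktop/sampledata.txt", "r") as file:
    for line in file:
        # Raw string (r"...") keeps the backslashes intact for the regex engine.
        x = re.search(r"\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)", line)
        if x != None:
            print(x.group(1))  # Lane
            print(x.group(2))  # EyVt
            print(x.group(3))  # EyHt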

For more on regular expressions:
http://docs.python.org/library/re.html


Enders_Game 13 Newbie Poster

14 Years Ago

\d matches any single digit (0 to 9, so no letters). The + operator makes it repeat one or more times, so \d+ matches any of 1, 345, 23 or 563456.
\s matches any whitespace character (tab, space), and + has the same effect again.

The () put that part of the pattern in a group, as in (\d+). The first group gets assigned to .group(1), the next to .group(2), and so on.

So the first \d+\s+ matches "0 ", the second matches "1 ", and so on. Make sense?

edit: like I said, this isn't the best way to write the expression (it's very messy). I'm pretty new at it, but it works! ^_^
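
To make the grouping concrete, here is a small sketch that runs the same pattern on one data row copied from the sample file. The names `pattern`, `row` and `match` are only for illustration.

```python
import re

# Same pattern as in the thread, written as a raw string so the
# backslashes reach the regex engine unchanged.
pattern = r"\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)"

# One data row from the sample file: Port, Chnl, Lane, EyVt, EyHt.
row = "0 1 1 75 55"

match = re.search(pattern, row)

# The first two numbers (Port, Chnl) are matched but not captured;
# only the three parenthesised groups come back, as strings.
lane, eyvt, eyht = match.groups()
print(lane, eyvt, eyht)
```

Note that the groups are strings, so they would need an int() or float() call before being used as numbers.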


Enders_Game 13 Newbie Poster

14 Years Ago

Because those lines don't match the expression, re.search returns None for them. (You can confirm this by removing if x != None: and you'll get an AttributeError.) The match object only has data in its groups if something in the line matched the expression.
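
As a rough illustration of why the header, separator and trailer lines get skipped, the sketch below runs the pattern over a handful of lines modelled on the sample file. The `lines` list and the `matched` name are made up for this example.

```python
import re

pattern = r"\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)"

# Lines in the style of the sample file: metadata, a separator,
# the column header, one data row, and the trailer.
lines = [
    "Platform: PC",
    "Tempt : 25",
    "+++++++++++++++++++++++++++++++++++++++++++++",
    "Port Chnl Lane EyVt EyHt",
    "0 1 1 75 55",
    "Time: 20s",
]

# re.search returns None unless the line contains five numbers
# separated by whitespace, so only the data row is kept.
matched = [line for line in lines if re.search(pattern, line) is not None]
print(matched)
```

Nothing here counts lines or looks for the +++++ markers; a line is used only if it fits the pattern.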

Enders_Game 13 Newbie Poster

14 Years Ago

I was looking for different ways of writing the regular expression (I thought it looked messy) and found another one.
Anyway, if you want to keep the original code, then:

# `file` is an open handle to sampledata.txt, e.g. from open(...)
x1 = []
y1 = []
y2 = []
for line in file:
    x = re.search(r"\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)", line)
    if x != None:
        x1.append(x.group(1))
        y1.append(x.group(2))
        y2.append(x.group(3))
print(x1,y1,y2)

Second implementation

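# `file` is an open handle to sampledata.txt, e.g. from open(...)
x1 = []
y1 = []
y2 = []
for line in file:
    numbers = re.findall(r"\d+", line)  # every run of digits in the line
    if len(numbers) == 5:               # only data rows have five numbers
        x1.append(numbers[2])
        y1.append(numbers[3])
        y2.append(numbers[4])
print(x1, y1, y2)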

Then loop through the lists later when actually plotting.
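
One possible way to do that is sketched here, using a hypothetical helper `parse_rows` and a short in-memory `sample` list in place of the real file. It also converts the strings to int, an extra step not in the code above, so the values can be treated as numbers later.

```python
import re

def parse_rows(lines):
    """Collect Lane, EyVt and EyHt from lines holding exactly five numbers."""
    lanes, eyvt, eyht = [], [], []
    for line in lines:
        numbers = re.findall(r"\d+", line)
        if len(numbers) == 5:
            lanes.append(int(numbers[2]))
            eyvt.append(int(numbers[3]))
            eyht.append(int(numbers[4]))
    return lanes, eyvt, eyht

# A few lines modelled on the sample file (not the full file).
sample = [
    "TAP1 :1",
    "Port Chnl Lane EyVt EyHt",
    "0 1 1 75 55",
    "0 1 2 10 35",
    "Time: 20s",
]
lanes, eyvt, eyht = parse_rows(sample)

# Walk the three lists in step, one data row at a time.
for lane, v, h in zip(lanes, eyvt, eyht):
    print(lane, v, h)
```

Passing an open file object instead of `sample` should work the same way, since both yield one line per iteration.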


vegaseat commented: I like your second approach +13


tcl76 0 Light Poster · Answer 1 · 2011-01-23T04:24:25+00:00

Thanks for your help. Could you please explain what \d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+) means?
I don't understand how it is able to find the Lane, EyVt, EyHt columns by using this.

tq

tcl76 0 Light Poster · Answer 2 · 2011-01-23T04:36:51+00:00

Thanks for your patience. re can be confusing. :)
How does it know that it needs to skip 7 lines and then skip the Port and Chnl columns before extracting data from Lane, EyVt, EyHt?
How does it know to stop extracting before reaching the +++++ line?

tcl76 0 Light Poster · Answer 3 · 2011-01-23T04:50:46+00:00

I've now combined the re matching code to get a plot, but I only get 1 dot in the graph. Any suggestions appreciated.

import re
import numpy as np
import pylab as pl

file = open("C:/Python25/myscript/plot/sampledata.txt", "r")

for line in file:
    x = re.search("\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)", line)
    if x != None:
##        print(x.group(1))       
##        print(x.group(2))
##        print(x.group(3))
        x1=x.group(1)
        y1=x.group(2)
        y2=x.group(3)
        

plot1=pl.plot(x1,y1,'r')
plot2=pl.plot(x1,y2,'go')


pl.title('Plot of y vs x')

pl.xlabel('x axis')
pl.ylabel('y axis')

pl.xlim(0.0,9.0)
pl.ylim(0.0,90.0)

pl.legend([plot1,plot2],('red line', 'green circles'),'best',numpoints=1)

pl.show()

Enders_Game 13 Newbie Poster · Answer 4 · 2011-01-23T04:58:27+00:00

Store it in a list. You're overwriting your variables every time it loops again lol.

Also you should've changed the variables to something that sort of represents what they are...

If you have a lot of trouble I'll post some code, but you should be able to figure this out yourself.
hint: initialize a list and append all the points to it.
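
A minimal sketch of the difference, with a hard-coded `rows` list standing in for the values pulled out of the file (the names `last_lane` and `all_lanes` are invented for the example):

```python
# (Lane, EyVt) pairs as they would come out of the regex, as strings.
rows = [("1", "75"), ("2", "10"), ("3", "25")]

# Overwriting: the name is rebound on every pass,
# so only the last row survives the loop.
for lane, value in rows:
    last_lane = lane

# Appending: the list grows by one element per pass and keeps every row.
all_lanes = []
for lane, value in rows:
    all_lanes.append(lane)

print(last_lane)
print(all_lanes)
```

Plotting `last_lane` would give a single point, which matches the one-dot graph described above.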

tcl76 0 Light Poster · Answer 5 · 2011-01-23T05:13:20+00:00

i put x1, y1, y2 to capture the extracted element into arrays. when i print it i get correct:
('1', '75')
('1', '55')
('2', '10')
('2', '35')
('3', '25')
('3', '35')
('4', '35')
('4', '25')
('5', '10')
('5', '20')

could you pls show how to loop the arrays? tq

import re
import numpy as np
import pylab as pl

file = open("C:/Python25/myscript/plot/sampledata.txt", "r")

for line in file:
    x = re.search("\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)", line)
    if x != None:
##        print(x.group(1))       
##        print(x.group(2))
##        print(x.group(3))
        x1=x.group(1)
        y1=x.group(2)
        y2=x.group(3)
        print (x1,y1)
        print (x1,y2)
##        plot1=pl.plot(x1,y1,'ro')
##        pl.show()
        

##plot1=pl.plot(x1,y1,'ro')
##plot2=pl.plot(x1,y2,'go')
##
##plot1=pl.plot(x1,y1,'ro')
##plot2=pl.plot(x1,y2,'go')
##
##
##pl.title('Plot of y vs x')
##
##pl.xlabel('x axis')
##pl.ylabel('y axis')
##
##pl.xlim(0.0,9.0)
##pl.ylim(0.0,90.0)
##
##pl.legend([plot1,plot2],('red circles', 'green circles'),'best',numpoints=1)
##
##pl.show()

tcl76 0 Light Poster · Answer 6 · 2011-01-23T05:40:15+00:00

ok. now i think i'm looping, i'm using for item in x1, y1,y2 after declaring it as arrays. but when i put the plot statement it'll plot only one value, it doesn't seem to be iterating.

for line in file:
    x = re.search("\d+\s+\d+\s+(\d+)\s+(\d+)\s+(\d+)", line)
    if x != None:
##        print(x.group(1))       
##        print(x.group(2))
##        print(x.group(3))

        x1=x.group(1)
        y1=x.group(2)
        y2=x.group(3)
        #print (x1,y1)
        #print (x1,y2)


        for item in x1,y1,y2:
            print (x1,y1,y2)

            plot1=pl.plot(x1,y1,'ro')
            plot2=pl.plot(x1,y2,'go')
            pl.show()

Enders_Game 13 Newbie Poster · Answer 7 · 2011-01-23T05:51:13+00:00

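matplotlib doesn't work with Python 3 yet, so I can't really help you anymore.
Also just want to point out that you aren't really looping: for item in x1,y1,y2 only steps over the three strings from the current line, it doesn't collect the earlier rows.

From what I read in the docs this code should plot all your lines. Not too sure though.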

x1 = []
y1 = []
y2 = []
for line in file:
    numbers = re.findall(r"\d+", line)
    if len(numbers) == 5:
        # convert to int so the points are plotted as numbers
        x1.append(int(numbers[2]))
        y1.append(int(numbers[3]))
        y2.append(int(numbers[4]))
pl.plot(x1,y1,'ro')
pl.plot(x1,y2,'go')
pl.show()

I gotta go now, good luck.

tcl76 0 Light Poster · Answer 8 · 2011-01-23T05:53:38+00:00

thanks for your help. i'll dig further and will post out once i find the solution. btw, i'm using Python 2.5 and Win XP.

tcl76 0 Light Poster · Answer 9 · 2011-01-23T06:07:23+00:00

thanks for all the advice.
Solution:

import re
import pylab as pl

x1 = []  # Lane
y1 = []  # EyVt
y2 = []  # EyHt
with open("C:/Python25/myscript/plot/sampledata.txt", "r") as file:
    for line in file:
        numbers = re.findall(r"\d+", line)
        if len(numbers) == 5:  # only the data rows hold five numbers
            # Convert to int so matplotlib plots numbers, not text labels.
            x1.append(int(numbers[2]))
            y1.append(int(numbers[3]))
            y2.append(int(numbers[4]))

# pl.plot returns a list of lines; unpack the single line for the legend.
plot1, = pl.plot(x1, y1, 'ro')
plot2, = pl.plot(x1, y2, 'go')

pl.title('Plot of y vs x')
pl.xlabel('x axis')
pl.ylabel('y axis')

pl.xlim(0.0, 9.0)
pl.ylim(0.0, 90.0)

pl.legend([plot1, plot2], ('red circles', 'green circles'), loc='best', numpoints=1)

pl.show()