I am new to programming (but not to the concepts) and I need to process some .csv files. Here's what I have:
The files contain Cartesian coordinates (X, Y, Z) of various "polygons." (They are cell perimeters of a mouse embryo.) The polygons are in quotes because they are not closed, and I need Python to close them for me. So what I want to do is use Python to (in general terms):
Go to cell 1
Find the start point coordinate(s). It will always be "A."
Copy those values
Find the endpoint coordinate(s). This will vary, and I need to set bounds here, as the drawing s/w sets the points down as A-Z and then restarts from A again! But that's another script that I believe will be trivial to create. (I could easily write an Excel macro to handle that problem.)
Copy those values
Sum the values and average (divide by two)
Paste those values as the new beginning and end points. Closes the polygon.
Go to cell 2 - repeat until end.
I figure I must process each cell three times. I first sum and average the value for X, then do it again for Y, then finally for Z. I'm also guessing I will need a sandbox to paste the new value (twice: once at the start, once at the end), and insert between those points the other values, the ones for B through the unknown penultimate value.
So my questions are: Am I thinking this through right? And are there any examples from which I could copy the methodology? It's much easier to follow directions or the path of someone else than it is to build from scratch.
I intend to pre-process the files, so I can delete or insert carriage returns or other things between the coordinates and cells to make my script easier to code. I'm working in Bitplane's Imaris software, if you're familiar with the image source files. Thanks!
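The steps above could be sketched roughly as follows. This is only a minimal illustration, assuming each cell has already been read into a list of (x, y, z) tuples in drawing order; the function name `close_polygon_midpoint` is hypothetical. It also suggests that three separate passes (one per axis) are probably not needed, since all three coordinates of a point can be averaged together.

```python
def close_polygon_midpoint(points):
    """Return a closed copy of an open polygon.

    points: list of (x, y, z) tuples; the first is the start point (A)
    and the last is the drawn end point. Both are replaced by their
    midpoint, so the polygon starts and ends at the same place.
    """
    if len(points) < 3:
        raise ValueError("need at least three points to form a polygon")
    start, end = points[0], points[-1]
    # Average X, Y and Z in one pass: sum each pair and divide by two.
    mid = tuple((s + e) / 2 for s, e in zip(start, end))
    # The points in between (B through the penultimate one) are kept as-is.
    return [mid] + points[1:-1] + [mid]


outline = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 2.0, 4.0)]
closed = close_polygon_midpoint(outline)
```

Building a new list plays the role of the "sandbox": the original list is left untouched.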

Ah, I hate general terms! :P

Could you provide sample data and what your expected outcome is?

I am assuming you want to take each cell's parameter and document any change (growth or decay).

Simplistic example:
X = 5
Y = 5
Z = 5

You want to take the value of X (startpoint) and then find its endpoint (which you will provide or script another way, correct?), and then obtain the sum and average, yes? So let's say the endpoints are as follows:

X = 7
Y = 8
Z = 6

The data you would like to have returned would be something along these lines:

X = 12 (sum), 6 (avg)
Y = 13 (sum), 6.5 (avg)
Z = 11 (sum), 5.5 (avg)

Is this all correct? If so, the application for your problem wouldn't be too difficult, although I would definitely need to have test data to work with. :)
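In Python, the arithmetic in that simplistic example might look like the sketch below. The dictionary names `start`, `end` and `result` are just for illustration and are not taken from any real data file.

```python
# Start and end values from the simplistic example above.
start = {"X": 5, "Y": 5, "Z": 5}
end = {"X": 7, "Y": 8, "Z": 6}

# For each axis, store (sum, average of the two values).
result = {}
for axis in start:
    total = start[axis] + end[axis]
    result[axis] = (total, total / 2)

for axis, (total, avg) in result.items():
    print(f"{axis} = {total} (sum), {avg} (avg)")
```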

As requested here is a snippet of the "raw" (not pre-processed) data:

60.612,um,Position,A,cell001,1,0
147.217,um,Position,A,cell001,1,0
43.596,um,Position,A,cell001,1,0
59.384,um,Position,B,cell001,1,1
147.217,um,Position,B,cell001,1,1
48.798,um,Position,B,cell001,1,1
... Points C through P...
60.64,um,Position,Q,cell001,1,16
147.217,um,Position,Q,cell001,1,16
43.83,um,Position,Q,cell001,1,16
>112.975,um,Position,A,cell002,1,0
170.135,um,Position,A,cell002,1,0
45.5,um,Position,A,cell002,1,0

I will pre-process the .csv and remove the units of measurement, position, and last field. I'm thinking that I should move the name (cell001, cell002,...) to the first column. I inserted the angle bracket to mark the break between position Q (the last point for cell 1) and position A, the first point for cell 2.
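For what it's worth, the raw export can probably be read without much pre-processing. The sketch below is a guess at how that might look, under these assumptions: every point occupies three consecutive rows in X, Y, Z order, the value is in the first column, and the cell name is in the fifth. The function name `read_cells` is hypothetical, and placeholder lines such as "... Points C through P..." must not be present in the real file.

```python
import csv

def read_cells(lines):
    """Group raw export rows into {cell_name: [(x, y, z), ...]}.

    Assumes three consecutive rows (X, then Y, then Z) per point.
    """
    cells = {}
    pending = []  # coordinate values collected so far for the current point
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        # Tolerate a hand-inserted '>' marker in front of the value.
        pending.append(float(row[0].lstrip(">")))
        if len(pending) == 3:
            cells.setdefault(row[4], []).append(tuple(pending))
            pending = []
    return cells


sample = """60.612,um,Position,A,cell001,1,0
147.217,um,Position,A,cell001,1,0
43.596,um,Position,A,cell001,1,0
59.384,um,Position,B,cell001,1,1
147.217,um,Position,B,cell001,1,1
48.798,um,Position,B,cell001,1,1
>112.975,um,Position,A,cell002,1,0
170.135,um,Position,A,cell002,1,0
45.5,um,Position,A,cell002,1,0"""
cells = read_cells(sample.splitlines())
```

Because points are grouped by row order and cell name rather than by letter, the A-Z restart should not matter here. For a real file, `read_cells(open(path, newline=""))` would be the equivalent call, where `path` is whatever the export is named.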

What I am doing is drawing outlines around cell membranes. (The image is that of a mouse embryo obtained from a confocal microscope.) The software we use to make these measurements doesn't automatically connect points or prevent overlapping lines. The goal is to process a few hundred of these images - each with about 200 to 400 cells, then create a compact database of wild type embryos. In that d/b I will catalog size, shape, location, neighbors and so forth, of the cells. From that I will derive statistical info that the investigators can use to compare their results, as well as have a way to navigate the images.

And thanks for the reply and the link, I will look at it shortly. I'm going through the Pasteur's pdf of Python for Biologists as a crash course. (I'm an art historian and erstwhile cartographer; I got lost and ended up in a dev bio lab!)

Alright, now we are on the right track! So cell001-A has 3 rows

60.612,um,Position,A,cell001,1,0
147.217,um,Position,A,cell001,1,0
43.596,um,Position,A,cell001,1,0

Which parts do you need the sum of, and which parts do you need averaged? Or.. wait, you are wanting to average that column for all of cell001? So..

60.612
147.217
43.596

For position A, and then sum up and average all of these through the last position (in the example above, C through P)?

You really don't have to rearrange the data to make it neater. The whole point of programming (in my opinion) is to do these tasks.

Which of the above columns are needed and which can be ignored? How should the data be computed, the entire first column for cell001 added up as well as averaged, or are these the X, Y, Z positions you referred to? So for cell001-A you have: X = 60.612, Y = 147.217, and Z = 43.596 ? If that is the case then you are wanting to sum up and average all of cell001's X, Y, Z coordinates for all positions?

I draw outlines of the membranes of cells. This is analogous to drawing with a felt marker on a balloon; the outlines will have X, Y, and Z coordinates. However, where a felt marker is very "gross" (easy to connect the ends), this software is extremely precise (nanometer precise), and so connecting start and end is nearly impossible.
But the software allows me to export the statistics. It exports useful information (the coordinates, the name, and the position) as well as not-so-useful statistics such as the units of measurement, the qualifier, and the last two columns, which I cannot readily identify.
So for cell001 I have these coordinates for position "A", the start point:
60.612 This is the X coordinate
147.217 This is the Y coordinate
43.596 This is the Z coordinate

To "close" the polygon I need to go to the end point. On this particular cell it is point Q, whose coordinates are:

60.64 This is the X coordinate
147.217 This is the Y coordinate
43.83 This is the Z coordinate

I don't really need to average them. It would serve me just as well to copy the values of A into a "new" position ("R"). It's just that when I draw the outlines, I make a point that is very close to the start. My goal/need is to make the start point == the end point, and so have a true polygon. I also must preserve the points in between.
Much thanks!
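Copying A into a new final position could be as small as the sketch below. It assumes the points of one cell are already in a list of (x, y, z) tuples in drawing order; the name `close_polygon` is hypothetical.

```python
def close_polygon(points):
    """Return a copy of points with the start point appended at the end.

    All original points (including the drawn end point) are preserved.
    If the outline is already closed, it is returned unchanged.
    """
    if not points:
        raise ValueError("no points to close")
    if points[0] == points[-1]:
        return list(points)
    return list(points) + [points[0]]


# Points A, B and Q of cell001 from the snippet above (C through P omitted).
cell001 = [
    (60.612, 147.217, 43.596),  # A
    (59.384, 147.217, 48.798),  # B
    (60.64, 147.217, 43.83),    # Q
]
closed = close_polygon(cell001)
```

Appending a copy of A leaves Q in place, so the near-duplicate point stays in the outline; whether to drop or replace Q instead is a design choice that depends on how the shape statistics will be computed.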
