Hi. I am a python newbie seeking some help in working with a file containing rows and columns. I have a file, that ideally contains an equal number of rows and columns. The first row is a tab separated header row and the first column is a label row. The row headers and the column labels are identical. Occasionally this balance is upset as I get a file with more rows than columns or more columns than rows. I am seeking some python code to first count and report the number of rows and columns in the file. If both are the same then the code should do nothing. However, if the number is not the same, the code should write a new file where the rows and columns are not matched and write a second file with the exceptions that caused the unbalance in the count. I would be grateful for any help. Thank you for your time and consideration.

A sample file of 27 rows and 24 columns follows. I would like a file that is 24 rows and 24 columns where the row label and column headers match and a file with the 3 rows, in this case, that make the file unbalanced. The output files should contain the numerical values for each cell.

ID BTB06022SC99x BTB12160SC86x BTB12229SC99x BTB13684SC82x DSHS00083sc87 DSHS00083sc87x HTS00931SC77x HTS01508SC88x HTS02534SC67x HTS0332897x HTS03849SC90x HTS07614SC98x HTS09466SC75x HTS10266SC95x HTS12242SC72x KM07785SC96x ML00376S83x RefStd RF03548SC99x RH01120SC97x S11135SC71x SCR00444SC82x13 SPB00790SC58x
BTB06022SC99x 1 0.223529 0.12381 0.160643 0.130081 0.130081 0.17757 0.151292 0.270677 0.188498 0.165992 0.212544 0.174603 0.210526 0.183333 0.176471 0.20438 0.237354 0.299342 0.184211 0.154472 0.196172 0.173203
BTB12160SC86x 0.223529 1 0.142857 0.180995 0.136364 0.136364 0.224044 0.188285 0.183206 0.178082 0.209302 0.163636 0.151515 0.178439 0.233962 0.151292 0.227642 0.164659 0.256849 0.177305 0.185185 0.174603 0.173759
BTB12229SC99x 0.12381 0.142857 1 0.134969 0.2 0.2 0.193548 0.124324 0.2 0.140426 0.194805 0.150943 0.0991736 0.147619 0.135747 0.156863 0.147208 0.169399 0.0898876 0.117904 0.118012 0.177419 0.148649
BTB13684SC82x 0.160643 0.180995 0.134969 1 0.247253 0.247253 0.210843 0.225352 0.205882 0.220532 0.215385 0.202429 0.199248 0.195122 0.164751 0.17004 0.262443 0.145923 0.120521 0.197674 0.532895 0.143678 0.193798
DSHS00083sc87 0.130081 0.136364 0.2 0.247253 1 1 0.338028 0.152074 0.169492 0.169811 0.329412 0.153226 0.310638 0.136546 0.236287 0.125506 0.185841 0.142857 0.0918033 0.128788 0.180851 0.153374 0.142308
DSHS00083sc87x 0.130081 0.136364 0.2 0.247253 1 1 0.338028 0.152074 0.169492 0.169811 0.329412 0.153226 0.310638 0.136546 0.236287 0.125506 0.185841 0.142857 0.0918033 0.128788 0.180851 0.153374 0.142308
HTS00931SC77x 0.17757 0.224044 0.193548 0.210843 0.338028 0.338028 1 0.12 0.157407 0.183333 0.449275 0.135371 0.184874 0.122271 0.24186 0.12 0.21608 0.197917 0.116364 0.152542 0.12 0.227273 0.129167
HTS01508SC88x 0.151292 0.188285 0.124324 0.225352 0.152074 0.152074 0.12 1 0.265306 0.283582 0.155556 0.344538 0.1875 0.293878 0.135417 0.238095 0.258333 0.25 0.118902 0.24812 0.179724 0.132653 0.292969
HTS02534SC67x 0.270677 0.183206 0.2 0.205882 0.169492 0.169492 0.157407 0.265306 1 0.298246 0.167347 0.346304 0.194805 0.383065 0.192568 0.265918 0.333333 0.258964 0.12931 0.265018 0.15102 0.137615 0.252632
HTS0332897x 0.188498 0.178082 0.140426 0.220532 0.169811 0.169811 0.183333 0.283582 0.298246 1 0.207547 0.507937 0.221884 0.375912 0.187117 0.215686 0.489712 0.377953 0.147849 0.501916 0.17037 0.128 0.498084
HTS03849SC90x 0.165992 0.209302 0.194805 0.215385 0.329412 0.329412 0.449275 0.155556 0.167347 0.207547 1 0.179283 0.227799 0.153543 0.207171 0.2 0.219298 0.248826 0.128289 0.193798 0.142857 0.207317 0.199219
HTS07614SC98x 0.212544 0.163636 0.150943 0.202429 0.153226 0.153226 0.135371 0.344538 0.346304 0.507937 0.179283 1 0.2 0.434959 0.167203 0.26087 0.450644 0.429825 0.132022 0.466135 0.172691 0.131579 0.479839
HTS09466SC75x 0.174603 0.151515 0.0991736 0.199248 0.310638 0.310638 0.184874 0.1875 0.194805 0.221884 0.227799 0.2 1 0.194268 0.199377 0.245791 0.237113 0.175676 0.167582 0.21875 0.184906 0.102362 0.204334
HTS10266SC95x 0.210526 0.178439 0.147619 0.195122 0.136546 0.136546 0.122271 0.293878 0.383065 0.375912 0.153543 0.434959 0.194268 1 0.118012 0.177474 0.52968 0.362869 0.162791 0.387833 0.184426 0.113537 0.368421
HTS12242SC72x 0.183333 0.233962 0.135747 0.164751 0.236287 0.236287 0.24186 0.135417 0.192568 0.187117 0.207171 0.167203 0.199377 0.118012 1 0.163934 0.169492 0.132653 0.164773 0.157407 0.107407 0.183036 0.172414
KM07785SC96x 0.176471 0.151292 0.156863 0.17004 0.125506 0.125506 0.12 0.238095 0.265918 0.215686 0.2 0.26087 0.245791 0.177474 0.163934 1 0.20438 0.2 0.158358 0.220339 0.168724 0.126126 0.250871
ML00376S83x 0.20438 0.227642 0.147208 0.262443 0.185841 0.185841 0.21608 0.258333 0.333333 0.489712 0.219298 0.450644 0.237113 0.52968 0.169492 0.20438 1 0.381166 0.17737 0.452282 0.223214 0.126761 0.384921
RefStd 0.237354 0.164659 0.169399 0.145923 0.142857 0.142857 0.197917 0.25 0.258964 0.377953 0.248826 0.429825 0.175676 0.362869 0.132653 0.2 0.381166 1 0.137195 0.373984 0.174888 0.14 0.326772
RF03548SC99x 0.299342 0.256849 0.0898876 0.120521 0.0918033 0.0918033 0.116364 0.118902 0.12931 0.147849 0.128289 0.132022 0.167582 0.162791 0.164773 0.158358 0.17737 0.137195 1 0.175637 0.164948 0.205534 0.2
RH01120SC97x 0.184211 0.177305 0.117904 0.197674 0.128788 0.128788 0.152542 0.24812 0.265018 0.501916 0.193798 0.466135 0.21875 0.387833 0.157407 0.220339 0.452282 0.373984 0.175637 1 0.19685 0.129707 0.446565
RJC03605SC50x13 0.158416 0.190751 0.313131 0.196078 0.293233 0.293233 0.327273 0.157303 0.165829 0.151515 0.28169 0.163462 0.109244 0.127358 0.226601 0.17 0.154639 0.145946 0.0988593 0.133929 0.163399 0.297297 0.155251
S02950SC97x 0.161417 0.165939 0.143713 0.626667 0.301676 0.301676 0.217647 0.197309 0.186235 0.287402 0.185366 0.169884 0.231061 0.171875 0.20155 0.14786 0.25 0.1375 0.129032 0.184211 0.512658 0.138889 0.203065
S11135SC71x 0.154472 0.185185 0.118012 0.532895 0.180851 0.180851 0.12 0.179724 0.15102 0.17037 0.142857 0.172691 0.184906 0.184426 0.107407 0.168724 0.223214 0.174888 0.164948 0.19685 1 0.147929 0.202381
SCR00444SC82x13 0.196172 0.174603 0.177419 0.143678 0.153374 0.153374 0.227273 0.132653 0.137615 0.128 0.207317 0.131579 0.102362 0.113537 0.183036 0.126126 0.126761 0.14 0.205534 0.129707 0.147929 1 0.195556
SCR00743SC53x13 0.15859 0.175 0.186567 0.184358 0.264151 0.264151 0.367188 0.129808 0.154867 0.255319 0.294479 0.14346 0.172 0.12605 0.311321 0.128755 0.187793 0.14218 0.123675 0.169421 0.101064 0.384 0.15102
SCR00747SC66x13 0.166667 0.17341 0.23301 0.2 0.251852 0.251852 0.375 0.127778 0.162437 0.168889 0.316176 0.149038 0.139738 0.129187 0.261538 0.121359 0.188172 0.13587 0.108527 0.167442 0.100629 0.516129 0.136364
SPB00790SC58x 0.173203 0.173759 0.148649 0.193798 0.142308 0.142308 0.129167 0.292969 0.252632 0.498084 0.199219 0.479839 0.204334 0.368421 0.172414 0.250871 0.384921 0.326772 0.2 0.446565 0.202381 0.195556 1

The following snippet shows you how to report for your first question. It is deliberately simple to understand, not compact.

filename = 'ccdata.csv'
with open(filename,'r') as f:
    first_line = f.readline()
    labels = first_line.strip().split()[1:] # skip 'ID' (which may not be correct)
    counts = set() # this may be overkill?
    line_count = 0
    for line in f:
        line = line.strip() # lose the trailing newline
        counts.add(len(line.split())) # part of the overkill?
        line_count += 1
print("Report for file %s"%filename)
if len(counts) > 1: # part of the overkill?
  print("There were rows with unequal number of columns: %s"%(str(counts)))
else:
  print("Number of columns was %d"%counts.pop())
print("Number of IDs was %d"%len(labels))
print("Number of rows was %d"%line_count)

On your test data, I got this report:

Report for file ccdata.csv
Number of columns was 24
Number of IDs was 23
Number of rows was 27

column count is one too large because it counted the label. Fix that at line 8 by subtracting 1 from the length of the list; or at the report line by subtracting 1 before you print.

To deal with your next problem, you will want to add a line after line 4: label_set = set(labels) and then after line 8 (of this code) you will want to check that the line's label is in label_set If so, save it in the 'good' list, else the 'bad' list. Eventually you will write out the good and bad lists into your files... only if the 'bad' list is not empty.

Edited 5 Years Ago by griswolf: n/a

Thank you for supplying this code. I have found it to work as you did. I appreciate the long form of the code as it is easier for me to follow and thus understand what is happening. I will work on the second part of the problem. Thanks for your time.

This question has already been answered. Start a new discussion instead.