I am new to this forum (and Python), but since I found excellent posts here, I figured it would be the best place to post this question, which has two parts:

I have many (100+) .txt files which are space delimited (not csv or tab, possibly referred to as ASCII?) containing 2 columns of data by ~90,000 rows. The first column is consecutive values from 800-10,000. I have attached a file as an example which is only 800-3,627 (38,430 rows).

Part (1): I would like to remove rows that have a first column value between 1569 and 1575. I assume this is accomplished by reading the lines in, then writing them out to another file if they are not between 1569 and 1575.

Part (2): Can I set this to run on all text files in a path/directory versus running it for each file individually?

Thanks again for the help and assistance, and I promise one day I am going to get better at programming.

Attachments
800.8939625419730000 157
800.9396210631257400 183
800.9852808943741100 244
801.0309420357183400 290
801.0766044871591000 422
801.1222682486972000 475
801.1679333203337600 498
801.2135997020695900 584
801.2592673939045700 683
801.3049363958402900 547
801.3506067078769800 521
801.3962783300156600 546
801.4419512622571300 465
801.4876255046017400 433
801.5333010570508300 468
801.5789779196047700 510
801.6246560922644400 422
801.6703355750300900 292
801.7160163679031900 315
801.7616984708840800 306
801.8073818839739000 326
801.8530666071727600 295
801.8987526404818000 251
801.9444399839017000 254
801.9901286374336000 283
802.0358186010772700 354
802.0815098748343000 415
802.1272024587052600 411
802.1728963526907100 392
802.2185915567912400 375
802.2642880710082000 447
802.3099858953414700 406
802.3556850297929900 327
802.4013854743622000 307
802.4470872290511400 261
802.4927902938593400 314
802.5384946687881900 316
802.5842003538384700 257
802.6299073490106400 292
802.6756156543056000 320
802.7213252697241600 343
802.7670361952667700 306
802.8127484309347900 413
802.8584619767278800 421
802.9041768326480900 363
802.9498929986954200 375
802.9956104748706600 289
803.0413292611745000 260
803.0870493576079500 282
803.1327707641716000 228
803.1784934808663400 255
803.2242175076927400 208
803.2699428446511500 324
803.3156694917433900 471
803.3613974489694600 482
803.4071267163298000 489
803.4528572938262400 580
803.4985891814582100 544
803.5443223792272000 364
803.5900568871341000 326
803.6357927051792600 291
803.6815298333638100 256
803.7272682716879900 276
803.7730080201528200 272
803.8187490787592000 293
803.8644914475077000 304
803.9102351263989000 306
803.9559801154338200 266
804.0017264146133600 190
804.0474740239375300 192
804.0932229434076800 195
804.1389731730244000 211
804.1847247127886900 207
804.2304775627011400 283
804.2762317227621900 266
804.3219871929727600 203
804.3677439733338600 213
804.4135020638457300 245
804.4592614645096100 193
804.5050221753261900 240
804.5507841962958100 385
804.5965475274194900 418
804.6423121686981400 345
804.6880781201322200 325
804.7338453817228600 221
804.7796139534697200 157
804.8253838353751900 183
804.8711550274387000 164
804.9169275296616200 237
804.9627013420444000 302
805.0084764645882800 215
805.0542528972932800 188
805.1000306401609800 261
805.1458096931910400 293
805.1915900563852800 291
805.2373717297434700 344
805.2831547132674400 288
805.3289390069569400 229
805.3747246108135900 268
805.4205115248372500 224
805.4662997490294200 244
805.5120892833906500 269
805.5578801279211800 264
805.6036722826222600 272
805.6494657474942200 234
805.6952605225385500 227
805.7410566077552400 246
805.7868540031456600 291
805.8326527087095900 318
805.8784527244491800 340
805.9242540503637400 298
805.9700566864548800 206
806.0158606327235000 179
806.0616658891697200 223
806.1074724557947800 274
806.1532803325985700 270
806.1990895195831400 252
806.2449000167483700 277
806.2907118240948400 219
806.3365249416242500 188
806.3823393693361400 195
806.4281551072321000 239
806.4739721553130500 264
806.5197905135787600 293
806.5656101820307000 330
806.6114311606694400 349
806.6572534494957800 284
806.7030770485101800 297
806.7489019577137700 259
806.7947281771071200 176
806.8405557066910200 162
806.8863845464660500 153
806.9322146964336800 250
806.9780461565933400 251
807.0238789269468500 174
807.0697130074945600 177
807.1155483982373700 196
807.1613850991758500 177
807.2072231103106800 214
807.2530624316426600 251
807.2989030631727000 241
807.3447450049017100 248
807.3905882568298000 271
807.4364328189582200 292
807.4822786912878900 218
807.5281258738190200 290
807.5739743665527600 306
807.6198241694892200 343
807.6656752826296500 363
807.7115277059750700 319
807.7573814395261700 314
807.8032364832829400 264
807.8490928372467600 354
807.8949505014181800 307
807.9408094757980100 214
807.9866697603870300 194
808.0325313551859400 154
808.0783942601954100 166
808.1242584754163500 157
808.1701240008494600 192
808.2159908364952800 191
808.2618589823547400 180
808.3077284384285100 253
808.3535992047175100 266
808.3994712812221900 324
808.4453446679436900 475
808.4912193648823400 467
808.5370953720390600 386
808.5829726894145300 436
808.6288513170097800 294
808.6747312548250200 216
808.7206125028618500 277
808.7664950611197100 305
808.8123789296009800 351
808.8582641083048700 367
808.9041505972331800 330
808.9500383963858200 314
808.9959275057647100 278
809.0418179253691700 220
809.0877096552008000 188
809.1336026952602700 183
809.1794970455481500 162
809.2253927060655800 269
809.2712896768125600 392
809.3171879577907900 316
809.3630875489998300 319
809.4089884504416000 358
809.4548906621161000 425
809.5007941840245800 437
809.5466990161673900 546
809.5926051585452100 646
809.6385126111590600 633
809.6844213740096200 571
809.7303314470979100 470
809.7762428304237100 511
809.8221555239890700 537
809.8680695277938600 429
809.9139848418386700 432
809.9599014661253000 428
810.0058194006535400 346
810.0517386454245100 287
810.0976592004387800 215
810.1435810656970500 348
810.1895042412004400 323
810.2354287269492900 354
810.2813545229445300 509
810.3272816291870400 508
810.3732100456774100 464
810.4191397724160900 417
810.4650708094043200 343
810.5110031566424600 343
810.5569368141317500 453
810.6028717818724100 443
810.6488080598654700 637
810.6947456481119600 720
810.7406845466114200 600
810.7866247553661200 611
810.8325662743757200 497
810.8785091036420500 371
810.9244532431642900 368
810.9703986929444000 371
811.0163454529828200 357
811.0622935232803500 379
811.1082429038377800 310
811.1541935946554500 271
811.2001455957345100 327
811.2460989070752900 334
811.2920535286795000 304
811.3380094605464600 364
811.3839667026782100 439
811.4299252550747500 522
811.4758851177368800 538
811.5218462906655100 499
811.5678087738614300 480
811.6137725673252100 450
811.6597376710577700 524
811.7057040850596600 661
811.7516718093321500 691
811.7976408438753400 585
811.8436111886900300 546
811.8895828437771300 446
811.9355558091377800 418
811.9815300847719800 414
812.0275056706810800 344
812.0734825668654400 300
812.1194607733261800 224
812.1654402900634300 140
812.2114211170787700 146
812.2574032543725500 165
812.3033867019447600 177
812.3493714597977900 231
812.3953575279306300 250
812.4413449063449700 359
812.4873335950418300 533
812.5333235940212300 398
812.5793149032842800 368
812.6253075228316900 415
812.6713014526641200 264
812.7172966927826100 213
812.7632932431873800 242
812.8092911038796700 275
812.8552902748599500 303
812.9012907561291300 312
812.9472925476876500 254
812.9932956495368900 296
813.0393000616767300 322
813.0853057841086400 300
813.1313128168330900 316
813.1773211598505200 362
813.2233308131621900 338
813.2693417767686700 294
813.3153540506706300 272
813.3613676348690000 225
813.4073825293641000 286
813.4533987341567400 269
813.4994162492483800 194
813.5454350746389300 230
813.5914552103298500 260
813.6374766563209300 272
813.6834994126136300 260
813.7295234792088600 246
813.7755488561072100 183
813.8215755433085400 260
813.8676035408151400 351
813.9136328486260900 279
813.9596634667435600 214
814.0056953951676600 256
814.0517286338990700 296
814.0977631829387100 360
814.1437990422871300 332
814.1898362119453600 286
814.2358746919138600 265
814.2819144821935400 353
814.3279555827851900 304
814.3739979936892700 306
814.4200417149072600 338
814.4660867464389200 312
814.5121330882854000 323
814.5581807404478200 325
814.6042297029264300 351
814.6502799757222400 374
814.6963315588356000 356
814.7423844522678600 331
814.7884386560197100 256
814.8344941700912600 283
814.8805509944835400 342
814.9266091291978000 363
814.9726685742342600 312
815.0187293295933800 276
815.0647913952765300 268
815.1108547712843800 296
815.1569194576172800 260
815.2029854542764700 332
815.2490527612626500 386
815.2951213785762500 355
815.3411913062179800 314
815.3872625441886200 238
815.4333350924888500 188
815.4794089511202600 127
815.5254841200823000 218
815.5715605993764300 250
815.6176383890035600 215
815.6637174889643800 214
815.7097978992588900 229
815.7558796198887900 229
815.8019626508544200 261
815.8480469921561300 241
815.8941326437951600 198
815.9402196057727700 154
815.9863078780882700 125
816.0323974607437100 178
816.0784883537393200 239
816.1245805570756600 230
816.1706740707537600 179
816.2167688947741900 175
816.2628650291376300 170
816.3089624738454400 128
816.3550612288976300 255
816.4011612942954300 307
816.4472626700389800 280
816.4933653561300800 304
816.5394693525680600 320
816.5855746593550700 297
816.6316812764908900 300
816.6777892039767700 299
816.7238984418132800 300
816.7700089900007500 262
816.8161208485403200 277
816.8622340174331400 337
816.9083484966796500 333
816.9544642862802000 311
817.0005813862360400 230
817.0466997965472700 154
817.0928195172155000 81
817.1389405482411800 143
817.1850628896245300 202
817.2311865413669200 172
817.2773115034689200 232
817.3234377759312100 272
817.3695653587542400 142
817.4156942519390400 150
817.4618244554869800 226
817.5079559693973600 288
817.5540887936726900 354
817.6002229283119500 325
817.6463583733173000 288
817.6924951286886200 316
817.7386331944272800 313
817.7847725705333900 320
817.8309132570078600 299
817.8770552538514900 292
817.9231985610655300 228
817.9693431786504300 165
818.0154891066064200 195
818.0616363449346400 186
818.1077848936357700 271
818.1539347527107100 253
818.2000859221602700 252
818.2462384019847800 289
818.2923921921852800 288
818.3385472927626600 228
818.3847037037173800 298
818.4308614250500100 432
818.4770204567615800 511
818.5231807988531000 454
818.5693424513247000 380
818.6155054141779600 336
818.6616696874126500 247
818.7078352710302600 267
818.7540021650311200 278
818.8001703694162600 326
818.8463398841861400 272
818.8925107093417600 247
818.9386828448833700 22

Regarding your first part:

The readlines function gives you all the lines of the file as a list. All you have to do is traverse the list and call the split function on each line to separate the space-delimited columns.
Read more at http://www.java2s.com/Code/Python/String/String-Split.htm
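
A minimal sketch of that idea, assuming every line starts with a number: split each line on whitespace, convert the first field to a float, and keep the line only if the value is not strictly between the limits. The function names and the sample rows are made up for illustration; the rows are not taken from the attached file.

```python
def first_column(line):
    """Return the first whitespace-separated field of a line as a float."""
    return float(line.split()[0])


def filter_lines(lines, low=1569.0, high=1575.0):
    """Keep the lines whose first column is NOT strictly between low and high."""
    return [line for line in lines if not (low < first_column(line) < high)]


# Hypothetical rows in the same two-column, space-delimited layout.
sample = [
    "1568.9 157\n",
    "1570.2 183\n",
    "1574.9 244\n",
    "1575.0 290\n",
]
kept = filter_lines(sample)
```

Comparing the numbers rather than searching for the text '1570' in the line avoids false matches in the second column.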

Regarding your second part, one option is to store the names of all the files in a separate filename.txt, open that file in your program, and use the names in it to open all the other files.

Maybe something like this:

import os
for textfile in (filename for filename in os.listdir(os.curdir) if filename.endswith('.txt')):
    oklines = [line for line in open(textfile) if not (1569 < float(line.split()[0]) < 1575)]
    with open(textfile,'w') as outfile:
        outfile.write(''.join(oklines))

Hey tony, this is a beginner's question, not the obfuscated Python contest!

tonyjv,
That is some intense code. I was going to post what I had started, but then saw yours. Could you maybe put some explanations in the code? My biggest question: although I see how it should read all text files in a specified directory, is it writing the result out under a different file name in that directory, or, as it seems, does it just rewrite the existing file? And does it matter that each file is 90,000 rows and 2-2.5 MB in size?

(Here is what I had so far, which didn't address the second need of my post, going through a batch of files)

# read the data file in as a list
f = open( '*.txt', "r" )

data_list = f.readlines()
f.close()

# remove list items, x>1569 and x<1575
	for line in data_list
		if not '1569' or '1571' or '1572' or '1573' or '1574' in line:
			print line
		
# write the changed data (list) to new file
f = open("*.txt", "w")
f.writelines(data_list)
f.close()

You must either filter the file names with endswith or use the glob module. You cannot put '*' in a filename passed to open.

As requested, here is a clarification (though I assumed that the variable names were quite self-documenting):

# we need the os module for os.listdir, which lists the files in the current working directory
import os
# only take those filenames from the directory which end in '.txt'
for textfile in (filename for filename in os.listdir(os.curdir) if filename.endswith('.txt')):
    # we cannot use a generator of lines, as we are overwriting the original file; this is a dangerous practice though
    # we filter out the lines whose first (0th) column, as a float, is strictly between the given limits per the OP's request
    oklines = [line for line in open(textfile) if not (1569 < float(line.split()[0]) < 1575)]
    # here we overwrite the original file, assuming that it is a copy or that we never need the full original data
    # we could make a separate directory for the processed files with os.mkdir for safer code in a practical application
    # we let 'with' close the file for us safely
    with open(textfile,'w') as outfile:
        # it is enough just to join the lines, as they keep their original '\n' at the end
        outfile.write(''.join(oklines))
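
For comparison, the glob alternative mentioned above could look roughly like this. It is only a sketch that keeps the readlines/writelines approach from the question; the function name filter_directory and the '_filtered' suffix are made up for illustration. It writes a new file next to each original instead of overwriting it, and it copies blank lines through unchanged, which is an assumption about how such lines should be treated.

```python
import glob
import os


def filter_directory(directory, low=1569.0, high=1575.0, suffix="_filtered"):
    """Write a filtered copy of every .txt file in directory; return the new paths."""
    written = []
    # glob expands the '*' wildcard, which open() cannot do by itself.
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        root, ext = os.path.splitext(path)
        if root.endswith(suffix):
            continue  # skip output files left over from an earlier run
        with open(path) as infile:
            data_list = infile.readlines()
        kept = []
        for line in data_list:
            fields = line.split()
            # Drop only rows whose first column is strictly inside the range.
            if fields and low < float(fields[0]) < high:
                continue
            kept.append(line)
        out_path = root + suffix + ext
        with open(out_path, "w") as outfile:
            outfile.writelines(kept)
        written.append(out_path)
    return written
```

Calling filter_directory('.') would process the current directory and leave the original files untouched.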

Here is a version which creates a subdirectory for the results:

# we need the os module to access many useful file functions
import os
# we create the output directory if it does not exist
if not os.path.isdir('output'):
    os.mkdir('output')

# only take those filenames from the directory which end in '.txt'
for textfile in (filename for filename in os.listdir(os.curdir) if filename.endswith('.txt')):
    with open(os.path.join('output',textfile),'w') as outfile:
        # it is enough just to join the lines, as they keep their original '\n' at the end
        outfile.write(''.join(line
                              for line in open(textfile)
                              if not (1569 < float(line.split()[0]) < 1575)
                              )
                      )

A size of a few megabytes is negligible, even though it is a lot of data when you print it on paper. These days computers have gigabytes of memory, and only one file's oklines are ever in memory at one time. In this corrected version, we do not even need to keep that list in memory, but let the generator expression generate the lines 'on the fly'.
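
One caveat on the 'on the fly' part: ''.join(...) still builds the complete output string before it is written. If memory ever did matter, a generator function combined with writelines would handle one line at a time. The sketch below shows that variant; the names kept_lines and filter_to are made up for illustration, and it assumes every line starts with a number.

```python
def kept_lines(path, low=1569.0, high=1575.0):
    """Yield the lines of path whose first column is not strictly between low and high."""
    with open(path) as infile:
        for line in infile:
            if not (low < float(line.split()[0]) < high):
                yield line


def filter_to(in_path, out_path, low=1569.0, high=1575.0):
    """Stream the kept lines of in_path into out_path, one line at a time."""
    with open(out_path, "w") as outfile:
        # writelines accepts any iterable, so no list or joined string is built.
        outfile.writelines(kept_lines(in_path, low, high))
```

The input and output paths must differ here, because the input file is still being read while the output is written.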

Tony,

That script worked perfectly. I am sorry to have asked you to explain it since I could have looked up everything myself. Thank you so much; wish I could buy you a beer or something.
