Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Fixing Python code that counts repeated rows in a text file

I have a large text file that looks like this small example:

chr1    37091   37122   D00645:305:CCVLRANXX:1:1104:21074:48301 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1104:4580:50451  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1106:13064:5974  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1106:16735:48726 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:2210:5043:83540  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:2204:15744:24410 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:2204:19627:73060 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:2206:8497:68295  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1312:11371:24672 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1312:17050:42431 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1312:12969:62696 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1312:6478:73521  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1312:8402:80222  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1309:19837:15007 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1309:20126:89687 0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1310:2838:27860  0   -
chr1    37091   37122   D00645:305:CCVLRANXX:1:1310:7280:85906  0   -
chr1    54832   54863   D00645:305:CCVLRANXX:1:2102:19886:3949  0   -
chr1    74307   74338   D00645:305:CCVLRANXX:1:2203:13233:29983 0   -
chr1    74325   74356   D00645:305:CCVLRANXX:1:1310:7266:92995  0   -
chr1    93529   93560   D00645:305:CCVLRANXX:1:1103:1743:29602  0   +
chr1    93529   93560   D00645:305:CCVLRANXX:1:1101:16098:97354 0   +

I am trying to count the lines that share the same 1st, 2nd and 3rd columns and write a new file with 4 columns: the first 3 columns are the same as in the original file, and the 4th column is the number of times that combination is repeated. For example, there are 17 rows with chr1 37091 37122. The expected output for the small example above is shown below.

expected output:

chr1    37091   37122   17
chr1    54832   54863   1
chr1    74307   74338   1
chr1    74325   74356   1
chr1    93529   93560   2

I wrote this code in Python, but it does not return what I want. Do you know how to fix it?

infile = open('infile.txt', 'rb')
content = []
for i in infile:
    content.append(i.split())

final = []
for j in range(len(content)):
    if content[j] == content[j-1]:
        final.append(content[j])

with open('outfile.txt','w') as f:
    for sublist in final:
        for item in sublist:
            f.write(item + '\t')
        f.write('\n')

1 Answer

You can use Counter like this:

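from collections import Counter

content = []
# open in text mode so each line is a str; 'with' closes the file afterwards
with open('infile.txt') as infile:
    for line in infile:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        # keep only the first 3 columns, joined by tabs into one string
        content.append('\t'.join(fields[:3]))

# Counter is a dict subclass that maps each distinct row to its count
c = Counter(content)

with open('outfile.txt', 'w') as f:
    # items() keeps first-seen order (Python 3.7+), like the expected output;
    # use c.most_common() instead to sort by frequency
    for item, freq in c.items():
        f.write('{}\t{}\n'.format(item, freq))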
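For reference, the code in the question compares whole rows, including the 4th column (the read ID), which differs on every line, so consecutive rows never compare equal and nothing is ever counted. Below is a minimal sketch of the same Counter idea that counts while reading instead of building a list first, which may help with a big file. The function names count_rows and format_rows are made up for this illustration, and the sample is just a few rows copied from the question.

```python
from collections import Counter


def count_rows(lines):
    """Count lines that share the same first three whitespace-separated columns."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        counts[tuple(fields[:3])] += 1
    return counts


def format_rows(counts):
    """Return one tab-separated string per key, in first-seen order (Python 3.7+)."""
    return ['\t'.join(key + (str(n),)) for key, n in counts.items()]


# A few rows taken from the question's example
sample = [
    "chr1    37091   37122   D00645:305:CCVLRANXX:1:1104:21074:48301 0   -",
    "chr1    37091   37122   D00645:305:CCVLRANXX:1:1104:4580:50451  0   -",
    "chr1    54832   54863   D00645:305:CCVLRANXX:1:2102:19886:3949  0   -",
    "chr1    93529   93560   D00645:305:CCVLRANXX:1:1103:1743:29602  0   +",
    "chr1    93529   93560   D00645:305:CCVLRANXX:1:1101:16098:97354 0   +",
]

for row in format_rows(count_rows(sample)):
    print(row)
```

An open file object can be passed to count_rows directly, for example count_rows(open('infile.txt')), because iterating over a file yields its lines one at a time.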
