Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Is there a way to subset txt data in unix?

I have a txt file that looks like this ("Before").

I would like to select the rows whose "score" column is greater than 100, and then remove everything except the "Sequence" and "Description" columns. My goal is to obtain a file that looks like this ("After").

The problem is that the file is not in a tabular format, so I can't really select a "column", and I am not sure how to proceed.

I tried to do this by deleting the first 15 rows and then finishing the rest with Excel's "Text to Columns" conversion tool. But I am looking for an automated way to do this in Unix, in case I have more files coming up.

I should have mentioned that there is also a line in the file below which I'd like to get rid of everything.

So I first used the following command to drop the line containing "inclusion threshold" and everything below it.

sed -n '/inclusion threshold/q;p' file

Then I used the code that @Raman Sailopal suggested:

awk 'NR>15 && $2>99 { printf $9"\t"$10"\n" } ' file

Is there any way to combine the sed and awk commands, or to achieve the same goal with a single command?

Thank you!

Question from: https://stackoverflow.com/questions/65913713/is-there-a-way-to-subset-txt-data-in-unix


1 Answer

awk 'NR>15 && $2>100 { printf "%s\t%s\n", $9, $10 } ' file

Using awk: for every line whose line number (NR) is greater than 15, check whether the second whitespace-delimited field is greater than 100 and, if it is, print the 9th and 10th fields separated by a tab.
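
On the follow-up about combining the sed and awk steps: the sketch below is not part of the original answer, only one way it might be done. It assumes the sample file shown here, which is invented for illustration and has 3 header lines instead of 15 (hence the hypothetical `skip` variable). The first command lets awk itself stop at the "inclusion threshold" line via `exit`; the second simply pipes the sed command from the question into awk.

```shell
# Hypothetical sample input (invented for this demo, NOT the asker's file).
# Assumed layout: 3 header lines, then rows whose 2nd field is the score and
# whose 9th and 10th fields are Sequence and Description.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Query: demo
Scores for complete sequences:
E-value score bias E-value score bias exp N Sequence Description
1e-50 170.3 0.1 2e-50 169.9 0.1 1.0 1 seqA kinase
1e-20 80.5 0.0 2e-20 80.1 0.0 1.0 1 seqB lipase
1e-40 120.0 0.2 2e-40 119.5 0.2 1.0 1 seqC ligase
------ inclusion threshold ------
0.5 150.0 0.0 0.6 149.0 0.0 1.0 1 seqD hydrolase
EOF

# Option 1: a single awk command. The exit rule comes first, so the
# threshold line itself is never tested against the score condition.
one_pass=$(awk -v skip=3 '
    /inclusion threshold/ { exit }   # stop here; nothing below is read
    NR > skip && $2 > 100 { printf "%s\t%s\n", $9, $10 }
' "$sample")

# Option 2: keep the sed command from the question and pipe it into awk.
piped=$(sed -n '/inclusion threshold/q;p' "$sample" |
    awk -v skip=3 'NR > skip && $2 > 100 { printf "%s\t%s\n", $9, $10 }')

printf '%s\n' "$one_pass"
rm -f "$sample"
```

With this sample, both variants print seqA and seqC with their descriptions, tab-separated; seqB is dropped for its score and seqD because it sits below the threshold line. For the real file, `skip` would be 15 as in the answer. Note that `$10` only captures the first word of a description, so multi-word descriptions would need extra handling.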

