I have a script whose output is, for example, a c d txt iso e z, and I need to sort it alphabetically. These are file extensions, so I can't join them into one word and then split it up again.
Can anyone help me?
If the name of your script is foo and it writes a string such as a c d txt iso e z to stdout, you can get the sorted list with, for instance:
sorted_output=$(foo|xargs -n 1|sort)
Of course, depending on what you are going to do with the result, it might make more sense to store it into an array.
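If an array turns out to be more convenient, a minimal sketch could look like this. It assumes bash 4 or later (for mapfile); foo is defined here only as a stand-in that prints the sample string from the question, and sorted_exts is an illustrative name.

```shell
#!/bin/bash
# Hypothetical stand-in for the real script; it just prints the sample output.
foo() { echo "a c d txt iso e z"; }

# xargs -n 1 puts each word on its own line, sort orders the lines,
# and mapfile -t reads them into an array, dropping the trailing newlines.
mapfile -t sorted_exts < <(foo | xargs -n 1 | sort)

# Print one element per line.
printf '%s\n' "${sorted_exts[@]}"
```

Each extension is then available individually, for example as "${sorted_exts[0]}".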
I have a school project that gives me several lines of string in a text like this:
team1-team2:2-1
team3-team1:2-2
etc
It wants me to determine which team won (or whether they drew) and then make a league table, awarding points for wins and draws.
This is my first time using bash. What I did was save the team1/team2 names in variables and then do the same for the goals. How should I make the table? I managed to make my script create a new file that stores all the team names (checking for duplicates), but I don't know how to continue. Should I make an array for each team and save their results in it? And then how do I implement the rankings, for example
team1 3p
team2 1p
etc.
I'm not asking for actual code, just a guide as to how I should implement it. Is making a new file the right move? Should I try making a new array with the teams instead? Or something else?
The problem can be divided into 3 parts:
Read the input data into memory in a format that can be manipulated easily.
Manipulate the data in memory.
Output the results in the desired format.
When reading the data into memory, you might decide to read all the data in one go before manipulating it. Or you might decide to read the input data one line at a time and manipulate each line as it is read. When using shell scripting languages, like bash, the second option usually results in simpler code.
The most important decision to make here is how you want to structure the data in memory. You normally want to avoid duplication of data, and you usually want a data structure that is easy to transform into your desired output. In this case, the most logical data structure is an associative array, using the team name as the key.
Assuming that you have to use bash, here is a framework for you to build upon:
#!/bin/bash
declare -A results
# Split each line on ':' and '-' into two team names and two scores.
while IFS=':-' read -r team1 team2 score1 score2; do
    if [ "${score1}" -gt "${score2}" ]; then
        ((results[${team1}]+=2))
    elif [ ...next test... ]; then
        ...
    else
        ...
    fi
done < scores.txt
# Now you have an associative array containing the points for each team.
# You can either output it as it stands, or sort it by piping through the
# 'sort' command.
for key in "${!results[@]}"; do
    echo ...
done
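For reference, here is one way the skeleton could be filled in. This is only a sketch under stated assumptions: it awards 3 points for a win and 1 for a draw (the skeleton above uses 2 for a win, so adjust to your rules), it assumes team names contain neither '-' nor ':', and league_table is a hypothetical function name.

```shell
#!/bin/bash
# Sketch: build a league table from lines such as "team1-team2:2-1" on stdin.
# Assumption: 3 points for a win, 1 for a draw, 0 for a loss.
league_table() {
    local -A points=()
    local team1 team2 score1 score2 p1 p2 team
    while IFS=':-' read -r team1 team2 score1 score2; do
        [ -n "$team2" ] || continue      # skip blank or malformed lines
        if [ "$score1" -gt "$score2" ]; then
            p1=3 p2=0
        elif [ "$score1" -lt "$score2" ]; then
            p1=0 p2=3
        else
            p1=1 p2=1
        fi
        # Adding 0 still creates the key, so teams without points are listed.
        points[$team1]=$(( ${points[$team1]:-0} + p1 ))
        points[$team2]=$(( ${points[$team2]:-0} + p2 ))
    done
    # Highest points first; ties are broken alphabetically by team name.
    for team in "${!points[@]}"; do
        printf '%s %sp\n' "$team" "${points[$team]}"
    done | sort -k2,2nr -k1,1
}
```

It would be called as league_table < scores.txt. Keeping everything in one associative array avoids the intermediate file of team names.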
I would use awk for this
AWK is an interpreted programming language (the name stands for Aho, Weinberger, Kernighan) designed for text processing and typically used as a data extraction and reporting tool. AWK is used largely with Unix systems.
Pure bash scripting is often messy for this kind of job.
Let me show you how easy it can be using awk:
Input file : scores.txt
team1-team2:2-1
team3-team1:2-2
Code :
awk -F'[:-]' '    # field separator: a colon or a hyphen
{
    if ($3 > $4)      { teams[$1] += 3 }                  # first team wins: 3 points
    else if ($3 < $4) { teams[$2] += 3 }                  # second team wins: 3 points
    else              { teams[$1] += 1; teams[$2] += 1 }  # draw: 1 point each
}
END {                                  # after scanning the input file
    for (team in teams) {
        print(team OFS teams[team])    # print total points per team
    }
}' scores.txt | sort -rnk 2 > ranking.txt    # sort by number of points
Output (ranking.txt):
team1 4
team3 1
I study genetic data from 288 fish samples (Fish_one, Fish_two ...)
I have four files per fish, each with a different suffix.
eg. for sample_name Fish_one:
file 1 = "Fish_one.1.fq.gz"
file 2 = "Fish_one.2.fq.gz"
file 3 = "Fish_one.rem.1.fq.gz"
file 4 = "Fish_one.rem.2.fq.gz"
I would like to apply the following concatenate instructions to all my samples, using maybe a text file containing a list of all the sample_name, that would be provided to a loop?
cp sample_name.1.fq.gz sample_name.fq.gz
cat sample_name.2.fq.gz >> sample_name.fq.gz
cat sample_name.rem.1.fq.gz >> sample_name.fq.gz
cat sample_name.rem.2.fq.gz >> sample_name.fq.gz
In the end, I would have only one file per sample, ideally in a different folder.
I would be very grateful to receive a bit of help on this one, even though I'm sure the answer is quite simple for a non-novice!
Many thanks,
Noé
I would like to apply the following concatenate instructions to all my
samples, using maybe a text file containing a list of all the
sample_name, that would be provided to a loop?
In the first place, the name of the cat command is mnemonic for "concatenate". It accepts multiple command-line arguments naming sources to concatenate together to the standard output, which is exactly what you want to do. It is poor form to use a cp and three cats where a single cat would do.
In the second place, although you certainly could use a file of name stems to drive the operation you describe, it's likely that you don't need to go to the trouble to create or maintain such a file. Globbing will probably do the job satisfactorily. As long as there aren't any name stems that need to be excluded, then, I'd probably go with something like this:
for f in *.rem.1.fq.gz; do
    stem=${f%.rem.1.fq.gz}
    cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "${other_dir}/${stem}.fq.gz"
done
That recognizes the groups present in the current working directory by the members whose names end with .rem.1.fq.gz. It extracts the common name stem from that member's name, then concatenates the four members to the correspondingly-named output file in the directory identified by ${other_dir}. It relies on brace expansion to form the arguments to cat, so as to minimize code and (IMO) improve clarity.
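If you do prefer to drive the operation from a list of sample names, as the question suggests, a sketch along these lines should work. The function name concat_samples and its two arguments (a list file with one sample name per line, and an output directory) are hypothetical choices for illustration.

```shell
#!/bin/bash
# Sketch: concatenate the four files of every sample named in a list file.
# Hypothetical arguments: $1 = list file (one sample name per line),
#                         $2 = output directory (created if missing).
concat_samples() {
    local list_file=$1 out_dir=$2 stem
    mkdir -p "$out_dir" || return 1
    while IFS= read -r stem; do
        [ -n "$stem" ] || continue    # ignore empty lines
        # Gzip files can be concatenated as they are; the result is
        # still a valid gzip file.
        cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "$out_dir/$stem.fq.gz"
    done < "$list_file"
}
```

For example, concat_samples samples.txt merged would write merged/Fish_one.fq.gz for a line Fish_one in samples.txt. If any of the four inputs is missing, cat reports an error for that sample and the loop continues with the next one.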
File has Data :
A 12345
B 32122
C 23232
Is there an option to run the Pig script only once and store the first record (A 12345) in one file, the second record (B 32122) in a second file and the third (C 23232) in a third file? Right now, if we run the Pig script, it runs a job for each STORE. Please let me know the option.
Use the SPLIT operator to partition the contents of a relation into two or more relations based on some expression. Depending on the conditions stated in the expression:
A tuple may be assigned to more than one relation.
A tuple may not be assigned to any relation.
Example
In this example relation A is split into three relations, X, Y, and Z.
A = LOAD 'data' AS (f1:int,f2:int,f3:int);
DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
DUMP X;
(1,2,3)
(4,5,6)
DUMP Y;
(4,5,6)
DUMP Z;
(1,2,3)
(7,8,9)
Then STORE X, Y and Z, each into its own file name.
The aim here is to read a file and write each record to a different file based on some criteria, which fits your problem.
Actually, Pig is not made for this. If you still want to do it, you will have to write a custom store function: a class that extends the StoreFunc class. Inside it you will have to use MultipleOutputs, since you want to store into 3 different files.
Refer to https://pig.apache.org/docs/r0.7.0/udf.html#Store+Functions for custom store functions.
Otherwise, in Pig, one STORE command stores only one alias, in only one file.
For this kind of requirement you are better off writing a Java MapReduce job.
You can try the MultiStorage() option, which is available in the piggybank jar. You need to download pig-0.11.1.jar and set it in your classpath.
Example:
input.txt
A 12345
B 32122
C 23232
PigScript:
A = LOAD 'input.txt' USING PigStorage(' ') AS (f1,f2);
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
Now the output folder contains 3 directories (A, B and C), and the files in them (A-0,000, B-0,000 and C-0,000) contain the actual values:
output$ ls
A B C _SUCCESS
output$ cat A/A-0,000
A 12345
output$ cat B/B-0,000
B 32122
output$ cat C/C-0,000
C 23232
I am trying to copy data at position (50,10) of my input file to an output file,
but I am having problems.
My input file size is 100; the needed data is from the 50th position for next 10 bytes.
I have tried the following options, but each of them causes an abend.
I have defined the output file with a length of 10 only, as I only need 10 bytes.
But the abend says: OUTREC RECORD LENGTH = 10
SORTIN : RECFM=VB ; LRECL= 100; BLKSIZE= 1000
SORTIN : DSNAME=MNV.TESTS.DF.CPR810S1.EZ2OP
OUTREC RECORD LENGTH = 10
SORTOUT RECFM INCOMPATIBLE
SORTOUT : RECFM=FB ; LRECL= ; BLKSIZE=
I have used the below options:
OUTREC FIELDS(50,10)
SORT FIELDS(1,4,CH,A)
--------didn't work------------
SORT FIELDS=COPY
OUTREC FIELDS=(115,9,125,10)
--------didn't work------------
SORT FIELDS=COPY
BUILD=(50,10)
--------didn't work------------
INREC FIELDS=(50,10)
SORT FIELDS=(1,3,CH,A)
--------didn't work------------
I know it's pointless to mention that you rarely Accept or provide feedback, and are not that much of a voter either.
For some reason you cut them off, but all those messages you posted come with a WER prefix and a message number. If you consult your SyncSORT manual, you'll find all the messages documented.
Forget that for a moment. You have posted SORTOUT RECFM INCOMPATIBLE. Why go on about the record-length? The RECFM. The RECFM. You have included the text of the message which shows the RECFM of the SORTIN, and also the one which shows the RECFM of SORTOUT. They are VB and FB respectively. If you look at the message in the manual, you'll discover that you haven't done anything explicit to make them different.
You have two choices. VTOF or CONVERT. You can use them on OUTREC (I believe) and OUTFIL (for sure).
OPTION COPY
OUTFIL VTOF,
BUILD=(50,10)
Why you'd want to try SORTing the file, I don't know, and you should be aware by now that just making up syntax does not work.
For SORT, by default, the output file has the same RECFM as the input. A variable-length record must always contain an RDW in positions 1,4, and the data itself starts at position 5.
If you need an output file of a different RECFM, then you must be explicit about it (with CONVERT, FTOV or VTOF).
When creating an F record there is no RDW, so your BUILD=(50,10) is the correct format. If you are four bytes out, remember that for a V record the data starts at position five, so you need to add four to all start positions which don't take account of the RDW (like those from a COBOL record layout).
When creating a V from an F, there is no RDW on the input; the FTOV/CONVERT will create it.
With V input and V output, always specify 1,4 at the start of your BUILD statement.