Split string with for loop and sed in bash shell

I have the following string in a variable:
-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip
I'm trying to loop over this string with a for loop and get the following result:
Dec 6 11:58 1.zip
Dec 6 11:59 10.zip
Does anyone have the proper sed command to do this?
To make my question clearer: I run an sftp command with -b file, and in that batch file I do an ls -l *.zip. The result goes into a file. First, I used a sed command to delete the first 2 lines, since they contain information that is irrelevant to me. I now have only the ls results, but they are on one line. In my example there were just 2 zip files, but there can be many more.
ListOfFiles=$(sed '1,2d' $LstFile) #delete first 2 lines
for line in $ListOfFiles
do
$line=$(echo "${line}" | sed (here I want the command to only print the zip file name and date)
done

Notes on the revised scenario
The question has been modified to include a shell fragment:
ListOfFiles=$(sed '1,2d' $LstFile) #delete first 2 lines
for line in $ListOfFiles
do
$line=$(echo "${line}" | sed # I want to print only file name and date
done
Saving the results into a variable, as in the first line, is simply the wrong way to deal with it: the unquoted $ListOfFiles in the for loop is split on all white space, so the loop sees one word at a time rather than one line at a time. A simple adaptation of the code in my original answer (below) achieves your requirement. It is very simple using awk, but it is also possible using sed with a similar adaptation, if you're hung up on using sed.
awk variant
awk 'NR <= 2 { next } { print $6, $7, $8, $9 }' "$LstFile"
The NR <= 2 { next } portion skips the first two lines; the rest is unchanged, except that the data source is the list file you downloaded.
sed variant
sed -nE -e '1,2d' -e 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p' "$LstFile"
In practice, the 1,2d command is unlikely to be necessary, but it is safer to use it, just in case one of the first two lines has 9 fields. (Yes, I could avoid using the -e option twice. No, I prefer to have separate commands in separate options; IMO it makes the script easier to read.)
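If you really do want a shell loop, as in the question, one option is to let read do the splitting, one line at a time. This is a minimal sketch, not a replacement for the one-liners above: it assumes, like them, that each listing line has the usual nine ls -l fields, and the function name print_date_and_name is hypothetical.

```bash
#!/usr/bin/env bash
# print_date_and_name (hypothetical helper): reads `ls -l` style lines on stdin
# and prints fields 6-9 (month, day, time, file name) of each line.
print_date_and_name() {
    local perms links owner group size month day time name
    # read splits each line on blanks; anything beyond field 8 ends up in "name"
    # (so a file name containing spaces is kept whole). The `|| [ -n ... ]`
    # also processes a final line that lacks a trailing newline.
    while read -r perms links owner group size month day time name || [ -n "$perms" ]; do
        # skip lines that do not have all nine fields (e.g. sftp chatter)
        [ -n "$name" ] || continue
        printf '%s %s %s %s\n' "$month" "$day" "$time" "$name"
    done
}

# Usage, deleting the first two lines as in the question:
#   sed '1,2d' "$LstFile" | print_date_and_name
```

Unlike `for line in $ListOfFiles`, the while read loop never word-splits the whole file, so each iteration sees exactly one ls line.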
An answer for the original question
If you treat this as an exercise in string manipulation (disregarding legitimate caveats about trying to parse the output from ls reliably), then you don't need sed. In fact, sed is almost certainly the wrong tool for the job (awk would be a better choice), and the shell alone could be used. For example, assuming the data is in the string $variable, you could use:
set -- $variable
echo $6 $7 $8 $9
# parameters past $9 need braces: $15 would expand as ${1}5
echo ${15} ${16} ${17} ${18}
This gives you 18 positional parameters and prints the 8 you're interested in. Using awk, you might use:
echo $variable | awk '{ print $6, $7, $8, $9; print $15, $16, $17, $18 }'
Both of these automatically split a string at white space and let you reference the split elements by number. Using sed, you don't get that automatic splitting, which makes the job extremely cumbersome.
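The positional-parameter idea can be extended to any number of files by using a bash array instead of hard-coded field numbers. A sketch, assuming (as above) that every ls entry contributes exactly nine whitespace-separated fields; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# dates_and_names_from_one_line (hypothetical helper): splits the listing into
# words and prints fields 6-9 of every group of 9 words.
dates_and_names_from_one_line() {
    local -a f
    local i
    # -a stores the words in array f; -d '' reads to end of input, so newlines
    # act as separators too (read then returns non-zero, hence `|| true`)
    read -r -d '' -a f <<< "$1" || true
    # each ls entry is 9 words; fields 6-9 are indexes 5-8 within each group
    for (( i = 0; i + 8 < ${#f[@]}; i += 9 )); do
        printf '%s %s %s %s\n' "${f[i+5]}" "${f[i+6]}" "${f[i+7]}" "${f[i+8]}"
    done
}

variable='-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip'
dates_and_names_from_one_line "$variable"
```

This prints one line per file, so it works for the single-line form of the input no matter how many zip files are listed.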
Suppose the variable actually holds two lines, so:
echo "$variable"
reports:
-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip
The code above assumed that the contents of $variable were a single line (though it would work unchanged if the variable contained two lines), but the code below assumes that it contains two lines. In fact, the code below would work if $variable contained many lines, whereas the set and awk versions are tied to '18 fields in the input'.
Assuming your sed supports the -E option to enable extended regular expressions (modern GNU and BSD sed both do), you could use:
variable="-rw-r--r-- 0 1068 1001 4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"
echo "$variable" |
sed -nE 's/^([^[:space:]]+[[:space:]]+){5}([^[:space:]]+([[:space:]]+[^[:space:]]+){3})$/\2/p'
That looks for a sequence of non-white-space characters followed by a sequence of white-space characters, repeated 5 times, then a sequence of non-white-space characters followed by 3 sets of white space plus non-white space. The first group matches fields 1-5 (after the repetition, \1 holds only the last of them, and it is ignored anyway); the second group captures fields 6-9 into \2, which replaces the whole line and is printed by the p flag. If you decide you can assume there are no tabs etc., you can simplify the sed command to:
echo "$variable" | sed -nE 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p'
Both of those produce the output:
Dec 6 11:58 1.zip
Dec 6 11:59 10.zip
Dealing with the single-line variant of the input is excruciating, so much so that I'm not going to show it.
Note that with the two-line value in $variable, the awk version could become:
echo "$variable" | awk '{ print $6, $7, $8, $9 }'
This will also handle an arbitrary number of lines.
Note that it is crucial to understand the difference between echo $variable and echo "$variable". The first treats every sequence of white space (newlines included) as equivalent to a single blank, while the second preserves the internal spacing. And capturing output, as in:
variable=$(ls -l 1*.zip)
preserves the spacing (especially the newlines; only trailing newlines are removed) in the assignment (see Capturing multiple line output into a Bash variable). Thus there's a moderate chance that the sed shown would work for you, but it isn't certain, because you didn't answer the requests for clarification made before this answer was posted.
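A quick way to see the difference for yourself: the sketch below captures two lines of made-up sample text into a variable and prints it unquoted and then quoted.

```bash
#!/usr/bin/env bash
# $(...) keeps the internal newline and runs of blanks; it strips only the
# trailing newline.
variable=$(printf 'a   b\nc d\n')

echo $variable      # unquoted: word-split (and glob-expanded), rejoined with single blanks
echo "$variable"    # quoted: the spacing and the newline are preserved
```

The unquoted form prints a single line, a b c d, while the quoted form prints the two original lines with the three blanks intact.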

As others have said, you shouldn't really be parsing ls output. Otherwise, a quick-and-dirty way to do it is to use awk to print the columns you're interested in:
awk '{print $6, $7, $8, $9 "\n" $15, $16, $17, $18}' <<< $your_variable

Related

how to use awk to read a part of line including number of space?

I want to extract a value using awk's substr function, which should count spaces as ordinary characters rather than treating them as a separator.
For example, below is the input, and I want to extract the "29611", including the space before it:
201903011232101029 2961104E3021 223 0 12113 5 15 8288 298233 0 45 0 39 4
I used this method, but it treats the space as a separator:
more abbas.dat | awk '{print substr($1,1,16),substr($1,17,25)}'
The expected output is:
201903011232101029 2961
But it prints only:
201903011232101029
My question: how can I use substr so that it counts spaces?
I know I can use the command below to get the desired output, but it doesn't help with my objective:
more abbas.dat | awk '{print substr($1,1,16),substr($2,1,5)}'
1st solution: With your shown samples, please try the following awk code, written and tested in GNU awk. It uses awk's match function to get the required output.
To print the 1st field, followed by a varying number of spaces, followed by 5 digits from the 2nd field, use:
awk 'match($0,/^[0-9]+[[:space:]]+[0-9]{5}/){print substr($0,RSTART,RLENGTH)}' Input_file
OR, to print the first 16 characters of the 1st field and 5 from the 2nd field, including the varying run of spaces between them (the three-argument form of match used here is specific to GNU awk):
awk 'match($0,/^([0-9]{16})[^[:space:]]+([[:space:]]+)([0-9]{5})/,arr){print arr[1] arr[2] arr[3]}' Input_file
2nd solution: Using GNU grep, please try the following, assuming the first 5 needed characters of your 2nd column can be anything (e.g. digits, letters, etc.):
grep -oP '^\S+\s+.{5}' Input_file
OR, to match only 5 digits in the 2nd field, make a minor change to the grep above:
grep -oP '^\S+\s+\d{5}' Input_file
If there is always exactly one space, you can use the following command, which prints the first field plus the first 5 characters of the second field.
N.B. It's not clear in the question whether you want 4 or 5 characters but that can be adjusted easily.
more abbas.dat | awk '{print $1" "substr($2,1,5) }'
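One quick hack is to add -Fs to your command. This sets the field separator to the letter s; since your data contains no s, $1 becomes the whole line, spaces included, so substr counts spaces like any other character: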
more abbas.dat | awk -Fs '{print substr($1,1,16),substr($1,17,25)}'
$ awk '{print substr($0,1,24)}' file
201903011232101029 29611
If that's not all you need then edit your question to clarify your requirements.
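For a pure-bash alternative, here is a sketch using the =~ operator and BASH_REMATCH. It keeps the first field, whatever blanks follow it, and the next 5 non-blank characters (change {5} to {4} if you only want 4). The function name is hypothetical; input is read from standard input, e.g. abbas.dat.

```bash
#!/usr/bin/env bash
# first_field_and_five (hypothetical helper): prints field 1, the blanks after
# it (kept as-is), and the first 5 characters of field 2 for each input line.
first_field_and_five() {
    local line
    local re='^[^[:space:]]+[[:space:]]+[^[:space:]]{5}'
    while IFS= read -r line; do
        # an unquoted variable on the right of =~ is used as an extended regex
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[0]}"
        fi
    done
}

# Usage:
#   first_field_and_five < abbas.dat
```

Because the whole match is printed from the original line, a run of several spaces between the fields is reproduced exactly.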

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command, to select a range of characters from another file. The range I want starts at the position given by the value in the second column and covers the next 10 characters. An example follows.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File from which I want to extract the variable ranges of characters (a single very long line with no spaces) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would contain the 10 characters that follow each offset (1-based positions 11-20, 26-35, 49-58 and 67-76), each range on a new line. So the length is always 10, but the start points differ, and they are set by the values in the second column of the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file and reading only the number of bytes you want. (Note that status=none is a GNUism; on other platforms you may need to leave it out and instead redirect stderr if you want to suppress the informational logging.)
while read -r name index _; do
dd if=file2.txt bs=1 skip="$index" count=10 status=none
printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory use (as happens when reading the whole of file2, assuming it's large) and has a bounded performance cost (the overhead is starting one copy of dd per sequence to extract).
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2 file1
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, you can read it into memory and use Bash sub-strings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
echo "${data:$index:10}"
done <file1.txt >result.txt
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to @CharlesDuffy for the tip to read data without a useless cat, and for the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
pos=$(echo "$line" | cut -f2 -d' ')
x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although somewhat slow if you need high performance. Shell scripting is commonly done by people who rarely write shell scripts but know a few commands and just want to get the job done. That's why I'm including this solution, even though the other answers are superior for more experienced people.
The first line of the loop body is pretty easy: it just extracts the number from the current line of file1.txt. The second line uses the very handy tools head and tail. They are usually used with lines rather than characters, but here head -c prints the first pos + 10 characters, and the result is piped into tail -c, which prints the last 10 characters.
Thanks to @CharlesDuffy for improvements.
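For reference, the asker's own cut loop can also be repaired. The first attempt used $1 and $2 (the script's arguments, not the loop value), both attempts used >, which truncates result.txt on every pass, and cut counts characters from 1 while the offsets here are 0-based. A sketch with those fixes; it still starts one cut per line, so the Bash sub-string version above remains faster. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# extract_with_cut (hypothetical helper): for every "NAME OFFSET" line in the
# positions file, print the 10 characters that follow OFFSET in the sequence file.
extract_with_cut() {
    local seqfile=$1 posfile=$2 name offset rest
    while read -r name offset rest; do
        # cut is 1-based, so offset 10 selects characters 11-20
        cut -c "$((offset + 1))-$((offset + 10))" "$seqfile"
    done < "$posfile"
}

# Redirect once, outside the loop, so result.txt is not truncated each time:
#   extract_with_cut file2.txt file1.txt > result.txt
```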

grep command giving unexpected output when searching exact word in file in csh

I used the following script to search for every line of one file in another file and, if it is found, print the 2nd column of that line:
#!/bin/csh
set goldFile=$1
set regFile=$2
set noglob
foreach line ("`cat $goldFile`")
set searchString=`echo $line | awk '{print $1}'`
set id=`grep -w -F "$searchString" $regFile | awk '{print $2}'`
echo "$searchString" "and" "$id"
end
unset noglob
The Gold file is as follows:
\$#%$%escaped.Integer%^^&[10]
\$#%$%escaped.Integer%^^&[10][0][0][31]
\$#%$%escaped.Integer%^^&[10][0][0][30]
\$#%$%escaped.Integer%^^&[10][0][0][29]
\$#%$%escaped.Integer%^^&[10][0][0][28]
\$#%$%escaped.Integer%^^&[10][0][0][27]
\$#%$%escaped.Integer%^^&[10][0][0][26]
and the RegFile is as follows:
\$#%$%escaped.Integer%^^&[10] 1
\$#%$%escaped.Integer%^^&[10][0][0][31] 10
\$#%$%escaped.Integer%^^&[10][0][0][30] 11
\$#%$%escaped.Integer%^^&[10][0][0][29] 12
\$#%$%escaped.Integer%^^&[10][0][0][28] 13
\$#%$%escaped.Integer%^^&[10][0][0][27] 14
\$#%$%escaped.Integer%^^&[10][0][0][26] 15
The output I get is:
\$#%$%escaped.Integer%^^&[10] and 1 10 11 12 13 14 15
\$#%$%escaped.Integer%^^&[10][0][0][31] and 10
\$#%$%escaped.Integer%^^&[10][0][0][30] and 11
\$#%$%escaped.Integer%^^&[10][0][0][29] and 12
\$#%$%escaped.Integer%^^&[10][0][0][28] and 13
\$#%$%escaped.Integer%^^&[10][0][0][27] and 14
\$#%$%escaped.Integer%^^&[10][0][0][26] and 15
But the expected output is:
\$#%$%escaped.Integer%^^&[10] and 1
\$#%$%escaped.Integer%^^&[10][0][0][31] and 10
\$#%$%escaped.Integer%^^&[10][0][0][30] and 11
\$#%$%escaped.Integer%^^&[10][0][0][29] and 12
\$#%$%escaped.Integer%^^&[10][0][0][28] and 13
\$#%$%escaped.Integer%^^&[10][0][0][27] and 14
\$#%$%escaped.Integer%^^&[10][0][0][26] and 15
Please help me to figure out how to search exact word having some special character using grep.
csh and bash are completely different shells; they're not even supposed to be compatible. Your problem is really about how you use grep.
You use the -F flag, which makes grep treat your string as a fixed string rather than a regex, so it can safely contain regex special characters such as [], (), ., *, ^, $, - and \.
The wrong result happens because the line \$#%$%escaped.Integer%^^&[10] in the Gold file is a prefix of every line in RegFile. Since it ends with the non-word character ], the -w test (the match must be followed by a non-word character or the end of the line) is also satisfied by the [ that follows it in the longer lines.
Normally an exact match could be enforced with the anchors ^ and $ in the pattern, but that won't work in your case: with the -F, --fixed-strings flag they would be treated as literal parts of the search string.
So, assuming from the input files that each line in the Gold file has exactly one real match in RegFile, and that it is the first line that matches, you could stop the grep search after the first hit
using the -m1 flag. The grep man page says:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is standard input
from a regular file, and NUM matching lines are output, grep ensures that the
standard input is positioned to just after the last matching line before
exiting, regardless of the presence of trailing context lines.
So adding it like,
grep -w -F -m1 "$searchString" $regFile
should solve your problem.
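A different approach from the -m1 one above avoids substring matching altogether by comparing the first column as a whole string. A sketch in awk, wrapped in a bash function for illustration; it assumes both files are whitespace-separated, with keys in column 1 that contain no blanks (as in the samples) and a non-empty RegFile. It also runs awk once instead of once per Gold line. The function and file names are placeholders.

```bash
#!/usr/bin/env bash
# lookup_ids (hypothetical helper): prints "KEY and ID" for every key in the
# gold file, using exact string equality on column 1 (no regex, no substrings).
lookup_ids() {
    local goldfile=$1 regfile=$2
    awk '
        # first file (regfile): remember column 2 under the exact column-1 key
        NR == FNR { id[$1] = $2; next }
        # second file (goldfile): look up the whole key; print empty if absent
        { print $1, "and", (($1 in id) ? id[$1] : "") }
    ' "$regfile" "$goldfile"
}

# Usage:
#   lookup_ids gold.txt reg.txt
```

awk does not interpret backslashes or other special characters in input data, so keys like \$#%$%escaped.Integer%^^&[10] are compared literally.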

Find filenames using variables in a for loop in UNIX

I have some files of this form: 2014.144.09.27.56.0195.IU.SDV.00.BHZ.M.SAC, and I want to extract the SDV, 00 and BHZ components in order to find a file of this form, ./POLEZEROFILES/SAC_PZs_IU_SDV_BHZ_00_2014.112.19.50.00.0000_2599.365.23.59.59.99999, using them. I am using the for loop below, which for every ".SAC" file finds the corresponding "SAC_PZs_" file with the same components in its filename.
#!/bin/sh
ALIST=(*SAC)
for ((i=0;i<${#ALIST[@]};i++));do
A="${ALIST[i]}"
staname=`ls "$A" | awk -F"[_.]" '{print $8}'`
staXX=`ls "$A" | awk -F"[_.]" '{print $9}'`
stacomp=`ls "$A" | awk -F"[_.]" '{print $10}'`
B=`find ./POLEZEROFILES -name "SAC*_${staname}_*${stacomp}_*${staXX}*" -print`
echo "${A}" "${B}"
done
This code works for some filenames, but in some cases it outputs 2 filenames, ignoring the 00 variable. For example,
for this $A:
2014.144.09.27.33.0195.IU.RSSD.00.BHZ.M.SAC
it outputs 2 $B:
./POLEZEROFILES/SAC_PZs_IU_RSSD_BHZ_10_2011.209.05.56.00.0000_2599.365.23.59.59.99999
./POLEZEROFILES/SAC_PZs_IU_RSSD_BHZ_00_2011.208.18.13.59.0000_2599.365.23.59.59.99999
The second output is the right one.
Can anyone figure out the problem?
Not related to your problem, but a general comment: you don't need ls in those awk lines. echo will work just fine (as will awk ... <<<"$A").
Your problem is that your pattern matches too loosely.
Your second-to-last * consumes everything up to the 00.0000... part of your first filename, and the pattern then matches. You need to anchor your desired patterns better. If you know that stacomp and staXX will always be next to each other, drop the * between them. If you don't know that, then at least put a _ on both sides of staXX: a _ after it alone is not enough, because 00_ also matches the end of 0000_ in the first filename.
There's also no need for the array and manual for loop here.
A plain for A in *SAC; do will work for your loop.
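Putting these suggestions together, here is a sketch of the loop with read doing the splitting and an underscore on both sides of every component. It assumes the pole-zero files are always named SAC_PZs_NETWORK_STATION_CHANNEL_LOCATION_..., as in the examples; the function wrapper and the variable name net are illustrative.

```bash
#!/usr/bin/env bash
# find_pz_files (hypothetical helper): for each *SAC file in the current
# directory, print it next to the matching pole-zero file(s).
find_pz_files() {
    local pzdir=${1:-./POLEZEROFILES}
    local A year jday hh mm ss frac net staname staXX stacomp rest B
    for A in *SAC; do
        [ -e "$A" ] || continue   # no .SAC files: the glob stays literal
        # 2014.144.09.27.33.0195.IU.RSSD.00.BHZ.M.SAC split on "."
        IFS=. read -r year jday hh mm ss frac net staname staXX stacomp rest <<< "$A"
        # underscores on both sides stop 00 from matching inside 0000_
        B=$(find "$pzdir" -name "SAC_PZs_${net}_${staname}_${stacomp}_${staXX}_*" -print)
        echo "$A" "$B"
    done
}
```

With the RSSD example, only the ..._BHZ_00_2011.208... file matches, because the 10 file has no _00_ in it.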

How to display only lines 12-24 of an arbitrary text file?

I have a set of text files and I'd like to display lines 12-24 by running a bash script on each file.
For one of the files, this works:
tail -14 | head -11
But since other files have different lengths, I cannot run the same script on them.
What is the command I'm looking for to output lines 12-24 of the text file?
Use sed with the -n option:
sed -n 12,24p <FILENAME>
For a fun pure Bash (≥4) possibility:
mapfile -t -s 11 -n 13 lines < file
printf '%s\n' "${lines[@]}"
This will skip the first 11 lines (with -s 11) and read 13 lines (with -n 13) and store each line in a field of the array lines.
Using awk:
awk '12<= NR && NR <= 24' file
In awk, NR is the line number. The above condition insists that NR be both greater than or equal to 12 and less than or equal to 24. If it is, then the line is printed. Otherwise, it isn't.
A more efficient solution
It would be more efficient to stop reading the file after the upper line limit has been reached. This solution does that:
awk 'NR>24 {exit;} NR>=12' file
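The same early exit works in sed: adding a q command at the last wanted line stops reading there. A sketch; print_range is a hypothetical name, and it reads standard input.

```bash
#!/usr/bin/env bash
# print_range FIRST LAST (hypothetical helper): print lines FIRST..LAST of
# standard input, then quit so the rest of a large file is never read.
print_range() {
    local first=$1 last=$2
    sed -n "${first},${last}p;${last}q"
}

# Usage, for every .txt file in the current directory:
#   for f in *.txt; do print_range 12 24 < "$f"; done
```

Because -n suppresses automatic printing, q at line 24 exits without printing anything extra; the p command has already printed that line.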
