using xargs as an argument for cut - bash

Say i have a file a.txt containing a word, followed by a number, followed by a newline on
and 3
now 2
for 2
something 7
completely 8
different 6
I need to select the nth char from every word (specified by the number next to the word)
cat a.txt | cut -d' ' -f2 | xargs -i -n1 cut a.txt -c {}
I tried this command, which selects the numbers and uses xargs to put them into the -c option from cut, but the cut command gets executed on every line, instead of a.txt being looped (which I had expected to happen) How can I resolve this problem?
EDIT: Since it seems to be unclear, i want to select a character from a word. The character which I need to select can be found next to the word, for example:
and 3, will give me d. I want to do this for the entire file, which will then form a word :)

A pure shell solution:
$ while read word num; do echo ${word:$((num-1)):1}; done < a.txt
d
o
o
i
e
r
This is using a classic while; do ... ; done shell loop and the read builtin. The general format is
while read variable1 variable2 ... variableN; do something; done < input_file
This will iterate over each line of your input file splitting it into as many variables as you've given. By default, it will split at whitespace but you can change that by changing the $IFS variable. If you give a single variable, the entire line will be saved, if you give more, it will populate as many variables as you give it and save the rest in the last one.
In this particular loop, we're reading the word into $word and the number into $num. Once we have the word, we can use the shell's string manipulation capabilities to extract a substring. The general format is
${string:start:length}
So, ${string:0:2} would extract the first two characters from the variable $string. Here, the variable is $word, the start is the number minus one (this starts counting at 0) and the length is one. The result is the single letter at the position given by the number.

I would suggest that you used awk:
awk '{print substr($1,$2,1)}' file
substr takes a substring of the first field starting from the number contained in the second field and of length 1.
Testing it out (using the original input from your question):
$ cat file
and 3
now 2
for 2
something 7
completely 8
different 6
$ awk '{print substr($1,$2,1)}' file
d
o
o
i
e
r

Related

How can I replace all occurrences of a value in a text file just in one column using the Sed command in shell script (columns are seperated by ;)? [duplicate]

This question already has answers here:
sed: replace values in a single column
(3 answers)
Closed last month.
I have a file that has columns seperated by a semi column(;) and I want to change all occurrences of a word in a particular column only to another word. The column number differentiates based on the variable that holds the column number. The word I want to change is stored in a variable, and the word I want to change to is stored in a variable too.
I tried
sed -i "s/\<$word\>/$wordUpdate/g" $anyFile
I tried this but it changed all occurrences of word in the whole file! I only want in a particular column
the number of column is stored in a variable called numColumn
and the columns are seperated by a semi column ;
It is much simpler to use awk for column edits, e.g. if your input looks like this:
68;61;83;27;60;70;84;11;46;62;93;97;40;23;19
33;70;17;49;81;21;68;83;16;6;42;38;68;81;89
73;40;95;64;32;33;77;56;23;11;70;28;33;80;24
8;9;74;6;86;78;87;41;11;79;23;28;71;99;15
29;87;77;9;98;12;7;66;60;85;20;14;55;97;17
39;24;21;58;23;61;39;26;57;70;76;16;70;53;8
37;46;18;64;56;28;86;7;80;71;94;46;19;53;43
71;2;47;62;9;21;68;9;9;80;32;59;73;74;72
20;34;89;58;74;92;86;35;48;81;50;6;63;67;90
78;17;6;63;61;65;75;31;33;82;24;5;90;46;12
You can replace 60 in column c with s with something like this:
<infile awk '$c ~ m { $c = s } 1' FS=';' OFS=';' c=5 m=60 s=XX
Output:
68;61;83;27;XX;70;84;11;46;62;93;97;40;23;19
33;70;17;49;81;21;68;83;16;6;42;38;68;81;89
73;40;95;64;32;33;77;56;23;11;70;28;33;80;24
8;9;74;6;86;78;87;41;11;79;23;28;71;99;15
29;87;77;9;98;12;7;66;60;85;20;14;55;97;17
39;24;21;58;23;61;39;26;57;70;76;16;70;53;8
37;46;18;64;56;28;86;7;80;71;94;46;19;53;43
71;2;47;62;9;21;68;9;9;80;32;59;73;74;72
20;34;89;58;74;92;86;35;48;81;50;6;63;67;90
78;17;6;63;61;65;75;31;33;82;24;5;90;46;12
This might work for you (GNU sed):
word=foo wordUpdate=bar numColumn=3
sed -i 'y/;/\n/
s#.*#echo "&" | sed "'${numColumn}'s/\<'${word}'\>/'${wordUpdate}'/"#e
y/\n/;/' file
Convert each line into a separate file where the columns are lines.
Substitute the matching line (column number) with the word for the updated word.
Reverse the conversion.
N.B. The solution relies on the GNU only e evaluation flag. Also the word and updateWord may need to be quoted.
This can be done with a little creativity...
Note that I'm using double-quotes to embed the logic. This takes a little extra care to double your \'s on backreferences.
$: word=baz; c=3; new=XX; lead="^([^;]*;){$((c-1))}"; sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" file
1;2;3;4;5;6;7;8;9;0;
foo;bar;XX;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
Explained:
lead="^([^;]*;){$((c-1))}"
^ means at the start of a record
(...) is grouping for the following {...} which specified repetition
[^;]* mean zero or more non-semicolons
$((c-1)) does the math and returns one less than the desired column; if you want to look at column 3, it returns two.
SO, ^([^;]*;){$((c-1))} at the start of the record, one-less-than-column occurrences of non-semicolons followed by a semicolon
thus, sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" file mean read file and on records where $word occurs in the requested column, save everything before it, and put that stuff back, but replace $word with $new.
Even if you MUST use sed, I recommend a function.
fix(){
local word="$1" col="$2" new="$3" file="$4"
local lead="^([^;]*;){$((col-1))}"
sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" "$file"
}
In use -
$: fix bar 2 HI file
1;2;3;4;5;6;7;8;9;0;
foo;HI;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
$: fix 1 1 XX file
XX;2;3;4;5;6;7;8;9;0;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
$: fix bar 2 '(^_^)' file
1;2;3;4;5;6;7;8;9;0;
foo;(^_^);baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
No changes if no matches -
$: fix bar 5 HI file
1;2;3;4;5;6;7;8;9;0;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
NOTE -
This logic requires trailing delimiters if you ever want to match the last field -
$: fix 0 10 HI file
1;2;3;4;5;6;7;8;9;HI;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
delimiters removed:
$: fix 0 10 HI file
1;2;3;4;5;6;7;8;9;0
foo;bar;baz;qux;foo;bar;baz;qux
a;b;c;d;e;f;g
Otherwise you have to complicate the logic a bit.
But honestly, for field parsing, you'd be so much better served to use awk, or even perl or python, or for that matter a bash loop, though that's going to be relatively slow.

How to get all lines from a file after the last empty line?

Having a file like foo.txt with content
1
2
3
4
5
How do i get the lines starting with 4 and 5 out of it (everything after last empty line), assuming the amount of lines can be different?
Updated
Let's try a slightly simpler approach with just sed.
$: sed -n '/^$/{g;D;}; N; $p;' foo.txt
4
5
-n says don't print unless I tell you to.
/^$/{g;D;}; says on each blank line, clear it all out with this:
g : Replace the contents of the pattern space with the contents of the hold space. Since we never put anything in, this erases the (possibly long accumulated) pattern space. Note that I could have used z since this is GNU, but I wanted to break it out for non-GNU sed's below, and in this case this works for both.
D : remove the now empty line from the pattern space, and go read the next.
Now previously accumulated lines have been wiped if (and only if) we saw a blank line. The D loops back to the beginning, so N will never see a blank line.
N : Add a newline to the pattern space, then append the next line of input to the pattern space. This is done on every line except blanks, after which the pattern space will be empty.
This accumulates all nonblanks until either 1) a blank is hit, which will clear and restart the buffer as above, or 2) we reach EOF with a buffer intact.
Finally, $p says on the LAST line (which will already have been added to the pattern space unless the last line was blank, which will have removed the pattern space...), print the pattern space. The only time this will have nothing to print is if the last line of the file was a blank line.
So the whole logic boils down to: clean the buffer on empty lines, otherwise pile the non-empty lines up and print at the end.
If you don't have GNU sed, just put the commands on separate lines.
sed -n '
/^$/{
g
D
}
N
$p
' foo.txt
Alternate
The method above is efficient, but could potentially build up a very large pattern buffer on certain data sets. If that's not an issue, go with it.
Or, if you want it in simple steps, don't mind more processes doing less work each, and prefer less memory consumed:
last=$( sed -n /^$/= foo.txt|tail -1 ) # find the last blank
next=$(( ${last:-0} + 1 )) # get the number of the line after
cmd="$next,\$p" # compose the range command to print
sed -n "$cmd" foo.txt # run it to print the range you wanted
This runs a lot of small, simple tasks outside of sed so that it can give sed the simplest, most direct and efficient description of the task possible. It will read the target file twice, but won't have to manage filling, flushing, and refilling the accumulation of data in the pattern buffer with records before a blank line. Still likely slower unless you are memory bound, I'd think.
Reverse the file, print everything up to the first blank line, reverse it again.
$ tac foo.txt | awk '/^$/{exit}1' | tac
4
5
Using GNU awk:
awk -v RS='\n\n' 'END{printf "%s",$0}' file
RS is the record separator set to empty line.
The END statement prints the last record.
try this:
tail +$(($(grep -nE ^$ test.txt | tail -n1 | sed -e 's/://g')+1)) test.txt
grep your input file for empty lines.
get last line with tail => 5:
remove unnecessary :
add 1 to 5 => 6
tail starting from 6
You can try with sed :
sed -n ':A;$bB;/^$/{x;s/.*//;x};H;n;bA;:B;H;x;s/^..//;p' infile
With GNU sed:
sed ':a;/$/{N;s/.*\n\n//;ba;}' file

Reverse lines in a file two by two

I'm trying to reverse the lines in a file, but I want to do it two lines by two lines.
For the following input:
1
2
3
4
…
97
98
I would like the following output:
97
98
…
3
4
1
2
I found lots of ways to reverse a file line by line (especially on this topic: How can I reverse the order of lines in a file?).
tac. The simplest. Doesn't seem to have an option for what I want, even if I tried to play around with options -r and -s.
tail -r (not POSIX compliant). Not POSIX compliant, my version doesn't seem to have anything to do that.
Remains three sed formula, and I think a little modification would do the trick. But I'm not even understanding what they're doing, and thus I'm stuck here.
sed '1!G;h;$!d'
sed -n '1!G;h;$p'
sed 'x;1!H;$!d;x'
Any help would be appreciated. I'll try to understand these formula and to give answer to this question by myself.
Okay, I'll bite. In pure sed, we'll have to build the complete output in the hold buffer before printing it (because we see the stuff we want to print first last). A basic template can look like this:
sed 'N;G;h;$!d' filename # Incomplete!
That is:
N # fetch another line, append it to the one we already have in the pattern
# space
G # append the hold buffer to the pattern space.
h # save the result of that to the hold buffer
$!d # and unless the end of the input was reached, start over with the next
# line.
The hold buffer always contains the reversed version of the input processed so far, and the code takes two lines and glues them to the top of that. In the end, it is printed.
This has two problems:
If the number of input lines is odd, it prints only the last line of the file, and
we get a superfluous empty line at the end of the input.
The first is because N bails out if no more lines exist in the output, which happens with an odd number of input lines; we can solve the problem by executing it conditionally only when the end of the input was not yet reached. Just like the $!d above, this is done with $!N, where $ is the end-of-input condition and ! inverts it.
The second is because at the very beginning, the hold buffer contains an empty line that G appends to the pattern space when the code is run for the very first time. Since with $!Nwe don't know if at that point the line counter is 1 or 2, we should inhibit it conditionally on both. This can be done with 1,2!G, where 1,2 is a range spanning from line 1 to line 2, so that 1,2!G will run G if the line counter is not between 1 and 2.
The whole script then becomes
sed '$!N;1,2!G;h;$!d' filename
Another approach is to combine sed with tac, such as
tac filename | sed -r 'N; s/(.*)\n(.*)/\2\n\1/' # requires GNU sed
That is not the shortest possible way to use sed here (you could also use tac filename | sed -n 'h;$!{n;G;};p'), but perhaps easier to understand: Every time a new line is processed, N fetches another line, and the s command swaps them. Because tac feeds us the lines in reverse, this restores pairs of lines to their original order.
The key difference to the first approach is the behavior for an odd number of lines: with the second approach, the first line of the file will be alone without a partner, whereas with the first it'll be the last.
I would go with this:
tac file | while read a && read b; do echo $b; echo $a; done
Here is an awk you can use:
cat file
1
2
3
4
5
6
7
8
awk '{a[NR]=$0} END {for (i=NR;i>=1;i-=2) print a[i-1]"\n"a[i]}' file
7
8
5
6
3
4
1
2
It store all line in an array a, then print it out in reverse, two by two.

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Bash: Find and replace all variable characters up to a constant character with a constant string

I've seen many search and replace threads based on the assumption that 1. you either know what string or substring you are explicitly looking for or 2. you know the exact position it is at within the string or 3. both combined.
In my situation I have one csv file containing one column and 1M rows. e.g.
1,google.com
2,yahoo.com
3,twitter.com
4,xyz.com
For every column, I want to replace every character (the incrementing integers) up to and including the comma with the http semicolon dble forward slash dubdubdub
So far I have the following
HTTPSTRING="http://www."
cat X.csv << Will this ensure that the while block is executed on this file?
while IFS=, read line
do {$line/(.*?),/HTTPSTRING} << This is where I am having trouble
done
exit 0
and I would likea text file containing one URL per line e.g.
http://www.google.com
...
http://www.${999,999_more_urls}
Thank you so much in advance
Lewis
This does a greedy match, which would be problematic if you ever have any commas other than the one that separates the initial integer from the characters you want to retain. But it works on your sample X.csv file, producing a Y.csv file that meets your output specification.
HTTPSTRING="http://www."
while read line
do
echo ${line/*,/$HTTPSTRING}
done < X.csv > Y.csv
exit 0
For what it's worth, if you put this in a script, you can take the file input/input redirection parts out of the code itself, and instead apply them when calling the script.
If you're not strictly limited to bash itself, you might want to consider using sed. Either of these should do what you want, differing only in whether you prefer to escape the slashes in your string or use a non-standard delimiter:
sed 's/[0-9]*,/http:\/\/www./' X.csv > Y.csv
sed 's~[0-9]*,~http://www.~' X.csv > Y.csv
Your script is close. You can pipe the output of cat directly to the while loop, but it's better to use input redirection ( < X.csv). Using IFS=, before read will split the line into fields separated by a comma, but you are just missing a variable to hold the second field.
HTTPSTRING="http://www."
while IFS=, read number domain
do
echo "$HTTPSTRING$domain"
done < X.csv
You could use commands only, there is no need for an explicit Bash loop :
cut -d',' -f2 < X.csv | sed 's_^_http://www._' > Y.txt
Notice that the usual / used after the s in sed is replaced by _ because it is included in the string to replace. ^ matches the start of the line.

Resources