Command returns a list of strings, but want to make it an array so I can iterate through them [duplicate] - bash

This question already has answers here:
Reading output of a command into an array in Bash
(4 answers)
Closed 1 year ago.
I have this command which gives me a list of directories that have had changes in them when comparing two different git branches:
git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u
k8s
postgres
scripts
I want to iterate through the values it returns (in this case k8s, postgres, and scripts).
I can't figure out how to convert these values to an array though. I've tried a couple things:
changedServices=$(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
Which just treats it as a multiline string.
And the following with the error message...
declare -a changedServices=$(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
declare: changedServices: inconsistent type for assignment
How would I go about parsing this list as an array?

var=$(...) is a string assignment. For an array, wrap the command substitution in parentheses, as in var=( $(...) ), but mapfile is generally the better option:
mapfile -t changedServices < <(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
The -t option strips the trailing newline from each line as it is read.
If you don't have mapfile, another thing you can do is
changedServices=()
while IFS= read -r line; do
changedServices+=("${line}")
done < <(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
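Once the array is populated, iterating over it is a plain for loop. The sketch below is only an illustration: list_changed_dirs is a hypothetical stand-in for the git | awk | sort pipeline, so that it runs outside a repository.

```shell
#!/bin/bash
# Hypothetical stand-in for: git diff test production --name-only | awk ... | sort -u
list_changed_dirs() {
  printf '%s\n' k8s postgres scripts
}

# One array element per output line
mapfile -t changedServices < <(list_changed_dirs)

echo "count: ${#changedServices[@]}"

# Quote the expansion so each element stays intact, even if it contains spaces
for service in "${changedServices[@]}"; do
  echo "changed: $service"
done
```

Quoting "${changedServices[@]}" is what keeps each element as a single word in the loop.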

Related

To split the output(s) of a script into two fields and insert that output from a specific row in a csv file

I am trying to split the output of the following code into two fields and insert it from the 3rd row of a csv file
#!/bin/bash
cid=`git log -n 1 --pretty=format:"%H"`
git diff-tree --no-commit-id --name-only -r $cid | xargs -I {} echo '\'{} | xargs -I {} md5sum > final.csv
Current output (each line comes out as a single field and needs to be separated into two):
title,Path
l34sdg232f00b434532196298ecf8427e /path/to/file
sg35s3456f00b204e98324998ecsdf3af /path/to/file
Expected Output
final.csv
title,Path
l34sdg232f00b434532196298ecf8427e,/path/to/file
sg35s3456f00b204e98324998ecsdf3af,/path/to/file
I am thinking of placing the output of the script in a third file and then reading that file line by line using awk. Not sure if that's the correct way to proceed.
Thanks in advance.
You seem to be overcomplicating things.
#!/bin/sh
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" |
xargs md5sum |
sed 's/  /,/' > final.csv
This simply replaces the two spaces in the md5sum output with a comma.
Because nothing here is Bash-specific, I changed the shebang to #!/bin/sh; obviously, still feel free to use Bash if you prefer.
I also switched from the obsolescent `backtick` syntax to modern $(command substitution) syntax.
If you absolutely require the CSV header on top, adding that in the sed script should be trivial. Generally, header lines are more of a nuisance than actually useful, so maybe don't.
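If the header is wanted after all, one possible sketch is to group a printf with the pipeline so both write to the same output. Here sample_md5_output is a hypothetical stand-in for the git diff-tree | xargs md5sum part, and the hashes and paths are placeholder values.

```shell
#!/bin/sh
# Hypothetical stand-in for: git diff-tree ... | xargs md5sum
# md5sum prints the hash, two spaces, then the path.
sample_md5_output() {
  printf '%s  %s\n' \
    'd41d8cd98f00b204e9800998ecf8427e' 'docs/readme.txt' \
    '9e107d9d372bb6826bd81d3542a419d6' 'src/main.c'
}

# Emit the header first, then the converted lines; captured here for display,
# but the group could just as well be redirected with > final.csv
csv=$(
  {
    printf 'title,Path\n'
    sample_md5_output | sed 's/  /,/'
  }
)

printf '%s\n' "$csv"
```

Only the first pair of spaces on each line is replaced, so a path that itself contains two consecutive spaces is left alone.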
This kind of does what you're asking:
#!/bin/bash
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" | while IFS= read -r path
do
md5sum "${path}"
done | awk 'BEGIN{printf "%s,%s\n", "title", "path";printf "\n"}{printf "%s,%s\n",$1,$2}' > final.csv

Bash, how to create an array in one line of code

How can I create an array in one step instead of two stages, as shown below?
The example below was executed on a live Linux system.
POSITION=`volt |grep ate |awk '{print $4}'` #returns three integers
declare -a POSITION_ARRAY=($POSITION) #create an array
You don't need the intermediate variable, as wjandrea said. These two snippets are equivalent:
POSITION=$(volt | grep ate | awk '{print $4}')
declare -a POSITION_ARRAY=($POSITION)
# declare -a also works, but isn't needed in modern Bash
POSITION_ARRAY=( $(volt | grep ate | awk '{print $4}') )
If you know the output of the pipeline is whitespace-delimited integers, this will do what you want. But it isn't a safe way to populate an array from arbitrary command output, because unquoted expansions will be word-split and globbed.
The proper way to read a command's output into an array, split by lines, is with the readarray builtin, like so:
readarray -t POSITION_ARRAY < <(volt | grep ate | awk '{print $4}')
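To make the difference concrete, here is a small sketch comparing the two approaches. list_items is a hypothetical command, not the volt command from the question; its output deliberately contains a space and a glob character.

```shell
#!/bin/bash
# Hypothetical command whose output has a line with a space and a line that is a glob
list_items() {
  printf '%s\n' 'first item' '*' 'third'
}

# Unquoted expansion: "first item" is split in two, and * is replaced by
# whatever file names exist in the current directory
unsafe=( $(list_items) )

# readarray: exactly one element per line, with no splitting or globbing
readarray -t safe < <(list_items)

printf 'unsafe has %d elements\n' "${#unsafe[@]}"
printf 'safe has %d elements\n' "${#safe[@]}"
```

The size of unsafe depends on the contents of the working directory, which is exactly why that form is fragile for arbitrary output.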
Simply put the command in the parentheses.
By the way, declare -a is not needed, and backticks are deprecated in favour of $().
POSITION_ARRAY=( $(volt | grep ate | awk '{print $4}') )
And FWIW you can merge the grep and AWK commands:
POSITION_ARRAY=( $(volt | awk '/ate/ {print $4}') )

How to store output as variable [duplicate]

This question already has answers here:
How do I set a variable to the output of a command in Bash?
(15 answers)
Closed 3 years ago.
I'm looking to store the hash of my most recently downloaded file in my downloads folder as a variable.
So far, this is what I have:
md5sum $(ls -t | head -n1) | awk '{print $1}'
Output:
user@ci-lux-soryan:~/Downloads$ md5sum $(ls -t | head -n1) | awk '{print $1}'
c1924742187128cc9cb2ec04ecbd1ca6
I have tried storing it as a variable like so, but it doesn't work:
VTHash=$(md5sum $(ls -t | head -n1) | awk '{print $1}')
Any ideas where I am going wrong?
As @Cyrus pointed out, parsing ls has its own pitfalls, so it is better to avoid it altogether than to risk unexpected corner cases. The following hashes a file modified within the last 24 hours (-mtime 0); note that find does not order its output by modification time:
VTHash="$(find -type f -mtime 0 | tail -n 1 | xargs md5sum | awk '{ print $1 }')"
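If the file must be the single most recently modified one, a possible sketch is to have find print each modification time and sort on it. This assumes GNU find (for -printf), GNU touch (for -d) and file names without newlines; newest_file and the scratch directory are made up for the demonstration.

```shell
#!/bin/bash
# Print the most recently modified regular file directly inside a directory.
# Assumes GNU find (-printf) and file names that contain no newlines.
newest_file() {
  local dir=$1
  find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\n' |
    sort -n |      # oldest first, newest last
    tail -n 1 |
    cut -f2-       # drop the timestamp column, keep the path
}

# Demonstration in a scratch directory
demo_dir=$(mktemp -d)
printf 'old\n' > "$demo_dir/old file.txt"
printf 'new\n' > "$demo_dir/new file.txt"
touch -d '2020-01-01' "$demo_dir/old file.txt"   # push the first file's mtime into the past

latest=$(newest_file "$demo_dir")
VTHash=$(md5sum "$latest" | awk '{ print $1 }')
printf '%s  %s\n' "$VTHash" "$latest"
```

In the Downloads folder this would be called as newest_file . instead of using the scratch directory.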

sed: Argument list too long when running sed -n

I am running this command from Why is my git repository so big? on a very big git repository such as https://github.com/python/cpython
git rev-list --all --objects | sed -n $(git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check | grep blob | sort -n -k 3 | tail -n800 | while read hash type size; do size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }'); echo -n "-e s/$hash/$size_in_kibibytes/p "; done) | sort -n -k1;
It works fine if I replace tail -n800 by tail -n40:
1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
...
I found this question Bash : sed -n arguments saying I could use awk instead of sed.
Do you know how to fix this sed: Argument list too long error when tail is -n800 instead of -n40?
It seems you have used this answer in the linked question: Some scripts I use:.... There is a telling comment in that answer:
This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40 line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – piojo Jul 28 '17 at 7:59
And luckily piojo has written another answer addressing this. Just use his code.
As an alternative, check whether git-sizer works on your repository: it would help identify what is taking up space in it.
If not, "How to find/identify large commits in git history?" has other commands, which loop over each object and avoid the sed -n part.
Another alternative is to redirect the output of your command to a file and then run sed on that file.
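One way to read that last suggestion is to write the generated substitutions into a script file and pass it to sed with -f, so they never appear on the command line and the argument-length limit does not apply. The sketch below uses made-up hashes and sizes in place of the real git rev-list and git cat-file output, and integer division instead of the awk arithmetic, purely for brevity.

```shell
#!/bin/bash
# Made-up stand-ins for the real git output:
#   objects: "hash path" lines, as printed by git rev-list --all --objects
#   blobs:   "hash type size" lines, as printed by git cat-file --batch-check
objects=$(printf '%s\n' 'aaa111 big.bin' 'bbb222 small.txt' 'ccc333 other.dat')
blobs=$(printf '%s\n' 'aaa111 blob 2048' 'ccc333 blob 1024')

script=$(mktemp)

# Write one substitution per line into a file instead of building -e arguments
while read -r hash type size; do
  printf 's/%s/%sKiB/p\n' "$hash" "$(( size / 1024 ))"
done <<< "$blobs" > "$script"

# sed reads its commands from the file, however many there are
result=$(sed -n -f "$script" <<< "$objects" | sort -n -k1)
printf '%s\n' "$result"

rm -f "$script"
```

With 800 or more substitutions this is still slow, since sed tries every expression on every line, but it no longer fails at startup.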

Append xargs argument number as prefix

I want to analyze the most frequently occurring entries in a column of a logfile. To write the detailed results, I am creating new directories from the output of something along the lines of
cat logs| cut -d',' -f 6 | sort | uniq -c | sort -rn | head -10 | \
awk '{print $2}' |xargs mkdir -p
Is there a way to create the directories with the sequence number of the argument, as processed by xargs, as a prefix? For example, if "oranges" is the most frequent entry in the column, the directory created should be named "1.oranges", and so on.
A quick (and dirty?) solution could be to pipe your directory names through cat -n in their proper order and then remove the whitespace separating the line number from the directory name, before passing them to xargs.
A better solution would be to modify your awk command:
... | awk '{ print NR "." $2 }' | xargs mkdir -p
The NR variable contains the record (i.e. line) number.
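A small end-to-end sketch of the awk variant, using made-up entries in place of the sixth column of the log and a scratch directory for the result:

```shell
#!/bin/bash
# Made-up values standing in for: cat logs | cut -d',' -f 6
entries=$(printf '%s\n' oranges apples oranges pears oranges apples)

# Count, rank by frequency, then prefix each name with its rank (NR)
names=$(printf '%s\n' "$entries" | sort | uniq -c | sort -rn | head -10 |
  awk '{ print NR "." $2 }')
printf '%s\n' "$names"

# Create the directories inside a scratch directory
workdir=$(mktemp -d)
printf '%s\n' "$names" | (cd "$workdir" && xargs mkdir -p)
ls "$workdir"
```

Because awk splits on whitespace, $2 only holds the first word of an entry, so this assumes the entries themselves contain no spaces.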
