Command returns a list of strings, but want to make it an array so I can iterate through them [duplicate] - bash

I have this command which gives me a list of directories that have had changes in them when comparing two different git branches:
git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u
I want to iterate through the values it returns (in this case k8s, postgres, and scripts).
I can't figure out how to convert these values to an array though. I've tried a couple things:
changedServices=$(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
Which just treats it as a multiline string.
And the following with the error message...
declare -a changedServices=$(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
declare: changedServices: inconsistent type for assignment
How would I go about parsing this list as an array?

var=$(...) is a string assignment. For an array you wrap the command substitution in parentheses, as in var=( $(...) ), but mapfile is generally the better option:
mapfile -t changedServices < <(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
The -t option strips the trailing newline from each line that is read.
If you don't have mapfile, another thing you can do is
changedServices=()
while IFS= read -r line; do
    changedServices+=("$line")
done < <(git diff test production --name-only | awk -F'/' 'NF!=1{print $1}' | sort -u)
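To show the mapfile approach end to end, here is a minimal sketch that fills the array and then iterates over it. It assumes Bash 4 or newer; list_changed_dirs is a made-up stand-in for the git/awk/sort pipeline so the snippet runs outside a repository.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the git diff | awk | sort -u pipeline:
# prints one directory name per line.
list_changed_dirs() {
    printf '%s\n' k8s postgres scripts
}

# Read each output line into one array element, stripping the newline.
mapfile -t changedServices < <(list_changed_dirs)

# Quote the expansion so each element stays a single word.
for service in "${changedServices[@]}"; do
    echo "changed: $service"
done
```

Swap list_changed_dirs for the real pipeline once the loop does what you want.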


To split the output(s) of a script into two fields and insert that output from a specific row in a csv file

I am trying to split the output of the following code into two fields and insert it from the 3rd row of a csv file
cid=`git log -n 1 --pretty=format:"%H"`
git diff-tree --no-commit-id --name-only -r $cid | xargs -I {} echo '\'{} | xargs -I {} md5sum > final.csv
Current output comes as a single line per file (it needs to be separated into fields):
l34sdg232f00b434532196298ecf8427e /path/to/file
sg35s3456f00b204e98324998ecsdf3af /path/to/file
Expected Output
I am thinking of placing the output of the script in a third file and then reading that file line by line using awk. Not sure if that's the correct way to proceed.
Thanks in advance.
You seem to be overcomplicating things.
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" |
xargs md5sum |
sed 's/  /,/' > final.csv
This simply replaces the two spaces in the md5sum output with a comma.
Because nothing here is Bash-specific, I changed the shebang to #!/bin/sh; obviously, still feel free to use Bash if you prefer.
I also switched from the obsolescent `backtick` syntax to modern $(command substitution) syntax.
If you absolutely require the CSV header on top, adding that in the sed script should be trivial. Generally, header lines are more of a nuisance than actually useful, so maybe don't.
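If you do want a header, one possible way (a sketch, not the only option) is to print it before the pipeline output. The snippet below uses a made-up fake_md5sum function that imitates md5sum's "hash, two spaces, path" format, and the header names md5,path are my assumption rather than anything from the question.

```bash
#!/usr/bin/env bash
# Simulated md5sum output (hash, two spaces, path); values are placeholders.
fake_md5sum() {
    printf '%s  %s\n' \
        d41d8cd98f00b204e9800998ecf8427e /path/to/first \
        0cc175b9c0f1b6a831c399e269772661 /path/to/second
}

# Emit the header line, then turn the two-space separator into a comma.
csv=$(
    printf 'md5,path\n'
    fake_md5sum | sed 's/  /,/'
)
printf '%s\n' "$csv"
```

In the real script, fake_md5sum would be replaced by the git diff-tree | xargs md5sum part and the result redirected to final.csv.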
This kind of does what you're asking:
cid=$(git log -n 1 --pretty=format:"%H")
git diff-tree --no-commit-id --name-only -r "$cid" | while IFS= read -r path; do
    md5sum "${path}"
done | awk 'BEGIN{printf "%s,%s\n", "title", "path";printf "\n"}{printf "%s,%s\n",$1,$2}' > final.csv

Bash, how to create an array in one line of code

How can I create an array in one step instead of two stages, as shown below?
The example below was executed on a live Linux system.
POSITION=`volt |grep ate |awk '{print $4}'` #returns three integers
declare -a POSITION_ARRAY=($POSITION) #create an array
You don't need the intermediate variable, as wjandrea said. These two snippets are equivalent:
POSITION=$(volt | grep ate | awk '{print $4}')
# declare -a also works, but isn't needed in modern Bash
POSITION_ARRAY=( $(volt | grep ate | awk '{print $4}') )
If you know the output of the pipeline is whitespace-delimited integers, this will do what you want. But it isn't a safe way to populate an array from arbitrary command output, because unquoted expansions will be word-split and globbed.
The proper way to read a command's output into an array, split by lines, is with the readarray builtin, like so:
readarray -t POSITION_ARRAY < <(volt | grep ate | awk '{print $4}')
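To make the difference concrete, here is a small sketch comparing the two approaches on output that contains a space. sample_output is a hypothetical stand-in for the volt pipeline, and readarray needs Bash 4 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical command output: two lines, the first containing a space.
sample_output() {
    printf '%s\n' 'first value' 'second'
}

# Unquoted command substitution: split on any whitespace (and globbed).
unsafe=( $(sample_output) )

# readarray: exactly one element per line.
readarray -t safe < <(sample_output)

echo "unsafe has ${#unsafe[@]} elements"   # 3: first, value, second
echo "safe has ${#safe[@]} elements"       # 2: 'first value', second
```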
Simply put the command in the parentheses.
By the way, declare -a is not needed, and backticks are deprecated in favour of $().
POSITION_ARRAY=( $(volt | grep ate | awk '{print $4}') )
And FWIW you can merge the grep and AWK commands:
POSITION_ARRAY=( $(volt | awk '/ate/ {print $4}') )

How to store output as variable [duplicate]

I'm looking to store the hash of my most recently downloaded file in my downloads folder as a variable.
So far, this is what I have:
md5sum $(ls -t | head -n1) | awk '{print $1}'
user@ci-lux-soryan:~/Downloads$ md5sum $(ls -t | head -n1) | awk '{print $1}'
I have tried storing it as a variable like so, but it doesn't work:
VTHash=$(md5sum $(ls -t | head -n1) | awk '{print $1}')
Any ideas where I am going wrong?
As @Cyrus pointed out, parsing ls has its own pitfalls, so it is better to avoid it altogether than to risk unexpected corner cases. The following should do what you need:
VTHash="$(find -type f -mtime 0 | tail -n 1 | xargs md5sum | awk '{ print $1 }')"
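One caveat: find does not sort its output by modification time, so tail -n 1 is not guaranteed to pick the newest file. As a rough alternative, the sketch below uses Bash's -nt ("newer than") test instead. The newest_file helper and the scratch files are invented for illustration, and md5sum is assumed to be installed.

```bash
#!/usr/bin/env bash
# newest_file DIR: print the most recently modified regular file in DIR.
# (Hypothetical helper; hidden files are skipped because * does not match them.)
newest_file() {
    local dir=$1 f newest=
    for f in "$dir"/*; do
        [[ -f $f ]] || continue
        if [[ -z $newest || $f -nt $newest ]]; then
            newest=$f
        fi
    done
    [[ -n $newest ]] && printf '%s\n' "$newest"
}

# Demo in a scratch directory with explicit timestamps.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/a.txt"
touch -t 202201010000 "$demo/b.txt"
touch -t 202101010000 "$demo/c.txt"

latest=$(newest_file "$demo")
VTHash=$(md5sum -- "$latest" | awk '{ print $1 }')
echo "$latest $VTHash"
```

For the original use case you would call newest_file ~/Downloads rather than the scratch directory.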

sed: Argument list too long when running sed -n

I am running this command from Why is my git repository so big? on a very big git repository as
git rev-list --all --objects | sed -n $(git rev-list --objects --all | cut -f1 -d' ' | git cat-file --batch-check | grep blob | sort -n -k 3 | tail -n800 | while read hash type size; do size_in_kibibytes=$(echo $size | awk '{ foo = $1 / 1024 ; print foo "KiB" }'); echo -n "-e s/$hash/$size_in_kibibytes/p "; done) | sort -n -k1;
It works fine if I replace tail -n800 by tail -n40:
1160.94KiB Lib/ensurepip/_bundled/pip-8.0.2-py2.py3-none-any.whl
1169.59KiB Lib/ensurepip/_bundled/pip-8.1.1-py2.py3-none-any.whl
1170.86KiB Lib/ensurepip/_bundled/pip-8.1.2-py2.py3-none-any.whl
1225.24KiB Lib/ensurepip/_bundled/pip-9.0.0-py2.py3-none-any.whl
I found this question Bash : sed -n arguments saying I could use awk instead of sed.
Do you know how to fix this sed: Argument list too long error when tail is -n800 instead of -n40?
It seems you have used this answer in the linked question: Some scripts I use:.... There is a telling comment in that answer:
This function is great, but it's unimaginably slow. It can't even finish on my computer if I remove the 40 line limit. FYI, I just added an answer with a more efficient version of this function. Check it out if you want to use this logic on a big repository, or if you want to see the sizes summed per file or per folder. – piojo Jul 28 '17 at 7:59
And luckily piojo has written another answer addressing this. Just use his code.
As an alternative, check whether git-sizer works on your repository: it would help isolate what takes up space in your repository.
If not, there are other commands in "How to find/identify large commits in git history?", which loop over each object and avoid the sed -n part.
The alternative would be to redirect your result/command to a file, then sed on that file, as in here.
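A related way to sidestep the limit, sketched here under my own assumptions, is to write the generated s/hash/size/p expressions to a file and hand that file to sed with -f, so they never have to fit on the command line. The hashes, sizes and paths below are made up.

```bash
#!/usr/bin/env bash
# Made-up stand-in for: git rev-list --all --objects
objects=$'aaa111 src/big.bin\nbbb222 docs/readme.md\nccc333 lib/huge.whl'

# Write one substitution per line to a script file instead of building
# a long list of -e arguments (printf reuses its format for each pair).
script_file=$(mktemp)
printf 's/%s/%s/p\n' aaa111 2048KiB ccc333 4096KiB > "$script_file"

# sed reads its commands from the file, so the argument list stays short.
result=$(printf '%s\n' "$objects" | sed -n -f "$script_file")
rm -f "$script_file"

printf '%s\n' "$result"
```

In the real command, the while read loop would echo "s/$hash/$size_in_kibibytes/p" lines into the script file instead of emitting -e options.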

Append xargs argument number as prefix

I want to analyze the most frequently occurring entries in a column of a logfile. To write the detailed results, I am creating new directories from the output of something along the lines of
cat logs| cut -d',' -f 6 | sort | uniq -c | sort -rn | head -10 | \
awk '{print $2}' |xargs mkdir -p
Is there a way to create the directories with the sequence number of the argument, as processed by xargs, as a prefix? For example, if "oranges" is the most frequent entry in the column, the directory created should be named "1.oranges", and so on.
A quick (and dirty?) solution could be to pipe your directory names through cat -n in their proper order and then remove the whitespace separating the line number from the directory name, before passing them to xargs.
A better solution would be to modify your awk command:
... | awk '{ print NR "." $2 }' | xargs mkdir -p
The NR variable contains the record (i.e. line) number.
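Here is a small self-contained sketch of that pipeline. The sample values are invented and stand in for the cut output from the log, and the directories are created in a temporary location rather than the current directory.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)   # scratch directory so nothing lands in the current one

# Invented column values standing in for: cat logs | cut -d',' -f 6
printf '%s\n' oranges apples oranges pears apples oranges |
    sort | uniq -c | sort -rn | head -10 |
    awk '{ print NR "." $2 }' |
    (cd "$workdir" && xargs mkdir -p)

ls "$workdir"
```

Note that entries containing whitespace would need extra care, since both awk's $2 and xargs split on whitespace.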
