Associative array pipe to Column command - bash

I'm looking for a way to print out an associative array with the column command, and I feel like there is probably a way to do this, but I haven't had much luck.
declare -A list
list=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
I'd like the outcome to be a simple table. I've used a loop with tabs but in my case the lengths are great enough to offset the second column.
The output should look like
a x is in this one
b y is here
areallylongone z down here

You are looking for something like this?
declare -A assoc=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
for i in "${!assoc[@]}" ; do
echo -e "${i}\t=\t${assoc[$i]}"
done | column -s$'\t' -t
Output:
areallylongone = z down here
a = x is in this one
b = y is here
I'm using a tab char to delimit key and value and use the column -t to tabulate the output and -s to set the input delimiter to the tab char. From man column:
-t Determine the number of columns the input contains and create a table. Columns are delimited with whitespace, by default, or with the characters supplied using the -s option. Useful for pretty-printing displays.
-s Specify a set of characters to be used to delimit columns for the -t option.
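If this comes up often, the loop can be wrapped in a small helper. A minimal sketch (the name print_assoc is just illustrative; the nameref requires bash 4.3+):
print_assoc() {
    local -n arr=$1            # nameref to the caller's associative array (bash 4.3+)
    local key
    for key in "${!arr[@]}"; do
        printf '%s\t%s\n' "$key" "${arr[$key]}"
    done | column -s$'\t' -t
}
print_assoc list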

One (simple) way to do it is by pasting together keys column and values column:
paste -d $'\t' <(printf "%s\n" "${!list[@]}") <(printf "%s\n" "${list[@]}") | column -s $'\t' -t
For your input, it yields:
areallylongone z down here
a x is in this one
b y is here
To handle spaces in both keys and values, we used a TAB (\t) as the column delimiter, in both the paste (-d option) and column (-s option) commands.

To obtain the desired output (sorted by value) from hek2mgl's answer:
declare -A assoc=(
[a]="x is in this one"
[b]="y is here"
[areallylongone]="z down here"
)
for i in "${!assoc[@]}" ; do
echo "${i}=${assoc[$i]}"
done | column -s= -t | sort -k 2
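With the sample data above, sorting on the second column (the values beginning with x, y, z) should restore the order asked for in the question:
a               x is in this one
b               y is here
areallylongone  z down here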

Related

Splitting the first word from each record and push that in to array

inputfile : records.txt
100,Surender,CTS
101,Kumar,TCS
102,Raja,CTS
103,Vijay,TCS
I want to store the first column from each record into an array.
I wrote the below script:
id_array=();
while read -a my_line ;
do
id_array+=(${my_line[0]})
done < /home/user/surender/linux/inputfiles/records.txt;
echo ${id_array[0]}
echo ${id_array[1]}
echo ${id_array[2]}
echo ${id_array[3]}
My expected output is
100
101
102
103
But as Per above code i get the below output
100,Surender,CTS
101,Kumar,TCS
102,Raja,CTS
103,Vijay,TCS
I don't know where to specify the respective delimiter (comma) in the above script.
Need some help on this.
Replace the line:
while read -a my_line ;
With:
while IFS=',' read -a my_line ;
That will split the lines into an array using the delimiter ,.
There are many methods to get the first field. Cut is very intuitive, although this is probably not the most efficient code:
id_array+=( $(echo "$my_line" | cut -d ',' -f 1) )
explanation:
-d ',' : delimiter is ,
-f 1 : take the first field
In a related answer you can find a more efficient way: setting the internal field separator (IFS) to ,.
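For completeness, a minimal runnable sketch combining the IFS approach with the sample records.txt (the path is shortened from the question; adjust as needed):
#!/bin/bash
# Collect the first comma-separated field of every record into id_array.
id_array=()
while IFS=',' read -r -a my_line; do
    id_array+=( "${my_line[0]}" )
done < records.txt
printf '%s\n' "${id_array[@]}"   # prints 100, 101, 102, 103 on separate lines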

Pass external variable to xidel in bash loop script

I am trying to parse an HTML page using XPath with xidel.
The page has a table with multiple rows and columns.
I need to get values from each row from columns 2 and 5 (IP and port) and store them in a csv-like file.
Here is my script
#!/bin/bash
for (( i = 2; i <= 100; i++ ))
do
xidel http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr["'$i'"]/td[2]/span[1]' >> "$i".txt #get value from first column
xidel http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr["'$i'"]/td[5]' >> "$i".txt #get value from second column
sed -i ':a;N;$!ba;s/\n/^/g' "$i".txt #replace newline with custom delimiter
sed -i '/\s/d' "$i".txt #remove blanks
cat "$i".txt >> ip_port_list #create list
zip -m ips.zip "$i".txt #archive unneeded texts
done
Performance is not an issue.
When I manually increment each tr index it works perfectly, but not with the variable from the loop.
I want to receive a pair of values from each row.
Right now I get only partial data or even an empty file.
I need to get values from each row from columns 2 and 5 (IP and port) and store them in a csv-like file.
xidel -s "https://www.vpngate.net/en/" -e '
(//table[@id="vg_hosts_table_id"])[3]//tr[not(td[@class="vg_table_header"])]/concat(
td[2]/span[@style="font-size: 10pt;"],
",",
extract(
td[5],
"TCP: (\d+)",
1
)
)
'
220.218.70.177,443
211.58.36.54,995
1.239.223.190,1351
[...]
153.207.18.229,1542
(//table[@id="vg_hosts_table_id"])[3]: Select the 3rd table of its kind. The one you want.
//tr[not(td[@class="vg_table_header"])]: Select all rows, except the headers.
td[2]/span[@style="font-size: 10pt;"]: Select the 2nd column and the <span> that contains just the IP-address.
extract(td[5],"TCP: (\d+)",1): Select the 5th column and extract (regex) the numerical value after "TCP: ".
Maybe this xidel line will come in handy:
xidel -q http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr[*]/concat(td[2]/span[1],",",substring-after(substring-before(td[5],"UDP:"),"TCP: "))'
This will only do one fetch (so the admins of vpngate won't block you) and it'll also create a CSV output (ip,port)... Hopefully that is what you were looking for?
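Since the goal is a csv-like file, the output of either command above can simply be redirected, e.g.:
xidel -q http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr[*]/concat(td[2]/span[1],",",substring-after(substring-before(td[5],"UDP:"),"TCP: "))' > ip_port_list.csv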

How to extract one column of a csv file

If I have a csv file, is there a quick bash way to print out the contents of only any single column? It is safe to assume that each row has the same number of columns, but each column's content would have different length.
You could use awk for this. Change '$2' to the nth column you want.
awk -F "\"*,\"*" '{print $2}' textfile.csv
Yes. cat mycsv.csv | cut -d ',' -f3 will print the 3rd column.
The simplest way I was able to get this done was to just use csvtool. I had other use cases as well to use csvtool and it can handle the quotes or delimiters appropriately if they appear within the column data itself.
csvtool format '%(2)\n' input.csv
Replacing 2 with the column number will effectively extract the column data you are looking for.
Landed here looking to extract from a tab separated file. Thought I would add.
cat textfile.tsv | cut -f2 -s
Where -f2 extracts the second column (cut fields are 1-indexed) and -s suppresses lines that do not contain the delimiter.
Here is a csv file example with 2 columns
myTooth.csv
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
2017-02-24,canine
2017-02-28,wisdom
To get the first column, use:
cut -d, -f1 myTooth.csv
f stands for Field and d stands for delimiter
Running the above command will produce the following output.
Output
Date
2017-01-25
2017-02-19
2017-02-24
2017-02-28
To get the 2nd column only:
cut -d, -f2 myTooth.csv
And here is the output
Output
Tooth
wisdom
canine
canine
wisdom
Another use case:
Your csv input file contains 10 columns and you want columns 2 through 5 and column 8, using comma as the separator.
cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
cut -f 2-5,8 -d , myvalues.csv
cut is a command utility and here is some more examples:
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
I think the easiest is using csvkit:
Gets the 2nd column:
csvcut -c 2 file.csv
However, there's also csvtool, and probably a number of other csv bash tools out there:
sudo apt-get install csvtool (for Debian-based systems)
This would return a column with the first row having 'ID' in it.
csvtool namedcol ID csv_file.csv
This would return the fourth column:
csvtool col 4 csv_file.csv
If you want to drop the header row:
csvtool col 4 csv_file.csv | sed '1d'
First we'll create a basic CSV
[dumb@one pts]$ cat > file
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
Then we get the 1st column
[dumb@one pts]$ awk -F , '{print $1}' file
a
1
a
1
Many answers for this questions are great and some have even looked into the corner cases.
I would like to add a simple answer that can be of daily use... where you mostly get into those corner cases (like having escaped commas or commas in quotes etc.,).
FS (Field Separator) is the variable whose value defaults to whitespace, so awk by default splits each line at spaces.
So using BEGIN (Execute before taking input) we can set this field to anything we want...
awk 'BEGIN {FS = ","}; {print $3}'
The above code will print the 3rd column in a csv file.
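The same thing can be written more compactly with the -F option, which sets FS from the command line:
awk -F ',' '{print $3}' file.csv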
The other answers work well, but since you asked for a solution using just the bash shell, you can do this:
AirBoxOmega:~ d$ cat > file #First we'll create a basic CSV
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
And then you can pull out columns (the first in this example) like so:
AirBoxOmega:~ d$ while IFS=, read -a csv_line;do echo "${csv_line[0]}";done < file
a
1
a
1
a
1
a
1
a
1
a
1
So there's a couple of things going on here:
while IFS=, - this is saying to use a comma as the IFS (Internal Field Separator), which is what the shell uses to know what separates fields (blocks of text). So saying IFS=, is like saying "a,b" is the same as "a b" would be if the IFS=" " (which is what it is by default.)
read -a csv_line; - this is saying read in each line, one at a time, into an array named "csv_line", and send that to the "do" section of our while loop
do echo "${csv_line[0]}";done < file - now we're in the "do" phase, and we're saying echo the 0th element of the array "csv_line". This action is repeated on every line of the file. The < file part is just telling the while loop where to read from. NOTE: remember, in bash, arrays are 0 indexed, so the first column is the 0th element.
So there you have it, pulling out a column from a CSV in the shell. The other solutions are probably more practical, but this one is pure bash.
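The loop also generalizes into a small helper; a minimal sketch (the name csv_col is just illustrative, and the column argument is 0-indexed as noted above):
csv_col() {
    local col=$1 file=$2 line
    while IFS=, read -r -a line; do
        echo "${line[$col]}"
    done < "$file"
}
csv_col 0 file    # same output as the loop above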
You could use GNU Awk, see this article of the user guide.
As an improvement to the solution presented in the article (in June 2015), the following gawk command allows double quotes inside double-quoted fields; a double quote is marked by two consecutive double quotes ("") there. Furthermore, this allows empty fields, but even this cannot handle multiline fields. The following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
FPAT="([^,\"]*)|(\"((\"\")*[^\"]*)*\")"
}
{
if (substr($c, 1, 1) == "\"") {
$c = substr($c, 2, length($c) - 2) # Get the text within the two quotes
gsub("\"\"", "\"", $c) # Normalize double quotes
}
print $c
}
' c=3 < <(dos2unix <textfile.csv)
Note the use of dos2unix to convert possible DOS style line breaks (CRLF i.e. "\r\n") and UTF-16 encoding (with byte order mark) to "\n" and UTF-8 (without byte order mark), respectively. Standard CSV files use CRLF as line break, see Wikipedia.
If the input may contain multiline fields, you can use the following script. Note the use of special string for separating records in output (since the default separator newline could occur within a record). Again, the following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
RS="\0" # Read the whole input file as one record;
# assume there is no null character in input.
FS="" # Suppose this setting eases internal splitting work.
ORS="\n####\n" # Use a special output separator to show borders of a record.
}
{
nof=patsplit($0, a, /([^,"\n]*)|("(("")*[^"]*)*")/, seps)
field=0;
for (i=1; i<=nof; i++){
field++
if (field==c) {
if (substr(a[i], 1, 1) == "\"") {
a[i] = substr(a[i], 2, length(a[i]) - 2) # Get the text within
# the two quotes.
gsub(/""/, "\"", a[i]) # Normalize double quotes.
}
print a[i]
}
if (seps[i]!=",") field=0
}
}
' c=3 < <(dos2unix <textfile.csv)
There is another approach to the problem. csvquote can output contents of a CSV file modified so that special characters within field are transformed so that usual Unix text processing tools can be used to select certain column. For example the following code outputs the third column:
csvquote textfile.csv | cut -d ',' -f 3 | csvquote -u
csvquote can be used to process arbitrary large files.
I needed proper CSV parsing, not cut / awk and prayer. I'm trying this on a mac without csvtool, but macs do come with ruby, so you can do:
echo "require 'csv'; CSV.read('new.csv').each {|data| puts data[34]}" | ruby
I wonder why none of the answers so far have mentioned csvkit.
csvkit is a suite of command-line tools for converting to and working
with CSV
csvkit documentation
I use it exclusively for csv data management and so far I have not found a problem that I could not solve using csvkit.
To extract one or more columns from a csv file you can use the csvcut utility that is part of the toolbox. To extract the second column use this command:
csvcut -c 2 filename_in.csv > filename_out.csv
csvcut reference page
If the strings in the csv are quoted, add the quote character with the -q option:
csvcut -q '"' -c 2 filename_in.csv > filename_out.csv
Install with pip install csvkit or sudo apt install csvkit.
Simple solution using awk. Instead of "colNum" put the number of the column you need to print. Note that this example uses ";" as the separator:
cat fileName.csv | awk -F ";" '{ print $colNum }'
csvtool col 2 file.csv
where 2 is the column you are interested in
you can also do
csvtool col 1,2 file.csv
to do multiple columns
You can't do it without a full CSV parser.
If you know your data will not be quoted, then any solution that splits on , will work well (I tend to reach for cut -d, -f1 | sed 1d), as will any of the CSV manipulation tools.
If you want to produce another CSV file, then xsv, csvkit, csvtool, or other CSV manipulation tools are appropriate.
If you want to extract the contents of one single column of a CSV file, unquoting them so that they can be processed by subsequent commands, this Python 1-liner does the trick for CSV files with headers:
python -c 'import csv,sys'$'\n''for row in csv.DictReader(sys.stdin): print(row["message"])'
The "message" inside of the print function selects the column.
If the CSV file doesn't have headers:
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin): print(row[1])'
Python's CSV library supports all kinds of CSV dialects, so if your CSV file uses different conventions, it's possible to support them with relatively little change to the code.
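For example, a file using semicolons instead of commas only needs a delimiter argument (a sketch; the column index 1 is just illustrative):
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin, delimiter=";"): print(row[1])' < file.csv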
Been using this code for a while, it is not "quick" unless you count "cutting and pasting from stackoverflow".
It uses ${##} and ${%%} operators in a loop instead of IFS. It calls 'err' and 'die', and supports only comma, dash, and pipe as SEP chars (that's all I needed).
err() { echo "${0##*/}: Error:" "$@" >&2; }
die() { err "$@"; exit 1; }
# Return Nth field in a csv string, fields numbered starting with 1
csv_fldN() { fldN , "$1" "$2"; }
# Return Nth field in string of fields separated
# by SEP, fields numbered starting with 1
fldN() {
local me="fldN: "
local sep="$1"
local fldnum="$2"
local vals="$3"
case "$sep" in
-|,|\|) ;;
*) die "$me: arg1 sep: unsupported separator '$sep'" ;;
esac
case "$fldnum" in
[0-9]*) [ "$fldnum" -gt 0 ] || { err "$me: arg2 fldnum=$fldnum must be a number greater than 0."; return 1; } ;;
*) { err "$me: arg2 fldnum=$fldnum must be number"; return 1;} ;;
esac
[ -z "$vals" ] && err "$me: missing arg3 vals: list of '$sep' separated values" && return 1
fldnum=$(($fldnum - 1))
while [ $fldnum -gt 0 ] ; do
vals="${vals#*$sep}"
fldnum=$(($fldnum - 1))
done
echo ${vals%%$sep*}
}
Example:
$ CSVLINE="example,fields with whitespace,field3"
$ for fno in $(seq 3); do echo field$fno: $(csv_fldN $fno "$CSVLINE"); done
field1: example
field2: fields with whitespace
field3: field3
You can also use while loop
IFS=,
while read name val; do
echo "............................"
echo Name: "$name"
done<itemlst.csv

Iterate over a file using two values on the same line

I need to pass a series of value pairs as arguments to a C++ program. So I wrote this script:
while read randomNumbers; do
lambda = $randomNumbers | cut -f1 -d ' '
mi = $randomNumbers | cut -f2 -d ' '
./queueSim mm1-queue $lambda $mi
done < "randomNumbers"
where the first arg is the first value on each line in the file "randomNumbers" and the second one is the second value (of course). I got a segfault and a "command not found".
How can I assign the values from each line to lambda and mi and pass these variables to the C++ program?
There's no need for cut. Let read split the line for you:
while read lambda mi; do
./queueSim mm1-queue $lambda $mi
done < randomNumbers
Note that it is also commonly used in conjunction with IFS to split the input line on different fields. For example, to parse /etc/passwd ( a file with colon separated lines ), you will often see:
while IFS=: read username passwd uid gid info home shell; do ...
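For example, a minimal sketch that prints each username together with its shell:
while IFS=: read -r username passwd uid gid info home shell; do
    echo "$username -> $shell"
done < /etc/passwd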
I would recommend assigning the values like this:
lambda=$(echo $randomNumbers | cut -f1 -d ' ')
mi=$(echo $randomNumbers | cut -f2 -d ' ')
The way you do it, you actually try to run a command named after the current content of $randomNumbers.
Edit:
Another thing: since your columns are delimited by a whitespace character, you could also just read the entire line into an array whose elements are separated by whitespaces as well. One way to achieve this is:
columns=( $(echo "$randomNumbers" | grep -o "[^ ]*") )
./queueSim mm1-queue ${columns[@]::2}
The first line matches all substrings that do not contain any spaces and puts them into the array columns. The second line does the same thing as the corresponding one in your implementation: inserting the first two columns as parameters. This is done with slicing: ${columns[@]::2} takes the entire array but keeps only the elements at positions less than 2, i.e. the first two.
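A simpler equivalent uses read to split the line directly into an array (a sketch, assuming whitespace-separated columns as in the question):
read -r -a columns <<< "$randomNumbers"
./queueSim mm1-queue "${columns[@]::2}"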

Bash script: regexp reading numerical parameters from text file

Greetings!
I have a text file with parameter set as follows:
NameOfParameter Value1 Value2 Value3 ...
...
I want to find the needed parameter by its NameOfParameter using a regexp pattern and return a selected Value to my Bash script.
I tried to do this with grep, but it returns the whole line instead of the Value.
Could you help me find an approach, please?
It was not clear if you want all the values together or only one specific one. In either case, use the power of the cut command to cut the columns you want from a file: -f 2- will cut columns 2 and on (so everything except the parameter name), and -d " " will ensure that the columns are considered to be space-separated as opposed to the default tab-separated.
egrep '^NameOfParameter ' your_file | cut -f 2- -d " "
Bash:
values=($(grep '^NameofParameter ' your_file))
echo ${values[0]} # NameofParameter
echo ${values[1]} # Value1
echo ${values[2]} # Value2
# etc.
for value in "${values[@]:1}" # iterate over values, skipping NameofParameter
do
echo "$value"
done
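If only one specific value is needed, awk can select it directly; a minimal sketch (the parameter name and the field number are placeholders, $3 being Value2 since $1 is the name):
awk '$1 == "NameOfParameter" { print $3 }' your_file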
