I have a text file with the following structure:
Record 1
property1: some_number1
property2: some_number2
property3: some_number3
...
property20: some_number
Record 2
property1: some_other_number1
property2: some_other_number2
property3: some_other_number3
...
property20: some_other_number
...
...
...
Record 350
property1: more_numbers1
property2: more_numbers2
property3: more_numbers3
...
property20: some_other_number
(here ... represents more properties/records)
Using a bash script, I want to input the record number and then extract some specific property values to a .csv file. For example, using 2 (for Record #2) and property2 results in some_other_number2:
Record, property2
2,some_other_number2
I already tried reading the file line by line, checking whether a given string (e.g. Record 2) is found and then looking for a line with property2, without success.
If your txt file is always formatted that way, you might not even need awk.
You can simply grep for the property number you want, right after the record number you want.
For example, this function will write what you want to a .csv file:
function extract_property {
nrec=$1
nprop=$2
# Flatten the file to one line so the match can span lines, then pull out
# the value that follows "property$nprop:" inside "Record $nrec".
prop=$(tr '\n' ' ' < test | grep -Po "Record $nrec .*?property$nprop:\s*\K\S*")
cat > extracted.csv <<EOF
Record, property$nprop
$nrec, $prop
EOF
}
For example:
extract_property 2 2
writes the file
Record, property2
2, some_other_number2
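If you'd rather not flatten the file into one line for grep, an awk range-style scan is a more conventional alternative. This is a sketch under the same assumption as above (the input file is named test); the function name is mine:

```shell
#!/bin/bash
# Sketch: print the value of property$2 inside the "Record $1" block.
extract_property_awk() {
  nrec=$1
  nprop=$2
  awk -v rec="Record $nrec" -v prop="property$nprop:" '
    $0 == rec            { inside = 1; next }   # entered the wanted record
    inside && /^Record / { inside = 0 }         # next record starts: stop looking
    inside && $1 == prop { print $2; exit }     # found the property line
  ' test
}
```

Because awk compares whole lines and fields, this avoids the property2/property20 and Record 2/Record 20 prefix ambiguity that a looser regex can run into.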
I tried the following sed, and even though I have used $ to anchor the match at the end of the line, it doesn't seem to be working. I also don't know whether there's any way to execute this line by line with a while or for loop:
sed -E 's/$/Location/'
But the output I receive is:
Locationi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
Location,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True
Input
wi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
2,4141,Armond,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True
Expected output
wi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
2,4141,Armond,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True Location
If you have an example file ex.csv containing a header row and 3 data rows like this:
col1,col2,col3
1,2,3
4,5,6
7,8,9
You do the replacement from row 2 onward to avoid the header. This is the 2,$ part.
Then you do the search and replace at the end of each row: s/$/ Location/.
Put it all together as:
$ sed '2,$s/$/ Location/' ex.csv
col1,col2,col3
1,2,3 Location
4,5,6 Location
7,8,9 Location
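One thing worth checking if Location shows up at the start of the line instead of the end, as in the output shown in the question: the file may have Windows line endings. Each line then ends in a carriage return, and when the line is printed, the \r moves the cursor back to column 1 so the appended text visually overwrites the start of the line. A sketch that handles this, assuming GNU sed:

```shell
# Match an optional trailing \r as well, so " Location" lands after the data
# even when the CSV has Windows (CRLF) line endings.
sed -E '2,$s/\r?$/ Location/' ex.csv
```

Alternatively, strip the carriage returns once up front (e.g. with `tr -d '\r'` or dos2unix) and the original command works as-is.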
Here is the CSV file content below.
CSV file: test.csv
Records:
Name, ID, Department1, Department2, Department3, Product
Rohan,501,Production,Sales,IT,Telephones
Rahul,502,IT,Marketing,Sales,Mobiles
Mayank,503,Sales,Salessupport,Customerinquiry,Telecommunication
Script: test.sh -> here I need to find the values of the first column and of a selected column, store those values in variables, and the script needs to take 2 arguments when I run it.
So, for example, if I select name and department3 it needs to print:
Run: ./test.sh name department3
Output:
Rohan IT
Rahul Sales
Mayank Customerinquiry
#!/bin/bash
IFS=","
args="$#"
while read name department2
do
echo "First argument is: ${args[$name]}"
echo "Second argument is: ${args[$department2]}"
done < test.csv
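The while/read attempt above can't work as written: `read name department2` just splits each line into two variables, and `${args[$name]}` indexes a scalar, not the requested columns. One way to do this is to let awk read the header row first, map the column names to indices, and then print those columns for every data row. A sketch, assuming the file test.csv from the question; the function name is mine:

```shell
#!/bin/bash
# Usage: select_columns name department3
# Looks up both column names in the header row of test.csv (case-insensitively)
# and prints those two columns for every data row.
select_columns() {
  awk -F', *' -v c1="$1" -v c2="$2" '
    NR == 1 {                          # header row: map names to indices
      for (i = 1; i <= NF; i++) {
        h = tolower($i)
        if (h == tolower(c1)) i1 = i
        if (h == tolower(c2)) i2 = i
      }
      next
    }
    { print $i1, $i2 }                 # data rows: print the two columns
  ' test.csv
}
```

Save this in test.sh with `select_columns "$1" "$2"` at the bottom to get the `./test.sh name department3` invocation from the question. The field separator `', *'` tolerates the spaces after the commas in the header row.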
I would like to extract values from the second column in my csv file and store the extracted values in a new column.
sample of my dataset:
page_name post_id page_id
A 86680728811_272953252761568 86680728811
A 86680728811_273859942672742 86680728811
B 86680728033_281125741936891 86680728033
B 86680728033_10150500662053812 86680728033
I would like to extract the number that come after the underscore and store them in a new column. Sample output:
page_name post_id page_id
A 272953252761568 86680728811
A 273859942672742 86680728811
B 281125741936891 86680728033
B 10150500662053812 86680728033
I tried using this code:
cat FB_Dataset.csv | sed -Ee 's/(.*)post_id/\1post_id/' -e 's/,[_ ]/,/' -e 's/_/,/'
but I don't get the desired output.
Any help is appreciated. Thank you.
sed 's/[0-9][0-9]*_//' < a.csv
where a.csv is the file with your original data
edited to add [0-9]
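If you want to restrict the change to the post_id column rather than to the first digits-plus-underscore match anywhere on the line, a hedged awk alternative (assuming whitespace-separated columns, as in the sample data above):

```shell
# Drop the "pageid_" prefix from column 2 only; other columns are untouched.
# NR > 1 skips the header row.
awk 'NR > 1 { sub(/^[0-9]+_/, "", $2) } { print }' FB_Dataset.csv
```

Note that awk rejoins modified rows with single spaces, which matches the sample's formatting; for a real comma-separated file you would set `-F','` and `OFS=','` instead.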
I have CSV file which could look like this:
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
there could be more or fewer rows, and I need to split it into multiple .dat files, each containing the rows that share the same value in the second column. (Then I will make a bar chart for each .dat file.) For this case it should be two files:
data1.dat
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
data2.dat
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
Is there any simple way of doing it with bash?
You can use awk to generate a file containing only a particular value of the second column:
awk -F ';' '($2==1){print}' data.dat > data1.dat
Just change the value in the $2== condition.
Or, if you want to do this automatically, just use:
awk -F ';' '{print > ("data"$2".dat")}' data.dat
which will write each row to a file whose name contains the value of the second column.
Try this:
while IFS=";" read -r a b c; do echo "$a;$b;$c" >> "data${b}.dat"; done < file
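One caveat with the `>>` in the loop above: appending means a second run doubles every file. A hedged sketch that clears old data files before splitting, assuming the input file is named file and the output files all match data*.dat:

```shell
# Remove data files from a previous run, then split afresh by column 2.
rm -f data*.dat
while IFS=";" read -r a b c; do
  echo "$a;$b;$c" >> "data${b}.dat"
done < file
```

The awk version does not have this problem within a single run, because `print > file` in awk truncates each output file the first time it is opened.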