BASH: Loop cut columns from each csv to new csv

I have a number of .csv files, all with the same structure of 22 columns. I only require columns 5, 14 and 15, so I use:
$ cut -d, -f5,14,15 original.csv > new_original.csv
However, I will soon have a number of csv files coming in daily and need to use a loop to perform this on each csv file, adding a prefix such as "new_" to the file name. Alternatively, I don't mind editing in place.
Thanks

You can run the following in the directory that contains the csv files.
for file in *.csv
do
cut -d, -f5,14,15 "$file" > "new_$file"
done
This loops over each of them, performs the filtering and writes the output to the same file name prefixed with new_ ($file already ends in .csv, so nothing extra is appended).
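One caveat, since files will arrive daily: if the script is re-run in the same directory, the new_*.csv files it created earlier also match *.csv. A minimal sketch of a guard against that:
for file in *.csv
do
    case "$file" in
        new_*) continue ;;   # skip output files from an earlier run
    esac
    cut -d, -f5,14,15 "$file" > "new_$file"
done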

Related

bash extract list file and group them into another file

I have a file, dynamically populated, containing some dates, one per line, like this:
20190807
20190806
20190805
20190804
I created a script to read the file line by line, and extract a list of files present into another directory:
FILEMASTER="lista_master"
while IFS= read -r line
do
ls -tr /var/home/test/*_"$line"_*.jpg | head -n2 >> lista_test
done < "$FILEMASTER"
This script is ok, creating a single file (lista_test) containing the last two .jpg files for every date. Output sample:
/var/home/test/MAN_20190804_jolly1.jpg
/var/home/test/CAT_20190804_selly2.jpg
/var/home/test/RET_20190805_jolly1.jpg
/var/home/test/GES_20190805_angyt2.jpg
/var/home/test/TOR_20190806_jolly1.jpg
/var/home/test/GIL_20190806_gally2.jpg
/var/home/test/POE_20190807_frity1.jpg
/var/home/test/TAR_20190807_tally2.jpg
My problem is this:
I should extract a different result file ("lista_test1", "lista_test2", "lista_test3", "lista_test4", etc.) for every extracted line, NOT all files in a single file.
Since you want one list file per date, reuse the loop variable in the output file name. Like this:
Assuming you have a list of dates in a file (named dates_list.txt) like this:
20200101
20200202
20200303
20200404
Then your script could look like this:
while IFS= read -r line
do
ls -tr /var/home/test/*_"$line"_*.jpg | head -n2 > "$line.list"
done < dates_list.txt
Note that I used > instead of >> to ensure you do not append the same files over and over each time you run the script.
The result would be:
20200101.list # contains the files *_20200101_*.jpg
20200202.list # contains the files *_20200202_*.jpg
20200303.list # contains the files *_20200303_*.jpg
20200404.list # contains the files *_20200404_*.jpg
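Note that parsing ls output breaks on file names containing spaces or newlines. If that matters, a sketch of the same idea using find instead (assuming GNU find for -printf):
while IFS= read -r line
do
    # %T@ = modification time, %p = path; oldest first, keep two (mirrors ls -tr | head -n2)
    find /var/home/test -maxdepth 1 -name "*_${line}_*.jpg" -printf '%T@ %p\n' |
        sort -n | head -n2 | cut -d' ' -f2- > "$line.list"
done < dates_list.txt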

copying header of a csv file into another csv file

I have a script of my own wherein I want to include a command that would copy header of a csv file into a new csv file. I don't know how to accomplish this task.
Thanks in advance.
CSV fields can contain linefeeds, in which case the first logical CSV line is spread over two or more physical lines, so physical-line-based tools like head, tail, grep, awk or sed can't be used.
Example of one logical CSV line with 2 columns:
"first
column","second
column"
You have to use tools that support CSV parsing, like python or php. Here is a php example:
php -r 'fputcsv(STDOUT,fgetcsv(STDIN));'
It reads from stdin and writes the first logical line to stdout.
Usage examples:
php -r 'fputcsv(STDOUT,fgetcsv(STDIN));' <old.csv >new.csv
any_command | php -r 'fputcsv(STDOUT,fgetcsv(STDIN));' >new.csv
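Since python is mentioned above as another CSV-aware tool, here is a sketch of the equivalent one-liner using its csv module (assuming Python 3 is available as python3):
python3 -c 'import csv,sys; csv.writer(sys.stdout).writerow(next(csv.reader(sys.stdin)))' <old.csv >new.csv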
If your headers never contain embedded linefeeds, a simple head is enough:
$ head -1 old.csv > new.csv

Executing a bash loop script from a file

I am trying to execute this in unix. So let's for example say I have five files named after dates, and in each of those files there are thousands of numerical values (six to ten digit numbers). Now, let's say I also have a bunch of numerical values and I want to know which value belongs to which file. I am trying to do it the hard way like below, but how do I put all my values in a file and just loop from there?
FILES:
20170101
20170102
20170103
20170104
20170105
Code:
for i in 5555555 67554363 564324323 23454657 666577878 345576867; do
echo $i; grep -l $i 201701*;
done
Or, why loop at all? If you have a file containing all your numbers (say numbers.txt), you can find which date file each is contained in, and on what line, with a simple:
grep -nH -w -f numbers.txt 201701*
The -f option tells grep to take the values to search for from the file numbers.txt and look for them in each of the files matching 201701*. The -n and -H options list the line number and filename associated with each match, respectively. And as Ed points out below, the -w option ensures grep only selects lines containing the whole word sought.
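For example, numbers.txt can be built from the values in the question, one per line, and then searched in a single pass:
printf '%s\n' 5555555 67554363 564324323 23454657 666577878 345576867 > numbers.txt
grep -nH -w -f numbers.txt 201701*
Each match is printed as filename:linenumber:matching-line.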
You can also do it with a while loop, reading from the file, if you create it as @Barmar suggested:
while read -r i; do
...
done < numbers.txt
Put the values in a file numbers.txt and do:
for i in $(cat numbers.txt); do
...
done

unix delete rows from multiple files using input from another file

I have multiple (1086) files (.dat) and in each file I have 5 columns and 6384 lines.
I have a single file named "info.txt" which contains 2 columns and 6883 lines. First column gives the line numbers (to delete in .dat files) and 2nd column gives a number.
1 600
2 100
3 210
4 1200
etc...
I need to read in info.txt and find every line number whose value in the 2nd column is less than 300 (lines 2 and 3 in the above example). Then I need to feed those line numbers to sed, awk or grep and delete those lines from each .dat file. (So in the above example I would delete the 2nd and 3rd row of every .dat file.)
A more general form of the question would be (I suppose):
How do I read numbers as input from a file, then use them as the row numbers to delete from multiple files?
I am using bash but ksh help is also fine.
sed -i "$(awk '$2 < 300 { print $1 "d" }' info.txt)" *.dat
The Awk script creates a simple sed script to delete the selected lines; that script is run on all the *.dat files.
(If your sed lacks the -i option, you will need to write to a temporary file in a loop. On OSX and some *BSD you need -i "" with an empty argument.)
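With the sample info.txt above, the command substitution expands to a two-line sed script (lines 2 and 3 have second-column values below 300):
$ awk '$2 < 300 { print $1 "d" }' info.txt
2d
3d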
This might work for you (GNU sed):
sed -rn 's/^(\S+)\s*([1-9]|[1-9][0-9]|[12][0-9][0-9])$/\1d/p' info.txt |
sed -i -f - *.dat
This builds a script of the lines to delete from the info.txt file and then applies it to the .dat files.
N.B. the regexp is for numbers ranging from 1 to 299 as per OP request.
# create action list
while read -r LineRef Index
do
    if [ "${Index}" -lt 300 ]
    then
        ActionReq="${ActionReq}${LineRef} b
"
    fi
done < info.txt
# apply action on files
for EachFile in *.dat   # or your own selection of .dat files
do
    sed -i -n -e "${ActionReq}p" "${EachFile}"
done
(Not tested, no linux here.) One limitation: sed itself cannot do the numeric comparison, so the less-than-300 test is done in the shell; awk is more efficient at this kind of operation.
I use sed in the second loop to avoid reading/writing each file once per line to delete. The second loop could be avoided by giving sed the whole list of files instead of one file at a time.
This should create new .dat files with a _new.dat suffix appended to the old name, but I haven't tested it:
awk 'FNR==NR { if ($2<300) a[$1]; next }
     !(FNR in a) { print > (FILENAME "_new.dat") }' info.txt *.dat

code to identify values in comma separated value file

I want to parse a csv file in a shell script, and input the name of the file at the prompt, like:
somescript.sh filename
Can it be done?
Also, I will read user input to display a particular data value from the csv.
For example, say the csv file has 10 values in each line:
1,2,3,4,5,6,7,8,9,10
And I want to read the 5th value. How can I do it? Multiple lines are involved.
Thanks.
If your file is really in such a simple format (just commas, no spaces), then cut -d, -f5 would do the trick.
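For example, with the sample line above:
$ echo 1,2,3,4,5,6,7,8,9,10 | cut -d, -f5
5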
#!/bin/sh
awk -F, "NR==$2{print \$$3}" "$1"
Usage:
./test.sh FILENAME LINE_NUMBER FIELD_NUMBER
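If you literally want to be prompted for the input rather than passing arguments, a sketch along the same lines (the prompts and variable names are illustrative):
#!/bin/sh
printf 'File name: '; read -r fname
printf 'Line number: '; read -r lineno
printf 'Field number: '; read -r fieldno
awk -F, -v line="$lineno" -v field="$fieldno" 'NR == line { print $field }' "$fname"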
