shell program to modify contents of a file - shell

I have a file that contains a list of product IDs, one per line. I want to modify this file so that all product IDs end up on one line, comma-separated and wrapped in inverted commas (single quotes). Original format -
1\n2\n3\n
Expected format -
'1','2','3'
I tried the following command -
paste -s -d "','" velocities.txt > out.txt
The result looks like this -
1',2'3'4,
I do understand that using the above command I won't get anything before the first product ID, but I will be able to handle that case.

You could use sed to quote all digits:
paste -s -d, velocities.txt | sed "s|\([0-9]\+\)|'\1'|g" > out.txt
P.S. Another command that also handles IP addresses:
sed "s|^\(.*\)$|'\1'|g" velocities.txt | paste -s -d, - > out.txt

Related

add query results in a csv in linux

I have a query in a shell script that gives me results like:
article;20200120
fruit;22
fish;23
I run that report every day. When I execute the query the next day, I would like it to show me output like this:
article;20200120;20200121
fruit;22;11
fish;23;12
I run these reports with PostgreSQL in a Linux shell script. The CSV output is generated by redirecting the output with ">>".
Any help to achieve that would be appreciated.
Thanks
This might be somewhat fragile, but it sounds like what you want can be accomplished with cut and paste.
Let's start with two files we want to join:
$ cat f1.csv
article;20200120
fruit;22
fish;23
$ cat f2.csv
article;20200121
fruit;11
fish;12
We first use cut to strip the headers from the second file, then send that into paste with the first file to combine corresponding lines:
$ cut -d ';' -f 2- f2.csv | paste -d ';' f1.csv -
article;20200120;20200121
fruit;22;11
fish;23;12
Parsing that command line, the -d ';' tells cut to use semicolons as the delimiter (the default is tab), and -f 2- says to print the second and later fields. f2.csv is the input file for cut. Then the -d ';' similarly tells paste to use semicolons to join the lines, and f1.csv - are the two files to paste together, in that order, with - representing the input piped in using the | shell operator.
Now, like I say, this is somewhat fragile. We're not matching the lines based on the header information, only their line number from the start of the file. If some fields are optional, or the set of fields changes over time, this will silently produce garbage. One way to mitigate that would be to first call cut -d ';' -f 1 on each of the input files and insist the results are the same before combining them.
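A hedged sketch of that check, reusing the f1.csv and f2.csv names from above (bash is assumed, since it relies on process substitution; combined.csv is just an illustrative output name):
# join only if the key columns of both files are identical
if cmp -s <(cut -d ';' -f 1 f1.csv) <(cut -d ';' -f 1 f2.csv); then
  cut -d ';' -f 2- f2.csv | paste -d ';' f1.csv - > combined.csv
else
  echo "key columns differ, not joining" >&2
fi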

How to add a header to text file in bash?

I have a text file and want to convert it to a CSV file. Before converting it, I want to add a header to the text file so that the CSV file has the same header. I have one thousand columns in the text file and want one thousand column names. As a side note, the content of the text file is just rows of numbers separated by commas ",". Is there any way to add the header line in bash?
I tried the way below and it didn't work. I first ran the following in Python:
for i in range(1001):
    print "col" + "_" + "i"
I saved the output of this to a text file with this command (python header.py >> header.txt) and added that to the front of the original text file that I have, like below:
cat header.txt filename.txt > newfilename.txt
Then I converted the txt file to a csv file with "mv newfilename.txt newfilename.csv".
But unfortunately this way doesn't work, as the header line has double the number of the other rows for some reason. I would appreciate any help in solving this problem.
Based on the description, your file is already comma-separated, so it is already a CSV file. You just want to add a column-number header line.
$ awk -F, 'NR==1{for(i=1;i<=NF;i++) printf "col_%d%s", i, (i==NF?ORS:FS)}1' file
will add as many column headers as there are fields in the first row of the file,
e.g.
$ seq 5 | paste -sd, | # create 1,2,3,4,5 as a test input
awk -F, 'NR==1{for(i=1;i<=NF;i++) printf "col_%d%s", i, (i==NF?ORS:FS)}1'
col_1,col_2,col_3,col_4,col_5
1,2,3,4,5
You can generate the column names in bash using one of the options below. Each example generates a header.txt file. You already have code to add this to the beginning of your file as a header.
Using bash loops
Bash loops for this many iterations will be inefficient, but will work.
for i in {1..1000}; do
  echo -n "col_$i "
done > header.txt
echo >> header.txt
or using seq
for i in $(seq 1 1000); do
  echo -n "col_$i "
done > header.txt
echo >> header.txt
Using seq only
Using seq alone will be more efficient.
seq -f "col_%g" -s" " 1 1000 > header.txt
Use seq and sed
You can use the seq utility to construct your CSV header, with a little help from Bash expansions. You can then insert the new header row into your existing CSV file, or concatenate the header with your data.
For example:
# construct a quoted CSV header
columns=$(seq -f '"col_%g"' -s', ' 1 1001)
# strip the trailing comma
columns="${columns%,*}"
# insert headers as first line of foo.csv with GNU sed
sed -i -e "1 i\\${columns}" /tmp/foo.csv
Caveats
If you don't have GNU sed, you can also use cat, sponge, or other tools to concatenate your header and data, although most of your concatenation options will require redirection to a new combined file to avoid clobbering your existing data.
For example, given /tmp/data.csv as your original data file:
seq -f '"col_%g"' -s', ' 1 1001 > /tmp/header.csv
sed -i -e 's/,[[:space:]]*$//' /tmp/header.csv
cat /tmp/header.csv /tmp/data.csv > /tmp/new_file.csv
Also, note that while Bash solutions that avoid calling standard utilities are possible, doing it in pure Bash might be too slow or memory intensive for large data sets.
Your mileage may vary.
printf "col%s," {1..100} |
sed 's/,$//' |
cat - filename.txt >newfilename.txt
I believe sed should supply the missing final newline as a side effect. If not, maybe try 's/,$/\n/' though this isn't entirely portable, either. You could probably replace the cat with sed as well, something like
... | sed 's/,$//;r filename.txt'
but again, I'm not entirely sure how portable this is.

Text Processing - how to remove part of string from search results using sed?

I am parsing through .xml files looking for names that are inside HTML tags.
I have found what I need, but I would just like to keep the family names.
This is what I have so far (a grep command for the names plus clean-up of the result, which includes removing the tags and the file name; I will later sort the names and keep only the unique ones):
grep -oP '<name>([A-ZÖÄÜÕŽS][a-zöäüõžš]*)[\s-]([A-ZÖÄÜÕŽS][a-zöäüõžš]*)</name>' *.xml --colour | sed -e 's/<[^>]*>//g' | sed 's/la[0-9]*//' | sed 's/$*.xml://'
The output looks like this:
Mart Kreos
Hans Väär
Karel Väär
Jaan Tibbin
Jüri Kull
I would like to keep the family names, but remove the first names.
I tried to use the following command, but it only worked for some names and not for the others:
sed -r 's/([A-ZÖÄÜÕŽŠ][a-zöäüõžš]+[ ])([A-ZÖÄÜÕŽS][a-zöäüõžš]+)/\2/g'
You should use cut. It is better suited to what you're trying to achieve here, and you would avoid struggling with UTF-8 characters.
This would give you the expected result for all names in your sample output:
cut -d ' ' -f 2
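For example, feeding your sample output through cut (names.txt is a hypothetical file holding the lines above):
$ cut -d ' ' -f 2 names.txt
Kreos
Väär
Väär
Tibbin
Kull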

What is wrong with this sed command?

I am facing a strange problem. An answer to what I want to do already exists here. I am trying to remove trailing commas from each line of a file containing thousands of lines.
This is my command -
sed -i 's/,*$//g' file_name.csv
However, the output I get is exactly the same as the input, and the trailing commas are not removed.
I think sed is not matching the pattern and thus failing to replace the commas. To check whether there are any hidden characters in the file, I used Vim's :set list option.
There is only a $ at the end of each line, which is just what is expected.
I can't understand why the command is failing.
I can suggest two options:
The first one is my favorite.
dos2unix file
### will work for huge files too
Then try running the command again.
Another way to do this:
cat file | tr -d '\r' > file
### risky: redirecting to the same file can truncate it before cat reads it
Then run the command.
tr -d '\r' < file > file.tmp ; mv file.tmp file
## will work for huge files too
Thanks to @Nahuel for suggesting the last command.
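As a hedged sanity check, you can confirm whether carriage returns are actually present before converting, and then re-run the substitution (file_name.csv as in the question; the file utility is assumed to be available):
$ file file_name.csv        # reports "with CRLF line terminators" if \r is present
$ tr -d '\r' < file_name.csv > file_name.tmp && mv file_name.tmp file_name.csv
$ sed -i 's/,*$//' file_name.csv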

Bash tr -s command

So let's say I have several characters in an email address which don't belong. I want to take them out with the tr command. For example...
jsmith@test1.google.com
msmith@test2.google.com
zsmith@test3.google.com
I want to take out all of the test[123]. parts, so I am using the command tr -s 'test[123].' < email > mail. That is one way I have tried, but the two or three approaches I have attempted all fail to work as intended. The output I am trying to get is...
jsmith@google.com
msmith@google.com
zsmith@google.com
You could use sed.
$ sed 's/@test[1-3]\./@/' file
jsmith@google.com
msmith@google.com
zsmith@google.com
[1-3] matches all the characters which fall within the range 1 to 3 (1,2,3). Add the in-place edit parameter -i to save the changes made.
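For example, a hedged in-place run with GNU sed (file as above):
$ sed -i 's/@test[1-3]\./@/' file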
