Filter CSV lines - shell

I have a huge CSV file (4 GB) and I need to filter out the lines that contain specific strings. For example, I need the lines that contain "McDonalds", "Burger King", or "KFC".
I need multiple strings, like an OR.
Something like:
array_of_names = ["McDonalds", "Burger King", "KFC"]
foreach line in csv
    if line.contains_any_of(array_of_names)
        output << line
    end
end
I think I can do something with grep, but I honestly have no idea. I guess I need a shell script.
Can anyone help me?

You could do it with grep like this:
grep "McDonalds\|Burger King\|KFC" your_file.csv

Related

Add <br> to the end of each line in a file via bash

I am trying to add "<br>" to the end of each line in a .log file, and create an HTML file of the results.
I have tried
sed 's/$/<br><br>/' latest.log >> latest.html
After 395 lines, it cuts out. I would just make the .log file a .html file, but the line breaks don't cross over. Sorry if any of this seems weird; I'm fairly new to this.
Well, it's hard to say, because there might be something wrong with your input file (for example, some unwanted whitespace characters), but you can insert it in a million ways. The simplest one:
sed 's/.*/&<br><br>/'
Do you need me to explain it?
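One guess worth checking: if latest.log came from Windows, each line ends with a carriage return, which confuses this kind of processing. A sketch that strips the CR before appending the tags (dos2unix does the same cleanup):
sed -e 's/\r$//' -e 's/$/<br><br>/' latest.log > latest.html
Note the single > here; your >> appends to latest.html on every run.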
I'll just use tags at the beginning of the first line and the ending. Thank you, Walter A.

Bash difference between pipeline and parameters

I need to write a script which gets a file from stdin and runs over its lines.
My question is can I do something like that :
TheFile= /dev/stdin
while read line; do
{
....
}
done<"$(TheFile)"
Or can I write done<"$1" instead? In that case, the minute I pass a file as a parameter to the function, will it be fed to the while loop?
Where to start... Are you sure you're up for this?
What are you trying to do with the lines of the file? You might be better off not iterating at all, and instead using sed, awk, or grep directly on it, like this:
sed -e 's/apple/banana/g' "$TheFile"
That will output the contents of $TheFile, replacing all occurrences of "apple" with "banana" (the g flag makes it every occurrence on a line, not just the first). That's a trivial example, but you could do much more.
If you really want to loop, then remove the $() from your example. Also, you cannot have a space after = in your code.
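Putting those two fixes together, a minimal sketch of the loop (the printf is only a placeholder for whatever per-line work you intend):
TheFile=/dev/stdin              # no space after =
while IFS= read -r line; do
    printf '%s\n' "$line"       # placeholder: process each line here
done < "$TheFile"               # $TheFile, not $(TheFile)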

Parsing content using grep, awk

I have parsed content similar to this as output from JSON.sh.
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/test.yml"] {"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"}
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
["/home/ukrishnan/projects/mysql/app.xml"] {"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}
[] {"/home/ukrishnan/projects/test.yml":{"LOG_DRIVER":"syslog","IMAGE":"mysql:5.6"},"/home/ukrishnan/projects/mysql/app.xml":{"ENV_ACCOUNT_BRIDGE_ENDPOINT":"/u01/src/test/sample.txt"}}
So, I just want to take the values, as in lines 1, 2 and 4. For example, the first line should become "/home/ukrishnan/projects/test.yml","LOG_DRIVER","syslog", and likewise for all the lines with a similar format. Please help, as I'm completely a newbie to grep and awk.
Edit:
Sorry if this is too broad. Here is what I tried.
Using grep -v "{\|}" returns:
["/home/ukrishnan/projects/test.yml","LOG_DRIVER"] "syslog"
["/home/ukrishnan/projects/test.yml","IMAGE"] "mysql:5.6"
["/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT"] "/u01/src/test/sample.txt"
If someone could also help me grab the values within double quotes in a single grep, that would be great.
This one-liner works for your example:
awk '$NF~/^[^{]/&&sub(/^\[/,"")+sub(/\]\s*/,",")' file
It gives:
"/home/ukrishnan/projects/test.yml","LOG_DRIVER","syslog"
"/home/ukrishnan/projects/test.yml","IMAGE","mysql:5.6"
"/home/ukrishnan/projects/mysql/app.xml","ENV_ACCOUNT_BRIDGE_ENDPOINT","/u01/src/test/sample.txt"

Create CSV from specific columns in another CSV using shell scripting

I have a CSV file with several thousand lines, and I need to take some of the columns in that file to create another CSV file to use for import to a database.
I'm not in shape with shell scripting anymore; can anyone point me in the right direction?
I have a bash script to read the source file but when I try to print the columns I want to a new file it just doesn't work.
while IFS=, read symbol tr_ven tr_date sec_type sec_name name
do
echo "$name,$name,$symbol" >> output.csv
done < test.csv
Above is the code I have. Out of the 6 columns in the original file, I want to build a CSV with "column6, column6, column1".
The test CSV file is like this:
Symbol,Trading Venue,Trading Date,Security Type,Security Name,Company Name
AAAIF,Grey Market,22/01/2015,Fund,,Alternative Investment Trust
AAALF,Grey Market,22/01/2015,Ordinary Shares,,Aareal Bank AG
AAARF,Grey Market,22/01/2015,Ordinary Shares,,Aluar Aluminio Argentino S.A.I.C.
What am I doing wrong with my script? Or, is there an easier - and faster - way of doing this?
Edit
These are the real headers:
Symbol,US Trading Venue,Trading Date,OTC Tier,Caveat Emptor,Security Type,Security Class,Security Name,REG_SHO,Rule_3210,Country of Domicile,Company Name
I'm trying to get the last column, which is number 12, but it always comes up empty.
The snippet looks and works fine to me; maybe you have some weird characters in the file, or it is coming from a DOS environment (use dos2unix to "clean" it!). Also, you can make use of read -r to prevent strange behaviour with backslashes.
But let's see how can awk solve this even faster:
awk 'BEGIN{FS=OFS=","} {print $6,$6,$1}' test.csv >> output.csv
Explanation
BEGIN{FS=OFS=","} this sets the input and output field separators to the comma. Alternatively, you can say -F=",", -F, or pass it as a variable with -v FS=",". The same applies for OFS.
{print $6,$6,$1} prints the 6th field twice and then the 1st one. Note that using print, every comma-separated parameter that you give will be printed with the OFS that was previously set. Here, with a comma.
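As for the edit: if that 12-column file was produced on Windows, every line ends with a carriage return, and the last field drags it along, which is the classic reason a final column looks empty when echoed. A sketch that strips the CR before printing (assuming none of the fields contain quoted, embedded commas):
awk 'BEGIN{FS=OFS=","} {sub(/\r$/,""); print $12,$12,$1}' test.csv > output.csv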

In bash, remove punctuation between pattern matches?

I am struggling with a conversion of a data file to csv when there is punctuation in the title field.
I have a bash script that obtains the file and processes it, and it almost works. What gets me is when there are commas in a free text title field, which then create extra fields.
I have tried some sed examples to replace between patterns but I have not gotten any of them to work. What I want to do is work between two patterns and replace commas with either nothing or perhaps a semicolon.
Taking this string:
name:A100040,title:Oatmeal is better with raisins, dates, and sugar,current_balance:50000,
Replacing with this:
name:A100040,title:Oatmeal is better with raisins dates and sugar,current_balance:50000,
I should probably use "title:" and ",current_" to denote the start and end of the block where I want to make the change to avoid situations like this:
name:A100040,title:Re-title current periodicals, recent books,current_balance:50000,
So far I have not gotten the substitution to match. In this case I am using !! to make the change obvious:
teststring="name:A100040,title:Oatmeal is better with raisins, dates, and sugar,current_balance:50000,"
echo $teststring |sed '/title:/,/current_/s/,/!!/g'
name:A100040!!title:Oatmeal is better with raisins!! dates!! and sugar!!current_balance:50000!!
Any help appreciated.
This is one way which could undoubtedly be refined:
perl -ple 'm/(.*?)(title:.*?)(current_balance:.*)/; $save = $part = $2; $part =~ s/,/!!/g; s/$save/$part/'
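Run against your test string, it gives something like:
echo "$teststring" | perl -ple 'm/(.*?)(title:.*?)(current_balance:.*)/; $save = $part = $2; $part =~ s/,/!!/g; s/$save/$part/'
name:A100040,title:Oatmeal is better with raisins!! dates!! and sugar!!current_balance:50000,
Note that the field-separating comma before current_balance: is swallowed too; that is one of the refinements you would want.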
First, using sed or awk to parse CSV is almost always the wrong thing to do, because they have no notion of quoted fields: a comma inside quotes looks exactly like a field delimiter to them. That said, it seems like a better approach would be to quote the fields so that your output would be:
name:"A100040",title:"Oatmeal ... , dates, and sugar",current_balance:50000
Using sed you can try: (this is fragile)
sed 's/:\([^:]*\),\([^,:]*\)/:"\1",\2/g'
If you insist on trying to parse the csv with "standard" tools and you consider perl to be standard, you could try:
perl -pe '1 while s/,([^,:]*),/ $1,/g'
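For what it's worth, the reason your original attempt did nothing useful: /title:/,/current_/ in sed is a line-address range (from a line matching the first pattern through a line matching the second), not a span within one line, so on a one-line input the s///g hits the whole line. If you still want plain sed for this exact layout, a GNU sed loop that deletes one inner comma per pass handles the sample lines (a sketch, assuming the title text never contains a colon; use \1;\2 in the replacement if you prefer semicolons):
echo "$teststring" | sed ':a; s/\(title:[^:]*\),\([^:]*,current_\)/\1\2/; ta'
name:A100040,title:Oatmeal is better with raisins dates and sugar,current_balance:50000,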
