I am working with an Excel file that has only one column, as follows:
Type
A:\AAA\AD\RER\TES\11111\&DD&MM&AA.EXT
C:\AAA\CD\RES\TES\33333\&DD&MM&AA.EXT
C:\CCC\DF\WSD\&DD&MM&AA&SQ2.TXT
C:\DDDD\RT\FDG\334455&DD&MM&AA&SQ2.TXT
C:\DDD\YU\DFS\55555&DD&MM&AA&SQ2.TXT
C:\RRR\ER\SDF\55555&DD&MM&AA&SQ2.TXT
C:\TTT\CD\ERW\55555&DD&MM&AA&SQ2.TXT
C:\YYY\YU\WET\555555&DD&MM&AA.EXT
I would like to extract the following output:
&DD&MM&AA.EXT
&DD&MM&AA.EXT
&DD&MM&AA&.TXT
334455&DD&MM&AA&.TXT
55555&DD&MM&AA&.TXT
55555&DD&MM&AA&.TXT
55555&DD&MM&AA&.TXT
555555&DD&MM&AA.EXT
Since I am a beginner with Excel, I extracted it using bash. My command was the following:
rev colum.txt | tr -d " " | cut -d "\\" -f1 | rev | sed "s/SQ2//"
However, I would like to achieve the same result with an Excel macro, and I don't know how to program it. I would appreciate a suggestion on how to translate this bash command into Excel VBA, assuming the data is in column A.
With data in column A, in B1 enter:
=MID(A1,FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,9999)
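and copy down. This returns everything after the last backslash; it does not remove SQ2. To drop that as well, as the sed step in the bash command does, wrap the whole formula in SUBSTITUTE(..., "SQ2", "").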
I have a query in a shell script that gives me results like:
article;20200120
fruit;22
fish;23
I run that report every day. When I run the query the next day, I would like the output to look like this:
article;20200120;20200121
fruit;22;11
fish;23;12
I run this report with PostgreSQL from a Linux shell script. The CSV output is generated by redirecting the output with ">>".
Any help to achieve that would be appreciated.
Thanks
This might be somewhat fragile, but it sounds like what you want can be accomplished with cut and paste.
Let's start with two files we want to join:
$ cat f1.csv
article;20200120
fruit;22
fish;23
$ cat f2.csv
article;20200121
fruit;11
fish;12
We first use cut to strip the row headers (the first field of each line) from the second file, then send that into paste with the first file to combine corresponding lines:
$ cut -d ';' -f 2- f2.csv | paste -d ';' f1.csv -
article;20200120;20200121
fruit;22;11
fish;23;12
Parsing that command line, the -d ';' tells cut to use semicolons as the delimiter (the default is tab), and -f 2- says to print the second and later fields. f2.csv is the input file for cut. Then the -d ';' similarly tells paste to use semicolons to join the lines, and f1.csv - are the two files to paste together, in that order, with - representing the input piped in using the | shell operator.
Now, like I say, this is somewhat fragile. We're not matching the lines based on the header information, only their line number from the start of the file. If some fields are optional, or the set of fields changes over time, this will silently produce garbage. One way to mitigate that would be to first call cut -d ';' -f 1 on each of the input files and insist the results are the same before combining them.
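The check suggested above could be sketched roughly like this. The function name append_report_column is made up for illustration, and the sketch assumes both files use semicolons and list their rows in the same order:

```shell
# Hypothetical helper: append the value column(s) of a new report to an
# accumulated report, but only if the first fields (row labels) agree.
# $1 = accumulated file, $2 = new file; the combined result goes to stdout.
append_report_column() {
  local old new
  old=$1
  new=$2
  # Compare the first field of every line in both files.
  if [ "$(cut -d ';' -f 1 "$old")" != "$(cut -d ';' -f 1 "$new")" ]; then
    echo "row labels differ between $old and $new" >&2
    return 1
  fi
  # Same cut/paste combination as in the answer above.
  cut -d ';' -f 2- "$new" | paste -d ';' "$old" -
}
```

Something like append_report_column f1.csv f2.csv > combined.csv would then refuse to write garbage when a row is missing or renamed, instead of silently misaligning the columns.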
I am trying to get all unique values from a column in a very large file (5 columns, 2,044,530,100 lines, ~49 GB). My current approach is to cut the relevant column and put it through sort -u (which sorts and only outputs the unique values). While my INPUT is just text, my output contains binary characters, which makes it unusable.
First lines of INPUT look like this:
1 D12 rs01 T T
1 D12 rs02 G G
1 D12 rs03 G G
1 D15 rs01 C C
Putting it through a tr command does not help; it just makes the binary characters visible. These are the two commands I tried, without and with tr:
cut -d" " -f3 INPUT | sort -u > OUTPUT
cut -d" " -f3 INPUT | tr -cd '\11\12\15\40-\176' | sort -u > OUTPUT
For example, some sample output from the command above:
yO+{(#6:1fr
EvI0^?E0/':>)zj;<f#V&:oY\RM&mhR!6(qV%|`rJTq4IKqV{]Dzb"~8(X82
F:7nc9gZ#nht^M">vo|F+g"x%r>UdF+Rn^MOu=
The expected output is a single column with all unique values of that column, e.g.:
rs01
rs02
rs03
rs04
rs05
Unfortunately, I can't replicate this behavior with generated (smaller) data. Does anyone have a suggestion of how to deal with this? All help is greatly appreciated. Sort version is sort (GNU coreutils) 8.4
Instead of manually splitting the file for inspection, I would try grepping the input file for unusual characters, either to confirm that your input is not damaged or to locate the garbage.
grep -b -E -v -e '^[[:alnum:][:space:]]+$' <your file>
If the input is OK, write the cut output to a temporary file instead of a pipe and examine that file in the same way. If it is also OK, blame sort.
(PS. I would rather post it as a comment, not a solution but I can't)
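A rough sketch of that procedure, with the intermediate column written to a temporary file and checked before sorting. The function names are invented for illustration. Forcing LC_ALL=C is an extra assumption, not part of the suggestion above: it makes grep and sort treat the data as plain bytes, so only ASCII letters, digits and whitespace count as clean.

```shell
# Print every line (with its byte offset) that contains anything other than
# ASCII alphanumerics or whitespace. Exit status 0 means something was found.
find_suspect_lines() {
  LC_ALL=C grep -b -E -v -e '^[[:alnum:][:space:]]+$' "$1"
}

# $1 = input file, $2 = 1-based column number (space-delimited).
unique_column() {
  local tmp
  tmp=$(mktemp) || return 1
  # Use a temporary file instead of a pipe so it can be inspected.
  cut -d ' ' -f "$2" "$1" > "$tmp"
  if find_suspect_lines "$tmp" >&2; then
    echo "suspect lines found in column $2 of $1" >&2
    rm -f "$tmp"
    return 1
  fi
  LC_ALL=C sort -u "$tmp"
  rm -f "$tmp"
}
```

If unique_column INPUT 3 reports suspect lines, the garbage is already present before sort runs; if it prints clean values, the damage in the original pipeline was introduced later.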
So I want to automate a manual task using shell scripting, but I'm a little lost as to how to parse the output of a few commands. I would be able to do this in other languages without a problem, so I'll just explain what I'm going for in pseudocode and provide an example of the cmd output I'm trying to parse.
Example of output:
Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'
What I need to parse out is '2167467'. So what I want to do is split on spaces and take element 1 to use in another command. The output of my next command looks like this:
Change 2167463 by user1234#filename on 2012/02/13 18:10:15
description of submission
Affected files ...
... //filepath/dir1/dir2/dir3/filename#2298 edit
I need to parse out '//filepath/dir1/dir2/dir3/filename#2298' and use that in another command. Again, what I would do is remove the blank lines from the output, grab the 4th line, and split on space. From there I would grab the 1st element from the split and use it in my next command.
How can I do this in shell scripting? Examples or a pointer to some tutorials would be great.
It's not clear whether you want to use the result from the first command when processing the output of the second command. If so, then
targString=$( cmd1 | awk '{print $2}')
command2 | sed -n "/${targString}/{n;n;n;s#.*[/][/]#//#;p;}"
Your example data has 2 different Chg values in it (2167467 and 2167463), so if you just want to process this output in 2 different ways, it's even simpler
cmd1 | awk '{print $2}'
cmd2 | sed -n '/Change/{n;n;n;s#.*[/][/]#//#;p;}'
I hope this helps.
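As an alternative sketch that matches on content rather than on line position, both values can be picked out with awk alone. The function names are hypothetical, and this assumes the change line starts with "Chg" and each affected file line starts with "..." followed by a path beginning with "//", as in the sample output:

```shell
# Read "Chg <number> on ..." lines on stdin and print the first change number.
change_number() {
  awk '$1 == "Chg" { print $2; exit }'
}

# Read the second command's output on stdin and print the first affected
# file path, without the trailing action word (e.g. "edit").
first_affected_file() {
  awk '$1 == "..." && $2 ~ /^\/\// { print $2; exit }'
}
```

With cmd1 and cmd2 standing in for the real commands, usage would be along the lines of num=$(cmd1 | change_number) followed by path=$(cmd2 | first_affected_file). Blank lines in the output do not matter because nothing depends on line counts.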
I'm not 100% clear on your question, but I would use awk.
http://www.cyberciti.biz/faq/bash-scripting-using-awk/
Your first variable would look something like this
temp="Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'"
To get the number you want do this:
temp=`echo $temp | cut -f2 -d" "`
Save the output of your second command to a file, something like this:
command $temp > file.txt
To get what you want from the file you can run this:
temp=`tail -1 file.txt | cut -f2 -d" "`
rm file.txt
The last block of code takes the last line of the file and extracts its second space-delimited field. tail -1 does not skip blank lines, so this assumes the output does not end with one.
I need to extract some data from a CSV file. The CSV is a 2 column file with multiple records. The first column is the date, the second column is the data that needs to be extracted. The first row of the CSV file is the column headers, so it can be skipped. I've already created the column header for the extracted data's CSV file, so there's no need for that; I'll simply use >> to append the data to it.
Here is 1 record/line (of many) in the CSV file:
"2009-09-20 00:12:37","a:2:{s:15:""info_buyRequest"";a:5:{s:4:""uenc"";s:116:""aHR0cDovL3N0b3JlLmZvcmdldGhhbmdvdmVycy5jb20vcGF0Y2hlcy9pbmRpdmlkdWFsLXBhdGNoZXMvZnJlZS1zYW1wbGUuaHRtbD9fX19TSUQ9VQ,,"";s:7:""product"";s:1:""1"";s:15:""related_product"";s:0:"""";s:7:""options"";a:13:{i:17;s:2:""59"";i:16;s:2:""50"";i:15;s:2:""49"";i:14;s:2:""47"";i:13;s:2:""41"";i:12;s:2:""34"";i:11;s:2:""25"";i:10;s:2:""23"";i:9;s:2:""19"";i:8;s:2:""17"";i:7;s:2:""12"";i:6;s:1:""9"";i:5;s:1:""5"";}s:3:""qty"";i:1;}s:7:""options"";a:13:{i:0;a:7:{s:5:""label"";s:25:""How did you hear about us"";s:5:""value"";s:22:""Friend / Family Member"";s:11:""print_value"";s:22:""Friend / Family Member"";s:9:""option_id"";s:2:""17"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""59"";s:11:""custom_view"";b:0;}i:1;a:7:{s:5:""label"";s:3:""Age"";s:5:""value"";s:5:""21-24"";s:11:""print_value"";s:5:""21-24"";s:9:""option_id"";s:2:""16"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""50"";s:11:""custom_view"";b:0;}i:2;a:7:{s:5:""label"";s:14:""Marital Status"";s:5:""value"";s:9:""UnMarried"";s:11:""print_value"";s:9:""UnMarried"";s:9:""option_id"";s:2:""15"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""49"";s:11:""custom_view"";b:0;}i:3;a:7:{s:5:""label"";s:3:""Sex"";s:5:""value"";s:6:""Female"";s:11:""print_value"";s:6:""Female"";s:9:""option_id"";s:2:""14"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""47"";s:11:""custom_view"";b:0;}i:4;a:7:{s:5:""label"";s:10:""Occupation"";s:5:""value"";s:7:""Student"";s:11:""print_value"";s:7:""Student"";s:9:""option_id"";s:2:""13"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""41"";s:11:""custom_view"";b:0;}i:5;a:7:{s:5:""label"";s:9:""Education"";s:5:""value"";s:16:""College Graduate"";s:11:""print_value"";s:16:""College 
Graduate"";s:9:""option_id"";s:2:""12"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""34"";s:11:""custom_view"";b:0;}i:6;a:7:{s:5:""label"";s:16:""Household Income"";s:5:""value"";s:7:""30K-50K"";s:11:""print_value"";s:7:""30K-50K"";s:9:""option_id"";s:2:""11"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""25"";s:11:""custom_view"";b:0;}i:7;a:7:{s:5:""label"";s:23:""Do You Take Supplements"";s:5:""value"";s:2:""No"";s:11:""print_value"";s:2:""No"";s:9:""option_id"";s:2:""10"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""23"";s:11:""custom_view"";b:0;}i:8;a:7:{s:5:""label"";s:40:""How would you rank your typical hangover"";s:5:""value"";s:4:""Mild"";s:11:""print_value"";s:4:""Mild"";s:9:""option_id"";s:1:""9"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""19"";s:11:""custom_view"";b:0;}i:9;a:7:{s:5:""label"";s:51:""What type of establishments do you typically prefer"";s:5:""value"";s:10:""Nightclubs"";s:11:""print_value"";s:10:""Nightclubs"";s:9:""option_id"";s:1:""8"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""17"";s:11:""custom_view"";b:0;}i:10;a:7:{s:5:""label"";s:40:""How often do you usually go out per week"";s:5:""value"";s:3:""1-2"";s:11:""print_value"";s:3:""1-2"";s:9:""option_id"";s:1:""7"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""12"";s:11:""custom_view"";b:0;}i:11;a:7:{s:5:""label"";s:49:""How many drinks do you typically consume per week"";s:5:""value"";s:3:""6-8"";s:11:""print_value"";s:3:""6-8"";s:9:""option_id"";s:1:""6"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""9"";s:11:""custom_view"";b:0;}i:12;a:7:{s:5:""label"";s:53:""How would you prefer to buy our Products"";s:5:""value"";s:6:""Online"";s:11:""print_value"";s:6:""Online"";s:9:""option_id"";s:1:""5"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""5"";s:11:""custom_view"";b:0;}}}"
The Output should be the data found here:
""print_value"";s:?:""{DATA}""
Where the ? is a number, and {DATA} is the data being extracted.
So the output for example of this 1 record would be:
"2009-09-20 00:12:37","Friend / Family Member","21-24","UnMarried","Female","Student","College Graduate","30K-50K","No","Mild","Nightclubs","1-2","6-8","Online"
I am not proficient in sed, AWK, or grep, but I know it can be done using one of these tools if not all three. Any help or nudges in the right direction would be GREATLY appreciated.
I suggest you use PHP to de-serialize the structure.
However, here's a quick and dirty version of what you want using sed and tr. Certainly you can do this much much better:
cat file.csv | \
tr ",;" "\n" | \
sed -e 's/[asbi]:[0-9]*[:]*//g' -e '/^[{}]/d' -e 's/""//g' -e '/^"{/d' | \
sed -n -e '/^"/p' -e '/^print_value$/,/^option_id$/p' | \
sed -e '/^option_id/d' -e '/^print_value/d' -e 's/^"\(.*\)"$/\1/' | \
tr "\n" "," | \
sed -e 's/,\([0-9]*-[0-9]*-[0-9]*\)/\n\1/g' -e 's/,$//' | \
sed -e 's/^/"/g' -e 's/$/"/g' -e 's/,/","/g'
The explanation:
split on commas and semicolons
remove the PHP structure syntax (s:X:Y, b:X, ...) and delete lines starting with { or } or "{
extract the section from print_value to the next option_id, and also keep the date (the line starting with ")
remove those labels (print_value and option_id), and remove the quotation marks around the date
join all lines with commas
separate the records (each starts with a date pattern), and remove the extra comma at the end
add quotation marks around all fields
Wow, I know it's embarrassing :)
Here is my answer:
cat TestData \
| grep -o -P "print_value\"\";.*?:\"\".*?\"\";" \
| perl -pe 's|print_value.*:\"\"(.*?)\"\";|\1|'
The first line prints the data (stored in TestData).
The second line asks grep to extract each match from print_value to the nearest '"";'.
Notice that I use '.*?' for a non-greedy match (this requires '-P').
The last line uses perl to strip everything that is not needed. I use '(.*?)' to capture the wanted group and '\1' to print it.
Hope this helps.
Here's a sed oneliner:
sed -nr 's/^([^,]+),(.*)$/\2#%#\1/;:a;s/""print_value"";s:[0-9]+:""([^"]+)""(.*)$/\2,"\1"/;ta;s/^.*#%#//p' <source
Basically extract the data and append it to the end of the line using a unique delimiter '#%#'.
When the loop/substitute construct fails (i.e. no more data), throw away what is left of the original line leaving the data nicely formatted.
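For comparison, here is a more spelled-out sketch of the same extraction using grep -o, sed and paste. The function name is made up, and it leans on several assumptions: each record sits on a single line, the date field contains no comma, and no extracted value contains a double quote.

```shell
# Read CSV records on stdin; for each one print the date followed by every
# print_value string, quoted and comma-separated.
extract_print_values() {
  local line date values
  while IFS= read -r line; do
    # First field: everything before the first comma (the quoted date).
    date=${line%%,*}
    # One match per line from grep -o, reduced to "value", then joined.
    values=$(printf '%s\n' "$line" |
      grep -oE '""print_value"";s:[0-9]+:""[^"]*""' |
      sed -E 's/^.*:""([^"]*)""$/"\1"/' |
      paste -sd, -)
    printf '%s,%s\n' "$date" "$values"
  done
}
```

To skip the header row and append to an existing file, something like tail -n +2 file.csv | extract_print_values >> extracted.csv should do, where extracted.csv is a placeholder name.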
I have a set of CSV files (around 250), each having 300 to 500 records. I need to cut 2 or 3 columns from each file and store them in another file. I'm using Ubuntu. Is there a command or utility that can do this?
If you know that the column delimiter does not occur inside the fields, you can use cut.
$ cat in.csv
foo,bar,baz
qux,quux,quuux
$ cut -d, -f2,3 < in.csv
bar,baz
quux,quuux
You can use the shell builtin 'for' to loop over all input files.
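A minimal sketch of such a loop, wrapped in a function whose name and arguments are invented for illustration. It assumes the input files end in .csv, that no field contains the delimiter, and that the results should go to same-named files in a separate directory:

```shell
# $1 = directory with the input .csv files
# $2 = directory for the results (created if missing)
# $3 = field list in cut syntax, e.g. 2,3
cut_columns_all() {
  local src_dir dest_dir fields f
  src_dir=$1
  dest_dir=$2
  fields=$3
  mkdir -p "$dest_dir" || return 1
  for f in "$src_dir"/*.csv; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$f" ] || continue
    cut -d, -f"$fields" < "$f" > "$dest_dir/$(basename "$f")"
  done
}
```

For example, cut_columns_all ./input ./output 2,3 keeps the second and third columns of every file. Writing into a separate directory avoids truncating an input file while cut is still reading it.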
If the fields might contain the delimiter, you ought to find a library that can parse CSV files. Typically, general purpose scripting languages will include a CSV module in their standard library.
Ruby: require 'csv'
Python: import csv
Perl: use Text::ParseWords;
If your fields contain commas or newlines, you can use a helper program I wrote to allow cut (and other UNIX text processing tools) to properly work with the data.
https://github.com/dbro/csvquote
This program finds special characters inside quoted fields, and temporarily replaces them with nonprinting characters which won't confuse the cut program. Then they get restored after cut is done.
lutz' solution would become:
csvquote in.csv | cut -d, -f2,3 | csvquote -u
If you used ssconvert to get the CSV you might try:
ssconvert -O 'separator="|"' "file.xls" "file.txt"
Note the TXT extension instead of CSV: this makes ssconvert use the Gnumeric_stf:stf_assistant exporter instead of Gnumeric_stf:stf_csv, which lets you pass options (the -O parameter). Otherwise you'll get a "The file saver does not take options" error. A pipe character is much less likely than a comma to appear in the data, but you might want to check first.
Then you can rename it and do things like:
cat file.csv | cut -d "|" -f3 | sort | uniq -c | sort -rn | head
Other options example: -O 'eol=unix separator=; format=preserve charset=UTF-8 locale=en_US transliterate-mode=transliterate quoting-mode=never'.
A solution with AWK v4+.
ssconvert man page.