Shell command to add a column to data (via another function)

I have data like
1234567890.123 'time1'
2345678901.234 'time2'
3456789012.345 'time3'
where the first number represents the epoch and I'd like to append a column with a human readable date. So something like cut -c 1-10 | xargs -I {} date -r {} is close but I'd like to keep around the other data on the line.
What's the simplest way to do this?

I am not sure whether your data is just a string or something else, but if it is a string, you could grep the first 10 digits with a regex, convert them to a readable date, and append that to the original string.
input="1234567890.123 'time1'"
readable="@$(echo "$input" | grep -oE '^[0-9]{10}')"
output="$input $(date -d "$readable")"
I am using @ with -d for the conversion (GNU date); on BSD/macOS, date -r with the bare seconds (no @) should work too.
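To keep every other field on the line, the same epoch-to-date conversion can run inside a read loop that prints the original line plus the date as a new last column. This is a minimal sketch assuming GNU date (`-d @SECONDS`); the helper name append_date is made up for illustration, and any arguments it receives are passed straight to date (for example a format string).

```shell
#!/usr/bin/env bash
# Append a human-readable date as a new last column, keeping the rest of each line.
# Assumes GNU date (-d @SECONDS); with BSD/macOS date use: date -r "$secs" "$@"
append_date() {
  local epoch rest secs
  while read -r epoch rest; do
    secs=${epoch%%.*}   # drop the fraction: 1234567890.123 -> 1234567890
    printf '%s %s %s\n' "$epoch" "$rest" "$(date -d "@$secs" "$@")"
  done
}

# Usage: extra arguments go to date, e.g. a format string
printf '%s\n' "1234567890.123 'time1'" "2345678901.234 'time2'" | append_date '+%F %T'
```

Reading into two variables (epoch and rest) is what preserves the remaining data: read puts the first word in epoch and everything after it, unchanged, in rest.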

Related

How to use grep/awk/sed to print until a certain character?

I am a complete beginner at shell scripting, and I am trying to iterate through a set of JSON files and extract a certain field from each. Each JSON file has a "country":"xxx" field. In each JSON file, the same field appears about 10k times with the same country name, so I need only the first occurrence, which I can get using "-m 1".
I tried to use grep for this but could not figure out how to extract the whole field including the country name from each file at first occurrence.
for FILE in *.json;
do
grep -o -a -m 1 -h -r '"country":"' $FILE;
done
I tried to use another pipe and use the below pattern but it did not work
| egrep -o '^[^"]+'
Actual Output:
"country":"
"country":"
"country":"
Desired Output:
"country":"romania"
"country":"united kingdom"
"country":"tajikistan"
but I need the whole thing. Any help would be great. Thanks
There is one general answer to the question "I only want the first occurrence", and that answer is:
... | head -n 1
This means: whatever you do, take the head (the first lines); the -n switch lets you say how many lines you want (one in this case).
The same can be done for the last occurrence(s), but then you use tail instead of head (tail also accepts the -n switch).
After trying many things, I found the pattern I was looking for:
grep -Po '"country":.*?[^\\]",' "$FILE" | head -n 1;
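The pattern above only matches when a comma follows the value, so a "country" field that is the last member of its object would be missed. A sketch that matches the quoted value directly instead, assuming no whitespace around the colon and no escaped quotes inside the value (first_country is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the first "country":"..." field from each JSON file given as an argument.
# Assumes no whitespace around the colon and no escaped quotes (\") in the value.
first_country() {
  local file
  for file in "$@"; do
    # -o prints only the matched text; head keeps the first match even when
    # a single line (e.g. minified JSON) contains many occurrences
    grep -o '"country":"[^"]*"' "$file" | head -n 1
  done
}

# Hypothetical usage: first_country *.json
```

head -n 1 is still needed even with -m 1: -m limits the number of matching lines, and with -o every match on that line is printed, which matters for minified JSON where all the occurrences sit on one line.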

Is this a good way to create unique folders based on modified dates?

I have a folder full of images. I'd like to create folders for each based on when that image (or file) was modified. Is this a good way to do that in bash? The code works but I'm still a novice and not sure if there are better ways.
ls -l | sort -k8n -k6M -k7n | tr -s ' ' | cut -d ' ' -f6-8 | uniq | sed '/^$/d'| parallel -j 24 date --date={} +"%Y-%m-%d"| parallel -j 24 mkdir {}
Explanation of code:
ls -l #find files and tell me modified date.
sort -k8n -k6M -k7n # sort values by column 8 (format numeric) then 6 (format is Month) then 7 (format numeric).
tr -s ' ' # squeeze each run of spaces into a single space.
cut -d ' ' -f6-8 # split the text on the delimiter " " (i.e., space) and keep columns 6-8.
uniq #save only unique values
sed '/^$/d' #remove empty lines.
parallel -j 24 date --date={} +"%Y-%m-%d" #run up to 24 jobs in parallel, each converting its input date (substituted for {}) into YYYY-MM-DD format.
parallel -j 24 mkdir {} #run up to 24 parallel jobs that create folders named after the output of the previous command ({}).
There are a lot of simpler, less error-prone ways to do this. If you have the GNU version of date(1), for example:
#!/usr/bin/env bash
shopt -s nullglob
declare -A mtimes
# Adjust pattern as needed
for file in *.{png,jpg}; do
mtimes[$(date -r "$file" +'%Y-%m-%d')]=1
done
mkdir -p -- "${!mtimes[@]}"
This uses a bash associative array to store all the timestamps to use to create new directories from and then makes them all at once with a single mkdir.
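If bash 4+ isn't available (the associative array is the only bash-4 feature in the script above), the unique dates can come from sort -u instead. A sketch, still assuming a date whose -r option takes a file path as GNU coreutils date does; make_date_dirs is a hypothetical name:

```shell
#!/usr/bin/env bash
# Create one YYYY-MM-DD directory per distinct modification date, without
# associative arrays (should work in bash 3.2). Assumes a date whose -r
# option takes a file path, as GNU coreutils date does.
make_date_dirs() {
  local file day
  for file in "$@"; do
    [ -f "$file" ] && date -r "$file" +%Y-%m-%d
  done | sort -u | while IFS= read -r day; do
    mkdir -p -- "$day"   # -p: no error if the directory already exists
  done
}

# Hypothetical usage: make_date_dirs *.png *.jpg
```

Because of mkdir -p, the function can be re-run after new images arrive without failing on dates that already have a folder.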
And since I mentioned preferring to do it in something other than pure shell in a comment, a tcl one-liner:
tclsh8.6 <<'EOF'
file mkdir {*}[lsort -unique [lmap file [glob -nocomplain -type f *.{png,jpg}] { clock format [file mtime $file] -format %Y-%m-%d }]]
EOF
or perl:
perl -MPOSIX=strftime -e '$mtimes{strftime q/%Y-%m-%d/, localtime((stat)[9])} = 1 for (glob q/*.{png,jpg}/); mkdir for keys %mtimes'
Both of these have the advantage of not needing a specific implementation of date (the -r option isn't POSIX, and I'm not sure how widely it is supported outside the GNU coreutils version) or bash 4+ (an issue if you're using, say, a Mac; I think Macs still come with perl, at least for the next OS X version or two).

bash grep 'random matching' string

Is there a way to grab a 'random matching' string via bash from a text file?
I am currently grabbing a download link via bash, curl and grep from an online text file.
Example:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE")"
from online text file which contains
http://alphaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
where $VARIABLE is something the user selected.
It works great, but I wanted to add some mirrors to the text file.
So when the variable 'banana' is selected, the text file I grep contains:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
The code should pick a random 'banana' line and store it in the DOWNLOADSTRING variable.
The current code above only works with one matching line in the text file, since it grabs every 'banana' line.
What this is for: I wanted to add some mirror download links for the files in the online text file, and the current code doesn't allow that.
Can I let grep grab one random 'banana' line (and not all of them)?
See this question to see how to get a random line after grep. rl seems like a good candidate
What's an easy way to read random line from a file in Unix command line?
then do a grep ... | rl | head -n 1
Try this:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE" | sort -R | head -n 1)"
The output will be random-sorted and then the first line will be selected.
If mirrors.txt has the following data, which you provided in your question:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
Then you can use the following command to get a random "matched string" from the file:
grep -E "${VARIABLE}" mirrors.txt | shuf -n1
Then you can store it in the variable DOWNLOADSTRING by setting its value with a function call like so:
rand_mirror_call() { grep -E "${1}" mirrors.txt | shuf -n1; }
DOWNLOADSTRING="$(rand_mirror_call "${VARIABLE}")"
This gives you one random matching line from the text file based on the user's ${VARIABLE} input, and it is a lot less typing this way.
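Without shuf or sort -R, bash can pick a random array element itself. A sketch assuming bash 4+ (for mapfile) and a local copy of the list named as in the answer above (mirrors.txt); random_match is a hypothetical helper, and grep -F treats the user's choice as a literal string rather than a regex:

```shell
#!/usr/bin/env bash
# Print one random line of FILE containing the literal string PATTERN.
# Requires bash 4+ (mapfile). $RANDOM is not cryptographically secure.
random_match() {
  local pattern=$1 file=$2
  local -a matches
  # -F: treat the user's choice as plain text, not a regular expression
  mapfile -t matches < <(grep -F -- "$pattern" "$file")
  (( ${#matches[@]} )) || return 1   # nothing matched
  printf '%s\n' "${matches[RANDOM % ${#matches[@]}]}"
}

# Hypothetical usage: DOWNLOADSTRING="$(random_match "$VARIABLE" mirrors.txt)"
```

The function returns non-zero when no line matches, so the caller can detect a missing file instead of getting an empty URL. $RANDOM is fine for spreading downloads across mirrors but not where unpredictability matters.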

Shell Script String Manipulation

I'm trying to replace an epoch timestamp within a string, with a human readable timestamp. I know how to convert the epoch to the time format I need (and have been doing so manually), though I'm having trouble figuring out how to replace it within the string (via script).
The string is a file name, such as XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz (the epoch is 1296066338).
I've been converting the timestamp with the following gawk code:
gawk 'BEGIN{print strftime("%Y%m%d.%k%M",1296243507)}'
I'm generally unfamiliar with bash scripting. Can anyone give me a nudge in the right direction?
thanks.
You can use this
date -d '@1296066338' +'%Y%m%d.%k%M'
in case you don't want to invoke awk.
Are all filenames the same format? Specifically, "." + epoch + ".gz"?
If so, you can use a number of different routes. Here's one with sed:
$ echo "XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz" | sed 's/.*\.\([0-9]\+\)\.gz/\1/'
1296066338
So that extracts the epoch, then send it to your gawk command. Something like:
#!/bin/bash
...
epoch=$( echo "XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz" | sed 's/.*\.\([0-9]\+\)\.gz/\1/' )
readable_timestamp=$( gawk "BEGIN{print strftime(\"%Y%m%d.%k%M\",${epoch})}" )
Then use whatever method you want to replace the number in the filename. You can send it through sed again, but instead of saving the epoch, you would want to save the other parts of the filename.
EDIT:
For good measure, a working sample on my machine:
#!/bin/bash
filename="XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz"
epoch=$( echo "${filename}" | sed 's/.*\.\([0-9]\+\)\.gz/\1/' )
readable_timestamp=$( gawk "BEGIN{print strftime(\"%Y%m%d.%k%M\",${epoch})}" )
new_filename=$( echo "${filename}" | sed "s/\(.*\.\)[0-9]\+\(\.gz\)/\1${readable_timestamp}\2/" )
echo "${new_filename}"
You can use Bash's string manipulation and AWK's variable passing to avoid having to make any calls to sed or do any quote escaping.
#!/bin/bash
file='XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz'
base=${file%.*.*} # XXXX-XXX-2011-01-25-3.6.2-record.pb
epoch=${file#"$base".}  # 1296066338.gz
epoch=${epoch%.*} # 1296066338
# we can extract the extension similarly, unless it's safe to assume it's ".gz"
humantime=$(gawk -v "epoch=$epoch" 'BEGIN{print strftime("%Y%m%d.%k%M",epoch)}')
newname=$base.$humantime.gz
echo "$newname"
Result:
XXXX-XXX-2011-01-25-3.6.2-record.pb.20110126.1225.gz
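The same rename can be done with a bash regex match and GNU date, with no sed or gawk. A sketch assuming GNU date (-d @SECONDS) and names of the form PREFIX.EPOCH.EXT; humanize_name is a hypothetical helper. It uses %H instead of %k, since %k pads single-digit hours with a space, which would end up in the filename. The output depends on the local time zone: 1296066338 is 2011-01-26 18:25:38 UTC, so the 1225 in the result above appears to correspond to UTC-6.

```shell
#!/usr/bin/env bash
# Turn NAME.EPOCH.EXT into NAME.YYYYmmdd.HHMM.EXT; other names pass through unchanged.
# Assumes GNU date (-d @SECONDS); the result is in the local time zone ($TZ).
humanize_name() {
  local name=$1 stamp
  local re='^(.*)\.([0-9]+)\.([^.]+)$'
  if [[ $name =~ $re ]]; then
    # %H is zero-padded; %k would put a space in the filename before 10:00
    stamp=$(date -d "@${BASH_REMATCH[2]}" +%Y%m%d.%H%M) || return 1
    printf '%s.%s.%s\n' "${BASH_REMATCH[1]}" "$stamp" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$name"
  fi
}

# Usage: humanize_name 'XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz'
```

Keeping the regex in a variable avoids quoting surprises inside [[ ]], and the greedy (.*) makes sure only the last all-digit component before the extension is treated as the epoch.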

Extract Data from CSV in shell script (Sed, AWK, Grep?)

I need to extract some data from a CSV file. The CSV is a 2-column file with multiple records. The first column is the date, and the second column is the data that needs to be extracted. The first row of the CSV file is the column headers, so it can be skipped. I've already created the column header for the extracted data's CSV file, so there's no need for that; I'll simply use >> to append the data to it.
Here is 1 record/line (of many) in the CSV file:
"2009-09-20 00:12:37","a:2:{s:15:""info_buyRequest"";a:5:{s:4:""uenc"";s:116:""aHR0cDovL3N0b3JlLmZvcmdldGhhbmdvdmVycy5jb20vcGF0Y2hlcy9pbmRpdmlkdWFsLXBhdGNoZXMvZnJlZS1zYW1wbGUuaHRtbD9fX19TSUQ9VQ,,"";s:7:""product"";s:1:""1"";s:15:""related_product"";s:0:"""";s:7:""options"";a:13:{i:17;s:2:""59"";i:16;s:2:""50"";i:15;s:2:""49"";i:14;s:2:""47"";i:13;s:2:""41"";i:12;s:2:""34"";i:11;s:2:""25"";i:10;s:2:""23"";i:9;s:2:""19"";i:8;s:2:""17"";i:7;s:2:""12"";i:6;s:1:""9"";i:5;s:1:""5"";}s:3:""qty"";i:1;}s:7:""options"";a:13:{i:0;a:7:{s:5:""label"";s:25:""How did you hear about us"";s:5:""value"";s:22:""Friend / Family Member"";s:11:""print_value"";s:22:""Friend / Family Member"";s:9:""option_id"";s:2:""17"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""59"";s:11:""custom_view"";b:0;}i:1;a:7:{s:5:""label"";s:3:""Age"";s:5:""value"";s:5:""21-24"";s:11:""print_value"";s:5:""21-24"";s:9:""option_id"";s:2:""16"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""50"";s:11:""custom_view"";b:0;}i:2;a:7:{s:5:""label"";s:14:""Marital Status"";s:5:""value"";s:9:""UnMarried"";s:11:""print_value"";s:9:""UnMarried"";s:9:""option_id"";s:2:""15"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""49"";s:11:""custom_view"";b:0;}i:3;a:7:{s:5:""label"";s:3:""Sex"";s:5:""value"";s:6:""Female"";s:11:""print_value"";s:6:""Female"";s:9:""option_id"";s:2:""14"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""47"";s:11:""custom_view"";b:0;}i:4;a:7:{s:5:""label"";s:10:""Occupation"";s:5:""value"";s:7:""Student"";s:11:""print_value"";s:7:""Student"";s:9:""option_id"";s:2:""13"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""41"";s:11:""custom_view"";b:0;}i:5;a:7:{s:5:""label"";s:9:""Education"";s:5:""value"";s:16:""College Graduate"";s:11:""print_value"";s:16:""College 
Graduate"";s:9:""option_id"";s:2:""12"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""34"";s:11:""custom_view"";b:0;}i:6;a:7:{s:5:""label"";s:16:""Household Income"";s:5:""value"";s:7:""30K-50K"";s:11:""print_value"";s:7:""30K-50K"";s:9:""option_id"";s:2:""11"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""25"";s:11:""custom_view"";b:0;}i:7;a:7:{s:5:""label"";s:23:""Do You Take Supplements"";s:5:""value"";s:2:""No"";s:11:""print_value"";s:2:""No"";s:9:""option_id"";s:2:""10"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""23"";s:11:""custom_view"";b:0;}i:8;a:7:{s:5:""label"";s:40:""How would you rank your typical hangover"";s:5:""value"";s:4:""Mild"";s:11:""print_value"";s:4:""Mild"";s:9:""option_id"";s:1:""9"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""19"";s:11:""custom_view"";b:0;}i:9;a:7:{s:5:""label"";s:51:""What type of establishments do you typically prefer"";s:5:""value"";s:10:""Nightclubs"";s:11:""print_value"";s:10:""Nightclubs"";s:9:""option_id"";s:1:""8"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""17"";s:11:""custom_view"";b:0;}i:10;a:7:{s:5:""label"";s:40:""How often do you usually go out per week"";s:5:""value"";s:3:""1-2"";s:11:""print_value"";s:3:""1-2"";s:9:""option_id"";s:1:""7"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""12"";s:11:""custom_view"";b:0;}i:11;a:7:{s:5:""label"";s:49:""How many drinks do you typically consume per week"";s:5:""value"";s:3:""6-8"";s:11:""print_value"";s:3:""6-8"";s:9:""option_id"";s:1:""6"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""9"";s:11:""custom_view"";b:0;}i:12;a:7:{s:5:""label"";s:53:""How would you prefer to buy our Products"";s:5:""value"";s:6:""Online"";s:11:""print_value"";s:6:""Online"";s:9:""option_id"";s:1:""5"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""5"";s:11:""custom_view"";b:0;}}}"
The Output should be the data found here:
""print_value"";s:?:""{DATA}""
Where the ? is a number, and {DATA} is the data being extracted.
So the output for example of this 1 record would be:
"2009-09-20 00:12:37","Friend / Family Member","21-24","UnMarried","Female","Student","College Graduate","30K-50K","No","Mild","Nightclubs","1-2","6-8","Online"
I am not proficient in sed, AWK, or grep, but I know it can be done using one of these tools, if not all three. Any help or nudges in the right direction would be GREATLY appreciated.
I suggest you use PHP to deserialize the structure.
However, here's a quick and dirty version of what you want using sed and tr. Certainly you can do this much much better:
cat file.csv | \
tr ",;" "\n" | \
sed -e 's/[asbi]:[0-9]*[:]*//g' -e '/^[{}]/d' -e 's/""//g' -e '/^"{/d' | \
sed -n -e '/^"/p' -e '/^print_value$/,/^option_id$/p' | \
sed -e '/^option_id/d' -e '/^print_value/d' -e 's/^"\(.*\)"$/\1/' | \
tr "\n" "," | \
sed -e 's/,\([0-9]*-[0-9]*-[0-9]*\)/\n\1/g' -e 's/,$//' | \
sed -e 's/^/"/g' -e 's/$/"/g' -e 's/,/","/g'
The explanation:
split by commas and semicolons
remove the PHP structure syntax (s:X:Y, b:X, ...) and remove lines starting with {, }, or "{
extract the section from print_value to the next option_id, also keep the date (line start with ")
remove those labels (print and option), and remove quotations around the date
concat all lines with commas
separate lines (each record starts with the date pattern), and remove the extra comma at the end
add quotations around all fields
Wow, I know it's embarrassing :)
Here is my answer:
cat TestData \
| grep -o -P "print_value\"\";.*?:\"\".*?\"\";" \
| perl -pe 's|print_value.*:\"\"(.*?)\"\";|\1|'
The first line outputs the data (stored in TestData).
The second line asks grep to print each match, from print_value to the nearest '"";', on its own line.
Notice that I use '.*?' for a non-greedy match, which requires grep's -P (Perl-compatible regex) option.
The last line uses perl to strip everything that isn't needed. Note that I use '(.*?)' to capture the needed group and '\1' to print it.
Hope this helps.
Here's a sed oneliner:
sed -nr 's/^([^,]+),(.*)$/\2#%#\1/;:a;s/""print_value"";s:[0-9]+:""([^"]+)""(.*)$/\2,"\1"/;ta;s/^.*#%#//p' <source
Basically extract the data and append it to the end of the line using a unique delimiter '#%#'.
When the loop/substitute construct fails (i.e. no more data), throw away what is left of the original line leaving the data nicely formatted.
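A bash sketch of the same extraction that builds each output row field by field, assuming every record sits on one physical line, no extracted value contains a double quote, and the header row is skipped beforehand; extract_print_values is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# For each CSV record, print the quoted date followed by every print_value entry.
# Assumes one record per physical line, values without embedded quotes, and
# that the header row has already been skipped (e.g. tail -n +2).
extract_print_values() {
  local line stamp value out
  while IFS= read -r line; do
    stamp=${line%%,*}   # first field, quotes kept: "2009-09-20 00:12:37"
    out=$stamp
    while IFS= read -r value; do
      out+=",\"$value\""
    done < <(grep -oE '""print_value"";s:[0-9]+:""[^"]*""' <<<"$line" |
             sed -E 's/^.*:""([^"]*)""$/\1/')
    printf '%s\n' "$out"
  done
}

# Hypothetical usage: tail -n +2 file.csv | extract_print_values >> extracted.csv
```

Matching the full ""print_value"";s:N:""..."" token means the "value" and "label" entries that carry the same text are ignored, so each answer appears once per record.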
