Convert Redis Mass Insertion Protocol Format independent of OS - bash

I am trying to perform Redis mass insertion using the command cat data.txt | redis-cli --pipe as mentioned in https://redis.io/topics/mass-insert.
The data format on macOS has to be converted so that mass insertion could be performed with cat ${FILE} | perl -i -p -e 's|[\r\n]+|\r\n|g' | redis-cli --pipe.
However, the above command does not work in a Linux environment (or in a Docker container built from an Alpine-based image). Instead, the following command has to be used: cat ${FILE} | sed 's/\r*$/\r/' | redis-cli --pipe.
Is there a command that would work in both environments?
EDIT: Attached the following:
Redis Mass Insertion script on Alpine Linux: https://gist.github.com/francjohny/f2b13b4cfc147e07e52824ec88ba3781
Redis Mass Insertion script on Mac OS: https://gist.github.com/francjohny/b57756a1e0124dd562959ca5ece2a32b
Redis Protocol Format data file: https://gist.github.com/francjohny/0c21f32d9902809b215f4e92f5e6a9f1
➜ head ouput.rpf| xxd - Mac OS : https://gist.github.com/francjohny/e1a646ab44e7edd7374d28e9ca400711
➜ head ouput.rpf| xxd - Alpine Linux: https://gist.github.com/francjohny/252904928ded4c045448d12b205228df

Updated Answer
From the data you have added, it seems you just have linefeeds separating your lines, whereas Redis requires carriage return followed by linefeed. So basically, you want the equivalent of the unix2dos program, which is not included in macOS. However, macOS does include Perl, so you should be able to use:
perl -pe 's/\n/\r\n/' data.rpf | redis-cli --pipe
It works fine on my Mac.
Original Answer
You appear to have mixed line endings in your various environments. I would imagine this Perl would replace any number of carriage returns and line feeds in any mixture with a single carriage return and linefeed like Redis requires:
perl -pe 's|[\r\n]+|\r\n|' data.txt | redis-cli ...
If not, please answer my question in the comments.
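For a single command that behaves the same on macOS and Alpine, one option is awk, which as far as I know ships with both (BSD awk on macOS, BusyBox awk on Alpine). The sketch below is an alternative to the Perl and sed one-liners, not something taken from the answer above: it strips any trailing carriage return and then always writes CRLF, so running it on already-converted input does no harm. The function name to_crlf is made up for illustration.

```shell
# Normalize line endings to CRLF, whether the input used LF or CRLF.
# to_crlf is a hypothetical helper name, not part of redis-cli.
to_crlf() {
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}

# Demo: one LF-terminated and one CRLF-terminated line; od -c shows
# that both end in \r \n after the conversion.
printf 'PING\nPING\r\n' | to_crlf | od -c
```

With a real data file the call would look like to_crlf data.rpf | redis-cli --pipe. Note that this assumes no value in the file contains an embedded newline, because changing bytes inside a length-prefixed value would invalidate its declared length.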

Related

Sed on macOS produces extra file suffixed with -e [duplicate]

This question was closed as a duplicate of "sed command with -i option failing on Mac, but works on Linux" and "sed in-place flag that works both on Mac (BSD) and Linux".
I'm trying to get sed to replace a line in a file with the contents of another file. Got this to work, however the in-place replacement somehow produces an extra file with -e suffixed.
This only seems to happen on macOS (High Sierra); it does not happen on Linux (Alpine), where I tried to reproduce it in a Docker container.
My commands that reproduce this in sequence:
$ echo 'someline' > target_file.txt
$ echo 'replacementcontent' > replacement.txt
$ sed -Ei -e "\#^someline\$#{
r replacement.txt
d
}" target_file.txt
$ cat target_file.txt
replacementcontent
$ ls
replacement.txt target_file.txt target_file.txt-e
The replacement worked as intended but in a Linux environment the target_file.txt-e would not be there.
I know there are differences between the macOS and Linux versions of sed, but this behavior seems random to me; I'm likely just not understanding something.
Why does this happen, and can the command be written in an agnostic way (so that it works the same on both macOS and Linux)?
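The stray file comes from how the two sed implementations parse -i. BSD sed (macOS) requires an argument to -i, the backup suffix, so in -Ei -e the following -e is consumed as that suffix and a backup named target_file.txt-e is written. GNU sed only accepts a suffix attached directly to -i, so there -e is parsed as the usual script option. A common workaround, sketched here on the assumption that a short-lived backup file is acceptable, is to attach an explicit suffix (which both implementations accept) and delete the backup afterwards. The .bak suffix and the scratch directory are arbitrary choices for the demo.

```shell
# Work in a scratch directory so the demo does not touch real files.
cd "$(mktemp -d)" || exit 1
echo 'someline' > target_file.txt
echo 'replacementcontent' > replacement.txt

# -i.bak (suffix attached to -i) is parsed the same way by GNU and BSD sed.
sed -E -i.bak -e '\#^someline$#{
r replacement.txt
d
}' target_file.txt

# Remove the backup once the edit has been made.
rm -f target_file.txt.bak

cat target_file.txt
ls
```

If no backup file may exist even briefly, writing the sed output to a temporary file and moving it over the original is another implementation-neutral option.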

Cygwin Command Substitution not Working

I am trying to troubleshoot a problem I am seeing when running bash commands in Cygwin.
I am trying to assign the Clang version from a text file to a variable. If I run this in Cygwin:
$ (sed -n 1p "$CLANGC2_VERSION_FILE" | sed 's/\s//g')
I get this output (which is exactly what I want):
14.10.25903
Now, if I try and assign this to a variable it doesn't work. Here is what I am trying:
$ CLANGC2_VERSION=$(sed -n 1p "$CLANGC2_VERSION_FILE" | sed 's/\s//g')
but when I inspect or print the variable, it is empty.
What am I doing wrong?
Turns out that there is a known 'Big List of Dodgy Apps' (BLODA) which can interfere with Cygwin and bash.
The discussion I found is here: https://cygwin.com/ml/cygwin/2017-07/msg00197.html
The BLODA list is here: https://cygwin.com/faq/faq.html#faq.using.bloda
Turns out my AntiVirus is on the list.
I've removed the AV and now the commands work. There must be some low-level interference from the AV that causes the command substitution to fail.
You can use backticks to get the desired results.
CLANGC2_VERSION=`(sed -n 1p "$CLANGC2_VERSION_FILE" | sed 's/\s//g')`

command in terminal and in script have different action

I have a file and I need to use sed to process it.
Here is my command: sed -i -e '/.*tour\.html\|.*Thumb[^\/]*\.jpg/!d'.
Now if I execute this command from the terminal, for example, sed -i -e '/.*tour\.html\|.*Thumb[^\/]*\.jpg/!d' myfile.txt, it works well. But if I write a bash script with the same command, it will delete all lines.
#!/bin/bash
sed -i -e '/.*tour\.html\|.*Thumb[^\/]*\.jpg/!d' "$1"
This script will delete all lines in file.
My PC is Mac OS.
As far as I understand, the output of sed --help and sed --version for both cases showed that you actually have two different sed versions, one reacting to each of the two ways of executing your code.
Sed is a little inconsistent in its syntax, especially when it comes to command-line options.
For example, I know of an important difference for the -i switch: some versions (such as the one shipped with macOS) require a backup file extension to be given explicitly, while others accept one optionally. This difference could explain why something involving -i without a backup extension works in one case and fails in another.
Anishsane suggested that different PATH variables could in turn be part of the mechanism that causes two different sed versions to be executed.
I invite OP to edit the output of --help and --version (where possible; there should be a way to get the version out of both sed instances) into this answer. I do not actually have those details, which makes this answer seem a little like guessing.
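One way to test the two-versions theory is to list every sed on PATH, both in the terminal and from inside the script. A likely culprit, though this is an assumption rather than something confirmed in the thread, is the \| alternation: it is a GNU extension to basic regular expressions, and a sed that does not support it treats it as a literal | character, so typically no line matches and !d deletes everything. The sketch below (list_seds and keep_tour_and_thumbs are hypothetical helper names) shows the lookup plus a variant of the filter that uses -E with a plain |, which GNU and BSD sed both accept.

```shell
# Print every executable named sed on PATH, in lookup order.
list_seds() {
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    if [ -x "$dir/sed" ]; then
      printf '%s\n' "$dir/sed"
    fi
  done
  IFS=$old_ifs
}
list_seds

# Same filter as in the question, written with -E so that | means
# alternation; the \#...# delimiters avoid having to escape the slash.
keep_tour_and_thumbs() {
  sed -E '\#tour\.html|Thumb[^/]*\.jpg#!d' "$@"
}

printf '%s\n' 'a/tour.html' 'b/index.html' 'c/Thumb01.jpg' | keep_tour_and_thumbs
```

For in-place use from a script, the same expression can be combined with an attached backup suffix, for example sed -E -i.bak '...' "$1", which avoids the -i difference described above.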

How to create a file using a variable as filename?

I'm testing mobile Android devices and I would like to redirect the device log on a file whose name indicates both the date and time of my test, and the device model that is being tested.
For the first issue, I have already resolved with
now=$(date +"%b_%d_%Y_%k_%M");adb logcat -c;adb logcat|tee $now
So:
$ echo $now
Jan_03_2012_13_09
and the tee command creates a file with this filename.
As for the device model I have written two bash lines that obtain it from adb shell, namely
device=$(adb shell cat /system/build.prop | grep "^ro.product.device=")
deviceshortname=$(echo $device | sed 's/ro.product.device=//g')
(not optimal as I am not very good in bash programming... :) but I manage to get
$ echo $deviceshortname
LT15i
My problem is how to combine $now and $deviceshortname to obtain a filename such as:
LT15i_Jan_03_2012_13_19
I tried to set another variable:
filename=($(echo $deviceshortname"_"$now))
and got:
$ echo $filename
LT15i_Jan_03_2012_13_19
but if I try redirecting the log:
$ adb logcat | tee $filename
I obtain such file:
-rw-r--r--+ 1 ele None 293 Jan 3 13:21 ?[01;31m?[K?[m?[KLT15i_Jan_03_2012_13_19
I don't know why these strange characters and what I'm doing wrong.
Something is adding color to your output. It might be grep(1), it might be adb, or it might be baked into the /system/build.prop resource that you're reading.
If you're lucky, it is being added by grep(1) -- because that is supremely easy to disable with --color=no:
device=$(adb shell cat /system/build.prop | grep --color=no "^ro.product.device=")
deviceshortname=$(echo $device | sed 's/ro.product.device=//g')
If the colors are being added by adb, then perhaps it has a command line option that asks it to avoid colorizing the output.
If the colors are hard-coded into the /system/build.prop resource in some way, then you'll need some little tool that filters out the color codes. I don't have one handy (and it's bedtime) but you can probably build one starting with tr(1) to delete \033 ASCII ESC characters.
Looks like an ANSI sequence used by adb to color the output.
I'm not sure if I'm missing something, but this works for me
p1=foo
p2=$(date +%d_%m_%Y)
cat sample_file.txt | tee $p1"_"$p2
Just type: echo ${deviceshortname}_${now} and it will do the trick.
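Putting the answers together: the ?[01;31m?[K characters in the filename are ANSI escape sequences, and the name itself is just the two variables joined with an underscore. The sketch below assumes the sequences have the usual ESC [ parameters letter shape and that adb may also emit carriage returns. strip_ansi is a hypothetical helper, and the device line is hard-coded so the example runs without a device attached.

```shell
# Remove ANSI escape sequences (ESC [ params letter) and carriage returns.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*[A-Za-z]//g" | tr -d '\r'
}

# Stand-in for: adb shell cat /system/build.prop | grep "^ro.product.device="
device=$(printf '\033[01;31m\033[Kro.product.device=\033[m\033[KLT15i\r\n' | strip_ansi)

# Drop the property name with a parameter expansion instead of a second sed.
deviceshortname=${device#ro.product.device=}

# %H is the zero-padded hour; %k pads with a space, which would end up
# inside the filename for hours before 10.
now=$(date +"%b_%d_%Y_%H_%M")
filename="${deviceshortname}_${now}"
echo "$filename"
```

The log can then be captured with adb logcat | tee "$filename"; quoting the variable keeps the name intact even if it contains unexpected characters.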

Case-insensitive search and replace with sed

I'm trying to use SED to extract text from a log file. I can do a search-and-replace without too much trouble:
sed 's/foo/bar/' mylog.txt
However, I want to make the search case-insensitive. From what I've googled, it looks like appending i to the end of the command should work:
sed 's/foo/bar/i' mylog.txt
However, this gives me an error message:
sed: 1: "s/foo/bar/i": bad flag in substitute command: 'i'
What's going wrong here, and how do I fix it?
Update: Starting with macOS Big Sur (11.0), sed now does support the I flag for case-insensitive matching, so the command in the question should now work (BSD sed doesn't report its version, but you can go by the date at the bottom of the man page, which should be March 27, 2017 or more recent); a simple example:
# BSD sed on macOS Big Sur and above (and GNU sed, the default on Linux)
$ sed 's/ö/#/I' <<<'FÖO'
F#O # `I` matched the uppercase Ö correctly against its lowercase counterpart
Note: I (uppercase) is the documented form of the flag, but i works as well.
Similarly, starting with macOS Big Sur (11.0) awk now is locale-aware (awk --version should report 20200816 or more recent):
# BSD awk on macOS Big Sur and above (and GNU awk, the default on Linux)
$ awk 'tolower($0)' <<<'FÖO'
föo # non-ASCII character Ö was properly lowercased
The following applies to macOS up to Catalina (10.15):
To be clear: On macOS, sed - which is the BSD implementation - does NOT support case-insensitive matching - hard to believe, but true. The formerly accepted answer, which itself shows a GNU sed command, gained that status because of the perl-based solution mentioned in the comments.
To make that Perl solution work with foreign characters as well, via UTF-8, use something like:
perl -C -Mutf8 -pe 's/öœ/oo/i' <<< "FÖŒ" # -> "Foo"
-C turns on UTF-8 support for streams and files, assuming the current locale is UTF-8-based.
-Mutf8 tells Perl to interpret the source code as UTF-8 (in this case, the string passed to -pe) - this is the shorter equivalent of the more verbose -e 'use utf8;'. (Thanks, Mark Reed.)
(Note that using awk is not an option either, as awk on macOS (i.e., BWK awk and BSD awk) appears to be completely unaware of locales altogether - its tolower() and toupper() functions ignore foreign characters (and sub() / gsub() don't have case-insensitivity flags to begin with).)
A note on the relationship of sed and awk to the POSIX standard:
BSD sed and awk limit their functionality mostly to what the POSIX sed and
POSIX awk specs mandate, whereas their GNU counterparts implement many more extensions.
Editor's note: This solution doesn't work on macOS (out of the box), because it only applies to GNU sed, whereas macOS comes with BSD sed.
Capitalize the 'I'.
sed 's/foo/bar/I' file
Another work-around for sed on Mac OS X is to install gsed from MacPorts or Homebrew and then create the alias sed='gsed'.
If you are doing pattern matching first, e.g.,
/pattern/s/xx/yy/g
then you want to put the I after the pattern:
/pattern/Is/xx/yy/g
Example:
echo Fred | sed '/fred/Is//willma/g'
returns willma; without the I, it returns the string untouched (Fred).
The sed FAQ addresses the closely related case-insensitive search. It points out that a) many versions of sed support a flag for it and b) it's awkward to do in sed, you should rather use awk or Perl.
But to do it in POSIX sed, they suggest three options (adapted for substitution here):
Convert to uppercase and store original line in hold space; this won't work for substitutions, though, as the original content will be restored before printing, so it's only good for insert or adding lines based on a case-insensitive match.
Maybe the possibilities are limited to FOO, Foo and foo. These can be covered by
s/FOO/bar/;s/[Ff]oo/bar/
To search for all possible matches, one can use bracket expressions for each character:
s/[Ff][Oo][Oo]/bar/
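The bracket-expression option gets tedious for longer words, so the pattern can be generated. The following is a minimal sketch, assuming ASCII letters only and a search term without regex metacharacters; ci_pattern is a hypothetical helper, not a sed feature.

```shell
# Turn a word into a case-insensitive bracket pattern, e.g. foo -> [fF][oO][oO].
# Non-letters are passed through unchanged.
ci_pattern() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c ~ /[A-Za-z]/)
        out = out "[" tolower(c) toupper(c) "]"
      else
        out = out c
    }
    print out
  }'
}

# Uses only POSIX sed features, so no I flag is needed.
printf 'FOO Foo foo\n' | sed "s/$(ci_pattern foo)/bar/g"
```

Because the generated pattern is plain POSIX syntax, it should behave the same in BSD, GNU, and BusyBox sed.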
The Mac version of sed seems a bit limited. One way to work around this is to use a linux container (via Docker) which has a useable version of sed:
cat your_file.txt | docker run -i busybox /bin/sed -r 's/[0-9]{4}/****/Ig'
Use following to replace all occurrences:
sed 's/foo/bar/gI' mylog.txt
I had a similar need, and came up with this:
Use this command to simply find all the files:
grep -i -l -r foo ./*
Use this one to exclude this_shell.sh (in case you put the command in a script called this_shell.sh), tee the output to the console to see what happened, and then run sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding sed the grep result means only the files containing the target text are touched (which likely improves performance as well).
Be sure to back up your files and test before using this. It may not work in some environments for files with embedded spaces. (?)
Following should be fine:
sed -i 's/foo/bar/gi' mylog.txt

Resources