How to get a history of SQL queries made in CockroachDB? - cockroachdb

I'd like to search, perhaps using grep, through the history of all SQL queries made in a CockroachDB cluster. Is that possible?
I hope there is a file that stores every single SQL query I have ever issued to my CockroachDB cluster.

Yes! The file ~/.cockroachsql_history stores all of the SQL queries you have ever issued through the SQL shell. However, that file encodes all "special characters" (non-alphanumeric, non-underscore characters) as their octal ASCII codes.
You can, however, decode those encodings back to the original characters using the following command:
perl -plw -e 's/\\040/ /g; s/\\011/\t/g; s/\\012/\n/g; s/\\134/\\/g' ~/.cockroachsql_history > ~/.cockroachsql_history_replaced
where I simply replace all the special-character encodings I found in my ~/.cockroachsql_history. The \134 (backslash) substitution must come last; otherwise a decoded backslash could merge with the digits following it and be decoded a second time. If the command misses any encodings in your file, simply add one more clause to the one-line perl program (the pattern is s/old_string/new_string/g;).
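Rather than enumerating every escape by hand, you can also decode all \NNN octal escapes in one generic pass. Here is a minimal Python sketch of that idea (decode_history is a made-up name; it assumes the file uses only three-digit \NNN octal escapes):

import os
import re

def decode_history(path):
    # Read the raw history file; it is plain ASCII with \NNN octal escapes.
    text = open(os.path.expanduser(path), encoding="ascii").read()
    # Replace each \NNN with the character whose octal code is NNN.
    # A single regex pass handles every escape at once, so the \134
    # ordering problem mentioned above cannot arise.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)

print(decode_history("~/.cockroachsql_history"), end="")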

Related

How to have a Multi Character Delimiter in Informatica Cloud?

I have a problem I need help solving. The business I am working for uses Informatica Cloud to do a lot of their ETL into AWS and other services.
We have been given a flat file by the business where the field delimiter is "~|". To the best of my knowledge, Informatica currently only accepts a single-character delimiter.
Does anyone know how to overcome this?
Informatica cannot read composite delimiters.
First, you could feed each line as one single long string into an Expression transformation. In this case the delimiter character should be set to \037; I haven't seen this character (the ASCII Unit Separator) in use since at least 1982. Then use repeated invocations of InStr() within the EXP to identify the positions of the ~| sequences and split each line into fields using SubStr().
Second (easier in the mapping, more work with the session), you could feed the file into some utility which replaces each ~| sequence with the character ASCII 31 (the Unit Separator mentioned above); the session has to be set up so that it reads the output from this utility (input file type = Command instead of File). The source definition should then contain \037 as the field delimiter instead of any pipe character. A sketch of such a utility follows.
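As a sketch of that second approach in Python (the file names input.txt and output.txt are placeholders):

# Replace each two-character "~|" delimiter with the ASCII Unit
# Separator (0x1F) so the downstream session can declare a
# single-character delimiter (\037) on the utility's output.
with open("input.txt", "r", encoding="utf-8") as src, \
     open("output.txt", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace("~|", "\x1f"))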

IBM datastage 8.7 script to remove special characters oracle 11g

I am wondering if I need to implement a solution at the DataStage end and/or the Oracle 11g DB end to fix the issue of non-ASCII characters appearing in the descriptions. Because the databases use different character sets, converting from one set to the other occasionally turns a single non-ASCII character into multiple characters, causing a truncation error.
Sample descriptions:
": ¿What date did this happen?¿ xxxxx: ¿Wednesday, so it would have been ..... "
":had to go to the doctor yesterday.¿ xxxxxx: ¿I¿ll just get you to state your"
Ideally (for the longer term) I would like to replace each special character with its corresponding ASCII character, i.e. extended quotes should become regular quotes (see the sketch after the script below).
For the short term I have a sample script which simply replaces all special characters with a space:
UPDATE rcmain.rc_description
SET desc_description = REPLACE(desc_description, CHR(191), ' ')
WHERE desc_description LIKE '%' || CHR(191) || '%'
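That longer-term mapping is naturally expressed as a translation table. A minimal Python sketch (the mapping below is illustrative, not exhaustive; extend it with whatever characters your data actually contains):

# Map extended characters to sensible ASCII equivalents.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u00bf": '"',   # inverted question mark, as seen in the samples above
}

def normalize(text):
    return text.translate(str.maketrans(REPLACEMENTS))

print(normalize("\u00bfWhat date did this happen?\u00bf"))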
Using the SQL script above, I was to create a DataStage "ctlCleanseSourceFile" job that calls a UNIX shell script "Replace_extended_characters.sh" to strip the special characters out of the XML files.
This could be done once the XML files are merged into a single file.
Modify one of the attached sample files to create the shell script. It should: use the sed statement in the samples; store a backup of the original file before the replacement, saved to the same archive directory as the other files for the run; and, if possible, report on the characters replaced.
Is there a better way to handle this situation than writing spaces into the description field?
"Apologies for the long post"
Apologies, this is a quick response:
What are your NLS settings?
I had an issue like yours reading XML, and changing the source stage NLS to Windows-1252 resolved the 'invalid character' issue I was having (the source of the files was a Windows server, but DataStage was on Unix and therefore using UTF-8).

How to clean a csv file where fields contains the csv separator and delimiter

I'm currently struggling to clean automatically generated CSV files whose fields contain the CSV separator and the field delimiter, using sed or awk or via a script.
The source software has no settings to play with to improve the situation.
Format of the csv:
"111111";"text";"";"text with ; and " sometimes "; or ;" multiple times";"user";
Fortunately, the csv is "well" formatted; the exporting software just doesn't escape or replace "forbidden" chars in the fields.
In the last few days I tried to improve my knowledge of regular expressions and to find expressions to clean the files, but I failed.
What I managed to do so far:
RegEx to find the fields (I wanted to find the fields and perform a replace inside them, but I didn't find a way to do it):
(?:";"|^")(.*?)(?=";"|";\n)
RegEx to find a semicolon; it does not work if the semicolon is the last char of the field, and it only finds one per field:
(?:^"|";")(?:.*?)(;)(?:[^"\n].*?)(?=";"|";\n)
RegEx to find the double quotes; it seems to pick only the first double quote of the line in online regex testers:
(?:^"|";")(?:.*?)[^;](")(?:[^;].*?)(?=";"|";\n)
I thought of adding a space between each character in the fields, then searching for lone semicolons and double quotes, and removing the single spaces afterwards, but I don't know whether that's even possible, and it seems like a poor solution anyway.
Any standard library should be able to handle it if there is no explicit error in the CSV itself. This is why we have quote characters and escape characters.
When you create a CSV yourself, you may forget to handle such cases and leave your final output file in this state. AWK is not a CSV reader but simply a text-processing utility.
This is what your row should look like instead:
"111111";"text";"";"text with \; and \" sometimes \"; or ;\" multiple times";"user";
So if you can still re-fetch the data, find a way to export the CSV either through the database's own functionality or through a CSV library for the language you work with.
In Python, this would look like this:
mywriter = csv.writer(csvfile, delimiter=';', quotechar='"', escapechar="\\")
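A fuller, runnable version of that call, fed with the sample row from above (cleaned.csv is a made-up name; doublequote=False makes the writer use the escape character instead of doubling quotes, and with every field quoted the delimiter itself needs no escaping):

import csv

rows = [["111111", "text", "",
         'text with ; and " sometimes "; or ;" multiple times', "user"]]

with open("cleaned.csv", "w", newline="") as csvfile:
    mywriter = csv.writer(csvfile, delimiter=';', quotechar='"',
                          doublequote=False, escapechar='\\',
                          quoting=csv.QUOTE_ALL)
    mywriter.writerows(rows)

# A reader configured the same way round-trips the file cleanly.
with open("cleaned.csv", newline="") as csvfile:
    for row in csv.reader(csvfile, delimiter=';', quotechar='"',
                          doublequote=False, escapechar='\\'):
        print(row)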
But if you can't create the CSV again, the only hope is that you can expect some pattern within the fields, as in this question: parse a csv file that contains commas in the fields with awk
But this is rarely true in textual data, especially comments or posts on a webpage. Another idea in such situations would be to use '\t' as the separator.
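That said, the sample rows above do follow a pattern: fields are anchored by the exact three-character sequence ";". If that holds across your data, a best-effort cleaner is possible. A Python sketch (it escapes only backslashes and quotes, since semicolons need no escaping once every field is properly quoted; it will mis-split any field that happens to contain ";" verbatim):

def clean_line(line):
    line = line.rstrip("\n")
    # Each line starts with a quote and ends with quote-semicolon.
    assert line.startswith('"') and line.endswith('";')
    body = line[1:-2]
    # The exact sequence ";" is assumed to occur only between fields.
    fields = body.split('";"')
    # Escape backslashes first, then the quotes inside each field.
    fields = [f.replace('\\', '\\\\').replace('"', '\\"') for f in fields]
    return '"' + '";"'.join(fields) + '";'

sample = ('"111111";"text";"";"text with ; and " sometimes "; '
          'or ;" multiple times";"user";')
print(clean_line(sample))
# -> "111111";"text";"";"text with ; and \" sometimes \"; or ;\" multiple times";"user";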

Windows SED command - simple search and replace without regex

How should I use the 'sed' command to find and replace given words or sentences without treating any characters in them as special?
In other words, how do I treat the find and replace parameters as plain text?
In the following example, where I want to replace 'sagar' with '+sagar', I have to give the following command:
sed "s/sagar/\\+sagar/g"
I know that \ should be escaped with another \, but I can't do this manipulation by hand, as there are so many special characters and combinations of them.
I am going to take the find and replace parameters as input from a user screen.
I want to execute sed from C# code.
Simply put, I do not want sed's regular-expression matching to be used; I want my command text to be treated as plain text.
Is this possible? If so, how can I do it?
While there may be sed versions that have an option like --noregex_matching, most of them don't. Because you're getting the search and replace input by prompting a user, your best bet is to scan the user input strings for regex special characters and escape them as appropriate.
Also, will your users expect, for example, their all-caps search input to correctly match and replace a lower- or mixed-case string? In that case, recall that you could rewrite their target string as [Ss][Aa][Gg][Aa][Rr] and replace with +Sagar.
Note that there are far fewer regex characters used on the replacement side: '&' means "the complete string that was matched", and then there are the numbered replacement groups, like \1, \2, .... Given users that have no knowledge or expectation that they can use such characters, the likelihood of them using \1 in their required substitution is pretty low. More likely they may have a valid use for &, so you'll have to scan (at least) for that and replace it with \&. In a basic sed, that's about it. (There may be others in the latest GNU seds, or in some of the seds that have their genesis as PC tools.)
For a replacement string, you shouldn't have to escape the + char at all. Probably yes for \. Again, you can scan your user's "naive" input and add escape chars as needed; a sketch of this follows below.
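A Python sketch of that escaping logic (the character sets below target a basic sed; adjust them for your sed variant):

# Characters that are special on the search (pattern) side of basic sed.
SEARCH_SPECIALS = set('\\/.*[]^$')
# Characters that are special on the replacement side.
REPLACE_SPECIALS = set('\\/&')

def escape_search(s):
    return ''.join('\\' + c if c in SEARCH_SPECIALS else c for c in s)

def escape_replacement(s):
    return ''.join('\\' + c if c in REPLACE_SPECIALS else c for c in s)

def build_sed_expr(find, replace):
    return 's/%s/%s/g' % (escape_search(find), escape_replacement(replace))

print(build_sed_expr('sagar', '+sagar'))  # -> s/sagar/+sagar/g  (+ needs no escape)
print(build_sed_expr('a.b*', 'x&y'))      # -> s/a\.b\*/x\&y/g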
Finally, if you're doing this for a "package" that will be distributed and you'll be relying on the users' version of sed, beware that there are many versions of sed floating around: some have their roots in Unix/Linux, and others, particularly super-sed, (I'm pretty sure) got started as PC standalones and have a very different feature set.
IHTH.

How to enumerate unique characters in a UTF-8 document? With sed?

I'm converting some Polish<->English dictionaries from RTF to HTML. The Polish special characters are coming out fine. But IPA (International Phonetic Alphabet) glyphs get changed to funny things, depending on what program I use for conversion. For example, /ˈbiːrɪ/ comes out as /ÈbiùrI/ or /∪βιρΙ/.
I'd like to correct these documents with a search & replace, but I want to make sure I don't miss any characters and don't want to manually pore over dictionary entries. I'd like to output a list of all unique non-ASCII characters in a document.
I found this thread:
Find Unique Characters in a File
... and I tried the following two proposals:
sed -e "s/./\0\n/g" inputfile | sort -u
sed -e "s/(.)/\1\n/g" inputfile | sort -u
They both work nicely, and seem to both generate the same output. My problem is that they only output standard ASCII characters, and what I'm looking for is exactly the opposite.
The sed tool looks awesome, but I don't have time to learn it right now (though I intend to later). I'm hoping the solution will be clear to someone who's already mastered this tool, and they can save me a lot of time. [-:
Thanks in advance!
This is not a sed solution but a Python (2) solution. It reads the contents of a file, decodes it as UTF-8, turns it into a set (thus throwing away duplicates), throws away the ASCII characters (0-127), sorts what remains, and then joins it back together with a newline between each character:
'\n'.join(sorted(set(unicode(open(inputfile).read(), 'utf-8')) - set(chr(i) for i in xrange(128))))
As something you'd run from the command line if you felt so inclined,
python -c "print '\n'.join(sorted(set(unicode(open('inputfile').read(), 'utf-8')) - set(chr(i) for i in xrange(128))))"
(You could also use ''.join instead of '\n'.join, which would list the characters without newlines in between.)
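On Python 3, where strings are Unicode natively, the same idea becomes a little shorter (a sketch; 'inputfile' is a placeholder):

# Read the file as UTF-8, drop the ASCII range, de-duplicate and sort.
with open("inputfile", encoding="utf-8") as f:
    unique = sorted(set(f.read()) - set(chr(i) for i in range(128)))
print('\n'.join(unique))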
