How do I delete the output for one big table inside a mysqldump with lots of tables in it?
I have a dump of a database that is 6 GB in size, but 90% of it is a single logging table, "cache_entries", that I no longer need in my backup.
How can I easily remove the part of the dump that describes this large logging table?
I found this:
http://gtowey.blogspot.de/2009/11/restore-single-table-from-mysqldump.html
Example:
grep -n 'Table structure' dump.sql
and then for example:
sed -n '40,61 p' dump.sql > t2.sql
But how can I change that for my needs?
You could use sed's 'n,n d' to remove a range of lines.
I guess in your case you do want to keep the table in question, but don't want the data?
Change the grep command to include "Dumping data for table":
grep -n 'Table structure\|Dumping data for table' dump.sql
19:-- Table structure for table `t1`
37:-- Dumping data for table `t1`
47:-- Table structure for table `t2`
66:-- Dumping data for table `t2`
76:-- Table structure for table `t3`
96:-- Dumping data for table `t3`
Now, if you don't want the data for t2, you could use:
sed '66,75 d' dump.sql > cleandump.sql
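If you would rather not look up the line numbers at all, a small awk sketch can drop just the data section of the big table. It assumes mysqldump's default comment markers and the table name cache_entries from the question; if that table happens to be the last one in the dump, the trailing statements after its data would be dropped too:
awk '/-- Dumping data for table `cache_entries`/ {skip=1}
     /-- Table structure for table/              {skip=0}
     !skip' dump.sql > cleandump.sql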
I found this bash script, which splits a dump of one database into separate files, one per table, using csplit (which splits a file into sections determined by context lines):
#!/bin/bash
####
# Split MySQL dump SQL file into one file per table
# based on http://blog.tty.nl/2011/12/28/splitting-a-database-dump
####
if [ $# -ne 1 ] ; then
    echo "USAGE: $0 DUMP_FILE"
    exit 1
fi
csplit -s -ftable "$1" "/-- Table structure for table/" {*}
mv table00 head
for FILE in table*; do
    # the table name sits between backticks (\x60) on the first line of each chunk
    NAME=`head -n1 "$FILE" | cut -d$'\x60' -f2`
    cat head "$FILE" > "$NAME.sql"
done
rm head table*
Source: gist.github.com/1608062
and a bit enhanced:
How do I split the output from mysqldump into smaller files?
Once you have a separate file for each table, you can delete the unwanted ones (e.g. cache_entries.sql) and, if needed, glue the rest back together with
cat *.sql > glued_sqldump.sql
You need to find the CREATE TABLE statement of your table and the next CREATE TABLE statement after it; say they are on lines n1 and n2.
Then you can just delete that range with sed as above: sed 'n1,n2d' dump.sql > new.sql (use n2-1 as the end address if you want to keep the line that starts the next table).
You can just grep for CREATE TABLE and note the line numbers as your prep work.
here is a demo.
ubuntu@ubuntu:~$ grep -n [34] a.txt
3:3
4:4
ubuntu@ubuntu:~$ cat a.txt
1
2
3
4
5
6
ubuntu@ubuntu:~$ grep [34] a.txt
3
4
ubuntu@ubuntu:~$ sed '3,4d' a.txt > b.txt
ubuntu@ubuntu:~$ cat b.txt
1
2
5
6
ubuntu@ubuntu:~$
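To apply the same idea to the mysqldump without counting lines by hand, a sketch along these lines might work; it assumes the default '-- Table structure for table' comments and that cache_entries is not the last table in the file:
start=$(grep -n 'Table structure for table `cache_entries`' dump.sql | head -n1 | cut -d: -f1)
end=$(awk -v s="$start" 'NR > s && /-- Table structure for table/ {print NR - 1; exit}' dump.sql)
sed "${start},${end}d" dump.sql > new.sql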
Related
I have a sample.hql file which contains the lines below.
desc db.table1;
desc db.table2;
desc db.table3;
I am trying to run it from a shell command.
I want to find out whether a particular column is present in the table or not.
For example, if col_1 is present in table1, the output should say col_1 is found in db.table1.
I am not sure how to find it.
I am executing the command below:
hive -f sample.hql | grep -q "<column_name>"
But I am not sure how to get the db and table name from each executed line.
You can make grep show context before (-B) and after (-A) a match. The command below would show you the 10 lines before each match (note that the -q flag would suppress all output, so it is dropped here). This will likely get the job done, quick and dirty.
hive -f sample.hql | grep -B 10 "<column_name>"
If you wanted to be a little more careful, you could instead loop over the file and feed the lines to hive one at a time. If it finds the column, it will echo the statement it found the column in. (The '&&' only executes the following command if the previous command was successful.)
#!/bin/bash
# read sample.hql line by line; a for loop over $(cat ...) would split lines at every space
while IFS= read -r i; do
    hive -e "$i" | grep -q "<column_name>" && echo "$i"
done < sample.hql
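If you also want the output to name the table the column was found in, as asked in the question, a sketch like the following might work. It assumes every line of sample.hql has the exact form desc db.table; and uses col_1 only as an example column name:
#!/bin/bash
col="col_1"               # example column name from the question
while IFS= read -r stmt; do
    tbl=${stmt#desc }     # strip the leading "desc "
    tbl=${tbl%;}          # strip the trailing semicolon
    hive -e "$stmt" | grep -q "$col" && echo "$col is found in $tbl"
done < sample.hql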
I have file1.txt as
data 1
data 2
data 3
data 4
and similarly I have another file2.txt as
record 1
record 2
record 3
record 4
In both cases I have the same number of records.
I can access each line from file1.txt with
while read -r line; do
  echo "$line"
done < file1.txt
I want to access the first record from both files in a single loop, and so on. The reason I want the first record from both files in one loop is that I will generate HTML code based on both pieces of data. If this is not possible, can anyone suggest another way?
Please help.
% cat data
data 1
data 2
data 3
% cat record
record 1
record 2
record 3
% paste data record
data 1 record 1
data 2 record 2
data 3 record 3
%
The paste command might help you, if what you want to do is join the files line by line. It isn't part of bash, it's a tool that has been installed on almost every unix system since about 1979. :-)
$ cat record
ONE
TWO
THREE
$ cat data
one
two
three
$ paste record data
ONE one
TWO two
THREE three
Reading from multiple files at the same time is tricky in bash. Using multiple file handles is considered by some to be "advanced".
A bash script like this might be the way to go.
#!/usr/bin/env bash
exec 3< data
while read left; do
read right <&3
echo "$left /// $right" # or whatever you need to do
done < record
This opens the file data on file handle 3 (leaving stdin, file handle 0, for the file record), and steps through each file, reading from both.
Alternately, if you want text processing in a different flavour, you could use awk (which is not bash, but usually installed anywhere that bash is installed):
awk '{getline B < "data"; print $0 "\t" B;}' record > combined.txt
This will walk through each file line by line, opening both files and reading a line from each. It has the advantage of not taking a bunch of memory just to store your files.
Alternately, a higher performance solution would be to store one file in memory in an array, then process the other file line by line:
awk 'NR==FNR{a[NR]=$0;next;} {print $0 "\t" a[FNR];}' record data
In either case, replace the print function with whatever processing you need.
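For the HTML generation mentioned in the question, a minimal sketch building on paste; the table-row markup is only an assumption about the desired output:
# paste produces tab-separated pairs like "data 1<TAB>record 1"
paste file1.txt file2.txt |
while IFS=$'\t' read -r d r; do
    printf '<tr><td>%s</td><td>%s</td></tr>\n' "$d" "$r"
done > rows.html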
exec 3< file2
while read -r line1; do read -r line2 <&3; echo "$line1 $line2"; done < file1
Output:
data 1 record 1
data 2 record 2
data 3 record 3
data 4 record 4
Similar to the advice others have given you, but I'd read from each file in the condition of a while loop:
while IFS= read -r -u3 data && IFS= read -r -u4 record; do
echo "$data => $record"
done 3< file1.txt 4< file2.txt
outputs
data 1 => record 1
data 2 => record 2
data 3 => record 3
data 4 => record 4
I have multiple (1086) files (.dat) and in each file I have 5 columns and 6384 lines.
I have a single file named "info.txt" which contains 2 columns and 6883 lines. First column gives the line numbers (to delete in .dat files) and 2nd column gives a number.
1 600
2 100
3 210
4 1200
etc...
I need to read info.txt and find every line number whose value in the 2nd column is less than 300 (so lines 2 and 3 in the above example). Then I need to feed those line numbers to sed, awk, or grep and delete those lines from each .dat file. (So in the above example I would delete the 2nd and 3rd rows of every .dat file.)
A more general form of the question would be (I suppose):
How to read numbers from a file and then use them as the row numbers to delete from multiple files.
I am using bash, but ksh help is also fine.
sed -i "$(awk '$2 < 300 { print $1 "d" }' info.txt)" *.dat
The awk script creates a simple sed script to delete the selected lines; that script is then run on all the *.dat files.
(If your sed lacks the -i option, you will need to write to a temporary file in a loop. On OSX and some *BSD you need -i "" with an empty argument.)
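With the sample info.txt above, the inner awk command expands to the two-line sed script
2d
3d
so the whole command behaves like sed -i '2d;3d' *.dat, deleting the 2nd and 3rd line of every .dat file.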
This might work for you (GNU sed):
sed -rn 's/^(\S+)\s*([1-9]|[1-9][0-9]|[12][0-9][0-9])$/\1d/p' info.txt |
sed -i -f - *.dat
This builds a script of the lines to delete from the info.txt file and then applies it to the .dat files.
N.B. the regexp is for numbers ranging from 1 to 299 as per OP request.
# create action list (read the file directly so ActionReq survives the loop;
# piping into while would run the loop in a subshell and lose the variable)
while read LineRef Index
do
  if [ "${Index}" -lt 300 ]
  then
    # branch past the final p for this line number, so the line is not printed
    ActionReq="${ActionReq};${LineRef} b
"
  fi
done < info.txt
# apply action on files
for EachFile in YourListSelectionOf.dat
do
  sed -i -n -e "${ActionReq}
p" "${EachFile}"
done
(Not tested, no Linux here.) sed is limited when it comes to your requirement of testing whether the second value is below 300; awk is more efficient for that part. I use sed in the second loop to avoid reading/writing each file once per line to delete. I think the second loop could be avoided by giving sed the list of files directly instead of going file by file, as sketched below.
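A sketch of that variant, assuming GNU sed (whose -i option treats each input file separately, so the line-number addresses apply per file):
sed -i -n -e "${ActionReq}
p" *.dat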
This should create new .dat files with a _new.dat suffix appended to the old name, but I haven't tested it:
awk 'FNR==NR { if ($2 < 300) a[$1]; next }
     !(FNR in a) { print > (FILENAME "_new.dat") }' info.txt *.dat
I am trying to join 2 CSV files based on a key in Unix.
My files are really huge, 5 GB each, and sorting them is taking too long.
I want to repeat this procedure for 50 such joins.
Can someone tell me how to join them quickly, without sorting?
Unfortunately there is no way around the sorting. But please take a look at some utility scripts I have written here: (https://github.com/stefan-schroedl/tabulator). You can use them if you keep the header of the column names as the first line in each file. There is a script 'tbljoin' that will take care of the sorting and column counting for you. For example, say you have
Employee.csv:
employee_id|employee_name|department_id
4|John|10
1|Monica|4
12|Louis|5
20|Peter|2
21|David|3
13|Barbara|6
Dept.csv:
department_id|department_name
1|HR
2|Manufacturing
3|Engineering
4|Marketing
5|Sales
6|Information technology
7|Security
Then the command tbljoin Employee.csv Dept.csv produces
employee_id|employee_name|department_id|department_name
20|Peter|2|Manufacturing
21|David|3|Engineering
1|Monica|4|Marketing
12|Louis|5|Sales
13|Barbara|6|Information technology
tabulator contains many other useful features, e.g., for simple rearranging of columns.
Here is an example with two files whose data is delimited by pipes.
Data from Employee.csv, with key employee_id, then the name and department_id, delimited by pipes.
Employee.csv
4|John|10
1|Monica|4
12|Louis|5
20|Peter|2
21|David|3
13|Barbara|6
Department file with department_id and its name, delimited by pipes.
Dept.csv
1|HR
2|Manufacturing
3|Engineering
4|Marketing
5|Sales
6|Information technology
7|Security
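Note that join requires both inputs to be sorted on the join field. A sketch of the pre-sorting step that produces the Employee_sort.csv used below (Dept.csv in this example is already ordered by department_id):
sort -t'|' -k3,3 Employee.csv > Employee_sort.csv   # sort employees on department_id (field 3)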
command:
join -t '|' -1 3 -2 1 Employee_sort.csv Dept.csv
-t '|' indicates the files are delimited by pipes
-1 3 selects the third column of file 1, i.e. department_id from Employee_sort.csv
-2 1 selects the first column of file 2, i.e. department_id from Dept.csv
Using the above command, we get the following output:
2|20|Peter|Manufacturing
3|21|David|Engineering
4|1|Monica|Marketing
5|12|Louis|Sales
6|13|Barbara|Information technology
If you want to get everything from file 2 together with the corresponding entries in file 1, you can also use the -a and -v options (-a2 additionally prints the unpairable lines from file 2; -v2 prints only those unpairable lines).
Try the following commands:
join -t '|' -1 3 -2 1 -v2 Employee_sort.csv Dept.csv
join -t '|' -1 3 -2 1 -a2 Employee_sort.csv Dept.csv
I think that you could avoid using join (and thus sorting your files), but this is not a quick solution:
In both files, replace all pipes and all double spaces with single spaces:
sed -i 's/|/ /g;s/  / /g' Employee.csv Dept.csv
Run these code lines as a bash script:
cat Employee.csv | while read a b c
do
cat Dept.csv | while read d e
do
if [ "$c" -eq "$d" ] ; then
echo -e "$a\t$b\t$c\t$e"
fi
done
done
Note that this nested looping takes a long time.
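If one of the two files fits in memory, a hash join in awk avoids both the sorting and the nested loop. A sketch for the pipe-delimited example files above, assuming the fields carry no stray whitespace around the ids:
awk -F'|' 'NR==FNR { dept[$1] = $2; next }   # first file: department_id -> department_name
           { print $0 "|" dept[$3] }         # second file: append the matching department name
' Dept.csv Employee.csv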
I've got a multi-line text file that I want to use to create an SQL statement that requires a UUID. I'm attempting to come up with a way to generate the SQL using sed or some other shell command utility.
Example input:
A
B
C
Example Output:
insert into table values ('7CC92727-D743-45E0-BE57-9FEB2B73BD18','A');
insert into table values ('94E03071-F457-48DD-86E2-AF26309A192D','B');
insert into table values ('1F17525C-9215-4FC4-A608-4FA43C0B2B14','C');
I can use the uuidgen command to generate new UUIDs, but so far I haven't found a way to use that command with sed.
Update:
Thanks to the answers I was able to come up with a sed command that worked for me on cygwin.
Expressed without quoting the SQL values for didactic purposes:
sed 's/.*/echo "insert into table values (`uuidgen | tr -d \r`,&)"/e' file.txt
With the quotes:
sed 's/.*/echo "insert into table values '\''(`uuidgen | tr -d \r`'\'','\''&'\'')"/e' file.txt
A Very Readable Bash Solution Using a While-Loop
You can read the file into a Bash loop using the default REPLY variable. For example:
while read; do
echo "insert into table values ('$(uuidgen)','$REPLY');"
done < /tmp/foo
A Less-Readable Sed Solution
sed -e 's/.*/echo "insert into table values (\\"$(uuidgen)\\",\\"&\\");"/e' \
-e "s/\"/'/g" /tmp/foo
The while-loop is significantly more readable than the sed alternative, because of the necessity to escape quotes in your replacement string. The sed solution is also rather brittle because of the fact that it is evaluating the contents of your file inside a sed expression, which may cause errors when certain meta-characters are present. And finally, this particular sed solution relies on the /e flag, which is a GNU sed extension that may not be available on your platform.
The GNU sed manual describes the flag as follows:
This command allows one to pipe input from a shell command into pattern space. If a substitution was made, the command that is found in pattern space is executed and pattern space is replaced with its output. A trailing newline is suppressed; results are undefined if the command to be executed contains a nul character. This is a GNU sed extension.
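A minimal illustration of the e flag on its own (GNU sed only): the pattern space is replaced by a shell command, which is then executed and substituted with its output.
$ echo 'placeholder' | sed 's/.*/echo hello from a shell command/e'
hello from a shell command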
Testing
Both scripts were tested against /tmp/foo, which contained the following fixture data:
A
B
C
Bash Sample Output:
insert into table values ('fe0ca930-456b-4265-810c-219eb93c4c73','A');
insert into table values ('34b088eb-3dc0-46fa-85ca-efaf3f0c0f4b','B');
insert into table values ('5d271207-99fe-4ca2-8420-3b8ca774e99b','C');
GNU sed Sample Output:
insert into table values ('4c924b78-dc70-441d-928e-638fec9f3ea1','A');
insert into table values ('29f424d4-6e33-4646-a773-cd0e96ebb874','B');
insert into table values ('39534c05-6853-4390-a6b6-4a19fad296b1','C');
Conclusion
The Bash solution seems clearer and more robust than the sed solution. However, both solutions clearly work on the fixture data provided in the original question, so you should pick whichever one works best for you on the real data.
sed 's/.*/echo "`uuidgen`,&"/e' input |
sed -r 's/(.*),(.*)/insert into table values("\1","\2");/' |
tr '"' "'"
insert into table values('c609f5ab-28ce-4853-bd67-7b6b4ca13ee3','A');
insert into table values('01ae6480-1b52-49a8-99a3-f2bba7ec3064','B');
insert into table values('a41122e8-5e4f-4acc-b62a-bc4ad629677e','C');
This might work for you:
echo -e 'a\nb\nc' |
sed 's/.*/uuidgen;echo "&"/' |
sh |
sed 'N;s/^\(.*\)\n\(.*\)/insert into table values('\''\1'\'',\2);/'
insert into table values('c9939dfe-5a74-465a-b538-66aeba774b6b',a);
insert into table values('da6684f2-3426-4561-b41d-7b507d2d79ee',b);
insert into table values('23c72ef5-2a50-4a09-b964-83eea3c54e83',c);
or using GNU sed:
echo -e 'a\nb\nc' |
sed 'h;s/.*/uuidgen/e;G;s/^\(.*\)\n\(.*\)/insert into table values('\''\1'\'',\2);/'
insert into table values('ac1a130c-50a3-43ce-8d41-987ca0d942b7',a);
insert into table values('59426b2f-cf03-4233-bcc2-3ce05c47bece',b);
insert into table values('fdec75d6-313e-48c4-adfb-335721a0f6e7',c);