I have a text file with more than 100k rows. Below mentioned data is a sample for the text file I have. I want to use some conditions on this data and delete some rows. The text file does not have headers (ID,NAME,Code-1,code,2-code-3). I mentioned for reference. How can I achieve this with shell scripting?
Input test file:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| $$ | 32E | 3C | N | N |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
| $$ | BG0 | 0G | F | N |
Conditions:
if code-2 = 'N' and code-1 not equal to ( '3C' , '3B' , '32' , '31' , '3D' ) then ID='$$'
if code-2 ='N' and code-1 equal to ( '3C' , '3B' , '32' , '31' , '3D') then accept any ID and (accept ID='$$' only if code-3='Y')'
if code-2 != 'N' then accept (ID='$$' only if code-3='Y') and all other IDs
Output:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
It's encouraged you demonstrate own efforts when ask questions. But I do understand this question could be complicated if you are new to Bash. Here is my solution using awk. Spent 0.545s processed 137k lines on my computer (with moderate specs).
awk '{
ID=$2; NAME=$4; CODE1=$6; CODE2=$8; CODE3=$10;
if (CODE2 == "N") {
if (CODE1 ~ /(3C|3B|32|31|3D)/) {
if (ID == "$$") {
if (CODE3 == "Y") {
print;
}
}
else {
print;
}
}
else {
if (ID == "$$") {
print;
}
}
}
else {
if (ID == "$$") {
if (CODE3 == "Y") {
print;
}
}
else {
print;
}
}}' file
Note it has certain restrictions:
a) It delimits values by spaces not |. It will work with your exact input format, but won't work with input rows without additional spaces, e.g.
|$$|32E|3C|N|N|
|AB|3CH|3C|N|N|
b) For the same reason, the command will generate incorrect result, if col value has extra spaces, e.g.
| $$ | 32E FOO | 3C | N | N |
| AB | 3CH BBT | 3C | N | N |
I need to draw a square grid of size n to the console. The grid uses - for horizontal cell boundaries, | for vertical cell boundaries, and + for corners of each cell.
As an example, a size 3 grid should look like the following:
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
I was thinking of using a double for loop with outer loop iterating through rows and inner loop iterating through cols. each iteration of inner loop would be processing an individual cell. drawing | characters doesn't seem to hard but I'm not sure how I'd go about printing the - chars above and below a cell.
You could use Integer#times and String#*:
def print_grid(n)
n.times { print "+-"*n, "+\n", "| "*n, "|\n" }
print "+-"*n, "+\n"
end
print_grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
=> nil
Alternatively:
def print_grid(n)
puts n.times.map{ "+-"*n + "+\n" + "| "*n + "|\n" }.join + "+-"*n + "+\n"
end
For a width of 3, you have a separator-row like this:
+-+-+-+
This can be sees as either:
3 - connected by and surrounded by +
4 + connected by -
The latter is a bit easier to express in Ruby:
width = 3
Array.new(width + 1, '+').join('-')
#=> "+-+-+-+"
The same works for the cell-row:
Array.new(width + 1, '|').join(' ')
#=> "| | | |"
Vertically, you have 3 cell-rows connected by and surrounded by separator-rows. (that should ring a bell) Just like before, this can also be expressed as 4 separator-rows connected by cell-rows.
Let's store our separator-row and cell-row in variables: (we also have to append newlines)
width = 3
separator_row = Array.new(width + 1, '+').join('-') << "\n"
cell_row = Array.new(width + 1, '|').join(' ') << "\n"
And define the grid:
height = 3
grid = Array.new(height + 1, separator_row).join(cell_row)
#=> "+-+-+-+\n| | | |\n+-+-+-+\n| | | |\n+-+-+-+\n| | | |\n+-+-+-+\n"
put grid
Output:
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
Here are two ways to do that.
Determine each character depending on whether the row and column indices are even or odd
def grid(n)
sz = 2*n+1
Array.new(sz) do |i|
Array.new(sz) do |j|
if i.even?
j.even? ? '+' : '-'
else # i is odd
j.even? ? '|' : ' '
end
end.join
end
end
puts grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
puts grid(4)
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
Use an enumerator
def pr_grid(n)
enum = [*['+','-']*n, "+", "\n", *['|',' ']*n, "|", "\n"].cycle
((2+2*n)*(1+2*n)).times { print enum.next }
end
pr_grid(3)
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
pr_grid(4)
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
| | | | |
+-+-+-+-+
For n = 3 the steps are as follows.
a = [*['+','-']*n, "+", "\n", *['|',' ']*n, "|", "\n"]
#=> ["+", "-", "+", "-", "+", "-", "+", "\n",
# "|", " ", "|", " ", "|", " ", "|", "\n"]
enum = a.cycle
#=> #<Enumerator: ["+", "-", "+", "-", "+", "-", "+", "\n",
# "|", " ", "|", " ", "|", " ", "|", "\n"]:cycle
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "-"
enum.next #=> "+"
enum.next #=> "\n"
enum.next #=> "|"
enum.next #=> " "
and so on.
Just for fun and just as an example, here is a list of methods (to be optimized and debugged and maybe used within a class) that you can use also for filling a table.
#cell = '|'
#line = '-'
#cross = '+'
def build_row(content)
(content.zip [#cell]* content.size).flatten.prepend(#cell).join
end
def separator_from_content(content)
content.map { |e| #line * e.size + #cross }.prepend(#cross).join
end
def adjust_content_in(lines)
width = lines.flatten.max_by(&:size).size
lines.map { |line| line.map { |e| e << ' ' * (width - e.size) } }
end
def build_table(lines)
lines = adjust_content_in(lines)
separator = separator_from_content(lines.first)
mapped = lines.map { |line| build_row(line) }.zip([separator] * (lines.size) )
return mapped.flatten.prepend(separator).join("\n")
end
def empty_squared_table(n)
lines = n.times.map { n.times.map { ' ' } }
build_table(lines)
end
So, if you want to draw an empty table, can call:
n = 3
puts empty_squared_table(n)
# +-+-+-+
# | | | |
# +-+-+-+
# | | | |
# +-+-+-+
# | | | |
# +-+-+-+
Or you can fill in some text (more lines inside a cell not supported):
lines = [['so', 'you can', 'fill'], ['a table', 'given the content', '']]
puts build_table(lines)
# +-----------------+-----------------+-----------------+
# |so |you can |fill |
# +-----------------+-----------------+-----------------+
# |a table |given the content| |
# +-----------------+-----------------+-----------------+
Yesterday, I started making a chess program and, trying to save 60-odd lines, decided to try my hand at batch variable assignment. That is, assigning variables through loops. I can't seem to join spot_def_letters[i] with o_s in the correct fashion, though. ($board is scoped for later use)
My code:
spot_def_letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
8.times do
i = 0
o = 1
8.times do
o_s = o.to_s
spot_def_letters[i] + o_s = " "
o += 1
end
end
$board = """
| | | | | | |
#{a8}|#{b8}|#{c8}|#{d8}|#{e8}|#{f8}|#{g8}|#{h8}
| | | | | | |
#{a7}|#{b7}|#{c7}|#{d7}|#{e7}|#{f7}|#{g7}|#{h7}
| | | | | | |
#{a6}|#{b6}|#{c6}|#{d6}|#{e6}|#{f6}|#{g6}|#{h6}
| | | | | | |
#{a5}|#{b5}|#{c5}|#{d5}|#{e5}|#{f5}|#{g5}|#{h5}
| | | | | | |
#{a4}|#{b4}|#{c4}|#{d4}|#{e4}|#{f4}|#{g4}|#{h4}
| | | | | | |
#{a3}|#{b3}|#{c3}|#{d3}|#{e3}|#{f3}|#{g3}|#{h3}
| | | | | | |
#{a2}|#{b2}|#{c2}|#{d2}|#{e2}|#{f2}|#{g2}|#{h2}
| | | | | | |
#{a1}|#{b1}|#{c1}|#{d1}|#{e1}|#{f1}|#{g1}|#{h1}
"""
instead of indexing into string arrays and using a bunch of sneaky, bad-looking code, I decided to use a 64-value array (spots = Array.new(64, " ")) and puts those values.
I have a table I am generating in sphinx for comparing constructs in different languages. I would like to have the cells contain code blocks in each language and have it come out looking like code (at least in a monospaced font). What I have so far is:
+-----------------------------+------------------------+
| Haskell | Scala |
+=============================+========================+
| | do var1<- expn1 | | for {var1 <- expn1; |
| | var2 <- expn2 | | var2 <- expn2; |
| | expn3 | | result <- expn3 |
| | | } yield result |
+-----------------------------+------------------------+
| | do var1 <- expn1 | | for {var1 <- expn1; |
| | var2 <- expn2 | | var2 <- expn2; |
| | return expn3 | | } yield expn3 |
+-----------------------------+------------------------+
| | do var1 <- expn1 >> expn2 | | for {_ <- expn1; |
| | return expn3 | | var1 <- expn2 |
| | | } yield expn3 |
+-----------------------------+------------------------+
This, at least preserves line breaks but it comes out in the same font as the rest of the document which is a little annoying.
Is there any way to convert the cells to some better format?
Did you try using the .. code-block:: directive?
This works fine on my PC using Sphinx 1.4.1:
+----------------------------------+----------------------------------+
| Tweedledee | Tweedledum |
+----------------------------------+----------------------------------+
| .. code-block:: c | .. code-block:: c |
| :caption: foo.c | :caption: bar.c |
| | |
| extern int bar(int y); | extern int foo(int x); |
| int foo(int x) | int bar(int y) |
| { | { |
| return x > 0 ? bar(x-1)+1 | return y > 0 ? foo(x-1)*2 |
| : 0; | : 0; |
| } | } |
+----------------------------------+----------------------------------+
Here's what I'm trying to do. I'm iterating through an array of strings, each of which may contain [] multiple times. I want to use String#match as many times as necessary to process every occurrence. So I have an Array#each block, and nested within that is an infinite loop that should break only when I run out of matches for a given string.
def follow(c, dir)
case c
when "\\"
dir.rotate!
when "/"
dir.rotate!.map! {|x| -x}
end
return dir
end
def parse(ar)
ar.each_with_index { |s, i|
idx = 0
while true do
t = s.match(/\[[ ]*([a-zA-Z]+)[ ]*\]/) { |m|
idx = m.end 0
r = m[1]
s[m.begin(0)...idx] = " "*m[0].size
a = [i, idx]
dir = [0, 1]
c = ar[a[0]][a[1]]
while !c.nil? do
dir = follow c, dir
ar[a[0]][a[1]] = " "
a[0] += dir[0]; a[1] += dir[1]
c = ar[a[0]][a[1]]
if c == ">" then
ar[a[0]][a[1]+1] = r; c=nil
elsif c == "<" then
ar[a[0]][a[1]-r.size] = r; c=nil
end
end
ar[a[0]][a[1]] = " "
puts ar
}
if t == nil then break; end
end
}
parse File.new("test", "r").to_a
Contents of test:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
/--> /----> | |
| |[kale]/ | hot dogs | Brazil |
| | <----------------------\ |
| | orange |[yellow]\ | [green]/ |
| +--------+--------|-+-------------+
\-------------------/
Goal:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| green |
| orange | | |
+--------+-------- -+-------------+
Actual output of program:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| <----------------------\ |
| orange | | [green]/ |
+--------+-------- -+-------------+
(Since the array is modified in place, I figure there is no need to update the match index.) The inner loop should run again for each pair of [] I find, but instead I only get one turn per array entry. (It seems that the t=match... bit isn't the problem, but I could be wrong.) How can I fix this?