I am trying to read data (multiple lines of key:value pairs) from a file that I wrote line by line in a Jenkinsfile.
However, when I iterate over the content, it is read character by character instead of line by line.
Example:
echo "1234:34" >> dataList.txt
echo "2341:43" >> dataList.txt
echo "3412:54" >> dataList.txt
echo "4123:38" >> dataList.txt
When I try to read it line by line using these commands:
def buildData = readFile(file: "dataList.txt")
println buildData
buildData.each { line ->
println line
//def (oldBuildNumber, oldJobId) =line.tokenize(':')
//println oldBuildNumber oldJobId
}
}
the output is displayed as:
1
2
3
4
:
3
4
2
3
4
1
:
4
3
...
Any input on this will be very useful.
From the readFile documentation:
readFile: Read file from workspace
Reads a file from a relative path (with root in current directory, usually workspace) and returns its content as a plain string.
This means the returned value, buildData in your case, is actually a string. When you iterate over it with each, you are iterating over its characters, which is why one character is printed per iteration.
What you actually want is to iterate over the lines. To do that, split the string on the newline separator (\n), which gives you an array of all the lines that you can then iterate over.
Something like the following:
def buildData = readFile(file: "dataList.txt")
println buildData
// split the content into lines and go over each line
buildData.split("\n").each { line ->
    println line
}
// or by using the default iterator parameter - it
buildData.split("\n").each {
    println it
}
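The same pitfall exists in other languages. As an illustration only (Python rather than Jenkins Groovy, with the file content inlined as a hypothetical string), the sketch below shows that iterating over a string yields characters, while splitting it into lines first lets you parse each key:value pair:

```python
# Hypothetical file content, mirroring dataList.txt from the question.
build_data = "1234:34\n2341:43\n3412:54\n4123:38\n"

# Iterating over the string itself yields single characters.
first_chars = list(build_data)[:4]

# Splitting into lines first yields one "key:value" entry per item.
pairs = {}
for line in build_data.splitlines():
    if not line.strip():
        continue  # skip blank lines
    old_build_number, old_job_id = line.split(":", 1)
    pairs[old_build_number] = old_job_id

print(first_chars)
print(pairs)
```

The variable names old_build_number and old_job_id simply mirror the commented-out lines in the question.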
I would like to compare two log files, generated before and after an implementation, to see whether it has impacted anything. However, the order of the log entries is not always the same. Because the log file also has multiple indented lines, sorting it sorts every line individually. I would like to keep each child line with its parent. The indentation uses spaces, not tabs.
Any help would be greatly appreciated. I am fine with either a Windows or a Linux solution.
Example of the file:
#This is a sample code
Parent1 to be verified
Child1 to be verified
Child2 to be verified
Child21 to be verified
Child23 to be verified
Child22 to be verified
Child221 to be verified
Child4 to be verified
Child5 to be verified
Child53 to be verified
Child52 to be verified
Child522 to be verified
Child521 to be verified
Child3 to be verified
I am posting another answer here that sorts the file hierarchically, using Python.
The idea is to attach the parents to the children to make sure that the children under the same parent are sorted together.
See the python script below:
"""Attach parent to children in an indentation-structured text"""
from typing import Tuple, List
import sys

# A unique separator to separate the parent and child in each line
SEPARATOR = '#'
# The indentation unit (adjust it to match your file)
INDENT = ' '


def parse_line(line: str) -> Tuple[int, str]:
    """Parse a line into indentation level and its content
    with indentation stripped

    Args:
        line (str): One of the lines from the input file, with newline ending

    Returns:
        Tuple[int, str]: The indentation level and the content with
            indentation stripped.

    Raises:
        ValueError: If the line is incorrectly indented.
    """
    # strip the leading white spaces
    lstripped_line = line.lstrip()
    # get the indentation
    indent = line[:-len(lstripped_line)]

    # Let's check if the indentation is correct
    # meaning it should be N * INDENT
    n = len(indent) // len(INDENT)
    if INDENT * n != indent:
        raise ValueError(f"Wrong indentation of line: {line}")

    return n, lstripped_line.rstrip('\r\n')


def format_text(txtfile: str) -> List[str]:
    """Format the text file by attaching the parent to its children

    Args:
        txtfile (str): The text file

    Returns:
        List[str]: A list of formatted lines
    """
    formatted = []
    par_indent = par_line = None

    with open(txtfile) as ftxt:
        for line in ftxt:
            # get the indentation level and line without indentation
            indent, line_noindent = parse_line(line)

            # level 1 parents
            if indent == 0:
                par_indent = indent
                par_line = line_noindent
                formatted.append(line_noindent)

            # children
            elif indent > par_indent:
                formatted.append(par_line +
                                 SEPARATOR * (indent - par_indent) +
                                 line_noindent)

                par_indent = indent
                par_line = par_line + SEPARATOR + line_noindent

            # siblings or dedentation
            else:
                # We just need first `indent` parts of parent line as our prefix
                prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
                formatted.append(prefix + SEPARATOR + line_noindent)
                par_indent = indent
                par_line = prefix + SEPARATOR + line_noindent

    return formatted


def sort_and_revert(lines: List[str]):
    """Sort the formatted lines and revert the leading parents
    into indentations

    Args:
        lines (List[str]): list of formatted lines

    Prints:
        The sorted and reverted lines
    """
    sorted_lines = sorted(lines)
    for line in sorted_lines:
        if SEPARATOR not in line:
            print(line)
        else:
            leading, _, orig_line = line.rpartition(SEPARATOR)
            print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)


def main():
    """Main entry"""
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <file>")
        sys.exit(1)

    formatted = format_text(sys.argv[1])
    sort_and_revert(formatted)


if __name__ == "__main__":
    main()
Let's save it as format.py, and we have a test file, say test.txt:
parent2
 child2-1
  child2-1-1
 child2-2
parent1
 child1-2
  child1-2-2
  child1-2-1
 child1-1
Let's test it:
$ python format.py test.txt
parent1
 child1-1
 child1-2
  child1-2-1
  child1-2-2
parent2
 child2-1
  child2-1-1
 child2-2
If you wonder how the format_text function formats the text, here are the intermediate results, which also explain why the file ends up sorted the way we want:
parent2
parent2#child2-1
parent2#child2-1#child2-1-1
parent2#child2-2
parent1
parent1#child1-2
parent1#child1-2#child1-2-2
parent1#child1-2#child1-2-1
parent1#child1-1
You can see that each child has its parents attached, all the way up to the root, so the children under the same parent are sorted together.
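The parent-attachment idea can also be sketched with tuples as sort keys instead of a separator string. This is a minimal variant, not the script above: the function name sort_hierarchically is hypothetical, and a one-space indentation unit is assumed.

```python
from typing import List, Tuple

INDENT = " "  # assumed indentation unit; adjust to match the input


def sort_hierarchically(lines: List[str]) -> List[str]:
    """Sort indented lines so children stay under their parent."""
    keyed: List[Tuple[Tuple[str, ...], int]] = []
    path: List[str] = []  # the current line's ancestors plus the line itself
    for line in lines:
        content = line.lstrip(" ")
        level = (len(line) - len(content)) // len(INDENT)
        # Keep only the ancestors above this level, then add this line.
        path = path[:level] + [content]
        keyed.append((tuple(path), level))
    # Tuples compare element by element, so a parent sorts before its
    # children and siblings sort among themselves.
    return [INDENT * level + p[-1] for p, level in sorted(keyed)]


sample = [
    "parent2", " child2-1", "  child2-1-1", " child2-2",
    "parent1", " child1-2", "  child1-2-2", "  child1-2-1", " child1-1",
]
result = sort_hierarchically(sample)
print("\n".join(result))
```

Because the key is a tuple of ancestor names, no separator character has to be reserved.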
Short answer (Linux solution):
sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
Test it out:
test.txt
parent2
 child2-1
  child2-1-1
 child2-2
parent1
 child1-1
 child1-2
  child1-2-1
$ sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
parent1
 child1-1
 child1-2
  child1-2-1
parent2
 child2-1
  child2-1-1
 child2-2
Explanation:
The idea is to replace each newline that is followed by an indentation space with a non-newline character that does not otherwise occur in your file. Here I used # as an example; if it is not unique in your file, use another character or even a string. It has to be unique because we need to turn it back into the newline and indentation afterwards.
About sed command:
:a create a label 'a'
N append the next line to the pattern space
$! if not the last line, ba branch (go to) label 'a'
s substitute, /\n / regex for newline followed by a space
/#/ a unique character to replace the newline and space
if it is not unique in your file, use other characters or even a string
/g global match (as many times as it can)
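For readers who prefer not to use sed, roughly the same idea can be sketched in Python: group each unindented line with the indented lines that follow it, sort the groups, and join them again. The function name sort_top_level_blocks is hypothetical, and the input is assumed to be indented with spaces.

```python
def sort_top_level_blocks(text: str) -> str:
    """Sort top-level entries, moving each entry's indented lines with it."""
    blocks = []
    for line in text.splitlines():
        if line.startswith(" ") and blocks:
            blocks[-1].append(line)  # indented line: belongs to current block
        else:
            blocks.append([line])    # unindented line: starts a new block
    # Lists of strings compare element by element, so blocks are ordered
    # by their parent line first.
    return "\n".join("\n".join(block) for block in sorted(blocks))


sample = (
    "parent2\n child2-1\n  child2-1-1\n child2-2\n"
    "parent1\n child1-1\n child1-2\n  child1-2-1"
)
result = sort_top_level_blocks(sample)
print(result)
```

Like the sed pipeline, this only reorders the top-level blocks; the lines inside a block keep their original order.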
I have a Tcl script with, say, 30 lines of automation code that I execute in the dc shell (Synopsys Design Compiler). I want to stop the script at line 10, exit the dc shell, and bring it back up after performing a manual review. This time, however, I want to run the script starting from line 11, without executing the first 10 lines again.
Instead of having two scripts, one containing the code up to line 10 and the other containing the rest, I would like to use a single script and execute it from, let's say, line N.
Something like:
source a.tcl -line 11
How can I do this?
If you have Tcl 8.6+ and if you consider re-modelling your script on top of a Tcl coroutine, you can realise this continuation behaviour in a few lines. This assumes that you run the script from an interactive Tcl shell (dc shell?).
# script.tcl
if {[info procs allSteps] eq ""} {
    # We are not re-entering (continuing), so start all over.
    proc allSteps {args} {
        yield; # do not run when defining the coroutine;
        puts 1
        puts 2
        puts 3
        yield; # step out, once first sequence of steps (1-10) has been executed
        puts 4
        puts 5
        puts 6
        rename allSteps ""; # self-clean, once the remainder of steps (11-N) have run
    }
    coroutine nextSteps allSteps
}
nextSteps; # run coroutine
Pack your script into a proc body (allSteps).
Within the proc body: Place a yield to indicate the hold/ continuation point after your first steps (e.g., after the 10th step).
Create a coroutine nextSteps based on allSteps.
Protect the proc and coroutine definitions in a way that they do not cause a re-definition (when steps are pending)
Then, start your interactive shell and run source script.tcl:
% source script.tcl
1
2
3
Now, perform your manual review. Then, continue from within the same shell:
% source script.tcl
4
5
6
Note that you can run the overall 2-phased sequence any number of times (because of the self-cleanup of the coroutine proc: rename):
% source script.tcl
1
2
3
% source script.tcl
4
5
6
Again: All this assumes that you do not exit from the shell, and maintain your shell while performing your review. If you need to exit from the shell, for whatever reason (or you cannot run Tcl 8.6+), then Donal's suggestion is the way to go.
Update
If applicable in your case, you may improve the implementation by using an anonymous (lambda) proc. This simplifies the lifecycle management (avoiding re-definition, managing coroutine and proc, no need for a rename):
# script.tcl
if {[info commands nextSteps] eq ""} {
    # We are not re-entering (continuing), so start all over.
    coroutine nextSteps apply {args {
        yield; # do not run when defining the coroutine;
        puts 1
        puts 2
        puts 3
        yield; # step out, once first sequence of steps (1-10) has been executed
        puts 4
        puts 5
        puts 6
    }}
}
nextSteps
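The pause-and-resume pattern is not specific to Tcl. As an analogy only, here is a minimal Python generator sketch of the same two-phase flow; the names all_steps and log are hypothetical, and the numbers stand in for the real automation steps:

```python
def all_steps(log):
    """Hypothetical two-phase script; `log` collects the executed steps."""
    for step in (1, 2, 3):
        log.append(step)
    yield  # pause here for the manual review
    for step in (4, 5, 6):
        log.append(step)


log = []
steps = all_steps(log)
next(steps)            # runs phase 1 and stops at the yield
phase_one = list(log)
next(steps, None)      # resumes after the yield and runs phase 2
print(phase_one, log)
```

As with the Tcl coroutine, the paused state lives only in the running interpreter, so this does not survive exiting the shell.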
The simplest way is to open the text file, parse it to get the first N commands (info complete is useful there), and then evaluate those (or the rest of the script). Doing this efficiently produces slightly different code when you're dropping the tail as opposed to when you're dropping the prefix.
proc ReadAllLines {filename} {
    set f [open $filename]
    set lines {}
    # A little bit careful in case you're working with very large scripts
    while {[gets $f line] >= 0} {
        lappend lines $line
    }
    close $f
    return $lines
}

proc SourceFirstN {filename n} {
    set lines [ReadAllLines $filename]
    set i 0
    set script {}
    foreach line $lines {
        append script $line "\n"
        if {[info complete $script] && [incr i] >= $n} {
            break
        }
    }
    info script $filename
    unset lines
    uplevel 1 $script
}

proc SourceTailN {filename n} {
    set lines [ReadAllLines $filename]
    set i 0
    set script {}
    for {set j 0} {$j < [llength $lines]} {incr j} {
        set line [lindex $lines $j]
        append script $line "\n"
        if {[info complete $script]} {
            if {[incr i] >= $n} {
                info script $filename
                set realScript [join [lrange $lines [incr j] end] "\n"]
                unset lines script
                return [uplevel 1 $realScript]
            }
            # Dump the prefix we don't need any more
            set script {}
        }
    }
    # If we get here, the script had fewer than n lines so there's nothing to do
}
Be aware that the kinds of files you're dealing with can get pretty large, and Tcl currently has some hard memory limits. On the other hand, if you can source the file at all, you're already within that limit…
I'm parsing a CSV file that has line breaks inside double-quoted fields. I'm reading the file line by line with a Groovy script, but I get an ArrayIndexOutOfBoundsException when I try to access the missing tokens.
I was trying to pre-process the file to remove those characters, either with a bash script or with Groovy itself.
Could you please suggest an approach I can use to resolve the problem?
This is what the CSV looks like:
header1,header2,header3,header4
timestamp, "abcdefghi", "abcdefghi","sdsd"
timestamp, "zxcvb
fffffgfg","asdasdasadsd","sdsdsd"
This is the groovy script I'm using
def csv = new File(args[0]).text
// buffer for the converted lines, flushed to the output file every 1000 lines
def retString = ""
def parsedFile = new File("Parsed_" + args[0]);
csv.eachLine { line, lineNumber ->
    def splittedLine = line.split(',');
    retString += new Date(splittedLine[0]) + ",${splittedLine[1]},${splittedLine[2]},${splittedLine[3]}\n";
    if(lineNumber % 1000 == 0){
        parsedFile.append(retString);
        retString = "";
    }
}
parsedFile.append(retString);
UPDATE:
Finally I did this and it works (I needed to format the first column from a timestamp to a human-readable date):
gawk -F',' '{print strftime("%Y-%m-%d %H:%M:%S", substr( $1, 0, length($1)-3 ) )","($2)","($3)","($4)}' TobeParsed.csv > Parsed.csv
Thank you @karakfa
If you use a proper CSV parser rather than trying to do it with split (which as you can see doesn't work with any form of quoting), then it works fine:
@Grab('com.xlson.groovycsv:groovycsv:1.1')
import static com.xlson.groovycsv.CsvParser.parseCsv
def csv = '''header1,header2,header3,header4
timestamp, "abcdefghi", "abcdefghi","sdsd"
timestamp, "zxcvb
fffffgfg","asdasdasadsd","sdsdsd"'''
def data = parseCsv(csv)
data.eachWithIndex { line, index ->
println """Line $index:
| 1:$line.header1
| 2:$line.header2
| 3:$line.header3
| 4:$line.header4""".stripMargin()
}
Which prints:
Line 0:
1:timestamp
2:abcdefghi
3:abcdefghi
4:sdsd
Line 1:
1:timestamp
2:zxcvb
fffffgfg
3:asdasdasadsd
4:sdsdsd
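The same point holds outside Groovy. As a hedged illustration, Python's standard csv module also copes with line breaks inside quoted fields; the sketch below inlines the sample data from the question as a string rather than reading a file:

```python
import csv
import io

raw = (
    'header1,header2,header3,header4\n'
    'timestamp, "abcdefghi", "abcdefghi","sdsd"\n'
    'timestamp, "zxcvb\n'
    'fffffgfg","asdasdasadsd","sdsdsd"\n'
)

# newline="" leaves line endings untouched so the csv module can
# handle newlines inside quoted fields itself.
# skipinitialspace=True ignores the blank after each comma, so the
# following quote is treated as the start of a quoted field.
reader = csv.DictReader(io.StringIO(raw, newline=""), skipinitialspace=True)
rows = list(reader)

for index, row in enumerate(rows):
    # Replace the embedded newline with a space to get one line per record.
    flattened = [row[name].replace("\n", " ") for name in reader.fieldnames]
    print(index, flattened)
```

When reading from a real file, opening it with newline="" serves the same purpose as the StringIO argument here.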
awk to the rescue!
This will merge the fields split by newlines back together; your process can take it from there.
$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}1' splitted.csv
header1,header2,header3
xxxxxx, "abcdefghi", "abcdefghi"
yyyyyy, "zxcvb fffffgfg","asdasdasadsd"
This assumes that an odd number of quotes on a line means a split field, and it replaces the newline with OFS. If you simply want to delete the newline (so the split parts are joined directly), remove OFS.
I am using printf command to log some values in a file as follows:
printf "Parameter = $parameter v9_value = $v9_val v9_line = $V9_Line_Count v16_val = $v16_val v16_line = $V16_Line_Count"
But the output I get is:
v16_line = 8elayServerPort v9_value = 41 v9_line = 8 v16_val = 4571
It seems the line wraps around, so the last values are printed over the start of the line.
Expected Output:
Parameter = RelayServerPort v9_value = 41 v9_line = 8 v16_val = 4571 v16_line = 8
Instead, v16_line = 8 overwrites Parameter = R at the start of the line.
printf doesn't add a NL on the end. You need to add \n to the end of your printf.
Not seeing the rest of your program, or where you get your variable values, it's hard to say what else could be the issue.
One thing you can do is redirect your output to a file and look at that file either in a good programmer's editor or with cat -v, which displays control characters visibly.
See if you see ^M in your output. If you do, it could be that you have carriage returns (\r) in your variables.
Also remove $v16_val from your printf (temporarily) and see if your output looks better. If so, $v16_val might have a CR (^M) in it.
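To make the suspected cause concrete, here is a small Python sketch (not the shell script from the question) with a hypothetical value that carries a trailing carriage return, as it might when read from a file with Windows (CRLF) line endings:

```python
# Hypothetical value with a trailing carriage return.
v16_val = "4571\r"

line = f"Parameter = RelayServerPort v16_val = {v16_val} v16_line = 8"

# repr() makes the otherwise invisible control character visible.
print(repr(line))

# Stripping the carriage return before formatting avoids the overwrite.
clean = v16_val.rstrip("\r\n")
fixed = f"Parameter = RelayServerPort v16_val = {clean} v16_line = 8"
print(fixed)
```

On a terminal, printing the unstripped line moves the cursor back to column 0 at the \r, which is why the tail of the line appears over its beginning.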
I seem to see double-spaced output when parsing/dumping a simple YAML file with a pipe-text field.
The test is:
public void yamlTest()
{
    DumperOptions printOptions = new DumperOptions();
    printOptions.setLineBreak(DumperOptions.LineBreak.UNIX);
    Yaml y = new Yaml(printOptions);
    String input = "foo: |\n" +
            "  line 1\n" +
            "  line 2\n";
    Object parsedObject = y.load(new StringReader(input));
    String output = y.dump(parsedObject);
    System.out.println(output);
}
and the output is:
{foo: 'line 1
line 2
'}
Note the extra space between line 1 and line 2, and after line 2 before the end of the string.
This test was run on Mac OS X 10.6, java version "1.6.0_29".
Thanks!
Mark
In the original string you use literal style, which is indicated by the '|' character. When you dump your text, single-quoted style is used. In that style a single line break is folded away, so each '\n' in the value has to be written as an empty line to be preserved. That is why the empty lines appear.
Try setting a different scalar style in DumperOptions:
// other styles include FOLDED and DOUBLE_QUOTED
printOptions.setDefaultScalarStyle(DumperOptions.ScalarStyle.LITERAL);