Ghostscript limitcheck error when converting EPS to PNG - ghostscript

I need to use ghostscript to programmatically convert EPS files to PNG.
The ghostscript command is:
gs -dBATCH -dNOPAUSE -dSAFER -dEPSCrop -r300 -sDEVICE=pngalpha -sOutputFile=tmp.png input.eps
The problem is that some of the EPS files cause this error:
Error: /limitcheck in --shfill--
Operand stack:
--dict:6/6(L)-- --nostringval--
Execution stack:
%interp_exit .runexec2 --nostringval-- shfill --nostringval-- 2 %stopped_push --nostringval-- shfill shfill false 1 %stopped_push 1974 1 3 %oparray_pop 1973 1 3 %oparray_pop 1972 1 3 %oparray_pop shfill 1961 1 3 %oparray_pop 1817 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- shfill --nostringval-- 2 %stopped_push --nostringval-- shfill 1940 1 10 %oparray_pop shfill false 1 %stopped_push
Dictionary stack:
--dict:730/1123(ro)(G)-- --dict:1/20(G)-- --dict:92/200(L)-- --dict:57/75(L)-- --dict:211/313(L)-- --dict:72/140(L)-- --dict:0/10(G)-- --dict:0/10(L)-- --dict:0/50(ro)(G)-- --dict:57/71(L)--
Current allocation mode is local
Last OS error: No such file or directory
Current file position is 4781076
GPL Ghostscript 9.53.3: Unrecoverable error, exit code 1
From what I gathered, the vector file is so complex that it hits one of Ghostscript's limits. Are there parameters I can change to lift such limits? Or is the only way to edit the file to make it less complex, or to export manually from Illustrator?

Related

How to fix query tool "Query failed ERROR: syntax error at or near" while inserting data

I have this command to populate created tables with geographic data
COPY public.adonis_schema (id, name, batch, migration_time) FROM stdin;
1 database/migrations/1607548129188_users 1 2021-04-02 14:14:27.470863+00
2 database/migrations/1607548416832_conversations 1 2021-04-02 14:14:28.070888+00
3 database/migrations/1607548444586_participations 1 2021-04-02 14:14:28.480253+00
4 database/migrations/1607548494088_messages 1 2021-04-02 14:14:29.050234+00
5 database/migrations/1609020554140_vcard_shares 1 2021-04-02 14:14:29.520909+00
6 database/migrations/1609024583459_add_conversation_names 1 2021-04-02 14:14:29.905367+00
7 database/migrations/1609467289494_meetings 1 2021-04-02 14:14:30.300248+00
8 database/migrations/1609467351706_notes 1 2021-04-02 14:14:30.960852+00
9 database/migrations/1609976010374_meetings_lengths 1 2021-04-02 14:14:31.640233+00
10 database/migrations/1610498049695_conversations_event_ids 1 2021-04-02 14:14:32.020247+00
11 database/migrations/1611099138751_cache_users 1 2021-04-02 14:14:32.405294+00
12 database/migrations/1616628109445_conversations_ownerships 1 2021-04-02 14:14:32.800258+00
13 database/migrations/1617362496376_conversations_types 1 2021-04-02 14:14:33.185207+00
14 database/migrations/1617805023298_conversations_timestamps 2 2021-04-07 14:18:42.957427+00
47 database/migrations/1622085675952_user_is_busies 3 2021-05-27 03:22:31.964783+00
\.
As instructed, I put the entire code in a SQL file and executed it with psql, but I get an error:
ERROR: syntax error at or near "1"
LINE 2: 1 database/migrations/1607548129188_users 1 2021-04-02 14:14...
^
SQL state: 42601
Character: 73
Does anyone have an idea what is wrong?

pandas: time difference in groupby

How can I calculate, for each id, the time difference between the current row and the next row
in the dataset below:
time id
2012-03-16 23:50:00 1
2012-03-16 23:56:00 1
2012-03-17 00:08:00 1
2012-03-17 00:10:00 2
2012-03-17 00:12:00 2
2012-03-17 00:20:00 2
2012-03-20 00:43:00 3
and get the following result:
time id tdiff
2012-03-16 23:50:00 1 6
2012-03-16 23:56:00 1 12
2012-03-17 00:08:00 1 NA
2012-03-17 00:10:00 2 2
2012-03-17 00:12:00 2 8
2012-03-17 00:20:00 2 NA
2012-03-20 00:43:00 3 NA
You need the result in minutes per id. Here is how to do it:
use diff() within a groupby:
import pandas as pd
# first convert to datetime with the right format
data['time'] = pd.to_datetime(data['time'], format='%Y-%m-%d %H:%M:%S')
# per-id difference to the previous row, in minutes (NaN for the first row of each id)
data['tdiff'] = data.groupby('id')['time'].diff().dt.total_seconds() / 60
print(data)
Output:
time id tdiff
0 2012-03-16 23:50:00 1 NaN
1 2012-03-16 23:56:00 1 6.0
2 2012-03-17 00:08:00 1 12.0
3 2012-03-17 00:10:00 2 NaN
4 2012-03-17 00:12:00 2 2.0
5 2012-03-17 00:20:00 2 8.0
6 2012-03-20 00:43:00 3 NaN
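Note that the output above measures the gap to the previous row, while the question asks for the gap to the next row. A minimal sketch of the next-row variant follows, assuming the rows are already sorted by time within each id; the sample frame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
        "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
        "2012-03-20 00:43:00",
    ]),
    "id": [1, 1, 1, 2, 2, 2, 3],
})

# Time of the next row within the same id (NaT for the last row of each id).
next_time = data.groupby("id")["time"].shift(-1)

# Difference to the next row in minutes; the last row of each id stays NaN.
data["tdiff"] = (next_time - data["time"]).dt.total_seconds() / 60
print(data)
```

Using shift(-1) per group keeps the subtraction from ever crossing an id boundary, so no cleanup of negative values is needed afterwards.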

Ghostscript /undefinedresource in findresource

I encounter the following error while merging PDFs with Ghostscript.
Error: /undefinedresource in findresource
Operand stack:
--dict:5/14(L)-- F0 22 --dict:6/6(L)-- --dict:6/6(L)-- MKQZSY+PalatinoLinotype 2437 CM30 CIDSystemInfo --dict:12/13(ro)(L)-- CMap --dict:12/13(ro)(L)--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1846 1 3 %oparray_pop 1845 1 3 %oparray_pop 1829 1 3 %oparray_pop --nostringval-- --nostringval-- 2 1 2 --nostringval-- %for_pos_int_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- false 1 %stopped_push --nostringval-- %loop_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- 1797 11 13 %oparray_pop findresource %errorexec_pop --nostringval-- --nostringval--
Dictionary stack:
--dict:1150/1684(ro)(G)-- --dict:1/20(G)-- --dict:75/200(L)-- --dict:75/200(L)-- --dict:106/127(ro)(G)-- --dict:285/300(ro)(G)-- --dict:22/25(L)-- --dict:4/6(L)-- --dict:21/40(L)-- --dict:1/1(ro)(G)-- --dict:1/1(ro)(G)-- --dict:6/15(L)-- --dict:38/38(ro)(G)-- --dict:16/25(ro)(G)--
Current allocation mode is local
Last OS error: 2
GPL Ghostscript 8.70: Unrecoverable error, exit code 1
The command used is:
gs \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
-o /sefas/temp/jercol1/bug_ghostScript/out.pdf \
-q ./in.pdf
The PalatinoLinotype font seems to be embedded in the input PDF:
https://i.stack.imgur.com/Le0mq.png
Unfortunately, I can't share the PDF because of confidentiality agreements.
I've tried to fix the problem by using a custom cidfmap file, but without success.
I would like to understand what exactly the source of the problem is, since the fonts seem to be embedded in the PDF.
Best regards
If you can't share the file, then there's nothing much anyone can do to help you. The only thing I can think of is that you could try using a version of Ghostscript which is less than 9 years old!
The current version is 9.25.

Ghostscript prints blank pages

I'm trying to print a PDF file to Windows printers using Ghostscript.
This is my command:
gswin32c.exe -dPrinted -dBATCH -dNOPAUSE -dNOSAFER -q -dNumCopies=1 -sDEVICE=mswinpr2 -sOutputFile="\\printserver\myprinter" "C:\myfile.pdf"
This command works: when I print to a virtual printer (Windows PDF Printer), it produces a valid PDF.
I have a problem with a real printer that uses the Toshiba Universal Printer Driver 2: all pages are printed completely blank (but the number of pages is right). When I switch to a printer from another manufacturer it works, so I suppose that the problem is connected to Ghostscript and this particular Toshiba driver.
I tried updating the driver and I also tried different client machines, and I found that the problem occurs only on some clients.
I tried to output some debug data with this command:
gswin32c.exe -dPrinted -dDEBUG -dNumCopies=1 -sDEVICE=mswinpr2 -sOutputFile="\\printserver\myprinter" "C:\myfile.pdf"
I got this output:
START 0 1434368 139387 1309984 27572 true 582 3 <0>
GPL Ghostscript 9.23 (2018-03-21)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
END PROCS 2 1474576 176704 1330088 33388 true 705 3 <0>
END FONTDIR/ENCS 3 1514784 205041 1330088 35868 true 715 3 <0>
END DEVS 4 1532360 234258 1330088 35868 true 719 3 <0>
END STATD 5 1552464 247718 1330088 38748 true 724 3 <0>
END GS_FONTS 7 1622708 312812 1330088 38748 true 773 3 <0>
END BASIC COLOR 7 1622708 319016 1330088 38748 true 793 3 <0>
END IMAGE 8 1642812 330142 1330088 38748 true 798 3 <0>
BEGIN RESOURCES 21 2396164 1036997 1434800 144156 true 821 4 <0>
END CATEGORY 22 2396164 1040920 1434800 144444 true 822 5 <0>
END GENERIC 23 2426200 1063516 1434800 144444 true 824 4 <0>
END FIXED 24 2446304 1080587 1434800 144444 true 824 4 <0>
END MISC 25 2446304 1092183 1434800 144444 true 824 4 <0>
END ENCODING 26 2506616 1151689 1434800 148074 true 824 4 <0>
Extend MacRomanEncodingForTrueType for TrueType: insert /integral # 186
Extend MacRomanEncodingForTrueType for TrueType: cannot insert /Euro # 219 used for /currency
Extend MacRomanEncodingForTrueType for TrueType: insert /infinity # 176
Extend MacRomanEncodingForTrueType for TrueType: insert /notequal # 173
Extend MacRomanEncodingForTrueType for TrueType: insert /summation # 183
Extend MacRomanEncodingForTrueType for TrueType: insert /approxequal # 197
Extend MacRomanEncodingForTrueType for TrueType: insert /radical # 195
Extend MacRomanEncodingForTrueType for TrueType: insert /lozenge # 215
Extend MacRomanEncodingForTrueType for TrueType: insert /Omega # 189
Extend MacRomanEncodingForTrueType for TrueType: insert /pi # 185
Extend MacRomanEncodingForTrueType for TrueType: insert /product # 184
Extend MacRomanEncodingForTrueType for TrueType: insert /partialdiff # 182
Extend MacRomanEncodingForTrueType for TrueType: insert /greaterequal # 179
Extend MacRomanEncodingForTrueType for TrueType: insert /Delta # 198
Extend MacRomanEncodingForTrueType for TrueType: insert /apple # 240
Extend MacRomanEncodingForTrueType for TrueType: insert /lessequal # 178
END INITFILES 48 3724620 2302857 1434800 151334 true 1190 4 <0>
C:\Program Files (x86)\gs\gs9.23\bin/Fontmap 48 3744724 2311483 1454904 156508 true 1191 4 <1>
C:\Program Files (x86)\gs\gs9.23\lib/Fontmap 49 3784932 2343795 1454904 156508 true 1191 4 <1>
C:\Program Files (x86)\gs\gs9.23\fonts/Fontmap 49 3805036 2357477 1454904 156508 true 1191 4 <1>
%rom%Resource/Init/Fontmap 50 3805036 2367803 1454904 156508 true 1191 4 <1>
%rom%lib/Fontmap 51 3825140 2385363 1454904 156508 true 1191 4 <1>
c:/gs/gs9.23/Resource/Init/Fontmap 51 3845244 2399033 1454904 156508 true 1191 4 <1>
c:/gs/gs9.23/lib/Fontmap 52 3865348 2412693 1454904 156508 true 1191 4 <1>
c:/gs/gs9.23/Resource/Font/Fontmap 53 3865348 2423027 1454904 156508 true 1191 4 <1>
c:/gs/fonts/Fontmap 53 3885452 2436682 1454904 156508 true 1191 4 <1>
END FONTS 54 3885452 2447087 1454904 156508 true 1191 4 <0>
END DEVICE 4993 6016476 4383653 1453392 154996 true 1195 4 <0>
END CONFIG 4993 6016476 4383923 1453392 154996 true 1195 4 <0>
END INIT 4994 6096892 4457845 1566792 276716 true 1210 4 <0>
END GLOBAL 4994 6096892 4461827 1566792 279032 false 1209 4 <0>
END GC 5001 3920836 2285101 1570160 272553 false 988 3 <0>
runEPS: Not DSC
<<
/Info 22 0 R
/Root 1 0 R
/Size 23 >>
<<
/Info 22 0 R
/Root 1 0 R
/Size 23 >>
%Resolving: [1 0]
<<
/Pages 2 0 R
/Type /Catalog >>
endobj
%Resolving: [2 0]
<<
/Count 1 /Kids [
3 0 R
]
/Type /Pages >>
endobj
%Resolving: [3 0]
<<
/Contents [
20 0 R
]
/CropBox [
0.0 0.0 595.32 841.920044 ]
/MediaBox [
0.0 0.0 595.32 841.920044 ]
/Parent 2 0 R
/Resources 21 0 R
/Rotate 0 /Type /Page >>
endobj
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
Checking.
-dict-
-dict-
-mark-
-dict-
false
Merging.
-dict-
-dict-
-mark-
-dict-
false
Selecting.
-dict-
-dict-
-dict-
-mark-
-dict-
false
Constructing.
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
Putting.
[595.32 841.92]
/.MediaSize
true
/DisablePageHandler
0
/%MediaDestination
null
/LeadingEdge
0
/%MediaSource
-mark-
true
-dict-
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
Result of putting.
false
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
Installing.
false
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
Finishing.
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
%Resolving: [1 0]
%Resolving: [2 0]
Processing pages 1 through 1.
Page 1
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [21 0]
<<
/Font <<
/F1 11 0 R
/F2 19 0 R
>>
>>
endobj
%Resolving: [21 0]
%Resolving: [21 0]
%Resolving: [2 0]
Checking.
-dict-
-dict-
-mark-
-dict-
false
-dict-
Merging.
-dict-
-dict-
-mark-
-dict-
false
-dict-
Selecting.
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
Constructing.
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
Putting.
[595.32 841.92]
/.MediaSize
false
/PageUsesTransparency
0
/%MediaDestination
null
/LeadingEdge
0
/%MediaSource
-mark-
true
-dict-
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
Result of putting.
false
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
Installing.
false
-device-
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
Finishing.
-dict-
-dict-
-dict-
-dict-
-mark-
-dict-
false
-dict-
before exec 5014 3907700 2337535 1944912 621210 false 988 7 <1>
%Resolving: [21 0]
%Resolving: [21 0]
%Resolving: [21 0]
%Resolving: [20 0]
<<
/Filter /FlateDecode /Length 205 >>
stream
%FilePosition: 214718
endobj
0.75 0.0 0.0 -0.75 0.0 841.920044 cm
q
1.0 1.0 1.0 rg
120.0 96.0 m
673.76 96.0 l
673.76 130.08 l
120.0 130.08 l
h
f
Q
q
0.0 0.0 0.0 rg
BT
0 Tr
/F1 14.7206 Tf
%Resolving: [21 0]
%Resolving: [11 0]
<<
/BaseFont /CIDFont+F1 /DescendantFonts [
<<
/BaseFont /CIDFont+F1 /CIDSystemInfo <<
/Ordering 4 0 R
/Registry 5 0 R
/Supplement 0 >>
/CIDToGIDMap /Identity /FontDescriptor <<
/Ascent 952 /CapHeight 631 /Descent -268 /Flags 6 /FontBBox 6 0 R
/FontFile2 8 0 R
/FontName /CIDFont+F1 /ItalicAngle 0 /StemV 7 0 R
/Type /FontDescriptor >>
/Subtype /CIDFontType2 /Type /Font /W 9 0 R
>>
]
/Encoding /Identity-H /Subtype /Type0 /ToUnicode 10 0 R
/Type /Font >>
endobj
runEPS: Not DSC
rewriting TempMapsNotDef
...FINISHED...
*** code space ranges ***
0
[(\000\000) (\377\377)]
*** defined charmap ***
2
[[]]
%Resolving: [4 0]
(Identity) endobj
%Resolving: [5 0]
(Adobe) endobj
%Resolving: [8 0]
<<
/Filter /FlateDecode /Length 142545 /Type /Stream >>
stream
%FilePosition: 130
endobj
DSIG 322052 7620
EBDT 65784 591
EBLC 65176 608
GDEF 129084 1558
GPOS 130644 191408
GSUB 66472 62610
OS/2 488 96
cmap 17956 700
cvt 31480 1468
fpgm 18656 3371
gasp 65160 16
glyf 32948 6652
head 364 54
hhea 420 36
hmtx 584 17372
kern 52492 66
loca 39600 12890
maxp 456 32
meta 66376 96
name 52560 12565
post 65128 32
prep 22028 9451
findname: 6 = (Calibri)
findname: 0 = (\251 2017 Microsoft Corporation. All Rights Reserved.\rHebrew OpenType Layout logic copyright \251 2003 & 2007, Ralph Hancock & John Hudson. This layout logic for Biblical Hebrew is open source software unde...)
findname: 1 = (Calibri)
findname: 4 = (Calibri)
findname: 5 = (Version 6.20)
head 236 54
hhea 290 36
maxp 326 32
OS/2 358 96
hmtx 454 17372
cmap 17826 700
fpgm 18526 3371
prep 21898 9451
cvt 31350 1468
glyf 32818 6652
loca 39470 12890
name 52360 12565
post 64926 32
GSUB 64958 62610
[236 54 36 32 96 17372 700 3372 9452 1468 6652 12890 12566 32 62610]
/FontMatrix
[1.0 0.0 0.0 1.0 0.0 0.0]
/FontBBox
[-0.502929688 -0.3125 1.24023438 1.02636719]
/FontName
(Calibri)
/FontInfo
-dict-
/XUID
[107 42 -2147483647]
FAPIhook CIDFont+F1
Trying to render the font Font CIDFont+F1 with FAPI...
Font CIDFont+F1 is being rendered with FAPI=FreeType
%Resolving: [9 0]
[
286 286 497 400 400 391 410 410 334 ]
endobj
FAPIhook CIDFont+F1
Font CIDFont+F1 is mapped to FAPI=FreeType
1 0 0.0 -1 120.0 110.08 Tm
[
(\001\232) -6.0 (\001\036) -0.558594 (\001\220) -0.113281 (\001\232) -0.960938 ]
TJ
ET
Q
q
0.0 0.0 0.0 rg
BT
0 Tr
/F2 14.7206 Tf
%Resolving: [21 0]
%Resolving: [19 0]
<<
/BaseFont /CIDFont+F2 /DescendantFonts [
<<
/BaseFont /CIDFont+F2 /CIDSystemInfo <<
/Ordering 12 0 R
/Registry 13 0 R
/Supplement 0 >>
/CIDToGIDMap /Identity /FontDescriptor <<
/Ascent 1079 /CapHeight 700 /Descent -250 /Flags 6 /FontBBox 14 0 R
/FontFile2 16 0 R
/FontName /CIDFont+F2 /ItalicAngle 0 /StemV 15 0 R
/Type /FontDescriptor >>
/Subtype /CIDFontType2 /Type /Font /W 17 0 R
>>
]
/Encoding /Identity-H /Subtype /Type0 /ToUnicode 18 0 R
/Type /Font >>
endobj
%Resolving: [12 0]
(Identity) endobj
%Resolving: [13 0]
(Adobe) endobj
%Resolving: [16 0]
<<
/Filter /FlateDecode /Length 70879 /Type /Stream >>
stream
%FilePosition: 143287
endobj
DSIG 140604 7624
GDEF 148228 808
GPOS 149036 80254
GSUB 229292 34382
LTSH 19784 5260
MERG 263804 12
OS/2 520 96
VDMX 25044 1504
cmap 115976 100
cvt 120840 2594
fpgm 116076 2652
gasp 140588 16
glyf 123436 1980
hdmx 26548 89428
head 396 54
hhea 452 36
hmtx 616 19166
kern 135932 30
loca 125416 10514
maxp 488 32
meta 263676 128
name 135964 4592
post 140556 32
prep 118728 2112
findname: 6 = (SegoeUI)
findname: 0 = (\251 2016 Microsoft Corporation. All Rights Reserved. )
findname: 1 = (Segoe UI)
findname: 4 = (Segoe UI)
findname: 5 = (Version 5.55)
head 236 54
hhea 290 36
maxp 326 32
OS/2 358 96
hmtx 454 19166
cmap 19620 100
fpgm 19720 2652
prep 22372 2112
cvt 24484 2594
glyf 27078 1980
loca 29058 10514
name 39572 4592
post 44164 32
GSUB 44196 34382
[236 54 36 32 96 19166 100 2652 2112 2594 1980 10514 4592 32 34382]
/FontMatrix
[1.0 0.0 0.0 1.0 0.0 0.0]
/FontBBox
[-0.572753906 -0.411132813 1.99902344 1.29833984]
/FontName
(SegoeUI)
/FontInfo
-dict-
/XUID
[107 42 -2147483646]
FAPIhook CIDFont+F2
Trying to render the font Font CIDFont+F2 with FAPI...
Font CIDFont+F2 is being rendered with FAPI=FreeType
%Resolving: [17 0]
[
20 20 539 ]
endobj
FAPIhook CIDFont+F2
Font CIDFont+F2 is mapped to FAPI=FreeType
1 0 0.0 -1 392.8 1022.72 Tm
[
(\000\024) -0.0625 ]
TJ
ET
Q
%Resolving: [21 0]
>>showpage, press <return> to continue<<
I had to truncate some of the data because it is very long. I cannot understand this output, and it may not be useful.
This test was made with a very simple PDF: just the text "TEST" written in MS Word. I tried other PDF files, but the problem is the same for all of them.
What can I try to solve this problem?
The PDFDEBUG output is not useful. It would have been more useful to have given the back channel without setting PDFDEBUG.
I notice you say you 'have this problem only for some clients', which makes me think you may be using Ghostscript other than in accordance with the terms of the AGPL. You should review that before going further.
FWIW I doubt the problem is Ghostscript. All the mswinpr2 device does is render to a bitmap, draw that on the printer device context, and tell it to send itself to the printer. The simple way to tell is (on the machine with the problem) to have Ghostscript display the content of the test PDF file (-sDEVICE=display, or just omit the device altogether). If that works, then it's not Ghostscript, it's something in the print pipeline.
The mswinpr2 device is very elderly, and the Windows print path has changed significantly since it was written. It may be that it simply doesn't work on Windows 10, or on some flavours of Windows 10, or some quirk of certain printer drivers on later versions of Windows doesn't work as expected.
Given the number of people I know for a fact are using the device, I'm confident that your problem is not a Ghostscript problem, as is also evident from the fact that you can get it to work on other machines.
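The display check suggested above could be scripted along these lines. This is only a sketch: the executable name and file path are taken from the question, the helper names display_check_command and run_display_check are made up for illustration, and nothing is executed unless run_display_check is called on a machine with Ghostscript installed.

```python
import subprocess


def display_check_command(pdf_path: str, gs_exe: str = "gswin32c.exe") -> list[str]:
    """Build a Ghostscript command that renders the PDF on screen.

    -sDEVICE=display replaces the mswinpr2 printer device, so the Windows
    print pipeline is not involved at all.
    """
    return [gs_exe, "-sDEVICE=display", pdf_path]


def run_display_check(pdf_path: str) -> int:
    """Run the check (hypothetical helper) and return Ghostscript's exit code."""
    return subprocess.run(display_check_command(pdf_path)).returncode


cmd = display_check_command(r"C:\myfile.pdf")
print(" ".join(cmd))
```

If the page renders correctly in the display window on the affected client, that points at the printer driver or print path rather than at Ghostscript.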

UNIX - Count occurrences of character per line between two fields and add new column with result

I have a PLINK ped file that looks like this:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I am interested in counting how many times the string "2" occurs between the 7th field (included) and the last field. Then, if the number of occurrences is:
0: add 1 (being absent) to the new last field
1: add 2 (being present) to the new last field
2: add 2 (being present) to the new last field
The output would be:
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I know that to count the occurrences of a string in every line I can use:
grep -n -o "2" file1 | sort -n | uniq -c | cut -d : -f 1
And that I can merge the 2 results using:
paste -d' ' file1 file2 > file3
But I don't know how to count the occurrences between two fields.
Thank you in advance for helping me!
You can use awk to process row- and column-based data like this:
awk '{c=0; for(i=7; i<=NF; i++) if ($i==2) c++; if (c<2) c++; print $0, c}' file
ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Perl to the rescue:
perl -ape 's/$/" " . (1 + !! grep 2 == $_, @F[6 .. $#F])/e'
-p reads the input line by line and prints the result
-a splits each input line on whitespace into the @F array
grep in scalar context returns the count, by !! (double negation) we change it to 0 or 1, and by adding 1 we make it into 1 and 2 as requested
s/// substitutes $ (end of line) with the result of the code in the replacement part (that's what /e does)
You could use awk:
awk '{s=0;for(i=7;i<=NF;i++) if($i==2) s+=1; s=s==0?1:2; print $0, s;}' data.txt
Explanations:
The instructions between the {} are executed on each line of the file.
NF is the number of fields in the line. They are numbered 1 to NF and you can access them with the $n notation.
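For comparison, the same per-line logic can be sketched in Python. The function name add_presence_column is made up for illustration, and treating any nonzero count as "present" is an assumption for counts above 2, which the question does not specify.

```python
def add_presence_column(line: str) -> str:
    """Append 2 if any field from the 7th onward equals "2", otherwise 1."""
    fields = line.split()
    # fields[6] is the 7th field because Python indexes from 0.
    count = sum(1 for field in fields[6:] if field == "2")
    flag = 2 if count > 0 else 1
    return f"{line.rstrip()} {flag}"


# Two rows copied from the question's ped file.
sample = [
    "ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1",
    "ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1",
]
for row in sample:
    print(add_presence_column(row))
```

The 2s in fields 5 and 6 are ignored because the slice starts at the 7th field, which mirrors the i=7 loop start in the awk versions.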

Resources